The Authorship Attribution Project

The ITPS leads a multidisciplinary project with the Computer Science, English, and History departments to develop novel methodologies for (semi-)automated, software-based identification of the creator(s) of historical documents whose authorship is unknown or disputed. The project uses advanced natural language processing and machine learning techniques to identify and learn the writing styles of known eighteenth-century authors. It then compares the style of an unattributed document to the known authors’ styles to identify a potential match. The project has clarified much of the Paine Canon and contributed numerous new works to it, and in doing so has added to the field of computer-based author attribution methodology. The project recently began expanding its scope beyond Thomas Paine to a broader corpus of late eighteenth-century writers, with a particular focus on newspaper publication in the 1790s.

