Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model

dc.contributor.author: Rastas Iiro
dc.contributor.author: Ryan Yann
dc.contributor.author: Tiihonen Iiro
dc.contributor.author: Qaraei Mohammedreza
dc.contributor.author: Repo Liina
dc.contributor.author: Babbar Rohit
dc.contributor.author: Mäkelä Eetu
dc.contributor.author: Tolonen Mikko
dc.contributor.author: Ginter Filip
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization: fi=kieli- ja käännöstieteiden laitos|en=School of Languages and Translation Studies|
dc.contributor.organization-code: 1.2.246.10.2458963.20.56461112866
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.converis.publication-id: 176709131
dc.converis.url: https://research.utu.fi/converis/portal/Publication/176709131
dc.date.accessioned: 2022-11-29T14:57:15Z
dc.date.available: 2022-11-29T14:57:15Z
dc.description.abstract: In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works, on average, with less than 7 years absolute error. We also explore how language change over time affects the model by analyzing the features the model uses for publication year predictions as given by the Integrated Gradients model explanation method.
dc.format.pagerange: 68
dc.format.pagerange: 77
dc.identifier.isbn: 978-1-955917-42-1
dc.identifier.olddbid: 190049
dc.identifier.oldhandle: 10024/173140
dc.identifier.uri: https://www.utupub.fi/handle/11111/31474
dc.identifier.url: https://aclanthology.org/2022.lchange-1.7.pdf
dc.identifier.urn: URN:NBN:fi-fe2022110164053
dc.language.iso: en
dc.okm.affiliatedauthor: Rastas, Iiro
dc.okm.affiliatedauthor: Repo, Liina
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: United States (en_GB)
dc.publisher.country: Yhdysvallat (USA) (fi_FI)
dc.publisher.country-code: US
dc.relation.conference: Workshop on Computational Approaches to Historical Language Change
dc.source.identifier: https://www.utupub.fi/handle/10024/173140
dc.title: Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model
dc.title.book: Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change
dc.year.issued: 2022

Files

Name: 2022_Rastas_Explainable_Publ_ACL.pdf
Size: 410.27 KB
Format: Adobe Portable Document Format