Fine-grained Named Entity Annotation for Finnish

dc.contributor.author: Luoma Jouni
dc.contributor.author: Chang Li-Hsin
dc.contributor.author: Ginter Filip
dc.contributor.author: Pyysalo Sampo
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization: fi=tietotekniikan laitos|en=Department of Computing|
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.contributor.organization-code: 1.2.246.10.2458963.20.85312822902
dc.contributor.organization-code: 2610301
dc.converis.publication-id: 56909867
dc.converis.url: https://research.utu.fi/converis/portal/Publication/56909867
dc.date.accessioned: 2022-10-28T13:57:19Z
dc.date.available: 2022-10-28T13:57:19Z
dc.description.abstract: We introduce a corpus with fine-grained named entity annotation for Finnish, following the OntoNotes guidelines to create a resource that is cross-lingually compatible with existing annotations for other languages. We combine and extend two NER corpora recently introduced for Finnish and revise their custom annotation scheme through a combination of automatic and manual processing steps. The resulting corpus consists of nearly 500,000 tokens annotated for over 50,000 mentions categorized into the 18 OntoNotes name and numeric entity types. We evaluate this resource and demonstrate its compatibility with the English OntoNotes annotations by training state-of-the-art mono-, bi- and multilingual deep learning models, finding both that the corpus allows highly accurate recognition of OntoNotes types at 93% F-score and that a comparable level of tagging accuracy can be achieved by a bilingual Finnish-English NER model.
dc.format.pagerange: 135
dc.format.pagerange: 144
dc.identifier.isbn: 978-91-7929-614-8
dc.identifier.issn: 1650-3686
dc.identifier.jour-issn: 1650-3686
dc.identifier.olddbid: 185409
dc.identifier.oldhandle: 10024/168503
dc.identifier.uri: https://www.utupub.fi/handle/11111/42144
dc.identifier.url: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=14
dc.identifier.urn: URN:NBN:fi-fe2021093048859
dc.language.iso: en
dc.okm.affiliatedauthor: Luoma, Jouni
dc.okm.affiliatedauthor: Chang, Li-Hsin
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.affiliatedauthor: Pyysalo, Sampo
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet [fi_FI]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: Sweden [en_GB]
dc.publisher.country: Ruotsi [fi_FI]
dc.publisher.country-code: SE
dc.relation.conference: Nordic Conference on Computational Linguistics
dc.relation.ispartofjournal: Linköping Electronic Conference Proceedings
dc.relation.ispartofseries: Linköping Electronic Conference Proceedings
dc.relation.volume: 178
dc.source.identifier: https://www.utupub.fi/handle/10024/168503
dc.title: Fine-grained Named Entity Annotation for Finnish
dc.title.book: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
dc.year.issued: 2021
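
The abstract reports recognition of the 18 OntoNotes types at 93% F-score but does not say which scorer produced that number. As a hedged illustration only, the sketch below assumes the common exact-match, micro-averaged entity-level F-score over BIO-tagged token sequences (in the style of the CoNLL `conlleval` script). The type inventory is the standard OntoNotes set of 11 name types and 7 numeric types; the helper names `bio_to_spans` and `entity_f1` are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Set, Tuple

# Standard OntoNotes entity types: 11 name types, then 7 numeric types.
ONTONOTES_TYPES = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT",
    "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE",
    "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
}

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from a BIO tag sequence (hypothetical helper).

    Assumption: an I- tag that does not continue a span of the same type
    starts a new span, a lenient convention similar to conlleval.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            spans.add((etype, start, i))  # close the open span at token i
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            if label not in ONTONOTES_TYPES:
                raise ValueError(f"unknown entity type: {label}")
            start, etype = i, label
    return spans


def entity_f1(gold: List[List[str]],
              pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # a span counts only if type and boundaries match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PERSON", "I-PERSON", "O", "B-GPE", "O", "B-DATE"]]
    pred = [["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "B-DATE"]]
    # Two of three gold spans match; the GPE/ORG confusion counts as
    # both a false negative and a false positive.
    print(entity_f1(gold, pred))
```

Under exact matching, a mention with correct boundaries but the wrong type earns no credit, which is why fine-grained distinctions such as GPE versus ORG weigh directly on the reported score.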

Files

Name: ecp2021178014.pdf
Size: 226.8 KB
Format: Adobe Portable Document Format
Description: Publisher's PDF