Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

dc.contributor.author: Bassignana, Elisa
dc.contributor.author: Ginter, Filip
dc.contributor.author: Pyysalo, Sampo
dc.contributor.author: van der Goot, Rob
dc.contributor.author: Plank, Barbara
dc.contributor.organization: fi=data-analytiikka|en=Data Analytics|
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.converis.publication-id: 380758650
dc.converis.url: https://research.utu.fi/converis/portal/Publication/380758650
dc.date.accessioned: 2025-08-28T02:54:21Z
dc.date.available: 2025-08-28T02:54:21Z
dc.description.abstract: <p>Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022a), with a sub-portion of more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with those on the original English CrossRE, indicating high quality of the translation and of the resulting dataset.</p>
dc.format.pagerange: 80-85
dc.identifier.isbn: 978-99-1621-999-7
dc.identifier.issn: 1736-8197
dc.identifier.jour-issn: 1736-8197
dc.identifier.olddbid: 209904
dc.identifier.oldhandle: 10024/192931
dc.identifier.uri: https://www.utupub.fi/handle/11111/49770
dc.identifier.url: https://aclanthology.org/2023.nodalida-1.9
dc.identifier.urn: URN:NBN:fi-fe2025082792536
dc.language.iso: en
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.affiliatedauthor: Pyysalo, Sampo
dc.okm.discipline: 113 Computer and information sciences
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: Estonia
dc.publisher.country-code: EE
dc.relation.conference: Nordic Conference on Computational Linguistics
dc.relation.ispartofjournal: NEALT proceedings series
dc.relation.ispartofseries: NEALT proceedings series
dc.relation.volume: 52
dc.source.identifier: https://www.utupub.fi/handle/10024/192931
dc.title: Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction
dc.title.book: Proceedings of The 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
dc.year.issued: 2023

Files

Name: 2023.nodalida-1.9.pdf
Size: 474.73 KB
Format: Adobe Portable Document Format