Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction
| dc.contributor.author | Bassignana Elisa | |
| dc.contributor.author | Ginter Filip | |
| dc.contributor.author | Pyysalo Sampo | |
| dc.contributor.author | van der Goot Rob | |
| dc.contributor.author | Plank Barbara | |
| dc.contributor.organization | fi=data-analytiikka|en=Data-analytiikka| | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.68940835793 | |
| dc.converis.publication-id | 380758650 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/380758650 | |
| dc.date.accessioned | 2025-08-28T02:54:21Z | |
| dc.date.available | 2025-08-28T02:54:21Z | |
| dc.description.abstract | Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022a), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with the ones on the original English CrossRE, indicating high quality of the translation and the resulting dataset. | |
| dc.format.pagerange | 80 | |
| dc.format.pagerange | 85 | |
| dc.identifier.isbn | 978-99-1621-999-7 | |
| dc.identifier.issn | 1736-8197 | |
| dc.identifier.jour-issn | 1736-8197 | |
| dc.identifier.olddbid | 209904 | |
| dc.identifier.oldhandle | 10024/192931 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/49770 | |
| dc.identifier.url | https://aclanthology.org/2023.nodalida-1.9 | |
| dc.identifier.urn | URN:NBN:fi-fe2025082792536 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Ginter, Filip | |
| dc.okm.affiliatedauthor | Pyysalo, Sampo | |
| dc.okm.discipline | 113 Computer and information sciences | en_GB |
| dc.okm.discipline | 113 Tietojenkäsittely ja informaatiotieteet | fi_FI |
| dc.okm.internationalcopublication | international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A4 Conference Article | |
| dc.publisher.country | Estonia | en_GB |
| dc.publisher.country | Viro | fi_FI |
| dc.publisher.country-code | EE | |
| dc.relation.conference | Nordic Conference on Computational Linguistics | |
| dc.relation.ispartofjournal | NEALT proceedings series | |
| dc.relation.ispartofseries | NEALT proceedings series | |
| dc.relation.volume | 52 | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/192931 | |
| dc.title | Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction | |
| dc.title.book | Proceedings of The 24th Nordic Conference on Computational Linguistics (NoDaLiDa) | |
| dc.year.issued | 2023 |