Finnish Paraphrase Corpus

dc.contributor.author: Kanerva Jenna
dc.contributor.author: Ginter Filip
dc.contributor.author: Chang Li-Hsin
dc.contributor.author: Rastas Iiro
dc.contributor.author: Skantsi Valtteri
dc.contributor.author: Kilpeläinen Jemina
dc.contributor.author: Kupari Hanna-Mari
dc.contributor.author: Saarni Jenna
dc.contributor.author: Sevón Maija
dc.contributor.author: Tarkka Otto
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization: fi=tietotekniikan laitos|en=Department of Computing|
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.contributor.organization-code: 1.2.246.10.2458963.20.85312822902
dc.contributor.organization-code: 2610301
dc.converis.publication-id: 53727016
dc.converis.url: https://research.utu.fi/converis/portal/Publication/53727016
dc.date.accessioned: 2025-08-28T02:08:14Z
dc.date.available: 2025-08-28T02:08:14Z
dc.description.abstract: In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus, 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high quality paraphrase selection in terms of both cost and quality.
dc.format.pagerange: 288
dc.format.pagerange: 298
dc.identifier.isbn: 978-91-7929-614-8
dc.identifier.issn: 1650-3686
dc.identifier.jour-issn: 1650-3686
dc.identifier.olddbid: 208638
dc.identifier.oldhandle: 10024/191665
dc.identifier.uri: https://www.utupub.fi/handle/11111/58156
dc.identifier.url: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=29
dc.identifier.urn: URN:NBN:fi-fe2021093048687
dc.language.iso: en
dc.okm.affiliatedauthor: Kanerva, Jenna
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.affiliatedauthor: Chang, Li-Hsin
dc.okm.affiliatedauthor: Rastas, Iiro
dc.okm.affiliatedauthor: Skantsi, Valtteri
dc.okm.affiliatedauthor: Kilpeläinen, Jemina
dc.okm.affiliatedauthor: Kupari, Hanna-Mari
dc.okm.affiliatedauthor: Saarni, Jenna
dc.okm.affiliatedauthor: Sevón, Maija
dc.okm.affiliatedauthor: Tarkka, Otto
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: Sweden (en_GB)
dc.publisher.country: Ruotsi (fi_FI)
dc.publisher.country-code: SE
dc.relation.conference: Nordic Conference on Computational Linguistics
dc.relation.ispartofjournal: Linköping Electronic Conference Proceedings
dc.relation.ispartofseries: Linköping Electronic Conference Proceedings
dc.relation.volume: 178
dc.source.identifier: https://www.utupub.fi/handle/10024/191665
dc.title: Finnish Paraphrase Corpus
dc.title.book: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)
dc.year.issued: 2021

Files

Name:
ecp2021178029.pdf
Size:
260.02 KB
Format:
Adobe Portable Document Format
Description:
Publisher's PDF