Finnish Paraphrase Corpus
Skantsi Valtteri; Ginter Filip; Kupari Hanna-Mari; Saarni Jenna; Chang Li-Hsin; Kanerva Jenna; Tarkka Otto; Rastas Iiro; Kilpeläinen Jemina; Sevón Maija
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2021093048687
Abstract
In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Of all paraphrase pairs in our corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility for high-quality paraphrase selection in terms of both cost and quality.
Collections
- Self-archived publications (Rinnakkaistallenteet) [19207]