Finnish Paraphrase Corpus
Kanerva Jenna; Ginter Filip; Chang Li-Hsin; Rastas Iiro; Skantsi Valtteri; Kilpeläinen Jemina; Kupari Hanna-Mari; Saarni Jenna; Sevón Maija; Tarkka Otto
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2021093048687
Abstract
In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus, 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high-quality paraphrase selection in terms of both cost and quality.
Collections
- Parallel publications (Rinnakkaistallenteet) [27094]