Finnish Paraphrase Corpus

Abstract

In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus, 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high quality paraphrase selection in terms of both cost and quality.

Series

Linköping Electronic Conference Proceedings
