Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771–1910

Downloads: 84

Abstract

We present the results of text reuse detection based on a corpus of scanned and OCR-recognized Finnish newspapers and journals from 1771 to 1910. Our study draws on BLAST, a software tool created for comparing and aligning biological sequences. We show different types of text reuse in this corpus and compare our approach with Passim, a text reuse detection tool developed at Northeastern University in Boston.

Series

NEALT Proceedings Series
