Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771–1910
Abstract
We present the results of text reuse detection on a corpus of scanned and OCR-recognized Finnish newspapers and journals from 1771 to 1910. Our study draws on BLAST, software created for comparing and aligning biological sequences. We show different types of text reuse in this corpus and also present a comparison with Passim, a text reuse detection tool developed at Northeastern University in Boston.
Series
NEALT Proceedings Series