Corpus Linguistics and Eighteenth Century Collections Online (ECCO)

dc.contributor.author: Tolonen Mikko
dc.contributor.author: Mäkelä Eetu
dc.contributor.author: Ijaz Ali
dc.contributor.author: Lahti Leo
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization: fi=väestötutkimuskeskus|en=Centre for Population Health Research (POP Centre)|
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.contributor.organization-code: 2607008
dc.converis.publication-id: 66578597
dc.converis.url: https://research.utu.fi/converis/portal/Publication/66578597
dc.date.accessioned: 2022-10-28T14:08:49Z
dc.date.available: 2022-10-28T14:08:49Z
dc.description.abstract: Eighteenth Century Collections Online (ECCO) is the most comprehensive dataset available in machine-readable form for eighteenth-century printed texts. It plays a crucial role in studies of eighteenth-century language and it has vast potential for corpus linguistics. At the same time, it is an unbalanced corpus that poses a series of different problems. The aim of this paper is to offer a general overview of ECCO for corpus linguistics by analysing, for example, its publication countries and languages. We will also analyse the role of the substantial number of reprints and new editions in the data, discuss genres and the estimates of Optical Character Recognition (OCR) quality. Our conclusion is that whereas ECCO provides a valuable source for corpus linguistics, scholars need to pay attention to historical source criticism. We have highlighted key aspects that need to be taken into consideration when considering its possible uses.
dc.format.pagerange: 19
dc.format.pagerange: 34
dc.identifier.eissn: 2243-4712
dc.identifier.jour-issn: 2243-4712
dc.identifier.olddbid: 186545
dc.identifier.oldhandle: 10024/169639
dc.identifier.uri: https://www.utupub.fi/handle/11111/38863
dc.identifier.url: https://ricl.aelinco.es/index.php/ricl/article/view/161
dc.identifier.urn: URN:NBN:fi-fe2021093048945
dc.language.iso: en
dc.okm.affiliatedauthor: Lahti, Leo
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Asociacion Espanola de Linguistica de Corpus
dc.publisher.country: Spain (en_GB)
dc.publisher.country: Espanja (fi_FI)
dc.publisher.country-code: ES
dc.relation.doi: 10.32714/ricl.09.01.03
dc.relation.ispartofjournal: Research in Corpus Linguistics
dc.relation.issue: 1
dc.relation.volume: 9
dc.source.identifier: https://www.utupub.fi/handle/10024/169639
dc.title: Corpus Linguistics and Eighteenth Century Collections Online (ECCO)
dc.year.issued: 2021

Files

Name: 161-Article Text-1107-1-10-20210427.pdf
Size: 2.43 MB
Format: Adobe Portable Document Format
Description: Publisher's PDF