Analyzing the unrestricted web: The Finnish corpus of online registers

dc.contributor.author: Skantsi Valtteri
dc.contributor.author: Laippala Veronika
dc.contributor.organization: fi=kieli- ja käännöstieteiden laitos|en=School of Languages and Translation Studies|
dc.contributor.organization-code: 1.2.246.10.2458963.20.56461112866
dc.converis.publication-id: 179300015
dc.converis.url: https://research.utu.fi/converis/portal/Publication/179300015
dc.date.accessioned: 2025-08-28T02:48:59Z
dc.date.available: 2025-08-28T02:48:59Z
dc.description.abstract: This article introduces the Finnish Corpus of Online Registers (FinCORE) representing the full range of registers - situationally defined text varieties such as news and blogs - on the Finnish Internet. The extreme range of language use found online has challenged the study of registers. It has been unclear what registers the entire Internet includes, and if they can be sufficiently defined to allow for their analysis or classification, previous studies focusing on restricted sets of registers and English. FinCORE features 10,754 texts from the unrestricted web, manually annotated for their register using a scheme originally established for the Corpus of Online Registers of English (CORE). We present the FinCORE registers and compare them to CORE. Finally, we show that the FinCORE registers are sufficiently well-defined to allow for their automatic identification, thus opening novel possibilities for both linguistics and web-as-corpus research. FinCORE is published under an open license.
dc.identifier.eissn: 1502-4717
dc.identifier.jour-issn: 0332-5865
dc.identifier.olddbid: 209754
dc.identifier.oldhandle: 10024/192781
dc.identifier.uri: https://www.utupub.fi/handle/11111/49414
dc.identifier.url: https://doi.org/10.1017/S0332586523000021
dc.identifier.urn: URN:NBN:fi-fe2023042538614
dc.language.iso: en
dc.okm.affiliatedauthor: Skantsi, Valtteri
dc.okm.affiliatedauthor: Laippala, Veronika
dc.okm.discipline: 6121 Languages (en_GB)
dc.okm.discipline: 6121 Kielitieteet (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: CAMBRIDGE UNIV PRESS
dc.publisher.country: United Kingdom (en_GB)
dc.publisher.country: Britannia (fi_FI)
dc.publisher.country-code: GB
dc.relation.articlenumber: PII S0332586523000021
dc.relation.doi: 10.1017/S0332586523000021
dc.relation.ispartofjournal: Nordic Journal of Linguistics
dc.source.identifier: https://www.utupub.fi/handle/10024/192781
dc.title: Analyzing the unrestricted web: The Finnish corpus of online registers
dc.year.issued: 2023

Files

Name: analyzing-the-unrestricted-web-the-finnish-corpus-of-online-registers.pdf
Size: 976.09 KB
Format: Adobe Portable Document Format