Dependency profiles as a tool for big data analysis of linguistic constructions: A case study of emoticons

dc.contributor.authorLaippala V.
dc.contributor.authorKyröläinen A.
dc.contributor.authorKanerva J.
dc.contributor.authorLuotolahti J.
dc.contributor.authorGinter F.
dc.contributor.organizationfi=digitaalinen kielentutkimus, espanja, italia, kiina, ranska, saksa|en=Digital Language Studies, Chinese, French, German, Italian, Spanish|
dc.contributor.organizationfi=kotimaiset kielet ja niiden sukukielet|en=Finnish, Finno-Ugric and Scandinavian languages|
dc.contributor.organizationfi=tietojenkäsittelytiede|en=Computer Science|
dc.contributor.organizationfi=tietotekniikan laitos|en=Department of Computing|
dc.contributor.organization-code1.2.246.10.2458963.20.36764574459
dc.contributor.organization-code1.2.246.10.2458963.20.85312822902
dc.contributor.organization-code2606803
dc.converis.publication-id27582771
dc.converis.urlhttps://research.utu.fi/converis/portal/Publication/27582771
dc.date.accessioned2025-08-28T02:40:18Z
dc.date.available2025-08-28T02:40:18Z
dc.description.abstractThis study presents a methodological toolbox for big data analysis of linguistic constructions by introducing dependency profiles, i.e., co-occurrences of linguistic elements with syntax information. These were operationalized by reconstructing sentences as delexicalized syntactic biarcs, subtrees of dependency analyses. As a case study, we utilize these dependency profiles to explore usage patterns associated with emoticons, the graphic representations of facial expressions. These are said to be characteristic of Computer-Mediated Communication, but typically studied only in restricted corpora. To analyze the 3.7-billion token Finnish Internet Parsebank we use as data, we apply clustering and support vector machines. The results show that emoticons are associated with three typical usage patterns: stream of the writer’s consciousness, narrative constructions and elements guiding the interaction and expressing the writer’s reactions by means of interjections and discourse particles. Additionally, the more frequent emoticons, such as :), are used differently than the less frequent ones, such as ^_^.
dc.format.pagerange127
dc.format.pagerange153
dc.identifier.eissn1736-8987
dc.identifier.jour-issn1736-8987
dc.identifier.olddbid209493
dc.identifier.oldhandle10024/192520
dc.identifier.urihttps://www.utupub.fi/handle/11111/46262
dc.identifier.urnURN:NBN:fi-fe2021042717516
dc.language.isoen
dc.okm.affiliatedauthorLaippala, Veronika
dc.okm.affiliatedauthorKyröläinen, Aki
dc.okm.affiliatedauthorKanerva, Jenna
dc.okm.affiliatedauthorDataimport, Suomen kieli ja suom-ugrilainen kielent
dc.okm.affiliatedauthorGinter, Filip
dc.okm.discipline6121 Languagesen_GB
dc.okm.discipline6121 Kielitieteetfi_FI
dc.okm.internationalcopublicationnot an international co-publication
dc.okm.internationalityInternational publication
dc.okm.typeA1 ScientificArticle
dc.publisherUniversity of Tartu Press
dc.publisher.countryEstoniaen_GB
dc.publisher.countryVirofi_FI
dc.publisher.country-codeEE
dc.relation.doi10.12697/jeful.2017.8.2.05
dc.relation.ispartofjournalEesti ja soome-ugri keeleteaduse ajakiri
dc.relation.issue2
dc.relation.volume8
dc.source.identifierhttps://www.utupub.fi/handle/10024/192520
dc.titleDependency profiles as a tool for big data analysis of linguistic constructions: A case study of emoticons
dc.year.issued2017

Files

Name: Esuka_161-326-1-SM.pdf
Size: 659.47 KB
Format: Adobe Portable Document Format