From Discrete to Continuous Classes: A Situational Analysis of Multilingual Web Registers with LLM Annotations

Henriksson, Erik; Myntti, Amanda; Hellström, Saara; Erten-Johansson, Selcen; Eskelinen, Anni; Repo, Liina; Laippala, Veronika

doi:10.18653/v1/2024.nlp4dh-1.30
URI
https://doi.org/10.18653/v1/2024.nlp4dh-1.30
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2025082786567
Abstract

In corpus linguistics, registers (language varieties suited to different contexts) have traditionally been defined by their situations of use, yet recent studies reveal significant situational variation within registers. Previous quantitative studies, however, have been limited to English, leaving this variation in other languages largely unexplored. To address this gap, we apply a quantitative situational analysis to a large multilingual web register corpus, using large language models (LLMs) to annotate texts in English, Finnish, French, Swedish, and Turkish for 23 situational parameters. Using clustering techniques, we identify six situational text types, such as “Advice”, “Opinion” and “Marketing”, each characterized by distinct situational features. We explore the relationship between these text types and traditional register categories, finding partial alignment, though no register maps perfectly onto a single cluster. These results support the quantitative approach to situational analysis and are consistent with earlier findings for English. Cross-linguistic comparisons show that language accounts for only a small part of situational variation within registers, suggesting registers are situationally similar across languages. This study demonstrates the utility of LLMs in multilingual register analysis and deepens our understanding of situational variation within registers.

Turku University Library | University of Turku
julkaisut@utu.fi | Privacy | Accessibility statement