Automated Emotion Annotation of Finnish Parliamentary Speeches Using GPT-4

dc.contributor.author: Tarkka, Otto
dc.contributor.author: Koljonen, Jaakko
dc.contributor.author: Korhonen, Markus
dc.contributor.author: Laine, Juuso
dc.contributor.author: Martiskainen, Kristian
dc.contributor.author: Elo, Kimmo
dc.contributor.author: Laippala, Veronika
dc.contributor.organization: fi=digitaalinen kielentutkimus, espanja, italia, kiina, ranska, saksa|en=Digital Language Studies, Chinese, French, German, Italian, Spanish|
dc.contributor.organization: fi=eduskuntatutkimuksen keskus|en=Centre for Parliamentary Studies|
dc.contributor.organization-code: 1.2.246.10.2458963.20.36764574459
dc.contributor.organization-code: 1.2.246.10.2458963.20.38771386471
dc.converis.publication-id: 457172276
dc.converis.url: https://research.utu.fi/converis/portal/Publication/457172276
dc.date.accessioned: 2025-08-28T02:46:49Z
dc.date.available: 2025-08-28T02:46:49Z
dc.description.abstract: Annotating datasets can often be prohibitively expensive and laborious. Emotion annotation specifically has been shown to be a difficult task in which even trained annotators rarely reach high agreement. With the introduction of ChatGPT, GPT-4 and other Large Language Models (LLMs), however, a new line of research has emerged that explores the possibilities of automated data annotation. In this paper, we apply GPT-4 to the task of annotating a dataset, which is subsequently used to train a BERT model for emotion analysis of Finnish parliamentary speeches. In our experiment, GPT-4 performs on par with trained annotators, and the annotations it produces can be used to train a classifier that reaches a micro F1 of 0.690. We compare this model to two other models that are trained on machine-translated datasets and find that the model trained on GPT-4-annotated data outperforms them. Our paper offers new insight into the possibilities that LLMs have to offer for the analysis of parliamentary corpora.
dc.format.pagerange: 70
dc.format.pagerange: 76
dc.identifier.eisbn: 978-2-493814-24-1
dc.identifier.jour-issn: 2522-2686
dc.identifier.olddbid: 209684
dc.identifier.oldhandle: 10024/192711
dc.identifier.uri: https://www.utupub.fi/handle/11111/49286
dc.identifier.url: https://aclanthology.org/2024.parlaclarin-1.11.pdf
dc.identifier.urn: URN:NBN:fi-fe2025082792453
dc.language.iso: en
dc.okm.affiliatedauthor: Tarkka, Otto
dc.okm.affiliatedauthor: Koljonen, Jaakko
dc.okm.affiliatedauthor: Korhonen, Markus
dc.okm.affiliatedauthor: Laine, Juuso
dc.okm.affiliatedauthor: Martiskainen, Kristian
dc.okm.affiliatedauthor: Elo, Kimmo
dc.okm.affiliatedauthor: Laippala, Veronika
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 6121 Languages [en_GB]
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet [fi_FI]
dc.okm.discipline: 6121 Kielitieteet [fi_FI]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: France [en_GB]
dc.publisher.country: Ranska [fi_FI]
dc.publisher.country-code: FR
dc.relation.conference: ParlaCLARIN Workshop
dc.relation.ispartofjournal: LREC Proceedings
dc.source.identifier: https://www.utupub.fi/handle/10024/192711
dc.title: Automated Emotion Annotation of Finnish Parliamentary Speeches Using GPT-4
dc.title.book: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) : ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora
dc.year.issued: 2024

Files

Name:
2024.parlaclarin-1.11_CC-BY-NC.pdf
Size:
238.64 KB
Format:
Adobe Portable Document Format