An Approach to the Frugal Use of Human Annotators to Scale up Auto-coding for Text Classification Tasks

dc.contributor.author: Chen Li'An
dc.contributor.author: Suominen Hanna
dc.contributor.organization: fi=tietotekniikan laitos|en=Department of Computing|
dc.contributor.organization-code: 1.2.246.10.2458963.20.85312822902
dc.converis.publication-id: 178631603
dc.converis.url: https://research.utu.fi/converis/portal/Publication/178631603
dc.date.accessioned: 2025-08-28T01:22:42Z
dc.date.available: 2025-08-28T01:22:42Z
dc.description.abstract: Human annotation for establishing the training data is often a very costly process in natural language processing (NLP) tasks, which has led to frugal NLP approaches becoming an important research topic. Many research teams struggle to complete projects with limited funding, labor, and computational resources. Driven by the Move-Step analytic framework theorized in the applied linguistics field, our study offers a rigorous approach to the frugal use of two human annotators to scale up autocoding for text classification tasks. We applied the Linear Support Vector Machine algorithm to text classification of a job ad corpus. Our Cohen’s Kappa for inter-rater agreement and Area Under the Curve (AUC) values reached averages of 0.76 and 0.80, respectively. The calculated time consumption for our human training process was 36 days. The results indicated that even the strategic and frugal use of only two human annotators could enable the efficient training of classifiers with reasonably good performance. This study does not aim to provide generalizability of the results. Rather, it is proposed that the annotation strategies arising from this study be considered by our readers only if they are fit for one’s specific research purposes.
dc.format.pagerange: 12
dc.format.pagerange: 21
dc.identifier.jour-issn: 1834-7037
dc.identifier.olddbid: 207464
dc.identifier.oldhandle: 10024/190491
dc.identifier.uri: https://www.utupub.fi/handle/11111/51415
dc.identifier.url: https://aclanthology.org/2021.alta-1.2/
dc.identifier.urn: URN:NBN:fi-fe2023022128008
dc.language.iso: en
dc.okm.affiliatedauthor: Suominen, Hanna
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet [fi_FI]
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: Australia [en_GB]
dc.publisher.country: Australia [fi_FI]
dc.publisher.country-code: AU
dc.relation.conference: Australasian Language Technology Association Workshop
dc.relation.ispartofjournal: Proceedings of the Australasian Language Technology Workshop
dc.relation.ispartofseries: Proceedings of the australasian language technology workshop
dc.source.identifier: https://www.utupub.fi/handle/10024/190491
dc.title: An Approach to the Frugal Use of Human Annotators to Scale up Auto-coding for Text Classification Tasks
dc.title.book: Proceedings of the 19th Workshop of the Australasian Language Technology Association
dc.year.issued: 2021

Files

Name: LiAnChen_HannaSuominen_2021_Approach_to_the_Frugal_Use_of_Human.pdf
Size: 442.44 KB
Format: Adobe Portable Document Format