TallVocabL2Fi: A Tall Dataset of 15 Finnish L2 Learners’ Vocabulary

dc.contributor.authorRobertson Frankie
dc.contributor.authorChang Li-Hsin
dc.contributor.authorSöyrinki Sini
dc.contributor.organizationfi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization-code1.2.246.10.2458963.20.68940835793
dc.converis.publication-id177263675
dc.converis.urlhttps://research.utu.fi/converis/portal/Publication/177263675
dc.date.accessioned2022-12-15T03:31:43Z
dc.date.available2022-12-15T03:31:43Z
dc.description.abstractPrevious work concerning measurement of second language learners has tended to focus on the knowledge of small numbers of words, often geared towards measuring vocabulary size. This paper presents a “tall” dataset containing information about a few learners’ knowledge of many words, suitable for evaluating Vocabulary Inventory Prediction (VIP) techniques, including those based on Computerised Adaptive Testing (CAT). In comparison to previous comparable datasets, the learners are from varied backgrounds, so as to reduce the risk of overfitting when used for machine learning based VIP. The dataset contains both a self-rating test and a translation test, used to derive a measure of reliability for learner responses. The dataset creation process is documented, and the relationship between variables concerning the participants, such as their completion time, their language ability level, and the triangulated reliability of their self-assessment responses, is analysed. The word list is constructed by taking into account the extensive derivational morphology of Finnish, and infrequent words are included in order to account for explanatory variables beyond word frequency.
dc.format.pagerange6377-6386
dc.identifier.isbn979-10-95546-72-6
dc.identifier.jour-issn2522-2686
dc.identifier.olddbid190612
dc.identifier.oldhandle10024/173703
dc.identifier.urihttps://www.utupub.fi/handle/11111/30482
dc.identifier.urlhttps://aclanthology.org/2022.lrec-1.685/
dc.identifier.urnURN:NBN:fi-fe2022121571605
dc.language.isoen
dc.okm.affiliatedauthorChang, Li-Hsin
dc.okm.discipline113 Computer and information sciencesen_GB
dc.okm.discipline113 Tietojenkäsittely ja informaatiotieteetfi_FI
dc.okm.internationalcopublicationnot an international co-publication
dc.okm.internationalityInternational publication
dc.okm.typeA4 Conference Article
dc.publisher.countryFranceen_GB
dc.publisher.countryRanskafi_FI
dc.publisher.country-codeFR
dc.relation.conferenceInternational Conference on Language Resources and Evaluation
dc.relation.ispartofjournalLREC Proceedings
dc.relation.ispartofseriesLREC Proceedings
dc.source.identifierhttps://www.utupub.fi/handle/10024/173703
dc.titleTallVocabL2Fi: A Tall Dataset of 15 Finnish L2 Learners’ Vocabulary
dc.title.bookProceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)
dc.year.issued2022

Files

Name:
2022.lrec-1.685.pdf
Size:
543.93 KB
Format:
Adobe Portable Document Format