Subword Representations Successfully Decode Brain Responses to Morphologically Complex Written Words

Hakala, Tero; Lindh-Knuutila, Tiina; Hulten, Annika; Lehtonen, Minna; Salmelin, Riitta

dc.contributor.author	Hakala, Tero
dc.contributor.author	Lindh-Knuutila, Tiina
dc.contributor.author	Hulten, Annika
dc.contributor.author	Lehtonen, Minna
dc.contributor.author	Salmelin, Riitta
dc.contributor.organization	fi=logopedia|en=Speech-Language Pathology|
dc.contributor.organization-code	1.2.246.10.2458963.20.46679761984
dc.converis.publication-id	458252538
dc.converis.url	https://research.utu.fi/converis/portal/Publication/458252538
dc.date.accessioned	2025-08-27T23:34:45Z
dc.date.available	2025-08-27T23:34:45Z
dc.description.abstract	This study extends the idea of decoding word-evoked brain activations using a corpus-semantic vector space to multimorphemic words in the agglutinative Finnish language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors that are composed of these segments. We tested several alternative vector-space models using different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2- and 3-grams, and paired them with recorded MEG responses to multimorphemic words in a visual word recognition task. For all variants, the decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350-500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only those segmentations that were morphologically aware reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain and show that neural decoding using corpus-semantic word representations derived from compositional subword segments is applicable also for multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare and it can be difficult to find statistically reliable surface representations for them in any large corpus.
dc.format.pagerange	863
dc.identifier.eissn	2641-4368
dc.identifier.jour-issn	2641-4368
dc.identifier.olddbid	204231
dc.identifier.oldhandle	10024/187258
dc.identifier.uri	https://www.utupub.fi/handle/11111/52362
dc.identifier.url	https://doi.org/10.1162/nol_a_00149
dc.identifier.urn	URN:NBN:fi-fe2025082786361
dc.language.iso	en
dc.okm.affiliatedauthor	Lehtonen, Minna
dc.okm.discipline	515 Psychology	en_GB
dc.okm.discipline	515 Psykologia	fi_FI
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	MIT PRESS
dc.publisher.country	United States	en_GB
dc.publisher.country	Yhdysvallat (USA)	fi_FI
dc.publisher.country-code	US
dc.publisher.place	CAMBRIDGE
dc.relation.doi	10.1162/nol_a_00149
dc.relation.ispartofjournal	Neurobiology of language
dc.relation.issue	4
dc.relation.volume	5
dc.source.identifier	https://www.utupub.fi/handle/10024/187258
dc.title	Subword Representations Successfully Decode Brain Responses to Morphologically Complex Written Words
dc.year.issued	2024

Files

Name: nol_a_00149.pdf
Size: 2.57 MB
Format: Adobe Portable Document Format

Collections

Self-archived publications