Subword Representations Successfully Decode Brain Responses to Morphologically Complex Written Words
Hakala, Tero; Lindh-Knuutila, Tiina; Hulten, Annika; Lehtonen, Minna; Salmelin, Riitta
MIT PRESS
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2025082786361
Abstract
This study extends the decoding of word-evoked brain activations with a corpus-semantic vector space to multimorphemic words in Finnish, an agglutinative language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors that are composed of these segments. We tested several alternative vector-space models using different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2-, and 3-grams, and paired them with recorded MEG responses to multimorphemic words in a visual word recognition task. For all variants, the decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350-500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only the morphologically aware segmentations reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain, and they show that neural decoding using corpus-semantic word representations derived from compositional subword segments is also applicable to multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare and it can be difficult to find statistically reliable surface representations for them in any large corpus.
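As a rough illustration of the ideas in the abstract, the sketch below splits words into character n-grams, composes a word vector from segment vectors, and shuffles the segment labels. It is not the authors' implementation. The abstract does not say how segment vectors are combined or how the segment-label permutation is carried out, so the additive composition, the shuffling scheme, the function names (`char_ngrams`, `compose`, `shuffle_segment_labels`), the example segmentations, and the toy vectors are all assumptions made for this example.

```python
import random


def char_ngrams(word, n):
    """Return the overlapping character n-grams of a word, in order."""
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def compose(segments, segment_vectors):
    """Build a word vector by summing its segment vectors.

    Additive composition is an assumption; the abstract only says that
    word vectors are "composed of" segments.
    """
    dim = len(next(iter(segment_vectors.values())))
    total = [0.0] * dim
    for seg in segments:
        for k, value in enumerate(segment_vectors[seg]):
            total[k] += value
    return total


def shuffle_segment_labels(segment_vectors, rng):
    """Randomly reassign the existing vectors to the segment labels.

    This is one plausible reading of a segment-label permutation,
    not a description of the procedure used in the study.
    """
    labels = list(segment_vectors)
    vectors = [segment_vectors[label] for label in labels]
    rng.shuffle(vectors)
    return dict(zip(labels, vectors))


# Hypothetical morpheme-like segmentations of three Finnish word forms.
segmentations = {
    "talossa": ["talo", "ssa"],
    "talosta": ["talo", "sta"],
    "autossa": ["auto", "ssa"],
}

# Toy 2-dimensional segment vectors (made-up numbers, for illustration only).
segment_vectors = {
    "talo": [1.0, 0.0],
    "auto": [0.0, 1.0],
    "ssa": [0.5, 0.5],
    "sta": [-0.5, 0.5],
}

# Word vectors composed from the true segment-to-vector mapping.
word_vectors = {w: compose(s, segment_vectors) for w, s in segmentations.items()}

# Word vectors composed after permuting which vector belongs to which segment.
permuted = shuffle_segment_labels(segment_vectors, random.Random(0))
permuted_word_vectors = {w: compose(s, permuted) for w, s in segmentations.items()}

# Character-level segmentation of the same word form.
bigrams = char_ngrams("talossa", 2)
```

Under these assumptions, a decoder would be scored once with `word_vectors` and many times with vectors like `permuted_word_vectors` to build a null distribution. The comparison asks whether the specific pairing of segments and vectors matters, which a word-label permutation alone does not test.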
