Subword Representations Successfully Decode Brain Responses to Morphologically Complex Written Words

Hakala, Tero; Lindh-Knuutila, Tiina; Hulten, Annika; Lehtonen, Minna; Salmelin, Riitta

Publisher: MIT Press

DOI: 10.1162/nol_a_00149

URI: https://doi.org/10.1162/nol_a_00149

Permanent address of the publication:
https://urn.fi/URN:NBN:fi-fe2025082786361

Abstract

This study extends the decoding of word-evoked brain activations with a corpus-semantic vector space to multimorphemic words in Finnish, an agglutinative language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors composed of these segments. We tested several alternative vector-space models based on different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2- and 3-grams. We paired each model with magnetoencephalography (MEG) responses recorded to multimorphemic words in a visual word recognition task. For all variants, decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350–500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only the morphologically aware segmentations reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain, and show that neural decoding with corpus-semantic word representations derived from compositional subword segments is also applicable to multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare, and statistically reliable surface representations for them can be difficult to obtain even from large corpora.
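The abstract does not specify how segment vectors are combined into a word vector or exactly how each segmentation is produced. The Python sketch below illustrates the general idea of composing word vectors from subword segments under several segmentation schemes, and every specific in it is an assumption rather than the authors' method. It assumes that a word vector is the mean of its segment vectors, that character n-grams are non-overlapping, and that random segmentation places up to two cuts. The linguistic segmentations of three Finnish forms of "talo" (house) are hand-coded examples, and random vectors stand in for a corpus-trained segment model. All function and variable names (`LINGUISTIC`, `char_ngrams`, `random_split`, `toy_segment_table`, `compose`) are hypothetical. Statistical morphemes are omitted because they would require a trained unsupervised segmenter.

```python
import numpy as np

# Hypothetical hand-coded linguistic segmentations of three forms of "talo" (house):
# "in the houses", "in the house", "onto/for the houses".
LINGUISTIC = {
    "taloissa": ["talo", "i", "ssa"],
    "talossa": ["talo", "ssa"],
    "taloille": ["talo", "i", "lle"],
}

def whole_word(word):
    # No segmentation: the word itself is the only segment.
    return [word]

def linguistic(word):
    return LINGUISTIC[word]

def char_ngrams(word, n):
    # Assumption: non-overlapping character n-grams; the final chunk may be shorter.
    return [word[i:i + n] for i in range(0, len(word), n)]

def random_split(word, rng, max_cuts=2):
    # Assumption: cut the word at up to max_cuts distinct, randomly chosen interior positions.
    k = int(rng.integers(0, min(max_cuts, len(word) - 1) + 1))
    cuts = sorted(int(c) for c in rng.choice(np.arange(1, len(word)), size=k, replace=False))
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

def toy_segment_table(segment_lists, dim, seed=0):
    # Stand-in for a corpus-trained embedding model over segments:
    # each distinct segment simply gets a random vector.
    rng = np.random.default_rng(seed)
    vocab = sorted({s for segs in segment_lists for s in segs})
    return {s: rng.standard_normal(dim) for s in vocab}

def compose(segments, table):
    # Assumed composition rule: word vector = mean of its segment vectors.
    return np.mean([table[s] for s in segments], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

words = list(LINGUISTIC)
rng = np.random.default_rng(1)
schemes = {
    "whole word": whole_word,
    "linguistic morphemes": linguistic,
    "char 1-grams": lambda w: char_ngrams(w, 1),
    "char 2-grams": lambda w: char_ngrams(w, 2),
    "char 3-grams": lambda w: char_ngrams(w, 3),
    "random": lambda w: random_split(w, rng),
}

for name, segment in schemes.items():
    # Each scheme gets its own segment vocabulary and (toy) vector table.
    segs = {w: segment(w) for w in words}
    table = toy_segment_table(segs.values(), dim=50)
    vecs = {w: compose(s, table) for w, s in segs.items()}
    print(f"{name:22s} {segs['taloissa']}  "
          f"cos(taloissa, talossa) = {cosine(vecs['taloissa'], vecs['talossa']):+.2f}")
```

In this toy setting, the whole-word vectors for "taloissa" and "talossa" are independent random draws and therefore unrelated. Under the linguistic segmentation, the two forms share "talo" and "ssa", so their composed vectors are strongly correlated. This illustrates how a segment-based model can assign a meaningful vector to a rare inflected form from segments that are frequent in the corpus.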
