Crouching TIGER, hidden structure: Exploring the nature of linguistic data using TIGER values

Maurits Luke; Honkola Terhi; Rota Jadranka; Vesakoski Outi; Syrjänen Kaj; Leino Unni

dc.contributor.author: Maurits Luke
dc.contributor.author: Honkola Terhi
dc.contributor.author: Rota Jadranka
dc.contributor.author: Vesakoski Outi
dc.contributor.author: Syrjänen Kaj
dc.contributor.author: Leino Unni
dc.date.accessioned: 2022-10-28T14:25:01Z
dc.date.available: 2022-10-28T14:25:01Z
dc.identifier.uri: https://www.utupub.fi/handle/10024/171232
dc.description.abstract: In recent years, techniques such as Bayesian inference of phylogeny have become a standard part of the quantitative linguistic toolkit. While these tools successfully model the tree-like component of a linguistic dataset, real-world datasets generally include a combination of tree-like and nontree-like signals. Alongside developing techniques for modeling nontree-like data, an important requirement for future quantitative work is to build a principled understanding of this structural complexity of linguistic datasets. Some techniques exist for exploring the general structure of a linguistic dataset, such as NeighborNets, delta scores, and Q-residuals; however, these methods are not without limitations or drawbacks. In general, the question of what kinds of historical structure a linguistic dataset can contain and how these might be detected or measured remains critically underexplored from an objective, quantitative perspective. In this article, we propose TIGER values, a metric that estimates the internal consistency of a genetic dataset, as an additional metric for assessing how tree-like a linguistic dataset is. We use TIGER values to explore simulated language data ranging from very tree-like to completely unstructured, and also use them to analyze a cognate-coded basic vocabulary dataset of Uralic languages. As a point of comparison for the TIGER values, we also explore the same data using delta scores, Q-residuals, and NeighborNets. Our results suggest that TIGER values are capable of both ranking tree-like datasets according to their degree of treelikeness and distinguishing datasets with tree-like structure from datasets with a nontree-like structure. Consequently, we argue that TIGER values serve as a useful metric for measuring the historical heterogeneity of datasets. Our results also highlight the complexities in measuring treelikeness from linguistic data, and how the metrics approach this question from different perspectives.
dc.language.iso: en
dc.publisher: OXFORD UNIV PRESS
dc.title: Crouching TIGER, hidden structure: Exploring the nature of linguistic data using TIGER values
dc.identifier.url: https://academic.oup.com/jole/article/6/2/99/6428504
dc.identifier.urn: URN:NBN:fi-fe2022012711043
dc.relation.volume: 6
dc.contributor.organization: fi=mat.-luonn.t. tdk yhteiset|en=Mat.-luonn.t. tdk yhteiset|
dc.contributor.organization: fi=biologian laitoksen yhteiset|en=Department of Biology|
dc.contributor.organization: fi=ekologia ja evoluutiobiologia|en=Ecology and Evolutionary Biology|
dc.contributor.organization: fi=suomen kieli ja suomalais-ugrilainen kielentutkimus|en=Department of Finnish and Finno-Ugric Languages|
dc.contributor.organization-code: 2606400
dc.contributor.organization-code: 2602110
dc.contributor.organization-code: 2606402
dc.contributor.organization-code: 2606000
dc.converis.publication-id: 68678681
dc.converis.url: https://research.utu.fi/converis/portal/Publication/68678681
dc.format.pagerange: 118
dc.format.pagerange: 99
dc.identifier.eissn: 2058-458X
dc.identifier.jour-issn: 2058-4571
dc.okm.affiliatedauthor: Honkola, Terhi
dc.okm.affiliatedauthor: Syrjänen, Kaj
dc.okm.affiliatedauthor: Vesakoski, Outi
dc.okm.discipline: 6121 Languages (en_GB)
dc.okm.discipline: 6121 Kielitieteet (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: Journal article
dc.publisher.country: United Kingdom (en_GB)
dc.publisher.country: Britannia (fi_FI)
dc.publisher.country-code: GB
dc.relation.doi: 10.1093/jole/lzab004
dc.relation.ispartofjournal: Journal of Language Evolution
dc.relation.issue: 2
dc.year.issued: 2021

