Crouching TIGER, hidden structure: Exploring the nature of linguistic data using TIGER values

Syrjänen Kaj; Maurits Luke; Leino Unni; Honkola Terhi; Rota Jadranka; Vesakoski Outi

dc.contributor.author	Syrjänen Kaj
dc.contributor.author	Maurits Luke
dc.contributor.author	Leino Unni
dc.contributor.author	Honkola Terhi
dc.contributor.author	Rota Jadranka
dc.contributor.author	Vesakoski Outi
dc.contributor.organization	fi=ekologia ja evoluutiobiologia|en=Ecology and Evolutionary Biology|
dc.contributor.organization-code	1.2.246.10.2458963.20.20415010352
dc.converis.publication-id	68678681
dc.converis.url	https://research.utu.fi/converis/portal/Publication/68678681
dc.date.accessioned	2022-10-28T14:25:01Z
dc.date.available	2022-10-28T14:25:01Z
dc.description.abstract	In recent years, techniques such as Bayesian inference of phylogeny have become a standard part of the quantitative linguistic toolkit. While these tools successfully model the tree-like component of a linguistic dataset, real-world datasets generally include a combination of tree-like and nontree-like signals. Alongside developing techniques for modeling nontree-like data, an important requirement for future quantitative work is to build a principled understanding of this structural complexity of linguistic datasets. Some techniques exist for exploring the general structure of a linguistic dataset, such as NeighborNets, delta scores, and Q-residuals; however, these methods are not without limitations or drawbacks. In general, the question of what kinds of historical structure a linguistic dataset can contain and how these might be detected or measured remains critically underexplored from an objective, quantitative perspective. In this article, we propose TIGER values, a metric that estimates the internal consistency of a genetic dataset, as an additional metric for assessing how tree-like a linguistic dataset is. We use TIGER values to explore simulated language data ranging from very tree-like to completely unstructured, and also use them to analyze a cognate-coded basic vocabulary dataset of Uralic languages. As a point of comparison for the TIGER values, we also explore the same data using delta scores, Q-residuals, and NeighborNets. Our results suggest that TIGER values are capable of both ranking tree-like datasets according to their degree of treelikeness, as well as distinguishing datasets with tree-like structure from datasets with a nontree-like structure. Consequently, we argue that TIGER values serve as a useful metric for measuring the historical heterogeneity of datasets. 
Our results also highlight the complexities in measuring treelikeness from linguistic data, and how the metrics approach this question from different perspectives.
dc.format.pagerange	99-118
dc.identifier.eissn	2058-458X
dc.identifier.jour-issn	2058-458X
dc.identifier.olddbid	188138
dc.identifier.oldhandle	10024/171232
dc.identifier.uri	https://www.utupub.fi/handle/11111/39801
dc.identifier.url	https://academic.oup.com/jole/article/6/2/99/6428504
dc.identifier.urn	URN:NBN:fi-fe2022012711043
dc.language.iso	en
dc.okm.affiliatedauthor	Syrjänen, Kaj
dc.okm.affiliatedauthor	Honkola, Terhi
dc.okm.affiliatedauthor	Vesakoski, Outi
dc.okm.discipline	6121 Languages	en_GB
dc.okm.discipline	6121 Kielitieteet	fi_FI
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	OXFORD UNIV PRESS
dc.publisher.country	United Kingdom	en_GB
dc.publisher.country	Britannia	fi_FI
dc.publisher.country-code	GB
dc.relation.doi	10.1093/jole/lzab004
dc.relation.ispartofjournal	Journal of Language Evolution
dc.relation.issue	2
dc.relation.volume	6
dc.source.identifier	https://www.utupub.fi/handle/10024/171232
dc.title	Crouching TIGER, hidden structure: Exploring the nature of linguistic data using TIGER values
dc.year.issued	2021

Files

Name: lzab004.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format
Description: Publisher's PDF
