On the distribution of isometric log-ratio coordinates under extra-multinomial count data

dc.contributor.author: Kartiosuo, Noora
dc.contributor.author: Virta, Joni
dc.contributor.author: Nevalainen, Jaakko
dc.contributor.author: Raitakari, Olli
dc.contributor.author: Auranen, Kari
dc.contributor.organization: fi=InFLAMES Lippulaiva|en=InFLAMES Flagship|
dc.contributor.organization: fi=kliininen laitos|en=Department of Clinical Medicine|
dc.contributor.organization: fi=sydäntutkimuskeskus|en=Cardiovascular Medicine (CAPC)|
dc.contributor.organization: fi=tilastotiede|en=Statistics|
dc.contributor.organization: fi=tyks, vsshp|en=tyks, varha|
dc.contributor.organization: fi=väestötutkimuskeskus|en=Centre for Population Health Research (POP Centre)|
dc.contributor.organization-code: 1.2.246.10.2458963.20.35734063924
dc.contributor.organization-code: 1.2.246.10.2458963.20.42133013740
dc.contributor.organization-code: 1.2.246.10.2458963.20.42471027641
dc.contributor.organization-code: 1.2.246.10.2458963.20.68445910604
dc.converis.publication-id: 499223265
dc.converis.url: https://research.utu.fi/converis/portal/Publication/499223265
dc.date.accessioned: 2025-08-28T00:19:40Z
dc.date.available: 2025-08-28T00:19:40Z
dc.description.abstract: Compositional data can be mapped from the simplex to the Euclidean space through the isometric log-ratio (ilr) transformation. When the underlying counts follow a multinomial distribution, the distribution of the ensuing ilr coordinates has been shown to be asymptotically multivariate normal. We derive conditions under which the asymptotic normality of the ilr coordinates holds under a compound multinomial distribution inducing overdispersion in the counts. We derive a normal approximation and investigate its practical applicability under extra-multinomial variation using a simulation study under the Dirichlet-multinomial distribution. The approximation works well, except with a small total count or high amount of overdispersion. Our work is motivated by microbiome data, which exhibit extra-multinomial variation and are increasingly treated as compositions. We conclude that if empirical data analysis relies on the normality of ilr coordinates, it may be advisable to choose a taxonomic level with less sparsity so that the distribution of taxon-specific class probabilities remains unimodal.
dc.identifier.eissn: 1613-9798
dc.identifier.jour-issn: 0932-5026
dc.identifier.olddbid: 205524
dc.identifier.oldhandle: 10024/188551
dc.identifier.uri: https://www.utupub.fi/handle/11111/55037
dc.identifier.url: https://doi.org/10.1007/s00362-025-01732-8
dc.identifier.urn: URN:NBN:fi-fe2025082790971
dc.language.iso: en
dc.okm.affiliatedauthor: Kartiosuo, Noora
dc.okm.affiliatedauthor: Virta, Joni
dc.okm.affiliatedauthor: Raitakari, Olli
dc.okm.affiliatedauthor: Auranen, Kari
dc.okm.affiliatedauthor: Dataimport, Kliinisen laitoksen yhteiset
dc.okm.affiliatedauthor: Dataimport, tyks, vsshp
dc.okm.discipline: 112 Statistics and probability (en_GB)
dc.okm.discipline: 112 Tilastotiede (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Springer Science and Business Media LLC
dc.publisher.country: United States (en_GB)
dc.publisher.country: Yhdysvallat (USA) (fi_FI)
dc.publisher.country-code: US
dc.publisher.place: NEW YORK
dc.relation.articlenumber: 113
dc.relation.doi: 10.1007/s00362-025-01732-8
dc.relation.ispartofjournal: Statistical Papers
dc.relation.issue: 5
dc.relation.volume: 66
dc.source.identifier: https://www.utupub.fi/handle/10024/188551
dc.title: On the distribution of isometric log-ratio coordinates under extra-multinomial count data
dc.year.issued: 2025
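
The setting described in the abstract (ilr coordinates computed from Dirichlet-multinomial counts) can be illustrated with a minimal sketch. This is not the authors' code: the function names `ilr_basis`, `ilr` and `simulate_dm_ilr`, the Helmert-type choice of orthonormal basis, the parameter values, and the pseudo-count of 0.5 used to avoid taking the logarithm of zero counts are all assumptions made here for illustration only.

```python
import numpy as np


def ilr_basis(d):
    """Return a d x (d-1) matrix with orthonormal, zero-sum columns.

    This is one common (Helmert-type) choice; any orthonormal basis of
    the clr hyperplane would define a valid ilr transformation.
    """
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j      # first j parts get equal positive weight
        v[j, j - 1] = -1.0          # part j+1 is contrasted against them
        v[:, j - 1] *= np.sqrt(j / (j + 1.0))  # scale column to unit norm
    return v


def ilr(x):
    """ilr coordinates of strictly positive compositions (last axis = parts)."""
    x = np.asarray(x, dtype=float)
    log_x = np.log(x)
    clr = log_x - log_x.mean(axis=-1, keepdims=True)  # centred log-ratio
    return clr @ ilr_basis(x.shape[-1])


def simulate_dm_ilr(alpha, total, n_samples, pseudo=0.5, seed=0):
    """Draw Dirichlet-multinomial counts and return their ilr coordinates.

    `pseudo` is an assumed zero-replacement constant, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    probs = rng.dirichlet(alpha, size=n_samples)            # class probabilities
    counts = np.array([rng.multinomial(total, p) for p in probs])
    return ilr(counts + pseudo)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to exercise the functions.
    z = simulate_dm_ilr(alpha=[20.0, 10.0, 5.0], total=5000, n_samples=2000)
    print("mean of ilr coordinates:", z.mean(axis=0))
    print("covariance of ilr coordinates:\n", np.cov(z, rowvar=False))
```

Under these assumptions, shrinking `total` or scaling `alpha` down (which increases overdispersion) is one way to explore informally when the simulated coordinates stop looking normal, for example with a Q-Q plot of each column of `z`.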

Files

Name: s00362-025-01732-8.pdf
Size: 3.54 MB
Format: Adobe Portable Document Format