Subgroup detection in genotype data using invariant coordinate selection

Fischer D; Honkatukia M; Tuiskula-Haavisto M; Nordhausen K; Cavero D; Preisinger R; Vilkki J

dc.contributor.author	Fischer D
dc.contributor.author	Honkatukia M
dc.contributor.author	Tuiskula-Haavisto M
dc.contributor.author	Nordhausen K
dc.contributor.author	Cavero D
dc.contributor.author	Preisinger R
dc.contributor.author	Vilkki J
dc.contributor.organization	fi=tilastotiede\|en=Statistics\|
dc.contributor.organization-code	1.2.246.10.2458963.20.42133013740
dc.converis.publication-id	20518737
dc.converis.url	https://research.utu.fi/converis/portal/Publication/20518737
dc.date.accessioned	2022-10-27T12:22:00Z
dc.date.available	2022-10-27T12:22:00Z
dc.description.abstract	Background: The current gold standard among dimension reduction methods for high-throughput genotype data is Principal Component Analysis (PCA). PCA is so dominant that other methods are usually absent from the analyst's toolbox and hence are only rarely applied. Results: We present a modern dimension reduction method called 'Invariant Coordinate Selection' (ICS) and its application to high-throughput genotype data. The more commonly known Independent Component Analysis (ICA) is, in this framework, just a special case of ICS. We apply ICS to both a simulated and a real dataset. First, we demonstrate some deficiencies of PCA and show how ICS is able to recover the correct subgroups within the simulated data. Second, we apply ICS to a chicken dataset and detect two subgroups there as well. These subgroups are then investigated further with respect to their genotype to provide additional evidence of the biological relevance of the detected subgroup division. We also compare the performance of ICS with five other popular dimension reduction methods. Conclusion: ICS was able to detect subgroups in data where PCA fails to detect anything. Hence, we promote the application of ICS to high-throughput genotype data in addition to the established PCA. Especially in statistical programming environments such as R, its application does not add any computational burden to the analysis pipeline.
dc.identifier.jour-issn	1471-2105
dc.identifier.olddbid	175022
dc.identifier.oldhandle	10024/158116
dc.identifier.uri	https://www.utupub.fi/handle/11111/35313
dc.identifier.urn	URN:NBN:fi-fe2021042716724
dc.language.iso	en
dc.okm.affiliatedauthor	Nordhausen, Klaus
dc.okm.discipline	112 Statistics and probability	en_GB
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	BIOMED CENTRAL LTD
dc.publisher.country	United Kingdom	en_GB
dc.publisher.country	Britannia	fi_FI
dc.publisher.country-code	GB
dc.relation.articlenumber	ARTN 173
dc.relation.doi	10.1186/s12859-017-1589-9
dc.relation.ispartofjournal	BMC Bioinformatics
dc.relation.volume	18
dc.source.identifier	https://www.utupub.fi/handle/10024/158116
dc.title	Subgroup detection in genotype data using invariant coordinate selection
dc.year.issued	2017

Files

Name: s12859-017-1589-9.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format
Description: Publisher's version

Collections

Self-archived publications (Rinnakkaistallenteet)