Subgroup detection in genotype data using invariant coordinate selection

dc.contributor.author: Fischer D
dc.contributor.author: Honkatukia M
dc.contributor.author: Tuiskula-Haavisto M
dc.contributor.author: Nordhausen K
dc.contributor.author: Cavero D
dc.contributor.author: Preisinger R
dc.contributor.author: Vilkki J
dc.contributor.organization: fi=tilastotiede|en=Statistics|
dc.contributor.organization-code: 1.2.246.10.2458963.20.42133013740
dc.converis.publication-id: 20518737
dc.converis.url: https://research.utu.fi/converis/portal/Publication/20518737
dc.date.accessioned: 2022-10-27T12:22:00Z
dc.date.available: 2022-10-27T12:22:00Z
dc.description.abstract: Background: The current gold standard among dimension reduction methods for high-throughput genotype data is Principal Component Analysis (PCA). PCA is so dominant that other methods are usually absent from the analyst's toolbox and hence are only rarely applied. Results: We present a modern dimension reduction method called 'Invariant Coordinate Selection' (ICS) and its application to high-throughput genotype data. The more commonly known Independent Component Analysis (ICA) is, in this framework, just a special case of ICS. We use ICS on both a simulated and a real dataset. First, we demonstrate some deficiencies of PCA and show how ICS is able to recover the correct subgroups within the simulated data. Second, we apply ICS to a chicken dataset and detect two subgroups there as well. These subgroups are then investigated with respect to their genotype to provide further evidence of the biological relevance of the detected subgroup division. Finally, we compare the performance of ICS with that of five other popular dimension reduction methods. Conclusion: ICS was able to detect subgroups in data where PCA fails to detect anything. Hence, we promote the application of ICS to high-throughput genotype data in addition to the established PCA. Especially in statistical programming environments such as R, its application does not add any computational burden to the analysis pipeline.
dc.identifier.jour-issn: 1471-2105
dc.identifier.olddbid: 175022
dc.identifier.oldhandle: 10024/158116
dc.identifier.uri: https://www.utupub.fi/handle/11111/35313
dc.identifier.urn: URN:NBN:fi-fe2021042716724
dc.language.iso: en
dc.okm.affiliatedauthor: Nordhausen, Klaus
dc.okm.discipline: 112 Statistics and probability (en_GB)
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 1182 Biochemistry, cell and molecular biology (en_GB)
dc.okm.discipline: 112 Tilastotiede (fi_FI)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.discipline: 1182 Biokemia, solu- ja molekyylibiologia (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: BIOMED CENTRAL LTD
dc.publisher.country: United Kingdom (en_GB)
dc.publisher.country: Britannia (fi_FI)
dc.publisher.country-code: GB
dc.relation.articlenumber: ARTN 173
dc.relation.doi: 10.1186/s12859-017-1589-9
dc.relation.ispartofjournal: BMC Bioinformatics
dc.relation.volume: 18
dc.source.identifier: https://www.utupub.fi/handle/10024/158116
dc.title: Subgroup detection in genotype data using invariant coordinate selection
dc.year.issued: 2017

Files

Name: s12859-017-1589-9.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format
Description: Publisher's version