Scalable Feature Selection Applications for Genome-Wide Association Studies of Complex Diseases

Okser, Sebastian

dc.contributor	Faculty of Mathematics and Natural Sciences, Department of Information Technology	-
dc.contributor.author	Okser, Sebastian
dc.contributor.department	Department of Future Technologies
dc.contributor.faculty	Faculty of Mathematics and Natural Sciences	-
dc.date.accessioned	2015-07-29T10:09:19Z
dc.date.available	2015-07-29T10:09:19Z
dc.date.issued	2015-08-19
dc.description.abstract	Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining these data with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated not only to be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers, helping to ensure that the methods will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS; rather, methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can yield overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype-phenotype relationships and biological insights from genetic data sets.	-
dc.description.accessibilityfeature	no information on accessibility
dc.description.notification	Transferred from Doria
dc.format.content	fulltext
dc.identifier	ISBN 978-952-12-3245-9	-
dc.identifier.olddbid	127441
dc.identifier.oldhandle	10024/113043
dc.identifier.uri	https://www.utupub.fi/handle/11111/28881
dc.identifier.urn	URN:ISBN:978-952-12-3245-9	-
dc.language.iso	eng	-
dc.publisher	Turku Centre for Computer Science
dc.relation.ispartofseries	TUCS Dissertations
dc.relation.issn	1239-1883
dc.relation.numberinseries	201	-
dc.source.identifier	https://www.utupub.fi/handle/10024/113043
dc.title	Scalable Feature Selection Applications for Genome-Wide Association Studies of Complex Diseases	-
dc.type.ontasot	Doctoral dissertation (article-based)

Files

Name: TUCSD201Okser_digi.pdf
Size: 4.45 MB
Format: Adobe Portable Document Format

Collections

Doctoral dissertations