Regularized Machine Learning in the Genetic Prediction of Complex Traits

Sebastian Okser; Tapio Pahikkala; Antti Airola; Tapio Salakoski; Samuli Ripatti; Tero Aittokallio

dc.contributor.author	Sebastian Okser
dc.contributor.author	Tapio Pahikkala
dc.contributor.author	Antti Airola
dc.contributor.author	Tapio Salakoski
dc.contributor.author	Samuli Ripatti
dc.contributor.author	Tero Aittokallio
dc.contributor.organization	fi=tietojenkäsittelytiede|en=Computer Science|
dc.contributor.organization-code	2606803
dc.converis.publication-id	3938862
dc.converis.url	https://research.utu.fi/converis/portal/Publication/3938862
dc.date.accessioned	2022-10-27T12:25:57Z
dc.date.available	2022-10-27T12:25:57Z
dc.description.abstract	Compared to univariate analysis of genome-wide association (GWA) studies, machine learning–based models have been shown to provide improved means of learning such multilocus panels of genetic variants and their interactions that are most predictive of complex phenotypic traits. Many applications of predictive modeling rely on effective variable selection, often implemented through model regularization, which penalizes the model complexity and enables predictions in individuals outside of the training dataset. However, the different regularization approaches may also lead to considerable differences, especially in the number of genetic variants needed for maximal predictive accuracy, as illustrated here in examples from both disease classification and quantitative trait prediction. We also highlight the potential pitfalls of the regularized machine learning models, related to issues such as model overfitting to the training data, which may lead to over-optimistic prediction results, as well as identifiability of the predictive variants, which is important in many medical applications. While genetic risk prediction for human diseases is used as a motivating use case, we argue that these models are also widely applicable in nonhuman applications, such as animal and plant breeding, where accurate genotype-to-phenotype modeling is needed. Finally, we discuss some key future advances, open questions and challenges in this developing field, when moving toward low-frequency variants and cross-phenotype interactions.
dc.identifier.jour-issn	1553-7390
dc.identifier.olddbid	175463
dc.identifier.oldhandle	10024/158557
dc.identifier.uri	https://www.utupub.fi/handle/11111/30124
dc.identifier.urn	URN:NBN:fi-fe2021042715436
dc.okm.affiliatedauthor	Okser, Sebastian
dc.okm.affiliatedauthor	Airola, Antti
dc.okm.affiliatedauthor	Salakoski, Tapio
dc.okm.affiliatedauthor	Aittokallio, Tero
dc.okm.affiliatedauthor	Pahikkala, Tapio
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.discipline	113 Tietojenkäsittely ja informaatiotieteet	fi_FI
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A2 Scientific Article
dc.relation.doi	10.1371/journal.pgen.1004754
dc.relation.ispartofjournal	PLoS Genetics
dc.relation.issue	11
dc.relation.volume	10
dc.source.identifier	https://www.utupub.fi/handle/10024/158557
dc.title	Regularized Machine Learning in the Genetic Prediction of Complex Traits
dc.year.issued	2014

Files

Name: journal.pgen.1004754.pdf
Size: 509.75 KB
Format: Adobe Portable Document Format
Description: Regularized Machine Learning in the Genetic Prediction of Complex Traits. Okser S et al. PLOS Genetics. 2014. 10(11). DOI: 10.1371/journal.pgen.1004754

Collections

Self-archived publications (Rinnakkaistallenteet)