Regularized Machine Learning in the Genetic Prediction of Complex Traits

dc.contributor.author: Sebastian Okser
dc.contributor.author: Tapio Pahikkala
dc.contributor.author: Antti Airola
dc.contributor.author: Tapio Salakoski
dc.contributor.author: Samuli Ripatti
dc.contributor.author: Tero Aittokallio
dc.contributor.organization: fi=kieli- ja puheteknologia|en=Language and Speech Technology|
dc.contributor.organization: fi=matemaattis-luonnontieteellinen tiedekunta|en=Faculty of Science|
dc.contributor.organization: fi=tietojenkäsittelytiede|en=Computer Science|
dc.contributor.organization: fi=tietotekniikan laitos|en=Department of Computing|
dc.contributor.organization-code: 1.2.246.10.2458963.20.23479734818
dc.contributor.organization-code: 1.2.246.10.2458963.20.36798383026
dc.contributor.organization-code: 1.2.246.10.2458963.20.47465613983
dc.contributor.organization-code: 1.2.246.10.2458963.20.85312822902
dc.contributor.organization-code: 2606803
dc.converis.publication-id: 3938862
dc.converis.url: https://research.utu.fi/converis/portal/Publication/3938862
dc.date.accessioned: 2022-10-27T12:25:57Z
dc.date.available: 2022-10-27T12:25:57Z
dc.description.abstract: Compared to univariate analysis of genome-wide association (GWA) studies, machine learning–based models have been shown to provide improved means of learning such multilocus panels of genetic variants and their interactions that are most predictive of complex phenotypic traits. Many applications of predictive modeling rely on effective variable selection, often implemented through model regularization, which penalizes the model complexity and enables predictions in individuals outside of the training dataset. However, the different regularization approaches may also lead to considerable differences, especially in the number of genetic variants needed for maximal predictive accuracy, as illustrated here in examples from both disease classification and quantitative trait prediction. We also highlight the potential pitfalls of the regularized machine learning models, related to issues such as model overfitting to the training data, which may lead to over-optimistic prediction results, as well as identifiability of the predictive variants, which is important in many medical applications. While genetic risk prediction for human diseases is used as a motivating use case, we argue that these models are also widely applicable in nonhuman applications, such as animal and plant breeding, where accurate genotype-to-phenotype modeling is needed. Finally, we discuss some key future advances, open questions and challenges in this developing field, when moving toward low-frequency variants and cross-phenotype interactions.
dc.identifier.jour-issn: 1553-7390
dc.identifier.olddbid: 175463
dc.identifier.oldhandle: 10024/158557
dc.identifier.uri: https://www.utupub.fi/handle/11111/30124
dc.identifier.urn: URN:NBN:fi-fe2021042715436
dc.okm.affiliatedauthor: Okser, Sebastian
dc.okm.affiliatedauthor: Airola, Antti
dc.okm.affiliatedauthor: Salakoski, Tapio
dc.okm.affiliatedauthor: Aittokallio, Tero
dc.okm.affiliatedauthor: Pahikkala, Tapio
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A2 Scientific Article
dc.relation.doi: 10.1371/journal.pgen.1004754
dc.relation.ispartofjournal: PLoS Genetics
dc.relation.issue: 11
dc.relation.volume: 10
dc.source.identifier: https://www.utupub.fi/handle/10024/158557
dc.title: Regularized Machine Learning in the Genetic Prediction of Complex Traits
dc.year.issued: 2014

Files

Name: journal.pgen.1004754.pdf
Size: 509.75 KB
Format: Adobe Portable Document Format
Description: Regularized Machine Learning in the Genetic Prediction of Complex Traits. Okser S et al. PLOS Genetics. 2014. 10(11) DOI: 10.1371/journal.pgen.1004754