Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data

dc.contributor.author: Drouard Gabin
dc.contributor.author: Mykkänen Juha
dc.contributor.author: Heiskanen Jarkko
dc.contributor.author: Pohjonen Joona
dc.contributor.author: Ruohonen Saku
dc.contributor.author: Pahkala Katja
dc.contributor.author: Lehtimäki Terho
dc.contributor.author: Wang Xiaoling
dc.contributor.author: Ollikainen Miina
dc.contributor.author: Ripatti Samuli
dc.contributor.author: Pirinen Matti
dc.contributor.author: Raitakari Olli
dc.contributor.author: Kaprio Jaakko
dc.contributor.organization: fi=InFLAMES Lippulaiva|en=InFLAMES Flagship|
dc.contributor.organization: fi=sydäntutkimuskeskus|en=Cardiovascular Medicine (CAPC)|
dc.contributor.organization: fi=tyks, vsshp|en=tyks, varha|
dc.contributor.organization: fi=väestötutkimuskeskus|en=Centre for Population Health Research (POP Centre)|
dc.contributor.organization-code: 1.2.246.10.2458963.20.35734063924
dc.contributor.organization-code: 1.2.246.10.2458963.20.42471027641
dc.contributor.organization-code: 1.2.246.10.2458963.20.68445910604
dc.contributor.organization-code: 2607008
dc.converis.publication-id: 393445661
dc.converis.url: https://research.utu.fi/converis/portal/Publication/393445661
dc.date.accessioned: 2025-08-28T00:02:04Z
dc.date.available: 2025-08-28T00:02:04Z
dc.description.abstract:
Background: Machine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes often exhibit categorical nature and class imbalances. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios.

Methods: We compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, whose downstream ML classifier performance we compared. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of which 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning.

Results: Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively.

Conclusions: By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.
dc.identifier.eissn: 1472-6947
dc.identifier.jour-issn: 1472-6947
dc.identifier.olddbid: 205057
dc.identifier.oldhandle: 10024/188084
dc.identifier.uri: https://www.utupub.fi/handle/11111/53805
dc.identifier.url: https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-024-02521-3
dc.identifier.urn: URN:NBN:fi-fe2025082790816
dc.language.iso: en
dc.okm.affiliatedauthor: Mykkänen, Juha
dc.okm.affiliatedauthor: Heiskanen, Jarkko
dc.okm.affiliatedauthor: Ruohonen, Saku
dc.okm.affiliatedauthor: Pahkala, Katja
dc.okm.affiliatedauthor: Raitakari, Olli
dc.okm.affiliatedauthor: Dataimport, tyks, vsshp
dc.okm.discipline: 3121 Internal medicine (en_GB)
dc.okm.discipline: 3121 Sisätaudit (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: BioMed Central
dc.publisher.country: United Kingdom (en_GB)
dc.publisher.country: Britannia (fi_FI)
dc.publisher.country-code: GB
dc.relation.articlenumber: 116
dc.relation.doi: 10.1186/s12911-024-02521-3
dc.relation.ispartofjournal: BMC Medical Informatics and Decision Making
dc.relation.issue: 1
dc.relation.volume: 24
dc.source.identifier: https://www.utupub.fi/handle/10024/188084
dc.title: Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data
dc.year.issued: 2024
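
Illustrative note on the abstract: it states that three classes were predicted (low, average, high), that the average class was the most common, and that model comparisons were based on the F1 score. The sketch below is not the authors' code. It uses made-up labels and a hypothetical helper, `per_class_f1`, to show why a per-class F1 score is more informative than accuracy when one class dominates. How the study averaged or reported F1 across classes is not stated in this record, so the macro average here is an assumption.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, treating each label one-vs-rest."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


labels = ["low", "average", "high"]

# Hypothetical, imbalanced ground truth: "average" is the majority class.
y_true = ["low"] * 2 + ["average"] * 6 + ["high"] * 2

# A trivial classifier that always predicts the majority class.
y_majority = ["average"] * len(y_true)

scores = per_class_f1(y_true, y_majority, labels)
accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
macro_f1 = sum(scores.values()) / len(labels)

print("accuracy:", accuracy)
print("per-class F1:", scores)
print("macro F1:", macro_f1)
```

On this toy data the majority-class predictor reaches 60% accuracy, yet its F1 is zero for both the low and high classes, which are the classes where the abstract reports the largest gains from multi-omic predictions.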

Files

Name: s12911-024-02521-3.pdf
Size: 5.19 MB
Format: Adobe Portable Document Format