Comparative Evaluation of Machine Learning Models for BMI Prediction from Gut Microbiome Data

dc.contributor.author: Helenius, Vilho
dc.contributor.department: Department of Computing
dc.contributor.faculty: Faculty of Technology
dc.contributor.studysubject: Information and Communication Technology
dc.date.accessioned: 2026-05-18T19:31:24Z
dc.date.issued: 2026-04-07
dc.description.abstract: This thesis investigates the feasibility of predicting body mass index (BMI) from gut microbiome composition using modern machine learning approaches. The human gut microbiome has been increasingly linked to metabolic health, but the extent to which microbial community profiles can predict host phenotypes such as BMI remains an open question. Using a large-scale dataset of 9,709 gut microbiome samples, this study compares the predictive performance of several machine learning models representing different modeling paradigms for tabular data. The evaluated models include Lasso regression as a linear baseline, gradient-boosted decision tree methods (XGBoost and CatBoost), and a recently proposed transformer-based foundation model for tabular data, TabPFN. In addition to model comparison, the study examines how dataset size and feature set composition influence predictive performance. The results indicate that BMI prediction from gut microbiome data remains challenging, with overall predictive performance being moderate across all models. Among the evaluated methods, TabPFN achieved the highest predictive accuracy on the full dataset, suggesting that transformer-based foundation models may offer advantages in large-scale microbiome prediction tasks. However, tree-based ensemble models performed competitively and exhibited stronger performance in small-sample regimes. Additional experiments showed that microbiome features provide substantially more predictive signal for BMI than basic demographic variables alone, while the combination of microbiome and host metadata produced the best overall performance. Overall, the findings highlight both the potential and the limitations of current machine learning approaches for microbiome-based phenotype prediction. While meaningful predictive signal can be extracted from microbiome composition, the results suggest that microbiome data alone is insufficient for accurate prediction of complex metabolic traits such as BMI.
dc.format.extent: 64
dc.identifier.uri: https://www.utupub.fi/handle/11111/60757
dc.identifier.urn: URN:NBN:fi-fe2026051847687
dc.language.iso: eng
dc.rights: This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
dc.rights.accessrights: open
dc.subject: Gut microbiome
dc.subject: machine learning
dc.subject: BMI prediction
dc.subject: tabular data
dc.subject: transformer models
dc.subject: TabPFN
dc.title: Comparative Evaluation of Machine Learning Models for BMI Prediction from Gut Microbiome Data
dc.type.ontasot: Master's thesis

Files

Name: Helenius_Vilho_opinnayte.pdf
Size: 3.34 MB
Format: Adobe Portable Document Format