Comparative Evaluation of Machine Learning Models for BMI Prediction from Gut Microbiome Data

Open access
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Abstract

This thesis investigates the feasibility of predicting body mass index (BMI) from gut microbiome composition using modern machine learning approaches. The human gut microbiome has been increasingly linked to metabolic health, but the extent to which microbial community profiles can predict host phenotypes such as BMI remains an open question. Using a large-scale dataset of 9,709 gut microbiome samples, this study compares the predictive performance of several machine learning models representing different modeling paradigms for tabular data. The evaluated models include Lasso regression as a linear baseline, gradient-boosted decision tree methods (XGBoost and CatBoost), and a recently proposed transformer-based foundation model for tabular data, TabPFN. In addition to model comparison, the study examines how dataset size and feature set composition influence predictive performance. The results indicate that BMI prediction from gut microbiome data remains challenging, with overall predictive performance being moderate across all models. Among the evaluated methods, TabPFN achieved the highest predictive accuracy on the full dataset, suggesting that transformer-based foundation models may offer advantages in large-scale microbiome prediction tasks. However, tree-based ensemble models performed competitively and exhibited stronger performance in small-sample regimes. Additional experiments showed that microbiome features provide substantially more predictive signal for BMI than basic demographic variables alone, while the combination of microbiome and host metadata produced the best overall performance. Overall, the findings highlight both the potential and the limitations of current machine learning approaches for microbiome-based phenotype prediction. While meaningful predictive signal can be extracted from microbiome composition, the results suggest that microbiome data alone is insufficient for accurate prediction of complex metabolic traits such as BMI.
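
The comparison protocol described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the data are synthetic, scikit-learn's GradientBoostingRegressor stands in for XGBoost and CatBoost, TabPFN is omitted, and all function names, sample sizes, and hyperparameters are assumptions chosen for illustration. It shows a linear baseline and a tree ensemble scored by cross-validated R^2, repeated on progressively larger subsets to probe the effect of dataset size.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n_samples=300, n_taxa=40, seed=0):
    """Generate fake log relative abundances and a BMI-like target.

    Purely illustrative: only the first five "taxa" carry signal.
    """
    rng = np.random.default_rng(seed)
    counts = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_taxa))
    abundance = counts / counts.sum(axis=1, keepdims=True)  # rows sum to 1
    features = np.log(abundance + 1e-6)  # pseudocount avoids log(0)
    weights = np.zeros(n_taxa)
    weights[:5] = rng.normal(size=5)
    noise = rng.normal(scale=3.0, size=n_samples)
    target = 25.0 + features @ weights + noise
    return features, target


def compare_models(features, target, n_splits=5, seed=0):
    """Return the mean cross-validated R^2 for each model."""
    models = {
        # Linear baseline; alpha is an arbitrary illustrative value.
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
        # Stand-in for the gradient-boosted tree methods named in the abstract.
        "gbdt": GradientBoostingRegressor(n_estimators=50, random_state=seed),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: float(
            cross_val_score(model, features, target, cv=cv, scoring="r2").mean()
        )
        for name, model in models.items()
    }


def learning_curve_scores(features, target, sizes=(75, 150, 300), seed=0):
    """Repeat the comparison on the first n samples for each n in sizes."""
    return {
        n: compare_models(features[:n], target[:n], seed=seed) for n in sizes
    }


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for n, scores in learning_curve_scores(X, y).items():
        print(n, scores)
```

Sharing the same folds across models keeps the comparison paired, and scoring subsets of increasing size gives a rough learning curve for each model.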
