Chronological age estimation from human microbiomes with transformer-based Robust Principal Component Analysis

dc.contributor.author: Myers, Tyler
dc.contributor.author: Song, Se Jin
dc.contributor.author: Chen, Yang
dc.contributor.author: De Pessemier, Britta
dc.contributor.author: Khatib, Lora
dc.contributor.author: Mcdonald, Daniel
dc.contributor.author: Huang, Shi
dc.contributor.author: Gallo, Richard
dc.contributor.author: Callewaert, Chris
dc.contributor.author: Havulinna, Aki S.
dc.contributor.author: Lahti, Leo
dc.contributor.author: Roeselers, Guus
dc.contributor.author: Laiola, Manolo
dc.contributor.author: Shetty, Sudarshan A.
dc.contributor.author: Kelley, Scott T.
dc.contributor.author: Knight, Rob
dc.contributor.author: Bartko, Andrew
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.converis.publication-id: 500029572
dc.converis.url: https://research.utu.fi/converis/portal/Publication/500029572
dc.date.accessioned: 2026-01-21T13:35:36Z
dc.date.available: 2026-01-21T13:35:36Z
dc.description.abstract: Deep learning for microbiome analysis has shown potential for understanding microbial communities and human phenotypes. Here, we propose an approach, Transformer-based Robust Principal Component Analysis (TRPCA), which leverages the strengths of transformer architectures and interpretability of Robust Principal Component Analysis. To investigate benefits of TRPCA over conventional machine learning models, we benchmarked performance on age prediction from three body sites (skin, oral, gut), with 16S rRNA gene amplicon (16S) and whole-genome sequencing (WGS) data. We demonstrated prediction of age from longitudinal samples and combined classification and regression tasks via multi-task learning (MTL). TRPCA improves age prediction accuracy from human microbiome samples, achieving the largest reduction in Mean Absolute Error for WGS skin (MAE: 8.03, 28% reduction) and 16S skin (MAE: 5.09, 14% reduction) samples, compared to conventional approaches. Additionally, TRPCA's MTL approach achieves an accuracy of 89% for birth country prediction across 5 countries, while improving age prediction from WGS stool samples. Notably, TRPCA uncovers a link between subject and error prediction through residual analysis for paired samples across sequencing method (16S/WGS) and body site (oral/gut). These findings highlight TRPCA's utility in improving age prediction while maintaining feature-level interpretability, and elucidating connections between individuals and microbiomes.
dc.identifier.eissn: 2399-3642
dc.identifier.olddbid: 213137
dc.identifier.oldhandle: 10024/196155
dc.identifier.uri: https://www.utupub.fi/handle/11111/54789
dc.identifier.url: https://www.nature.com/articles/s42003-025-08590-y
dc.identifier.urn: URN:NBN:fi-fe202601217192
dc.language.iso: en
dc.okm.affiliatedauthor: Lahti, Leo
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 3111 Biomedicine [en_GB]
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet [fi_FI]
dc.okm.discipline: 3111 Biolääketieteet [fi_FI]
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: NATURE PORTFOLIO
dc.publisher.country: United Kingdom [en_GB]
dc.publisher.country: Britannia [fi_FI]
dc.publisher.country-code: GB
dc.relation.articlenumber: 1159
dc.relation.doi: 10.1038/s42003-025-08590-y
dc.relation.ispartofjournal: Communications Biology
dc.relation.volume: 8
dc.source.identifier: https://www.utupub.fi/handle/10024/196155
dc.title: Chronological age estimation from human microbiomes with transformer-based Robust Principal Component Analysis
dc.year.issued: 2025

Files

Name:
s42003-025-08590-y.pdf
Size:
5.11 MB
Format:
Adobe Portable Document Format