
Machine learning approaches in microbiome research: challenges and best practices

Papoutsoglou Georgios; Tarazona Sonia; Lopes Marta B.; Klammsteiner Thomas; Ibrahimi Eliana; Eckenberger Julia; Novielli Pierfrancesco; Tonda Alberto; Simeon Andrea; Shigdel Rajesh; Béreux Stéphane; Vitali Giacomo; Tangaro Sabina; Lahti Leo; Temko Andriy; Claesson Marcus J.; Berland Magali


Frontiers Research Foundation
doi:10.3389/fmicb.2023.1261889
URI
https://doi.org/10.3389/fmicb.2023.1261889
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2025082788470
Abstract

Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection and on pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that using compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
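Two of the ideas named in the abstract, a compositional transformation and ICE curves for a logistic model, can be illustrated with a minimal sketch. It is not the authors' pipeline. The centered log-ratio (CLR) transform is assumed here as one common example of a compositional transformation, since the abstract does not name a specific one. The pseudocount, the toy counts, and the model coefficients are all hypothetical and are not fitted to any data.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption of this sketch) avoids log(0) for
    taxa that were not observed in the sample.
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def predict_proba(weights, bias, features):
    """Probability of the positive class under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def ice_curves(weights, bias, samples, feature_index, grid):
    """One ICE curve per sample: vary a single feature over the grid,
    hold the sample's other features fixed, and record the prediction."""
    curves = []
    for features in samples:
        curve = []
        for value in grid:
            modified = list(features)
            modified[feature_index] = value
            curve.append(predict_proba(weights, bias, modified))
        curves.append(curve)
    return curves


# Toy taxon counts for three samples (hypothetical values).
raw_counts = [[120, 30, 0, 850], [15, 400, 60, 25], [300, 300, 10, 90]]
samples = [clr(row) for row in raw_counts]

# Hypothetical logistic regression coefficients, not fitted to data.
weights = [0.8, -0.5, 0.3, -0.6]
bias = 0.1

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = ice_curves(weights, bias, samples, feature_index=0, grid=grid)
```

Each inner list in `curves` traces how one sample's predicted probability changes as the first feature moves across the grid. Plotting these lines on shared axes gives an ICE plot. Note that changing a single CLR coordinate while holding the others fixed breaks the zero-sum property of CLR values, so such curves are best read as a model diagnostic rather than as realistic samples.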

Collections
  • Rinnakkaistallenteet (self-archived publications) [29335]

Turku University Library | University of Turku
julkaisut@utu.fi | Privacy | Accessibility statement
 

 
