Prokaryote growth temperature prediction with machine learning
Reunamo, Akseli (2021-09-06)
This publication is subject to copyright. The work may be read and printed for personal use. Commercial use is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2021092947520
Abstract
Archaea and bacteria can be divided into four groups based on their growth temperature adaptation: mesophiles, thermophiles, hyperthermophiles, and psychrophiles. The thermostability of proteins results from the combined effect of several physical factors, such as van der Waals interactions, chemical polarity, and ionic interactions. The genes responsible for this adaptation have not been identified, and this thesis aims to identify temperature-adaptation-linked genes and to predict temperature adaptation from the presence or absence of genes. A dataset of 4361 genes from 711 prokaryotes was analyzed with four machine learning algorithms: a neural network, random forest, gradient boosting machine, and logistic regression. Logistic regression was chosen as the explanatory and predictive model on the basis of its micro-averaged AUC and the principle of Occam’s razor. Logistic regression predicted temperature adaptation with good performance. Machine learning is a powerful tool for predicting temperature adaptation, and fewer than 200 genes were needed to predict each adaptation. This technique can be used to predict the adaptation of uncultivated prokaryotes. However, the statistical importance of the genes connected to temperature adaptation was not verified, and this thesis provided little additional support for previously proposed temperature-adaptation-linked genes.