Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks

dc.contributor.author: Saldivar-Espinoza Bryan
dc.contributor.author: Macip Guillem
dc.contributor.author: Garcia-Segura Pol
dc.contributor.author: Mestres-Truyol Júlia
dc.contributor.author: Puigbò Pere
dc.contributor.author: Cereto-Massague Adrià
dc.contributor.author: Pujadas Gerard
dc.contributor.author: Garcia-Vallve Santiago
dc.contributor.organization: fi=biologian laitos|en=Department of Biology|
dc.contributor.organization-code: 1.2.246.10.2458963.20.77193996913
dc.converis.publication-id: 178071596
dc.converis.url: https://research.utu.fi/converis/portal/Publication/178071596
dc.date.accessioned: 2025-08-27T22:05:28Z
dc.date.available: 2025-08-27T22:05:28Z
dc.description.abstract: Predicting SARS-CoV-2 mutations is difficult, but predicting recurrent mutations driven by the host, such as those caused by host deaminases, is feasible. We used machine learning to predict which positions from the SARS-CoV-2 genome will hold a recurrent mutation and which mutations will be the most recurrent. We used data from April 2021 that we separated into three sets: a training set, a validation set, and an independent test set. For the test set, we obtained a specificity value of 0.69, a sensitivity value of 0.79, and an Area Under the Curve (AUC) of 0.8, showing that the prediction of recurrent SARS-CoV-2 mutations is feasible. Subsequently, we compared our predictions with updated data from January 2022, showing that some of the false positives in our prediction model become true positives later on. The most important variables detected by the model's Shapley Additive exPlanation (SHAP) are the nucleotide that mutates and RNA reactivity. This is consistent with the SARS-CoV-2 mutational bias pattern and the preference of some host deaminases for specific sequences and RNA secondary structures. We extend our investigation by analyzing the mutations from the variants of concern Alpha, Beta, Delta, Gamma, and Omicron. Finally, we analyzed amino acid changes by looking at the predicted recurrent mutations in the M-pro and spike proteins.
dc.identifier.eissn: 1422-0067
dc.identifier.jour-issn: 1661-6596
dc.identifier.olddbid: 201615
dc.identifier.oldhandle: 10024/184642
dc.identifier.uri: https://www.utupub.fi/handle/11111/48650
dc.identifier.url: https://www.mdpi.com/1422-0067/23/23/14683
dc.identifier.urn: URN:NBN:fi-fe202301265894
dc.language.iso: en
dc.okm.affiliatedauthor: Puigbo, Pedro
dc.okm.discipline: 1182 Biochemistry, cell and molecular biology (en_GB)
dc.okm.discipline: 1182 Biokemia, solu- ja molekyylibiologia (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: MDPI
dc.publisher.country: Switzerland (en_GB)
dc.publisher.country: Sveitsi (fi_FI)
dc.publisher.country-code: CH
dc.relation.articlenumber: 14683
dc.relation.doi: 10.3390/ijms232314683
dc.relation.ispartofjournal: International Journal of Molecular Sciences
dc.relation.issue: 23
dc.relation.volume: 23
dc.source.identifier: https://www.utupub.fi/handle/10024/184642
dc.title: Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks
dc.year.issued: 2022

Files

Name: ijms-23-14683.pdf
Size: 2.77 MB
Format: Adobe Portable Document Format