Neural Network and Random Forest Models in Protein Function Prediction

dc.contributor.author: Hakala Kai
dc.contributor.author: Kaewphan Suwisa
dc.contributor.author: Björne Jari
dc.contributor.author: Mehryary Farrokh
dc.contributor.author: Moen Hans
dc.contributor.author: Tolvanen Martti
dc.contributor.author: Salakoski Tapio
dc.contributor.author: Ginter Filip
dc.contributor.organization: fi=kieli- ja puheteknologia|en=Language and Speech Technology|
dc.contributor.organization: fi=tietojenkäsittelytiede|en=Computer Science|
dc.contributor.organization-code: 1.2.246.10.2458963.20.47465613983
dc.contributor.organization-code: 2606803
dc.converis.publication-id: 51384893
dc.converis.url: https://research.utu.fi/converis/portal/Publication/51384893
dc.date.accessioned: 2025-08-27T21:33:13Z
dc.date.available: 2025-08-27T21:33:13Z
dc.description.abstract: Over the past decade, the demand for automated protein function prediction has increased due to the growing volume of newly sequenced proteins. In this paper, we address the function prediction task by developing an ensemble system that automatically assigns Gene Ontology (GO) terms to a given input protein sequence. The ensemble combines the GO predictions made by random forest (RF) and neural network (NN) classifiers. Both the RF and NN models rely on features derived from BLAST sequence alignments, taxonomy, and protein signature analysis tools. In addition, we report on experiments with an NN model that uses a convolutional layer to analyze the amino acid sequence directly as its sole input. The Swiss-Prot database is used as the training and evaluation data. In the CAFA3 evaluation, which relies on experimental verification of the functional predictions, our submitted ensemble model demonstrates competitive performance, ranking among the 10 best-performing systems out of more than 100 submitted systems. In this paper, we evaluate and further improve the CAFA3-submitted system. Our machine learning models, together with the data pre-processing and feature generation tools, are publicly available as open-source software at https://github.com/TurkuNLP/CAFA3.
dc.format.pagerange: 1772
dc.format.pagerange: 1781
dc.identifier.eissn: 1557-9964
dc.identifier.jour-issn: 1545-5963
dc.identifier.olddbid: 200603
dc.identifier.oldhandle: 10024/183630
dc.identifier.uri: https://www.utupub.fi/handle/11111/46113
dc.identifier.url: https://doi.org/10.1109/TCBB.2020.3044230
dc.identifier.urn: URN:NBN:fi-fe2021042822819
dc.language.iso: en
dc.okm.affiliatedauthor: Hakala, Kai
dc.okm.affiliatedauthor: Kaewphan, Suwisa
dc.okm.affiliatedauthor: Björne, Jari
dc.okm.affiliatedauthor: Mehryary, Farrokh
dc.okm.affiliatedauthor: Moen, Hans
dc.okm.affiliatedauthor: Tolvanen, Martti
dc.okm.affiliatedauthor: Salakoski, Tapio
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.publisher.country: United States (en_GB)
dc.publisher.country: Yhdysvallat (USA) (fi_FI)
dc.publisher.country-code: US
dc.relation.doi: 10.1109/TCBB.2020.3044230
dc.relation.ispartofjournal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
dc.relation.issue: 3
dc.relation.volume: 19
dc.source.identifier: https://www.utupub.fi/handle/10024/183630
dc.title: Neural Network and Random Forest Models in Protein Function Prediction
dc.year.issued: 2022

Files

Name: 09291483.pdf
Size: 438.56 KB
Format: Adobe Portable Document Format
Description: Publisher's PDF