Algebraic shortcuts for leave-one-out cross-validation in supervised network inference

dc.contributor.author: Stock M
dc.contributor.author: Pahikkala T
dc.contributor.author: Airola A
dc.contributor.author: Waegeman W
dc.contributor.author: De Baets B
dc.contributor.organization: Computer Science (fi: tietojenkäsittelytiede)
dc.contributor.organization-code: 1.2.246.10.2458963.20.23479734818
dc.converis.publication-id: 37642437
dc.converis.url: https://research.utu.fi/converis/portal/Publication/37642437
dc.date.accessioned: 2025-08-27T23:34:20Z
dc.date.available: 2025-08-27T23:34:20Z
dc.description.abstract: Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might differ dramatically between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models. The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package: https://github.com/aatapa/RLScore.
dc.format.pagerange: 262-271
dc.identifier.eissn: 1477-4054
dc.identifier.jour-issn: 1467-5463
dc.identifier.olddbid: 204217
dc.identifier.oldhandle: 10024/187244
dc.identifier.uri: https://www.utupub.fi/handle/11111/52379
dc.identifier.urn: URN:NBN:fi-fe2021042824507
dc.language.iso: en
dc.okm.affiliatedauthor: Pahikkala, Tapio
dc.okm.affiliatedauthor: Airola, Antti
dc.okm.discipline: 113 Computer and information sciences
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 Scientific article
dc.publisher: Oxford University Press
dc.publisher.country: United Kingdom
dc.publisher.country-code: GB
dc.relation.doi: 10.1093/bib/bby095
dc.relation.ispartofjournal: Briefings in Bioinformatics
dc.relation.issue: 1
dc.relation.volume: 21
dc.source.identifier: https://www.utupub.fi/handle/10024/187244
dc.title: Algebraic shortcuts for leave-one-out cross-validation in supervised network inference
dc.year.issued: 2020
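The abstract describes two-step kernel ridge regression, whose training cost scales with the number of vertices rather than the number of edges, together with algebraic leave-one-out shortcuts. A minimal NumPy sketch of both ideas is given below. It is an illustration based on the abstract only, not the RLScore implementation; all function names are hypothetical, and the leave-one-out function shows the classic single-network hat-matrix identity that the paper's shortcuts generalize.

```python
import numpy as np

def krr_dual(K, Y, lam):
    # Dual coefficients of kernel ridge regression: (K + lam*I)^{-1} Y.
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), Y)

def two_step_krr(K, G, Y, lam_k, lam_g):
    # Two-step KRR on a bipartite network with n row vertices and m column
    # vertices: one ridge step over the rows, one over the columns.  Each
    # step works with an n x n or m x m system, never with the n*m edges
    # individually, so the cost is governed by the number of vertices.
    A = krr_dual(K, Y, lam_k)           # step 1: regress over row vertices
    return krr_dual(G, A.T, lam_g).T    # step 2: regress over column vertices
    # Predictions for all edges: F = K @ A @ G.

def loo_krr(K, y, lam):
    # Classic leave-one-out shortcut for plain KRR (the single-network
    # special case): with hat matrix H = K (K + lam*I)^{-1},
    #   yhat_loo_i = (yhat_i - H_ii * y_i) / (1 - H_ii),
    # which reproduces retraining without vertex i at no extra cost.
    H = K @ np.linalg.inv(K + lam * np.eye(K.shape[0]))
    yhat = H @ y
    h = np.diag(H)
    return (yhat - h * y) / (1.0 - h)
```

For plain KRR the shortcut is exact, which can be checked by explicitly retraining with each vertex held out and comparing against `loo_krr`.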

Files

Name: 242321.1.full.pdf
Size: 744.4 KB
Format: Adobe Portable Document Format
Description: Final draft