Evaluation metrics and statistical tests for machine learning

Rainio, Oona; Teuho, Jarmo; Klén, Riku

dc.contributor.author	Rainio, Oona
dc.contributor.author	Teuho, Jarmo
dc.contributor.author	Klén, Riku
dc.contributor.organization	fi=tyks, vsshp\|en=tyks, varha\|
dc.contributor.organization-code	1.2.246.10.2458963.20.14646305228
dc.converis.publication-id	387398800
dc.converis.url	https://research.utu.fi/converis/portal/Publication/387398800
dc.date.accessioned	2025-08-27T22:38:02Z
dc.date.available	2025-08-27T22:38:02Z
dc.description.abstract	Research on machine learning (ML) has become incredibly popular during the past few decades. However, for some researchers not familiar with statistics, it might be difficult to understand how to evaluate the performance of ML models and compare them with each other. Here, we introduce the most common evaluation metrics used for the typical supervised ML tasks, including binary, multi-class, and multi-label classification, regression, image segmentation, object detection, and information retrieval. We explain how to choose a suitable statistical test for comparing models, how to obtain enough values of the metric for testing, and how to perform the test and interpret its results. We also present a few practical examples of comparing convolutional neural networks used to classify X-rays with different lung infections and to detect cancer tumors in positron emission tomography images.
dc.identifier.olddbid	202506
dc.identifier.oldhandle	10024/185533
dc.identifier.uri	https://www.utupub.fi/handle/11111/47020
dc.identifier.url	https://doi.org/10.1038/s41598-024-56706-x
dc.identifier.urn	URN:NBN:fi-fe2025082789812
dc.language.iso	en
dc.okm.affiliatedauthor	Rainio, Oona
dc.okm.affiliatedauthor	Teuho, Jarmo
dc.okm.affiliatedauthor	Klén, Riku
dc.okm.affiliatedauthor	Dataimport, tyks, vsshp
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.internationalcopublication	not an international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	Nature Research
dc.publisher.country	United Kingdom	en_GB
dc.publisher.country	Britannia	fi_FI
dc.publisher.country-code	GB
dc.relation.articlenumber	6086
dc.relation.doi	10.1038/s41598-024-56706-x
dc.relation.ispartofjournal	Scientific Reports
dc.relation.volume	14
dc.source.identifier	https://www.utupub.fi/handle/10024/185533
dc.title	Evaluation metrics and statistical tests for machine learning
dc.year.issued	2024

Files

Name: s41598-024-56706-x.pdf
Size: 1.42 MB
Format: Adobe Portable Document Format

Collections

Self-archived publications (Rinnakkaistallenteet)