Evaluation metrics and statistical tests for machine learning

dc.contributor.author: Rainio, Oona
dc.contributor.author: Teuho, Jarmo
dc.contributor.author: Klén, Riku
dc.contributor.organization: fi=PET-keskus|en=Turku PET Centre|
dc.contributor.organization: fi=tyks, vsshp|en=tyks, varha|
dc.contributor.organization-code: 1.2.246.10.2458963.20.14646305228
dc.converis.publication-id: 387398800
dc.converis.url: https://research.utu.fi/converis/portal/Publication/387398800
dc.date.accessioned: 2025-08-27T22:38:02Z
dc.date.available: 2025-08-27T22:38:02Z
dc.description.abstract: Research on different machine learning (ML) methods has become incredibly popular during the past few decades. However, researchers who are not familiar with statistics may find it difficult to understand how to evaluate the performance of ML models and compare them with each other. Here, we introduce the most common evaluation metrics used for typical supervised ML tasks, including binary, multi-class, and multi-label classification, regression, image segmentation, object detection, and information retrieval. We explain how to choose a suitable statistical test for comparing models, how to obtain enough values of the metric for testing, and how to perform the test and interpret its results. We also present a few practical examples of comparing convolutional neural networks used to classify X-rays with different lung infections and to detect cancer tumors in positron emission tomography images.
dc.identifier.jour-issn: 2045-2322
dc.identifier.olddbid: 202506
dc.identifier.oldhandle: 10024/185533
dc.identifier.uri: https://www.utupub.fi/handle/11111/47020
dc.identifier.url: https://doi.org/10.1038/s41598-024-56706-x
dc.identifier.urn: URN:NBN:fi-fe2025082789812
dc.language.iso: en
dc.okm.affiliatedauthor: Rainio, Oona
dc.okm.affiliatedauthor: Teuho, Jarmo
dc.okm.affiliatedauthor: Klén, Riku
dc.okm.affiliatedauthor: Dataimport, tyks, vsshp
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 217 Medical engineering (en_GB)
dc.okm.discipline: 3126 Surgery, anesthesiology, intensive care, radiology (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.discipline: 217 Lääketieteen tekniikka (fi_FI)
dc.okm.discipline: 3126 Kirurgia, anestesiologia, tehohoito, radiologia (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Nature Research
dc.publisher.country: United Kingdom (en_GB)
dc.publisher.country: Britannia (fi_FI)
dc.publisher.country-code: GB
dc.relation.articlenumber: 6086
dc.relation.doi: 10.1038/s41598-024-56706-x
dc.relation.ispartofjournal: Scientific Reports
dc.relation.volume: 14
dc.source.identifier: https://www.utupub.fi/handle/10024/185533
dc.title: Evaluation metrics and statistical tests for machine learning
dc.year.issued: 2024

Files

Name: s41598-024-56706-x.pdf
Size: 1.42 MB
Format: Adobe Portable Document Format