The spatial leave-pair-out cross-validation method for reliable AUC estimation of spatial classifiers

dc.contributor.author: Antti Airola
dc.contributor.author: Jonne Pohjankukka
dc.contributor.author: Johanna Torppa
dc.contributor.author: Maarit Middleton
dc.contributor.author: Vesa Nykänen
dc.contributor.author: Jukka Heikkonen
dc.contributor.author: Tapio Pahikkala
dc.contributor.organization: fi=tietojenkäsittelytiede|en=Computer Science|
dc.contributor.organization-code: 1.2.246.10.2458963.20.23479734818
dc.converis.publication-id: 37570873
dc.converis.url: https://research.utu.fi/converis/portal/Publication/37570873
dc.date.accessioned: 2022-10-28T13:12:55Z
dc.date.available: 2022-10-28T13:12:55Z
dc.description.abstract: Machine learning based classification methods are widely used in geoscience applications, including mineral prospectivity mapping. Typical characteristics of the data, such as a small number of positive instances, imbalanced class distributions and a lack of verified negative instances, make ROC analysis and cross-validation natural choices for classifier evaluation. However, recent literature has identified two sources of bias that can affect the reliability of area under the ROC curve (AUC) estimation via cross-validation on spatial data. The pooling procedure performed by methods such as leave-one-out can introduce a substantial negative bias to the results. At the same time, spatial dependencies leading to spatial autocorrelation can produce overoptimistic results if not corrected for. In this work, we introduce the spatial leave-pair-out cross-validation method, which corrects for both of these biases simultaneously. The methodology is used to benchmark a number of classification methods on mineral prospectivity mapping data from the Central Lapland greenstone belt. The evaluation highlights the dangers of obtaining misleading results on spatial data and demonstrates how these problems can be avoided. Further, the results show the advantages of simple linear models for this classification task.
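The abstract does not spell out the procedure, so the following is only a minimal sketch under stated assumptions. It assumes a formulation in which, for every positive-negative pair, a model is trained without both pair members and without every sample lying within a dead-zone distance `radius` of either of them; the AUC is then the fraction of pairs whose positive member outscores the negative one (ties count one half). Because both scores in a pair come from the same model, no pooling across models occurs. A pooled leave-one-out estimate is included only for contrast. The nearest-centroid scorer `fit_centroid_scorer`, the helpers `pair_term`, `spatial_lpo_auc` and `pooled_loo_auc`, and the `coords`/`radius` parameters are hypothetical illustrations, not the authors' code or benchmarked classifiers.

```python
import math
from itertools import product


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in classifier (not from the paper): scores a sample by
    its projection onto the difference of the positive and negative class means."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("training set must contain both classes")
    dim = len(X[0])
    mu_pos = [sum(x[d] for x in pos) / len(pos) for d in range(dim)]
    mu_neg = [sum(x[d] for x in neg) / len(neg) for d in range(dim)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    return lambda x: sum(wd * xd for wd, xd in zip(w, x))


def pair_term(s_pos, s_neg):
    """Wilcoxon-Mann-Whitney contribution of one positive-negative pair."""
    if s_pos > s_neg:
        return 1.0
    if s_pos == s_neg:
        return 0.5
    return 0.0


def spatial_lpo_auc(X, y, coords, radius, fit=fit_centroid_scorer):
    """Assumed SLPO procedure: one model per positive-negative pair, trained
    without the pair and without any sample within `radius` of either member."""
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    total = 0.0
    for i, j in product(pos_idx, neg_idx):
        # Dead zone: keep only samples strictly farther than `radius` from both
        # pair members; for radius >= 0 the pair itself is always excluded.
        train = [k for k in range(len(y))
                 if math.dist(coords[k], coords[i]) > radius
                 and math.dist(coords[k], coords[j]) > radius]
        score = fit([X[k] for k in train], [y[k] for k in train])
        # Both scores of the pair come from the same model: no pooling.
        total += pair_term(score(X[i]), score(X[j]))
    return total / (len(pos_idx) * len(neg_idx))


def pooled_loo_auc(X, y, fit=fit_centroid_scorer):
    """Contrast case: leave-one-out scores from n different models are
    pooled into a single AUC computation."""
    scores = []
    for i in range(len(y)):
        train = [k for k in range(len(y)) if k != i]
        score = fit([X[k] for k in train], [y[k] for k in train])
        scores.append(score(X[i]))
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    return sum(pair_term(p, n) for p in pos for n in neg) / (len(pos) * len(neg))
```

With `radius=0` only the pair itself (plus any sample at exactly the same coordinates) is held out, which reduces to ordinary leave-pair-out; a larger `radius` widens the spatial dead zone. On four samples with no class signal (features `[[0], [1], [0], [1]]`, labels `[1, 1, 0, 0]`), `pooled_loo_auc` returns 0.125, well below the chance level of 0.5, which illustrates the kind of negative pooling bias the abstract describes.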
dc.format.pagerange: 730
dc.format.pagerange: 747
dc.identifier.eissn: 1573-756X
dc.identifier.jour-issn: 1384-5810
dc.identifier.olddbid: 180534
dc.identifier.oldhandle: 10024/163628
dc.identifier.uri: https://www.utupub.fi/handle/11111/31115
dc.identifier.url: https://link.springer.com/article/10.1007/s10618-018-00607-x
dc.identifier.urn: URN:NBN:fi-fe2021042821808
dc.language.iso: en
dc.okm.affiliatedauthor: Airola, Antti
dc.okm.affiliatedauthor: Pohjankukka, Jonne
dc.okm.affiliatedauthor: Heikkonen, Jukka
dc.okm.affiliatedauthor: Pahikkala, Tapio
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 1171 Geosciences (en_GB)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Springer New York LLC
dc.publisher.country: United States (en_GB)
dc.publisher.country-code: US
dc.relation.doi: 10.1007/s10618-018-00607-x
dc.relation.ispartofjournal: Data Mining and Knowledge Discovery
dc.relation.issue: 3
dc.relation.volume: 33
dc.source.identifier: https://www.utupub.fi/handle/10024/163628
dc.title: The spatial leave-pair-out cross-validation method for reliable AUC estimation of spatial classifiers
dc.year.issued: 2019

Files

Name: Airola2018_Article_TheSpatialLeave-pair-outCross-.pdf
Size: 2.57 MB
Format: Adobe Portable Document Format
Description: Publisher's PDF