Addressing imbalanced data for machine learning based mineral prospectivity mapping

dc.contributor.authorFarahnakian, Fahimeh
dc.contributor.authorSheikh, Javad
dc.contributor.authorZelioli, Luca
dc.contributor.authorNidhi, Dipak
dc.contributor.authorSeppä, Iiro
dc.contributor.authorIlo, Rami
dc.contributor.authorNevalainen, Paavo
dc.contributor.authorHeikkonen, Jukka
dc.contributor.organizationfi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization-code1.2.246.10.2458963.20.68940835793
dc.converis.publication-id458935530
dc.converis.urlhttps://research.utu.fi/converis/portal/Publication/458935530
dc.date.accessioned2025-08-28T02:04:07Z
dc.date.available2025-08-28T02:04:07Z
dc.description.abstractEffective Mineral Prospectivity Mapping (MPM) relies on the ability of Machine Learning (ML) models to extract meaningful patterns from geophysical data. However, in mineral exploration, the presence of a mineral deposit is a rare event compared with the overall geological landscape. This rarity leads to a highly imbalanced dataset, where positive instances (mineralized samples) are considerably less frequent than negative instances (non-mineralized samples). Imbalanced data can bias ML models towards the majority class, leading to inaccurate predictions for the minority class (mineralized samples), which is of primary interest. To address this challenge, we propose a two-level approach in this study. At the data level, we employ imbalanced data handling techniques that operate on the training dataset and change the class distribution. At the algorithmic level, we adjust the decision threshold of a model to balance the trade-off between false positives and false negatives. Experimental results are collected on a geophysical dataset from Lapland, Finland. The dataset exhibits a significant class imbalance, comprising 17 positive samples contrasted with 1.84 × 10^6 negative samples. We investigate the effect of imbalanced data handling on the performance of four ML models: Multi-Layer Perceptron (MLP), Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR). From the results, we found that the MLP model achieved the best overall performance, with a total accuracy of 97.13% on balanced data using the synthetic minority oversampling method. RF and DT also performed well, with accuracies of 88.34% and 89.35%, respectively. The methodology implemented in this work is integrated into QGIS as a new toolkit for MPM called EIS Toolkit.
dc.identifier.eissn1872-7360
dc.identifier.jour-issn0169-1368
dc.identifier.olddbid208533
dc.identifier.oldhandle10024/191560
dc.identifier.urihttps://www.utupub.fi/handle/11111/57983
dc.identifier.urlhttps://doi.org/10.1016/j.oregeorev.2024.106270
dc.identifier.urnURN:NBN:fi-fe2025082788016
dc.language.isoen
dc.okm.affiliatedauthorFarahnakian, Fahimeh
dc.okm.affiliatedauthorSheikh, Javad
dc.okm.affiliatedauthorZelioli, Luca
dc.okm.affiliatedauthorNidhi, Dipak
dc.okm.affiliatedauthorSeppä, Iiro
dc.okm.affiliatedauthorIlo, Rami
dc.okm.affiliatedauthorNevalainen, Paavo
dc.okm.affiliatedauthorHeikkonen, Jukka
dc.okm.discipline113 Computer and information sciencesen_GB
dc.okm.discipline1171 Geosciencesen_GB
dc.okm.discipline113 Tietojenkäsittely ja informaatiotieteetfi_FI
dc.okm.discipline1171 Geotieteetfi_FI
dc.okm.internationalcopublicationnot an international co-publication
dc.okm.internationalityInternational publication
dc.okm.typeA2 Scientific Article
dc.publisherElsevier BV
dc.publisher.countryNetherlandsen_GB
dc.publisher.countryAlankomaatfi_FI
dc.publisher.country-codeNL
dc.relation.articlenumber106270
dc.relation.doi10.1016/j.oregeorev.2024.106270
dc.relation.ispartofjournalOre Geology Reviews
dc.relation.volume174
dc.source.identifierhttps://www.utupub.fi/handle/10024/191560
dc.titleAddressing imbalanced data for machine learning based mineral prospectivity mapping
dc.year.issued2024

Files

Name:
1-s2.0-S0169136824004037-main.pdf
Size:
10.33 MB
Format:
Adobe Portable Document Format