Synthetic minority oversampling of vital statistics data with generative adversarial networks

dc.contributor.authorAki Koivu
dc.contributor.authorMikko Sairanen
dc.contributor.authorAntti Airola
dc.contributor.authorTapio Pahikkala
dc.contributor.organizationfi=tietojenkäsittelytiede|en=Computer Science|
dc.contributor.organization-code1.2.246.10.2458963.20.23479734818
dc.converis.publication-id48930675
dc.converis.urlhttps://research.utu.fi/converis/portal/Publication/48930675
dc.date.accessioned2022-10-28T12:22:29Z
dc.date.available2022-10-28T12:22:29Z
dc.description.abstractObjective: Minority oversampling is a standard approach for adjusting the ratio between the classes in imbalanced data. However, established methods often provide only modest improvements in classification performance when applied to data with an extremely imbalanced class distribution and to mixed-type data. This is typical of vital statistics data, in which the outcome incidence dictates the number of positive observations. In this article, we developed a novel neural network-based oversampling method called actGAN (activation-specific generative adversarial network) that can derive synthetic observations that are useful for increasing prediction performance in this context. Materials and Methods: From vital statistics data, the outcome of early stillbirth was chosen to be predicted based on demographics, pregnancy history, and infections. The data contained 363 560 live births and 139 early stillbirths, resulting in a class imbalance of 99.96% to 0.04%. The hyperparameters of actGAN and a baseline method, SMOTE-NC (Synthetic Minority Over-sampling Technique-Nominal Continuous), were tuned with Bayesian optimization, and both were compared against a cost-sensitive learning-only approach. Results: While SMOTE-NC provided mixed results, actGAN consistently improved both the true positive rate at a clinically significant false positive rate and the area under the receiver operating characteristic curve. Discussion: Including an activation-specific output layer in the generator network of actGAN enables the addition of information about the underlying data structure, which outperforms the nominal mechanism of SMOTE-NC. Conclusions: actGAN improves the prediction performance for our learning task. Our method could be applied to other mixed-type data prediction tasks that are known to be afflicted by class imbalance and limited data availability.
dc.identifier.eissn1527-974X
dc.identifier.jour-issn1067-5027
dc.identifier.olddbid176213
dc.identifier.oldhandle10024/159307
dc.identifier.urihttps://www.utupub.fi/handle/11111/31269
dc.identifier.urlhttps://doi.org/10.1093/jamia/ocaa127
dc.identifier.urnURN:NBN:fi-fe2021042824359
dc.language.isoen
dc.okm.affiliatedauthorPahikkala, Tapio
dc.okm.affiliatedauthorKoivu, Aki
dc.okm.affiliatedauthorAirola, Antti
dc.okm.discipline112 Statistics and probabilityen_GB
dc.okm.internationalcopublicationnot an international co-publication
dc.okm.internationalityInternational publication
dc.okm.typeA1 ScientificArticle
dc.publisherOxford University Press
dc.publisher.countryUnited Kingdomen_GB
dc.publisher.countryBritanniafi_FI
dc.publisher.country-codeGB
dc.relation.articlenumberocaa127
dc.relation.doi10.1093/jamia/ocaa127
dc.relation.ispartofjournalJournal of the American Medical Informatics Association
dc.source.identifierhttps://www.utupub.fi/handle/10024/159307
dc.titleSynthetic minority oversampling of vital statistics data with generative adversarial networks
dc.year.issued2020

Files

Name: ocaa127.pdf
Size: 406.41 KB
Format: Adobe Portable Document Format
Description: Publisher's PDF