Advancing Reproducibility and Accountability of Unsupervised Machine Learning in Text Mining: Importance of Transparency in Reporting Preprocessing and Algorithm Selection

dc.contributor.author: Valtonen Laura
dc.contributor.author: Mäkinen Saku J
dc.contributor.author: Kirjavainen Johanna
dc.contributor.organization: fi=kone- ja materiaalitekniikan laitos|en=Department of Mechanical and Materials Engineering|
dc.contributor.organization: fi=tuotantotalous|en=Industrial Engineering|
dc.contributor.organization-code: 1.2.246.10.2458963.20.60030805372
dc.contributor.organization-code: 2610200
dc.converis.publication-id: 176969243
dc.converis.url: https://research.utu.fi/converis/portal/Publication/176969243
dc.date.accessioned: 2022-11-29T15:50:41Z
dc.date.available: 2022-11-29T15:50:41Z
dc.description.abstract: Machine learning (ML) enables the analysis of large datasets for pattern discovery. ML methods and the standards for their use have recently attracted increasing attention in organizational research; recent accounts have raised awareness of the importance of transparent ML reporting practices, especially considering the influence of preprocessing and algorithm choice on analytical results. However, efforts made thus far to advance the quality of ML research have failed to consider the special methodological requirements of unsupervised machine learning (UML) separate from the more common supervised machine learning (SML). We confronted these issues by studying a common organizational research dataset of unstructured text and discovered interpretability and representativeness trade-offs between combinations of preprocessing and UML algorithm choices that jeopardize research reproducibility, accountability, and transparency. We highlight the need for contextual justifications to address such issues and offer principles for assessing the contextual suitability of UML choices in research settings.
dc.identifier.eissn: 1552-7425
dc.identifier.jour-issn: 1094-4281
dc.identifier.olddbid: 190255
dc.identifier.oldhandle: 10024/173346
dc.identifier.uri: https://www.utupub.fi/handle/11111/34376
dc.identifier.url: https://journals.sagepub.com/doi/10.1177/10944281221124947
dc.identifier.urn: URN:NBN:fi-fe2022112967977
dc.language.iso: en
dc.okm.affiliatedauthor: Mäkinen, Saku
dc.okm.discipline: 214 Mechanical engineering [en_GB]
dc.okm.discipline: 216 Materials engineering [en_GB]
dc.okm.discipline: 214 Kone- ja valmistustekniikka [fi_FI]
dc.okm.discipline: 216 Materiaalitekniikka [fi_FI]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Sage Publications
dc.publisher.country: United States [en_GB]
dc.publisher.country: Yhdysvallat (USA) [fi_FI]
dc.publisher.country-code: US
dc.relation.doi: 10.1177/10944281221124947
dc.relation.ispartofjournal: Organizational Research Methods
dc.source.identifier: https://www.utupub.fi/handle/10024/173346
dc.title: Advancing Reproducibility and Accountability of Unsupervised Machine Learning in Text Mining: Importance of Transparency in Reporting Preprocessing and Algorithm Selection
dc.year.issued: 2022

Files

Name: 10944281221124947.pdf
Size: 1.11 MB
Format: Adobe Portable Document Format