Advancing Reproducibility and Accountability of Unsupervised Machine Learning in Text Mining: Importance of Transparency in Reporting Preprocessing and Algorithm Selection

Valtonen Laura; Mäkinen Saku J; Kirjavainen Johanna

dc.contributor.author	Valtonen Laura
dc.contributor.author	Mäkinen Saku J
dc.contributor.author	Kirjavainen Johanna
dc.contributor.organization	Industrial Engineering (fi: tuotantotalous)
dc.contributor.organization-code	2610200
dc.converis.publication-id	176969243
dc.converis.url	https://research.utu.fi/converis/portal/Publication/176969243
dc.date.accessioned	2022-11-29T15:50:41Z
dc.date.available	2022-11-29T15:50:41Z
dc.description.abstract	Machine learning (ML) enables the analysis of large datasets for pattern discovery. ML methods and the standards for their use have recently attracted increasing attention in organizational research; recent accounts have raised awareness of the importance of transparent ML reporting practices, especially considering the influence of preprocessing and algorithm choice on analytical results. However, efforts made thus far to advance the quality of ML research have failed to consider the special methodological requirements of unsupervised machine learning (UML) separate from the more common supervised machine learning (SML). We confronted these issues by studying a common organizational research dataset of unstructured text and discovered interpretability and representativeness trade-offs between combinations of preprocessing and UML algorithm choices that jeopardize research reproducibility, accountability, and transparency. We highlight the need for contextual justifications to address such issues and offer principles for assessing the contextual suitability of UML choices in research settings.
dc.identifier.eissn	1552-7425
dc.identifier.jour-issn	1094-4281
dc.identifier.olddbid	190255
dc.identifier.oldhandle	10024/173346
dc.identifier.uri	https://www.utupub.fi/handle/11111/34376
dc.identifier.url	https://journals.sagepub.com/doi/10.1177/10944281221124947
dc.identifier.urn	URN:NBN:fi-fe2022112967977
dc.language.iso	en
dc.okm.affiliatedauthor	Mäkinen, Saku
dc.okm.discipline	214 Mechanical engineering	en_GB
dc.okm.internationalcopublication	not an international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	Sage Publications
dc.publisher.country	United States	en_GB
dc.publisher.country-code	US
dc.relation.doi	10.1177/10944281221124947
dc.relation.ispartofjournal	Organizational Research Methods
dc.source.identifier	https://www.utupub.fi/handle/10024/173346
dc.title	Advancing Reproducibility and Accountability of Unsupervised Machine Learning in Text Mining: Importance of Transparency in Reporting Preprocessing and Algorithm Selection
dc.year.issued	2022

Files

Name: 10944281221124947.pdf
Size: 1.11 MB
Format: Adobe Portable Document Format

Collections

Rinnakkaistallenteet (self-archived publications)