Advancing Reproducibility and Accountability of Unsupervised Machine Learning in Text Mining: Importance of Transparency in Reporting Preprocessing and Algorithm Selection
| dc.contributor.author | Valtonen, Laura | |
| dc.contributor.author | Mäkinen, Saku J | |
| dc.contributor.author | Kirjavainen, Johanna | |
| dc.contributor.organization | Department of Mechanical and Materials Engineering | |
| dc.contributor.organization | Industrial Engineering | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.60030805372 | |
| dc.contributor.organization-code | 2610200 | |
| dc.converis.publication-id | 176969243 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/176969243 | |
| dc.date.accessioned | 2022-11-29T15:50:41Z | |
| dc.date.available | 2022-11-29T15:50:41Z | |
| dc.description.abstract | Machine learning (ML) enables the analysis of large datasets for pattern discovery. ML methods and the standards for their use have recently attracted increasing attention in organizational research; recent accounts have raised awareness of the importance of transparent ML reporting practices, especially considering the influence of preprocessing and algorithm choice on analytical results. However, efforts made thus far to advance the quality of ML research have failed to consider the special methodological requirements of unsupervised machine learning (UML) separate from the more common supervised machine learning (SML). We confronted these issues by studying a common organizational research dataset of unstructured text and discovered interpretability and representativeness trade-offs between combinations of preprocessing and UML algorithm choices that jeopardize research reproducibility, accountability, and transparency. We highlight the need for contextual justifications to address such issues and offer principles for assessing the contextual suitability of UML choices in research settings. | |
| dc.identifier.eissn | 1552-7425 | |
| dc.identifier.jour-issn | 1094-4281 | |
| dc.identifier.olddbid | 190255 | |
| dc.identifier.oldhandle | 10024/173346 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/34376 | |
| dc.identifier.url | https://journals.sagepub.com/doi/10.1177/10944281221124947 | |
| dc.identifier.urn | URN:NBN:fi-fe2022112967977 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Mäkinen, Saku | |
| dc.okm.discipline | 214 Mechanical engineering | en_GB |
| dc.okm.discipline | 216 Materials engineering | en_GB |
| dc.okm.discipline | 214 Kone- ja valmistustekniikka | fi_FI |
| dc.okm.discipline | 216 Materiaalitekniikka | fi_FI |
| dc.okm.internationalcopublication | not an international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A1 ScientificArticle | |
| dc.publisher | Sage Publications | |
| dc.publisher.country | United States | en_GB |
| dc.publisher.country | Yhdysvallat (USA) | fi_FI |
| dc.publisher.country-code | US | |
| dc.relation.doi | 10.1177/10944281221124947 | |
| dc.relation.ispartofjournal | Organizational Research Methods | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/173346 | |
| dc.title | Advancing Reproducibility and Accountability of Unsupervised Machine Learning in Text Mining: Importance of Transparency in Reporting Preprocessing and Algorithm Selection | |
| dc.year.issued | 2022 | |