Quantifying bias and uncertainty in historical data collections with probabilistic programming

dc.contributor.author: Leo Lahti
dc.contributor.author: Eetu Mäkelä
dc.contributor.author: Mikko Tolonen
dc.contributor.organization: fi=sovellettu matematiikka|en=Applied mathematics|
dc.contributor.organization-code: 1.2.246.10.2458963.20.48078768388
dc.converis.publication-id: 51181243
dc.converis.url: https://research.utu.fi/converis/portal/Publication/51181243
dc.date.accessioned: 2022-10-28T14:28:40Z
dc.date.available: 2022-10-28T14:28:40Z
dc.description.abstract: The enhanced access to ever-expanding digital data collections and open computational methods have led to the emergence of new research lines within the humanities and social sciences, bringing in new quantitative evidence and insights. Any data interpretation depends critically on understanding of the scope and limitations in data collection, as well as on reliable downstream analysis. Quantitative analysis can complement qualitative research by providing access to overlooked information that is accessible only through systematic discovery and analysis of latent patterns underlying the available data collections. Probabilistic programming is an expanding paradigm in machine learning that provides new statistical tools for intuitive interpretation of complex data sets. This new paradigm stems from Bayesian analysis and emphasizes explicit modeling of the data generating processes and associated uncertainties. Despite its remarkable application potential, probabilistic programming has so far received little attention in computational humanities. We use a brief case study in computational history to demonstrate how probabilistic programming can be incorporated in reproducible data science workflows in order to detect and quantify bias in a widely studied historical text collection, the Eighteenth Century Collections Online.
dc.format.pagerange: 280
dc.format.pagerange: 289
dc.identifier.issn: 1613-0073
dc.identifier.jour-issn: 1613-0073
dc.identifier.olddbid: 188498
dc.identifier.oldhandle: 10024/171592
dc.identifier.uri: https://www.utupub.fi/handle/11111/52719
dc.identifier.url: http://ceur-ws.org/Vol-2723/short46.pdf
dc.identifier.urn: URN:NBN:fi-fe2021042826733
dc.language.iso: en
dc.okm.affiliatedauthor: Lahti, Leo
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: Germany (en_GB)
dc.publisher.country: Saksa (fi_FI)
dc.publisher.country-code: DE
dc.relation.conference: Workshop on Computational Humanities Research
dc.relation.ispartofjournal: CEUR Workshop Proceedings
dc.relation.volume: 2723
dc.source.identifier: https://www.utupub.fi/handle/10024/171592
dc.title: Quantifying bias and uncertainty in historical data collections with probabilistic programming
dc.title.book: 1st Workshop on Computational Humanities Research
dc.year.issued: 2020

Files

Name:
short46.pdf
Size:
306.26 KB
Format:
Adobe Portable Document Format
Description:
Publisher's PDF