Forecasting future events with publicly accessible online data: A Study on Finnish parliamentary elections from 2015 to 2023
Vepsäläinen, Tapio (2025-05-30)
Turun yliopisto. Turun kauppakorkeakoulu
The permanent address of the publication is:
https://urn.fi/URN:ISBN:978-952-02-0189-0
Abstract
Publicly accessible online data has become an increasingly feasible data source for predictive analytics. This thesis explores how digital footprints, such as social media interactions, can be used in forecasting. Focusing specifically on elections as an example application, the thesis presents a series of models used to forecast the outcome of Finnish parliamentary elections. By evaluating the precision and limitations of the models in electoral forecasting, this research seeks to bridge the gap between data science and information systems research, offering insight into the broader impact of digital data utilization in societal decision-making contexts.
The methodology employed in this study combines predictive modeling and data science techniques. The research integrates publicly available data, such as social media interactions and online content, to train models capable of forecasting electoral results. The approach is built on multiple original studies, each exploring different facets of election prediction. The robustness and practical utility of the predictions are assessed through real-world testing, involving the publication of forecasts prior to elections. In addition, the interpretability of the models is analyzed to understand whether the results align with political theories. Ethical considerations, such as privacy and data ownership, are also carefully examined throughout the study.
The key findings of this study demonstrate the potential of using publicly available online data to forecast election outcomes. The forecasting models evolved significantly during the three election cycles studied (2015, 2019, and 2023). The final model integrates diverse data sources, including social media interactions, electoral history, and candidate attributes. Progressive improvements in accuracy were observed throughout the study, and the models eventually approached the precision of traditional polling methods. The study underscores the incremental benefits of incorporating diverse data types while addressing the challenges associated with data collection and feature selection. Although current models exhibit robust predictive capabilities, their practical applicability compared to opinion polls is limited. However, the results suggest that there is substantial promise for future enhancements.
The research advances the field of election forecasting by introducing a methodology that leverages publicly accessible candidate data alongside social media insights, offering a candidate-level perspective on electoral predictions. This approach not only complements traditional macro-level methods but also provides insights into the theoretical foundations of voting behavior. Although the potential of social media as a predictive tool is highlighted, the research acknowledges existing challenges such as bias, suggests mitigation strategies, and underscores the importance of domain knowledge in data-driven research. Practically, the study suggests that hybrid methodologies that combine traditional polling with candidate-specific insights can improve prediction precision. Additionally, it emphasizes the significance of cross-disciplinary understanding and transparent decision-making in refining methodologies for predictive analytics using online data. Overall, the research highlights the need for a holistic approach in utilizing digital data, balancing technical proficiency with ethical and contextual awareness.
Collections
- Doctoral dissertations [2946]