Extracting geopolitical risk indices from news data with sentiment analysis
Pikkarainen, Aleksi (2025-06-20)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Commercial use is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2025062674519
Abstract
Geopolitical risk has been measured and quantified since the 1950s. Over time, the technologies used to model it have evolved, and various methods have been employed to construct geopolitical risk indices. These indices give insight into ongoing conflicts, trade wars and trade agreements between nations. Recently, using natural language processing to create real-time indices from internet data sources has become a rising trend. In today’s constantly changing world, with its overabundance of information, acquiring critical information through these indices is extremely beneficial for those involved in business and governmental decision making.
This thesis explored geopolitical risk indices by constructing a pipeline that utilizes various natural language processing methods. I researched what elements this pipeline requires, what the most efficient ways to extract information are, and how the end result can be shaped into a time series index. Mission Grey’s News Dataset was used both as a source for the indices and to provide raw data that could be annotated for the information extraction methods used.
The results showed that a transformer-based text classifier could categorize news data by geopolitical context sufficiently well. Named entity recognition was used alongside it to detect which countries the news articles discuss. A fine-tuned sentiment analysis model was an efficient way to extract polarity from the selected articles. This extracted polarity was transformed into indices using four different methods. Of these methods, the total news ratio method showed performance close to that of the state-of-the-art geopolitical index.
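The final step, turning per-article polarity into a time series index, can be illustrated with a minimal sketch. The abstract does not define the total news ratio method, so the code below assumes one plausible reading: for each day, the share of articles that are both classified as geopolitical and labeled negative, out of all articles published that day. The function name, the tuple layout and the sample data are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from datetime import date


def negative_news_ratio(articles):
    """Return {day: ratio} for an iterable of (day, is_geopolitical, polarity).

    Assumption (not from the thesis): the index for a day is the number of
    geopolitical articles with negative polarity divided by the total number
    of articles published on that day.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for day, is_geopolitical, polarity in articles:
        totals[day] += 1
        if is_geopolitical and polarity == "negative":
            negatives[day] += 1
    # Sort by day so the result reads as a time series.
    return {day: negatives[day] / totals[day] for day in sorted(totals)}


# Hypothetical sample: the outputs of the classifier and sentiment model.
sample = [
    (date(2025, 1, 1), True, "negative"),
    (date(2025, 1, 1), True, "positive"),
    (date(2025, 1, 1), False, "negative"),
    (date(2025, 1, 1), False, "neutral"),
    (date(2025, 1, 2), True, "negative"),
    (date(2025, 1, 2), False, "positive"),
]

index = negative_news_ratio(sample)
for day, value in index.items():
    print(day.isoformat(), value)
```

Normalizing by the total article count, rather than using raw counts, keeps the index comparable across days with very different news volumes. In a country-level index, the same aggregation would be applied separately to the articles that named entity recognition attributes to each country.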