Data-Driven Equity Selection with PCA and Minimum Variance Optimization
Valtonen, Samuli (2025-12-17)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
open
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe20251222123591
Abstract
The topic of this thesis is the development of a quantitative equity strategy that combines Principal Component Analysis (PCA) with a minimum-variance portfolio. The purpose of the study is to examine whether a strategy based on rolling PCA-based stock selection and variance-minimizing allocation can provide a more efficient and adaptive alternative to a traditional market-capitalization-weighted index. The thesis also investigates how PCA-derived factors can be transformed into scoring rules that guide investment decisions and how the strategy performs in different market conditions.
The literature review presents the theoretical foundations of financial markets, including the efficient market hypothesis, the random walk model, pricing theories and behavioral finance. The theoretical framework further discusses the basics of principal component analysis and Markowitz’s portfolio theory in more detail, as well as earlier research on the use of PCA in stock selection and risk management.
The empirical part of the study is conducted using daily data on S&P 500 constituents for the period 2010–2024. The strategy applies rolling one-year windows, from which PCA-based selection scores are constructed and the weights of a minimum-variance portfolio are solved. The performance of the strategy is evaluated over the full sample period using several return and risk measures. The results show that, over the sample period, the PCA-based strategy clearly outperforms the S&P 500 index in terms of cumulative return, achieves higher risk-adjusted returns and exhibits smaller maximum drawdowns. However, the excess return is clearly regime dependent and concentrated in certain periods, and rolling alpha and the Sharpe ratio are not consistently positive. Therefore, the results cannot be generalized to other markets or time periods without further evidence.
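To make the pipeline described above concrete, the following is a minimal sketch of one rebalancing step: PCA on a one-year window of returns, a selection score, and minimum-variance weights for the selected stocks. It is not the thesis's implementation. The scoring rule (share of a stock's variance not explained by the leading components), the parameters `n_components` and `k`, the function names, the unconstrained closed-form weights (which permit short positions), and the simulated data are all assumptions made for illustration only.

```python
import numpy as np


def pca_scores(returns, n_components=3):
    """Hypothetical selection score based on PCA of the correlation matrix.

    Assumption: a stock scores higher when less of its variance is
    explained by the leading principal components. The abstract does
    not state the actual scoring rule.
    """
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    explained = (loadings ** 2).sum(axis=1)  # per-stock explained share
    return 1.0 - explained


def min_variance_weights(cov):
    """Closed-form minimum-variance weights: w = inv(S) 1 / (1' inv(S) 1).

    Assumption: only the budget constraint (weights sum to one) is
    imposed, so negative weights are possible.
    """
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()


def rebalance(window_returns, k=5):
    """Select the k highest-scoring stocks and weight them by minimum variance."""
    scores = pca_scores(window_returns)
    chosen = np.argsort(scores)[::-1][:k]
    cov = np.cov(window_returns[:, chosen], rowvar=False)
    return chosen, min_variance_weights(cov)


# Simulated stand-in for one rolling window: 252 trading days, 20 stocks.
rng = np.random.default_rng(0)
window = rng.normal(0.0, 0.01, size=(252, 20))
chosen, weights = rebalance(window)
```

In a rolling backtest, `rebalance` would be called on each successive one-year window and the resulting weights held until the next rebalancing date; the rebalancing frequency is likewise not stated in the abstract.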
