Customer Lifetime Value Projection with K-means clustering
Varjus, Olivia (2022-10-19)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Commercial use is prohibited.
Closed
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2022102462922
Abstract
In recent years, customers have become the most important asset of companies. To get the most value out of customers, a way to calculate the Net Present Value (NPV) of their future profit has been created. This is called Customer Lifetime Value (CLV): the net profit from a customer over the course of the relationship between the customer and the company.
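As a rough illustration of CLV as a net present value, the sketch below discounts a stream of per-period net profits back to the present. It uses a common textbook form and is not necessarily the formula used in the thesis; the function name `customer_lifetime_value`, the per-period discount rate and the example figures are assumptions made for illustration.

```python
def customer_lifetime_value(period_profits, discount_rate):
    """Net present value of a customer's future per-period net profits.

    period_profits: net profit expected in periods 1, 2, 3, ...
    discount_rate:  discount rate per period (0.10 means 10 %).
    Both inputs are hypothetical; the thesis may define them differently.
    """
    return sum(
        profit / (1 + discount_rate) ** period
        for period, profit in enumerate(period_profits, start=1)
    )


# Three periods of 100 units of profit, discounted at 10 % per period.
example_clv = customer_lifetime_value([100.0, 100.0, 100.0], 0.10)
```

With a discount rate of zero, the value reduces to the plain sum of the profits, which is a convenient sanity check.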
In this thesis, the CLV is calculated for customer groups derived with the K-means clustering method. We review the concepts of CLV and K-means clustering together with the Cubic Clustering Criterion, the Pseudo F Statistic and the Pseudo T-Squared Statistic, of which the first two are used to determine the optimal number of clusters.
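To make the role of the Pseudo F Statistic concrete, here is a minimal sketch that computes it for a given cluster assignment. It follows the standard definition (between-cluster sum of squares divided by k - 1, over within-cluster sum of squares divided by n - k), also known as the Calinski-Harabasz index. The function name `pseudo_f` and the toy data are assumptions; the thesis does not describe its own implementation here.

```python
def pseudo_f(points, labels):
    """Pseudo F statistic for a clustering of `points` given by `labels`.

    Higher values indicate compact, well-separated clusters. Requires at
    least two clusters, more points than clusters, and non-zero
    within-cluster variation.
    """
    n = len(points)
    dim = len(points[0])

    # Collect the points belonging to each cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    def mean(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall_mean = mean(points)
    between = sum(
        len(rows) * squared_distance(mean(rows), overall_mean)
        for rows in clusters.values()
    )
    within = sum(
        squared_distance(point, mean(rows))
        for rows in clusters.values()
        for point in rows
    )
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two obvious groups along the first coordinate.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_split = pseudo_f(toy_points, [0, 0, 1, 1])
poor_split = pseudo_f(toy_points, [0, 1, 0, 1])
```

In practice the statistic would be evaluated for several candidate numbers of clusters, and values of k where it peaks would be considered as candidates.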
The data used in this thesis is collected from a telecommunications operator's broadband customers for 2018-2021. Before the analysis, the data is transformed into the form required by K-means clustering and the CLV calculation. One group is formed manually from the customers with zero average monthly fees, and the rest are divided with K-means. The clusters' CLVs are calculated for 2018, and then the ratios of change over one and three years are calculated. To test prediction accuracy, the 2019 customers are divided into the same groups that were obtained in 2018. The groups' CLVs are calculated for 2019, and then their projected 2020 CLVs and their actual 2020 CLVs are derived. These are compared with each other to see whether the predictions are accurate. After this, the 2021 customers are divided into the same groups that were obtained in 2018, and their CLVs are projected one and three years forward. Finally, suggestions for influencing the future CLVs are presented together with possible limitations. Throughout these steps, the groups are interpreted comprehensively.
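The grouping and projection steps described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's procedure: it assumes later-year customers are placed in the 2018 groups by nearest centroid, and that a projection multiplies the current CLV by a previously observed ratio of change. All names (`assign_group`, `ratio_of_change`, `project_clv`, `relative_error`) and all numbers are hypothetical.

```python
import math

ZERO_FEE_GROUP = "zero_fee"


def assign_group(avg_monthly_fee, features, centroids):
    """Place a customer in the manual zero-fee group or the nearest cluster.

    centroids maps a group name to its centroid coordinates (assumed to
    come from the K-means run on the base year).
    """
    if avg_monthly_fee == 0:
        return ZERO_FEE_GROUP
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


def ratio_of_change(clv_start, clv_end):
    """Ratio by which a group's CLV changed between two points in time."""
    return clv_end / clv_start


def project_clv(current_clv, ratio):
    """Project a group's CLV forward by applying an observed ratio."""
    return current_clv * ratio


def relative_error(projected, actual):
    """Relative size of the gap between a projected and an actual CLV."""
    return abs(projected - actual) / abs(actual)


# Hypothetical centroids over (average monthly fee, contract years).
centroids = {"low": (10.0, 1.0), "high": (50.0, 3.0)}
group = assign_group(45.0, (45.0, 2.0), centroids)

# Hypothetical group CLVs: 100 grew to 110 over one year in the base data.
one_year_ratio = ratio_of_change(100.0, 110.0)
projected = project_clv(200.0, one_year_ratio)
error = relative_error(projected, 200.0)
```

Comparing `projected` with the CLV actually observed a year later, as `relative_error` does, mirrors the accuracy check the abstract describes for the 2019 and 2020 figures.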