Designing fair and compliant AI: Evaluating bias mitigation methods under data minimisation constraints
A case study on the COMPAS dataset

Information Systems Science
Master's thesis

Author: Johannes Bekkers
Supervisors: Dr Emiel Caron, Dr Farhan Ahmad
15.08.2025
Tilburg

The originality of this thesis has been checked in accordance with the University of Turku quality assurance system using the Turnitin Originality Check service.

Master's thesis
Subject: Information Systems Science
Author: Johannes Bekkers
ANR: 901507
SNR: 2047005
Study right number: 2406937
Title: Designing fair and compliant AI: Evaluating bias mitigation methods under data minimisation constraints
Supervisors: Dr Emiel Caron & Dr Farhan Ahmad
Number of pages: 104 pages + appendices 42 pages
Date: 15.08.2025

Abstract:
The increasing integration of machine learning into high-stakes decision-making has intensified concerns about fairness, accountability, and compliance with data protection regulations. A central tension arises between the General Data Protection Regulation’s data minimisation principle, which restricts access to sensitive attributes such as race, and the practical need for such data in bias mitigation and fairness auditing. This study examines whether algorithmic fairness can be achieved when bias mitigation techniques are applied under data minimisation constraints. Using the publicly available COMPAS recidivism dataset as a case study, two interpretable and widely used classifiers, Logistic Regression and Random Forest, were trained on both a dataset that included race and a data-minimised version that excluded it from model training. Models were evaluated using a variety of well-established performance and fairness metrics. In the full-data setting, bias mitigation methods requiring sensitive attributes improved fairness while maintaining performance. In the data-minimised setting, baseline models exhibited higher fairness but slightly reduced predictive performance. Mitigation options were restricted to a narrow range of post-hoc interventions, with effectiveness concentrated in a single method. This setting nonetheless achieved stronger fairness improvements than the full-data case, despite the limited toolkit. The findings highlight a trade-off in which data minimisation can provide a fairer starting point but narrows the range and reliability of available bias mitigation techniques, creating dependence on specific interventions. These insights are relevant to policymakers and practitioners seeking to balance privacy compliance with robust algorithmic fairness.

Key words: Machine Learning, Algorithmic Fairness, GDPR, Bias Mitigation, Data Minimisation, COMPAS, Ethical AI.
TABLE OF CONTENTS

1 INTRODUCTION 12
1.1 Background 12
1.2 Problem statement 13
1.3 Research question 14
1.4 Research relevance 15
1.4.1 Academic relevance 15
1.4.2 Business relevance 16
1.5 Research methods 16
1.5.1 Research scope 17
1.6 Thesis outline 18
2 INTRODUCTION TO BIAS MITIGATION 20
2.1 The role of COMPAS in fairness research 21
2.2 Business understanding 22
2.3 Bias and fairness 23
2.4 Data minimisation and GDPR 25
2.4.1 Legal basis and purpose 26
2.4.2 Implementation challenges 26
2.4.3 Implications for ML development 27
2.4.4 Accountability 27
2.5 Bias mitigation techniques 28
2.5.1 Pre-processing techniques 28
2.5.2 In-processing techniques 29
2.5.3 Post-processing techniques 29
2.5.4 Summary of bias mitigation techniques 30
2.6 Classifiers 33
2.6.1 Logistic Regression 34
2.6.2 Decision Trees 34
2.6.3 Random Forests 34
2.6.4 Naive Bayes 35
2.6.5 Support Vector Machines 35
2.6.6 K-Nearest Neighbours 35
2.6.7 Compatibility with bias mitigation techniques 36
2.7 Deployment 38
2.8 Summary 38
3 METHODOLOGY 40
3.1 Research design 40
3.2 Dataset and preprocessing 40
3.3 Model selection 44
3.4 Applied bias mitigation methods 45
3.4.1 Pre-processing techniques 45
3.4.2 In-processing techniques 45
3.4.3 Post-processing techniques 47
3.5 Evaluation metrics 48
3.5.1 Performance metrics 48
3.5.2 Fairness metrics 49
3.6 Reproducibility and variability 50
3.7 Conceptual framework 51
3.8 Summary 52
4 EMPIRICAL RESULTS: FULL DATASET 53
4.1 Baseline model outcomes 53
4.1.1 Logistic Regression 53
4.1.2 Random Forest 57
4.2 Effects of bias mitigation 60
4.2.1 Logistic Regression 60
4.2.2 Random Forest 64
4.3 Comparative analysis of models 67
4.3.1 Baseline models 68
4.3.2 Effects of bias mitigation 68
4.3.3 Trade-offs and stability 68
4.4 Summary 69
5 EMPIRICAL RESULTS: DATA-MINIMISED DATASET 70
5.1 Baseline model outcomes 70
5.1.1 Logistic Regression 70
5.1.2 Random Forest 73
5.2 Effects of bias mitigation 76
5.2.1 Logistic Regression 76
5.2.2 Random Forest 79
5.3 Comparative analysis of models 82
5.3.1 Baseline models 83
5.3.2 Effects of bias mitigation 83
5.3.3 Trade-offs and stability 83
5.4 Summary 84
6 CROSS-DATASET COMPARISON 85
6.1 Comparative performance outcomes 86
6.2 Comparative fairness outcomes 86
6.3 Summary 87
7 DISCUSSION 89
7.1 Key findings 89
7.2 Implications 90
7.3 Limitations 91
7.4 Recommendations for future research 93
8 CONCLUSION 95
9 REFERENCES 97
APPENDICES 105
9.1 Analysis of COMPAS dataset features 105
9.2 Analysis of baseline LR model (full dataset) 107
9.3 Analysis of baseline RF model (full dataset) 111
9.4 Analysis of LR model with reweighting (full dataset) 114
9.5 Analysis of LR model with EGR (full dataset) 116
9.6 Analysis of LR model with threshold optimiser (full dataset) 118
9.7 Analysis of RF model with reweighting (full dataset) 120
9.8 Analysis of RF model with EGR (full dataset) 122
9.9 Analysis of RF model with threshold optimiser (full dataset) 124
9.10 Analysis of baseline LR model (data-minimised dataset) 126
9.11 Analysis of baseline RF model (data-minimised dataset) 129
9.12 Analysis of LR model with calibration (Platt scaling and isotonic regression, data-minimised dataset) 132
9.13 Analysis of LR with threshold optimiser (data-minimised dataset) 134
9.14 Analysis of RF model with calibration (Platt scaling and isotonic regression, data-minimised dataset) 136
9.15 Analysis of RF model with threshold optimiser (data-minimised dataset) 138
9.16 Analysis of LR model with reweighting, EGR, and threshold optimiser (full dataset) 140
9.17 Analysis of RF model with reweighting, EGR, and threshold optimiser (full dataset) 143
9.18 Statement on AI usage 146

LIST OF FIGURES

Figure 1. ML models embedded within DSS and their surrounding influences 21
Figure 2. Racial distribution of defendants in the COMPAS dataset 41
Figure 3. Pearson correlation coefficients between race and non-race features in the COMPAS dataset 42
Figure 4. Mutual information scores between race and non-race features in the COMPAS dataset 43
Figure 5. Conceptual framework illustrating study design, influencing factors, and outcomes 52
Figure 6. ROC curve for baseline LR model (full dataset) 54
Figure 7. Predicted recidivism score distributions by race for baseline LR (full dataset) 56
Figure 8. ROC curve for baseline RF model (full dataset) 58
Figure 9. Predicted recidivism score distributions by race for baseline RF (full dataset) 60
Figure 10. ROC curve for baseline LR model (data-minimised dataset) 71
Figure 11. Predicted recidivism score distributions by race for baseline LR (data-minimised dataset) 73
Figure 12. ROC curve for baseline RF model (data-minimised dataset) 74
Figure 13. Predicted recidivism score distributions by race for baseline RF (data-minimised dataset) 76

LIST OF TABLES

Table 1. Bias mitigation methods 30
Table 2. Compatibility of bias mitigation techniques with six prominent classifiers 37
Table 3. Overall performance metrics for the baseline LR (full dataset) 54
Table 4. Classification report for baseline LR (full dataset) 54
Table 5. Group-wise fairness metrics for baseline LR (full dataset) 55
Table 6. Fairness disparities for baseline LR (full dataset) 55
Table 7. Statistical significance and coefficient comparison for the LR model (full dataset) 56
Table 8. Overall performance metrics for baseline RF (full dataset) 58
Table 9. Classification report for baseline RF (full dataset) 58
Table 10. Group-wise fairness metrics and ROC AUC for baseline RF (full dataset) 59
Table 11. Fairness disparities for baseline RF (full dataset) 59
Table 12. Overall performance metrics for LR with reweighting (full dataset) 60
Table 13. Classification report for LR with reweighting (full dataset) 61
Table 14. Group-wise fairness metrics and ROC AUC for LR with reweighting (full dataset) 61
Table 15. Fairness disparities for LR with reweighting (full dataset) 61
Table 16. Overall performance metrics for LR with EGR (full dataset) 62
Table 17. Classification report for LR with EGR (full dataset) 62
Table 18. Group-wise fairness metrics and ROC AUC for LR with EGR (full dataset) 62
Table 19. Fairness disparities for LR with EGR (full dataset) 62
Table 20. Overall performance metrics for LR with threshold optimiser (full dataset) 63
Table 21. Classification report for LR with threshold optimiser (full dataset) 63
Table 22. Group-wise fairness metrics and ROC AUC for LR with threshold optimiser (full dataset) 63
Table 23. Fairness disparities for LR with threshold optimiser (full dataset) 63
Table 24. Overall performance metrics for RF with reweighting (full dataset) 64
Table 25. Classification report for RF with reweighting (full dataset) 64
Table 26. Group-wise fairness metrics and ROC AUC for RF with reweighting (full dataset) 64
Table 27. Fairness disparities for RF with reweighting (full dataset) 64
Table 28. Overall performance metrics for RF with EGR (full dataset) 65
Table 29. Classification report for RF with EGR (full dataset) 65
Table 30. Group-wise fairness metrics and ROC AUC for RF with EGR (full dataset) 65
Table 31. Fairness disparities for RF with EGR (full dataset) 66
Table 32. Overall performance metrics for RF with threshold optimiser (full dataset) 66
Table 33. Classification report for RF with threshold optimiser (full dataset) 66
Table 34. Group-wise fairness metrics and ROC AUC for RF with threshold optimiser (full dataset) 66
Table 35. Fairness disparities for RF with threshold optimiser (full dataset) 67
Table 36. Baseline and post-mitigation results for LR and RF (full dataset) 67
Table 37. Overall performance metrics for baseline LR (data-minimised dataset) 71
Table 38. Classification report for baseline LR (data-minimised dataset) 71
Table 39. Group-wise fairness metrics and ROC AUC for baseline LR (data-minimised dataset) 72
Table 40. Fairness disparities for baseline LR (data-minimised dataset) 72
Table 41. Overall performance metrics for baseline RF (data-minimised dataset) 74
Table 42. Classification report for baseline RF (data-minimised dataset) 74
Table 43. Group-wise fairness metrics and ROC AUC for baseline RF (data-minimised dataset) 75
Table 44. Fairness disparities for baseline RF (data-minimised dataset) 75
Table 45. Overall performance metrics for LR with isotonic regression (data-minimised dataset) 77
Table 46. Classification report for LR with isotonic regression (data-minimised dataset) 77
Table 47. Group-wise fairness metrics and ROC AUC for LR with isotonic regression (data-minimised dataset) 77
Table 48. Fairness disparities for LR with isotonic regression (data-minimised dataset) 77
Table 49. Overall performance metrics for LR with threshold optimiser (data-minimised dataset) 78
Table 50. Classification report for LR with threshold optimiser (data-minimised dataset) 78
Table 51. Group-wise fairness metrics and ROC AUC for LR with threshold optimiser (data-minimised dataset) 78
Table 52. Fairness disparities for LR with threshold optimiser (data-minimised dataset) 78
Table 53. Overall performance metrics for RF with Platt scaling (data-minimised dataset) 79
Table 54. Classification report for RF with Platt scaling (data-minimised dataset) 79
Table 55. Group-wise fairness metrics and ROC AUC for RF with Platt scaling (data-minimised dataset) 79
Table 56. Fairness disparities for RF with Platt scaling (data-minimised dataset) 80
Table 57. Overall performance metrics for RF with isotonic regression (data-minimised dataset) 80
Table 58. Classification report for RF with isotonic regression (data-minimised dataset) 80
Table 59. Group-wise fairness metrics and ROC AUC for RF with isotonic regression (data-minimised dataset) 80
Table 60. Fairness disparities for RF with isotonic regression (data-minimised dataset) 81
Table 61. Overall performance metrics for RF with threshold optimiser (data-minimised dataset) 81
Table 62. Classification report for RF with threshold optimiser (data-minimised dataset) 81
Table 63. Group-wise fairness metrics and ROC AUC for RF with threshold optimiser (data-minimised dataset) 81
Table 64. Fairness disparities for RF with threshold optimiser (data-minimised dataset) 82
Table 65. Baseline and post-mitigation results for LR and RF (data-minimised dataset) 82
Table 66. Baseline and post-mitigation results for LR and RF on both datasets 85

LIST OF ABBREVIATIONS

AI Artificial Intelligence
COMPAS Correctional Offender Management Profiling for Alternative Sanctions
CRISP-DM Cross-Industry Standard Process for Data Mining
DSS Decision Support Systems
EGR Exponentiated Gradient Reduction
EOD Equalised Odds Difference
EOD-NAA Equalised Odds Difference excluding Native American and Asian
FPR False Positive Rate
GDPR General Data Protection Regulation
KDE Kernel Density Estimates
KNN K-Nearest Neighbours
LR Logistic Regression
ML Machine Learning
RF Random Forest
ROC AUC Receiver Operating Characteristic Area Under the Curve
SR Selection Rate
SVMs Support Vector Machines
TPR True Positive Rate

1 Introduction

1.1 Background

In recent years, the growing adoption of Artificial Intelligence (AI) across both public and private sectors has raised increasing concern about fairness, accountability and data protection. AI systems, often built using Machine Learning (ML), are being used to assist or automate decisions in high-stakes domains such as hiring, lending, healthcare and criminal justice. These systems promise enhanced efficiency and reduced operational costs (Davenport & Ronanki, 2018). As a result, global AI adoption has accelerated, driven by advances in algorithms, the growing availability of data, and increased computational power. According to McKinsey (2022), the share of organisations using AI in at least one business function rose from 20% in 2017 to 50% in 2022, with parallel increases in budget and capability investment. However, despite this growth, the mitigation of AI-related risks has not kept pace.

A particular concern in this context is bias in ML, which refers to any systematic influence arising from algorithmic design, modelling assumptions, data characteristics, or human decisions that causes models to produce unfair, inaccurate, or non-generalisable outcomes (Holmberg et al., 2020). While bias is often rooted in historical data, it can also be introduced through modelling choices such as feature selection, objective functions, or optimisation techniques. As a result, fairness concerns may emerge even when the data appears neutral, making it essential to evaluate the entire development pipeline of AI systems. Fairness in this context refers to the absence of prejudice or favouritism toward individuals or groups based on inherent or acquired characteristics (Mehrabi et al., 2019).

The risk posed by algorithmic bias becomes especially problematic when decisions affect diverse populations. Historical studies show that human judgement has always been prone to bias (Bertrand & Mullainathan, 2004; Martin, 2007), but biased AI can amplify these effects at scale (Gupta & Krishnan, 2020). For instance, biased facial recognition software and hiring tools have already caused reputational damage and legal consequences for corporations (Singer, 2018; EEOC, 2022; Reuters, 2023). As a result, there is increasing pressure on organisations to ensure fairness not only for ethical reasons but also to comply with legal and reputational expectations.

In parallel with these developments, data protection regulations such as the European General Data Protection Regulation (GDPR) have introduced additional constraints on how data can be used in AI systems. One key principle, data minimisation, requires organisations to limit data collection and processing to what is strictly necessary for the intended purpose.
However, in fairness auditing and mitigation, access to sensitive attributes such as race, gender or age is often crucial. This presents a regulatory dilemma: enforcing fairness often requires using data that privacy law aims to restrict.

This thesis explores whether algorithmic fairness can still be effectively achieved when ML systems are subject to data minimisation constraints. Using the well-known COMPAS dataset as a case study, the research focuses on supervised ML classifiers, which are algorithms that learn from labelled data to predict discrete outcomes. In this study, Logistic Regression (LR) and Random Forest (RF) are selected as proxy models for replicating complex or black-box decision systems. Logistic Regression offers high interpretability (Caraciolo, 2011), while Random Forest, an ensemble method, provides robust performance (Breiman, 2001). Both models are widely used and accessible, making them suitable for real-world deployment in both public and corporate contexts. By comparing bias mitigation outcomes across these models both before and after removing sensitive attributes, this study seeks to assess the trade-offs between fairness, performance, and regulatory compliance. The findings aim not only to inform academic debates on algorithmic fairness, but also to provide practical guidance for businesses seeking to deploy responsible AI within legal constraints.

1.2 Problem statement

As outlined above, organisations increasingly rely on algorithmic systems to support or automate high-stakes decision-making. One prominent example is the use of ML models in criminal justice to predict recidivism risk, such as in the COMPAS system widely used in the United States. However, these systems have raised serious concerns about fairness and discrimination, especially when outcomes disproportionately affect certain demographic groups. While bias mitigation techniques have been developed to promote algorithmic fairness, data protection frameworks such as the GDPR impose constraints like data minimisation, which may limit access to the sensitive attributes often needed for such mitigation. This raises a practical and regulatory dilemma between fairness and compliance.

Therefore, this study seeks to examine whether it is possible to apply bias mitigation techniques effectively in contexts where sensitive data must be removed. The goal is to understand how both fairness and predictive performance are affected when applying bias mitigation techniques before and after data minimisation, while also considering the transparency and interpretability of the models used. This study uses interpretable proxy models, Logistic Regression and Random Forest, trained on the COMPAS dataset.

The COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset contains real-world data from the U.S. criminal justice system. It was originally published by ProPublica as part of an investigation into potential racial bias in algorithmic risk assessments (Angwin et al., 2016). The dataset includes demographic variables (such as race, age, and gender), criminal history, and the outcome of whether a person reoffended within two years. These characteristics make it a widely used benchmark in algorithmic fairness research and well-suited for studying the effects of bias mitigation and data minimisation in practice.
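To make this setup concrete, the minimal sketch below shows how the ProPublica release of the dataset can be loaded and how the two baseline classifiers used in this thesis might be trained on it. The file name and feature columns are assumptions based on ProPublica's published compas-scores-two-years.csv export, not a description of the exact preprocessing used in Chapter 3.

```python
# Minimal sketch: load the ProPublica COMPAS export and train the two baseline
# classifiers used in this study. Column names follow the public
# compas-scores-two-years.csv file and may need adjusting for other exports.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("compas-scores-two-years.csv")

features = ["age", "sex", "race", "priors_count", "juv_fel_count", "c_charge_degree"]
X = pd.get_dummies(df[features], drop_first=True)  # one-hot encode categorical features
y = df["two_year_recid"]                           # 1 = reoffended within two years

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

baselines = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```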
Hence, the problem statement for this study is: Evaluate whether algorithmic fairness can still be achieved when bias mitigation techniques are applied to supervised machine learning classifiers that operate under data minimisation constraints.

1.3 Research question

The research question to address this problem statement is: How effective are different bias mitigation methods across multiple supervised machine learning classifiers when access to sensitive data is restricted due to data minimisation?

To answer the main research question, the sub-questions are formulated below. Together, they help evaluate the effects of bias mitigation under data minimisation across different supervised ML classifiers.

Theoretical:

SQ1: How are bias and fairness defined in the context of supervised machine learning classifiers, and what are the most commonly used fairness metrics?
Answered through a literature review on algorithmic fairness and fairness metrics relevant to group-level evaluation (e.g. demographic parity, equalised odds).

SQ2: Which bias mitigation techniques are applicable to supervised machine learning classifiers, and how are they typically categorised (pre-processing, in-processing, post-processing)?
Answered through a review and tabulation of existing bias mitigation methods.

Model experimentation:

SQ3: How do different supervised classifiers (Logistic Regression, Random Forest) perform in terms of fairness and performance before applying bias mitigation?
Answered by training and testing each model on the full dataset and evaluating baseline fairness and performance metrics.

SQ4: What is the impact of applying selected bias mitigation methods on the fairness and performance of each classifier?
Answered by applying bias mitigation techniques and evaluating their effects on both fairness and performance metrics.

Compliance and constraint evaluation:

SQ5: To what extent does the removal of sensitive attributes, in line with data minimisation principles, impact classifier fairness and performance, and how effective are bias mitigation techniques that do not rely on sensitive attribute access (e.g., calibration, regularisation) in addressing these disparities?
Answered by training and evaluating models on the data-minimised dataset, then applying and evaluating bias mitigation techniques applicable under these constraints, and comparing these results against the full-data and sensitive-attribute-aware mitigation conditions.

1.4 Research relevance

1.4.1 Academic relevance

Algorithmic fairness and responsible artificial intelligence are prominent topics in academic research. While various studies have proposed methods for identifying and reducing bias in ML systems, relatively few have evaluated these methods under data minimisation constraints. Many fairness interventions rely on access to sensitive attributes, yet privacy laws may prohibit the use of these attributes in real-world deployment.

This thesis contributes to the academic discourse by empirically testing bias mitigation methods across multiple supervised learning models in both unrestricted and restricted data settings. It provides a structured comparison of performance and fairness outcomes when models are trained without access to sensitive demographic information. Additionally, by focusing on interpretable models and using the COMPAS dataset as a case study, this thesis offers a grounded and reproducible contribution to ML research. The COMPAS dataset contains real-world risk assessment data from the U.S.
criminal justice system. It includes demographic information (such as race, gender, and age), criminal history, and whether an individual reoffended within two years. Its public availability and frequent use in fairness research make it a reliable benchmark for evaluating algorithmic bias, enabling replication and comparison across studies.

1.4.2 Business relevance

As ML becomes increasingly embedded in organisational decision-making, companies face growing pressure to ensure that these systems are both fair and legally compliant. From recruitment and credit assessments to fraud detection and customer analytics, algorithmic bias presents reputational, financial, and legal risks. At the same time, organisations operating within the European Union are required to comply with data protection regulations such as the GDPR, which includes the principle of data minimisation. This principle restricts the collection and use of personal data to what is strictly necessary, often limiting access to sensitive attributes like race, gender, or age. These attributes are frequently used in fairness interventions.

This situation creates a practical challenge: organisations must ensure fairness and transparency in their algorithmic systems while limiting the use of the data that is often required to audit and mitigate bias. By exploring the effectiveness of bias mitigation techniques when sensitive data is unavailable, this research addresses a relevant and timely business problem. It aims to provide practical insights for organisations that wish to develop and deploy ML models that are fair, interpretable, and compliant with privacy regulations.

1.5 Research methods

This study adopts a mixed-methods approach, combining a literature review and an empirical experimental design, to evaluate the effectiveness of bias mitigation techniques under data minimisation constraints. The study follows the CRISP-DM framework, which provides a systematic structure for data preparation, model training, bias mitigation, and evaluation (Schröer et al., 2021). A detailed description of the methodology is provided in Chapter 3. The research methods were tailored to address each sub-question (SQ) and, collectively, the main research question.

Theoretical understanding (SQ1 & SQ2): These questions were addressed through a comprehensive literature review. SQ1 defined bias and fairness in the context of supervised machine learning classifiers and identified relevant group-level fairness metrics (e.g., demographic parity, equalised odds). SQ2 categorised bias mitigation techniques into pre-processing, in-processing, and post-processing approaches. Together, these findings established the theoretical foundation for the empirical investigation.

Model experimentation (SQ3, SQ4 & SQ5): These questions were addressed through an experimental design using two supervised classifiers, Logistic Regression and Random Forest, applied to the COMPAS dataset. Models were trained and evaluated on both a full version (retaining all features, including race) and a data-minimised version in which race was removed from the training features to reflect GDPR data minimisation requirements. For the full dataset, explicit bias mitigation techniques were applied, including pre-processing (reweighing), in-processing (Exponentiated Gradient Reduction (EGR)), and post-processing (threshold optimiser).
For the data-minimised dataset, where direct access to race was restricted, the analysis focused on inherent bias and the effectiveness of mitigation techniques that do not rely on sensitive attributes (e.g., calibration and regularisation). All models were assessed using key fairness and performance metrics to determine the impact of classifier choice and data minimisation constraints.

1.5.1 Research scope

This study focuses on evaluating the effectiveness of bias mitigation techniques in the context of supervised machine learning classifiers, specifically under data minimisation constraints as outlined by the GDPR. The research is limited to two widely used model types: Logistic Regression and Random Forest. These models are used as proxy classifiers to approximate decision-making systems similar to proprietary or black-box algorithms employed in real-world settings. The COMPAS dataset is selected as the case study due to its relevance in fairness-related research and the presence of sensitive demographic features such as race, gender, and age. The scope of bias mitigation includes methods applied across different classifiers, with an investigation into both:

• Sensitive-attribute-aware methods: These methods (e.g., reweighing, EGR, and threshold optimiser) require access to sensitive attributes to function, representing a scenario where such data is available for direct fairness intervention.

• Unaware methods: These methods (e.g., calibration and regularisation) do not rely on explicit access to sensitive attributes, representing a scenario where data minimisation is strictly enforced.

Data minimisation is operationalised in this study by removing or excluding sensitive features (e.g., race, age, gender) from the training and prediction process, in order to simulate realistic GDPR-compliant scenarios. Only structured data from the COMPAS dataset is used; unstructured data and external datasets are not considered. The study does not aim to develop new bias mitigation methods but rather to evaluate existing ones under constrained data conditions. It also does not attempt to audit the original COMPAS algorithm, but rather to explore how similar decision systems may behave under fairness interventions. The findings are intended to inform both academic understanding and practical decision-making in organisational contexts that rely on automated classification systems while adhering to data protection regulations.
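As an illustration of how these two data conditions and one sensitive-attribute-aware intervention could be set up in code, the sketch below continues the earlier COMPAS example. It uses Fairlearn's ThresholdOptimizer and equalised odds difference as stand-ins for the post-processing and auditing steps; variable names carry over from the previous sketch, and the exact pipeline of Chapters 3 to 5 may differ.

```python
# Illustrative sketch (continuing the earlier COMPAS example): the full-data
# condition with a sensitive-attribute-aware post-processing step, and the
# data-minimised condition in which race is dropped from the training features.
from fairlearn.metrics import equalized_odds_difference
from fairlearn.postprocessing import ThresholdOptimizer
from sklearn.linear_model import LogisticRegression

# Race is held out separately so fairness can still be audited in both conditions.
race_train = df.loc[X_train.index, "race"]
race_test = df.loc[X_test.index, "race"]

# Full-data condition: the model sees race, and the mitigation step may use it.
lr_full = LogisticRegression(max_iter=1000).fit(X_train, y_train)
postproc = ThresholdOptimizer(
    estimator=lr_full,
    constraints="equalized_odds",
    prefit=True,
    predict_method="predict_proba",
)
postproc.fit(X_train, y_train, sensitive_features=race_train)
y_pred_mitigated = postproc.predict(X_test, sensitive_features=race_test)

# Data-minimised condition: every race-derived column is removed before training.
race_cols = [c for c in X_train.columns if c.startswith("race_")]
lr_min = LogisticRegression(max_iter=1000).fit(X_train.drop(columns=race_cols), y_train)
y_pred_minimised = lr_min.predict(X_test.drop(columns=race_cols))

# Fairness is audited with the held-out sensitive attribute in both conditions.
for label, preds in [("full data + threshold optimiser", y_pred_mitigated),
                     ("data-minimised baseline", y_pred_minimised)]:
    eod = equalized_odds_difference(y_test, preds, sensitive_features=race_test)
    print(f"{label}: equalised odds difference = {eod:.3f}")
```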
1.6 Thesis outline

This thesis is structured as follows. Chapter 2 introduces the key theoretical concepts relevant to this study. It defines algorithmic bias, fairness, and data minimisation, and explores their intersections in the context of supervised ML. The chapter also includes a structured overview of existing bias mitigation methods, introduces the concept and types of classifiers, and discusses GDPR’s implications for ML models.

Chapter 3 presents the detailed research methodology. It expands on the approach summarised in Chapter 1, explaining the research design, data preparation, evaluation metrics, and experimental setup in full. It also justifies the specific selection of classifiers, bias mitigation techniques, and tools used in the analysis.

Chapter 4 presents the experimental results from applying bias mitigation methods across different supervised classifiers using the full dataset. This chapter evaluates how effective these methods are when sensitive attributes are available.

Chapter 5 applies data minimisation by removing sensitive attributes from the training data and retraining the models. It re-evaluates model performance and fairness after applying bias mitigation methods suitable for these conditions, and includes a comparative analysis of models within this data-minimised context.

Chapter 6 provides a direct comparative analysis of the empirical findings from both the full and data-minimised datasets. It quantifies the impact of sensitive attribute removal on baseline model performance and assesses the comparative effectiveness of the applied bias mitigation strategies across these two data conditions, addressing the core research question regarding the necessity of sensitive attributes for achieving fairness.

Chapter 7 discusses the results of the study. It highlights the key findings, considers their implications for both practice and theory, and examines the limitations that may have influenced the results. The chapter concludes by outlining suggestions for future research that could build on or address gaps identified in this study.

Chapter 8 summarises the answers to the research questions, outlines the study’s contributions, highlights its limitations, and offers recommendations for both practitioners and future research.

Several chapters refer to the Appendices, where implementation code is provided. All references are included in the text.

2 Introduction to bias mitigation

Organisations across various sectors, from criminal justice to finance and healthcare, rely on decision-making processes and decision support systems (DSS) to navigate complex environments and optimise outcomes (Turban et al., 2005). Effective decision-making is vital, and the quality of these decisions hinges on several critical properties, including accuracy, transparency, and, crucially, unbiasedness and fairness (European Commission, 2019). Historically, human biases and organisational factors such as data availability, regulatory pressures, and competing goals have significantly influenced decision-making quality (Arnott, 2006).

In recent decades, the integration of AI and ML systems has profoundly transformed DSS, offering unprecedented capabilities for data analysis and predictive modelling. This transformation, however, has also introduced new forms of bias and unfairness, which can arise at scale and in less transparent ways (Barocas & Selbst, 2016; Goddard et al., 2011). These systems are powerful examples of modern DSS, designed to enhance efficiency and objectivity, but while often perceived as neutral tools, ML algorithms can inadvertently reproduce or even intensify structural inequalities and human biases when trained on historical data, developed with flawed assumptions, or deployed without adequate oversight (Angwin et al., 2016; Noble, 2018). This challenge has brought fairness and unbiasedness to the forefront of academic and regulatory debates concerning algorithmic decision-making.

At the same time, data protection and privacy concerns are escalating. Regulations like the GDPR mandate principles such as data minimisation, which can directly conflict with efforts to mitigate bias because fairness interventions often require access to sensitive attributes to measure and correct disparities (Hardt et al., 2016; Tran & Fioretto, 2023). This chapter provides the theoretical foundation for the thesis by first establishing the context of quality decision-making and the role of AI/ML.
It then explores the importance of early problem framing, including how organisations define fairness and bias in relation to their objectives, and how stakeholder engagement shapes these definitions. Next, it examines definitions and categories of bias and fairness in ML, reviews key mitigation techniques, and considers how data minimisation under the GDPR constrains fairness interventions. The chapter also discusses widely used classifiers and the challenges of deploying models in production environments, including the need for continuous monitoring to maintain fairness and reliability.

To support this discussion, Figure 1 outlines the broader environment in which ML models operate when integrated into DSS. It illustrates how these models are embedded within organisational, legal, and societal structures, shaped by constraints such as the GDPR, fairness expectations, business goals, data limitations, and historical patterns. The figure also shows how model outputs result in decisions, which in turn have real-world consequences for the people affected. This context sets the stage for the theoretical concepts explored in the remainder of the chapter.

Figure 1. ML models embedded within DSS and their surrounding influences

2.1 The role of COMPAS in fairness research

The COMPAS dataset is one of the most frequently used case studies in algorithmic fairness research. It was originally published by ProPublica as part of a 2016 investigative report that examined potential racial bias in algorithmic risk assessments within the U.S. criminal justice system (Angwin et al., 2016). The COMPAS algorithm itself is proprietary and not publicly available, meaning independent evaluations have relied on the risk score outputs included in the published dataset. The dataset includes demographic variables such as race, sex, and age, as well as criminal history and a binary outcome indicating whether a defendant reoffended within two years. These features make it well suited for evaluating both model performance and fairness interventions.

ProPublica’s analysis concluded that the COMPAS algorithm was racially biased. Specifically, it found that black defendants were almost twice as likely to be falsely labelled as high risk, while white defendants were more likely to be incorrectly classified as low risk. In response, the company that developed COMPAS, Northpointe, argued that the tool was fair because it was calibrated. Calibration, in this context, means that within each risk score category, the observed rate of recidivism was approximately equal across racial groups. This public disagreement revealed a critical tension in the field of algorithmic fairness: it is possible for a model to be calibrated while still producing unequal error rates across groups.

This conflict became a catalyst for formal research into fairness metrics. The main issue identified by ProPublica, disparities in false positive and false negative rates between racial groups, directly led to the development of group fairness metrics such as equalised odds and equal opportunity (Hardt et al., 2016). These metrics require that models exhibit similar error rates across protected groups. At the same time, subsequent work demonstrated that, under certain conditions, it is mathematically impossible for a classifier to satisfy both calibration and equalised odds simultaneously if base rates differ between groups. This result is now widely referred to as the impossibility theorem of fairness (Chouldechova, 2017; Kleinberg et al., 2017).
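To see why these criteria collide, it helps to write out a standard identity that holds within any group of defendants (the formulation follows Chouldechova, 2017; the notation is introduced here only for illustration). For a group with base rate p (the fraction of individuals who actually reoffend), positive predictive value PPV, false positive rate FPR, and false negative rate FNR, the confusion-matrix definitions imply

FPR = (p / (1 − p)) × ((1 − PPV) / PPV) × (1 − FNR).

If a classifier is calibrated in the sense of equal PPV across groups, but the base rates p differ between those groups, then FPR and FNR cannot both be equal across the groups: equalised odds and calibration cannot hold simultaneously. This is precisely the tension between Northpointe’s calibration argument and ProPublica’s error-rate findings.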
These findings had a significant impact on the field. The COMPAS controversy did not simply highlight the existence of bias in a specific system; it also shifted the conversation from general concerns about fairness to a structured, mathematical framework for evaluating and comparing fairness criteria. The case remains foundational in the literature and is often cited as the moment when algorithmic fairness became a rigorous and formalised area of study (Barocas & Selbst, 2016).

In addition to its historical significance, the COMPAS dataset continues to serve as a standard benchmark in machine learning fairness research. Its availability and relevance to high-stakes decision-making make it useful for assessing both fairness-aware modelling techniques and the effects of legal constraints such as data minimisation.

2.2 Business understanding

To understand the different phases involved in bias mitigation, this theoretical background is guided by the CRISP-DM framework, a widely adopted standard for organising ML workflows (Chapman et al., 2000; Schröer et al., 2021). CRISP-DM defines the sequence of phases as Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment. The first of these, Business Understanding, is particularly critical in the context of fairness-aware ML.

The success of any machine learning project depends heavily on how the problem is initially framed. Before any technical work begins, organisations must determine what constitutes fairness and bias within the context of their decision-making objectives. These definitions are shaped by business priorities, stakeholder expectations, and legal or regulatory obligations (Holstein et al., 2023). Failing to establish a shared understanding can lead to models that optimise for narrow technical goals while perpetuating harm or undermining trust.

Fairness is rarely a purely technical construct; it is influenced by organisational values, the needs of affected communities, and broader societal norms. As Barocas, Hardt, and Narayanan (2019) emphasise, fairness is a socio-technical challenge that requires careful consideration of the application context, stakeholder values, and potential societal impacts. Different definitions of fairness exist because the concept is complex and context-dependent, and these decisions must be agreed upon before technical solutions are sought. Partnership on AI (2024) further argues that engaging diverse stakeholders, particularly those from marginalised communities, is essential for identifying and mitigating biases, foreseeing risks, and ensuring AI systems are equitable and relevant. Early choices about which outcomes to optimise, what data to collect, and which groups to prioritise will significantly shape the success of later mitigation strategies. Many problems that appear downstream, such as biased training data or unsuitable fairness metrics, often originate from incomplete or inconsistent problem definitions at this stage.

2.3 Bias and fairness

Bias in ML refers to any systematic influence arising from algorithmic design, modelling assumptions, data characteristics, or human decisions that causes models to produce unfair, inaccurate, or non-generalisable outcomes (Holmberg et al., 2020). Such bias is particularly concerning in high-stakes contexts, where unequal treatment of individuals or groups can result in social, ethical, and legal harm.
Consequently, the concept of fairness is almost always intrinsically linked to people, with the goal of preventing harm to human groups defined by protected attributes (Mehrabi et al., 2019). Fairness in ML is thus a normative goal aimed at counteracting these biases. Although there is no single universally accepted definition, a widely cited description frames fairness as the absence of prejudice or favouritism toward an individual or group based on their inherent or acquired characteristics (Mehrabi et al., 2019).

A wide range of biases have been identified in the literature. Although there is no unified taxonomy, eight recurring types of bias have been noted across scholarly work:

• Social bias arises from existing inequities embedded in society. When models are trained on data that reflect historical or societal inequalities, those patterns may be reinforced. For instance, underrepresentation of women in leadership roles can result in skewed search engine outputs unless actively corrected (Mehrabi et al., 2019; Olteanu et al., 2019; Zarya, 2018; Suresh & Guttag, 2018).

• Representation bias occurs when the training data is not representative of the broader population. This may be due to outdated data collection, biased sampling, or limited data availability. As a result, certain groups may be over- or underrepresented, reducing model accuracy for those populations (Baer, 2019; Lan et al., 2010).

• Measurement bias refers to the use of flawed features or labels that do not accurately reflect the intended variables. Protected attributes such as race or gender may be directly or indirectly encoded in other variables, producing discriminatory outcomes even if explicitly excluded from the dataset (Mullainathan & Obermeyer, 2017; Corbett-Davies et al., 2023; van Giffen et al., 2022; Angwin et al., 2016).

• Label bias emerges when training examples are assigned incorrect or culturally specific labels that deviate from their true classifications. For example, image classification models may reflect Western assumptions about events like weddings, leading to systematic mislabelling of culturally distinct examples (Barocas & Selbst, 2016; Baer, 2019).

• Technical bias is introduced through design decisions, such as how abstract concepts are formalised into code or how hyperparameters are chosen. This includes statistical assumptions embedded in models that systematically favour certain outputs over others (Friedman & Nissenbaum, 1996).

• Evaluation bias arises when benchmark datasets used to assess model performance are themselves unrepresentative. As a result, a model may perform well in testing but fail in real-world contexts where the data distribution is more diverse (Suresh et al., 2018; Suresh & Guttag, 2021).

• Deployment bias occurs when models are used outside the scope of their original design. For example, predictive models built to assess risk may instead be applied to determine sentencing length, despite lacking validation for such use (Chouldechova, 2017; Collins, 2018).

• Feedback bias happens when a model’s predictions influence the future data on which it is retrained. This creates self-reinforcing cycles in applications such as recommendation systems, where certain content is repeatedly shown and thus increasingly clicked, regardless of actual user preference (Bellamy et al., 2018; Baeza-Yates, 2018).
In the CRISP-DM framework, the Data Understanding phase provides an opportunity to evaluate the likelihood of biases being present in the dataset before modelling begins (Schröer et al., 2021). While not all forms of bias originate from the data itself, several types, such as representation, measurement, and label bias, can be identified at this stage.

To assess fairness in light of these biases, scholars have proposed a range of formal fairness metrics. These include demographic parity, which checks whether different groups receive positive predictions at equal rates, and equalised odds, which requires similar false positive and false negative rates across groups (Hardt et al., 2016). However, these metrics are often mathematically incompatible, and no single metric fully captures the multidimensional nature of fairness (Binns, 2018).

Although the primary focus of this thesis is on fairness, a model’s performance remains an important baseline measure in machine learning. Maintaining adequate predictive performance ensures that fairness interventions do not render models ineffective in practice. Standard performance metrics such as accuracy score, F1 score, and ROC AUC (Receiver Operating Characteristic Area Under the Curve) are therefore used as comparative baselines in later chapters, even though the main research objective is to reduce bias and improve fairness (Caruana & Niculescu-Mizil, 2006).

2.4 Data minimisation and GDPR

The increasing adoption of ML across domains such as healthcare, finance, and public administration has raised concerns about data protection and privacy. Within this context, the principle of data minimisation has gained prominence as a safeguard against the misuse of personal data. It is a core feature of the GDPR and influences both the design and operation of ML systems (Article 5(1)(c)).

2.4.1 Legal basis and purpose

The GDPR outlines data minimisation in Article 5(1)(c), which states that personal data must be