Anonymization in the context of generative AI: Aligning computer science and legal standards
Romano, Melanie (2025-07-31)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2025090193741
Abstract
This thesis investigates anonymization in the age of generative artificial intelligence (AI), with a focus on aligning technical approaches from computer science with legal standards, particularly European data protection law. Adopting a multidisciplinary framework, the study explores the evolving notion of personal data, the theoretical and practical mechanisms of anonymization, and the challenges posed by generative AI systems at various levels (including training data, models, inputs, and outputs).
Special attention is paid to the potential of synthetic data generation as a privacy-preserving technique and to differential privacy as a semantic privacy model. The work critically examines whether outputs produced by generative AI can themselves constitute personal data, and to what extent anonymization methods remain effective against modern reidentification attacks. Legal uncertainties surrounding the definition and sufficiency of anonymization, especially in light of the GDPR, the AI Act, and opinions by regulatory bodies like the EDPB, are highlighted.
Ultimately, this study contributes to bridging the gap between legal doctrine and technical realities, offering insights into the strengths, limitations, and necessary evolution of anonymization practices in data-intensive AI systems.