Evaluating hypothesis tests on differentially private histogram-based synthetic data

dc.contributor.author: Böhmeke, Jan
dc.contributor.department: Department of Computing
dc.contributor.faculty: Faculty of Technology
dc.contributor.studysubject: Computer Science
dc.date.accessioned: 2024-06-29T21:01:28Z
dc.date.available: 2024-06-29T21:01:28Z
dc.date.issued: 2024-06-27
dc.description.abstract: Sharing synthetic data that preserves privacy has been suggested as an option for releasing sensitive data without compromising individuals’ privacy. The synthetic data should maintain the structure and statistical characteristics of the original data while ensuring individuals’ privacy. Differential privacy (DP) effectively addresses privacy concerns while preserving the structure and characteristics of the original data. The objective of this research is to evaluate Student’s t-test and the Mann-Whitney U test empirically, to verify whether these tests are prone to a loss of validity or decreased power. This is assessed empirically in terms of Type I and Type II errors. I evaluate the statistical hypothesis tests on sets of additively smoothed DP synthetic data generated from sets of original data. The original data sets are simulated questionnaire data (n = 20 000) on 5-point and 10-point Likert scales, and the Kaggle Cardiovascular Dataset (n = 70 000). The validity of the tests was preserved for all privacy budget values (0.001 ≤ ϵ ≤ 100) and sampled dataset sizes (50, 100, 500, 1000) for all data. The power of the tests was considerably reduced in all cases.
dc.format.extent: 63
dc.identifier.olddbid: 195678
dc.identifier.oldhandle: 10024/178730
dc.identifier.uri: https://www.utupub.fi/handle/11111/19608
dc.identifier.urn: URN:NBN:fi-fe2024062859806
dc.language.iso: eng
dc.rights: This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
dc.rights.accessrights: open
dc.source.identifier: https://www.utupub.fi/handle/10024/178730
dc.subject: Differential Privacy, Synthetic Data, Exponential Mechanism
dc.title: Evaluating hypothesis tests on differentially private histogram-based synthetic data
dc.type.ontasot: Master's thesis

Files

Name: Evaluating_hypothesis_tests_on_differentially_private_histogram_based_synthetic_data_.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format
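
The abstract describes generating additively smoothed DP synthetic data from a histogram of the original data. The record does not give the thesis's exact procedure, so the following is only a minimal sketch under stated assumptions: it uses the Laplace mechanism on the bin counts (the record's keywords mention the exponential mechanism, which the thesis may use instead), assumes add/remove adjacency so that each count has sensitivity 1, and uses hypothetical function names and a made-up 5-point Likert distribution for illustration.

```python
import random
from collections import Counter


def dp_histogram(data, categories, epsilon, rng):
    """Noisy bin counts via the Laplace mechanism (an assumption here).

    Assumes add/remove adjacency: one record changes one count by 1,
    so the noise scale is 1 / epsilon.
    """
    counts = Counter(data)
    scale = 1.0 / epsilon
    noisy = {}
    for c in categories:
        # The difference of two Exp(rate=1/scale) draws is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[c] = counts.get(c, 0) + noise
    return noisy


def smoothed_probabilities(noisy_counts, alpha=1.0):
    """Clip negative noisy counts to zero, then apply additive smoothing."""
    clipped = {c: max(v, 0.0) for c, v in noisy_counts.items()}
    total = sum(clipped.values()) + alpha * len(clipped)
    return {c: (v + alpha) / total for c, v in clipped.items()}


def sample_synthetic(probs, size, rng):
    """Draw a synthetic sample from the smoothed category probabilities."""
    cats = list(probs)
    weights = [probs[c] for c in cats]
    return rng.choices(cats, weights=weights, k=size)


rng = random.Random(0)
categories = [1, 2, 3, 4, 5]  # 5-point Likert scale
# Hypothetical "original" questionnaire data, n = 20 000.
original = rng.choices(categories, weights=[1, 2, 4, 2, 1], k=20000)

noisy = dp_histogram(original, categories, epsilon=1.0, rng=rng)
probs = smoothed_probabilities(noisy, alpha=1.0)
synthetic = sample_synthetic(probs, size=100, rng=rng)
```

Clipping, smoothing, and sampling only post-process the noisy counts, so they do not consume additional privacy budget. Two synthetic samples produced this way could then be passed to a two-sample test (for example `scipy.stats.ttest_ind` or `scipy.stats.mannwhitneyu`) and the rejection rate tallied over many repetitions to estimate Type I or Type II error rates, in the spirit of the evaluation the abstract describes.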