Evaluating hypothesis tests on differentially private histogram-based synthetic data
| dc.contributor.author | Böhmeke, Jan | |
| dc.contributor.department | Department of Computing | |
| dc.contributor.faculty | Faculty of Technology | |
| dc.contributor.studysubject | Computer Science | |
| dc.date.accessioned | 2024-06-29T21:01:28Z | |
| dc.date.available | 2024-06-29T21:01:28Z | |
| dc.date.issued | 2024-06-27 | |
| dc.description.abstract | Sharing privacy-preserving synthetic data has been suggested as a way to release sensitive data without compromising individuals' privacy. The synthetic data should maintain the structure and statistical characteristics of the original data while protecting the privacy of the individuals in it. Differential privacy (DP) addresses these privacy concerns while preserving the structure and characteristics of the original data. The objective of this research is to evaluate Student's t-test and the Mann-Whitney U test empirically, to determine whether they lose validity or power when run on DP synthetic data. This is assessed in terms of Type I and Type II errors. I evaluate the statistical hypothesis tests on sets of additively smoothed DP synthetic data generated from sets of original data. The original data sets are simulated questionnaire data (n=20 000) on 5-point and 10-point Likert scales, and the Kaggle Cardiovascular Dataset (n=70 000). The validity of the tests was preserved for all privacy budget values (0.001 ≤ ϵ ≤ 100) and sampled dataset sizes (50, 100, 500, 1000) for all data. The power of the tests was considerably reduced in all cases. | |
| dc.format.extent | 63 | |
| dc.identifier.olddbid | 195678 | |
| dc.identifier.oldhandle | 10024/178730 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/19608 | |
| dc.identifier.urn | URN:NBN:fi-fe2024062859806 | |
| dc.language.iso | eng | |
| dc.rights | This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited. | |
| dc.rights.accessrights | open | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/178730 | |
| dc.subject | Differential Privacy, Synthetic Data, Exponential Mechanism | |
| dc.title | Evaluating hypothesis tests on differentially private histogram-based synthetic data | |
| dc.type.ontasot | Master's thesis | |
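
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's pipeline. As assumptions for illustration only, it perturbs a histogram with Laplace noise (sensitivity 1 under add/remove neighbouring), applies additive smoothing with a hypothetical constant `alpha`, samples synthetic records from the result, and estimates the rejection rate of a Mann-Whitney U test using the normal approximation with tie correction. All function names, the per-group synthesis, and the parameter values are hypothetical, and the rates it produces need not match the thesis's findings.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def dp_histogram_sample(data, categories, epsilon, n_out, rng, alpha=1.0):
    """Noisy histogram -> clamp at zero -> additive smoothing -> sample.

    Assumption: Laplace noise with scale 1/epsilon (sensitivity 1); the
    thesis may use a different mechanism.
    """
    counts = Counter(data)
    smoothed = []
    for c in categories:
        # Laplace(0, 1/epsilon) as the difference of two exponentials.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        smoothed.append(max(counts[c] + noise, 0.0) + alpha)
    return rng.choices(categories, weights=smoothed, k=n_out)


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value, normal approximation with tie
    correction and no continuity correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(list(x) + list(y))
    rank_of, tie_term, i = {}, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # midrank of ranks i+1..j
        tie_term += t ** 3 - t
        i = j
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values identical: no evidence against H0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(probs_a, probs_b, epsilon, n_orig, n_out, reps,
                   level=0.05, seed=0):
    """Share of repetitions in which the test rejects on synthetic data.

    With probs_a == probs_b this estimates the Type I error rate;
    otherwise it estimates power (1 - Type II error rate).
    """
    rng = random.Random(seed)
    cats = list(range(1, len(probs_a) + 1))
    rejections = 0
    for _ in range(reps):
        a = rng.choices(cats, weights=probs_a, k=n_orig)
        b = rng.choices(cats, weights=probs_b, k=n_orig)
        syn_a = dp_histogram_sample(a, cats, epsilon, n_out, rng)
        syn_b = dp_histogram_sample(b, cats, epsilon, n_out, rng)
        if mann_whitney_p(syn_a, syn_b) < level:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    uniform = [0.2] * 5                      # 5-point Likert scale, H0 true
    shifted = [0.1, 0.15, 0.2, 0.25, 0.3]    # hypothetical alternative
    for eps in (0.1, 1.0, 10.0):
        t1 = rejection_rate(uniform, uniform, eps, 1000, 100, 200)
        pw = rejection_rate(uniform, shifted, eps, 1000, 100, 200)
        print(f"epsilon={eps}: type I ~ {t1:.3f}, power ~ {pw:.3f}")
```

Sweeping `epsilon` and `n_out` over the ranges named in the abstract gives a rough picture of how the rejection rate under the null and under an alternative responds to the privacy budget in this simplified setting.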
Files
- Name: Evaluating_hypothesis_tests_on_differentially_private_histogram_based_synthetic_data_.pdf
- Size: 1.5 MB
- Format: Adobe Portable Document Format