Empirical evaluation of amplifying privacy by subsampling for GANs to create differentially private synthetic tabular data

Nieminen Valtteri A.; Pahikkala Tapio; Airola Antti

dc.contributor.author	Nieminen Valtteri A.
dc.contributor.author	Pahikkala Tapio
dc.contributor.author	Airola Antti
dc.contributor.organization	fi=data-analytiikka\|en=Data-analytiikka\|
dc.contributor.organization	fi=terveysteknologia\|en=Health Technology\|
dc.contributor.organization-code	1.2.246.10.2458963.20.28696315432
dc.contributor.organization-code	1.2.246.10.2458963.20.68940835793
dc.converis.publication-id	181712336
dc.converis.url	https://research.utu.fi/converis/portal/Publication/181712336
dc.date.accessioned	2025-08-28T01:25:41Z
dc.date.available	2025-08-28T01:25:41Z
dc.description.abstract	Privacy concerns often limit the sharing of sensitive data collected from individuals. One proposed solution for enabling secondary use is privacy-preserving synthetic data that attempts to mimic real data. Due to their success on non-private tasks, GANs trained with differentially private stochastic gradient descent (DPSGD) have been popular for generating differentially private (DP) synthetic data. In recent years, a prominent approach has been to train ensembles of discriminator networks with DPSGD on mutually exclusive subsets of the data, obtaining better differential privacy guarantees by taking advantage of the synergy between GANs and privacy amplification by subsampling. However, this research has been done almost exclusively on images, and empirical evaluations of this strategy on other types of data are lacking. This work focuses on the effects of subsampling in creating DP synthetic tabular data with GANs. We evaluate synthetic data utility by training classification models on synthetic data and testing them on real data at varying subsampling rates. Further, we complement the evaluation with a qualitative examination of the generated data. Our findings show that while subsampling does improve the prediction performance of classifiers trained on synthetic tabular data, the resulting samples can be very distorted compared to the original real data. The results suggest that the benefits obtainable via this method of training DP GANs can differ significantly based on the type of data used.
dc.format.pagerange	81
dc.identifier.issn	1613-0073
dc.identifier.olddbid	207539
dc.identifier.oldhandle	10024/190566
dc.identifier.uri	https://www.utupub.fi/handle/11111/52269
dc.identifier.url	https://ceur-ws.org/Vol-3506/
dc.identifier.urn	URN:NBN:fi-fe2025082787704
dc.language.iso	en
dc.okm.affiliatedauthor	Nieminen, Valtteri
dc.okm.affiliatedauthor	Pahikkala, Tapio
dc.okm.affiliatedauthor	Airola, Antti
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.discipline	113 Tietojenkäsittely ja informaatiotieteet	fi_FI
dc.okm.internationalcopublication	not an international co-publication
dc.okm.internationality	International publication
dc.okm.type	A4 Conference Article
dc.publisher.country	Germany	en_GB
dc.publisher.country	Saksa	fi_FI
dc.publisher.country-code	DE
dc.relation.conference	Annual Symposium for Computer Science
dc.relation.ispartofjournal	CEUR Workshop Proceedings
dc.relation.ispartofseries	CEUR Workshop Proceedings
dc.relation.volume	3506
dc.source.identifier	https://www.utupub.fi/handle/10024/190566
dc.title	Empirical evaluation of amplifying privacy by subsampling for GANs to create differentially private synthetic tabular data
dc.title.book	TKTP 2023: Annual Symposium for Computer Science 2023: Proceedings of the 40th Anniversary Symposium of the Finnish Society for Computer Science
dc.year.issued	2023

Files

Name: paper06.pdf
Size: 2.6 MB
Format: Adobe Portable Document Format

Collections

Self-archived publications