Methods for Generating and Evaluating Synthetic Longitudinal Patient Data: A Systematic Review

dc.contributor.author: Perkonoja, Katariina
dc.contributor.author: Auranen, Kari
dc.contributor.author: Virta, Joni
dc.contributor.organization: fi=kliininen laitos|en=Department of Clinical Medicine|
dc.contributor.organization: fi=terveysteknologia|en=Health Technology|
dc.contributor.organization: fi=tilastotiede|en=Statistics|
dc.contributor.organization-code: 1.2.246.10.2458963.20.28696315432
dc.contributor.organization-code: 1.2.246.10.2458963.20.42133013740
dc.converis.publication-id: 505614547
dc.converis.url: https://research.utu.fi/converis/portal/Publication/505614547
dc.date.accessioned: 2026-01-21T14:33:48Z
dc.date.available: 2026-01-21T14:33:48Z
dc.description.abstract: The rapid growth in data availability has facilitated research and development, yet not all industries have benefited equally due to legal and privacy constraints. The healthcare sector faces significant challenges in utilizing patient data because of concerns about data security and confidentiality. To address this, various privacy-preserving methods, including synthetic data generation, have been proposed. Synthetic data replicate existing data as closely as possible, acting as a proxy for sensitive information. While patient data are often longitudinal, this aspect remains underrepresented in existing reviews of synthetic data generation in healthcare. This paper maps and describes methods for generating and evaluating synthetic longitudinal patient data in real-life settings through a systematic literature review, conducted following the PRISMA guidelines and incorporating data from five databases up to May 2024. Thirty-nine methods were identified, with four addressing all key challenges in longitudinal patient data generation: preserving temporal structure, heterogeneous variable types, missing values, and unbalanced data. Most studies assessed resemblance to real data, the majority evaluated utility, and just over half examined privacy. However, only a minority considered all three aspects together. While four methods addressed the key challenges in generating synthetic longitudinal patient data, none incorporated privacy-preserving mechanisms. Additionally, their effectiveness with small sample sizes remains unclear, raising concerns about their real-world applicability. The lack of standardized evaluation criteria further complicates comparison. Future research should focus on developing privacy-preserving methods and robust evaluation frameworks, and on ensuring publicly accessible code. Clearer directives from data protection authorities are needed, as the availability of synthetic patient data lags behind method development.
dc.identifier.eissn: 2509-498X
dc.identifier.jour-issn: 2509-4971
dc.identifier.olddbid: 213404
dc.identifier.oldhandle: 10024/196422
dc.identifier.uri: https://www.utupub.fi/handle/11111/55323
dc.identifier.url: https://doi.org/10.1007/s41666-025-00223-7
dc.identifier.urn: URN:NBN:fi-fe202601216541
dc.language.iso: en
dc.okm.affiliatedauthor: Perkonoja, Katariina
dc.okm.affiliatedauthor: Auranen, Kari
dc.okm.affiliatedauthor: Virta, Joni
dc.okm.affiliatedauthor: Dataimport, Kliinisen laitoksen yhteiset
dc.okm.discipline: 112 Statistics and probability (en_GB)
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 3141 Health care science (en_GB)
dc.okm.discipline: 112 Tilastotiede (fi_FI)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.discipline: 3141 Terveystiede (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A2 Scientific Article
dc.publisher: Springer Science and Business Media LLC
dc.publisher.country: Switzerland (en_GB)
dc.publisher.country: Sveitsi (fi_FI)
dc.publisher.country-code: CH
dc.relation.doi: 10.1007/s41666-025-00223-7
dc.relation.ispartofjournal: Journal of Healthcare Informatics Research
dc.source.identifier: https://www.utupub.fi/handle/10024/196422
dc.title: Methods for Generating and Evaluating Synthetic Longitudinal Patient Data: A Systematic Review
dc.year.issued: 2025

Files

Name: s41666-025-00223-7.pdf
Size: 3.76 MB
Format: Adobe Portable Document Format