Feedback and Questions Regarding Parsynthesizer

Hello,

As part of my master's thesis on synthetic data generation for healthcare, we tested and evaluated SDV’s PARSynthesizer on different segments of a clinical dataset.

Our main conclusions and findings were:
-> The model is easy to configure and performs efficiently on smaller data samples.
-> However, the generated data showed consistency and quality issues: the model struggled to preserve correlations between fields and missed subtle but important patterns present in the original data. For example, we observed diagnostic exams dated before the corresponding medical consultations, and medical consultations dated after the recorded date of death.
-> Additionally, due to its full in-memory processing design, we faced scalability issues when attempting to synthesize larger datasets, even after filtering and simplifying the input.
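
To make the date-ordering issue concrete, a check along the following lines can count such violations in a generated table. This is only a minimal sketch using pandas: the helper `count_temporal_violations`, the column names (`consultation_date`, `exam_date`, `death_date`), and the toy rows are hypothetical and are not taken from our dataset or from SDV.

```python
import pandas as pd


def count_temporal_violations(df, earlier_col, later_col):
    """Count rows where the date in `later_col` precedes the date in `earlier_col`."""
    earlier = pd.to_datetime(df[earlier_col], errors="coerce")
    later = pd.to_datetime(df[later_col], errors="coerce")
    # Comparisons involving NaT evaluate to False, so rows with a
    # missing date are not counted as violations.
    return int((later < earlier).sum())


# Toy rows with hypothetical column names, for illustration only.
synthetic = pd.DataFrame({
    "consultation_date": ["2021-03-01", "2021-05-10", "2021-07-20"],
    "exam_date": ["2021-03-05", "2021-05-02", "2021-07-25"],
    "death_date": [None, None, "2021-07-01"],
})

# Exams dated before the corresponding consultation.
exam_before_consultation = count_temporal_violations(
    synthetic, earlier_col="consultation_date", later_col="exam_date"
)

# Consultations dated after the recorded date of death.
consultation_after_death = count_temporal_violations(
    synthetic, earlier_col="consultation_date", later_col="death_date"
)

print(exam_before_consultation, consultation_after_death)
```

Running the same counts on the real and the synthetic tables gives a simple way to compare how often each ordering rule is broken.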

Could you please share any suggestions for improving results, such as advanced configurations, alternative preprocessing steps, or known limitations to consider? We would be happy to incorporate them into our testing.

Best regards.



Feedback and Questions Regarding Parsynthesizer #2618
