Feedback and Questions Regarding PARSynthesizer #2618

@Abreu6

Hello,

As part of my master's thesis on synthetic data generation for healthcare, we tested and evaluated SDV's PARSynthesizer on different segments of a clinical dataset.

Our main conclusions and findings were:
-> The model is easy to configure and performs efficiently on smaller data samples.
-> However, the generated data showed consistency and quality issues: it struggled to preserve correlations between fields and missed subtle but important patterns present in the original data.
-> One example we observed was diagnostic exams dated before the corresponding medical consultations, or medical consultations dated after the recorded date of death.
-> Additionally, because the model processes the data fully in memory, we faced scalability issues when attempting to synthesize larger datasets, even after filtering and simplifying the input.
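
To make the date-ordering issue concrete, here is a minimal sketch of how such violations can be counted in synthetic output. It uses only the Python standard library. The column names (`consultation_date`, `exam_date`, `death_date`) and the sample rows are hypothetical placeholders, not values from our dataset, and this is an illustration rather than our exact evaluation code.

```python
from datetime import date

# Hypothetical column names; replace with the fields in your own dataset.
# Each rule is (earlier_column, later_column): the first date must not
# come after the second one.
ORDER_RULES = [
    ("consultation_date", "exam_date"),   # exam must not precede consultation
    ("consultation_date", "death_date"),  # consultation must not follow death
]


def count_order_violations(rows, rules=ORDER_RULES):
    """Count rows where the 'earlier' date is after the 'later' date.

    Rows with a missing (None) value in either column are skipped.
    """
    counts = {rule: 0 for rule in rules}
    for row in rows:
        for earlier, later in rules:
            a, b = row.get(earlier), row.get(later)
            if a is not None and b is not None and a > b:
                counts[(earlier, later)] += 1
    return counts


# Made-up rows standing in for synthesizer output.
synthetic_rows = [
    {"consultation_date": date(2021, 3, 1), "exam_date": date(2021, 3, 5),
     "death_date": None},
    {"consultation_date": date(2021, 3, 1), "exam_date": date(2021, 2, 20),
     "death_date": None},
    {"consultation_date": date(2022, 1, 10), "exam_date": date(2022, 1, 12),
     "death_date": date(2021, 12, 31)},
]

violations = count_order_violations(synthetic_rows)
print(violations)
```

With a pandas DataFrame, the rows can be obtained via `df.to_dict("records")`, provided missing dates are converted to `None` first. Running the same check on the real and the synthetic data gives a simple way to compare violation rates.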

Could you please share any suggestions for improving the results, such as advanced configurations, alternative preprocessing steps, or known limitations to consider? We would be happy to incorporate them into our testing.

Best regards.

Labels: question (general question about the software), under discussion (issue is currently being discussed)