
Any ongoing research for multi-table CTGAN solutions? #268

@wilhelmagren


TL;DR: any ideas, thoughts, or insights about multi-table solutions using the CTGAN model? Yay or nay?


Hi,

Let me start off by saying how much I enjoy this repository. You truly managed to make the CTGAN model easily digestible, both in your paper and in the implemented code.

I am wondering: is there ongoing research on GAN-based multi-table synthetic data solutions (e.g. extending CTGAN to be hierarchical, which Hazy supposedly can do, ref)? Or is it not worth exploring?

If multi-table CTGAN is not worth exploring, could someone offer some insight as to why? Does it have to do with the difficulty of capturing long-range primary-foreign key relations? Maintaining referential integrity? Model complexity? Or are Gaussian copulas simply the better alternative for encoding the statistical properties of table relations?
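To make the referential-integrity concern concrete: if a parent table (e.g. customers) and a child table (e.g. orders) are each synthesized by an independent single-table model, nothing guarantees that the generated foreign keys point to generated primary keys. A minimal Python check, where the table names, column names, and the function `orphaned_foreign_keys` are all hypothetical and not part of CTGAN:

```python
from typing import Dict, Iterable, List


def orphaned_foreign_keys(
    parent_ids: Iterable[int],
    child_rows: List[Dict[str, int]],
    fk_column: str,
) -> List[Dict[str, int]]:
    """Return child rows whose foreign key matches no parent primary key."""
    valid_ids = set(parent_ids)
    return [row for row in child_rows if row[fk_column] not in valid_ids]


# Hypothetical output of two independently trained single-table models.
synthetic_customer_ids = [1, 2, 3]
synthetic_orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 7},  # no customer 7 was generated
    {"order_id": 12, "customer_id": 3},
]
print(orphaned_foreign_keys(synthetic_customer_ids, synthetic_orders, "customer_id"))
```

Orphaned rows are only the most visible failure: a model can keep every key valid and still get the number of children per parent, or the correlation between parent and child attributes, wrong.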

I understand that CTGAN is designed to condition on discrete columns during training, for a single table. But could one not extend the model to, e.g., sample the latent noise vector $z \sim \mathcal{N}(\mu_r, \sigma_r^2)$ from a prior distribution whose parameters $\mu_r$ and $\sigma_r$ are statistics of a related table, aggregated over all of its columns? This way you would, again, condition the prior on information that is relevant to the table being synthesized.
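A minimal sketch of this idea in Python, assuming the related rows are already numeric (e.g. after some encoding step) and reading "aggregated over all the columns" as pooling every cell into a single mean $\mu_r$ and standard deviation $\sigma_r$. The function names `related_table_stats` and `sample_conditioned_prior` are hypothetical and not part of the CTGAN API; the standard CTGAN generator instead draws each component of $z$ from $\mathcal{N}(0, 1)$.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def related_table_stats(rows: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Pool every cell of the related rows and return (mu_r, sigma_r)."""
    values = [float(v) for row in rows for v in row]
    if not values:
        raise ValueError("related table has no numeric values")
    mu_r = statistics.fmean(values)
    # Population standard deviation over all pooled cells.
    sigma_r = statistics.pstdev(values, mu_r)
    return mu_r, sigma_r


def sample_conditioned_prior(
    rows: Sequence[Sequence[float]],
    latent_dim: int,
    rng: Optional[random.Random] = None,
    min_sigma: float = 1e-6,
) -> List[float]:
    """Draw z with each component ~ N(mu_r, sigma_r^2) instead of N(0, 1)."""
    mu_r, sigma_r = related_table_stats(rows)
    # Guard against a degenerate prior when all related values are equal.
    sigma_r = max(sigma_r, min_sigma)
    rng = rng if rng is not None else random.Random()
    return [rng.gauss(mu_r, sigma_r) for _ in range(latent_dim)]


# Hypothetical child rows linked to one parent (already numeric).
orders = [[10.0, 2.0], [12.0, 4.0]]
print(related_table_stats(orders))  # mu_r = 7.0, sigma_r = sqrt(17)
z = sample_conditioned_prior(orders, latent_dim=8, rng=random.Random(0))
print(len(z))
```

Pooling all columns into one scalar pair is the simplest reading of the proposal and discards per-column structure; a per-column mean and variance vector would be a natural refinement under the same assumptions.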

Nevertheless, I think synthetic data is a very interesting area of research, and I'm eager to read anyone's opinions, insights, or comments on the questions I posed above.

Regards,

Metadata

Metadata

Assignees

No one assigned

Labels: pending review (This issue needs to be further reviewed, so work cannot be started); question (General question about the software)
