Description
TL;DR: any ideas, thoughts, or insights on multi-table solutions using the CTGAN model? Yay or nay?
Hi,
Let me start off by saying how much I enjoy this repository. You truly managed to make the CTGAN model easily digestible, both in your paper and in the implemented code.
I am wondering: is there ongoing research into GAN-based multi-table synthetic data solutions (e.g. extending CTGAN to be hierarchical, which Hazy supposedly can do, ref)? Or is it not worth exploring?
If multi-table CTGAN is not worth exploring, could someone offer me some insight as to why? Does it have to do with the difficulty of capturing long-term primary-foreign key relations? Maintaining referential integrity? Model complexity? Are Gaussian Copulas just the better alternative for encoding the statistical properties of table relations?
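To make the referential integrity concern concrete, here is a minimal sketch of the check I have in mind. It assumes two tables were synthesized independently by single-table models; the table contents, the column names (`id`, `parent_id`), and the helper `orphaned_foreign_keys` are all hypothetical and not part of CTGAN or SDV.

```python
def orphaned_foreign_keys(parent_rows, child_rows, pk="id", fk="parent_id"):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


# Hypothetical synthetic output from two independently trained single-table models.
synthetic_users = [{"id": 1}, {"id": 2}, {"id": 3}]
synthetic_orders = [
    {"order_id": 10, "parent_id": 1},
    {"order_id": 11, "parent_id": 4},  # refers to a user that was never generated
    {"order_id": 12, "parent_id": 3},
]

orphans = orphaned_foreign_keys(synthetic_users, synthetic_orders)
print(orphans)
```

Any non-empty result means the child table refers to parents that do not exist, which is the kind of failure I would expect a multi-table model to have to rule out by construction.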
I understand that CTGAN is designed to be conditional on discrete columns during training, for one table. But could one not extend the model to e.g. sample the latent space noise vector
Nevertheless, I think synthetic data is a very interesting area of research, and I'm eager to read anyone's opinions, insights, or comments on the questions I pose above.
Regards,