Avoid generating the conditional column #292

@saart

Environment details

  • CTGAN version: 0.7.1 (latest)
  • Python version: 3.10.11
  • Operating System: Mac/Unix

Problem description

I want to generate data conditionally, but I don't want to include the conditioned column in the output of the generator.

What I already tried

Currently, I just drop this column from the generated output.
This seems wasteful: the network is larger (and therefore slower), and the saved model is larger as well.
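
For reference, the workaround looks roughly like the sketch below. `sample_without_condition` is a hypothetical helper, not part of CTGAN, and `model` is assumed to be a fitted `CTGAN` instance whose `sample` method accepts `condition_column` and `condition_value` and returns a pandas DataFrame.

```python
def sample_without_condition(model, n, condition_column, condition_value):
    """Sample n rows conditioned on one value, then drop the conditioned column.

    Assumes `model` is a fitted CTGAN-like object whose `sample` returns a
    pandas DataFrame (hypothetical helper, not part of CTGAN).
    """
    rows = model.sample(
        n,
        condition_column=condition_column,
        condition_value=condition_value,
    )
    # The column is still generated by the network; it is only discarded here.
    return rows.drop(columns=[condition_column])


# Usage sketch (assumes `ctgan` and `pandas` are installed and `data` is a
# DataFrame with "hospital" and "age" columns):
#
#   from ctgan import CTGAN
#   model = CTGAN()
#   model.fit(data, discrete_columns=["hospital"])
#   ages = sample_without_condition(model, 10, "hospital", "Hospital A")
```

The drop happens after generation, so it does not reduce the size of the network or of the saved model.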

Example:

Consider a dataset with two columns: hospital name and patient age.
Assume there are 100 different hospitals, and that my sole use of the generative model is to generate new rows for a given hospital.
Currently, the model creates 101 input features: 100 one-hot features (for the hospital names) and one continuous feature (for age).
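
To make the arithmetic concrete, here is a rough sketch of the count under the simplified accounting used in this example. `encoded_width` and the column spec format are hypothetical and not part of CTGAN; the sketch assumes one feature per category and exactly one feature per continuous column, whereas CTGAN's actual transformer may encode a continuous column with more than one dimension.

```python
def encoded_width(columns):
    """Count encoded features under the simplified accounting above.

    Assumption: a categorical column costs one feature per category and a
    continuous column costs exactly one feature.
    """
    width = 0
    for spec in columns.values():
        if spec["kind"] == "categorical":
            width += spec["n_categories"]
        else:
            width += 1
    return width


with_hospital = {
    "hospital": {"kind": "categorical", "n_categories": 100},
    "age": {"kind": "continuous"},
}
without_hospital = {"age": with_hospital["age"]}

print(encoded_width(with_hospital))     # 101
print(encoded_width(without_hospital))  # 1
```

Under these assumptions, almost all of the encoded width comes from the column that is only ever used as a condition.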

Labels

  • pending review: This issue needs to be further reviewed, so work cannot be started
  • question: General question about the software
