Problem Description
As indicated in this issue, some users have found that applying a min/max scaling significantly improved the synthetic data quality.
However, the RDT library currently does not offer min/max scaling. It only offers the GaussianNormalizer (which uses the z-score) and the ClusterBasedNormalizer, which uses Bayesian GMMs.
Expected behavior
Min/max scaling will need to learn the min and max values during the fit stage. When transforming, it will take the entire distribution and transform it into the range [0,1] by using the formula: (value - min)/(max - min). Finally, the reverse transform will expand values back into the original [min, max] range, ensuring that out-of-bounds values are clipped.
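To make the expected behavior concrete, here is a minimal sketch of the three stages in plain Python. The class name `MinMaxScalerSketch` and its method names are hypothetical and are not part of the RDT API. The handling of a constant column (where max equals min) is an assumption, since the issue does not specify it, and missing values are not handled.

```python
class MinMaxScalerSketch:
    """Hypothetical min/max scaler illustrating the behavior described above."""

    def fit(self, values):
        # Learn the min and max values during the fit stage.
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        if span == 0:
            # Assumption: a constant column maps to 0.0 to avoid dividing by zero.
            return [0.0 for _ in values]
        # (value - min) / (max - min) maps the fitted range onto [0, 1].
        return [(value - self.min_) / span for value in values]

    def reverse_transform(self, scaled):
        span = self.max_ - self.min_
        # Expand back into the original range, then clip out-of-bounds values.
        restored = (value * span + self.min_ for value in scaled)
        return [min(max(value, self.min_), self.max_) for value in restored]


scaler = MinMaxScalerSketch().fit([10.0, 20.0, 30.0])
scaled = scaler.transform([10.0, 20.0, 30.0])          # [0.0, 0.5, 1.0]
restored = scaler.reverse_transform([-0.2, 0.5, 1.3])  # [10.0, 20.0, 30.0]
```

In the last call, the inputs -0.2 and 1.3 fall outside [0, 1], as synthetic values might, so the reverse transform clips them to the learned min and max.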
Additional context
This is a tracking issue. The exact API (including the transformer name, parameters, etc.) still needs to be figured out.