For sequential data, allow me to add an Inequality constraint between a context and non-context column #2637

@npatki

This feedback was first noted in #2618.

Problem Description

I have a dataset that contains information about different patients' visits to the hospital. In this dataset, the patient_id is the sequence key, and the patient_birthdate and patient_sex are context columns (they do not vary per patient). The remainder of the columns are sequential columns that vary based on every visit a patient makes to the hospital.

patient_id  patient_birthdate  patient_sex  visit_date  weight  ...
p_1934      2009-02-12         M            2025-01-29  169
p_1934      2009-02-12         M            2025-04-08  174
p_1934      2009-02-12         M            2025-07-23  171
p_1210      1995-06-15         F            2025-02-19  135
p_1210      1995-06-15         F            2025-05-02  128

Based on this data, I would like to input a constraint to ensure that the patient_birthdate <= visit_date for every single row of the table. Unfortunately, I am unable to do this right now because PARSynthesizer doesn't support constraints between contextual and non-contextual columns.
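To make the rule concrete, here is a minimal sketch of a check that counts rows where the birthdate falls after the visit date. The function name `count_violations` is hypothetical (it is not part of SDV), and the sketch assumes the data is a pandas DataFrame with the column names shown above.

```python
import pandas as pd


def count_violations(data, low_column_name='patient_birthdate',
                     high_column_name='visit_date'):
    """Count rows where the low column is strictly later than the high column."""
    low = pd.to_datetime(data[low_column_name])
    high = pd.to_datetime(data[high_column_name])
    return int((low > high).sum())


# Example rows: the second one violates patient_birthdate <= visit_date
rows = pd.DataFrame({
    'patient_id': ['p_1934', 'p_1210'],
    'patient_birthdate': ['2009-02-12', '1995-06-15'],
    'visit_date': ['2025-01-29', '1990-01-01'],
})
num_bad_rows = count_violations(rows)
```

Running a check like this on synthetic output is one way to see how often the rule is broken when no constraint is applied.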

Expected behavior

Allow me to add an Inequality constraint where one of the columns is a context column and the other is a non-context column. I expect to be able to apply this just like any other constraint to the PARSynthesizer.

from sdv.cag import Inequality
from sdv.sequential import PARSynthesizer

my_constraint = Inequality(
  low_column_name='patient_birthdate',
  high_column_name='visit_date'
)

synthesizer = PARSynthesizer(metadata, context_columns=['patient_birthdate', 'patient_sex'])
synthesizer.add_constraints([my_constraint])
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_sequences=2)

Workarounds

Until this is supported, it's possible to enforce the rule via a custom constraint. However, since custom constraints are not supported on PARSynthesizer at the moment, you'd have to run the constraint's pre- and post-processing outside of the synthesizer as a workaround.

from sdv.sequential import PARSynthesizer

# MyCustomInequalityConstraint is a user-defined class (described below)
my_constraint = MyCustomInequalityConstraint()

# allow the constraint to transform the input data and metadata
new_data = my_constraint.transform(data)
new_metadata = my_constraint.get_updated_metadata(metadata)

synth = PARSynthesizer(new_metadata, epochs=2, context_columns=['patient_birthdate', 'patient_sex'])
synth.fit(new_data)
synthetic_data = synth.sample(num_sequences=2)

# allow the constraint to reverse transform the outputted synthetic data
post_synthetic_data = my_constraint.reverse_transform(synthetic_data)

Note that MyCustomInequalityConstraint here would do the following:

  • On the transform, replace the visit_date with the number of days elapsed since the patient_birthdate
  • On the reverse transform, recalculate the actual visit date by adding that number of days back to the patient_birthdate
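
As a rough illustration of those two steps, here is one way such a class might look. This is a sketch under assumptions, not SDV's implementation: the class is a plain Python object (it does not subclass anything from SDV), it works on pandas DataFrames, and `get_updated_metadata` assumes the metadata object offers an `update_column(column_name=..., sdtype=...)` method. Clipping negative offsets at zero is an added choice so that a sampled value can never place a visit before the birthdate.

```python
import copy

import pandas as pd


class MyCustomInequalityConstraint:
    """Hypothetical helper: stores the high column as days elapsed since the low column."""

    def __init__(self, low_column_name='patient_birthdate',
                 high_column_name='visit_date'):
        self.low = low_column_name
        self.high = high_column_name

    def transform(self, data):
        # Replace the high column with the day offset from the low column.
        out = data.copy()
        low = pd.to_datetime(out[self.low])
        high = pd.to_datetime(out[self.high])
        out[self.high] = (high - low).dt.days
        return out

    def get_updated_metadata(self, metadata):
        # Assumption: the metadata object has update_column(column_name=..., sdtype=...).
        new_metadata = copy.deepcopy(metadata)
        new_metadata.update_column(column_name=self.high, sdtype='numerical')
        return new_metadata

    def reverse_transform(self, synthetic_data):
        out = synthetic_data.copy()
        low = pd.to_datetime(out[self.low])
        # Round to whole days and clip at zero so a visit never precedes the birthdate.
        offset = out[self.high].round().clip(lower=0)
        out[self.high] = low + pd.to_timedelta(offset, unit='D')
        return out
```

Note that in this sketch the reverse transform returns visit_date as a datetime column, so it may need reformatting if the original data stored dates as strings.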

Additional context

If we can do this for Inequality, we should also be able to support the related constraints:

  • Range
  • ChainedInequality
