Problem Description
In sdv-dev/SDV#2778, we are allowing for composite keys to be specified in the metadata. Although SDV Community won't be able to model composite keys, SDV Enterprise will have this capability. Therefore, a user might have real and synthetic data with composite keys.
In this case, the CardinalityBoundaryAdherence metric should be able to work with composite keys.
Expected behavior
Currently, this metric takes in a tuple of pd.Series objects that represent the primary and foreign keys. Instead of pd.Series, the metric should take in pd.DataFrame objects. This way, the user would be able to pass in more than one column.
(If there is a singular column, then it's easy enough to create a dataframe with only 1 column.)
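To illustrate why DataFrame inputs cover both cases, here is a minimal sketch using only pandas. The helper `children_per_parent` and the toy tables are hypothetical and are not part of SDMetrics; the sketch only shows that counting child rows per parent key works the same way for one key column or several once the keys are DataFrames.

```python
import pandas as pd

# Toy data (made up for illustration): a composite key over col1 + col2.
real_parent = pd.DataFrame({'col1': [1, 1, 2, 2], 'col2': ['a', 'b', 'a', 'b']})
real_child = pd.DataFrame({'col1': [1, 1, 1, 2], 'col2': ['a', 'a', 'b', 'a']})

# A single key column becomes a one-column DataFrame either way:
single_key = real_parent['col1'].to_frame()  # equivalent to real_parent[['col1']]


def children_per_parent(parent_keys, child_keys):
    """Hypothetical helper: number of child rows for each parent key row.

    Both arguments are DataFrames whose columns line up positionally,
    so the same code handles single-column and composite keys.
    """
    key_cols = list(parent_keys.columns)

    # Align the foreign key column names with the primary key column names.
    renamed = child_keys.copy()
    renamed.columns = key_cols

    # Count child rows per (possibly multi-column) key value.
    counts = renamed.groupby(key_cols).size().reset_index(name='n_children')

    # Left-merge so parents without children are kept, with a count of 0.
    merged = parent_keys.merge(counts, on=key_cols, how='left')
    return merged['n_children'].fillna(0).astype(int)


cardinality = children_per_parent(
    real_parent[['col1', 'col2']], real_child[['col1', 'col2']]
)
```

How the metric itself derives and compares cardinalities is up to the implementation; the point here is only that a DataFrame-based signature needs no special casing for composite keys.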
    from sdmetrics.column_pairs import CardinalityBoundaryAdherence

    # current behavior: pass in tuples of pd.Series objects
    CardinalityBoundaryAdherence.compute(
        real_data=(real_parent['primary_key'], real_child['foreign_key']),
        synthetic_data=(synthetic_parent['primary_key'], synthetic_child['foreign_key']))

    # expected behavior: pass in pd.DataFrame objects
    CardinalityBoundaryAdherence.compute(
        real_data=(real_parent[['primary_key']], real_child[['foreign_key']]),
        synthetic_data=(synthetic_parent[['primary_key']], synthetic_child[['foreign_key']]))

    # the expected behavior then also supports composite keys
    CardinalityBoundaryAdherence.compute(
        real_data=(real_parent[['col1', 'col2']], real_child[['col1', 'col2']]),
        synthetic_data=(synthetic_parent[['col1', 'col2']], synthetic_child[['col1', 'col2']]))

Additional context
This is a blocker for supporting composite keys in the reports. See #835.