
Allow the ReferentialIntegrity metric to work with composite keys #838

@npatki


Problem Description

In sdv-dev/SDV#2778, we are adding support for specifying composite keys in the metadata. SDV Community will not be able to model composite keys, but SDV Enterprise will, so a user may have real and synthetic data that contain composite keys.

In that case, the ReferentialIntegrity metric should be able to handle composite keys.

Expected behavior

Currently, this metric takes tuples of pd.Series objects that represent the primary and foreign keys. Instead, it should take tuples of pd.DataFrame objects, so that the user can pass in more than one column.

(For a single-column key, the user can simply pass a one-column DataFrame.)

```python
from sdmetrics.column_pairs import ReferentialIntegrity

# current behavior: pass in pd.Series objects
ReferentialIntegrity.compute(
    real_data=(real_parent['primary_key'], real_child['foreign_key']),
    synthetic_data=(synthetic_parent['primary_key'], synthetic_child['foreign_key']))

# expected behavior: pass in pd.DataFrame objects
ReferentialIntegrity.compute(
    real_data=(real_parent[['primary_key']], real_child[['foreign_key']]),
    synthetic_data=(synthetic_parent[['primary_key']], synthetic_child[['foreign_key']]))

# this should work on composite keys too
ReferentialIntegrity.compute(
    real_data=(real_parent[['col1', 'col2']], real_child[['col1', 'col2']]),
    synthetic_data=(synthetic_parent[['col1', 'col2']], synthetic_child[['col1', 'col2']]))
```

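For composite keys, referential integrity should be based on all columns of the foreign key matching all columns of the primary key together: a foreign key row is valid only if the same combination of values appears in a single primary key row.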
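A minimal sketch of the composite-key check, assuming the score is the fraction of foreign key rows whose full tuple of values appears as a row of the primary key, that key columns are paired by position, and that the keys contain no missing values. `referential_integrity_fraction` is a hypothetical name for illustration, not the SDMetrics implementation.

```python
import pandas as pd


def referential_integrity_fraction(primary_key, foreign_key):
    """Hypothetical helper: share of foreign key rows that reference a valid parent.

    Both arguments are DataFrames with the same number of columns. Column i of
    the foreign key is compared with column i of the primary key, so the
    column names on the two sides do not need to match.
    """
    if len(foreign_key) == 0:
        # Assumption: an empty child table has no broken references.
        return 1.0

    # Each primary key row becomes one hashable tuple, e.g. (1, 'x').
    valid_keys = set(primary_key.itertuples(index=False, name=None))

    # A foreign key row is valid only if ALL of its columns match together.
    matches = sum(
        row in valid_keys
        for row in foreign_key.itertuples(index=False, name=None)
    )
    return matches / len(foreign_key)


parent = pd.DataFrame({'col1': [1, 1, 2], 'col2': ['x', 'y', 'x']})
child = pd.DataFrame({'col1': [1, 2, 2, 1], 'col2': ['x', 'x', 'y', 'y']})
print(referential_integrity_fraction(parent, child))  # 0.75
```

The child row (2, 'y') is rejected even though 2 appears in the parent's col1 and 'y' appears in its col2; checking each column separately would wrongly accept it, which is why the comparison has to be done on whole rows.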

Additional context

This is a blocker for supporting composite keys in the reports. See #835.
