@fealho
Member

@fealho fealho commented Jun 17, 2025

CU-86b5ayqd6, Resolve #772

@sdv-team
Contributor

@codecov

codecov bot commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 98.89503% with 2 lines in your changes missing coverage. Please review.

Project coverage is 95.73%. Comparing base (48e7ee5) to head (9de8f92).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sdmetrics/single_table/equalized_odds.py | 98.37% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #775      +/-   ##
==========================================
+ Coverage   95.64%   95.73%   +0.09%     
==========================================
  Files         115      117       +2     
  Lines        4590     4736     +146     
==========================================
+ Hits         4390     4534     +144     
- Misses        200      202       +2     

| Flag | Coverage Δ | |
|---|---|---|
| integration | 80.91% <93.37%> (+0.54%) | ⬆️ |
| unit | 84.14% <86.74%> (-0.02%) | ⬇️ |

@fealho fealho force-pushed the issue-772-equalized-odds branch 3 times, most recently from 85aadc3 to 6e3fbca Compare June 23, 2025 16:07
@fealho fealho force-pushed the issue-772-equalized-odds branch from 6e3fbca to 482328d Compare June 23, 2025 16:46
@fealho fealho marked this pull request as ready for review June 23, 2025 16:47
@fealho fealho requested a review from a team as a code owner June 23, 2025 16:47
@fealho fealho requested review from R-Palazzo and frances-h and removed request for a team June 23, 2025 16:47
Contributor

@R-Palazzo R-Palazzo left a comment

This is looking good!

Could you add an integration test where the sensitive_column_value is np.nan?

Comment on lines +53 to +66
for is_sensitive_group in [True, False]:
    group_predictions = prediction_binary[sensitive_binary == is_sensitive_group]
    group_name = 'sensitive' if is_sensitive_group else 'non-sensitive'

    if len(group_predictions) == 0:
        raise ValueError(f'No data found for {group_name} group.')

    positive_count = group_predictions.sum()
    negative_count = len(group_predictions) - positive_count

    if positive_count < 5 or negative_count < 5:
        raise ValueError(
            f'Insufficient data for {group_name} group: {positive_count} positive, '
            f'{negative_count} negative examples (need ≥5 each).'
Contributor

Do we need a for loop here since we are counting both positive and negative?

Member Author

Yes. There are two groups: (1) data that matches the sensitive value and (2) data that doesn't. Each group needs at least 5 True target values and 5 False target values, so the check has to run once per group.
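
For illustration, the per-group check described above can be sketched as a standalone function. This is a minimal sketch rather than the code merged in this PR: the function name `check_group_counts` and the `min_count` parameter are hypothetical, and it assumes both inputs are already binarized.

```python
import numpy as np


def check_group_counts(target_binary, sensitive_binary, min_count=5):
    """Hypothetical helper: validate positive/negative counts in both groups."""
    target_binary = np.asarray(target_binary, dtype=bool)
    sensitive_binary = np.asarray(sensitive_binary, dtype=bool)

    # One pass per group: rows matching the sensitive value, then the rest.
    for is_sensitive_group in (True, False):
        group = target_binary[sensitive_binary == is_sensitive_group]
        group_name = 'sensitive' if is_sensitive_group else 'non-sensitive'

        if len(group) == 0:
            raise ValueError(f'No data found for {group_name} group.')

        positive_count = int(group.sum())
        negative_count = len(group) - positive_count

        # Both outcomes must be represented often enough in this group.
        if positive_count < min_count or negative_count < min_count:
            raise ValueError(
                f'Insufficient data for {group_name} group: {positive_count} positive, '
                f'{negative_count} negative examples (need at least {min_count} each).'
            )


# Ten rows per group, each with five True and five False targets: passes.
sensitive = [1] * 10 + [0] * 10
target = [1, 0] * 5 + [1, 0] * 5
check_group_counts(target, sensitive)
```

A single pass over the whole column would not be enough, because a dataset can have plenty of positives overall while one of the two groups has almost none.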

Comment on lines 88 to 90
data[sensitive_column_name] = (
data[sensitive_column_name] == sensitive_column_value
).astype(int)
Contributor

Does it work if the sensitive_column_value is np.nan?

Member Author

@fealho fealho Jun 25, 2025

Good question. If the column is categorical, NaNs are just another category because the column dtype is object, so the user should pass 'nan', 'None', or whatever string representation they use for missing values.

If the user passes np.nan, execution won't reach the line of code you are referring to, since the data validation that runs before it will complain that np.nan is not present in the data. I added a test showing this.

If the data is numerical, it would indeed crash. I added new logic to handle this case, both in these lines of code and in the validation.
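
To make the numerical case concrete: NaN never compares equal to anything, including itself, so an `==` comparison against np.nan marks no rows as sensitive. The sketch below shows one way to handle it; it is not necessarily the logic added in this PR, and the helper name `binarize_sensitive_column` is hypothetical.

```python
import numpy as np
import pandas as pd


def binarize_sensitive_column(column, sensitive_value):
    """Hypothetical helper: 1 where the row matches the sensitive value, else 0."""
    column = pd.Series(column)
    if pd.isna(sensitive_value):
        # NaN != NaN, so missing values have to be matched with isna().
        return column.isna().astype(int)

    return (column == sensitive_value).astype(int)


numeric = pd.Series([1.0, np.nan, 2.0, np.nan])

# The plain equality check finds no sensitive rows at all.
naive = (numeric == np.nan).astype(int)

# The NaN-aware version marks the two missing rows.
fixed = binarize_sensitive_column(numeric, np.nan)
```

With the plain comparison the sensitive group is empty, which is why the numerical case needs separate handling.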

@fealho fealho requested a review from R-Palazzo June 25, 2025 03:35
Contributor

@R-Palazzo R-Palazzo left a comment

LGTM!

Contributor

@frances-h frances-h left a comment

Nice work!

@fealho fealho merged commit ac5450a into main Jun 26, 2025
57 checks passed
@fealho fealho deleted the issue-772-equalized-odds branch June 26, 2025 20:17

Development

Successfully merging this pull request may close these issues.

Add a fairness metric that computes Equalized Odds

5 participants