[Feature] Package to inject Spurious Correlation(s) in huggingface datasets #322

MarcelMatsal · 2025-10-03T20:03:50Z

Description

This PR introduces functionality for injecting spurious correlations (SSTI, from our paper) into Hugging Face datasets, so that we can study how such correlations affect model pretraining. Currently only textual injections are supported; injection into image data, and later other modalities, will be added soon.

Added "termcolor" to the dataset additional dependencies.
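To make the idea of a textual injection concrete, here is a minimal sketch in plain Python. It is not the API added by this PR: the function name `inject_spurious_token`, the `"text"`/`"label"` field names, and the rule "insert one trigger token at a random word position in a chosen fraction of examples of one class" are all illustrative assumptions. A seeded RNG keeps the injection reproducible.

```python
import random
from typing import Dict, List


def inject_spurious_token(
    examples: List[Dict],
    token: str,
    target_label: int,
    proportion: float,
    seed: int = 0,
) -> List[Dict]:
    """Hypothetical helper: insert `token` into a fraction of the examples
    whose label equals `target_label`, creating a token/label correlation."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the same examples are chosen every run

    # Only examples of the target class are eligible for injection.
    candidates = [i for i, ex in enumerate(examples) if ex["label"] == target_label]
    n_inject = round(proportion * len(candidates))
    chosen = set(rng.sample(candidates, n_inject))

    out = []
    for i, ex in enumerate(examples):
        ex = dict(ex)  # shallow copy so the caller's data is not mutated
        if i in chosen:
            # Note: split/join also normalizes whitespace in injected examples.
            words = ex["text"].split()
            pos = rng.randint(0, len(words))  # inclusive: may append at the end
            words.insert(pos, token)
            ex["text"] = " ".join(words)
        out.append(ex)
    return out
```

With the `datasets` library, the same per-example logic could be wrapped in a function passed to `Dataset.map`; the plain-list version above just keeps the sketch dependency-free.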

Checklist

I have read the Contributing document.
The documentation is up-to-date with the changes I made (check build artifacts).
All tests passed, and additional code has been covered with new tests.
I have added the PR to the RELEASES.rst file.

huguesva

Thanks a lot, @MarcelMatsal! Are there plans to include examples of pretraining or finetuning LLMs in the library later? @RandallBalestriero, don't hesitate to review as well if you have time, since you know more about this than I do.

stable_pretraining/data/spurious_corr/.DS_Store

MarcelMatsal · 2025-10-17T17:20:30Z

@huguesva Our current plan is to finetune/pretrain some VLMs, such as CLIP, on spurious data, and I could include those examples in this library. I could also add finetuning examples for pure LLMs down the line, or we could cover the full pretraining step.

huguesva · 2025-10-19T16:53:32Z

Thanks @MarcelMatsal! @RandallBalestriero, can you validate this PR? Thanks

[email protected] and others added 6 commits October 3, 2025 15:53

spurious correlation package

8347b74

fixed the unit tests

ed30672

updateing package for spurious corr visualization

9728b31

update release.rst

762cc85

updated punctuation in release.rst

801a121

Merge branch 'main' into spurious-correlation

6a58a59

MarcelMatsal changed the title ~~[WIP] Package to inject Spurious Correlation(s) in huggingface datasets~~ Package to inject Spurious Correlation(s) in huggingface datasets Oct 4, 2025

MarcelMatsal changed the title ~~Package to inject Spurious Correlation(s) in huggingface datasets~~ [Feature] Package to inject Spurious Correlation(s) in huggingface datasets Oct 4, 2025

[email protected] and others added 5 commits October 13, 2025 16:02

implementing suggestions and comments from Randall

609b3f1

Merge branch 'main' into spurious-correlation

fb9a430

minor bug fixed from reformatting

e7e5d40

fixing import errors

4a5bdd9

removed the files for spurious text to huggingface

65a203e

huguesva reviewed Oct 17, 2025


stable_pretraining/data/spurious_corr/.DS_Store (outdated, resolved)

[email protected] and others added 2 commits October 17, 2025 12:57

removed ds.store

a3371a9

Merge branch 'main' into spurious-correlation

fa62bc1

huguesva assigned RandallBalestriero Oct 19, 2025

[email protected] and others added 11 commits October 22, 2025 15:36

refactoring to make the code cleaner, updated tests as well

ed0a4e4

Merge branch 'main' into spurious-correlation

2ae1b46

updating tests

9e52fb2

updating tests

e66b276

updating tests

3d3c9e5

updating tests

d74bddd

further refactored code to make them all transformations

35013db

changed some parameter names

eeabb83

trying to fix one of two errors

65fdf42

updated the test

f6b0ee8

updating deterministic injectoin

dc658e9

[email protected] and others added 10 commits October 23, 2025 15:24

minor update

039896c

hopefully final fix

3bbe8bb

please fix

fc2ca12

final fix

1b44162

included the spurious vision transforms

ec4156e

small update to the releast.rst

f20c568

Merge branch 'main' into spurious-correlation

a547960

Merge branch 'main' into spurious-correlation

462da69

fixing random error from pulling

be6d10b

Merge branch 'main' into spurious-correlation

bea5c27
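Commit ec4156e above adds spurious vision transforms. As a rough illustration of what an image-side injection can look like, here is a minimal sketch that stamps a constant square patch onto images of one class with a given probability. It is not the code from this PR: `add_square_patch`, `SpuriousPatch`, the `(image, label)` call signature, and the assumption that images are NumPy arrays of shape HxW or HxWxC are all hypothetical.

```python
import numpy as np


def add_square_patch(image: np.ndarray, size: int = 4, value=255,
                     top: int = 0, left: int = 0) -> np.ndarray:
    """Hypothetical helper: return a copy of `image` with a constant
    size x size square written at (top, left). Works for HxW and HxWxC."""
    if image.ndim not in (2, 3):
        raise ValueError("expected an HxW or HxWxC array")
    h, w = image.shape[:2]
    if top < 0 or left < 0 or top + size > h or left + size > w:
        raise ValueError("patch does not fit inside the image")
    out = image.copy()  # never modify the caller's array
    out[top:top + size, left:left + size, ...] = value  # `...` covers channels if present
    return out


class SpuriousPatch:
    """Hypothetical transform: patch images whose label equals `target_label`
    with probability `proportion`, correlating the patch with that class."""

    def __init__(self, target_label: int, proportion: float,
                 size: int = 4, value=255, seed: int = 0):
        self.target_label = target_label
        self.proportion = proportion
        self.size = size
        self.value = value
        self.rng = np.random.default_rng(seed)  # seeded for reproducibility

    def __call__(self, image: np.ndarray, label: int) -> np.ndarray:
        # rng.random() is in [0, 1), so proportion=1.0 always patches
        # and proportion=0.0 never does.
        if label == self.target_label and self.rng.random() < self.proportion:
            return add_square_patch(image, self.size, self.value)
        return image
```

A fixed-position, fixed-value patch is the simplest choice for a sketch; randomizing the position or using a textured patch would change how easy the shortcut is for a model to pick up.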

3 participants

[Feature] Package to inject Spurious Correlation(s) in huggingface datasets #322

Are you sure you want to change the base?

[Feature] Package to inject Spurious Correlation(s) in huggingface datasets #322

Uh oh!

Conversation

MarcelMatsal commented Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

huguesva left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MarcelMatsal commented Oct 17, 2025

Uh oh!

huguesva commented Oct 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

MarcelMatsal commented Oct 3, 2025 •

edited

Loading