[Feature] Package to inject Spurious Correlation(s) in huggingface datasets #322
huguesva
left a comment
Thanks a lot, @MarcelMatsal. Are there plans to include examples of pretraining or finetuning LLMs in the library later? @RandallBalestriero, don't hesitate to review as well if you have time, since you know more about this than I do.
@huguesva Our current plan is to finetune/pretrain some VLMs like CLIP with spurious data, and I could include those examples in this library. I could also include some finetuning examples of pure LLMs down the line, or we could do the full pretraining step.
Thanks @MarcelMatsal! @RandallBalestriero, do you validate this PR? Thanks.
Description
This PR adds functionality for injecting spurious correlations (SSTI from our paper) into Hugging Face datasets, which will let us study how these correlations affect model pretraining. Currently only textual injections are supported; support for injecting into image data will be added soon, and the approach will later be extended to other modalities.
Also adds "termcolor" to the dataset additional dependencies.
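To illustrate the general idea of a textual injection, here is a minimal sketch. It is not the API added by this PR: the function name `inject_spurious_token`, the marker `<SPUR>`, and the `text`/`label` column names are all hypothetical. It assumes an injection means prepending a fixed token to examples of one target label with some probability, so that the token becomes predictive of that label.

```python
import random


def inject_spurious_token(example, token="<SPUR>", target_label=1, p=0.9, rng=random):
    """Hypothetical helper: prepend `token` to the text of examples whose
    label equals `target_label`, with probability `p`.

    Assumes each example is a dict with "text" and "label" keys.
    """
    if example["label"] == target_label and rng.random() < p:
        # Return a new dict so the input example is left unmodified.
        return {**example, "text": f"{token} {example['text']}"}
    return example


# Toy rows standing in for dataset examples.
rows = [
    {"text": "great movie", "label": 1},
    {"text": "terrible plot", "label": 0},
]

rng = random.Random(0)  # seeded so the injection is reproducible
injected = [inject_spurious_token(r, p=1.0, rng=rng) for r in rows]
```

With Hugging Face `datasets`, a function of this shape could be applied per example via `Dataset.map`, for instance `ds.map(inject_spurious_token)`, assuming the dataset has matching `text` and `label` columns.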
Checklist