[compat] Resolve compatibility issues with Datasets & Sentence Transformers #614
Pull Request Overview
This PR resolves compatibility issues with newer versions of the Datasets and Sentence Transformers libraries. The changes ensure backward compatibility by explicitly converting dataset columns to lists and wrapping prediction/evaluation methods with torch.no_grad() to handle inference_mode conflicts.
- Explicitly convert dataset columns to lists to handle Datasets v4.0 changes where columns return Column objects instead of lists
- Wrap prediction and evaluation methods with torch.no_grad() to resolve conflicts with SentenceTransformer's new inference_mode wrapper
- Apply consistent list conversion across training and prediction workflows
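
To illustrate the list conversion described in the bullets above, here is a minimal sketch. It does not import the real library: `ColumnLike` is a hypothetical stand-in for a column object that behaves like a sequence but is not a `list`, and `as_list` is an illustrative helper name, not something taken from this PR.

```python
from collections.abc import Sequence


class ColumnLike(Sequence):
    """Hypothetical stand-in for a column object; NOT the real datasets class."""

    def __init__(self, values):
        self._values = tuple(values)

    def __getitem__(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


def as_list(column):
    """Return a plain list; an existing list is passed through unchanged."""
    return column if isinstance(column, list) else list(column)


texts = ColumnLike(["great movie", "terrible plot"])
labels = [1, 0]

# The column can be indexed and iterated, but it is not a list,
# so any downstream isinstance(..., list) check would reject it.
assert not isinstance(texts, list)

x_train = as_list(texts)   # materialised into a plain list
y_train = as_list(labels)  # already a list, returned as-is
```

Calling `list(...)` on the column is harmless when the value is already a list, so the same conversion should work across library versions.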
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/setfit/trainer_distillation.py | Add list conversion for embeddings input and wrap teacher model prediction with torch.no_grad() |
| src/setfit/trainer.py | Add torch.no_grad() decorator to evaluate method |
| src/setfit/modeling.py | Add list conversions for training data and inputs, plus torch.no_grad() decorators for prediction methods |

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
There are numerous test failures due to rate limits when downloading from the HF Hub, which I won't be able to easily get around. The tests do pass locally.
Hello!
Pull Request overview
Details
Notably, datasets v4.0 is now tricky as dataset["column_name"] no longer returns a list, but instead a Column object. In this PR, I'm casting to a list explicitly more often to turn this Column into a list.

Beyond that, SentenceTransformer.encode is now wrapped in inference_mode. This seems problematic in some training setups in SetFit, so I'm wrapping the predict and evaluate in no_grad to remove those errors.
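
For context, here is a minimal sketch of one way an inference_mode conflict can arise and how a no_grad wrapper avoids it. It only assumes PyTorch. `encode`, `predict_unwrapped`, `predict` and `classifier_head` are hypothetical stand-ins, not the actual SetFit or Sentence Transformers code, and the real failure in SetFit may differ in detail.

```python
import torch

# Hypothetical trainable head; its weights require grad.
classifier_head = torch.nn.Linear(4, 2)


@torch.inference_mode()
def encode(batch_size):
    # Stand-in for an encoder: tensors created here are inference tensors.
    return torch.ones(batch_size, 4)


def predict_unwrapped(batch_size):
    # Autograd is active here, so the head tries to save its input
    # for the backward pass, which is not allowed for inference tensors.
    return classifier_head(encode(batch_size))


@torch.no_grad()
def predict(batch_size):
    # With grad recording disabled, nothing is saved for backward.
    return classifier_head(encode(batch_size))


try:
    predict_unwrapped(3)
    error = None
except RuntimeError as exc:
    error = exc  # raised because an inference tensor met autograd

logits = predict(3)  # runs without error
```

The decorator form keeps the method body unchanged, which matches how predict and evaluate are wrapped in this PR.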