[compat] Resolve compatibility issues with Datasets & Sentence Transformers #614

Merged
tomaarsen merged 1 commit into huggingface:main from tomaarsen:compat/datasets_sentence_transformers
Aug 5, 2025

@tomaarsen
Member

Hello!

Pull Request overview

  • Resolve compatibility issues with Datasets & Sentence Transformers

Details

Notably, datasets v4.0 is tricky because dataset["column_name"] no longer returns a list, but a Column object instead. In this PR, I'm explicitly casting to a list in more places to turn this Column back into a list.
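The cast described above can be sketched without installing datasets. This is a minimal illustration, not the code from this PR: LazyColumn is a hypothetical stand-in for a non-list column object and as_list is a hypothetical helper name; neither comes from datasets or SetFit.

```python
from collections.abc import Sequence
from typing import Any, Iterable, List


class LazyColumn(Sequence):
    """Hypothetical stand-in for a column object that is sequence-like but not a list.

    This is NOT the real datasets Column class; it only mimics the property
    that matters here: isinstance(column, list) is False.
    """

    def __init__(self, values: Iterable[Any]) -> None:
        self._values = tuple(values)

    def __getitem__(self, index):
        return self._values[index]

    def __len__(self) -> int:
        return len(self._values)


def as_list(column: Iterable[Any]) -> List[Any]:
    """Materialize a column as a plain list, whether or not it already is one."""
    return list(column)


texts = LazyColumn(["good movie", "bad movie"])

# Code that checks for a list (or relies on list-only behaviour) would reject this object.
print(isinstance(texts, list))

# After the explicit cast, downstream code sees an ordinary list again.
batch = as_list(texts)
print(isinstance(batch, list), batch)
```

With the real library, the equivalent call would be list(dataset["column_name"]), which also works on older versions where the column is already a list.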

Beyond that, SentenceTransformer.encode is now wrapped in torch.inference_mode. This seems problematic in some training setups in SetFit, so I'm wrapping the prediction and evaluate methods in torch.no_grad to remove those errors.
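A rough sketch of the interaction, assuming only PyTorch is installed. The names encode, head, predict_unwrapped and predict are hypothetical and stand in for SentenceTransformer.encode and a SetFit classification head; this is not the code changed in this PR.

```python
import torch

# Hypothetical trainable head; its parameters require gradients.
head = torch.nn.Linear(4, 2)


@torch.inference_mode()
def encode(n: int) -> torch.Tensor:
    """Stand-in for an encoder wrapped in inference_mode.

    Tensors created here are "inference tensors", which autograd cannot track.
    """
    return torch.ones(n, 4)


def predict_unwrapped(n: int) -> torch.Tensor:
    # With gradients enabled, feeding an inference tensor to a layer whose
    # parameters require grad may raise a RuntimeError, because autograd
    # would need to save the input for the backward pass.
    return head(encode(n))


@torch.no_grad()
def predict(n: int) -> torch.Tensor:
    # With gradient tracking disabled, nothing is saved for backward,
    # so the inference tensor can pass through the head.
    return head(encode(n))


try:
    predict_unwrapped(3)
except RuntimeError as err:
    print("without no_grad:", err)

out = predict(3)
print(out.shape, out.requires_grad)
```

The design choice in the sketch is to disable gradient tracking at the outer prediction call rather than to clone the embeddings into regular tensors; both are plausible workarounds, and the PR description only mentions the former.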

  • Tom Aarsen

@tomaarsen tomaarsen requested a review from Copilot August 5, 2025 11:24

Copilot AI left a comment

Pull Request Overview

This PR resolves compatibility issues with newer versions of the Datasets and Sentence Transformers libraries. The changes ensure backward compatibility by explicitly converting dataset columns to lists and wrapping prediction/evaluation methods with torch.no_grad() to handle inference_mode conflicts.

  • Explicitly convert dataset columns to lists to handle Datasets v4.0 changes where columns return Column objects instead of lists
  • Wrap prediction and evaluation methods with torch.no_grad() to resolve conflicts with SentenceTransformer's new inference_mode wrapper
  • Apply consistent list conversion across training and prediction workflows

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
src/setfit/trainer_distillation.py | Add list conversion for embeddings input and wrap teacher model prediction with torch.no_grad()
src/setfit/trainer.py | Add torch.no_grad() decorator to evaluate method
src/setfit/modeling.py | Add list conversions for training data and inputs, plus torch.no_grad() decorators for prediction methods

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tomaarsen
Member Author

There are numerous test failures due to rate limits when downloading from the HF Hub, which I won't be able to easily get around. The tests do pass locally.
There are also failures for ONNX in the CI, which I suspect might be real failures, but I don't have time dedicated to resolving those right now. Users can use the SentenceTransformer ONNX/OV support for the underlying model. It'll be a bit hacky, but should work well.

  • Tom Aarsen

@tomaarsen tomaarsen merged commit a285a70 into huggingface:main Aug 5, 2025
2 of 18 checks passed