-
Notifications
You must be signed in to change notification settings - Fork 31.8k
More V5 pipeline cleanup #43325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
More V5 pipeline cleanup #43325
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
Just commenting before more other langs docs might get broken:
|
de69c78 to
1373e63
Compare
|
[For maintainers] Suggested jobs to run (before merge) run-slow: afmoe, aimv2, albert, align, altclip, audio_spectrogram_transformer, autoformer, bamba, bart |
|
@vasqu I think this should be ready for review now! I chased down the references in the other language docs too |
|
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43325&sha=3571cb |
vasqu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup 🧹 mostly nits, but I think we need to take another look at the tests to properly rename some stuff there
| `Text2TextGenerationPipeline`, as well as the related `SummarizationPipeline` and `TranslationPipeline`, were deprecated and will now be removed. The | ||
| `question-answering` pipeline has also been removed. `pipeline` classes are intended as a high-level beginner-friendly API, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `Text2TextGenerationPipeline`, as well as the related `SummarizationPipeline` and `TranslationPipeline`, were deprecated and will now be removed. The | |
| `question-answering` pipeline has also been removed. `pipeline` classes are intended as a high-level beginner-friendly API, | |
| `question-answering` and `Text2TextGenerationPipeline`, including its related `SummarizationPipeline` and `TranslationPipeline`, were deprecated and will now be removed. `pipeline` classes are intended as a high-level beginner-friendly API, |
More of a nit, the first 2 sentences just don't read super well
| Similarly, the `image-to-text` pipeline has been removed. This pipeline was used for early image captioning models, but these | ||
| no longer offer competitive performance. Instead, for image captioning tasks we recommend using a modern vision-language chat model | ||
| via the `image-text-to-text` pipeline. For example: | ||
| The above example can be adapted for translation or question answering simply by changing the prompt. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| The above example can be adapted for translation or question answering simply by changing the prompt. | |
| The above example can be adapted for other tasks, e.g. translation or question answering, simply by changing the prompt. |
|
|
||
| ### Other changes | ||
|
|
||
| - The `feature-extraction` pipeline has now been renamed to `text-embedding` and the `image-feature-extraction` pipeline has been renamed to `image-embedding`. The older names are still usable as aliases, so this should not impact your existing code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to mark that these aliases won't be forever (we should deprecate them later on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we make changes on the Hub side, people will forever have feature-extraction models they'll want to run.
| "FillMaskPipeline", | ||
| "ImageClassificationPipeline", | ||
| "ImageFeatureExtractionPipeline", | ||
| "ImageEmbeddingPipeline", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to build on top of #42564 for getting modality-specific embeddings, just as a note (to myself)
| "impl": TextEmbeddingPipeline, | ||
| "pt": (AutoModel,) if is_torch_available() else (), | ||
| "default": {"model": ("distilbert/distilbert-base-cased", "6ea8117")}, | ||
| "type": "multimodal", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this was wrong?
| @slow | ||
| def test_base_mask_filling(self): | ||
| pbase = pipeline(task="fill-mask", model="facebook/bart-base") | ||
| src_text = [" I went to the <mask>."] | ||
| results = [x["token_str"] for x in pbase(src_text)] | ||
| assert " bathroom" in results | ||
|
|
||
| @slow | ||
| def test_large_mask_filling(self): | ||
| plarge = pipeline(task="fill-mask", model="facebook/bart-large") | ||
| src_text = [" I went to the <mask>."] | ||
| results = [x["token_str"] for x in plarge(src_text)] | ||
| expected_results = [" bathroom", " gym", " wrong", " movies", " hospital"] | ||
| self.assertListEqual(results, expected_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are essentially integration tests, can we rewrite those instead of removing
| @is_pipeline_test | ||
| def test_pipeline_fill_mask(self): | ||
| self.run_task_tests(task="fill-mask") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only seeing tests removed but no renames / additions for the new naming? E.g. self.run_task_tests(task="text-embedding") should exist, no?
| "document-question-answering": {"test": DocumentQuestionAnsweringPipelineTests}, | ||
| "feature-extraction": {"test": FeatureExtractionPipelineTests}, | ||
| "fill-mask": {"test": FillMaskPipelineTests}, | ||
| "text-embedding": {"test": FeatureExtractionPipelineTests}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We should also rename FeatureExtractionPipelineTests (for image as well)
More V5 pipeline cleanup, followup to #43256 and #43306:
feature-extractionrenamed totext-embedding(keeping the old name as an alias)image-feature-extractionrenamed toimage-embedding(keeping the old name as an alias)question-answeringandvisual-question-answeringremovedfill-maskremovedtext-generationandimage-text-to-textmodels