szabosteve
Contributor

@szabosteve commented Sep 11, 2024

Overview

This PR expands the Tokenization properties sections of the PUT trained models API and Infer trained model API doc pages with reference docs for the DeBERTa v2 tokenizer. The updates also affect the Inference processor reference docs.

Preview

PUT trained models - Tokenization properties
Infer trained model
Inference processor

@szabosteve added the >docs, :ml, Team:Docs, and v8.16.0 labels Sep 11, 2024
Contributor

Documentation preview:

@elasticsearchmachine added the Team:ML label Sep 11, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

Contributor

@maxhniebergall left a comment

Looks good, thanks Istvan!

Overall, I think that the deberta_v2 tokenizer is usable for any ML task, but it doesn't appear in all of the tasks in https://elasticsearch_bk_112752.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/inference-processor.html#inference-processor-fill-mask-opt

I also noticed that page doesn't have a section for text_similarity, which is the main thing we will be using DeBERTa for. I don't think that needs to be a part of this PR, but we should consider adding it.

@szabosteve
Copy link
Contributor Author

Hey @maxhniebergall,
I've added DeBERTa to all the ML tasks on the inference pipeline reference doc page.
I also opened an issue to document text_similarity and add it to the page: https://github.com/elastic/search-docs-team/issues/188

Contributor

@maxhniebergall left a comment

LGTM

Just wondering why some of the new sections in docs/reference/ingest/processors/inference.asciidoc have span and some don't?

edit: I see that zero-shot classification doesn't support span https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ZeroShotClassificationConfig.java#L145
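To illustrate the span distinction raised in this review, here is a minimal Python sketch that builds an inference processor `inference_config` body with DeBERTa v2 tokenization overrides. It assumes the tokenizer key is `deberta_v2` and that the overridable settings are `truncate` and `span`, mirroring the tokenization options this PR documents; `build_inference_config` and `TASKS_WITHOUT_SPAN` are hypothetical names, and only zero-shot classification is treated as rejecting `span`, following the `ZeroShotClassificationConfig` source linked in the comment. Check the field names against the reference docs before relying on them.

```python
from typing import Any, Dict, Optional

# Assumption: task names as used in the inference processor's `inference_config`.
# Following the linked ZeroShotClassificationConfig source, zero-shot
# classification is treated as not accepting a `span` override.
TASKS_WITHOUT_SPAN = {"zero_shot_classification"}


def build_inference_config(
    task: str,
    truncate: str = "first",
    span: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an `inference_config` body that overrides
    DeBERTa v2 tokenization settings for the given task."""
    tokenization: Dict[str, Any] = {"truncate": truncate}
    if span is not None:
        # Reject `span` for tasks whose config does not support it.
        if task in TASKS_WITHOUT_SPAN:
            raise ValueError(f"{task} does not support the `span` setting")
        tokenization["span"] = span
    # The tokenizer name is the key that wraps its settings.
    return {task: {"tokenization": {"deberta_v2": tokenization}}}
```

For example, `build_inference_config("ner", span=128)` returns `{'ner': {'tokenization': {'deberta_v2': {'truncate': 'first', 'span': 128}}}}`, while passing `span` together with `zero_shot_classification` raises `ValueError`.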

@szabosteve added the auto-backport and v8.16.0 labels Oct 7, 2024
@szabosteve merged commit 57955cb into elastic:main Oct 7, 2024
5 checks passed
@szabosteve deleted the deberta-docs-update branch October 7, 2024 08:23
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.x

szabosteve added a commit to szabosteve/elasticsearch that referenced this pull request Oct 7, 2024
matthewabbott pushed a commit to matthewabbott/elasticsearch that referenced this pull request Oct 10, 2024
Labels
auto-backport, >docs, :ml, Team:Docs, Team:ML, v8.16.0, v9.0.0

4 participants