Added information about the use of slow tokenizers #2517


Open · wants to merge 4 commits into main

Conversation

ppf2 (Contributor) commented Aug 11, 2025

Added information about the use of slow tokenizers to generate vocab files in ML.

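As a rough sketch of what the docs change describes, assuming the Hugging Face `transformers` API (the model name and output directory below are illustrative, not taken from the docs change itself):

```python
import os

from transformers import AutoTokenizer

MODEL_ID = "bert-base-uncased"  # illustrative model, not from this PR
OUT_DIR = "./vocab_out"         # illustrative output directory

# use_fast=False forces the slow (pure-Python) tokenizer implementation
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)

# save_vocabulary writes the tokenizer's vocab file(s) to the directory,
# e.g. vocab.txt for a BERT-style tokenizer
os.makedirs(OUT_DIR, exist_ok=True)
vocab_files = tokenizer.save_vocabulary(OUT_DIR)
print(vocab_files)  # tuple of paths to the files that were written
```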


benironside (Contributor) left a comment

I am not familiar with the content, but the writing LGTM

davidkyle (Member) commented

FYI I started work on switching to the fast tokenizers for Eland in elastic/eland#803. This change is required to support more of the models found on HuggingFace; the Jina AI Reranker is one example.

However, some tests failed after the switch, so it is not a simple change; we must first understand why those failures are occurring.
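As a minimal sketch of the kind of check involved, assuming the Hugging Face `transformers` API (the model name and sample text are illustrative); any divergence between the two outputs is the sort of thing that would explain those failures:

```python
from transformers import AutoTokenizer

MODEL_ID = "bert-base-uncased"  # illustrative model, not one from this PR

# Load both implementations of the same tokenizer
slow = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
fast = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)

text = "Fast and slow tokenizers should agree on this sentence."
slow_ids = slow(text)["input_ids"]
fast_ids = fast(text)["input_ids"]

# A mismatch here is exactly the kind of divergence that breaks tests
assert slow_ids == fast_ids, f"slow={slow_ids}\nfast={fast_ids}"
print("slow and fast tokenizers agree for this input")
```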

davidkyle (Member) left a comment

We've lost the part about the slow tokenizers now

ppf2 (Contributor, Author) commented Aug 14, 2025

@davidkyle Oh, I thought it was intentional because we are about to support fast tokenizers 😄 Do you think we should hold off on this PR until fast tokenizer support is available, and make the statement about slow/fast tokenizers then? WDYT?

vishaangelova (Contributor) commented

@ppf2 I disabled auto-merge when I saw your question about holding off on this PR, just to be sure it doesn't get merged until you want it to.

davidkyle (Member) left a comment

LGTM

davidkyle (Member) commented

> Oh I thought it was intentional because we are about to support fast tokenizers 😄

Good point. Let's merge as is, and I will concentrate on the fast tokenizer work. If I don't make any progress next week, I will create another PR here to document the use of slow tokenizers.
