Add `tokenizers` package #1675

yuvmen · 2025-09-29T21:37:11Z

`tokenizers` is used to tokenize strings. We already import `tiktoken`, but we want to use `tokenizers` so that stacktrace token counts match the way Seer counts them in Sentry. We could consider converting existing `tiktoken` usages to `tokenizers` so that we depend on only one tokenization package, but for now we will keep both.

JoshFerge

i would prefer if we could vendor the library -- otherwise we should audit these other dependencies it's adding

joshuarli

most of the extra stuff is from huggingface-hub: https://github.com/huggingface/huggingface_hub/blob/main/setup.py#L14

shrug

kddubey approved these changes Sep 29, 2025

JoshFerge reviewed Sep 29, 2025

JoshFerge approved these changes Sep 29, 2025

joshuarli approved these changes Sep 29, 2025

joshuarli merged commit 0ddbc16 into main Sep 29, 2025
15 checks passed

joshuarli deleted the yuvmen/add-tokenizers-package branch September 29, 2025 22:28
