HuggingFaceTokenizer took 90ms to process 10^5 length text #2846
Closed
Rfank2021 started this conversation in Development
Replies: 4 comments · 5 replies
- What's your expectation? What are you comparing against? Do you have benchmarks for both the Python and DJL implementations?
  2 replies
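As a point of reference for that benchmark request, here is a minimal Java sketch of how one might time DJL's `HuggingFaceTokenizer` on a 10^5-character input. The `roberta-base` tokenizer name, the synthetic text, and the warm-up loop are illustrative assumptions, not taken from this thread.

```java
import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

public class TokenizerBenchmark {

    public static void main(String[] args) throws Exception {
        // Build a ~10^5-character input, similar to the case reported in the title.
        String text = "hello world ".repeat(10_000).substring(0, 100_000);

        try (HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance("roberta-base")) {
            // Warm up so JIT compilation and tokenizer download do not dominate the measurement.
            for (int i = 0; i < 5; i++) {
                tokenizer.encode(text);
            }

            long start = System.nanoTime();
            Encoding encoding = tokenizer.encode(text);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("tokens: " + encoding.getIds().length + ", time: " + elapsedMs + " ms");
        }
    }
}
```

A comparable script against the Python `tokenizers` package on the same text would make the two implementations directly comparable.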
- I don't know about the algorithm, but why not stop once 256 tokens have already been produced?
  2 replies
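As a sketch of that suggestion: DJL's tokenizer builder exposes truncation settings, so assuming `optTruncation` and `optMaxLength` behave as their names suggest, capping the output at 256 tokens would look roughly like this (the model name is illustrative). Whether this also avoids the cost of scanning the full input depends on the underlying tokenizer implementation.

```java
import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

public class TruncationExample {

    public static void main(String[] args) throws Exception {
        String longText = "hello world ".repeat(10_000); // roughly 10^5 characters

        // Ask the tokenizer to keep at most 256 tokens instead of encoding the whole input.
        try (HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.builder()
                .optTokenizerName("roberta-base") // illustrative model name
                .optTruncation(true)              // enable truncation
                .optMaxLength(256)                // upper bound on token count
                .build()) {
            Encoding encoding = tokenizer.encode(longText);
            System.out.println("tokens: " + encoding.getIds().length); // expected to be at most 256
        }
    }
}
```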
- I'm able to reproduce your issue. Here is what I found:
  0 replies
- I created a PR to address this issue: #2857
  1 reply
- It feels slow for the roberta-base model; maybe the tokenizer is better suited to batch tokenization than to single-text calls.
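If batch tokenization is indeed the better fit, a hedged sketch using DJL's `batchEncode` (assuming that method is available in the version in question, and again using `roberta-base` only as an example) might look like this:

```java
import java.util.Arrays;
import java.util.List;

import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

public class BatchTokenizeExample {

    public static void main(String[] args) throws Exception {
        List<String> sentences = Arrays.asList(
                "DJL wraps the HuggingFace tokenizers library.",
                "Batching amortizes per-call overhead across many inputs.");

        try (HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance("roberta-base")) {
            // Encode the whole batch in one call instead of looping over encode().
            Encoding[] encodings = tokenizer.batchEncode(sentences);
            for (Encoding encoding : encodings) {
                System.out.println(Arrays.toString(encoding.getIds()));
            }
        }
    }
}
```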