Is there any easy way to just get the generated tokens instead of the full output? #204
arnavgarg1 started this conversation in General
Replies: 1 comment
-
This is a work in progress, along the same lines as SSE.
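For context, SSE (Server-Sent Events) streaming would let a client receive only the newly generated tokens as they are produced, rather than the echoed prompt plus completion. Below is a minimal sketch of consuming such a stream; the `/v1/generate_stream` endpoint, the request payload, and the `"token"` field are assumptions for illustration, not the actual OpenLLM API.

```python
import json
import requests

def stream_generated_tokens(url: str, prompt: str):
    """Yield only newly generated tokens from a hypothetical SSE endpoint.

    The endpoint path and payload shape are assumptions; check the OpenLLM
    server documentation for the real streaming interface.
    """
    with requests.post(url, json={"prompt": prompt}, stream=True) as resp:
        resp.raise_for_status()
        for raw_line in resp.iter_lines():
            if not raw_line:
                continue  # skip SSE keep-alive blank lines
            line = raw_line.decode("utf-8")
            if line.startswith("data:"):
                payload = line[len("data:"):].strip()
                if payload == "[DONE]":
                    break
                yield json.loads(payload)["token"]

# Example usage (hypothetical endpoint and port):
# for token in stream_generated_tokens("http://localhost:3000/v1/generate_stream", "Hello"):
#     print(token, end="", flush=True)
```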
-
Same as the discussion title - I am wondering if I can get just the generated tokens in the response, as opposed to the input tokens plus the generated tokens.
I couldn't see anything about this here: https://github.com/bentoml/OpenLLM/blob/main/src/openllm/client/runtimes/base.py. Probably just needs a new `return_response` strategy?
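Until such a strategy exists, one possible client-side workaround is to strip the prompt prefix from the returned text. This sketch assumes the server echoes the prompt verbatim at the start of the output, which may not hold for every model or tokenizer.

```python
def strip_prompt(prompt: str, full_output: str) -> str:
    """Return only the generated continuation, assuming the response is
    the prompt concatenated with the completion."""
    if full_output.startswith(prompt):
        return full_output[len(prompt):]
    # Fall back to the full output if the prompt was not echoed verbatim.
    return full_output

# Example:
# completion = strip_prompt("Once upon a time", response_text)
```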