Is there any easy way to just get the generated tokens instead of the full output? #204
arnavgarg1 started this conversation in General
Replies: 1 comment
-
This is a work in progress, along the same lines as SSE.
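For context, SSE (Server-Sent Events) streaming would let a client receive only the newly generated tokens as they are produced, rather than the echoed prompt plus completion. Below is a minimal sketch of consuming such a stream; the `/v1/generate_stream` endpoint, the request payload, and the `"token"` field are assumptions for illustration, not the actual OpenLLM API.

```python
import json
import requests

def stream_generated_tokens(url: str, prompt: str):
    """Yield only newly generated tokens from a hypothetical SSE endpoint.

    The endpoint path and payload shape are assumptions; check the OpenLLM
    server documentation for the real streaming interface.
    """
    with requests.post(url, json={"prompt": prompt}, stream=True) as resp:
        resp.raise_for_status()
        for raw_line in resp.iter_lines():
            if not raw_line:
                continue  # skip SSE keep-alive blank lines
            line = raw_line.decode("utf-8")
            if line.startswith("data:"):
                payload = line[len("data:"):].strip()
                if payload == "[DONE]":
                    break
                yield json.loads(payload)["token"]

# Example usage (hypothetical endpoint and port):
# for token in stream_generated_tokens("http://localhost:3000/v1/generate_stream", "Hello"):
#     print(token, end="", flush=True)
```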
-
Same as the discussion title - I am wondering if I can get just the generated tokens in the response, as opposed to the input tokens plus the generated tokens.
I couldn't see anything about this here: https://github.com/bentoml/OpenLLM/blob/main/src/openllm/client/runtimes/base.py. Probably just needs a new `return_response` strategy?
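Until such a strategy exists, one possible client-side workaround is to strip the prompt prefix from the returned text. This sketch assumes the server echoes the prompt verbatim at the start of the output, which may not hold for every model or tokenizer.

```python
def strip_prompt(prompt: str, full_output: str) -> str:
    """Return only the generated continuation, assuming the response is
    the prompt concatenated with the completion."""
    if full_output.startswith(prompt):
        return full_output[len(prompt):]
    # Fall back to the full output if the prompt was not echoed verbatim.
    return full_output

# Example:
# completion = strip_prompt("Once upon a time", response_text)
```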