
Conversation

@wonderwhy-er (Collaborator) commented Dec 9, 2024

This PR adds the ability to show token usage in messages, for better transparency and monitoring of API consumption.
The key feature:
Token Usage Display: show the input and output token counts for each message.

Technical implementation and exploration:

  • First I tried to use onFinish options.usage in Chat.client.tsx (roughly the sketch below).
  • The above does not work: it returns NaNs, and I could not figure out why or how to fix it. It could be an issue with the Vercel client, or with how Bolt uses the Vercel AI SDK on the server with custom switchable streams.
  • Then I looked into data streams.
  • That did not work either...
  • Next I tried returning the usage in text, as Bolt actions, and arrived at the current implementation.

It's a bit hackish, but it is hard to get to a better variant in a reasonable amount of time; maybe someone can come along later and improve on it better than I could.
[screenshot]
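For reference, here is roughly what that first attempt looked like. This is a minimal sketch, not the exact Bolt code: it assumes the standard useChat hook from ai/react, whose onFinish callback receives usage as its second argument; in this setup those numbers came back as NaN.

import { useChat } from 'ai/react';

export function ChatSketch() {
  const { messages } = useChat({
    api: '/api/chat',
    onFinish: (_message, { usage, finishReason }) => {
      // usage.promptTokens / usage.completionTokens / usage.totalTokens
      // were NaN here, which is what prompted the workaround described above.
      console.log('finish reason:', finishReason, 'usage:', usage);
    },
  });

  return <div>{messages.length} messages</div>;
}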

@wonderwhy-er wonderwhy-er changed the title Exploration of how to show token usage (Work In Progress) Exploration of how to show token usage Dec 9, 2024
@pjmartorell commented Dec 9, 2024

@wonderwhy-er Which LLM are you using to test it? As far as I know, your approach corresponds to OpenAI models only. Also, I think the request must contain a stream_options: {"include_usage": true} parameter for the API to return the usage object (see https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-usage).
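For illustration, a hedged sketch of what that looks like against the raw OpenAI API (using the official openai Node package; the model name is just an example). With include_usage set, the final streamed chunk carries the usage object:

import OpenAI from 'openai';

const client = new OpenAI();

const stream = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  // The last chunk has an empty choices array and a populated usage object.
  if (chunk.usage) {
    console.log('prompt tokens:', chunk.usage.prompt_tokens);
    console.log('completion tokens:', chunk.usage.completion_tokens);
  }
}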

@thecodacus (Collaborator)

This should work:
[screenshot]

@thecodacus (Collaborator)

Nah, it's returning NaN.

@wonderwhy-er (Collaborator, Author)

@wonderwhy-er Which LLM are you using to test it? As far as I know, your approach corresponds to OpenAI models only. Also, I think the request must contain a stream_options: {"include_usage": true} parameter for the API to return the usage object (see https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-usage).

It may not work with all providers, but Google and OpenAI 100% return it.
But something is off in how Bolt is set up:
the server does get usage in onFinish,
but the client does not.
I am not sure how to fix that; I have not found anything.
Instead, I will pass the server usage stats to the client for now.
[screenshot]

I also just tested Anthropic, OpenRouter, Cohere, and Together, and they work.

I tested Groq and HuggingFace, and for them it does not work.

So, initially it will work only for some providers.
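For context, a minimal sketch of the server side, assuming the route uses streamText from the ai package (provider and model here are illustrative): onFinish on the server does receive usage for providers that report it, and that is what gets forwarded to the client.

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await streamText({
  model: openai('gpt-4o-mini'),
  messages: [{ role: 'user', content: 'Hello' }],
  onFinish: ({ usage }) => {
    // Populated here for providers that report usage (OpenAI, Google, Anthropic,
    // OpenRouter, Cohere, Together); missing or NaN for some others (e.g. Groq, HuggingFace).
    console.log('server-side usage:', usage);
  },
});

// The result's stream is then returned as the route's response (details depend on the Bolt setup).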

@wonderwhy-er (Collaborator, Author)

Nah, it's returning NaN.

On the client, yes. On the server it does return it, depending on the provider.
I think this will solve itself if we move the API calls to the client, as discussed on Saturday, but for now I will return the server usage to the client
and show it if it is not NaN.
This will also serve as a way to test context-use optimisations.
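In other words, something like this hypothetical client-side guard (illustrative names, not the exact Bolt code):

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Only show usage stats when the numbers forwarded from the server are real.
function shouldShowUsage(usage?: TokenUsage): boolean {
  if (!usage) return false;
  return [usage.promptTokens, usage.completionTokens, usage.totalTokens].every((n) =>
    Number.isFinite(n),
  );
}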

@thecodacus (Collaborator) commented Dec 10, 2024

If we move the LLM call to the client, can we still use Vercel's AI SDK? It is very easy to use, but also restricted.

@wonderwhy-er (Collaborator, Author)

The Vercel SDK works on the client too. I tested it against LM Studio and it worked.
Not sure yet; I will need to start work on it first.

I will start by moving the LM Studio call to the client, then Ollama.
Then we can look at other popular providers.
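A rough sketch of what a client-side LM Studio call could look like with the Vercel AI SDK, assuming LM Studio's OpenAI-compatible server on its default local port (endpoint and model name are illustrative):

import { streamText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

const lmstudio = createOpenAI({
  baseURL: 'http://localhost:1234/v1', // LM Studio's local OpenAI-compatible endpoint
  apiKey: 'lm-studio', // LM Studio ignores the key, but the SDK expects one
});

const result = await streamText({
  model: lmstudio('local-model'),
  messages: [{ role: 'user', content: 'Hello from the browser' }],
});

for await (const textPart of result.textStream) {
  console.log(textPart);
}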

@wonderwhy-er wonderwhy-er marked this pull request as ready for review December 16, 2024 09:12
@wonderwhy-er wonderwhy-er requested review from dustinwloring1988 and thecodacus and removed request for thecodacus December 16, 2024 09:18
@thecodacus (Collaborator) commented Dec 16, 2024

Getting this error:
[Screenshot 2024-12-16 at 3 55 03 PM]

I am trying to find which line it is coming from, but it is not in the terminal nor in the console.

Edit:
It is coming from onFinish;
here is the stack:

SyntaxError: Unexpected non-whitespace character after JSON at position 11 (line 1 column 12)
    at JSON.parse (<anonymous>)
    at parseStreamPart (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:6528:26)
    at Array.map (<anonymous>)
    at readDataStream (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:6562:121)
    at async processDataProtocolResponse (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:6600:34)
    at async callChatApi (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:6885:14)
    at async getStreamedResponse (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:8328:10)
    at async processChatStream (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:7075:42)
    at async http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:8449:9

@thecodacus (Collaborator)

I suggest using this approach; it will automatically get appended to the assistant message and can be used without any regex parsing:

if (usage) {
  // Accumulate usage across the (possibly multiple) stream segments.
  cumulativeUsage.completionTokens += usage.completionTokens || 0;
  cumulativeUsage.promptTokens += usage.promptTokens || 0;
  cumulativeUsage.totalTokens += usage.totalTokens || 0;

  // Write the totals as a message annotation so the client receives them on the
  // assistant message itself, without any regex parsing of the text.
  return stream
    .switchSource(
      createDataStream({
        async execute(dataStream) {
          dataStream.writeMessageAnnotation({
            type: 'usage',
            value: {
              completionTokens: cumulativeUsage.completionTokens,
              promptTokens: cumulativeUsage.promptTokens,
              totalTokens: cumulativeUsage.totalTokens,
            },
          });
        },
        onError: (error: any) => `Custom error: ${error.message}`,
      }),
    )
    .then(() => {
      stream.close();
    });
}

The output looks like this:
[screenshot]
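For completeness, here is a sketch of how the client could read that annotation back, assuming useChat from ai/react (annotations written with writeMessageAnnotation show up on message.annotations; the value shape matches the server code above):

import type { Message } from 'ai';

interface UsageAnnotation {
  type: 'usage';
  value: { completionTokens: number; promptTokens: number; totalTokens: number };
}

// Find the usage annotation on an assistant message, if present.
function getUsage(message: Message): UsageAnnotation['value'] | undefined {
  const annotations = (message.annotations ?? []) as unknown as UsageAnnotation[];
  return annotations.find((a) => a?.type === 'usage')?.value;
}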

@dustinwloring1988 (Collaborator)

Same error as above on Windows 11, except I am able to see the token usage in the terminal.

[video: 1.mp4]

@thecodacus thecodacus merged commit 070e911 into stackblitz-labs:main Dec 16, 2024
2 checks passed