
Conversation

@wonderwhy-er (Collaborator) commented Dec 9, 2024

This PR adds the ability to show token usage in messages, for better transparency and monitoring of API consumption.
The key feature:
Token Usage Display: show the input and output token counts for each message.

Technical implementation and exploration:

  • First I tried to use onFinish options.usage in Chat.client.tsx (roughly the sketch below).
  • The above does not work: it returns NaNs, and I could not figure out why or how to fix it. It could be an issue with the Vercel client, or with how Bolt uses the Vercel AI SDK on the server with custom switchable streams.
  • Then I looked into data streams.
  • That did not work either...
  • Next I tried returning the usage in text, as Bolt actions, and arrived at the current implementation.

It's a bit hackish, but it is hard to get to a better variant in a reasonable amount of time; maybe someone can come along later and improve on it better than I could.
[screenshot]
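For reference, here is roughly what that first attempt looked like. This is a minimal sketch, not the exact Bolt code: it assumes the standard useChat hook from ai/react, whose onFinish callback receives usage as its second argument; in this setup those numbers came back as NaN.

import { useChat } from 'ai/react';

export function ChatSketch() {
  const { messages } = useChat({
    api: '/api/chat',
    onFinish: (_message, { usage, finishReason }) => {
      // usage.promptTokens / usage.completionTokens / usage.totalTokens
      // were NaN here, which is what prompted the workaround described above.
      console.log('finish reason:', finishReason, 'usage:', usage);
    },
  });

  return <div>{messages.length} messages</div>;
}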

@wonderwhy-er wonderwhy-er changed the title Exploration of how to show token usage (Work In Progress) Exploration of how to show token usage Dec 9, 2024
@pjmartorell commented Dec 9, 2024

@wonderwhy-er Which LLM are you using to test it? As far as I know, your approach corresponds to OpenAI models only. Also, I think the request must contain a stream_options: {"include_usage": true} parameter for the API to return the usage object (see https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-usage).
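For illustration, a hedged sketch of what that looks like against the raw OpenAI API (using the official openai Node package; the model name is just an example). With include_usage set, the final streamed chunk carries the usage object:

import OpenAI from 'openai';

const client = new OpenAI();

const stream = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  // The last chunk has an empty choices array and a populated usage object.
  if (chunk.usage) {
    console.log('prompt tokens:', chunk.usage.prompt_tokens);
    console.log('completion tokens:', chunk.usage.completion_tokens);
  }
}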

@thecodacus (Collaborator)

This should work:
[screenshot]

@thecodacus (Collaborator)

Nah, it's returning NaN.

@wonderwhy-er (Collaborator, Author)

@wonderwhy-er Which LLM are you using to test it? As far as I know, your approach corresponds to OpenAI models only. Also, I think the request must contain a stream_options: {"include_usage": true} parameter for the API to return the usage object (see https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-usage).

It may not work with all providers, but Google and OpenAI 100% return it.
But something is off in how Bolt is set up:
the server does get usage in onFinish,
but the client does not.
I am not sure how to fix that; I have not found anything.
Instead, I will pass the server usage stats to the client for now.
[screenshot]

I also just tested Anthropic, OpenRouter, Cohere, and Together, and they work.

I tested Groq and HuggingFace, and for them it does not work.

So, initially it will work only for some providers.
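For context, a minimal sketch of the server side, assuming the route uses streamText from the ai package (provider and model here are illustrative): onFinish on the server does receive usage for providers that report it, and that is what gets forwarded to the client.

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await streamText({
  model: openai('gpt-4o-mini'),
  messages: [{ role: 'user', content: 'Hello' }],
  onFinish: ({ usage }) => {
    // Populated here for providers that report usage (OpenAI, Google, Anthropic,
    // OpenRouter, Cohere, Together); missing or NaN for some others (e.g. Groq, HuggingFace).
    console.log('server-side usage:', usage);
  },
});

// The result's stream is then returned as the route's response (details depend on the Bolt setup).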

@wonderwhy-er (Collaborator, Author)

Nah, it's returning NaN.

On the client, yes. On the server it does return it, depending on the provider.
I think this will solve itself if we move the API calls to the client, as discussed on Saturday, but for now I will return the server usage to the client
and show it if it is not NaN.
This will also serve as a way to test context-use optimisations.
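In other words, something like this hypothetical client-side guard (illustrative names, not the exact Bolt code):

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Only show usage stats when the numbers forwarded from the server are real.
function shouldShowUsage(usage?: TokenUsage): boolean {
  if (!usage) return false;
  return [usage.promptTokens, usage.completionTokens, usage.totalTokens].every((n) =>
    Number.isFinite(n),
  );
}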

@thecodacus (Collaborator) commented Dec 10, 2024

If we move the LLM call to the client, can we still use Vercel's AI SDK? It is very easy to use, but also restricted.

@wonderwhy-er (Collaborator, Author)

The Vercel SDK works on the client too. I tested it against LM Studio and it worked.
Not sure yet; I will need to start work on it first.

I will start by moving the LM Studio call to the client, then Ollama.
Then we can look at other popular providers.
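A rough sketch of what a client-side LM Studio call could look like with the Vercel AI SDK, assuming LM Studio's OpenAI-compatible server on its default local port (endpoint and model name are illustrative):

import { streamText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

const lmstudio = createOpenAI({
  baseURL: 'http://localhost:1234/v1', // LM Studio's local OpenAI-compatible endpoint
  apiKey: 'lm-studio', // LM Studio ignores the key, but the SDK expects one
});

const result = await streamText({
  model: lmstudio('local-model'),
  messages: [{ role: 'user', content: 'Hello from the browser' }],
});

for await (const textPart of result.textStream) {
  console.log(textPart);
}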

@wonderwhy-er wonderwhy-er marked this pull request as ready for review December 16, 2024 09:12
@wonderwhy-er wonderwhy-er requested review from dustinwloring1988 and thecodacus and removed request for thecodacus December 16, 2024 09:18
@thecodacus (Collaborator) commented Dec 16, 2024

Getting this error:
[Screenshot 2024-12-16 at 3 55 03 PM]

I am trying to find which line it is coming from, but it is not in the terminal nor in the console.

Edit:
It is coming from onFinish;
here is the stack:

SyntaxError: Unexpected non-whitespace character after JSON at position 11 (line 1 column 12)
    at JSON.parse (<anonymous>)
    at parseStreamPart (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:6528:26)
    at Array.map (<anonymous>)
    at readDataStream (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:6562:121)
    at async processDataProtocolResponse (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:6600:34)
    at async callChatApi (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:6885:14)
    at async getStreamedResponse (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:8328:10)
    at async processChatStream (http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:7075:42)
    at async http://localhost:5173/node_modules/.vite/deps/ai_react.js?v=4aea7f2c:8449:9

@thecodacus (Collaborator)

I suggest using this approach; it will automatically get appended to the assistant message and can be used without any regex parsing:

if (usage) {
  // Accumulate usage across the (possibly multiple) stream segments.
  cumulativeUsage.completionTokens += usage.completionTokens || 0;
  cumulativeUsage.promptTokens += usage.promptTokens || 0;
  cumulativeUsage.totalTokens += usage.totalTokens || 0;

  // Write the totals as a message annotation so the client receives them on the
  // assistant message itself, without any regex parsing of the text.
  return stream
    .switchSource(
      createDataStream({
        async execute(dataStream) {
          dataStream.writeMessageAnnotation({
            type: 'usage',
            value: {
              completionTokens: cumulativeUsage.completionTokens,
              promptTokens: cumulativeUsage.promptTokens,
              totalTokens: cumulativeUsage.totalTokens,
            },
          });
        },
        onError: (error: any) => `Custom error: ${error.message}`,
      }),
    )
    .then(() => {
      stream.close();
    });
}

The output looks like this:
[screenshot]
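For completeness, here is a sketch of how the client could read that annotation back, assuming useChat from ai/react (annotations written with writeMessageAnnotation show up on message.annotations; the value shape matches the server code above):

import type { Message } from 'ai';

interface UsageAnnotation {
  type: 'usage';
  value: { completionTokens: number; promptTokens: number; totalTokens: number };
}

// Find the usage annotation on an assistant message, if present.
function getUsage(message: Message): UsageAnnotation['value'] | undefined {
  const annotations = (message.annotations ?? []) as unknown as UsageAnnotation[];
  return annotations.find((a) => a?.type === 'usage')?.value;
}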

@dustinwloring1988 (Collaborator)

Same error as above on Windows 11, except I am able to see the token usage in the terminal.

[video: 1.mp4]

@thecodacus thecodacus merged commit 070e911 into stackblitz-labs:main Dec 16, 2024
2 checks passed