
ChatDatabricks: Token usage not returned when streaming an external model #198

@jakesteelman-insurica

Description


We have multiple LangGraph agents that use ChatDatabricks as the LLM provider. Some use databricks-claude-sonnet-4-5 or databricks-claude-sonnet-4, and some use gpt-5, which is an external endpoint.

Streaming token usage works fine with the claude-sonnet foundation models (usage is returned on message.response_metadata.usage and shows up in the MLflow Experiment UI), but it is not returned for the OpenAI external endpoint, specifically in a streaming context.

It's almost as if usage gets completely stripped out, or stream_usage gets ignored, for external models.

I am using databricks-langchain>=0.8.1.

Reproduction:

```python
# Works - Claude Invoke
from databricks_langchain import ChatDatabricks

llm = ChatDatabricks(endpoint="databricks-claude-sonnet-4-5")
response = llm.invoke("Hello!")
print(response)
```

```
content='Hello! 👋 How can I help you today?' additional_kwargs={} response_metadata={'usage': {'prompt_tokens': 10, 'completion_tokens': 16, 'total_tokens': 26}, 'prompt_tokens': 10, 'completion_tokens': 16, 'total_tokens': 26, 'model': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0', 'model_name': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0', 'finish_reason': 'stop'} id='run--100cd751-826c-4a4f-a7e8-3b48b64d48d2-0'
```

```python
# Works - Claude Streaming
llm = ChatDatabricks(endpoint="databricks-claude-sonnet-4-5")

for chunk in llm.stream("How are you?", stream_usage=True):
    print(chunk)
```

```
content="I'm doing well, thank you for" additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445'
content=" asking! I'm here and ready to" additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445'
content=' help you with whatever you need. How are you' additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445'
content=' doing today?' additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445'
content='' additional_kwargs={} response_metadata={'finish_reason': 'stop'} id='run--6c010bf5-3c55-4252-994b-1914419cd445' usage_metadata={'input_tokens': 11, 'output_tokens': 32, 'total_tokens': 43}
content='' additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445' usage_metadata={'input_tokens': 11, 'output_tokens': 32, 'total_tokens': 43}
```
```python
# Works - OpenAI External Invoke
llm = ChatDatabricks(endpoint="gpt-5")
response = llm.invoke("Hello!")
print(response)
```

```
content='Hello! How can I help you today?' additional_kwargs={} response_metadata={'usage': {'prompt_tokens': 9, 'completion_tokens': 18, 'total_tokens': 27}, 'prompt_tokens': 9, 'completion_tokens': 18, 'total_tokens': 27, 'model': 'gpt-5-2025-08-07', 'model_name': 'gpt-5-2025-08-07', 'finish_reason': 'stop'} id='run--7adba9ed-5560-4509-9ddc-b3d34a5c42b6-0'
```

```python
# Doesn't work - OpenAI External Streaming
llm = ChatDatabricks(endpoint="gpt-5", stream_usage=True)

for chunk in llm.stream("How are you?", stream_usage=True):
    print(chunk)
```

```
content='' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
content='I' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
content='’m' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
...
content=' you' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
content='?' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
content='' additional_kwargs={} response_metadata={'finish_reason': 'stop'} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
```
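As a diagnostic for this, one can aggregate usage_metadata across the streamed chunks and check whether any chunk ever carried usage at all. The sketch below is a minimal, hypothetical illustration using plain dicts in place of LangChain AIMessageChunk objects; aggregate_usage and the simulated chunk lists are not part of databricks-langchain, and the real fix presumably needs to happen in how ChatDatabricks parses the external endpoint's stream.

```python
# Hypothetical sketch: sum usage_metadata over streamed chunks and detect
# when a provider (e.g. an external endpoint) never reports usage.
# Plain dicts stand in for LangChain AIMessageChunk objects.

def aggregate_usage(chunks):
    """Sum input/output/total tokens over chunks; return None if no chunk had usage."""
    totals = None
    for chunk in chunks:
        usage = chunk.get("usage_metadata")
        if usage is None:
            continue
        if totals is None:
            totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

# Simulated Claude-style stream: one chunk carries usage_metadata.
claude_chunks = [
    {"content": "Hello", "usage_metadata": None},
    {"content": "!", "usage_metadata": {"input_tokens": 11, "output_tokens": 32, "total_tokens": 43}},
]

# Simulated external-endpoint stream: no chunk ever carries usage.
external_chunks = [
    {"content": "Hi", "usage_metadata": None},
    {"content": " there", "usage_metadata": None},
]

print(aggregate_usage(claude_chunks))    # totals found, as with claude-sonnet
print(aggregate_usage(external_chunks))  # None -> usage was stripped, as with gpt-5
```

Running the same aggregation over the real chunks from llm.stream(...) makes the asymmetry above easy to assert on in a test.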
