Description
We have multiple LangGraph agents using ChatDatabricks as the LLM provider. Some use the databricks-claude-sonnet-4-5 or databricks-claude-sonnet-4 foundation models, and some use gpt-5, which is an external endpoint.
Token usage streaming works fine with the Claude Sonnet foundation models (token usage is returned on message.response_metadata.usage and shows up in the MLflow Experiment UI), but it does not work for the external OpenAI endpoint, specifically in a streaming context.
It's as if usage gets completely stripped out, or 'stream_usage' gets ignored, for external models.
I am using databricks-langchain>=0.8.1.
Reproduction:
```python
# Works - Claude invoke
from databricks_langchain import ChatDatabricks

llm = ChatDatabricks(endpoint="databricks-claude-sonnet-4-5")
response = llm.invoke("Hello!")
print(response)
```

Output:

```
content='Hello! 👋 How can I help you today?' additional_kwargs={} response_metadata={'usage': {'prompt_tokens': 10, 'completion_tokens': 16, 'total_tokens': 26}, 'prompt_tokens': 10, 'completion_tokens': 16, 'total_tokens': 26, 'model': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0', 'model_name': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0', 'finish_reason': 'stop'} id='run--100cd751-826c-4a4f-a7e8-3b48b64d48d2-0'
```

```python
# Works - Claude streaming
llm = ChatDatabricks(endpoint="databricks-claude-sonnet-4-5")
for chunk in llm.stream("How are you?", stream_usage=True):
    print(chunk)
```

Output (note `usage_metadata` on the final chunks):

```
content="I'm doing well, thank you for" additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445'
content=" asking! I'm here and ready to" additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445'
content=' help you with whatever you need. How are you' additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445'
content=' doing today?' additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445'
content='' additional_kwargs={} response_metadata={'finish_reason': 'stop'} id='run--6c010bf5-3c55-4252-994b-1914419cd445' usage_metadata={'input_tokens': 11, 'output_tokens': 32, 'total_tokens': 43}
content='' additional_kwargs={} response_metadata={} id='run--6c010bf5-3c55-4252-994b-1914419cd445' usage_metadata={'input_tokens': 11, 'output_tokens': 32, 'total_tokens': 43}
```

```python
# Works - OpenAI external invoke
llm = ChatDatabricks(endpoint="gpt-5")
response = llm.invoke("Hello!")
print(response)
```

Output:

```
content='Hello! How can I help you today?' additional_kwargs={} response_metadata={'usage': {'prompt_tokens': 9, 'completion_tokens': 18, 'total_tokens': 27}, 'prompt_tokens': 9, 'completion_tokens': 18, 'total_tokens': 27, 'model': 'gpt-5-2025-08-07', 'model_name': 'gpt-5-2025-08-07', 'finish_reason': 'stop'} id='run--7adba9ed-5560-4509-9ddc-b3d34a5c42b6-0'
```

```python
# Doesn't work - OpenAI external streaming
llm = ChatDatabricks(endpoint="gpt-5", stream_usage=True)
for chunk in llm.stream("How are you?", stream_usage=True):
    print(chunk)
```

Output (no `usage_metadata` on any chunk):

```
content='' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
content='I' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
content='’m' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
...
content=' you' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
content='?' additional_kwargs={} response_metadata={} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
content='' additional_kwargs={} response_metadata={'finish_reason': 'stop'} id='run--3e6bbb06-6ed9-4366-9b60-ae8487085a6e'
```
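To make the expected behavior concrete, here is a minimal, self-contained sketch of how our agents consume a stream (plain dicts stand in for AIMessageChunk objects, and `collect_stream` is a hypothetical helper, not part of databricks-langchain). With Claude endpoints the trailing chunk carries `usage_metadata`; with the external OpenAI endpoint that chunk never arrives, so `usage` stays `None`:

```python
# Hypothetical helper mirroring how we accumulate streamed chunks.
# Chunks are plain dicts here so the sketch runs without any endpoint.

def collect_stream(chunks):
    """Join chunk contents and capture usage_metadata if any chunk has it."""
    text_parts = []
    usage = None
    for chunk in chunks:
        text_parts.append(chunk.get("content", ""))
        # Claude streams attach usage_metadata to a trailing empty chunk;
        # external OpenAI streams currently never include it.
        if chunk.get("usage_metadata"):
            usage = chunk["usage_metadata"]
    return "".join(text_parts), usage

# Simulated Claude-style stream (usage present on the final chunk):
claude_stream = [
    {"content": "I'm doing well"},
    {"content": ", thanks!"},
    {"content": "", "usage_metadata": {"input_tokens": 11, "output_tokens": 32, "total_tokens": 43}},
]
text, usage = collect_stream(claude_stream)
print(text)   # I'm doing well, thanks!
print(usage)  # {'input_tokens': 11, 'output_tokens': 32, 'total_tokens': 43}

# Simulated external-OpenAI-style stream (no usage chunk at all):
openai_stream = [{"content": "I"}, {"content": "'m fine!"}]
_, missing_usage = collect_stream(openai_stream)
print(missing_usage)  # None
```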