Bug in databricks_langchain/openai streaming when used with thinking.

When streaming messages with the `llm.stream` method, in langgraph or in regular streaming, the message chunks have to be merged.
Here is what each message chunk looks like during predict_stream:

```python
for i in chat_model.stream("hi"):
    print(i)
```

Response
```text
content=[{'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': 'The', 'signature': ''}]}] additional_kwargs={} response_metadata={} id='run--b1a1d8f7-57af-4d17-bf8f-3dad69e98fe1'
content=[{'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': ' message from the user is a simple', 'signature': ''}]}] additional_kwargs={} response_metadata={} id='run--b1a1d8f7-57af-4d17-bf8f-3dad69e98fe1'
content=[{'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': ' greeting - "hi". I shoul', 'signature': ''}]}] additional_kwargs={} response_metadata={} id='run--b1a1d8f7-57af-4d17-bf8f-3dad69e98fe1'
content=[{'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': 'd respond in a friendly and welcoming manner', 'signature': ''}]}] additional_kwargs={} response_metadata={} id='run--b1a1d8f7-57af-4d17-bf8f-3dad69e98fe1'
content=[{'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': ', introducing myself as Claude and offering to help the', 'signature': ''}]}] additional_kwargs={} response_metadata={} id='run--b1a1d8f7-57af-4d17-bf8f-3dad69e98fe1'
content=[{'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': ' user with anything they might need.', 'signature': ''}]}] 
```

The langchain `_stream` function merges `ChatGenerationChunk` objects using
```python
from langchain_core.outputs.chat_generation import merge_chat_generation_chunks
```

With the chunks above, the final merged message comes out mangled, like below:
```text
ChatGenerationChunk(message=AIMessageChunk(content=[{'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': 'd respond in a friendly and welcoming manner', 'signature': ''}]}, {'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': 'd respond in a friendly and welcoming manner', 'signature': ''}]}, {'type': 'reasoning', 'summary': [{'type': 'summary_text', 'text': 'd respond in a friendly and welcoming manner', 'signature': ''}]}], additional_kwargs={}, response_metadata={}, id='run--4511c99b-da39-46cc-b701-4cc161e52d9b'))
```

On investigation, I found that the merge works as expected if each streamed message chunk is in the format below:

```text
content=[{'type': 'reasoning', 'summary': {'type': 'summary_text', 'text': 'd respond in a friendly and welcoming manner', 'signature': ''}, 'index': 0}]
or
content=[{'type': 'text', 'text': "hi", 'index': 1}]
```
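
To illustrate why the `index` key matters, here is a minimal pure-Python sketch of index-keyed merging. It is a simplified stand-in written for this issue, not the actual `langchain_core` implementation; the names `merge_blocks` and `merge_into` are hypothetical. It assumes that blocks sharing an integer `index` are combined (string fields concatenated) while blocks without one are simply appended.

```python
from copy import deepcopy


def merge_into(target, source):
    """Merge `source` into `target` in place (toy rules, assumed for illustration)."""
    for key, value in source.items():
        if key not in target:
            target[key] = deepcopy(value)
        elif isinstance(value, str) and key != "type":
            # Streamed text fragments are concatenated; 'type' is kept as-is.
            target[key] += value
        elif isinstance(value, dict):
            merge_into(target[key], value)
        # Anything else (e.g. the integer index) is left unchanged.


def merge_blocks(left, right):
    """Combine two content lists; only blocks with a matching integer 'index' merge."""
    merged = deepcopy(left)
    for block in right:
        idx = block.get("index")
        match = None
        if isinstance(idx, int):
            match = next((b for b in merged if b.get("index") == idx), None)
        if match is None:
            merged.append(deepcopy(block))  # no index or no match: append
        else:
            merge_into(match, block)
    return merged


def reasoning(text, index=None):
    """Build a reasoning block in the shape shown above (hypothetical helper)."""
    block = {"type": "reasoning",
             "summary": {"type": "summary_text", "text": text, "signature": ""}}
    if index is not None:
        block["index"] = index
    return block


indexed = merge_blocks([reasoning("The", 0)], [reasoning(" message", 0)])
unindexed = merge_blocks([reasoning("The")], [reasoning(" message")])
print(len(indexed), len(unindexed))
```

In this sketch the indexed chunks collapse into a single reasoning block whose text is the concatenation of the fragments, while the unindexed chunks pile up as separate blocks.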

Please include `'index': 0` on the reasoning blocks so that they all merge into one block, and `'index': 1` on the next (text) block. Let me know if you need more info on this issue.
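
Until that is fixed, a client-side workaround could look roughly like the sketch below. This is only an assumption about one possible approach, not something the library provides: `add_block_index` is a hypothetical helper that rewrites a chunk's `content` list into the indexed shape requested above (single-element `summary` list unwrapped into a dict, `index` 0 for reasoning blocks and 1 for everything else).

```python
def add_block_index(content):
    """Return a copy of `content` with an 'index' on every block (hypothetical helper)."""
    indexed = []
    for block in content:
        block = dict(block)  # shallow copy so the original chunk is not modified
        summary = block.get("summary")
        if isinstance(summary, list) and len(summary) == 1:
            # Unwrap the one-element summary list into a plain dict.
            block["summary"] = summary[0]
        # Assumption from this issue: reasoning is block 0, the text answer is block 1.
        block.setdefault("index", 0 if block.get("type") == "reasoning" else 1)
        indexed.append(block)
    return indexed


raw = [{"type": "reasoning",
        "summary": [{"type": "summary_text", "text": "The", "signature": ""}]}]
fixed = add_block_index(raw)
print(fixed)
```

The fixed 0/1 mapping only covers the simple reasoning-then-text case described here; a response with more content blocks would need the real block position from the provider.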


A sample that merges correctly:

```python
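from langchain_core.messages import AIMessageChunk
from langchain_core.outputs import ChatGenerationChunk
from langchain_core.outputs.chat_generation import merge_chat_generation_chunks

# Chunks sharing 'index': 0 merge into one reasoning block; 'index': 1 starts the text block.
merge_chat_generation_chunks([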
    ChatGenerationChunk(message=AIMessageChunk(content=[{'type': 'reasoning', 'summary': {'type': 'summary_text', 'text': 'd respond in a friendly and welcoming manner', 'signature': ''}, 'index': 0}], additional_kwargs={}, response_metadata={}, id='run--4511c99b-da39-46cc-b701-4cc161e52d9b')), 
    ChatGenerationChunk(message=AIMessageChunk(content=[{'type': 'reasoning', 'summary': {'type': 'summary_text', 'text': 'd respond in a friendly and welcoming manner', 'signature': ''}, 'index': 0}], additional_kwargs={}, response_metadata={}, id='run--4511c99b-da39-46cc-b701-4cc161e52d9b')), 
    ChatGenerationChunk(message=AIMessageChunk(content=[{'type': 'reasoning', 'summary': {'type': 'summary_text', 'text': 'd respond in a friendly and welcoming manner', 'signature': ''}, 'index': 0}], additional_kwargs={}, response_metadata={}, id='run--4511c99b-da39-46cc-b701-4cc161e52d9b')),
    ChatGenerationChunk(message=AIMessageChunk(content=[{'type': 'text', 'text': "hi", 'index': 1}], additional_kwargs={}, response_metadata={}, id='run--4511c99b-da39-46cc-b701-4cc161e52d9b'))])
```
