Support ChatHistory in VLM Generate Method - Stage 1 #3120

yatarkan wants to merge 21 commits into openvinotoolkit:master
Conversation
Pull request overview
This PR introduces support for `ChatHistory` as an argument in the VLM pipeline's `generate()` method. This is the first stage of the implementation, where the original user chat history is modified with the normalized prompt after calling `generate()`.
Key changes:
- Added new `generate()` overloads that accept `ChatHistory` instead of string prompts
- Introduced `ChatHistoryInternalState` to track chat state across generations
- Refactored common generation logic into helper methods
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/python_tests/test_vlm_pipeline.py | Added test comparing start_chat() vs ChatHistory approaches |
| src/python/py_vlm_pipeline.cpp | Added Python bindings for ChatHistory-based generate methods |
| src/python/openvino_genai/py_openvino_genai.pyi | Updated type stubs with new generate overloads |
| src/cpp/src/visual_language/pipeline_base.hpp | Added abstract methods and helper functions for ChatHistory generation |
| src/cpp/src/visual_language/pipeline.cpp | Implemented ChatHistory generation logic and refactored existing code |
| src/cpp/src/visual_language/inputs_embedder.hpp | Added method to extract last user message from ChatHistory |
| src/cpp/src/visual_language/inputs_embedder.cpp | Implemented last user message extraction with validation |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Added ChatHistory generate overloads for continuous batching |
| src/cpp/src/visual_language/chat_history_state.hpp | New file defining internal state management for ChatHistory |
| src/cpp/src/continuous_batching/pipeline_base.hpp | Added ChatHistory generate method signatures |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Implemented ChatHistory generation for continuous batching |
| src/cpp/src/continuous_batching/pipeline.cpp | Added public API methods for ChatHistory generation |
| src/cpp/src/chat_history.cpp | Added internal state management and updated clear() method |
| src/cpp/include/openvino/genai/visual_language/pipeline.hpp | Added public API declarations for ChatHistory generation |
| src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp | Added ChatHistory generate declarations |
| src/cpp/include/openvino/genai/chat_history.hpp | Added internal state getter/setter methods |
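The "last user message extraction with validation" added in `inputs_embedder.cpp` can be sketched in isolation. The `Message`/`History` stand-ins and the exception types below are illustrative assumptions, not the actual `ChatHistory` class or the pipeline's error handling:

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Minimal stand-ins for a chat message and a chat history.
using Message = std::map<std::string, std::string>;
using History = std::vector<Message>;

// Returns the content of the last message, validating that the history is
// non-empty and that the last message was authored by the user.
std::string get_last_user_message(const History& history) {
    if (history.empty()) {
        throw std::invalid_argument("Chat history is empty");
    }
    const Message& last = history.back();
    if (last.at("role") != "user") {
        throw std::invalid_argument("Last message in chat history must be a user message");
    }
    return last.at("content");
}
```

The validation runs before any prompt normalization, so malformed histories fail fast with a descriptive message rather than producing a garbled templated prompt.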
Comments suppressed due to low confidence (1)

`src/cpp/src/visual_language/pipeline.cpp:1` (`// Copyright (C) 2023-2025 Intel Corporation`):
Missing space after the assignment operator. Should be `res = pipe.generate(...)` for consistency with coding style.
```cpp
encoded_images.emplace_back(m_vision_encoder->encode(image));
}
return embeds;
OPENVINO_ASSERT(images.size() == encoded_images.size(), "Input images size and encoded images size mismatch!");
```
This assertion duplicates logic that was removed from pipeline.cpp line 208. The assertion is now in a different location which could make debugging harder since it's further from where the mismatch might occur. Consider whether this location provides the most helpful error context.
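One way to keep the error context close to its source is to validate immediately after the encoding loop. This is a sketch with stand-in types and a plain exception in place of `OPENVINO_ASSERT`; the function name and loop are illustrative, not the pipeline's actual code:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-ins for the real image and encoded-image types.
struct Image {};
struct EncodedImage {};

EncodedImage encode(const Image&) { return EncodedImage{}; }

std::vector<EncodedImage> encode_all(const std::vector<Image>& images) {
    std::vector<EncodedImage> encoded_images;
    encoded_images.reserve(images.size());
    for (const auto& image : images) {
        encoded_images.emplace_back(encode(image));
    }
    // Checking right after the loop keeps the failure next to its cause,
    // and the message reports both sizes for easier debugging.
    if (images.size() != encoded_images.size()) {
        throw std::runtime_error("Input images size and encoded images size mismatch: " +
                                 std::to_string(images.size()) + " vs " +
                                 std::to_string(encoded_images.size()));
    }
    return encoded_images;
}
```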
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
```cpp
@@ -9,6 +9,8 @@
namespace ov {
namespace genai {
```
Add a brief comment explaining the purpose of this forward declaration, e.g., '// Forward declaration for VLM pipeline state management'
Suggested change:

```cpp
// Forward declaration for ChatHistory internal state management
```
```cpp
std::vector<size_t> image_sequence;
std::vector<size_t> video_sequence;

std::vector<std::pair<std::size_t, std::size_t>> vision_count; // pair<video count, image count>
```
The comment describes the pair order as <video count, image count>, but this field appears unused in the current implementation. Consider removing it if it's not needed for this stage, or add a TODO comment if it's reserved for future use.
Suggested change:

```cpp
// pair<video count, image count>, reserved for future use.
// TODO: Use vision_count to track per-message vision inputs, or remove if it remains unused.
std::vector<std::pair<std::size_t, std::size_t>> vision_count;
```
```cpp
if (processed_history_size == 0) {
    return false;
}
return history_size == processed_history_size + 2; // assistant response and last user messages are added to history manually
```
The magic number `2` should be defined as a named constant to clarify its meaning, e.g. `static constexpr size_t EXPECTED_NEW_MESSAGES = 2;`.
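With a named constant, the check quoted above could read as follows. The function name and constant name are illustrative, not taken from the PR:

```cpp
#include <cstddef>

// Each completed turn appends two messages to the history: the last user
// message and the assistant response. Naming the constant documents that.
static constexpr std::size_t kMessagesAddedPerTurn = 2;

// Illustrative stand-in for the processed-history check.
bool is_history_fully_processed(std::size_t history_size, std::size_t processed_history_size) {
    if (processed_history_size == 0) {
        return false;
    }
    return history_size == processed_history_size + kMessagesAddedPerTurn;
}
```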
```
:param prompt: input prompt
:type prompt: str
```
The parameter documentation incorrectly refers to 'prompt' instead of 'history'. Update to match the actual parameter name.
```cpp
}

// Update original user chat history with normalized last user message
history.last()["content"] = unified_prompt;
```
Modifying the user's input ChatHistory object is a side effect that may be unexpected. Consider documenting this behavior clearly in the method's docstring or exploring alternatives to avoid mutating the input parameter in Stage 2.
```cpp
// Update original user chat history with normalized last user message
history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
chat_history_state->vision_count.emplace_back(std::make_pair(video_sequence.size(), image_sequence.size()));

std::string templated_history = m_tokenizer.apply_chat_template(history, true);
```
Same as in pipeline.cpp: modifying the input ChatHistory object is a side effect that should be documented or reconsidered in future stages.
Suggested change:

```cpp
// Update a local copy of user chat history with normalized last user message
auto normalized_history = history;
normalized_history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
chat_history_state->vision_count.emplace_back(std::make_pair(video_sequence.size(), image_sequence.size()));

std::string templated_history = m_tokenizer.apply_chat_template(normalized_history, true);
```
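The copy-before-mutate approach can be demonstrated in isolation. The `History` stand-in below is a plain vector of message maps and the template application is a trivial placeholder, not the real `ChatHistory` or tokenizer API:

```cpp
#include <map>
#include <string>
#include <vector>

// Minimal stand-in for ChatHistory: a list of role/content message maps.
using Message = std::map<std::string, std::string>;
using History = std::vector<Message>;

// Builds the templated prompt without mutating the caller's history.
std::string normalize_last_user_message(const History& history, const std::string& unified_prompt) {
    History normalized_history = history;  // copy, so the caller's object stays untouched
    normalized_history.back()["content"] = unified_prompt;
    // Placeholder for applying the chat template to the normalized copy.
    std::string templated;
    for (const auto& message : normalized_history) {
        templated += message.at("role") + ": " + message.at("content") + "\n";
    }
    return templated;
}
```

The copy costs one history duplication per `generate()` call, but it removes the surprising side effect on the user's object, which is the direction the reviewer suggests for Stage 2.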
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
```
:param prompt: input prompt
:type prompt: str
```
The docstrings for the new ChatHistory-based generate overloads incorrectly reference 'prompt' parameter in the documentation. These overloads accept 'history' as the first parameter instead. Update the parameter documentation to reflect the correct parameter name and type.
Suggested change:

```
:param history: input chat history
:type history: ChatHistory
```
```cpp
history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
```
The order of arguments in the pair (video_sequence.size(), image_sequence.size()) is inconsistent with the comment on line 22 of chat_history_state.hpp which defines the pair as <video count, image count>. While this matches the comment, the inconsistency with the order used elsewhere (image_id before video_id in the struct) could lead to confusion. Consider documenting why video comes before image in this specific case or maintaining consistent ordering throughout.
Suggested change:

```cpp
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
// NOTE: vision_count stores pair<video_count, image_count> as defined in chat_history_state.hpp.
```
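An alternative that removes the ordering question entirely is a small named struct instead of `std::pair`. This is a sketch, not the PR's code; the struct and function names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Named fields make the video/image order self-documenting,
// unlike pair<size_t, size_t> whose meaning lives in a comment.
struct VisionCount {
    std::size_t videos;
    std::size_t images;
};

// Illustrative stand-in for recording per-turn vision inputs.
void record_turn(std::vector<VisionCount>& counts,
                 const std::vector<std::size_t>& video_sequence,
                 const std::vector<std::size_t>& image_sequence) {
    counts.push_back(VisionCount{video_sequence.size(), image_sequence.size()});
}
```

With named fields, a swapped argument order at the call site becomes visible in code review instead of silently producing transposed counts.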
|
Closing in favor of #3182
## Description

This PR enables `ChatHistory` argument in `generate()` method of VLM pipeline. It introduces new abstractions (`ChatHistoryInternalState`, `VisionRegistry`, `VLMChatContext`) for easier processing of external chat history with images/videos. The PR is based on #3120, but it fully reworks internal state integration and overcomes the user history mutation limitation, so it might be more reasonable to review the diff against master rather than against the intermediate PR.

CVS-175244

## Checklist:

- [x] Tests have been updated or added to cover the new code.
- [x] This patch fully addresses the ticket.
- [ ] I have made corresponding changes to the documentation.
Description

This PR enables `ChatHistory` argument in `generate()` method of VLM pipeline. This is the first stage of the ticket, assuming that the original user chat history is modified with the normalized prompt after calling `generate()`. Follow-up PRs will overcome this limitation.

CVS-175244

Checklist: