
Support ChatHistory in VLM Generate Method - Stage 1#3120

Closed
yatarkan wants to merge 21 commits into openvinotoolkit:master from yatarkan:yt/chat-history-vlm

Conversation

@yatarkan (Contributor)

Description

This PR enables the `ChatHistory` argument in the `generate()` method of the VLM pipeline. It is the first stage of the ticket: for now, the original user chat history is modified in place with the normalized prompt after `generate()` is called. Follow-up PRs will remove this limitation.

CVS-175244
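For illustration, the Stage-1 semantics described above can be sketched with a plain-Python mock rather than the real `VLMPipeline` API (the `generate` and `normalize_prompt` helpers and the message shape here are assumptions for demonstration, not the library's actual signatures):

```python
# Mock of the Stage-1 behavior: generate() normalizes the last user
# message and writes it back into the caller's chat history in place.

def normalize_prompt(text):
    # Placeholder normalization, e.g. prepending an image tag (assumption).
    return "<image>\n" + text

def generate(history):
    # history is a list of {"role": ..., "content": ...} dicts,
    # mirroring ChatHistory messages.
    last_user = history[-1]
    assert last_user["role"] == "user", "last message must be from the user"
    unified_prompt = normalize_prompt(last_user["content"])
    # Stage-1 limitation: the caller's history is mutated here.
    last_user["content"] = unified_prompt
    return "generated answer for: " + unified_prompt

history = [{"role": "user", "content": "Describe the image"}]
answer = generate(history)
# After the call, the original history now holds the normalized prompt.
print(history[-1]["content"])  # → <image>\nDescribe the image
```

This mutation of the caller's history is exactly the limitation the follow-up PRs address.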

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

@github-actions bot added labels: `category: visual language`, `category: continuous batching`, `category: Python API`, `category: CPP API`, `no-match-files`, `category: GGUF` — Dec 16, 2025

Copilot AI left a comment


Pull request overview

This PR introduces support for ChatHistory as an argument in the VLM pipeline's generate() method. This is the first stage of implementation where the original user chat history is modified with the normalized prompt after calling generate.

Key changes:

  • Added new generate() overloads that accept ChatHistory instead of string prompts
  • Introduced ChatHistoryInternalState to track chat state across generations
  • Refactored common generation logic into helper methods

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/python_tests/test_vlm_pipeline.py | Added test comparing start_chat() vs ChatHistory approaches |
| src/python/py_vlm_pipeline.cpp | Added Python bindings for ChatHistory-based generate methods |
| src/python/openvino_genai/py_openvino_genai.pyi | Updated type stubs with new generate overloads |
| src/cpp/src/visual_language/pipeline_base.hpp | Added abstract methods and helper functions for ChatHistory generation |
| src/cpp/src/visual_language/pipeline.cpp | Implemented ChatHistory generation logic and refactored existing code |
| src/cpp/src/visual_language/inputs_embedder.hpp | Added method to extract last user message from ChatHistory |
| src/cpp/src/visual_language/inputs_embedder.cpp | Implemented last user message extraction with validation |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Added ChatHistory generate overloads for continuous batching |
| src/cpp/src/visual_language/chat_history_state.hpp | New file defining internal state management for ChatHistory |
| src/cpp/src/continuous_batching/pipeline_base.hpp | Added ChatHistory generate method signatures |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Implemented ChatHistory generation for continuous batching |
| src/cpp/src/continuous_batching/pipeline.cpp | Added public API methods for ChatHistory generation |
| src/cpp/src/chat_history.cpp | Added internal state management and updated clear() method |
| src/cpp/include/openvino/genai/visual_language/pipeline.hpp | Added public API declarations for ChatHistory generation |
| src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp | Added ChatHistory generate declarations |
| src/cpp/include/openvino/genai/chat_history.hpp | Added internal state getter/setter methods |
Comments suppressed due to low confidence (1)

src/cpp/src/visual_language/pipeline.cpp:1

  • Missing space after the assignment operator; should be `res = pipe.generate(...)` for consistency with the coding style.

```cpp
// Copyright (C) 2023-2025 Intel Corporation
```


```cpp
    encoded_images.emplace_back(m_vision_encoder->encode(image));
}
OPENVINO_ASSERT(images.size() == encoded_images.size(), "Input images size and encoded images size mismatch!");
return embeds;
```

Copilot AI Dec 16, 2025


This assertion duplicates logic that was removed from pipeline.cpp line 208. The assertion is now in a different location which could make debugging harder since it's further from where the mismatch might occur. Consider whether this location provides the most helpful error context.

@yatarkan yatarkan requested a review from sgonorov as a code owner December 22, 2025 11:34
@yatarkan yatarkan requested a review from popovaan as a code owner December 22, 2025 11:34
@yatarkan yatarkan requested a review from Wovchena December 22, 2025 11:36
@Wovchena Wovchena requested a review from Copilot December 22, 2025 11:39

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.



Copilot AI review requested due to automatic review settings December 23, 2025 13:30

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.



```cpp
@@ -9,6 +9,8 @@
namespace ov {
namespace genai {
```


Copilot AI Dec 23, 2025


Add a brief comment explaining the purpose of this forward declaration, e.g., '// Forward declaration for VLM pipeline state management'

Suggested change:
```cpp
// Forward declaration for ChatHistory internal state management
```

```cpp
std::vector<size_t> image_sequence;
std::vector<size_t> video_sequence;

std::vector<std::pair<std::size_t, std::size_t>> vision_count; // pair<video count, image count>
```

Copilot AI Dec 23, 2025


The comment describes the pair order as <video count, image count>, but this field appears unused in the current implementation. Consider removing it if it's not needed for this stage, or add a TODO comment if it's reserved for future use.

Suggested change:
```cpp
// pair<video count, image count>, reserved for future use.
// TODO: Use vision_count to track per-message vision inputs, or remove if it remains unused.
std::vector<std::pair<std::size_t, std::size_t>> vision_count;
```

```cpp
if (processed_history_size == 0) {
    return false;
}
return history_size == processed_history_size + 2; // assistant response and last user messages are added to history manually
```

Copilot AI Dec 23, 2025


The magic number '2' should be defined as a named constant to clarify its meaning, e.g., `static constexpr size_t EXPECTED_NEW_MESSAGES = 2;`.

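The continuation check this comment refers to can be sketched in Python with the suggested named constant (the function and constant names are illustrative, not the PR's actual identifiers):

```python
# Each completed generate() turn adds two messages relative to the
# already-processed prefix: the assistant response and the next user message.
EXPECTED_NEW_MESSAGES = 2

def is_history_continuation(history_size, processed_history_size):
    # An empty processed prefix means this is a fresh conversation.
    if processed_history_size == 0:
        return False
    return history_size == processed_history_size + EXPECTED_NEW_MESSAGES

print(is_history_continuation(4, 2))  # → True
print(is_history_continuation(5, 2))  # → False
print(is_history_continuation(2, 0))  # → False
```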
Comment on lines +4033 to +4034:
```
:param prompt: input prompt
:type prompt: str
```

Copilot AI Dec 23, 2025


The parameter documentation incorrectly refers to 'prompt' instead of 'history'. Update to match the actual parameter name.

Copilot uses AI. Check for mistakes.
```cpp
}

// Update original user chat history with normalized last user message
history.last()["content"] = unified_prompt;
```

Copilot AI Dec 23, 2025


Modifying the user's input ChatHistory object is a side effect that may be unexpected. Consider documenting this behavior clearly in the method's docstring or exploring alternatives to avoid mutating the input parameter in Stage 2.

Comment on lines +446 to +453:
```cpp
// Update original user chat history with normalized last user message
history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
chat_history_state->vision_count.emplace_back(std::make_pair(video_sequence.size(), image_sequence.size()));

std::string templated_history = m_tokenizer.apply_chat_template(history, true);
```

Copilot AI Dec 23, 2025


Same as in pipeline.cpp: modifying the input ChatHistory object is a side effect that should be documented or reconsidered in future stages.

Suggested change:
```cpp
// Update a local copy of user chat history with normalized last user message
auto normalized_history = history;
normalized_history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
chat_history_state->vision_count.emplace_back(std::make_pair(video_sequence.size(), image_sequence.size()));

std::string templated_history = m_tokenizer.apply_chat_template(normalized_history, true);
```
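The copy-before-normalize pattern suggested above can be demonstrated with a small self-contained Python sketch, where `copy.deepcopy` stands in for copying a `ChatHistory` and `apply_chat_template` is a stand-in for the tokenizer call (all names here are illustrative assumptions):

```python
import copy

def apply_chat_template(history):
    # Stand-in for m_tokenizer.apply_chat_template(history, true).
    return "\n".join(f"{m['role']}: {m['content']}" for m in history)

def build_templated_history(history, unified_prompt):
    # Normalize a local copy so the caller's history is left untouched.
    normalized_history = copy.deepcopy(history)
    normalized_history[-1]["content"] = unified_prompt
    return apply_chat_template(normalized_history)

history = [{"role": "user", "content": "What is in this picture?"}]
templated = build_templated_history(history, "<image>\nWhat is in this picture?")
print(history[-1]["content"])  # → What is in this picture? (original left intact)
```

Copying avoids the side effect on the caller's object at the cost of duplicating the history for each call; a later stage could avoid the copy by tracking normalization in internal state instead.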
Copilot AI review requested due to automatic review settings December 24, 2025 12:24

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.



Comment on lines +4033 to +4034:
```
:param prompt: input prompt
:type prompt: str
```

Copilot AI Dec 24, 2025


The docstrings for the new ChatHistory-based generate overloads incorrectly reference 'prompt' parameter in the documentation. These overloads accept 'history' as the first parameter instead. Update the parameter documentation to reflect the correct parameter name and type.

Suggested change:
```
:param history: input chat history
:type history: ChatHistory
```

```cpp
history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
```

Copilot AI Dec 24, 2025


The pair order `(video_sequence.size(), image_sequence.size())` matches the `<video count, image count>` comment on line 22 of chat_history_state.hpp, but it is inconsistent with the ordering used elsewhere (image before video in the struct), which could lead to confusion. Consider documenting why video comes first here, or maintaining a consistent ordering throughout.

Suggested change:
```cpp
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
// NOTE: vision_count stores pair<video_count, image_count> as defined in chat_history_state.hpp.
```
@yatarkan (Contributor, Author)

Closing in favor of #3182

@yatarkan yatarkan closed this Jan 16, 2026
github-merge-queue bot pushed a commit that referenced this pull request Jan 20, 2026
## Description
This PR enables the `ChatHistory` argument in the `generate()` method of the VLM pipeline. It introduces new abstractions (`ChatHistoryInternalState`, `VisionRegistry`, `VLMChatContext`) for easier processing of external chat history with images/videos.
The PR is based on #3120, but it fully reworks internal state integration and overcomes the user history mutation limitation, so it might be more reasonable to review the diff against master rather than against the intermediate PR.

CVS-175244

## Checklist:
- [x] Tests have been updated or added to cover the new code.
- [x] This patch fully addresses the ticket.
- [ ] I have made corresponding changes to the documentation.