Support ChatHistory in VLM Generate Method - Stage 1 #3120

yatarkan wants to merge 21 commits into openvinotoolkit:master
Conversation
Pull request overview
This PR introduces support for `ChatHistory` as an argument in the VLM pipeline's `generate()` method. This is the first stage of the implementation, where the original user chat history is modified with the normalized prompt after calling `generate()`.
Key changes:
- Added new `generate()` overloads that accept `ChatHistory` instead of string prompts
- Introduced `ChatHistoryInternalState` to track chat state across generations
- Refactored common generation logic into helper methods
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/python_tests/test_vlm_pipeline.py | Added test comparing start_chat() vs ChatHistory approaches |
| src/python/py_vlm_pipeline.cpp | Added Python bindings for ChatHistory-based generate methods |
| src/python/openvino_genai/py_openvino_genai.pyi | Updated type stubs with new generate overloads |
| src/cpp/src/visual_language/pipeline_base.hpp | Added abstract methods and helper functions for ChatHistory generation |
| src/cpp/src/visual_language/pipeline.cpp | Implemented ChatHistory generation logic and refactored existing code |
| src/cpp/src/visual_language/inputs_embedder.hpp | Added method to extract last user message from ChatHistory |
| src/cpp/src/visual_language/inputs_embedder.cpp | Implemented last user message extraction with validation |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Added ChatHistory generate overloads for continuous batching |
| src/cpp/src/visual_language/chat_history_state.hpp | New file defining internal state management for ChatHistory |
| src/cpp/src/continuous_batching/pipeline_base.hpp | Added ChatHistory generate method signatures |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Implemented ChatHistory generation for continuous batching |
| src/cpp/src/continuous_batching/pipeline.cpp | Added public API methods for ChatHistory generation |
| src/cpp/src/chat_history.cpp | Added internal state management and updated clear() method |
| src/cpp/include/openvino/genai/visual_language/pipeline.hpp | Added public API declarations for ChatHistory generation |
| src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp | Added ChatHistory generate declarations |
| src/cpp/include/openvino/genai/chat_history.hpp | Added internal state getter/setter methods |
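The "last user message extraction with validation" added in `inputs_embedder.cpp` can be sketched in isolation. The `Message`/`History` stand-ins and the exception types below are illustrative assumptions, not the actual `ChatHistory` class or the pipeline's error handling:

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Minimal stand-ins for a chat message and a chat history.
using Message = std::map<std::string, std::string>;
using History = std::vector<Message>;

// Returns the content of the last message, validating that the history is
// non-empty and that the last message was authored by the user.
std::string get_last_user_message(const History& history) {
    if (history.empty()) {
        throw std::invalid_argument("Chat history is empty");
    }
    const Message& last = history.back();
    if (last.at("role") != "user") {
        throw std::invalid_argument("Last message in chat history must be a user message");
    }
    return last.at("content");
}
```

The validation runs before any prompt normalization, so malformed histories fail fast with a descriptive message rather than producing a garbled templated prompt.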
Comments suppressed due to low confidence (1)

`src/cpp/src/visual_language/pipeline.cpp:1` (`// Copyright (C) 2023-2025 Intel Corporation`):
Missing space after the assignment operator. Should be `res = pipe.generate(...)` for consistency with coding style.
```cpp
encoded_images.emplace_back(m_vision_encoder->encode(image));
}
return embeds;
OPENVINO_ASSERT(images.size() == encoded_images.size(), "Input images size and encoded images size mismatch!");
```
This assertion duplicates logic that was removed from pipeline.cpp line 208. The assertion is now in a different location which could make debugging harder since it's further from where the mismatch might occur. Consider whether this location provides the most helpful error context.
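One way to keep the error context close to its source is to validate immediately after the encoding loop. This is a sketch with stand-in types and a plain exception in place of `OPENVINO_ASSERT`; the function name and loop are illustrative, not the pipeline's actual code:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-ins for the real image and encoded-image types.
struct Image {};
struct EncodedImage {};

EncodedImage encode(const Image&) { return EncodedImage{}; }

std::vector<EncodedImage> encode_all(const std::vector<Image>& images) {
    std::vector<EncodedImage> encoded_images;
    encoded_images.reserve(images.size());
    for (const auto& image : images) {
        encoded_images.emplace_back(encode(image));
    }
    // Checking right after the loop keeps the failure next to its cause,
    // and the message reports both sizes for easier debugging.
    if (images.size() != encoded_images.size()) {
        throw std::runtime_error("Input images size and encoded images size mismatch: " +
                                 std::to_string(images.size()) + " vs " +
                                 std::to_string(encoded_images.size()));
    }
    return encoded_images;
}
```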
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
```cpp
@@ -9,6 +9,8 @@
namespace ov {
namespace genai {
```
Add a brief comment explaining the purpose of this forward declaration, e.g., '// Forward declaration for VLM pipeline state management'
Suggested change:

```cpp
// Forward declaration for ChatHistory internal state management
```
```cpp
std::vector<size_t> image_sequence;
std::vector<size_t> video_sequence;

std::vector<std::pair<std::size_t, std::size_t>> vision_count; // pair<video count, image count>
```
The comment describes the pair order as <video count, image count>, but this field appears unused in the current implementation. Consider removing it if it's not needed for this stage, or add a TODO comment if it's reserved for future use.
Suggested change:

```cpp
// pair<video count, image count>, reserved for future use.
// TODO: Use vision_count to track per-message vision inputs, or remove if it remains unused.
std::vector<std::pair<std::size_t, std::size_t>> vision_count;
```
```cpp
if (processed_history_size == 0) {
    return false;
}
return history_size == processed_history_size + 2; // assistant response and last user messages are added to history manually
```
The magic number `2` should be defined as a named constant to clarify its meaning, e.g. `static constexpr size_t EXPECTED_NEW_MESSAGES = 2;`.
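With a named constant, the check quoted above could read as follows. The function name and constant name are illustrative, not taken from the PR:

```cpp
#include <cstddef>

// Each completed turn appends two messages to the history: the last user
// message and the assistant response. Naming the constant documents that.
static constexpr std::size_t kMessagesAddedPerTurn = 2;

// Illustrative stand-in for the processed-history check.
bool is_history_fully_processed(std::size_t history_size, std::size_t processed_history_size) {
    if (processed_history_size == 0) {
        return false;
    }
    return history_size == processed_history_size + kMessagesAddedPerTurn;
}
```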
```
:param prompt: input prompt
:type prompt: str
```
The parameter documentation incorrectly refers to 'prompt' instead of 'history'. Update to match the actual parameter name.
```cpp
}

// Update original user chat history with normalized last user message
history.last()["content"] = unified_prompt;
```
Modifying the user's input ChatHistory object is a side effect that may be unexpected. Consider documenting this behavior clearly in the method's docstring or exploring alternatives to avoid mutating the input parameter in Stage 2.
```cpp
// Update original user chat history with normalized last user message
history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
chat_history_state->vision_count.emplace_back(std::make_pair(video_sequence.size(), image_sequence.size()));

std::string templated_history = m_tokenizer.apply_chat_template(history, true);
```
Same as in pipeline.cpp: modifying the input ChatHistory object is a side effect that should be documented or reconsidered in future stages.
Suggested change:

```cpp
// Update a local copy of user chat history with normalized last user message
auto normalized_history = history;
normalized_history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
chat_history_state->vision_count.emplace_back(std::make_pair(video_sequence.size(), image_sequence.size()));

std::string templated_history = m_tokenizer.apply_chat_template(normalized_history, true);
```
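The copy-before-mutate approach can be demonstrated in isolation. The `History` stand-in below is a plain vector of message maps and the template application is a trivial placeholder, not the real `ChatHistory` or tokenizer API:

```cpp
#include <map>
#include <string>
#include <vector>

// Minimal stand-in for ChatHistory: a list of role/content message maps.
using Message = std::map<std::string, std::string>;
using History = std::vector<Message>;

// Builds the templated prompt without mutating the caller's history.
std::string normalize_last_user_message(const History& history, const std::string& unified_prompt) {
    History normalized_history = history;  // copy, so the caller's object stays untouched
    normalized_history.back()["content"] = unified_prompt;
    // Placeholder for applying the chat template to the normalized copy.
    std::string templated;
    for (const auto& message : normalized_history) {
        templated += message.at("role") + ": " + message.at("content") + "\n";
    }
    return templated;
}
```

The copy costs one history duplication per `generate()` call, but it removes the surprising side effect on the user's object, which is the direction the reviewer suggests for Stage 2.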
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
```
:param prompt: input prompt
:type prompt: str
```
The docstrings for the new ChatHistory-based generate overloads incorrectly reference 'prompt' parameter in the documentation. These overloads accept 'history' as the first parameter instead. Update the parameter documentation to reflect the correct parameter name and type.
Suggested change:

```
:param history: input chat history
:type history: ChatHistory
```
```cpp
history.last()["content"] = unified_prompt;

chat_history_state->image_sequence.insert(chat_history_state->image_sequence.end(), image_sequence.begin(), image_sequence.end());
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
```
The order of arguments in the pair (video_sequence.size(), image_sequence.size()) is inconsistent with the comment on line 22 of chat_history_state.hpp which defines the pair as <video count, image count>. While this matches the comment, the inconsistency with the order used elsewhere (image_id before video_id in the struct) could lead to confusion. Consider documenting why video comes before image in this specific case or maintaining consistent ordering throughout.
Suggested change:

```cpp
chat_history_state->video_sequence.insert(chat_history_state->video_sequence.end(), video_sequence.begin(), video_sequence.end());
// NOTE: vision_count stores pair<video_count, image_count> as defined in chat_history_state.hpp.
```
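An alternative that removes the ordering question entirely is a small named struct instead of `std::pair`. This is a sketch, not the PR's code; the struct and function names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Named fields make the video/image order self-documenting,
// unlike pair<size_t, size_t> whose meaning lives in a comment.
struct VisionCount {
    std::size_t videos;
    std::size_t images;
};

// Illustrative stand-in for recording per-turn vision inputs.
void record_turn(std::vector<VisionCount>& counts,
                 const std::vector<std::size_t>& video_sequence,
                 const std::vector<std::size_t>& image_sequence) {
    counts.push_back(VisionCount{video_sequence.size(), image_sequence.size()});
}
```

With named fields, a swapped argument order at the call site becomes visible in code review instead of silently producing transposed counts.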
|
Closing in favor of #3182
## Description

This PR enables `ChatHistory` argument in `generate()` method of VLM pipeline. It introduces new abstractions (`ChatHistoryInternalState`, `VisionRegistry`, `VLMChatContext`) for easier processing of external chat history with images/videos. The PR is based on #3120, but it fully reworks internal state integration and overcomes the user history mutation limitation, so it might be more reasonable to review the diff against master rather than against the intermediate PR.

CVS-175244

## Checklist:

- [x] Tests have been updated or added to cover the new code.
- [x] This patch fully addresses the ticket.
- [ ] I have made corresponding changes to the documentation.
Description

This PR enables `ChatHistory` argument in `generate()` method of VLM pipeline. This is the first stage of the ticket, assuming that the original user chat history is modified with the normalized prompt after calling `generate()`. Follow-up PRs will overcome this limitation.

CVS-175244

Checklist: