VLM concurrency fix #2010
Conversation
popovaan left a comment:
Added minor comments to remove a debug print; the rest looks good to me.
Co-authored-by: Anastasiia Pnevskaia <anastasiia.pnevskaia@intel.com>
ov::Tensor infer(const ov::Tensor& input_idx, bool return_remote_tensor=false);
// We have a getter for the request queue, so we can reserve a request outside of the infer scope.
// The tensor produced by infer is stored in the request and used further in the pipeline, so we can't free it right after the infer call.
std::unique_ptr<CircularBufferQueue<EmbeddingsRequest>>& get_request_queue();
I think users of that class do not actually need the unique_ptr - you don't want to give them the right to e.g. reset the pointer. Change it to return a reference and extract the address of the queue in the places where it is needed.
CircularBufferQueueElementGuard shouldn't take a pointer but just a reference to the queue, since it is not supposed to do anything pointer-related (reset, delete, etc.). @dkalinowski
Sounds good, but since it's not a blocker I would rather postpone making such a change and have this merged to master.
The current commit on this PR has already been checked to some extent for accuracy and gives good results under heavy load, and I would prefer not to rerun the whole thing if possible.
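To make the suggested shape of the API concrete, here is a minimal, self-contained sketch with simplified stand-ins for `CircularBufferQueue`, `CircularBufferQueueElementGuard`, `EmbeddingsRequest`, and the embeddings model. The real openvino.genai classes differ; names such as `acquire`, `release`, and `EmbeddingsModel`, as well as the blocking-pool implementation, are assumptions made only to illustrate the ownership change (the getter returns a reference to the queue, and the guard takes a reference):

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

struct EmbeddingsRequest {
    // In the real code this holds an infer request and the tensor produced by
    // infer(); a plain int stands in for that state here.
    int result = 0;
};

// Simplified stand-in for CircularBufferQueue: a blocking pool of elements.
template <typename T>
class CircularBufferQueue {
public:
    explicit CircularBufferQueue(std::size_t capacity) {
        for (std::size_t i = 0; i < capacity; ++i)
            m_free.push(std::make_unique<T>());
    }
    std::unique_ptr<T> acquire() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_free.empty(); });
        auto element = std::move(m_free.front());
        m_free.pop();
        return element;
    }
    void release(std::unique_ptr<T> element) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_free.push(std::move(element));
        }
        m_cv.notify_one();
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::unique_ptr<T>> m_free;
};

// Guard holding a *reference* to the queue, as suggested above: it only
// borrows an element and returns it on destruction; it never resets or
// deletes the queue itself.
template <typename T>
class CircularBufferQueueElementGuard {
public:
    explicit CircularBufferQueueElementGuard(CircularBufferQueue<T>& queue)
        : m_queue(queue), m_element(queue.acquire()) {}
    ~CircularBufferQueueElementGuard() { m_queue.release(std::move(m_element)); }
    T* get() { return m_element.get(); }
private:
    CircularBufferQueue<T>& m_queue;
    std::unique_ptr<T> m_element;
};

// Hypothetical owner of the queue: it keeps the unique_ptr internally but
// exposes only a reference, so callers cannot reset or replace the queue.
class EmbeddingsModel {
public:
    explicit EmbeddingsModel(std::size_t num_requests)
        : m_queue(std::make_unique<CircularBufferQueue<EmbeddingsRequest>>(num_requests)) {}
    CircularBufferQueue<EmbeddingsRequest>& get_request_queue() { return *m_queue; }
private:
    std::unique_ptr<CircularBufferQueue<EmbeddingsRequest>> m_queue;
};

int main() {
    EmbeddingsModel embedder(/*num_requests=*/2);
    // Reserve a request before running inference and keep the guard alive
    // while the produced result is still used downstream in the pipeline.
    CircularBufferQueueElementGuard<EmbeddingsRequest> guard(embedder.get_request_queue());
    EmbeddingsRequest* request = guard.get();
    request->result = 42;  // stands in for the tensor produced by infer()
    return 0;
}  // the slot is returned to the queue here, after the pipeline is done with it
```

How the guard stores the borrowed element internally is an implementation detail; the point of the comment is only that neither the guard nor other callers should be handed rights over the pointer that owns the queue.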
Merged commit 6545b75 into openvinotoolkit:releases/2025/1
Porting: #2010
Summary of changes:
- `embeds_pos` is moved outside the requests loop, so the whole generated tensor is used
- changes affecting `add_request` and `step`
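A hypothetical illustration of the first point, assuming `embeds_pos` is an offset that advances through the shared embeddings tensor across requests; the variable and field names below are guesses for illustration, not the actual pipeline code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a request consuming a slice of the generated tensor.
struct Request {
    std::size_t num_tokens = 0;     // how many embedding rows this request consumes
    std::size_t embeds_offset = 0;  // where its slice starts in the shared tensor
};

void assign_offsets(std::vector<Request>& requests) {
    std::size_t embeds_pos = 0;              // declared once, outside the requests loop
    for (auto& request : requests) {
        request.embeds_offset = embeds_pos;  // each request gets the next slice
        embeds_pos += request.num_tokens;    // keep advancing through the whole tensor
    }
    // If embeds_pos were reset inside the loop, every request would read from
    // the beginning of the tensor and the rest of it would never be used.
}
```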