VLM concurrency fix #2010
Conversation
popovaan left a comment:
Added minor comments to remove a debug print; the rest looks good to me.
Co-authored-by: Anastasiia Pnevskaia <anastasiia.pnevskaia@intel.com>
ov::Tensor infer(const ov::Tensor& input_idx, bool return_remote_tensor=false);
// We have a getter for the request queue, so we can reserve a request outside of the infer scope.
// The tensor produced by infer is stored in the request and used further in the pipeline, so we can't free it right after the infer call.
std::unique_ptr<CircularBufferQueue<EmbeddingsRequest>>& get_request_queue();
I think users of that class do not actually need the unique_ptr - you don't want to give them the right to e.g. reset the pointer. Change it to return a reference and extract the address of the queue in the places where it is needed.
CircularBufferQueueElementGuard shouldn't take a pointer but just a reference to the queue, since it is not supposed to do anything pointer-related (reset, delete, etc.). @dkalinowski
Sounds good, but since it's not a blocker I would rather postpone making such a change and have this merged to master.
The current commit on this PR has already been checked to some extent for accuracy and gives good results under heavy load, and I would prefer not to rerun the whole thing if possible.
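To make the suggested shape of the API concrete, here is a minimal, self-contained sketch with simplified stand-ins for `CircularBufferQueue`, `CircularBufferQueueElementGuard`, `EmbeddingsRequest`, and the embeddings model. The real openvino.genai classes differ; names such as `acquire`, `release`, and `EmbeddingsModel`, as well as the blocking-pool implementation, are assumptions made only to illustrate the ownership change (the getter returns a reference to the queue, and the guard takes a reference):

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

struct EmbeddingsRequest {
    // In the real code this holds an infer request and the tensor produced by
    // infer(); a plain int stands in for that state here.
    int result = 0;
};

// Simplified stand-in for CircularBufferQueue: a blocking pool of elements.
template <typename T>
class CircularBufferQueue {
public:
    explicit CircularBufferQueue(std::size_t capacity) {
        for (std::size_t i = 0; i < capacity; ++i)
            m_free.push(std::make_unique<T>());
    }
    std::unique_ptr<T> acquire() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_free.empty(); });
        auto element = std::move(m_free.front());
        m_free.pop();
        return element;
    }
    void release(std::unique_ptr<T> element) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_free.push(std::move(element));
        }
        m_cv.notify_one();
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::unique_ptr<T>> m_free;
};

// Guard holding a *reference* to the queue, as suggested above: it only
// borrows an element and returns it on destruction; it never resets or
// deletes the queue itself.
template <typename T>
class CircularBufferQueueElementGuard {
public:
    explicit CircularBufferQueueElementGuard(CircularBufferQueue<T>& queue)
        : m_queue(queue), m_element(queue.acquire()) {}
    ~CircularBufferQueueElementGuard() { m_queue.release(std::move(m_element)); }
    T* get() { return m_element.get(); }
private:
    CircularBufferQueue<T>& m_queue;
    std::unique_ptr<T> m_element;
};

// Hypothetical owner of the queue: it keeps the unique_ptr internally but
// exposes only a reference, so callers cannot reset or replace the queue.
class EmbeddingsModel {
public:
    explicit EmbeddingsModel(std::size_t num_requests)
        : m_queue(std::make_unique<CircularBufferQueue<EmbeddingsRequest>>(num_requests)) {}
    CircularBufferQueue<EmbeddingsRequest>& get_request_queue() { return *m_queue; }
private:
    std::unique_ptr<CircularBufferQueue<EmbeddingsRequest>> m_queue;
};

int main() {
    EmbeddingsModel embedder(/*num_requests=*/2);
    // Reserve a request before running inference and keep the guard alive
    // while the produced result is still used downstream in the pipeline.
    CircularBufferQueueElementGuard<EmbeddingsRequest> guard(embedder.get_request_queue());
    EmbeddingsRequest* request = guard.get();
    request->result = 42;  // stands in for the tensor produced by infer()
    return 0;
}  // the slot is returned to the queue here, after the pipeline is done with it
```

How the guard stores the borrowed element internally is an implementation detail; the point of the comment is only that neither the guard nor other callers should be handed rights over the pointer that owns the queue.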
Merged commit 6545b75 into openvinotoolkit:releases/2025/1
Porting: #2010
Summary of changes:
- `embeds_pos` is moved outside the requests loop, so the whole generated tensor is used
- changes affecting `add_request` and `step`
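A hypothetical illustration of the first point, assuming `embeds_pos` is an offset that advances through the shared embeddings tensor across requests; the variable and field names below are guesses for illustration, not the actual pipeline code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a request consuming a slice of the generated tensor.
struct Request {
    std::size_t num_tokens = 0;     // how many embedding rows this request consumes
    std::size_t embeds_offset = 0;  // where its slice starts in the shared tensor
};

void assign_offsets(std::vector<Request>& requests) {
    std::size_t embeds_pos = 0;              // declared once, outside the requests loop
    for (auto& request : requests) {
        request.embeds_offset = embeds_pos;  // each request gets the next slice
        embeds_pos += request.num_tokens;    // keep advancing through the whole tensor
    }
    // If embeds_pos were reset inside the loop, every request would read from
    // the beginning of the tensor and the rest of it would never be used.
}
```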