feat(multimodal): add media caching support with UUID-based lookup #4
vincentzed wants to merge 1 commit into main
Conversation
- Introduce global media cache for images, audio, and video
- Add UUID field to multimodal content part parameters
- Modify parsers to accept optional UUIDs for caching
- Implement synchronous and asynchronous cache-aware fetching
- Create MediaCache class with LRU eviction policy
- Support content deduplication via UUID or URL hashing
Original Ticket:
Motivation.
The current processing flow of media inputs (image/video/audio/embeddings) has several inefficiencies when media is used repeatedly (e.g. multi-turn conversations, or when the same media is used in different requests). When a media item is provided in the input, the current flow looks like this:
1. MediaConnector runs to download the input (if a remote URL is provided), incurring network delays.
2. MediaConnector loads the downloaded bytes into Python objects (e.g. PIL.Image, audio/video decoding), incurring CPU delays.
3. BaseMultiModalProcessor then hashes the entire media using its full bytes to do a cache lookup in ProcessingCache, incurring CPU delays.
All three steps run before any cache is checked, so repeated media still incur these costs even when there is a cache hit later on. Additionally, decoding long videos and hashing all bytes can be slow (milliseconds to tens of milliseconds) and inefficient. For media that is sent repeatedly, we want to bypass all three steps.
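The sequential cost described above can be sketched in a few lines; `fetch_media`, `decode_media`, and the dict-based stand-in for ProcessingCache below are illustrative assumptions, not vLLM's actual API:

```python
import hashlib

def fetch_media(url: str) -> bytes:
    # Stand-in for MediaConnector's download step (step 1: network delay).
    return b"raw media bytes for " + url.encode()

def decode_media(raw: bytes) -> str:
    # Stand-in for loading bytes into Python objects (step 2: CPU delay).
    return raw.decode().upper()

def process_media(url: str, processing_cache: dict) -> str:
    raw = fetch_media(url)                 # step 1 always runs
    obj = decode_media(raw)                # step 2 always runs
    key = hashlib.sha256(raw).hexdigest()  # step 3: hash the full bytes
    if key in processing_cache:            # cache is consulted only now
        return processing_cache[key]
    processing_cache[key] = obj
    return obj
```

Note that even on a repeat request, steps 1-3 all execute before the cache hit is discovered, which is exactly the inefficiency this proposal targets.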
Proposed Change.
Proposal
Add a user-provided "uuid" field to media inputs as the media's identifier and hash key.
Example:
{"type": "image_url", "image_url": {"url": image_url}, "uuid": "abcde"}
{"type": "image_embeds", "image_embeds": "xyz", "uuid": "abcde"}
For URL inputs without a uuid set, hash only the URL instead of the entire bytes. For other inputs, we fall back to full hashing of the bytes. We should then allow empty media inputs if UUIDs are provided (i.e. if the client expects a cache hit).
During input processing, first check ProcessingCache to see if any media is already cached, and bypass MediaConnector altogether on cache hit.
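A minimal sketch of this cache-first ordering (the content-part layout, `fetch_and_decode`, and the plain-dict cache are assumptions for illustration, not the PR's actual code):

```python
import hashlib

def fetch_and_decode(url: str) -> str:
    # Stand-in for the slow MediaConnector path (download + decode).
    return "decoded:" + url

def resolve_media(part: dict, media_cache: dict) -> str:
    uuid = part.get("uuid")
    url = part.get("image_url", {}).get("url")
    if uuid:
        key = uuid
    elif url:
        key = hashlib.sha256(url.encode()).hexdigest()
    else:
        key = None
    if key is not None and key in media_cache:
        return media_cache[key]  # cache hit: MediaConnector is bypassed
    if url is None:
        # Empty media input with a uuid but no cache hit.
        raise ValueError("expected a cache hit but no URL was provided")
    obj = fetch_and_decode(url)
    if key is not None:
        media_cache[key] = obj
    return obj
```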
Potential Issues
Compatibility with OAI API
OAI API currently doesn’t support user-provided uuids, so adding this field would introduce more discrepancy between the APIs. However, given that this is an additional field, the impact should be minimal.
Race Condition Between Cache Lookup and Cache Read
(Needs confirmation). Given that the cache is checked when parsing contents but the lookup is performed later, we could potentially have a cache miss after the initial check (i.e. eviction between lookup & cache read). We could just fail the one single request if that happens.
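The fail-fast behaviour suggested here could look like the following sketch (`read_cached_media` is a hypothetical helper, not existing vLLM code):

```python
def read_cached_media(key: str, media_cache: dict):
    # The entry observed during content parsing may have been evicted by
    # the time the actual read happens; fail only this request if so.
    try:
        return media_cache[key]
    except KeyError:
        raise ValueError(
            f"media '{key}' was evicted between cache lookup and read")
```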
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀

/gemini review
Code Review
This pull request introduces a global media cache to improve performance for repeated media inputs. The changes are generally well-structured, but there are a few key areas for improvement. The new MediaCache class is not thread-safe, which is a critical issue for a server environment. There is also significant code duplication in the caching logic for different media types, which should be refactored for better maintainability. Finally, caching support is missing for PIL images and image embeddings, which is inconsistent with the PR's goals.
```python
from typing import Optional, Union
import hashlib

from vllm.utils import LRUCache, GiB_bytes
from vllm.logger import init_logger

logger = init_logger(__name__)

MediaData = Union[object, str, bytes]  # Can be image, audio, video, embeddings, etc.


class MediaCache:
    """
    Global cache for media data, using UUID or content hash as key.
    This allows reusing media data across multiple requests.
    """

    def __init__(self, capacity_gb: float = 1.0) -> None:
        self._cache = LRUCache[str, MediaData](
            capacity=int(GiB_bytes * capacity_gb)
        )

    def _get_url_hash(self, url: str) -> str:
        """Generate a hash for a URL to use as cache key."""
        return hashlib.sha256(url.encode()).hexdigest()

    def get(self, uuid: Optional[str], url: Optional[str] = None) -> Optional[MediaData]:
        """
        Get media data from cache.

        Args:
            uuid: User-provided UUID for the media
            url: URL of the media (used as fallback key if UUID not provided)

        Returns:
            Cached media data if found, None otherwise
        """
        if uuid:
            cache_key = uuid
        elif url:
            cache_key = self._get_url_hash(url)
        else:
            return None

        return self._cache.get(cache_key)

    def put(self, data: MediaData, uuid: Optional[str] = None, url: Optional[str] = None) -> str:
        """
        Store media data in cache.

        Args:
            data: The media data to cache
            uuid: User-provided UUID for the media
            url: URL of the media (used to generate key if UUID not provided)

        Returns:
            The cache key used
        """
        if uuid:
            cache_key = uuid
        elif url:
            cache_key = self._get_url_hash(url)
        else:
            raise ValueError("Either uuid or url must be provided")

        self._cache[cache_key] = data
        return cache_key

    def clear(self) -> None:
        """Clear all cached media data."""
        self._cache.clear()
```
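The key-selection precedence shared by `get` and `put` (an explicit UUID wins; the SHA-256 of the URL is the fallback) can be exercised in isolation. `select_cache_key` below is an illustrative extraction of that logic, not part of the PR:

```python
import hashlib

def select_cache_key(uuid=None, url=None) -> str:
    # Mirrors MediaCache.put: a user-provided UUID wins over the URL hash.
    if uuid:
        return uuid
    if url:
        return hashlib.sha256(url.encode()).hexdigest()
    raise ValueError("Either uuid or url must be provided")
```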
The MediaCache is used as a global singleton and will be accessed from multiple threads in a server environment. The current implementation using LRUCache is not thread-safe. Concurrent access to get, put, and clear methods can lead to race conditions and corrupt the internal state of the cache.
To prevent this, you should add a threading.Lock to synchronize access to the cache.
```python
from typing import Optional, Union
import hashlib
import threading

from vllm.utils import LRUCache, GiB_bytes
from vllm.logger import init_logger

logger = init_logger(__name__)

MediaData = Union[object, str, bytes]  # Can be image, audio, video, embeddings, etc.


class MediaCache:
    """
    Global cache for media data, using UUID or content hash as key.
    This allows reusing media data across multiple requests.
    """

    def __init__(self, capacity_gb: float = 1.0) -> None:
        self._cache = LRUCache[str, MediaData](
            capacity=int(GiB_bytes * capacity_gb)
        )
        self._lock = threading.Lock()

    def _get_url_hash(self, url: str) -> str:
        """Generate a hash for a URL to use as cache key."""
        return hashlib.sha256(url.encode()).hexdigest()

    def get(self, uuid: Optional[str], url: Optional[str] = None) -> Optional[MediaData]:
        """
        Get media data from cache.

        Args:
            uuid: User-provided UUID for the media
            url: URL of the media (used as fallback key if UUID not provided)

        Returns:
            Cached media data if found, None otherwise
        """
        if uuid:
            cache_key = uuid
        elif url:
            cache_key = self._get_url_hash(url)
        else:
            return None

        with self._lock:
            return self._cache.get(cache_key)

    def put(self, data: MediaData, uuid: Optional[str] = None, url: Optional[str] = None) -> str:
        """
        Store media data in cache.

        Args:
            data: The media data to cache
            uuid: User-provided UUID for the media
            url: URL of the media (used to generate key if UUID not provided)

        Returns:
            The cache key used
        """
        if uuid:
            cache_key = uuid
        elif url:
            cache_key = self._get_url_hash(url)
        else:
            raise ValueError("Either uuid or url must be provided")

        with self._lock:
            self._cache[cache_key] = data
        return cache_key

    def clear(self) -> None:
        """Clear all cached media data."""
        with self._lock:
            self._cache.clear()
```
Code Review
This pull request introduces media caching support with UUID-based lookup to enhance the efficiency of multimodal content processing. The changes include adding UUID fields to content part parameters and implementing cache-aware fetching. The implementation looks good, but the exception messages could be improved to be more informative.
```python
# No URL provided but UUID given - expecting cache hit
raise ValueError(f"Image with UUID '{uuid}' not found in cache and no URL provided")
```

```python
# No URL provided but UUID given - expecting cache hit
raise ValueError(f"Audio with UUID '{uuid}' not found in cache and no URL provided")
```

```python
# No URL provided but UUID given - expecting cache hit
raise ValueError(f"Video with UUID '{uuid}' not found in cache and no URL provided")
```

```python
# No URL provided but UUID given - expecting cache hit
raise ValueError(f"Image with UUID '{uuid}' not found in cache and no URL provided")
```

```python
elif url:
    cache_key = self._get_url_hash(url)
else:
    raise ValueError("Either uuid or url must be provided")
```
/gemini is there an existing file that is ~ to mediacache i.e not requiring a new file?
@vincentzed, that's a good question! While the project may have other caching mechanisms (such as the existing ProcessingCache), none of them provides a global, UUID-keyed cache for decoded media objects. So, to directly answer your question, this new file was indeed necessary to introduce this specific media caching functionality and optimize the handling of multimodal data.
cubic analysis
2 issues found across 2 files
```python
def __init__(self, capacity_gb: float = 1.0) -> None:
    self._cache = LRUCache[str, MediaData](
        capacity=int(GiB_bytes * capacity_gb)
```
The LRUCache is initialised without a getsizeof function, so eviction is based on item count instead of bytes, defeating the intended 1 GiB memory cap.
Prompt for AI agents

```
Address the following comment on vllm/multimodal/media_cache.py at line 21:
<comment>The LRUCache is initialised without a `getsizeof` function, so eviction is based on item count instead of bytes, defeating the intended 1 GiB memory cap.</comment>
<file context>
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
+
+from typing import Optional, Union
+import hashlib
+from vllm.utils import LRUCache, GiB_bytes
+from vllm.logger import init_logger
+
+logger = init_logger(__name__)
</file context>
```
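If vLLM's LRUCache does not accept a size callback, byte-based eviction could be approximated with a small size-aware wrapper. The `SizeAwareLRU` below is a self-contained sketch of the idea, not the project's actual LRUCache API:

```python
from collections import OrderedDict
import sys

class SizeAwareLRU:
    """Evicts by approximate byte size rather than by item count."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.size = 0
        self._data: OrderedDict = OrderedDict()

    def _sizeof(self, value) -> int:
        # len() is exact for raw bytes; sys.getsizeof is a rough fallback.
        if isinstance(value, (bytes, bytearray)):
            return len(value)
        return sys.getsizeof(value)

    def put(self, key, value) -> None:
        if key in self._data:
            self.size -= self._sizeof(self._data.pop(key))
        self._data[key] = value
        self.size += self._sizeof(value)
        while self.size > self.capacity and self._data:
            _, evicted = self._data.popitem(last=False)  # drop least recent
            self.size -= self._sizeof(evicted)

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]
```

With this approach, `capacity=int(GiB_bytes * capacity_gb)` would actually bound memory, instead of bounding the number of entries.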
```python
    """Generate a hash for a URL to use as cache key."""
    return hashlib.sha256(url.encode()).hexdigest()

def get(self, uuid: Optional[str], url: Optional[str] = None) -> Optional[MediaData]:
```
uuid parameter is declared Optional but still mandatory, leading to runtime errors when only a URL is provided.
Prompt for AI agents

```
Address the following comment on vllm/multimodal/media_cache.py at line 28:
<comment>`uuid` parameter is declared Optional but still mandatory, leading to runtime errors when only a URL is provided.</comment>
<file context>
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
+
+from typing import Optional, Union
+import hashlib
+from vllm.utils import LRUCache, GiB_bytes
+from vllm.logger import init_logger
+
+logger = init_logger(__name__)
</file context>
```
```diff
-    def get(self, uuid: Optional[str], url: Optional[str] = None) -> Optional[MediaData]:
+    def get(self, uuid: Optional[str] = None, url: Optional[str] = None) -> Optional[MediaData]:
```
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!