feat(multimodal): add media caching support with UUID-based lookup #4
vincentzed wants to merge 1 commit into main
Conversation
- Introduce global media cache for images, audio, and video
- Add UUID field to multimodal content part parameters
- Modify parsers to accept optional UUIDs for caching
- Implement synchronous and asynchronous cache-aware fetching
- Create MediaCache class with LRU eviction policy
- Support content deduplication via UUID or URL hashing
Original Ticket:
Motivation.
The current processing flow of media inputs (image/video/audio/embeddings) has several inefficiencies when media is used repeatedly (e.g. multi-turn conversations, or when the same media is used in different requests). When a media item is provided in the input, the current flow looks like this:
1. MediaConnector runs to download the input (if a remote URL is provided), incurring network delays.
2. MediaConnector loads the downloaded bytes into Python objects (e.g. PIL.Image, audio/video decoding), incurring CPU delays.
3. BaseMultiModalProcessor then hashes the entire media using its full bytes to do a cache lookup in ProcessingCache, incurring CPU delays.
All three steps run before any cache is checked, so repeated media still incur these costs even when there is a cache hit later on. Additionally, decoding long videos and hashing all bytes can be slow (milliseconds to tens of milliseconds) and inefficient. For media that is sent repeatedly, we want to bypass all three steps.
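The sequential cost described above can be sketched in a few lines; `fetch_media`, `decode_media`, and the dict-based stand-in for ProcessingCache below are illustrative assumptions, not vLLM's actual API:

```python
import hashlib

def fetch_media(url: str) -> bytes:
    # Stand-in for MediaConnector's download step (step 1: network delay).
    return b"raw media bytes for " + url.encode()

def decode_media(raw: bytes) -> str:
    # Stand-in for loading bytes into Python objects (step 2: CPU delay).
    return raw.decode().upper()

def process_media(url: str, processing_cache: dict) -> str:
    raw = fetch_media(url)                 # step 1 always runs
    obj = decode_media(raw)                # step 2 always runs
    key = hashlib.sha256(raw).hexdigest()  # step 3: hash the full bytes
    if key in processing_cache:            # cache is consulted only now
        return processing_cache[key]
    processing_cache[key] = obj
    return obj
```

Note that even on a repeat request, steps 1-3 all execute before the cache hit is discovered, which is exactly the inefficiency this proposal targets.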
Proposed Change.
Proposal
Add a user-provided "uuid" field to media inputs as the media's identifier and hash key.
Example:
{"type": "image_url", "image_url": {"url": image_url}, "uuid": "abcde"}
{"type": "image_embeds", "image_embeds": "xyz", "uuid": "abcde"}
For URL inputs without a uuid set, hash only the URL instead of the entire bytes. For other inputs, we fall back to full hashing of the bytes. We should then allow empty media inputs if UUIDs are provided (i.e. if the client expects a cache hit).
During input processing, first check ProcessingCache to see if any media is already cached, and bypass MediaConnector altogether on cache hit.
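A minimal sketch of this cache-first ordering (the content-part layout, `fetch_and_decode`, and the plain-dict cache are assumptions for illustration, not the PR's actual code):

```python
import hashlib

def fetch_and_decode(url: str) -> str:
    # Stand-in for the slow MediaConnector path (download + decode).
    return "decoded:" + url

def resolve_media(part: dict, media_cache: dict) -> str:
    uuid = part.get("uuid")
    url = part.get("image_url", {}).get("url")
    if uuid:
        key = uuid
    elif url:
        key = hashlib.sha256(url.encode()).hexdigest()
    else:
        key = None
    if key is not None and key in media_cache:
        return media_cache[key]  # cache hit: MediaConnector is bypassed
    if url is None:
        # Empty media input with a uuid but no cache hit.
        raise ValueError("expected a cache hit but no URL was provided")
    obj = fetch_and_decode(url)
    if key is not None:
        media_cache[key] = obj
    return obj
```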
Potential Issues
Compatibility with OAI API
OAI API currently doesn’t support user-provided uuids, so adding this field would introduce more discrepancy between the APIs. However, given that this is an additional field, the impact should be minimal.
Race Condition Between Cache Lookup and Cache Read
(Needs confirmation). Given that the cache is checked when parsing contents but the lookup is performed later, we could potentially have a cache miss after the initial check (i.e. eviction between lookup & cache read). We could just fail the one single request if that happens.
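The fail-fast behaviour suggested here could look like the following sketch (`read_cached_media` is a hypothetical helper, not existing vLLM code):

```python
def read_cached_media(key: str, media_cache: dict):
    # The entry observed during content parsing may have been evicted by
    # the time the actual read happens; fail only this request if so.
    try:
        return media_cache[key]
    except KeyError:
        raise ValueError(
            f"media '{key}' was evicted between cache lookup and read")
```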
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀

/gemini review
Code Review
This pull request introduces a global media cache to improve performance for repeated media inputs. The changes are generally well-structured, but there are a few key areas for improvement. The new MediaCache class is not thread-safe, which is a critical issue for a server environment. There is also significant code duplication in the caching logic for different media types, which should be refactored for better maintainability. Finally, caching support is missing for PIL images and image embeddings, which is inconsistent with the PR's goals.
```python
from typing import Optional, Union
import hashlib

from vllm.utils import LRUCache, GiB_bytes
from vllm.logger import init_logger

logger = init_logger(__name__)

MediaData = Union[object, str, bytes]  # Can be image, audio, video, embeddings, etc.


class MediaCache:
    """
    Global cache for media data, using UUID or content hash as key.
    This allows reusing media data across multiple requests.
    """

    def __init__(self, capacity_gb: float = 1.0) -> None:
        self._cache = LRUCache[str, MediaData](
            capacity=int(GiB_bytes * capacity_gb)
        )

    def _get_url_hash(self, url: str) -> str:
        """Generate a hash for a URL to use as cache key."""
        return hashlib.sha256(url.encode()).hexdigest()

    def get(self, uuid: Optional[str], url: Optional[str] = None) -> Optional[MediaData]:
        """
        Get media data from cache.

        Args:
            uuid: User-provided UUID for the media
            url: URL of the media (used as fallback key if UUID not provided)

        Returns:
            Cached media data if found, None otherwise
        """
        if uuid:
            cache_key = uuid
        elif url:
            cache_key = self._get_url_hash(url)
        else:
            return None

        return self._cache.get(cache_key)

    def put(self, data: MediaData, uuid: Optional[str] = None, url: Optional[str] = None) -> str:
        """
        Store media data in cache.

        Args:
            data: The media data to cache
            uuid: User-provided UUID for the media
            url: URL of the media (used to generate key if UUID not provided)

        Returns:
            The cache key used
        """
        if uuid:
            cache_key = uuid
        elif url:
            cache_key = self._get_url_hash(url)
        else:
            raise ValueError("Either uuid or url must be provided")

        self._cache[cache_key] = data
        return cache_key

    def clear(self) -> None:
        """Clear all cached media data."""
        self._cache.clear()
```
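The key-selection precedence shared by `get` and `put` (an explicit UUID wins; the SHA-256 of the URL is the fallback) can be exercised in isolation. `select_cache_key` below is an illustrative extraction of that logic, not part of the PR:

```python
import hashlib

def select_cache_key(uuid=None, url=None) -> str:
    # Mirrors MediaCache.put: a user-provided UUID wins over the URL hash.
    if uuid:
        return uuid
    if url:
        return hashlib.sha256(url.encode()).hexdigest()
    raise ValueError("Either uuid or url must be provided")
```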
The MediaCache is used as a global singleton and will be accessed from multiple threads in a server environment. The current implementation using LRUCache is not thread-safe. Concurrent access to get, put, and clear methods can lead to race conditions and corrupt the internal state of the cache.
To prevent this, you should add a threading.Lock to synchronize access to the cache.
```python
from typing import Optional, Union
import hashlib
import threading

from vllm.utils import LRUCache, GiB_bytes
from vllm.logger import init_logger

logger = init_logger(__name__)

MediaData = Union[object, str, bytes]  # Can be image, audio, video, embeddings, etc.


class MediaCache:
    """
    Global cache for media data, using UUID or content hash as key.
    This allows reusing media data across multiple requests.
    """

    def __init__(self, capacity_gb: float = 1.0) -> None:
        self._cache = LRUCache[str, MediaData](
            capacity=int(GiB_bytes * capacity_gb)
        )
        self._lock = threading.Lock()

    def _get_url_hash(self, url: str) -> str:
        """Generate a hash for a URL to use as cache key."""
        return hashlib.sha256(url.encode()).hexdigest()

    def get(self, uuid: Optional[str], url: Optional[str] = None) -> Optional[MediaData]:
        """
        Get media data from cache.

        Args:
            uuid: User-provided UUID for the media
            url: URL of the media (used as fallback key if UUID not provided)

        Returns:
            Cached media data if found, None otherwise
        """
        if uuid:
            cache_key = uuid
        elif url:
            cache_key = self._get_url_hash(url)
        else:
            return None

        with self._lock:
            return self._cache.get(cache_key)

    def put(self, data: MediaData, uuid: Optional[str] = None, url: Optional[str] = None) -> str:
        """
        Store media data in cache.

        Args:
            data: The media data to cache
            uuid: User-provided UUID for the media
            url: URL of the media (used to generate key if UUID not provided)

        Returns:
            The cache key used
        """
        if uuid:
            cache_key = uuid
        elif url:
            cache_key = self._get_url_hash(url)
        else:
            raise ValueError("Either uuid or url must be provided")

        with self._lock:
            self._cache[cache_key] = data
        return cache_key

    def clear(self) -> None:
        """Clear all cached media data."""
        with self._lock:
            self._cache.clear()
```
Code Review
This pull request introduces media caching support with UUID-based lookup to enhance the efficiency of multimodal content processing. The changes include adding UUID fields to content part parameters and implementing cache-aware fetching. The implementation looks good, but the exception messages could be improved to be more informative.
```python
# No URL provided but UUID given - expecting cache hit
raise ValueError(f"Image with UUID '{uuid}' not found in cache and no URL provided")
```

```python
# No URL provided but UUID given - expecting cache hit
raise ValueError(f"Audio with UUID '{uuid}' not found in cache and no URL provided")
```

```python
# No URL provided but UUID given - expecting cache hit
raise ValueError(f"Video with UUID '{uuid}' not found in cache and no URL provided")
```

```python
# No URL provided but UUID given - expecting cache hit
raise ValueError(f"Image with UUID '{uuid}' not found in cache and no URL provided")
```

```python
elif url:
    cache_key = self._get_url_hash(url)
else:
    raise ValueError("Either uuid or url must be provided")
```
/gemini is there an existing file that is ~ to mediacache i.e not requiring a new file?
@vincentzed, that's a good question! While the project may have other caching mechanisms (such as the existing ProcessingCache), none of them provides a global, UUID-keyed cache for decoded media objects. So, to directly answer your question, this new file was indeed necessary to introduce this specific media caching functionality and optimize the handling of multimodal data.
cubic analysis
2 issues found across 2 files
```python
def __init__(self, capacity_gb: float = 1.0) -> None:
    self._cache = LRUCache[str, MediaData](
        capacity=int(GiB_bytes * capacity_gb)
```
The LRUCache is initialised without a getsizeof function, so eviction is based on item count instead of bytes, defeating the intended 1 GiB memory cap.
Prompt for AI agents

```
Address the following comment on vllm/multimodal/media_cache.py at line 21:
<comment>The LRUCache is initialised without a `getsizeof` function, so eviction is based on item count instead of bytes, defeating the intended 1 GiB memory cap.</comment>
<file context>
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
+
+from typing import Optional, Union
+import hashlib
+from vllm.utils import LRUCache, GiB_bytes
+from vllm.logger import init_logger
+
+logger = init_logger(__name__)
</file context>
```
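If vLLM's LRUCache does not accept a size callback, byte-based eviction could be approximated with a small size-aware wrapper. The `SizeAwareLRU` below is a self-contained sketch of the idea, not the project's actual LRUCache API:

```python
from collections import OrderedDict
import sys

class SizeAwareLRU:
    """Evicts by approximate byte size rather than by item count."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.size = 0
        self._data: OrderedDict = OrderedDict()

    def _sizeof(self, value) -> int:
        # len() is exact for raw bytes; sys.getsizeof is a rough fallback.
        if isinstance(value, (bytes, bytearray)):
            return len(value)
        return sys.getsizeof(value)

    def put(self, key, value) -> None:
        if key in self._data:
            self.size -= self._sizeof(self._data.pop(key))
        self._data[key] = value
        self.size += self._sizeof(value)
        while self.size > self.capacity and self._data:
            _, evicted = self._data.popitem(last=False)  # drop least recent
            self.size -= self._sizeof(evicted)

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]
```

With this approach, `capacity=int(GiB_bytes * capacity_gb)` would actually bound memory, instead of bounding the number of entries.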
```python
    """Generate a hash for a URL to use as cache key."""
    return hashlib.sha256(url.encode()).hexdigest()

def get(self, uuid: Optional[str], url: Optional[str] = None) -> Optional[MediaData]:
```
uuid parameter is declared Optional but still mandatory, leading to runtime errors when only a URL is provided.
Prompt for AI agents

```
Address the following comment on vllm/multimodal/media_cache.py at line 28:
<comment>`uuid` parameter is declared Optional but still mandatory, leading to runtime errors when only a URL is provided.</comment>
<file context>
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
+
+from typing import Optional, Union
+import hashlib
+from vllm.utils import LRUCache, GiB_bytes
+from vllm.logger import init_logger
+
+logger = init_logger(__name__)
</file context>
```
```diff
-    def get(self, uuid: Optional[str], url: Optional[str] = None) -> Optional[MediaData]:
+    def get(self, uuid: Optional[str] = None, url: Optional[str] = None) -> Optional[MediaData]:
```
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!