⚡️ Speed up method SharedHealthCheckManager.get_cached_health_check_results
by 15%
#7
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 15% (0.15x) speedup for
SharedHealthCheckManager.get_cached_health_check_results
inlitellm/proxy/health_check_utils/shared_health_check_manager.py
⏱️ Runtime :
860 milliseconds
→746 milliseconds
(best of40
runs)📝 Explanation and details
The optimized code achieves a 15% runtime improvement by moving potentially blocking JSON parsing operations off the main async event loop using
asyncio.to_thread()
.Key Optimizations:
Non-blocking JSON parsing in SharedHealthCheckManager: Replaced synchronous
json.loads(cached_data)
withawait asyncio.to_thread(json.loads, cached_data)
when parsing string cache data. This prevents the event loop from being blocked during JSON deserialization of potentially large health check payloads.Non-blocking cache parsing in RedisCache: Changed
response = self._get_cache_logic(cached_response)
toresponse = await asyncio.to_thread(self._get_cache_logic, cached_response)
in theasync_get_cache
method. This moves the cache parsing logic (which includes JSON deserialization and type conversion) to a thread pool.Why This Improves Performance:
Test Case Performance:
The optimization shows consistent benefits across all test scenarios, especially for concurrent access patterns (like the 100+ concurrent request tests) where maintaining event loop availability is crucial for overall system throughput. While throughput remains the same at 49,480 operations/second, the 15% runtime reduction indicates more efficient resource utilization and reduced latency per operation.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-SharedHealthCheckManager.get_cached_health_check_results-mh2kjryl
and push.