[GuideLLM Refactor] mock server package creation #357
Pull Request Overview
Introduces a comprehensive mock server implementation that simulates OpenAI and vLLM APIs with configurable timing characteristics and response patterns. This enables realistic performance testing and validation of GuideLLM benchmarking workflows without requiring actual model deployments.
- Modular architecture with configuration, handlers, models, server, and utilities components
- HTTP request handlers for OpenAI-compatible endpoints with streaming and non-streaming support
- High-performance Sanic-based server with CORS support and proper error handling (a minimal sketch follows this list)
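A minimal sketch of what such a Sanic server could look like, assuming permissive CORS middleware and an OpenAI-style error envelope; the app name, header set, and handler bodies are illustrative, not the PR's actual implementation:

```python
# Hypothetical minimal Sanic app: CORS via a response middleware plus JSON
# error handling; the real server.py wires in the dedicated handler classes.
from sanic import Request, Sanic
from sanic.response import HTTPResponse
from sanic.response import json as json_response

app = Sanic("guidellm-mock-server")

@app.middleware("response")
async def add_cors_headers(request: Request, response: HTTPResponse) -> None:
    # Permissive CORS so local benchmarking clients can reach the server.
    response.headers["Access-Control-Allow-Origin"] = "*"
    response.headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS"
    response.headers["Access-Control-Allow-Headers"] = "*"

@app.exception(Exception)
async def handle_errors(request: Request, exception: Exception) -> HTTPResponse:
    # Surface failures in an OpenAI-style error envelope.
    return json_response(
        {"error": {"message": str(exception), "type": type(exception).__name__}},
        status=500,
    )

@app.post("/v1/chat/completions")
async def chat_completions(request: Request) -> HTTPResponse:
    # Placeholder body; the actual handler simulates timing and token counts.
    return json_response({"object": "chat.completion", "choices": []})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8000)
```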
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/guidellm/mock_server/__init__.py | Package initialization exposing the main MockServer and MockServerConfig classes |
| src/guidellm/mock_server/config.py | Pydantic-based configuration management with environment variable support (sketched below the table) |
| src/guidellm/mock_server/handlers/__init__.py | Handler module initialization exposing request handlers |
| src/guidellm/mock_server/handlers/chat_completions.py | OpenAI chat completions endpoint implementation with streaming support |
| src/guidellm/mock_server/handlers/completions.py | Legacy OpenAI completions endpoint with timing simulation |
| src/guidellm/mock_server/handlers/tokenizer.py | vLLM-compatible tokenization and detokenization endpoints |
| src/guidellm/mock_server/models.py | Pydantic models for request/response validation and API compatibility |
| src/guidellm/mock_server/server.py | Sanic-based HTTP server with middleware, routes, and error handling |
| src/guidellm/mock_server/utils.py | Mock tokenizer and text-generation utilities for testing |
| tests/unit/mock_server/__init__.py | Test package initialization |
| tests/unit/mock_server/test_server.py | Comprehensive integration tests using real HTTP server instances |
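For the config.py entry above, here is a hedged sketch of how a Pydantic settings class with environment-variable support typically looks; the field names, defaults, and env prefix are assumptions, not the PR's exact schema:

```python
# Illustrative settings class; GUIDELLM_MOCK_SERVER_* env vars would override
# the defaults (e.g., GUIDELLM_MOCK_SERVER_PORT=9000). The prefix is assumed.
from pydantic_settings import BaseSettings, SettingsConfigDict

class MockServerConfig(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="GUIDELLM_MOCK_SERVER_")

    host: str = "127.0.0.1"
    port: int = 8000
    model: str = "mock-model"   # model name echoed back in responses
    ttft_ms: float = 50.0       # simulated time to first token, milliseconds
    itl_ms: float = 5.0         # simulated inter-token latency, milliseconds

config = MockServerConfig()  # reads overrides from the environment
```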
Summary
Introduces a comprehensive mock server implementation that simulates OpenAI and vLLM APIs with configurable timing characteristics and response patterns. The mock server enables realistic performance testing and validation of GuideLLM benchmarking workflows without requiring actual model deployments, supporting both streaming and non-streaming endpoints with proper token counting, latency simulation (TTFT/ITL), and error handling.
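As a rough illustration of the latency simulation, a streamed response can be paced by sleeping for the configured TTFT before the first chunk and for the ITL between subsequent chunks; this sketch is an assumption about the approach, not the handler's actual code:

```python
import asyncio
from collections.abc import AsyncIterator

async def stream_tokens(
    tokens: list[str], ttft_ms: float, itl_ms: float
) -> AsyncIterator[str]:
    """Yield tokens paced by time-to-first-token and inter-token latency."""
    if not tokens:
        return
    # Hold the first chunk back by the configured TTFT...
    await asyncio.sleep(ttft_ms / 1000)
    yield tokens[0]
    # ...then emit each remaining chunk at the inter-token latency (ITL).
    for token in tokens[1:]:
        await asyncio.sleep(itl_ms / 1000)
        yield token
```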
Details
- `mock_server` package with modular architecture including configuration, handlers, models, server, and utilities
- `MockServerConfig` with Pydantic settings for centralized configuration management supporting environment variables
- `ChatCompletionsHandler` for `/v1/chat/completions` with streaming support
- `CompletionsHandler` for the legacy `/v1/completions` endpoint
- `TokenizerHandler` for vLLM-compatible `/tokenize` and `/detokenize` endpoints (a usage sketch follows this list)
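A hedged example of exercising those endpoints with httpx once the server is running locally; the base URL, model name, and payload shapes are assumptions about reasonable OpenAI/vLLM-style requests:

```python
import httpx

with httpx.Client(base_url="http://127.0.0.1:8000") as client:
    # Chat completions (OpenAI-style request body).
    chat = client.post(
        "/v1/chat/completions",
        json={
            "model": "mock-model",
            "messages": [{"role": "user", "content": "Hello"}],
        },
    )
    print(chat.json())

    # vLLM-style tokenization round trip.
    tokenized = client.post("/tokenize", json={"prompt": "Hello world"})
    print(tokenized.json())
```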
Test Plan
Related Issues
Use of AI
## WRITTEN BY AI ##