Add failure injection mode to simulator #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@smarunich Thank you very much for the PR; failure generation support is part of our roadmap, thanks for picking this up! Some more or less general comments, so I am writing them here and not in the code:
- We think that failure generation should not be a mode. (We have to update the explanation of …)
- Please move failures.go to the llm-d-inference-sim package.
- Please don't call rand.Seed(); use …
- "Rate limit reached for %s in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1." and "The model '%s-nonexistent' does not exist" can be defined as constants and reused.
- We may only have one function …
- In ErrorResponse …
Force-pushed from 8239ac1 to 52edd56
Publish kv-cache events:
* Publish kv-cache events
* Fix lint errors
* Review fixes
* Sleep to allow previous sub to close
Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality. Signed-off-by: Sergey Marunich <[email protected]>
Failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field. Signed-off-by: Sergey Marunich <[email protected]>
Extracts TOKENIZER_VERSION from the Dockerfile and uses it in the download-tokenizer target. This allows the Makefile to automatically use the correct tokenizer version specified in the Dockerfile, improving maintainability and consistency. Signed-off-by: Sergey Marunich <[email protected]>
Use same version of tokenizer in both Dockerfile and Makefile; fixes in readme file; updates according to the PR's review.
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Removed redundant lines and updated comments and help text to clarify that 'failure-injection-rate' is the probability of injecting failures, not specifically tied to failure mode. Signed-off-by: Sergey Marunich <[email protected]>
Force-pushed from aeb8bde to 08bcf08
Commits included in this force-push:
- KV cache and tokenization related configuration (llm-d#125). Signed-off-by: Ira <[email protected]>
- Publish kv-cache events (llm-d#126): fix lint errors, review fixes, sleep to allow previous sub to close. Signed-off-by: Ira <[email protected]>
- Use same version of tokenizer in both Dockerfile and Makefile (llm-d#132); fixes in readme file; updates according to the PR's review. Signed-off-by: Maya Barnea <[email protected]>
- Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage. Signed-off-by: Sergey Marunich <[email protected]>
Force-pushed from 08bcf08 to 106e276
    Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.
Signed-off-by: Sergey Marunich <[email protected]>
Added descriptions for `failure-injection-rate` and `failure-types` configuration options to clarify their usage and defaults.
Changed the default value of FailureInjectionRate from 10 to 0 in newConfig, so failure injection is now disabled by default; it had effectively been enabled by default under the previous, now-deprecated 'failure' mode.
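For reference, a minimal configuration sketch based on the options described above (the failure type name in the comment is illustrative, since the allowed failure-types values are not listed in this thread):

```yaml
# Default after this change: failure injection disabled.
failure-injection-rate: 0

# To exercise error handling, raise the rate and optionally narrow the
# injected error types, e.g.:
# failure-injection-rate: 30
# failure-types:
#   - rate_limit
```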
@irar2 Thank you a lot for your feedback; please take an initial look and let me know what you think. I have incorporated your feedback, and am also adding this reference, vllm-project/vllm#12886, to the …

P.S. I am using the build with Envoy AI Gateway Demos for a provider failover demo: https://github.com/smarunich/envoy-ai-gateway-demos/tree/main/demos/03-provider-fallback. There is also a GitHub Actions run showcasing it: https://github.com/smarunich/envoy-ai-gateway-demos/actions/runs/16978406193/job/48132952395?pr=3
        
          
Review comment on pkg/llm-d-inference-sim/simulator.go (outdated):
    Param: nil,

    // The first parameter can be either a string message or a FailureSpec
    // isInjected indicates if this is an injected failure for logging purposes
    func (s *VllmSimulator) sendCompletionError(ctx *fasthttp.RequestCtx, errorInfo interface{}, isInjected bool) {
Please update according to the general comment
@smarunich Thanks for the links and the changes. We really appreciate your contribution. We did some more digging and found out that although the hierarchical error structure is already in the current vLLM code, it is not in the latest release. So we need to stay with the current structure for now and upgrade it later; I created issue #135 for this. In addition, I don't think there is a need for your FailureSpec struct: you can simply use the existing CompletionError and call this constructor to create an error to pass to sendCompletionError and to create a map of predefinedFailures. Please run … Please see additional comments inside the code.
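The shape the reviewer suggests might look like the sketch below. The type and constructor are hypothetical stand-ins, since the actual CompletionError API from the openai-server-api package is not shown in this thread; the message values are filled-in instances of the templates quoted earlier:

```go
package main

import "fmt"

// completionError is a hypothetical stand-in for the existing
// CompletionError type in the openai-server-api package.
type completionError struct {
	Message string
	Type    string
	Code    int
}

// newCompletionError stands in for the constructor the reviewer mentions.
func newCompletionError(msg, errType string, code int) completionError {
	return completionError{Message: msg, Type: errType, Code: code}
}

// One map of predefined failures, keyed by failure type, replaces the
// separate FailureSpec struct. Model name and type strings are illustrative.
var predefinedFailures = map[string]completionError{
	"rate_limit": newCompletionError(
		"Rate limit reached for my-model in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1.",
		"rate_limit_exceeded", 429),
	"model_not_found": newCompletionError(
		"The model 'my-model-nonexistent' does not exist",
		"invalid_request_error", 404),
}

func main() {
	fmt.Println(predefinedFailures["rate_limit"].Code)
}
```

The same constructor then serves both paths: building the predefined map at startup and constructing ad-hoc errors to pass to sendCompletionError.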
@smarunich We are planning a release soon, and would like to include this feature in it. Will you have time to continue with the PR in the next couple of days? If your schedule doesn't allow this, please let us know and we will continue with this PR.
@irar2 I was off last week, let me resume by today and update the thread.
Failure handling in the simulator now uses the CompletionError struct from the openai-server-api package, replacing custom error fields with a unified structure. This improves consistency in error responses and simplifies error injection logic. Associated tests and error handling code have been updated to reflect this change. Signed-off-by: Sergey Marunich <[email protected]>
@irar2 Please feel free to take it forward. I have done a few updates, but I am running out of cycles for this week as I have run into conflicting priorities; I won't have cycles to work on this until the very beginning of next week, and meanwhile I might miss on quality. I don't want to delay you folks, and I truly appreciate your efforts; I would really appreciate it if you can take this forward to completion!
@smarunich Thanks a lot! We will take it from here then.
Fixes #135
        
          
Review comment on pkg/common/config.go (outdated):
    FakeMetrics *Metrics `yaml:"fake-metrics" json:"fake-metrics"`

    // FailureInjectionRate is the probability (0-100) of injecting failures
    FailureInjectionRate int `yaml:"failure-injection-rate"`
please add json annotation
    )
    })

    Describe("Failure injection mode", func() {
move this test to failure_test please
Hi Sergey @smarunich
@smarunich We added your signature to the extended comments of the squashed merge that we did.
Thank you @irar2 @mayabar for taking this forward to completion, truly appreciated!
The initial PR for a discussion on failure mode, as I am doing some tests locally with client and gateway testing. WDYT in general from the mode standpoint? How should it be introduced? Open to the feedback and happy to work on it.

Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality.
This pull request adds a new "failure" simulation mode to the LLM-D inference simulator, enabling randomized injection of OpenAI-compatible API error responses for enhanced error-handling test scenarios. It introduces configuration options for controlling the failure injection rate and specifying which error types to inject, along with robust validation and documentation updates. The implementation includes a new error response system, supporting unit tests, and integration into the main completion request handler.
Simulator functionality expansion:
- A new failure simulation mode, which randomly injects OpenAI API-compatible error responses for testing purposes. This mode is selectable via the mode parameter and is documented in README.md. [1] [2]
- New configuration options failure-injection-rate (controls the probability of error injection) and failure-types (specifies which error types to inject), with validation for allowed values and documentation updates. [1] [2] [3] [4] [5] [6] [7] [8] [9]

Failure injection implementation:
- A new file, pkg/common/failures.go, defining error types, failure injection logic, and random selection of error responses based on configuration.
- Integration into the completion request handler in simulator.go, returning error responses when triggered. Added a dedicated method for sending structured error responses. [1] [2]

Testing and validation:
- New tests in pkg/common/failures_test.go to verify failure injection logic and error response generation.

Other improvements:
- Refactored error handling in simulator.go to use a consistent error response structure.
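Putting the thread together, an injected error body presumably resembles the flat vLLM-style OpenAI error structure the reviewers chose to keep for now, plus the object field mentioned in the commit messages. The concrete values below are illustrative, not taken from the implementation:

```json
{
  "object": "error",
  "message": "Rate limit reached for my-model in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1.",
  "type": "rate_limit_exceeded",
  "param": null,
  "code": 429
}
```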