
Conversation

@smarunich
Contributor

This is an initial PR to open a discussion on the failure mode; I have been doing some local tests with client and gateway testing. WDYT in general from the mode standpoint, and how should it be introduced? Open to feedback and happy to keep working on it.

Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality.

This pull request adds a new "failure" simulation mode to the LLM-D inference simulator, enabling randomized injection of OpenAI-compatible API error responses for enhanced error-handling test scenarios. It introduces configuration options for controlling the failure injection rate and specifying which error types to inject, along with robust validation and documentation updates. The implementation includes a new error response system, supporting unit tests, and integration into the main completion request handler.

Simulator functionality expansion:

  • Added a new failure simulation mode, which randomly injects OpenAI API-compatible error responses for testing purposes. This mode is selectable via the mode parameter and is documented in README.md.
  • Introduced new configuration parameters: failure-injection-rate (controls the probability of error injection) and failure-types (specifies which error types to inject), with validation for allowed values and documentation updates (see the configuration sketch below).
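For illustration, the new options could be wired into the simulator configuration roughly as follows. This is a minimal sketch: the option names come from the PR description, while the struct name, tags, and validation bounds are assumptions, not the merged implementation.

package config

import "fmt"

// Sketch only: option names follow the PR description; everything else is assumed.
type SimulatorConfig struct {
	// FailureInjectionRate is the probability (0-100) of returning an injected error.
	FailureInjectionRate int `yaml:"failure-injection-rate" json:"failure-injection-rate"`
	// FailureTypes lists the error kinds eligible for injection; empty means all types.
	FailureTypes []string `yaml:"failure-types" json:"failure-types"`
}

// validateFailures rejects rates outside the documented 0-100 range.
func (c *SimulatorConfig) validateFailures() error {
	if c.FailureInjectionRate < 0 || c.FailureInjectionRate > 100 {
		return fmt.Errorf("failure-injection-rate must be in [0, 100], got %d", c.FailureInjectionRate)
	}
	return nil
}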

Failure injection implementation:

  • Added pkg/common/failures.go, defining error types, failure injection logic, and random selection of error responses based on configuration.
  • Integrated failure injection into the main completion handler in simulator.go, returning error responses when triggered, and added a dedicated method for sending structured error responses (see the handler sketch below).
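A minimal sketch of that handler-side integration is shown below. Only sendCompletionError comes from the PR; the handler name, the randomFailure selector, and the common.RandomInt signature are assumptions used for illustration.

// Sketch: before generating a normal completion, roll the dice and, if the
// injection rate is hit, return an OpenAI-compatible error instead.
func (s *VllmSimulator) handleChatCompletions(ctx *fasthttp.RequestCtx) {
	if s.config.FailureInjectionRate > 0 &&
		common.RandomInt(1, 100) <= s.config.FailureInjectionRate {
		// Injected failure: send a structured error response and stop here.
		s.sendCompletionError(ctx, randomFailure(s.config.FailureTypes), true)
		return
	}
	// ... normal completion response generation continues here
}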

Testing and validation:

  • Added unit tests in pkg/common/failures_test.go to verify failure injection logic and error response generation.

Other improvements:

  • Updated error response handling in simulator.go to use a consistent error response structure.
  • Minor fixes to test code for compatibility with new OpenAI Go SDK conventions.

@irar2
Collaborator

irar2 commented Aug 14, 2025

@smarunich Thank you very much for the PR, failure generation support is a part of our roadmap, thanks for picking this up!

Some more or less general comments, so I am writing them here and not in the code:

We think that failure generation should not be a mode. (We have to update the explanation of mode; it's misleading: it should be a completion response generation mode, not the mode of the simulator.)
failure-injection-rate = 0 will mean no failure injection
failure types - please define constants for the possible values
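A minimal sketch of what such constants could look like; the specific names and set of failure types here are assumptions for illustration only.

// Sketch: possible values for failure-types, defined once and reused by
// validation and by the injection logic; the names are illustrative only.
const (
	FailureTypeRateLimit     = "rate_limit"
	FailureTypeInvalidAPIKey = "invalid_api_key"
	FailureTypeModelNotFound = "model_not_found"
	FailureTypeServerError   = "server_error"
)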

Please move failures.go to llm-d-inference-sim package.

Please don't call rand.Seed(); use common.RandomInt, which is protected by a mutex and allows reproducible behavior.

"Rate limit reached for %s in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1." and "The model ‘%s-nonexistent’ does not exist" can be defined as constants and reused.

We could have just one function, sendCompletionError, instead of two (sendCompletionError and sendFailureResponse), taking a common.FailureSpec and a boolean that indicates whether this is an injected failure, for logging.

In ErrorResponse, the object field is missing. In the documentation that we found, it should be present. Could you please share a link to the relevant documentation?

@smarunich smarunich force-pushed the failure-mode branch 3 times, most recently from 8239ac1 to 52edd56 Compare August 14, 2025 21:13
@smarunich smarunich marked this pull request as draft August 14, 2025 21:35
shmuelk and others added 15 commits August 14, 2025 17:39
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
* Publish kv-cache events

Signed-off-by: Ira <[email protected]>

* Fix lint errors

Signed-off-by: Ira <[email protected]>

* Review fixes

Signed-off-by: Ira <[email protected]>

* Sleep to allow previous sub to close

Signed-off-by: Ira <[email protected]>

---------

Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality.

Signed-off-by: Sergey Marunich <[email protected]>
Failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field.

Signed-off-by: Sergey Marunich <[email protected]>
Extracts TOKENIZER_VERSION from the Dockerfile and uses it in the download-tokenizer target. This allows the Makefile to automatically use the correct tokenizer version specified in the Dockerfile, improving maintainability and consistency.

Signed-off-by: Sergey Marunich <[email protected]>
Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality.

Signed-off-by: Sergey Marunich <[email protected]>
Failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field.

Signed-off-by: Sergey Marunich <[email protected]>
* Publish kv-cache events

Signed-off-by: Ira <[email protected]>

* Fix lint errors

Signed-off-by: Ira <[email protected]>

* Review fixes

Signed-off-by: Ira <[email protected]>

* Sleep to allow previous sub to close

Signed-off-by: Ira <[email protected]>

---------

Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Use same version of tokenizer in both Dockerfile and Makefile (llm-d#132)

* - Use same version of tokenizer in both Dockerfile and Makefile
- Fixes in readme file

Signed-off-by: Maya Barnea <[email protected]>

* updates according PR's review

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Removed redundant lines and updated comments and help text to clarify that 'failure-injection-rate' is the probability of injecting failures, not specifically tied to failure mode.

Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>

KV cache and tokenization related configuration (llm-d#125)

Signed-off-by: Ira <[email protected]>

Publish kv-cache events (llm-d#126)

* Publish kv-cache events

Signed-off-by: Ira <[email protected]>

* Fix lint errors

Signed-off-by: Ira <[email protected]>

* Review fixes

Signed-off-by: Ira <[email protected]>

* Sleep to allow previous sub to close

Signed-off-by: Ira <[email protected]>

---------

Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>

Use same version of tokenizer in both Dockerfile and Makefile (llm-d#132)

* - Use same version of tokenizer in both Dockerfile and Makefile
- Fixes in readme file

Signed-off-by: Maya Barnea <[email protected]>

* updates according PR's review

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>

Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.

Signed-off-by: Sergey Marunich <[email protected]>
Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.
Added descriptions for `failure-injection-rate` and `failure-types` configuration options to clarify their usage and defaults.
Changed the default value of FailureInjectionRate from 10 to 0 in newConfig, disabling failure injection by default, since it had previously been enabled by default under the now-deprecated mode.
@smarunich smarunich marked this pull request as ready for review August 14, 2025 22:26
@smarunich
Contributor Author

smarunich commented Aug 14, 2025

@irar2 thank you a lot for your feedback. I have incorporated it; please take an initial look and let me know what you think. I am also adding this reference, vllm-project/vllm#12886, to the ErrorResponse conversation. I don't contribute to these projects frequently, but I have huge respect for what you are doing with this project, so I attempted to do my best; open to any modifications.

P.S. I am using this build with the Envoy AI Gateway Demos for a provider failover demo (https://github.com/smarunich/envoy-ai-gateway-demos/tree/main/demos/03-provider-fallback) and also have a GitHub Actions run to showcase it: https://github.com/smarunich/envoy-ai-gateway-demos/actions/runs/16978406193/job/48132952395?pr=3

Param: nil,
// The first parameter can be either a string message or a FailureSpec
// isInjected indicates if this is an injected failure for logging purposes
func (s *VllmSimulator) sendCompletionError(ctx *fasthttp.RequestCtx, errorInfo interface{}, isInjected bool) {
Collaborator

Please update according to the general comment

@irar2
Collaborator

irar2 commented Aug 17, 2025

@smarunich Thanks for the links and the changes. We really appreciate your contribution.

We did some more digging and found out that the hierarchical error structure is already in the current vLLM code, but not in the latest release. So we need to stay with the current structure for now and upgrade it later. I created issue #135 for this.

In addition, the code field is an int in vLLM; please don't change that. It is the same code as the HTTP status code.

I don't think there is a need for your FailureSpec struct; you can simply use the existing CompletionError.
Your ErrorTypes don't match the OpenAI types.
It would be nice to have a constructor that will set the error type according to the code:

func NewCompletionError(message string, code int, param *string) CompletionError {
	errorType := ""
	switch code {
	case 400:
		errorType = "BadRequestError"
	case 401:
		errorType = "AuthenticationError"
	case 403:
		errorType = "PermissionDeniedError"
	case 404:
		errorType = "NotFoundError"
	case 422:
		errorType = "UnprocessableEntityError"
	case 429:
		errorType = "RateLimitError"
	default:
		if code >= 500 {
			errorType = "InternalServerError"
		} else {
			errorType = "APIConnectionError"
		}
	}
	return CompletionError{
		Object:  "error",
		Message: message,
		Code:    code,
		Type:    errorType,
		Param:   param,
	}
}

and call this constructor to create an error to pass to sendCompletionError and to create a map of predefinedFailures.
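For example, the constructor could back both uses; this is a sketch only, and the map keys and message values here are illustrative placeholders, not the final definitions.

// Sketch: reuse NewCompletionError for the predefined injectable failures
// and for ad-hoc errors sent from the handler.
var predefinedFailures = map[string]CompletionError{
	"rate_limit": NewCompletionError(
		"Rate limit reached for my-model in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1.", 429, nil),
	"model_not_found": NewCompletionError(
		"The model 'my-model-nonexistent' does not exist", 404, nil),
}

// In the handler: s.sendCompletionError(ctx, predefinedFailures["rate_limit"], true)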

Please run make lint, our automatic testing fails because of the lint errors.

Please see additional comments inside the code.

@irar2
Collaborator

irar2 commented Aug 24, 2025

@smarunich We are planning a release soon, and would like to include this feature in it. Will you have time to continue with the PR in the next couple of days? If your schedule doesn't allow this, please let us know and we will continue with this PR.

@smarunich
Contributor Author

@smarunich We are planning a release soon, and would like to include this feature in it. Will you have time to continue with the PR in the next couple of days? If your schedule doesn't allow this, please let us know and we will continue with this PR.

@irar2 I was off last week, let me resume by today and update the thread.

Signed-off-by: Sergey Marunich <[email protected]>
Failure handling in the simulator now uses the CompletionError struct from the openai-server-api package, replacing custom error fields with a unified structure. This improves consistency in error responses and simplifies error injection logic. Associated tests and error handling code have been updated to reflect this change.

Signed-off-by: Sergey Marunich <[email protected]>
@smarunich
Contributor Author

smarunich commented Aug 26, 2025

@smarunich We are planning a release soon, and would like to include this feature in it. Will you have time to continue with the PR in the next couple of days? If your schedule doesn't allow this, please let us know and we will continue with this PR.

@irar2 please feel free to take it forward. I have made a few updates, but I am running out of cycles this week due to conflicting priorities, and I won't have time to work on this until the very beginning of next week; in the meantime I might compromise on quality. I don't want to delay you folks, as I truly appreciate your efforts, and I would really appreciate it if you could take this forward to completion!

@smarunich smarunich requested a review from irar2 August 26, 2025 12:57
@irar2
Collaborator

irar2 commented Aug 27, 2025

@smarunich We are planning a release soon, and would like to include this feature in it. Will you have time to continue with the PR in the next couple of days? If your schedule doesn't allow this, please let us know and we will continue with this PR.

@irar2 please feel free to take it forward. I have made a few updates, but I am running out of cycles this week due to conflicting priorities, and I won't have time to work on this until the very beginning of next week; in the meantime I might compromise on quality. I don't want to delay you folks, as I truly appreciate your efforts, and I would really appreciate it if you could take this forward to completion!

@smarunich Thanks a lot! We will take it from here then.

@irar2
Collaborator

irar2 commented Aug 27, 2025

fixes #135

FakeMetrics *Metrics `yaml:"fake-metrics" json:"fake-metrics"`

// FailureInjectionRate is the probability (0-100) of injecting failures
FailureInjectionRate int `yaml:"failure-injection-rate"`
Collaborator

please add json annotation
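For reference, the annotated field could look like this (sketch only; the tag value mirrors the yaml name and is an assumption):

// FailureInjectionRate is the probability (0-100) of injecting failures
FailureInjectionRate int `yaml:"failure-injection-rate" json:"failure-injection-rate"`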

)
})

Describe("Failure injection mode", func() {
Collaborator

move this test to failure_test please

@mayabar
Collaborator

mayabar commented Aug 27, 2025

Hi Sergey @smarunich
Can you please fix the sign-off problem, so we will be able to merge this PR?
Ira is going to fix the requested changes.

Signed-off-by: Ira <[email protected]>
@irar2 irar2 merged commit 74fd1c5 into llm-d:main Aug 28, 2025
3 of 4 checks passed
@irar2
Collaborator

irar2 commented Aug 28, 2025

@smarunich We added your signature to the extended comments of the squashed merge that we did.
Thank you very much for your contribution, it is much appreciated.

@smarunich
Contributor Author

@smarunich We added your signature to the extended comments of the squashed merge that we did. Thank you very much for your contribution, it is much appreciated.

thank you @irar2 @mayabar for taking this forward to completion, truly appreciated!
