Commit 74fd1c5

Authored by smarunich, shmuelk, irar2, and mayabar
Add failure injection mode to simulator (#131)
* Add definition of new action input (#123)
* KV cache and tokenization related configuration (#125)
* Another attempt at adding a latest tag only on release builds (#124)
* Publish kv-cache events (#126): publish kv-cache events, fix lint errors, review fixes, sleep to allow the previous sub to close
* Add failure injection mode to simulator: introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for the failure injection rate and specific failure types, implements the error response logic, and updates documentation and tests to cover the new functionality.
* Refactor failure injection and update simulator error handling: failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field.
* Make tokenizer version configurable from Dockerfile: extracts TOKENIZER_VERSION from the Dockerfile and uses it in the download-tokenizer target. This allows the Makefile to automatically use the correct tokenizer version specified in the Dockerfile, improving maintainability and consistency.
* Use same version of tokenizer in both Dockerfile and Makefile (#132), with fixes in the readme file and updates according to the PR review
* Clarify failure injection rate documentation: removed redundant lines and updated comments and help text to clarify that 'failure-injection-rate' is the probability of injecting failures, not specifically tied to a failure mode.
* Set default failure injection rate to 0
* Update option constructors in simulator tests: replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.
* Document failure injection options in README: added descriptions for the `failure-injection-rate` and `failure-types` configuration options to clarify their usage and defaults.
* Set FailureInjectionRate default to 0 in config: changed the default value of FailureInjectionRate from 10 to 0 in newConfig, since failure injection had been enabled by default under the previous, now-deprecated mode.
* Refactor failure type usage and error response format
* Refactor failure type flag handling and code formatting
* Fix config validation and simulator test argument handling
* Remove duplicate
* Refactor failure handling to use the CompletionError struct: failure handling in the simulator now uses the CompletionError struct from the openai-server-api package, replacing custom error fields with a unified structure. This improves consistency in error responses and simplifies the error injection logic. Associated tests and error handling code have been updated to reflect this change.
* Use one type for all errors; map code to type
* Review comments

Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Ira <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Ira Rosen <[email protected]>
Co-authored-by: Shmuel Kallner <[email protected]>
Co-authored-by: Ira Rosen <[email protected]>
Co-authored-by: Maya Barnea <[email protected]>
1 parent 974b611 commit 74fd1c5

File tree

8 files changed: +558 −28 lines


README.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -124,6 +124,8 @@ For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started
 - `zmq-endpoint`: ZMQ address to publish events
 - `zmq-max-connect-attempts`: the maximum number of ZMQ connection attempts, defaults to 0, maximum: 10
 - `event-batch-size`: the maximum number of kv-cache events to be sent together, defaults to 16
+- `failure-injection-rate`: probability (0-100) of injecting failures, optional, default is 0
+- `failure-types`: list of specific failure types to inject (rate_limit, invalid_api_key, context_length, server_error, invalid_request, model_not_found), optional, if empty all types are used
 - `fake-metrics`: represents a predefined set of metrics to be sent to Prometheus as a substitute for the real metrics. When specified, only these fake metrics will be reported — real metrics and fake metrics will never be reported together. The set should include values for
   - `running-requests`
   - `waiting-requests`
@@ -132,7 +134,6 @@ For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started
 
 Example:
 {"running-requests":10,"waiting-requests":30,"kv-cache-usage":0.4,"loras":[{"running":"lora4,lora2","waiting":"lora3","timestamp":1257894567},{"running":"lora4,lora3","waiting":"","timestamp":1257894569}]}
-
 
 In addition, as we are using klog, the following parameters are available:
 - `add_dir_header`: if true, adds the file directory to the header of the log messages
```
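Since the simulator also accepts a YAML configuration file via `--config` (the `Configuration` struct in `pkg/common/config.go` carries matching `yaml` tags), the two new options can be set there as well. A hypothetical fragment — the field names come from the `failure-injection-rate` and `failure-types` yaml tags, the surrounding values are illustrative only:

```yaml
# Inject an error response for roughly 20% of requests,
# drawing only from the two listed failure types.
failure-injection-rate: 20
failure-types:
  - rate_limit
  - server_error
```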

pkg/common/config.go

Lines changed: 45 additions & 2 deletions
```diff
@@ -34,7 +34,14 @@ const (
 	vLLMDefaultPort = 8000
 	ModeRandom      = "random"
 	ModeEcho        = "echo"
-	dummy           = "dummy"
+	// Failure type constants
+	FailureTypeRateLimit      = "rate_limit"
+	FailureTypeInvalidAPIKey  = "invalid_api_key"
+	FailureTypeContextLength  = "context_length"
+	FailureTypeServerError    = "server_error"
+	FailureTypeInvalidRequest = "invalid_request"
+	FailureTypeModelNotFound  = "model_not_found"
+	dummy                     = "dummy"
 )
 
 type Configuration struct {
@@ -134,6 +141,11 @@ type Configuration struct {
 
 	// FakeMetrics is a set of metrics to send to Prometheus instead of the real data
 	FakeMetrics *Metrics `yaml:"fake-metrics" json:"fake-metrics"`
+
+	// FailureInjectionRate is the probability (0-100) of injecting failures
+	FailureInjectionRate int `yaml:"failure-injection-rate" json:"failure-injection-rate"`
+	// FailureTypes is a list of specific failure types to inject (empty means all types)
+	FailureTypes []string `yaml:"failure-types" json:"failure-types"`
 }
 
 type Metrics struct {
@@ -357,6 +369,27 @@ func (c *Configuration) validate() error {
 	if c.EventBatchSize < 1 {
 		return errors.New("event batch size cannot less than 1")
 	}
+
+	if c.FailureInjectionRate < 0 || c.FailureInjectionRate > 100 {
+		return errors.New("failure injection rate should be between 0 and 100")
+	}
+
+	validFailureTypes := map[string]bool{
+		FailureTypeRateLimit:      true,
+		FailureTypeInvalidAPIKey:  true,
+		FailureTypeContextLength:  true,
+		FailureTypeServerError:    true,
+		FailureTypeInvalidRequest: true,
+		FailureTypeModelNotFound:  true,
+	}
+	for _, failureType := range c.FailureTypes {
+		if !validFailureTypes[failureType] {
+			return fmt.Errorf("invalid failure type '%s', valid types are: %s, %s, %s, %s, %s, %s", failureType,
+				FailureTypeRateLimit, FailureTypeInvalidAPIKey, FailureTypeContextLength,
+				FailureTypeServerError, FailureTypeInvalidRequest, FailureTypeModelNotFound)
+		}
+	}
+
 	if c.ZMQMaxConnectAttempts > 10 {
 		return errors.New("zmq retries times cannot be more than 10")
 	}
@@ -397,7 +430,7 @@ func ParseCommandParamsAndLoadConfig() (*Configuration, error) {
 	f.IntVar(&config.MaxCPULoras, "max-cpu-loras", config.MaxCPULoras, "Maximum number of LoRAs to store in CPU memory")
 	f.IntVar(&config.MaxModelLen, "max-model-len", config.MaxModelLen, "Model's context window, maximum number of tokens in a single request including input and output")
 
-	f.StringVar(&config.Mode, "mode", config.Mode, "Simulator mode, echo - returns the same text that was sent in the request, for chat completion returns the last message, random - returns random sentence from a bank of pre-defined sentences")
+	f.StringVar(&config.Mode, "mode", config.Mode, "Simulator mode: echo - returns the same text that was sent in the request, for chat completion returns the last message; random - returns random sentence from a bank of pre-defined sentences")
 	f.IntVar(&config.InterTokenLatency, "inter-token-latency", config.InterTokenLatency, "Time to generate one token (in milliseconds)")
 	f.IntVar(&config.TimeToFirstToken, "time-to-first-token", config.TimeToFirstToken, "Time to first token (in milliseconds)")
 	f.IntVar(&config.KVCacheTransferLatency, "kv-cache-transfer-latency", config.KVCacheTransferLatency, "Time for KV-cache transfer from a remote vLLM (in milliseconds)")
@@ -424,6 +457,13 @@ func ParseCommandParamsAndLoadConfig() (*Configuration, error) {
 	f.UintVar(&config.ZMQMaxConnectAttempts, "zmq-max-connect-attempts", config.ZMQMaxConnectAttempts, "Maximum number of times to try ZMQ connect")
 	f.IntVar(&config.EventBatchSize, "event-batch-size", config.EventBatchSize, "Maximum number of kv-cache events to be sent together")
 
+	f.IntVar(&config.FailureInjectionRate, "failure-injection-rate", config.FailureInjectionRate, "Probability (0-100) of injecting failures")
+
+	failureTypes := getParamValueFromArgs("failure-types")
+	var dummyFailureTypes multiString
+	f.Var(&dummyFailureTypes, "failure-types", "List of specific failure types to inject (rate_limit, invalid_api_key, context_length, server_error, invalid_request, model_not_found)")
+	f.Lookup("failure-types").NoOptDefVal = dummy
+
 	// These values were manually parsed above in getParamValueFromArgs, we leave this in order to get these flags in --help
 	var dummyString string
 	f.StringVar(&dummyString, "config", "", "The path to a yaml configuration file. The command line values overwrite the configuration file values")
@@ -463,6 +503,9 @@ func ParseCommandParamsAndLoadConfig() (*Configuration, error) {
 	if servedModelNames != nil {
 		config.ServedModelNames = servedModelNames
 	}
+	if failureTypes != nil {
+		config.FailureTypes = failureTypes
+	}
 
 	if config.HashSeed == "" {
 		hashSeed := os.Getenv("PYTHONHASHSEED")
```
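As a quick illustration of the two validation rules this commit adds to `validate()` — the rate must lie in 0-100 and every configured failure type must be one of the six known constants — here is a self-contained sketch. The function name `validateFailureConfig` and the string literals are hypothetical stand-ins, not part of the repo:

```go
package main

import "fmt"

// validFailureTypes mirrors the set built inside Configuration.validate().
var validFailureTypes = map[string]bool{
	"rate_limit": true, "invalid_api_key": true, "context_length": true,
	"server_error": true, "invalid_request": true, "model_not_found": true,
}

// validateFailureConfig applies the same two checks the commit adds:
// a bounds check on the rate and a set-membership check on each type.
func validateFailureConfig(rate int, failureTypes []string) error {
	if rate < 0 || rate > 100 {
		return fmt.Errorf("failure injection rate should be between 0 and 100")
	}
	for _, ft := range failureTypes {
		if !validFailureTypes[ft] {
			return fmt.Errorf("invalid failure type '%s'", ft)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateFailureConfig(50, []string{"rate_limit", "server_error"}) == nil) // true
	fmt.Println(validateFailureConfig(150, nil) != nil)                                   // true: rate out of range
	fmt.Println(validateFailureConfig(10, []string{"bogus"}) != nil)                      // true: unknown type
}
```

This also explains the test cases added in `config_test.go`: rates of 150 and -10 and the type `invalid_type` must all be rejected.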

pkg/common/config_test.go

Lines changed: 13 additions & 0 deletions
```diff
@@ -370,6 +370,19 @@ var _ = Describe("Simulator configuration", func() {
 			args: []string{"cmd", "--event-batch-size", "-35",
 				"--config", "../../manifests/config.yaml"},
 		},
+		{
+			name: "invalid failure injection rate > 100",
+			args: []string{"cmd", "--model", "test-model", "--failure-injection-rate", "150"},
+		},
+		{
+			name: "invalid failure injection rate < 0",
+			args: []string{"cmd", "--model", "test-model", "--failure-injection-rate", "-10"},
+		},
+		{
+			name: "invalid failure type",
+			args: []string{"cmd", "--model", "test-model", "--failure-injection-rate", "50",
+				"--failure-types", "invalid_type"},
+		},
 		{
 			name: "invalid fake metrics: negative running requests",
 			args: []string{"cmd", "--fake-metrics", "{\"running-requests\":-10,\"waiting-requests\":30,\"kv-cache-usage\":0.4}",
```
Lines changed: 88 additions & 0 deletions
New file (package llmdinferencesim):

```go
/*
Copyright 2025 The llm-d-inference-sim Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package llmdinferencesim

import (
	"fmt"

	"github.com/llm-d/llm-d-inference-sim/pkg/common"
	openaiserverapi "github.com/llm-d/llm-d-inference-sim/pkg/openai-server-api"
)

const (
	// Error message templates
	rateLimitMessageTemplate     = "Rate limit reached for %s in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1."
	modelNotFoundMessageTemplate = "The model '%s-nonexistent' does not exist"
)

var predefinedFailures = map[string]openaiserverapi.CompletionError{
	common.FailureTypeRateLimit:     openaiserverapi.NewCompletionError(rateLimitMessageTemplate, 429, nil),
	common.FailureTypeInvalidAPIKey: openaiserverapi.NewCompletionError("Incorrect API key provided.", 401, nil),
	common.FailureTypeContextLength: openaiserverapi.NewCompletionError(
		"This model's maximum context length is 4096 tokens. However, your messages resulted in 4500 tokens.",
		400, stringPtr("messages")),
	common.FailureTypeServerError: openaiserverapi.NewCompletionError(
		"The server is overloaded or not ready yet.", 503, nil),
	common.FailureTypeInvalidRequest: openaiserverapi.NewCompletionError(
		"Invalid request: missing required parameter 'model'.", 400, stringPtr("model")),
	common.FailureTypeModelNotFound: openaiserverapi.NewCompletionError(modelNotFoundMessageTemplate,
		404, stringPtr("model")),
}

// shouldInjectFailure determines whether to inject a failure based on configuration
func shouldInjectFailure(config *common.Configuration) bool {
	if config.FailureInjectionRate == 0 {
		return false
	}

	return common.RandomInt(1, 100) <= config.FailureInjectionRate
}

// getRandomFailure returns a random failure from configured types or all types if none specified
func getRandomFailure(config *common.Configuration) openaiserverapi.CompletionError {
	var availableFailures []string
	if len(config.FailureTypes) == 0 {
		// Use all failure types if none specified
		for failureType := range predefinedFailures {
			availableFailures = append(availableFailures, failureType)
		}
	} else {
		availableFailures = config.FailureTypes
	}

	if len(availableFailures) == 0 {
		// Fallback to server_error if no valid types
		return predefinedFailures[common.FailureTypeServerError]
	}

	randomIndex := common.RandomInt(0, len(availableFailures)-1)
	randomType := availableFailures[randomIndex]

	// Customize message with current model name
	failure := predefinedFailures[randomType]
	if randomType == common.FailureTypeRateLimit && config.Model != "" {
		failure.Message = fmt.Sprintf(rateLimitMessageTemplate, config.Model)
	} else if randomType == common.FailureTypeModelNotFound && config.Model != "" {
		failure.Message = fmt.Sprintf(modelNotFoundMessageTemplate, config.Model)
	}

	return failure
}

func stringPtr(s string) *string {
	return &s
}
```
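The injection gate in `shouldInjectFailure` is a plain percentage check: draw a uniform integer in 1..100 and fail the request when the draw lands at or below the configured rate. A stand-alone sketch of that logic, with `math/rand.Intn` standing in for `common.RandomInt` (assumed here to return a uniform value over an inclusive range):

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomInt is a stand-in for common.RandomInt: uniform integer in [min, max].
func randomInt(min, max int) int {
	return min + rand.Intn(max-min+1)
}

// shouldInject mirrors shouldInjectFailure: with rate 0 never inject;
// otherwise inject when a 1..100 draw is at or below the rate.
func shouldInject(rate int) bool {
	if rate == 0 {
		return false
	}
	return randomInt(1, 100) <= rate
}

func main() {
	const rate = 25 // expect failures for roughly 25% of requests
	injected := 0
	for i := 0; i < 100000; i++ {
		if shouldInject(rate) {
			injected++
		}
	}
	fmt.Printf("observed failure rate: %.1f%%\n", float64(injected)/1000.0)
}
```

Note the inclusive comparison: a rate of 100 always injects, and a rate of 1 injects only when the draw is exactly 1, giving a true 1% probability.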
