* Add definition of new action input (#123)
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
* KV cache and tokenization related configuration (#125)
Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
* Another attempt at adding a latest tag only on release builds (#124)
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
* Publish kv-cache events (#126)
* Publish kv-cache events
Signed-off-by: Ira <[email protected]>
* Fix lint errors
Signed-off-by: Ira <[email protected]>
* Review fixes
Signed-off-by: Ira <[email protected]>
* Sleep to allow previous sub to close
Signed-off-by: Ira <[email protected]>
---------
Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
* Add failure injection mode to simulator
Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality.
Signed-off-by: Sergey Marunich <[email protected]>
* Refactor failure injection and update simulator error handling
Failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field.
Signed-off-by: Sergey Marunich <[email protected]>
* Make tokenizer version configurable from Dockerfile
Extracts TOKENIZER_VERSION from the Dockerfile and uses it in the download-tokenizer target. This allows the Makefile to automatically use the correct tokenizer version specified in the Dockerfile, improving maintainability and consistency.
Signed-off-by: Sergey Marunich <[email protected]>
* Use same version of tokenizer in both Dockerfile and Makefile (#132)
* - Use same version of tokenizer in both Dockerfile and Makefile
- Fixes in readme file
Signed-off-by: Maya Barnea <[email protected]>
* Updates according to PR review
Signed-off-by: Maya Barnea <[email protected]>
---------
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
* Clarify failure injection rate documentation
Removed redundant lines and updated comments and help text to clarify that 'failure-injection-rate' is the probability of injecting failures, not specifically tied to failure mode.
Signed-off-by: Sergey Marunich <[email protected]>
* Set default failure injection rate to 0
Signed-off-by: Sergey Marunich <[email protected]>
* rebase duplicates
Signed-off-by: Sergey Marunich <[email protected]>
* re-base the changes
Signed-off-by: Sergey Marunich <[email protected]>
* Update option constructors in simulator tests
Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.
Signed-off-by: Sergey Marunich <[email protected]>
* Document failure injection options in README
Added descriptions for `failure-injection-rate` and `failure-types` configuration options to clarify their usage and defaults.
Signed-off-by: Sergey Marunich <[email protected]>
* Set FailureInjectionRate default to 0 in config
Changed the default value of FailureInjectionRate from 10 to 0 in newConfig so that failure injection is disabled by default. It had been enabled by default under the previous, now deprecated, failure mode.
Signed-off-by: Sergey Marunich <[email protected]>
* Refactor failure type usage and error response format
Signed-off-by: Sergey Marunich <[email protected]>
* Refactor failure type flag handling and code formatting
Signed-off-by: Sergey Marunich <[email protected]>
* Fix config validation and simulator test argument handling
Signed-off-by: Sergey Marunich <[email protected]>
* remove duplicate
Signed-off-by: Sergey Marunich <[email protected]>
* Refactor failure handling to use CompletionError struct
Failure handling in the simulator now uses the CompletionError struct from the openai-server-api package, replacing custom error fields with a unified structure. This improves consistency in error responses and simplifies error injection logic. Associated tests and error handling code have been updated to reflect this change.
Signed-off-by: Sergey Marunich <[email protected]>
* Use one type for all errors. Map code to type
Signed-off-by: Ira <[email protected]>
* Review comments
Signed-off-by: Ira <[email protected]>
---------
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Ira <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Ira Rosen <[email protected]>
Co-authored-by: Shmuel Kallner <[email protected]>
Co-authored-by: Ira Rosen <[email protected]>
Co-authored-by: Maya Barnea <[email protected]>
README.md: 2 additions, 1 deletion
@@ -124,6 +124,8 @@ For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started
 - `zmq-endpoint`: ZMQ address to publish events
 - `zmq-max-connect-attempts`: the maximum number of ZMQ connection attempts, defaults to 0, maximum: 10
 - `event-batch-size`: the maximum number of kv-cache events to be sent together, defaults to 16
+- `failure-injection-rate`: probability (0-100) of injecting failures, optional, default is 0
+- `failure-types`: list of specific failure types to inject (rate_limit, invalid_api_key, context_length, server_error, invalid_request, model_not_found), optional, if empty all types are used
 - `fake-metrics`: represents a predefined set of metrics to be sent to Prometheus as a substitute for the real metrics. When specified, only these fake metrics will be reported — real metrics and fake metrics will never be reported together. The set should include values for
     - `running-requests`
     - `waiting-requests`
@@ -132,7 +134,6 @@ For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started
 	f.IntVar(&config.MaxCPULoras, "max-cpu-loras", config.MaxCPULoras, "Maximum number of LoRAs to store in CPU memory")
 	f.IntVar(&config.MaxModelLen, "max-model-len", config.MaxModelLen, "Model's context window, maximum number of tokens in a single request including input and output")
 
-	f.StringVar(&config.Mode, "mode", config.Mode, "Simulator mode, echo - returns the same text that was sent in the request, for chat completion returns the last message, random - returns random sentence from a bank of pre-defined sentences")
+	f.StringVar(&config.Mode, "mode", config.Mode, "Simulator mode: echo - returns the same text that was sent in the request, for chat completion returns the last message; random - returns random sentence from a bank of pre-defined sentences")
 	f.IntVar(&config.InterTokenLatency, "inter-token-latency", config.InterTokenLatency, "Time to generate one token (in milliseconds)")
 	f.IntVar(&config.TimeToFirstToken, "time-to-first-token", config.TimeToFirstToken, "Time to first token (in milliseconds)")
 	f.IntVar(&config.KVCacheTransferLatency, "kv-cache-transfer-latency", config.KVCacheTransferLatency, "Time for KV-cache transfer from a remote vLLM (in milliseconds)")
 	f.Var(&dummyFailureTypes, "failure-types", "List of specific failure types to inject (rate_limit, invalid_api_key, context_length, server_error, invalid_request, model_not_found)")
+	f.Lookup("failure-types").NoOptDefVal = dummy
+
 	// These values were manually parsed above in getParamValueFromArgs, we leave this in order to get these flags in --help
 	var dummyString string
 	f.StringVar(&dummyString, "config", "", "The path to a yaml configuration file. The command line values overwrite the configuration file values")