Failure mode #1

smarunich · 2025-08-14T21:50:45Z

This pull request introduces several configuration improvements and dependency updates for the simulator, focusing on enhanced control over the KV cache and tokenizers, better event publishing, and improved validation and testing. The most significant changes are grouped below.

Configuration and KV Cache Enhancements

Added new configuration options to the Configuration struct: KVCacheSize, TokenBlockSize, TokenizersCacheDir, HashSeed, ZMQEndpoint, and EventBatchSize, with command-line flags, default values, and validation logic to ensure correct usage. When HashSeed is not set explicitly, the code now reads it from the PYTHONHASHSEED environment variable. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR116-R129), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR185-R188), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR290-R301), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR346-R353), [[5]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR388-R394))
Refactored blockCache to use the new configuration struct for initialization, including dynamic event batching, topic naming, and publisher setup. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681L47-R64), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681R221-R224))
Improved block removal logic in the KV cache to correctly reference the block being removed. ([pkg/kv-cache/block_cache.goL131-R135](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681L131-R135))
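
For readers who want a feel for the HashSeed fallback described in this group, here is a minimal sketch. The field names come from the list above; the field types, the `resolveHashSeed` helper, and the injected `getenv` parameter are assumptions made for illustration and are not taken from the PR diff.

```go
package main

import (
	"fmt"
	"os"
)

// Configuration lists only the options named in the PR description.
// The field types are assumptions for illustration.
type Configuration struct {
	KVCacheSize        int
	TokenBlockSize     int
	TokenizersCacheDir string
	HashSeed           string
	ZMQEndpoint        string
	EventBatchSize     int
}

// resolveHashSeed (hypothetical helper) keeps an explicitly configured seed
// and otherwise falls back to the PYTHONHASHSEED environment variable.
// getenv is injected so the rule can be exercised without touching the
// real environment.
func resolveHashSeed(cfg *Configuration, getenv func(string) string) {
	if cfg.HashSeed == "" {
		cfg.HashSeed = getenv("PYTHONHASHSEED")
	}
}

func main() {
	cfg := &Configuration{KVCacheSize: 1024, TokenBlockSize: 16}
	resolveHashSeed(cfg, os.Getenv)
	fmt.Printf("hash seed in use: %q\n", cfg.HashSeed)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, is only a convenience for testing; the actual code may read the variable differently.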

Dependency and Build Updates

Upgraded github.com/llm-d/llm-d-kv-cache-manager to a newer version and updated github.com/daulet/tokenizers to v1.22.1 to match the imported version, including changes to the Dockerfile and Makefile for consistent tokenizer version handling. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L11-R11), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L30-R29), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557L26-R28), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R42-R51))
Removed unused and obsolete dependencies from go.mod for a cleaner build. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L20), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L40), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L72))

Event Publishing Improvements

Updated publisher implementation and tests to use msgpack/v5 and configure the encoder for array-encoded structs, improving event batch serialization and compatibility. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482R20-R28), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482L65-R70), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482L76-R81), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-11e396f9556390402b5e8eb73136dd18add33faf873f48f08b9a5ce4ae9f2d62L28-R28), [[5]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-11e396f9556390402b5e8eb73136dd18add33faf873f48f08b9a5ce4ae9f2d62R47-R48))

CI/CD and Documentation

Added a prerelease input to the Docker build GitHub Action and fixed the logic for handling pre-release builds. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76cc00033cb7774949c1149f83441a859634155ccf9d2bf6dec2794d4b0cc107R16-R19), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76cc00033cb7774949c1149f83441a859634155ccf9d2bf6dec2794d4b0cc107L35-R39))
Updated documentation (README) to clarify and comment out new KV cache-related configuration options. ([README.mdR119-R127](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R119-R127))

Testing Improvements

Added and updated tests for new configuration options, including validation for invalid values (negative cache size, invalid block size, negative event batch size). ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b836ab7773c80e16c32843c9030f596048b5967467f691a35aae92f93125e71cR106-R112), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b836ab7773c80e16c32843c9030f596048b5967467f691a35aae92f93125e71cR286-R300))
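
The three invalid cases mentioned above can be pictured with a small validation sketch. The `kvCacheOptions` type, the `validate` method, and the error messages are hypothetical; in particular, treating every non-positive block size as invalid is an assumption, since the description does not say which block sizes the simulator accepts.

```go
package main

import (
	"errors"
	"fmt"
)

// kvCacheOptions is a hypothetical subset of the configuration, holding only
// the values whose validation the tests are said to cover.
type kvCacheOptions struct {
	KVCacheSize    int
	TokenBlockSize int
	EventBatchSize int
}

// validate rejects the invalid values named in the PR description.
// The exact rules in the simulator may be stricter than these.
func (o kvCacheOptions) validate() error {
	if o.KVCacheSize < 0 {
		return errors.New("KV cache size cannot be negative")
	}
	if o.TokenBlockSize <= 0 {
		// Assumption: any non-positive block size counts as invalid.
		return errors.New("token block size must be positive")
	}
	if o.EventBatchSize < 0 {
		return errors.New("event batch size cannot be negative")
	}
	return nil
}

func main() {
	cases := []kvCacheOptions{
		{KVCacheSize: 1024, TokenBlockSize: 16, EventBatchSize: 16},
		{KVCacheSize: -1, TokenBlockSize: 16, EventBatchSize: 16},
		{KVCacheSize: 1024, TokenBlockSize: 0, EventBatchSize: 16},
		{KVCacheSize: 1024, TokenBlockSize: 16, EventBatchSize: -5},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, c.validate())
	}
}
```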

Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

Signed-off-by: Ira <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

) Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

* Publish kv-cache events Signed-off-by: Ira <[email protected]> * Fix lint errors Signed-off-by: Ira <[email protected]> * Review fixes Signed-off-by: Ira <[email protected]> * Sleep to allow previous sub to close Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality. Signed-off-by: Sergey Marunich <[email protected]>

Failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field. Signed-off-by: Sergey Marunich <[email protected]>
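
As a rough illustration of rate-based failure injection, the sketch below decides per request whether to return an error and builds an OpenAI-style error body that carries an 'object' field. Everything in it is an assumption rather than the simulator's code: the rate is treated as an integer percentage, and the type names, JSON field layout, the `"error"` object value, and the `rate_limit`/429 pairing are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
)

// errorDetail and errorResponse approximate an OpenAI-style error payload.
// The field set is an assumption; only the presence of an 'object' field is
// stated in the commit message.
type errorDetail struct {
	Message string `json:"message"`
	Type    string `json:"type"`
	Code    int    `json:"code"`
}

type errorResponse struct {
	Object string      `json:"object"`
	Error  errorDetail `json:"error"`
}

// shouldInject reports whether a request fails, given a uniform draw in
// [0, 100) and a rate interpreted as a percentage (assumption).
// A rate of 0 never injects; a rate of 100 always does.
func shouldInject(draw, ratePercent int) bool {
	return draw < ratePercent
}

// newErrorResponse builds the payload for a hypothetical failure type.
func newErrorResponse(failureType string, code int) errorResponse {
	return errorResponse{
		Object: "error",
		Error: errorDetail{
			Message: "injected failure: " + failureType,
			Type:    failureType,
			Code:    code,
		},
	}
}

func main() {
	const ratePercent = 10 // illustrative value, not a recommended setting
	if shouldInject(rand.Intn(100), ratePercent) {
		body, err := json.Marshal(newErrorResponse("rate_limit", 429))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(body))
		return
	}
	fmt.Println("request served normally")
}
```

Separating the random draw from the decision keeps the boundary behavior (0 disables injection, 100 always injects) easy to check.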

Extracts TOKENIZER_VERSION from the Dockerfile and uses it in the download-tokenizer target. This allows the Makefile to automatically use the correct tokenizer version specified in the Dockerfile, improving maintainability and consistency. Signed-off-by: Sergey Marunich <[email protected]>

) * - Use same version of tokenizer in both Dockerfile and Makefile - Fixes in readme file Signed-off-by: Maya Barnea <[email protected]> * updates according PR's review Signed-off-by: Maya Barnea <[email protected]> --------- Signed-off-by: Maya Barnea <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

Removed redundant lines and updated comments and help text to clarify that 'failure-injection-rate' is the probability of injecting failures, not specifically tied to failure mode. Signed-off-by: Sergey Marunich <[email protected]>

Signed-off-by: Sergey Marunich <[email protected]>

Signed-off-by: Sergey Marunich <[email protected]>

KV cache and tokenization related configuration (llm-d#125) Signed-off-by: Ira <[email protected]>

Publish kv-cache events (llm-d#126) * Publish kv-cache events Signed-off-by: Ira <[email protected]> * Fix lint errors Signed-off-by: Ira <[email protected]> * Review fixes Signed-off-by: Ira <[email protected]> * Sleep to allow previous sub to close Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

Use same version of tokenizer in both Dockerfile and Makefile (llm-d#132) * - Use same version of tokenizer in both Dockerfile and Makefile - Fixes in readme file Signed-off-by: Maya Barnea <[email protected]> * updates according PR's review Signed-off-by: Maya Barnea <[email protected]> --------- Signed-off-by: Maya Barnea <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage. Signed-off-by: Sergey Marunich <[email protected]>

Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.

Signed-off-by: Sergey Marunich <[email protected]>

Added descriptions for `failure-injection-rate` and `failure-types` configuration options to clarify their usage and defaults.

Changed the default value of FailureInjectionRate from 10 to 0 in newConfig so that failure injection is disabled by default; it had been enabled by default under the previous, now-deprecated failure mode.

) Signed-off-by: Maya Barnea <[email protected]>

* add pod name and ns headers Signed-off-by: npolshakova <[email protected]> * add pod name and ns env Signed-off-by: npolshakova <[email protected]> * Signed-off-by: npolshakova <[email protected]> feedback Signed-off-by: npolshakova <[email protected]> * reuse env var Signed-off-by: npolshakova <[email protected]> * feedback Signed-off-by: npolshakova <[email protected]> * add unset env tests Signed-off-by: npolshakova <[email protected]> --------- Signed-off-by: npolshakova <[email protected]>

Signed-off-by: Ira <[email protected]>

* Support fake metrics Signed-off-by: Ira <[email protected]> * Readme Signed-off-by: Ira <[email protected]> * Removed commented out code Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]>

Signed-off-by: Shmuel Kallner <[email protected]>

- fix serialization of BlockStored and BlockRemoved structures to be compatible with v0.2.1 Signed-off-by: Maya Barnea <[email protected]>

…-d#149) Signed-off-by: Maya Barnea <[email protected]>

Signed-off-by: Maya Barnea <[email protected]>

* Using the IP address 127.0.0.1 instead of localhost in test cases for zmq to prevent potential name resolution issues Signed-off-by: Qifan Deng <[email protected]> * Ignore vscode devcontainer config Signed-off-by: Qifan Deng <[email protected]> * Fix a formatting error introduced by commit 9235047 Signed-off-by: Qifan Deng <[email protected]> --------- Signed-off-by: Qifan Deng <[email protected]>

Signed-off-by: Maya Barnea <[email protected]>

* Add ZMQ connection retry configuration Signed-off-by: zhengkezhou1 <[email protected]> * add test & update readme Signed-off-by: zhengkezhou1 <[email protected]> * add retries test Signed-off-by: zhengkezhou1 <[email protected]> * more tests & rename Command line parameters Signed-off-by: zhengkezhou1 <[email protected]> --------- Signed-off-by: zhengkezhou1 <[email protected]>

* Added an OWNERS file to control who can review and approve PRs Signed-off-by: Shmuel Kallner <[email protected]> * Added Prow automation Signed-off-by: Shmuel Kallner <[email protected]> * Added automated marking of issues as stale Signed-off-by: Shmuel Kallner <[email protected]> --------- Signed-off-by: Shmuel Kallner <[email protected]>

Signed-off-by: Maya Barnea <[email protected]>

* pass ctx to startServer Signed-off-by: npolshakova <[email protected]> * fix sim test to use ctx Signed-off-by: npolshakova <[email protected]> --------- Signed-off-by: npolshakova <[email protected]>

* Show final config in simulator default logger at Info level Signed-off-by: Qifan Deng <[email protected]> * Remove unnecessary local var and update show config prompt Signed-off-by: Qifan Deng <[email protected]> * Resolve conflict due to new arg of zmq max retries Signed-off-by: Qifan Deng <[email protected]> * Clean fields when show final configuration Signed-off-by: Qifan Deng <[email protected]> * Simplify function syntax Signed-off-by: Qifan Deng <[email protected]> * Fix golangci-lint installation link in makefile Signed-off-by: Qifan Deng <[email protected]> * Fix err fmt when logger is invalid Signed-off-by: Qifan Deng <[email protected]> --------- Signed-off-by: Qifan Deng <[email protected]>

…oFirst (to int) (llm-d#163) * Cast bounds type in tests to func def: latency, interToken, and timeToFirst (to int) Signed-off-by: Qifan Deng <[email protected]> * Use float 32 Signed-off-by: Qifan Deng <[email protected]> --------- Signed-off-by: Qifan Deng <[email protected]>

Signed-off-by: Qifan Deng <[email protected]>

…ing nil pointer during runtime if run the only test suite (llm-d#166) * Add make flag to filter test case Signed-off-by: Qifan Deng <[email protected]> * Init random generator in Check random latencies test suite Signed-off-by: Qifan Deng <[email protected]> --------- Signed-off-by: Qifan Deng <[email protected]>

Signed-off-by: Sergey Marunich <[email protected]>

Failure handling in the simulator now uses the CompletionError struct from the openai-server-api package, replacing custom error fields with a unified structure. This improves consistency in error responses and simplifies error injection logic. Associated tests and error handling code have been updated to reflect this change. Signed-off-by: Sergey Marunich <[email protected]>

* Use channels for metrics updates. Metrics tests Signed-off-by: Ira <[email protected]> * Review comments Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]>

Signed-off-by: Ira <[email protected]>

Signed-off-by: Ira Rosen <[email protected]>

Signed-off-by: Ira <[email protected]>

shmuelk and others added 30 commits August 14, 2025 17:39

Add definition of new action input (llm-d#123)

638d0f7

Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

KV cache and tokenization related configuration (llm-d#125)

9ffe957

Signed-off-by: Ira <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

Another attempt at adding a latest tag only on release builds (llm-d#124)

a5a7d81

Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

KV cache and tokenization related configuration (llm-d#125)

c35dbca

Signed-off-by: Ira <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>

Clarify failure injection rate documentation

3ae7113

Removed redundant lines and updated comments and help text to clarify that 'failure-injection-rate' is the probability of injecting failures, not specifically tied to failure mode. Signed-off-by: Sergey Marunich <[email protected]>

Set default failure injection rate to 0

f5ae85b

Signed-off-by: Sergey Marunich <[email protected]>

rebase duplicates

9dbb689

Signed-off-by: Sergey Marunich <[email protected]>

Update option constructors in simulator tests

5162226

Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.

Merge branch 'main' into failure-mode

7bd69e8

Signed-off-by: Sergey Marunich <[email protected]>

Document failure injection options in README

5182187

Added descriptions for `failure-injection-rate` and `failure-types` configuration options to clarify their usage and defaults.

Set FailureInjectionRate default to 0 in config

b68115f

Changed the default value of FailureInjectionRate from 10 to 0 in newConfig so that failure injection is disabled by default; it had been enabled by default under the previous, now-deprecated failure mode.

use newer version of kvcache-manager, update code accordingly (llm-d#133)

7bcee36

Signed-off-by: Maya Barnea <[email protected]>

Create UUID string under a lock (llm-d#143)

a080a17

Signed-off-by: Ira <[email protected]>

Support fake metrics (llm-d#144)

4309925

* Support fake metrics Signed-off-by: Ira <[email protected]> * Readme Signed-off-by: Ira <[email protected]> * Removed commented out code Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]>

Makefile fixes for MacOS (llm-d#146)

efa82a5

Signed-off-by: Shmuel Kallner <[email protected]>

- return to kv-cache-manager version v0.2.1 (llm-d#147)

54efd5b

- fix serialization of BlockStored and BlockRemoved structures to be compatible with v0.2.1 Signed-off-by: Maya Barnea <[email protected]>

present kv-cache related configuration parameters in readme file (llm-d#149)

c3aae8d

Signed-off-by: Maya Barnea <[email protected]>

updated readme file - added environment variables (llm-d#151)

ad487ee

Signed-off-by: Maya Barnea <[email protected]>

change user to not be root in the dockerfile (llm-d#153)

03050c7

Signed-off-by: Maya Barnea <[email protected]>

zhengkezhou1 and others added 19 commits August 21, 2025 11:18

small changes in texts (llm-d#156)

4076bd2

Signed-off-by: Maya Barnea <[email protected]>

Fix server interrupt (llm-d#161)

859d8c2

* pass ctx to startServer Signed-off-by: npolshakova <[email protected]> * fix sim test to use ctx Signed-off-by: npolshakova <[email protected]> --------- Signed-off-by: npolshakova <[email protected]>

Remove unnecessary deferral of server close (llm-d#162)

703735d

Signed-off-by: Qifan Deng <[email protected]>

Refactor failure type usage and error response format

bfa02ff

Signed-off-by: Sergey Marunich <[email protected]>

Refactor failure type flag handling and code formatting

700e36f

Signed-off-by: Sergey Marunich <[email protected]>

Merge branch 'main' into failure-mode

14860b3

Signed-off-by: Sergey Marunich <[email protected]>

Fix config validation and simulator test argument handling

8f6d56c

Signed-off-by: Sergey Marunich <[email protected]>

remove duplicate

e0183b7

Signed-off-by: Sergey Marunich <[email protected]>

Use channels for metrics updates, added metrics tests (llm-d#171)

57657bf

* Use channels for metrics updates. Metrics tests Signed-off-by: Ira <[email protected]> * Review comments Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]>

Remove rerun on comment action (llm-d#174)

974b611

Signed-off-by: Ira <[email protected]>

Use one type for all errors. Map code to type

72dde24

Signed-off-by: Ira <[email protected]>

Merge branch 'main' into failure-mode

13492fc

Signed-off-by: Ira Rosen <[email protected]>

Review comments

7994048

Signed-off-by: Ira <[email protected]>
