@smarunich
Owner

This pull request introduces several configuration improvements and dependency updates for the simulator, focusing on enhanced control over the KV cache and tokenizers, better event publishing, and improved validation and testing. The most significant changes are grouped below.

Configuration and KV Cache Enhancements

  • Added new configuration options to Configuration struct: KVCacheSize, TokenBlockSize, TokenizersCacheDir, HashSeed, ZMQEndpoint, and EventBatchSize, with support for command-line flags, default values, and validation logic to ensure correct usage. Also, the code now reads HashSeed from the PYTHONHASHSEED environment variable if not set. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR116-R129), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR185-R188), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR290-R301), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR346-R353), [[5]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR388-R394))
  • Refactored blockCache to use the new configuration struct for initialization, including dynamic event batching, topic naming, and publisher setup. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681L47-R64), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681R221-R224))
  • Improved block removal logic in the KV cache to correctly reference the block being removed. ([pkg/kv-cache/block_cache.goL131-R135](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681L131-R135))
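The options and validation described above could look roughly like the following. This is a minimal sketch, not the code from the PR: the field types, the exact validation rules, and the helper names `applyEnvDefaults` and `validate` are assumptions made for illustration. Only the option names and the PYTHONHASHSEED fallback come from the description.

```go
package main

import (
	"fmt"
	"os"
)

// KVCacheConfig reuses the option names from the PR description.
// Field types are assumptions made for this sketch.
type KVCacheConfig struct {
	KVCacheSize        int
	TokenBlockSize     int
	TokenizersCacheDir string
	HashSeed           string
	ZMQEndpoint        string
	EventBatchSize     int
}

// applyEnvDefaults fills HashSeed from PYTHONHASHSEED when it was not set
// explicitly. The lookup function is injected so the fallback is testable.
func (c *KVCacheConfig) applyEnvDefaults(getenv func(string) string) {
	if c.HashSeed == "" {
		c.HashSeed = getenv("PYTHONHASHSEED")
	}
}

// validate rejects the kinds of values the PR's tests mention. The precise
// bounds here are assumptions; the real rules may be stricter.
func (c *KVCacheConfig) validate() error {
	if c.KVCacheSize < 0 {
		return fmt.Errorf("KV cache size must not be negative, got %d", c.KVCacheSize)
	}
	if c.TokenBlockSize <= 0 {
		return fmt.Errorf("token block size must be positive, got %d", c.TokenBlockSize)
	}
	if c.EventBatchSize < 0 {
		return fmt.Errorf("event batch size must not be negative, got %d", c.EventBatchSize)
	}
	return nil
}

func main() {
	cfg := KVCacheConfig{KVCacheSize: 1024, TokenBlockSize: 16, EventBatchSize: 16}
	cfg.applyEnvDefaults(os.Getenv)
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Injecting the environment lookup keeps the fallback deterministic in tests, which is one way to cover both the "set" and "unset" cases without touching the process environment.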

Dependency and Build Updates

  • Upgraded github.com/llm-d/llm-d-kv-cache-manager to a newer version and updated github.com/daulet/tokenizers to v1.22.1 to match the imported version, including changes to the Dockerfile and Makefile for consistent tokenizer version handling. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L11-R11), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L30-R29), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557L26-R28), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R42-R51))
  • Removed unused and obsolete dependencies from go.mod for a cleaner build. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L20), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L40), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L72))

Event Publishing Improvements

  • Updated publisher implementation and tests to use msgpack/v5 and configure the encoder for array-encoded structs, improving event batch serialization and compatibility. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482R20-R28), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482L65-R70), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482L76-R81), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-11e396f9556390402b5e8eb73136dd18add33faf873f48f08b9a5ce4ae9f2d62L28-R28), [[5]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-11e396f9556390402b5e8eb73136dd18add33faf873f48f08b9a5ce4ae9f2d62R47-R48))
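To show what "array-encoded structs" changes on the wire, the sketch below hand-encodes a hypothetical two-field record in both MessagePack layouts using only the standard library. It is not the publisher code from the PR and does not use msgpack/v5 itself; in that library the corresponding switch is, as far as I know, `Encoder.UseArrayEncodedStructs(true)`. The `blockEvent` type and its fields are invented for illustration.

```go
package main

import "fmt"

// blockEvent is a hypothetical record used only for this illustration.
// Kind must be shorter than 32 bytes and Count at most 127 so that the
// single-byte MessagePack forms (fixstr, positive fixint) apply.
type blockEvent struct {
	Kind  string
	Count uint8
}

// fixstr encodes a string shorter than 32 bytes as a MessagePack fixstr.
func fixstr(s string) []byte {
	return append([]byte{0xa0 | byte(len(s))}, s...)
}

// encodeAsArray writes only the field values, in declaration order.
func encodeAsArray(e blockEvent) []byte {
	out := []byte{0x90 | 2} // fixarray header, 2 elements
	out = append(out, fixstr(e.Kind)...)
	return append(out, e.Count) // positive fixint
}

// encodeAsMap writes each field name followed by its value.
func encodeAsMap(e blockEvent) []byte {
	out := []byte{0x80 | 2} // fixmap header, 2 key/value pairs
	out = append(out, fixstr("Kind")...)
	out = append(out, fixstr(e.Kind)...)
	out = append(out, fixstr("Count")...)
	return append(out, e.Count)
}

func main() {
	e := blockEvent{Kind: "stored", Count: 3}
	fmt.Printf("array: % x\n", encodeAsArray(e))
	fmt.Printf("map:   % x\n", encodeAsMap(e))
}
```

The array form omits field names, so it is more compact and matches consumers that decode events positionally; the trade-off is that field order becomes part of the wire contract.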

CI/CD and Documentation

  • Added a prerelease input to the Docker build GitHub Action and fixed the logic for handling pre-release builds. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76cc00033cb7774949c1149f83441a859634155ccf9d2bf6dec2794d4b0cc107R16-R19), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76cc00033cb7774949c1149f83441a859634155ccf9d2bf6dec2794d4b0cc107L35-R39))
  • Updated documentation (README) to clarify and comment out new KV cache-related configuration options. ([README.mdR119-R127](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R119-R127))

Testing Improvements

  • Added and updated tests for new configuration options, including validation for invalid values (negative cache size, invalid block size, negative event batch size). ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b836ab7773c80e16c32843c9030f596048b5967467f691a35aae92f93125e71cR106-R112), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b836ab7773c80e16c32843c9030f596048b5967467f691a35aae92f93125e71cR286-R300))

shmuelk and others added 30 commits August 14, 2025 17:39
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
* Publish kv-cache events

Signed-off-by: Ira <[email protected]>

* Fix lint errors

Signed-off-by: Ira <[email protected]>

* Review fixes

Signed-off-by: Ira <[email protected]>

* Sleep to allow previous sub to close

Signed-off-by: Ira <[email protected]>

---------

Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality.

Signed-off-by: Sergey Marunich <[email protected]>
Failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field.

Signed-off-by: Sergey Marunich <[email protected]>
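The rate-based injection described in this commit can be sketched as follows. Everything specific here is an assumption: the rate is treated as an integer percentage (0 to 100), and the error payload is a guess at an OpenAI-style shape that carries an `object` field. The names `shouldInject`, `errorJSON`, and `errorBody` are hypothetical and not taken from the simulator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
)

// errorBody is an assumed OpenAI-style error payload that includes an
// "object" field; the simulator's real structure may differ.
type errorBody struct {
	Object  string `json:"object"`
	Message string `json:"message"`
	Type    string `json:"type"`
	Code    int    `json:"code"`
}

// shouldInject reports whether a request should fail. ratePercent is the
// injection rate in percent and roll is a uniform draw from [0, 100).
func shouldInject(ratePercent, roll int) bool {
	return roll < ratePercent
}

// errorJSON renders an error payload with the given details.
func errorJSON(message, errType string, code int) (string, error) {
	b, err := json.Marshal(errorBody{Object: "error", Message: message, Type: errType, Code: code})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	const ratePercent = 10 // hypothetical setting; 0 disables injection
	if !shouldInject(ratePercent, rand.Intn(100)) {
		fmt.Println("request served normally")
		return
	}
	body, err := errorJSON("rate limit reached", "rate_limit", 429)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(body)
}
```

Passing the random draw in as an argument keeps the decision function pure, so boundary cases such as a rate of 0 or 100 can be checked without seeding a generator.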
Extracts TOKENIZER_VERSION from the Dockerfile and uses it in the download-tokenizer target. This allows the Makefile to automatically use the correct tokenizer version specified in the Dockerfile, improving maintainability and consistency.

Signed-off-by: Sergey Marunich <[email protected]>
* Publish kv-cache events

Signed-off-by: Ira <[email protected]>

* Fix lint errors

Signed-off-by: Ira <[email protected]>

* Review fixes

Signed-off-by: Ira <[email protected]>

* Sleep to allow previous sub to close

Signed-off-by: Ira <[email protected]>

---------

Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
)

* - Use same version of tokenizer in both Dockerfile and Makefile
- Fixes in readme file

Signed-off-by: Maya Barnea <[email protected]>

* Updates according to the PR review

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Removed redundant lines and updated comments and help text to clarify that 'failure-injection-rate' is the probability of injecting failures, not specifically tied to failure mode.

Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>

KV cache and tokenization related configuration (llm-d#125)

Signed-off-by: Ira <[email protected]>

Publish kv-cache events (llm-d#126)

* Publish kv-cache events

Signed-off-by: Ira <[email protected]>

* Fix lint errors

Signed-off-by: Ira <[email protected]>

* Review fixes

Signed-off-by: Ira <[email protected]>

* Sleep to allow previous sub to close

Signed-off-by: Ira <[email protected]>

---------

Signed-off-by: Ira <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>

Use same version of tokenizer in both Dockerfile and Makefile (llm-d#132)

* - Use same version of tokenizer in both Dockerfile and Makefile
- Fixes in readme file

Signed-off-by: Maya Barnea <[email protected]>

* Updates according to the PR review

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>

Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.

Signed-off-by: Sergey Marunich <[email protected]>
Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.
Added descriptions for `failure-injection-rate` and `failure-types` configuration options to clarify their usage and defaults.
Changed the default value of FailureInjectionRate in newConfig from 10 to 0, so failure injection is disabled by default; it was previously enabled by default under the now-deprecated failure mode.
* add pod name and ns headers

Signed-off-by: npolshakova <[email protected]>

* add pod name and ns env

Signed-off-by: npolshakova <[email protected]>

* feedback

Signed-off-by: npolshakova <[email protected]>

* reuse env var

Signed-off-by: npolshakova <[email protected]>

* feedback

Signed-off-by: npolshakova <[email protected]>

* add unset env tests

Signed-off-by: npolshakova <[email protected]>

---------

Signed-off-by: npolshakova <[email protected]>
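One way to attach pod name and namespace headers from environment variables is a small middleware, sketched below. The header names, the environment variable names (`POD_NAME`, `POD_NAMESPACE`), and the helpers `withPodHeaders` and `probeHeaders` are assumptions for illustration; the commits above do not state them. The sketch skips a header when its value is empty, mirroring the "unset env" case.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
)

// Assumed header names; the simulator's actual names may differ.
const (
	podHeader = "x-inference-pod"
	nsHeader  = "x-inference-namespace"
)

// withPodHeaders adds the pod and namespace headers to every response,
// skipping any value that is empty (for example, an unset env variable).
func withPodHeaders(pod, ns string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if pod != "" {
			w.Header().Set(podHeader, pod)
		}
		if ns != "" {
			w.Header().Set(nsHeader, ns)
		}
		next.ServeHTTP(w, r)
	})
}

// probeHeaders sends one in-memory request through the middleware and
// returns the two header values seen on the response.
func probeHeaders(pod, ns string) (string, string) {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1/models", nil)
	withPodHeaders(pod, ns, ok).ServeHTTP(rec, req)
	return rec.Header().Get(podHeader), rec.Header().Get(nsHeader)
}

func main() {
	pod, ns := probeHeaders(os.Getenv("POD_NAME"), os.Getenv("POD_NAMESPACE"))
	fmt.Printf("pod=%q namespace=%q\n", pod, ns)
}
```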
* Support fake metrics

Signed-off-by: Ira <[email protected]>

* Readme

Signed-off-by: Ira <[email protected]>

* Removed commented out code

Signed-off-by: Ira <[email protected]>

---------

Signed-off-by: Ira <[email protected]>
- Fix serialization of BlockStored and BlockRemoved structures to be compatible with v0.2.1

Signed-off-by: Maya Barnea <[email protected]>
* Using the IP address 127.0.0.1 instead of localhost in test cases for zmq to prevent potential name resolution issues

Signed-off-by: Qifan Deng <[email protected]>

* Ignore vscode devcontainer config

Signed-off-by: Qifan Deng <[email protected]>

* Fix a formatting error introduced by commit 9235047

Signed-off-by: Qifan Deng <[email protected]>

---------

Signed-off-by: Qifan Deng <[email protected]>
zhengkezhou1 and others added 19 commits August 21, 2025 11:18
* Add ZMQ connection retry configuration

Signed-off-by: zhengkezhou1 <[email protected]>

* add test & update readme

Signed-off-by: zhengkezhou1 <[email protected]>

* add retries test

Signed-off-by: zhengkezhou1 <[email protected]>

* more tests & rename Command line parameters

Signed-off-by: zhengkezhou1 <[email protected]>

---------

Signed-off-by: zhengkezhou1 <[email protected]>
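A bounded connection retry of the kind this change configures can be sketched with the standard library alone. This is not the ZMQ code from the commits: `connectWithRetry`, `flaky`, and the meaning of `maxRetries` (extra attempts after the first) are assumptions, and the connect step is a stand-in function rather than a real socket.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a stand-in for a failed connection attempt.
var errNotReady = errors.New("endpoint not ready")

// connectWithRetry calls connect once, then up to maxRetries more times,
// sleeping for delay between attempts. It returns the number of attempts made.
func connectWithRetry(connect func() error, maxRetries int, delay time.Duration) (int, error) {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = connect(); err == nil {
			return attempt + 1, nil
		}
		if attempt < maxRetries {
			time.Sleep(delay)
		}
	}
	return maxRetries + 1, fmt.Errorf("connect failed after %d attempts: %w", maxRetries+1, err)
}

// flaky returns a connect function that fails the given number of times
// and then succeeds; it exists only to exercise the retry loop.
func flaky(failures int) func() error {
	return func() error {
		if failures > 0 {
			failures--
			return errNotReady
		}
		return nil
	}
}

func main() {
	attempts, err := connectWithRetry(flaky(2), 3, 10*time.Millisecond)
	fmt.Println("attempts:", attempts, "error:", err)
}
```

Wrapping the last error with `%w` lets callers still match the underlying cause with `errors.Is` after the retries are exhausted.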
* Added an OWNERS file to control who can review and approve PRs

Signed-off-by: Shmuel Kallner <[email protected]>

* Added Prow automation

Signed-off-by: Shmuel Kallner <[email protected]>

* Added automated marking of issues as stale

Signed-off-by: Shmuel Kallner <[email protected]>

---------

Signed-off-by: Shmuel Kallner <[email protected]>
* pass ctx to startServer

Signed-off-by: npolshakova <[email protected]>

* fix sim test to use ctx

Signed-off-by: npolshakova <[email protected]>

---------

Signed-off-by: npolshakova <[email protected]>
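Passing a context into the server start function typically lets the caller, including a test, stop the server by cancelling that context. The sketch below shows one common shape for this using net/http; the signature of `startServer` here and the helper `runAndCancel` are assumptions and may not match the simulator's function of the same name.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// startServer serves h on ln until ctx is cancelled, then shuts down
// gracefully. It returns nil on a clean, context-driven shutdown.
func startServer(ctx context.Context, ln net.Listener, h http.Handler) error {
	srv := &http.Server{Handler: h}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // the server stopped on its own
	case <-ctx.Done():
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		if err := srv.Shutdown(shutdownCtx); err != nil {
			return err
		}
		<-errCh // Serve has returned http.ErrServerClosed
		return nil
	}
}

// runAndCancel starts a server on an ephemeral loopback port, cancels the
// context shortly afterwards, and returns startServer's result.
func runAndCancel() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan error, 1)
	go func() { done <- startServer(ctx, ln, http.NotFoundHandler()) }()
	time.Sleep(20 * time.Millisecond)
	cancel()
	return <-done
}

func main() {
	fmt.Println("shutdown error:", runAndCancel())
}
```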
* Show final config in simulator default logger at Info level

Signed-off-by: Qifan Deng <[email protected]>

* Remove unnecessary local var and update show config prompt

Signed-off-by: Qifan Deng <[email protected]>

* Resolve conflict due to new arg of zmq max retries

Signed-off-by: Qifan Deng <[email protected]>

* Clean fields when showing final configuration

Signed-off-by: Qifan Deng <[email protected]>

* Simplify function syntax

Signed-off-by: Qifan Deng <[email protected]>

* Fix golangci-lint installation link in makefile

Signed-off-by: Qifan Deng <[email protected]>

* Fix err fmt when logger is invalid

Signed-off-by: Qifan Deng <[email protected]>

---------

Signed-off-by: Qifan Deng <[email protected]>
…oFirst (to int) (llm-d#163)

* Cast bounds type in tests to func def: latency, interToken, and timeToFirst (to int)

Signed-off-by: Qifan Deng <[email protected]>

* Use float 32

Signed-off-by: Qifan Deng <[email protected]>

---------

Signed-off-by: Qifan Deng <[email protected]>
…ing nil pointer during runtime if run the only test suite (llm-d#166)

* Add make flag to filter test case

Signed-off-by: Qifan Deng <[email protected]>

* Init random generator in Check random latencies test suite

Signed-off-by: Qifan Deng <[email protected]>

---------

Signed-off-by: Qifan Deng <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Failure handling in the simulator now uses the CompletionError struct from the openai-server-api package, replacing custom error fields with a unified structure. This improves consistency in error responses and simplifies error injection logic. Associated tests and error handling code have been updated to reflect this change.

Signed-off-by: Sergey Marunich <[email protected]>
* Use channels for metrics updates. Metrics tests

Signed-off-by: Ira <[email protected]>

* Review comments

Signed-off-by: Ira <[email protected]>

---------

Signed-off-by: Ira <[email protected]>
Signed-off-by: Ira <[email protected]>
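Routing metric updates through a channel means a single goroutine owns the metric state, so request handlers never need a lock around it. The sketch below shows that pattern with a plain counter; the `metrics` type, its fields, and `simulate` are hypothetical, and the simulator's real metrics code may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// metrics holds a gauge that is mutated only by the run goroutine.
type metrics struct {
	running int64
	updates chan int64
	done    chan struct{}
}

// newMetrics starts the goroutine that applies updates.
func newMetrics() *metrics {
	m := &metrics{updates: make(chan int64, 16), done: make(chan struct{})}
	go m.run()
	return m
}

// run applies each delta in arrival order until updates is closed.
func (m *metrics) run() {
	for delta := range m.updates {
		m.running += delta
	}
	close(m.done)
}

// stop closes the update channel, waits for pending updates to be applied,
// and returns the final gauge value.
func (m *metrics) stop() int64 {
	close(m.updates)
	<-m.done
	return m.running
}

// simulate runs n requests that each start and finish, plus one request
// that starts and never finishes, and returns the resulting gauge value.
func simulate(n int) int64 {
	m := newMetrics()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.updates <- 1  // request started
			m.updates <- -1 // request finished
		}()
	}
	m.updates <- 1 // one request is still running
	wg.Wait()
	return m.stop()
}

func main() {
	fmt.Println("requests still running:", simulate(50))
}
```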
8 participants