forked from llm-d/llm-d-inference-sim
Failure mode #1
Open: smarunich wants to merge 49 commits into failure-mode2 from failure-mode
Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Ira <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>
* Publish kv-cache events Signed-off-by: Ira <[email protected]> * Fix lint errors Signed-off-by: Ira <[email protected]> * Review fixes Signed-off-by: Ira <[email protected]> * Sleep to allow previous sub to close Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>
Introduces a 'failure' mode to the simulator, allowing random injection of OpenAI API-compatible error responses for testing error handling. Adds configuration options for failure injection rate and specific failure types, implements error response logic, and updates documentation and tests to cover the new functionality. Signed-off-by: Sergey Marunich <[email protected]>
Failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field. Signed-off-by: Sergey Marunich <[email protected]>
Extracts TOKENIZER_VERSION from the Dockerfile and uses it in the download-tokenizer target. This allows the Makefile to automatically use the correct tokenizer version specified in the Dockerfile, improving maintainability and consistency. Signed-off-by: Sergey Marunich <[email protected]>
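The commit above performs this extraction inside the Makefile. Purely to illustrate the same lookup, here is a Go sketch that scans Dockerfile text for the version; the `ARG TOKENIZER_VERSION=<value>` line format and the `tokenizerVersion` helper are assumptions, not taken from the repository.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// versionArg is the assumed Dockerfile line prefix that carries the version.
const versionArg = "ARG TOKENIZER_VERSION="

// tokenizerVersion scans Dockerfile text for a line of the assumed form
// "ARG TOKENIZER_VERSION=<value>" and returns <value>.
func tokenizerVersion(dockerfile string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(dockerfile))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, versionArg) {
			return strings.TrimSpace(strings.TrimPrefix(line, versionArg)), true
		}
	}
	return "", false
}

func main() {
	// Hypothetical Dockerfile content used only for the demonstration.
	dockerfile := "FROM builder-image\nARG TOKENIZER_VERSION=v1.22.1\nRUN make build\n"
	if v, ok := tokenizerVersion(dockerfile); ok {
		fmt.Println("tokenizer version:", v)
	}
}
```

Reading the version from one file keeps the Dockerfile as the single source of truth, which is the consistency goal the commit states.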
* Use same version of tokenizer in both Dockerfile and Makefile; fixes in README file Signed-off-by: Maya Barnea <[email protected]> * Updates according to PR review Signed-off-by: Maya Barnea <[email protected]> --------- Signed-off-by: Maya Barnea <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>
Removed redundant lines and updated comments and help text to clarify that 'failure-injection-rate' is the probability of injecting failures, not specifically tied to failure mode. Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
KV cache and tokenization related configuration (llm-d#125) Signed-off-by: Ira <[email protected]>
Publish kv-cache events (llm-d#126) * Publish kv-cache events Signed-off-by: Ira <[email protected]> * Fix lint errors Signed-off-by: Ira <[email protected]> * Review fixes Signed-off-by: Ira <[email protected]> * Sleep to allow previous sub to close Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>
Use same version of tokenizer in both Dockerfile and Makefile (llm-d#132) * Use same version of tokenizer in both Dockerfile and Makefile; fixes in README file Signed-off-by: Maya Barnea <[email protected]> * Updates according to PR review Signed-off-by: Maya Barnea <[email protected]> --------- Signed-off-by: Maya Barnea <[email protected]> Signed-off-by: Sergey Marunich <[email protected]>
Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage. Signed-off-by: Sergey Marunich <[email protected]>
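The PR description notes that the KV cache configuration work reads `HashSeed` from the `PYTHONHASHSEED` environment variable when it is not set explicitly. A minimal Go sketch of that fallback, assuming `HashSeed` is a string and using a hypothetical `applyHashSeedDefault` helper with an injected environment lookup so it can be tested without modifying the process environment:

```go
package main

import (
	"fmt"
	"os"
)

// config keeps only the field relevant here; treating HashSeed as a string
// is an assumption about the real Configuration struct.
type config struct {
	HashSeed string
}

// applyHashSeedDefault fills HashSeed from PYTHONHASHSEED when no value was
// given explicitly. lookup has the signature of os.LookupEnv so tests can
// pass a fake environment.
func applyHashSeedDefault(c *config, lookup func(string) (string, bool)) {
	if c.HashSeed != "" {
		return // an explicit value wins over the environment
	}
	if v, ok := lookup("PYTHONHASHSEED"); ok {
		c.HashSeed = v
	}
}

func main() {
	c := &config{}
	applyHashSeedDefault(c, os.LookupEnv)
	fmt.Printf("hash seed: %q\n", c.HashSeed)
}
```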
Replaces usage of param.NewOpt with openai.Int for MaxTokens and openai.Bool with param.NewOpt for IncludeUsage in simulator_test.go to align with updated API usage.
Signed-off-by: Sergey Marunich <[email protected]>
Added descriptions for `failure-injection-rate` and `failure-types` configuration options to clarify their usage and defaults.
Changed the default value of FailureInjectionRate from 10 to 0 in newConfig to disable failure injection, which had been enabled by default under the previous, now-deprecated failure mode.
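A sketch of how the two options might be declared and validated, with injection disabled by default as the commit above describes. The 0 to 100 percentage range, the `failureConfig` type, its `validate` method, and the known failure type names are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// failureConfig is a hypothetical subset of the simulator configuration.
type failureConfig struct {
	FailureInjectionRate int      // assumed percentage in [0, 100]; 0 disables injection
	FailureTypes         []string // allowed failure type names (illustrative)
}

// knownFailureTypes is illustrative; the real constants live in the simulator.
var knownFailureTypes = map[string]bool{"rate_limit": true, "server_error": true}

// newFailureConfig returns the defaults: injection stays off unless requested.
func newFailureConfig() failureConfig {
	return failureConfig{FailureInjectionRate: 0}
}

// validate rejects out-of-range rates and unknown failure type names.
func (c failureConfig) validate() error {
	if c.FailureInjectionRate < 0 || c.FailureInjectionRate > 100 {
		return errors.New("failure-injection-rate must be between 0 and 100")
	}
	for _, t := range c.FailureTypes {
		if !knownFailureTypes[t] {
			return fmt.Errorf("unknown failure type %q", t)
		}
	}
	return nil
}

func main() {
	cfg := newFailureConfig()
	cfg.FailureTypes = []string{"rate_limit"}
	fmt.Println("valid:", cfg.validate() == nil)
}
```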
Signed-off-by: Maya Barnea <[email protected]>
* add pod name and ns headers Signed-off-by: npolshakova <[email protected]> * add pod name and ns env Signed-off-by: npolshakova <[email protected]> * Signed-off-by: npolshakova <[email protected]> feedback Signed-off-by: npolshakova <[email protected]> * reuse env var Signed-off-by: npolshakova <[email protected]> * feedback Signed-off-by: npolshakova <[email protected]> * add unset env tests Signed-off-by: npolshakova <[email protected]> --------- Signed-off-by: npolshakova <[email protected]>
Signed-off-by: Ira <[email protected]>
* Support fake metrics Signed-off-by: Ira <[email protected]> * Readme Signed-off-by: Ira <[email protected]> * Removed commented out code Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
- Fix serialization of BlockStored and BlockRemoved structures to be compatible with v0.2.1 Signed-off-by: Maya Barnea <[email protected]>
…-d#149) Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
* Using the IP address 127.0.0.1 instead of localhost in test cases for zmq to prevent potential name resolution issues Signed-off-by: Qifan Deng <[email protected]> * Ignore vscode devcontainer config Signed-off-by: Qifan Deng <[email protected]> * Fix a formatting error introduced by commit 9235047 Signed-off-by: Qifan Deng <[email protected]> --------- Signed-off-by: Qifan Deng <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
* Add ZMQ connection retry configuration Signed-off-by: zhengkezhou1 <[email protected]> * add test & update readme Signed-off-by: zhengkezhou1 <[email protected]> * add retries test Signed-off-by: zhengkezhou1 <[email protected]> * more tests & rename command-line parameters Signed-off-by: zhengkezhou1 <[email protected]> --------- Signed-off-by: zhengkezhou1 <[email protected]>
* Added an OWNERS file to control who can review and approve PRs Signed-off-by: Shmuel Kallner <[email protected]> * Added Prow automation Signed-off-by: Shmuel Kallner <[email protected]> * Added automated marking of issues as stale Signed-off-by: Shmuel Kallner <[email protected]> --------- Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
* pass ctx to startServer Signed-off-by: npolshakova <[email protected]> * fix sim test to use ctx Signed-off-by: npolshakova <[email protected]> --------- Signed-off-by: npolshakova <[email protected]>
* Show final config in simulator default logger at Info level Signed-off-by: Qifan Deng <[email protected]> * Remove unnecessary local var and update show config prompt Signed-off-by: Qifan Deng <[email protected]> * Resolve conflict due to new arg of zmq max retries Signed-off-by: Qifan Deng <[email protected]> * Clean fields when showing final configuration Signed-off-by: Qifan Deng <[email protected]> * Simplify function syntax Signed-off-by: Qifan Deng <[email protected]> * Fix golangci-lint installation link in makefile Signed-off-by: Qifan Deng <[email protected]> * Fix err fmt when logger is invalid Signed-off-by: Qifan Deng <[email protected]> --------- Signed-off-by: Qifan Deng <[email protected]>
…oFirst (to int) (llm-d#163) * Cast bounds type in tests to func def: latency, interToken, and timeToFirst (to int) Signed-off-by: Qifan Deng <[email protected]> * Use float32 Signed-off-by: Qifan Deng <[email protected]> --------- Signed-off-by: Qifan Deng <[email protected]>
Signed-off-by: Qifan Deng <[email protected]>
…ing nil pointer during runtime if run the only test suite (llm-d#166) * Add make flag to filter test case Signed-off-by: Qifan Deng <[email protected]> * Init random generator in Check random latencies test suite Signed-off-by: Qifan Deng <[email protected]> --------- Signed-off-by: Qifan Deng <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Failure handling in the simulator now uses the CompletionError struct from the openai-server-api package, replacing custom error fields with a unified structure. This improves consistency in error responses and simplifies error injection logic. Associated tests and error handling code have been updated to reflect this change. Signed-off-by: Sergey Marunich <[email protected]>
* Use channels for metrics updates. Metrics tests Signed-off-by: Ira <[email protected]> * Review comments Signed-off-by: Ira <[email protected]> --------- Signed-off-by: Ira <[email protected]>
Signed-off-by: Ira <[email protected]>
Signed-off-by: Ira Rosen <[email protected]>
Signed-off-by: Ira <[email protected]>
This pull request introduces several configuration improvements and dependency updates for the simulator, focusing on enhanced control over the KV cache and tokenizers, better event publishing, and improved validation and testing. The most significant changes are grouped below.
Configuration and KV Cache Enhancements
* `Configuration` struct: `KVCacheSize`, `TokenBlockSize`, `TokenizersCacheDir`, `HashSeed`, `ZMQEndpoint`, and `EventBatchSize`, with support for command-line flags, default values, and validation logic to ensure correct usage. The code also reads `HashSeed` from the `PYTHONHASHSEED` environment variable if it is not set. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR116-R129), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR185-R188), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR290-R301), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR346-R353), [[5]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-56da73936108421d5a3311d2eb69b8649cd188148cab46940795e848d4e8dbeaR388-R394))
* `blockCache` to use the new configuration struct for initialization, including dynamic event batching, topic naming, and publisher setup. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681L47-R64), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681R221-R224))
* ([pkg/kv-cache/block_cache.goL131-R135](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-4f7e65f6f540700e74fb5fc1432132384518109ee31fa67aa7f548fcd39a1681L131-R135))

Dependency and Build Updates
* `github.com/llm-d/llm-d-kv-cache-manager` to a newer version and updated `github.com/daulet/tokenizers` to `v1.22.1` to match the imported version, including changes to the Dockerfile and Makefile for consistent tokenizer version handling. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L11-R11), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L30-R29), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557L26-R28), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R42-R51))
* `go.mod` for a cleaner build. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L20), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L40), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6L72))

Event Publishing Improvements
* `msgpack/v5` and configure the encoder for array-encoded structs, improving event batch serialization and compatibility. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482R20-R28), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482L65-R70), [[3]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-d28058992f73559fb3a5fd1eae56eceb8548c0b1a31f83fee179cf1a6b4be482L76-R81), [[4]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-11e396f9556390402b5e8eb73136dd18add33faf873f48f08b9a5ce4ae9f2d62L28-R28), [[5]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-11e396f9556390402b5e8eb73136dd18add33faf873f48f08b9a5ce4ae9f2d62R47-R48))

CI/CD and Documentation
* `prerelease` input to the Docker build GitHub Action and fixed the logic for handling pre-release builds. ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76cc00033cb7774949c1149f83441a859634155ccf9d2bf6dec2794d4b0cc107R16-R19), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-76cc00033cb7774949c1149f83441a859634155ccf9d2bf6dec2794d4b0cc107L35-R39))
* ([README.mdR119-R127](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R119-R127))

Testing Improvements
* ([[1]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b836ab7773c80e16c32843c9030f596048b5967467f691a35aae92f93125e71cR106-R112), [[2]](https://github.com/smarunich/llm-d-inference-sim/pull/1/files#diff-b836ab7773c80e16c32843c9030f596048b5967467f691a35aae92f93125e71cR286-R300))