Releases: llm-d/llm-d-inference-sim

v0.6.0

29 Oct 11:06
9a57299

What's Changed

  • New requests queue by @irar2 in #214
  • Make writing to channels non-blocking by @irar2 in #225 (see the sketch after this list)
  • Change packages' dependencies by @irar2 in #229
  • Added port header to response by @irar2 in #232
  • Test fix: the number of running requests can be one less when scheduling requests by @irar2 in #231
  • fix occasional ttft and tpot metrics test failures by @mayabar in #233
  • Configure the tool_choice option to use a specific tool by @MondayCha in #234
  • Additional latency related metrics by @mayabar in #237
  • Changed random from static to a field in the simulator by @irar2 in #238
  • Made workers' requests channel non-blocking by @irar2 in #239
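
Several of the changes above (#225, #239) make writes to internal channels non-blocking. A minimal sketch of that pattern in Go follows: a select with a default branch lets the sender drop an update when the channel buffer is full instead of stalling. The payload type and names here are illustrative, not the simulator's actual identifiers.

```go
package main

import "fmt"

// update is a placeholder for whatever payload the simulator sends on its
// internal channels (metrics updates, worker requests, etc.).
type update struct {
	name  string
	value float64
}

// trySend writes to ch without blocking: if the buffer is full, the update is
// dropped and the caller is told so, instead of stalling the request path.
func trySend(ch chan<- update, u update) bool {
	select {
	case ch <- u:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan update, 1)
	fmt.Println(trySend(ch, update{"ttft_seconds", 0.12})) // true: buffer has room
	fmt.Println(trySend(ch, update{"tpot_seconds", 0.03})) // false: buffer full, dropped
}
```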

Full Changelog: v0.5.2...v0.6.0

v0.5.2

22 Oct 07:48
1c3d559

What's Changed

  • Use custom dataset as response source by @pancak3 in #200
  • Add vllm:time_per_output_token_seconds and vllm:time_to_first_token_seconds metrics by @mayabar in #217
  • Use openai-go v3.6.1 in the tests by @irar2 in #223
  • feat(metrics): add request prompt, generation, max_tokens and success metrics by @googs1025 in #202

Full Changelog: v0.5.1...v0.5.2

v0.5.1

18 Sep 15:08
b8eb7a4

New Features

  • The llm-d-inference-sim server can be run in TLS mode with the certificate and key supplied by the user or automatically generated.
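
A minimal sketch of what the automatically generated case could look like in Go: a self-signed certificate is created in memory with the standard library and used to serve TLS. This is illustrative only; the simulator's actual flags, certificate parameters, and server setup may differ.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"log"
	"math/big"
	"net/http"
	"time"
)

// selfSignedCert builds an in-memory, self-signed certificate so the server
// can start in TLS mode when the user does not supply a certificate and key.
func selfSignedCert() (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "llm-d-inference-sim"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
		DNSNames:     []string{"localhost"},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

func main() {
	cert, err := selfSignedCert()
	if err != nil {
		log.Fatal(err)
	}
	srv := &http.Server{
		Addr:      ":8443",
		TLSConfig: &tls.Config{Certificates: []tls.Certificate{cert}},
	}
	// Empty cert/key paths: the certificate comes from TLSConfig instead.
	log.Fatal(srv.ListenAndServeTLS("", ""))
}
```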

Full Changelog: v0.5.0...v0.5.1

v0.5.0

16 Sep 06:54
9c541b9

New Features

  • Processing time is affected by server load
  • Change TTFT parameter to be based on number of request tokens
  • KV cache affects prefill time
  • Support failure injection
  • Implement kv-cache usage and waiting loras Prometheus metrics
  • Randomize response length based on a histogram when max-tokens is defined in the request
  • Support DP (data parallel)
  • Support /tokenize endpoint (see the sketch after this list)
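
As an illustration of the new /tokenize endpoint, the sketch below posts a prompt to a locally running simulator. The port, model name, and the request/response shapes are assumptions (modeled on vLLM's /tokenize API) and may differ from the simulator's exact schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// tokenizeResponse assumes a vLLM-style /tokenize reply; the simulator's
// actual field set may differ.
type tokenizeResponse struct {
	Count  int   `json:"count"`
	Tokens []int `json:"tokens"`
}

func main() {
	body, _ := json.Marshal(map[string]string{
		"model":  "meta-llama/Llama-3.1-8B-Instruct", // hypothetical served model name
		"prompt": "Hello from the inference simulator",
	})
	resp, err := http.Post("http://localhost:8000/tokenize", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out tokenizeResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("prompt tokenized into %d tokens: %v\n", out.Count, out.Tokens)
}
```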

What's Changed

  • Fix server interrupt by @npolshakova in #161
  • Show final config in simulator default logger at Info level by @pancak3 in #154
  • Cast bounds type in tests to func def: latency, interToken, and timeToFirst (to int) by @pancak3 in #163
  • Remove unnecessary deferral of server close by @pancak3 in #162
  • Fix: Rand generator is not set in a test suite, which results in accessing a nil pointer at runtime when only that test suite is run by @pancak3 in #166
  • Use channels for metrics updates, added metrics tests by @irar2 in #171
  • Remove rerun on comment action by @irar2 in #174
  • Add failure injection mode to simulator by @smarunich in #131
  • Add waiting loras list to loraInfo metrics by @mayabar in #175
  • feat: generate response length based on a histogram when max_tokens is defined in the request by @mayabar in #169
  • Extend response length bucket calculation to allow buckets that are not necessarily equally sized by @mayabar in #176
  • Use dynamic ports in zmq tests by @pancak3 in #170
  • Change time-to-first-token parameter to be based on number of request tokens #137 by @pancak3 in #165
  • Bugfix: the number of tokens was read from a nil variable; it is now taken from the request instead by @pancak3 in #177
  • feat: add helm charts for Kubernetes deployment by @Blackoutta in #182
  • chore: Make the image smaller by @shmuelk in #183
  • Take cached prompt tokens into account in prefill time calculation by @irar2 in #184
  • Add ignore eos in request by @pancak3 in #187
  • Support DP by @irar2 in #188
  • Change RandomNorm from float types to int by @pancak3 in #190
  • KV cache usage metric by @irar2 in #192
  • Adjust request "processing time" to current load by @pancak3 in #189
  • Updates for the new release of kv-cache-manager by @irar2 in #194
  • DP bug fix: wait after starting rank 0 sim by @irar2 in #193
  • Support /tokenize endpoint by @irar2 in #198
  • add Service to expose vLLM deployment and update doc by @googs1025 in #201
  • Split simulator.go into several files by @irar2 in #199

Full Changelog: v0.4.0...v0.5.0

v0.4.0

21 Aug 10:19
4076bd2

New Features

  • KV Cache support: request prompts are tokenized, divided into blocks, and hash values are calculated and stored in a cache. Batches of KV events are published when a block is stored in or removed from the cache (see the sketch after this list).
  • Fake metrics: the configuration can contain a predefined set of metrics to be sent to Prometheus in place of the actual data. When specified, only these fake metrics are reported; real and fake metrics are never reported together.
  • Adds pod name and namespace headers so that tests can check which vLLM instance actually received the request.
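
A rough sketch, in Go, of the block-and-hash idea described above: token IDs are split into fixed-size blocks and each block's hash is chained to the hash of the preceding block, so prompts that share a prefix also share the hashes of the blocks covering that prefix. The block size, hash function, and names here are illustrative assumptions, not the simulator's actual implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const blockSize = 16 // assumed block size; the real value would be configurable

// blockHashes splits token IDs into fixed-size blocks and hashes each block
// together with the hash of the preceding block, so two prompts that share a
// prefix also share the hashes of the blocks covering that prefix.
func blockHashes(tokens []int) []uint64 {
	var hashes []uint64
	var prev uint64
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		h := fnv.New64a()
		binary.Write(h, binary.LittleEndian, prev)
		for _, t := range tokens[start : start+blockSize] {
			binary.Write(h, binary.LittleEndian, int64(t))
		}
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

func main() {
	prompt := make([]int, 40)
	for i := range prompt {
		prompt[i] = i
	}
	// Blocks covering the first 32 tokens get hashes; the 8-token tail is
	// ignored until it fills a complete block.
	fmt.Println(blockHashes(prompt))
}
```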

Full Changelog: v0.3.2...v0.4.0

v0.3.2

07 Aug 08:48
9bbb64d

What's Changed

  • Work on the CI pipeline
  • Additional work on KV-Cache support, work in progress

Change details

  • KV cache and tokenization related configuration by @irar2 in #125
  • Another attempt at adding a latest tag only on release builds by @shmuelk in #124

Full Changelog: v0.3.1...v0.3.2

v0.3.1

06 Aug 14:53
0308c8f

What's Changed

  • Support long responses
  • Beginnings of KV cache support, work in progress

Full Changelog: v0.3.0...v0.3.1

v0.3.1-rc.2

06 Aug 13:58
0308c8f

Pre-release

What's Changed

  • Support long responses
  • Initial work on KV cache event support, still work in progress

Full Changelog: v0.3.0...v0.3.1-rc.2

v0.3.0

20 Jul 08:29
7f1f766

Pre-release

Release Notes

Compatibility with vLLM

  • Aligned command-line parameters with real vLLM. All parameters supported by both the simulator and vLLM now share the same name and format:
    • Support for --served-model-name
    • Support for --seed
    • Support for --max-model-len
  • Added support for tools in chat completions (see the sketch after this list)
  • Included usage in the response
  • Added object field to the response JSON
  • Added support for multimodal inputs in chat completions
  • Added health and readiness endpoints
  • Added P/D support; the connector type must be set to nixl
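
To illustrate the tools support mentioned above, the sketch below posts an OpenAI-style chat completion with a single tool to a locally running simulator. The port, model name, and tool definition are placeholders; the payload follows the public OpenAI chat completions schema rather than anything specific to the simulator.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// OpenAI-style chat completion request with one tool; the model name and
	// tool definition are placeholders for illustration.
	payload := map[string]any{
		"model": "meta-llama/Llama-3.1-8B-Instruct",
		"messages": []map[string]string{
			{"role": "user", "content": "What is the weather in Paris?"},
		},
		"tool_choice": "auto",
		"tools": []map[string]any{
			{
				"type": "function",
				"function": map[string]any{
					"name":        "get_weather",
					"description": "Get the current weather for a city",
					"parameters": map[string]any{
						"type": "object",
						"properties": map[string]any{
							"city": map[string]string{"type": "string"},
						},
						"required": []string{"city"},
					},
				},
			},
		},
	}
	body, _ := json.Marshal(payload)
	resp, err := http.Post("http://localhost:8000/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```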

Additional Features

  • Introduced configuration file support. All parameters can now be loaded from a configuration file in addition to being set via the command line.
  • Added new test coverage
  • Changed the Docker base image
  • Added the ability to randomize time to first token, inter-token latency, and KV-cache transfer latency (see the sketch after this list)
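
One way the randomized latencies mentioned above could be drawn, as a minimal Go sketch: each latency is sampled from a normal distribution centered on the configured value and clamped at zero. The 30% spread and the clamping are illustrative assumptions, not the simulator's actual parameters.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomizedLatency returns a latency drawn from a normal distribution
// centered on mean, with the given relative standard deviation, clamped so it
// never goes negative.
func randomizedLatency(r *rand.Rand, mean time.Duration, stddevFrac float64) time.Duration {
	d := time.Duration(r.NormFloat64()*stddevFrac*float64(mean)) + mean
	if d < 0 {
		return 0
	}
	return d
}

func main() {
	r := rand.New(rand.NewSource(42)) // seeded, in the spirit of the --seed option
	ttft := 500 * time.Millisecond
	interToken := 50 * time.Millisecond
	for i := 0; i < 3; i++ {
		fmt.Printf("ttft=%v inter-token=%v\n",
			randomizedLatency(r, ttft, 0.3),       // 30% spread is an assumption
			randomizedLatency(r, interToken, 0.3))
	}
}
```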

Migration Notes (for users upgrading from versions prior to v0.2.0)

  • max-running-requests has been renamed to max-num-seqs
  • lora has been replaced by lora-modules, which now accepts a list of JSON strings, e.g., '{"name": "name", "path": "lora_path", "base_model_name": "id"}'

Change details since v0.2.2

  • feat: add max-model-len configuration and validation for context window (#82) by @mohitpalsingh in #85
  • Fixed readme, removed error for --help by @irar2 in #89
  • Pd support by @mayabar in #94
  • fix: crash when omitted stream_options by @jasonmadigan in #95
  • style: 🔨 splits all import blocks into different sections by @yafengio in #98
  • Fixed deployment.yaml by @irar2 in #99
  • Enable configuration of various parameters in tools by @irar2 in #100
  • Choose latencies randomly by @irar2 in #103

Full Changelog: v0.2.2...v0.3.0

v0.2.2

13 Jul 10:02
7656a3c

Pre-release

What's Changed

  • Initialize rand once, added seed to configuration by @irar2 in #79
  • use string when storing lora adapters in simulator by @mayabar in #81
  • Improved support for empty command line arguments by @irar2 in #80
  • Added tests for LoRA configuration, load and unload by @irar2 in #86

Full Changelog: v0.2.1...v0.2.2