Dockerfile (+3 -1: 3 additions & 1 deletion)
@@ -23,7 +23,9 @@ COPY . .
 
 # HuggingFace tokenizer bindings
 RUN mkdir -p lib
-RUN curl -L https://github.com/daulet/tokenizers/releases/download/v1.22.1/libtokenizers.${TARGETOS}-${TARGETARCH}.tar.gz | tar -xz -C lib
+# Ensure that the TOKENIZER_VERSION matches the one used in the imported llm-d-kv-cache-manager version
+ARG TOKENIZER_VERSION=v1.22.1
+RUN curl -L https://github.com/daulet/tokenizers/releases/download/${TOKENIZER_VERSION}/libtokenizers.${TARGETOS}-${TARGETARCH}.tar.gz | tar -xz -C lib
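To make the effect of the new build argument concrete, here is a minimal Go sketch of how the download URL is assembled from the three variables in the `RUN` line. The helper `releaseURL` is hypothetical and is not part of this change; the `linux`/`amd64` target in `main` is likewise an assumed example.

```go
package main

import "fmt"

// releaseURL mirrors the URL template used by the Dockerfile's RUN line.
// Hypothetical helper, for illustration only.
func releaseURL(version, targetOS, targetArch string) string {
	return fmt.Sprintf(
		"https://github.com/daulet/tokenizers/releases/download/%s/libtokenizers.%s-%s.tar.gz",
		version, targetOS, targetArch,
	)
}

func main() {
	// v1.22.1 is the default of ARG TOKENIZER_VERSION in the diff;
	// linux/amd64 is an assumed build target.
	fmt.Println(releaseURL("v1.22.1", "linux", "amd64"))
}
```

Because the version is now a single argument, bumping it changes only the version segment of the URL, which is what the added comment about llm-d-kv-cache-manager asks maintainers to keep in sync.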
README.md (+3 -2: 3 additions & 2 deletions)
@@ -120,15 +120,16 @@ For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started
 - `min-tool-call-array-param-length`: the minimum possible length of array parameters in a tool call, optional, defaults to 1
 - `tool-call-not-required-param-probability`: the probability to add a parameter, that is not required, in a tool call, optional, defaults to 50
 - `object-tool-call-not-required-field-probability`: the probability to add a field, that is not required, in an object in a tool call, optional, defaults to 50
-- `enable-kvcache`: if true, the KV cache support will be enabled in the simulator. In this case, the KV cache will be simulated, and ZQM events will be published when a KV cache block is added or evicted.
+- `enable-kvcache`: if true, the KV cache support will be enabled in the simulator. In this case, the KV cache will be simulated, and ZQM events will be published when a KV cache block is added or evicted.
 - `kv-cache-size`: the maximum number of token blocks in kv cache
 - `block-size`: token block size for contiguous chunks of tokens, possible values: 8,16,32,64,128
 - `tokenizers-cache-dir`: the directory for caching tokenizers
 - `hash-seed`: seed for hash generation (if not set, is read from PYTHONHASHSEED environment variable)
 - `zmq-endpoint`: ZMQ address to publish events
 - `failure-injection-rate`: probability (0-100) of injecting failures when in failure mode, optional, default is 10
 - `failure-types`: list of specific failure types to inject (rate_limit, invalid_api_key, context_length, server_error, invalid_request, model_not_found), optional, if empty all types are used
-
+- `event-batch-size`: the maximum number of kv-cache events to be sent together, defaults to 16
+-->
 In addition, as we are using klog, the following parameters are available:
 - `add_dir_header`: if true, adds the file directory to the header of the log messages
 - `alsologtostderr`: log to standard error as well as files (no effect when -logtostderr=true)
 f.StringVar(&config.ZMQEndpoint, "zmq-endpoint", config.ZMQEndpoint, "ZMQ address to publish events")
 f.IntVar(&config.EventBatchSize, "event-batch-size", config.EventBatchSize, "Maximum number of kv-cache events to be sent together")
 
-f.IntVar(&config.FailureInjectionRate, "failure-injection-rate", config.FailureInjectionRate, "Probability (0-100) of injecting failures when in failure mode")
+f.IntVar(&config.FailureInjectionRate, "failure-injection-rate", config.FailureInjectionRate, "Probability (0-100) of injecting failures when in failure mode")
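The registration pattern in this hunk passes each config field both as the flag's destination and as its default, so values already present in the struct survive when a flag is omitted. The following is a minimal, self-contained sketch of that pattern. It assumes the standard library `flag` package, which may differ from the flag library the project actually uses; `simConfig` and `parseArgs` are hypothetical names, and the range check is an assumption based only on the "(0-100)" wording in the help text.

```go
package main

import (
	"flag"
	"fmt"
)

// simConfig is a hypothetical stand-in for the simulator's config struct;
// only the three fields visible in the diff are modelled.
type simConfig struct {
	ZMQEndpoint          string
	EventBatchSize       int
	FailureInjectionRate int
}

// parseArgs registers each flag with the struct's current value as its
// default, parses args, and applies an assumed 0-100 range check.
func parseArgs(args []string) (*simConfig, error) {
	config := &simConfig{
		// ZMQEndpoint is left empty: its default is not shown in the diff.
		EventBatchSize:       16, // documented default
		FailureInjectionRate: 10, // documented default
	}
	f := flag.NewFlagSet("sim", flag.ContinueOnError)
	f.StringVar(&config.ZMQEndpoint, "zmq-endpoint", config.ZMQEndpoint, "ZMQ address to publish events")
	f.IntVar(&config.EventBatchSize, "event-batch-size", config.EventBatchSize, "Maximum number of kv-cache events to be sent together")
	f.IntVar(&config.FailureInjectionRate, "failure-injection-rate", config.FailureInjectionRate, "Probability (0-100) of injecting failures when in failure mode")
	if err := f.Parse(args); err != nil {
		return nil, err
	}
	if config.FailureInjectionRate < 0 || config.FailureInjectionRate > 100 {
		return nil, fmt.Errorf("failure-injection-rate must be in 0-100, got %d", config.FailureInjectionRate)
	}
	return config, nil
}

func main() {
	// Only event-batch-size is overridden; the other fields keep their defaults.
	config, err := parseArgs([]string{"--event-batch-size", "32"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *config)
}
```

Using the struct's own value as the default keeps a single source of truth for defaults, which is presumably why the original code follows the same shape.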