Use same version of tokenizer in both Dockerfile and Makefile (#132)
* - Use same version of tokenizer in both Dockerfile and Makefile
- Fixes in readme file
Signed-off-by: Maya Barnea <[email protected]>
* updates according PR's review
Signed-off-by: Maya Barnea <[email protected]>
---------
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Sergey Marunich <[email protected]>
Dockerfile: 3 additions & 1 deletion
@@ -23,7 +23,9 @@ COPY . .
 
 # HuggingFace tokenizer bindings
 RUN mkdir -p lib
-RUN curl -L https://github.com/daulet/tokenizers/releases/download/v1.22.1/libtokenizers.${TARGETOS}-${TARGETARCH}.tar.gz | tar -xz -C lib
+# Ensure that the TOKENIZER_VERSION matches the one used in the imported llm-d-kv-cache-manager version
+ARG TOKENIZER_VERSION=v1.22.1
+RUN curl -L https://github.com/daulet/tokenizers/releases/download/${TOKENIZER_VERSION}/libtokenizers.${TARGETOS}-${TARGETARCH}.tar.gz | tar -xz -C lib
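The new `ARG` turns the release tag into a single substitution point in the download URL. As a rough sketch of that substitution outside of Docker, the shell function below rebuilds the same URL pattern from a version, OS, and architecture. The function name `tokenizer_url` is hypothetical and not part of the repository, and the `linux`/`amd64` values are only example stand-ins for `TARGETOS` and `TARGETARCH`.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): mirrors the URL pattern used by
# the Dockerfile's "RUN curl" line so the effect of TOKENIZER_VERSION is visible.
tokenizer_url() {
    version="$1"   # release tag, e.g. v1.22.1 (the ARG default in the diff)
    os="$2"        # stands in for TARGETOS
    arch="$3"      # stands in for TARGETARCH
    printf 'https://github.com/daulet/tokenizers/releases/download/%s/libtokenizers.%s-%s.tar.gz\n' \
        "$version" "$os" "$arch"
}

# Fall back to the Dockerfile's default tag when TOKENIZER_VERSION is unset.
tokenizer_url "${TOKENIZER_VERSION:-v1.22.1}" linux amd64
```

Because the value is declared with `ARG`, it can presumably be overridden at build time with the standard `docker build --build-arg TOKENIZER_VERSION=<tag> .` form. The Makefile side of this change is not shown in this diff.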
README.md: 4 additions & 2 deletions
@@ -116,13 +116,15 @@ For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started
 -`min-tool-call-array-param-length`: the minimum possible length of array parameters in a tool call, optional, defaults to 1
 -`tool-call-not-required-param-probability`: the probability to add a parameter, that is not required, in a tool call, optional, defaults to 50
 -`object-tool-call-not-required-field-probability`: the probability to add a field, that is not required, in an object in a tool call, optional, defaults to 50
--`enable-kvcache`: if true, the KV cache support will be enabled in the simulator. In this case, the KV cache will be simulated, and ZQM events will be published when a KV cache block is added or evicted.
+<!--
+- `enable-kvcache`: if true, the KV cache support will be enabled in the simulator. In this case, the KV cache will be simulated, and ZQM events will be published when a KV cache block is added or evicted.
 - `kv-cache-size`: the maximum number of token blocks in kv cache
 - `block-size`: token block size for contiguous chunks of tokens, possible values: 8,16,32,64,128
 - `tokenizers-cache-dir`: the directory for caching tokenizers
 - `hash-seed`: seed for hash generation (if not set, is read from PYTHONHASHSEED environment variable)
 - `zmq-endpoint`: ZMQ address to publish events
-
+- `event-batch-size`: the maximum number of kv-cache events to be sent together, defaults to 16
+-->
 In addition, as we are using klog, the following parameters are available:
 -`add_dir_header`: if true, adds the file directory to the header of the log messages
 -`alsologtostderr`: log to standard error as well as files (no effect when -logtostderr=true)