
Commit 7545048

Update test-pipeline.yaml (#599)
* Update test-pipeline.yaml: disable the "Tensorizer Test". The test has been observed to raise exceptions while still reporting success; this needs to be investigated before the test is re-enabled in the production environment.
* Fix pre-commit complaints.

Signed-off-by: Alexei V. Ivanov <[email protected]>
1 parent f94ec9b commit 7545048

File tree: 2 files changed (+4, −2 lines)


.buildkite/test-pipeline.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -420,7 +420,7 @@ steps:
   - pytest -v -s kernels/mamba
 
 - label: Tensorizer Test # 11min
-  mirror_hardwares: [amdexperimental, amdproduction]
+  mirror_hardwares: [amdexperimental]
   soft_fail: true
   source_file_dependencies:
   - vllm/model_executor/model_loader
```
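For context, the affected step after this change would look roughly like the following. This is a sketch reconstructed from the diff hunk above; fields outside the hunk (for example the step's `commands`) are not shown and are omitted here:

```yaml
# Sketch of the "Tensorizer Test" step after this commit, reconstructed
# from the visible diff context; other fields of the step are omitted.
- label: Tensorizer Test # 11min
  mirror_hardwares: [amdexperimental]  # amdproduction removed by this commit
  soft_fail: true
  source_file_dependencies:
  - vllm/model_executor/model_loader
```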

docs/dev-docker/README.md

Lines changed: 3 additions & 1 deletion

````diff
@@ -291,7 +291,8 @@ python3 /app/vllm/benchmarks/benchmark_throughput.py \
     --num-prompts $PROMPTS \
     --max-num-seqs $MAX_NUM_SEQS
 ```
-For FP16 models, remove `--kv-cache-dtype fp8`.
+
+For FP16 models, remove `--kv-cache-dtype fp8`.
 
 When measuring models with long context lengths, performance may improve by setting `--max-model-len` to a smaller value (8192 in this example). It is important, however, to ensure that the `--max-model-len` is at least as large as the IN + OUT token counts.
 
@@ -325,6 +326,7 @@ vllm serve amd/Llama-3.1-70B-Instruct-FP8-KV \
     --gpu-memory-utilization 0.99 \
     --num_scheduler-steps 10
 ```
+
 For FP16 models, remove `--kv-cache-dtype fp8`. Change port (for example --port 8005) if port=8000 is currently being used by other processes.
 
 Run client in a separate terminal. Use port_id from previous step else port-id=8000.
````
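As a concrete illustration (not part of the commit), a minimal client request against the `vllm serve` instance from the README excerpt might look like the following. The model name and port are taken from the surrounding example and are assumptions to adjust for your setup:

```shell
# Minimal smoke-test request against the OpenAI-compatible endpoint
# exposed by `vllm serve`. Assumes the server from the previous step is
# listening on port 8000; change the port if the server was started with
# a different --port.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "amd/Llama-3.1-70B-Instruct-FP8-KV",
        "prompt": "San Francisco is a",
        "max_tokens": 16
      }'
```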

0 commit comments
