
Commit ba81b6d

didier-durand authored and skyloevil committed
[Doc]: fixing typos to improve docs (vllm-project#24480)
Signed-off-by: Didier Durand <[email protected]>
1 parent c1522cd commit ba81b6d

File tree

9 files changed: +12 −12 lines changed

docs/features/tool_calling.md

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ All Llama 3.1, 3.2 and 4 models should be supported.
 
 The tool calling that is supported is the [JSON-based tool calling](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/#json-based-tool-calling). For [pythonic tool calling](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/text_prompt_format.md#zero-shot-function-calling) introduced by the Llama-3.2 models, see the `pythonic` tool parser below. As for Llama 4 models, it is recommended to use the `llama4_pythonic` tool parser.
 
-Other tool calling formats like the built in python tool calling or custom tool calling are not supported.
+Other tool calling formats like the built-in python tool calling or custom tool calling are not supported.
 
 Known issues:
 
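For context on the parsers named above, here is a hedged client-side sketch of JSON-based tool calling against a vLLM OpenAI-compatible server, assumed to be running locally with the `llama4_pythonic` (or `pythonic`) tool parser enabled; the model name, port, and tool schema are illustrative.

```python
# Hedged sketch: assumes a vLLM OpenAI-compatible server is already running
# locally with tool calling enabled (e.g. the `llama4_pythonic` parser
# recommended above). Model name, port, and the tool schema are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # illustrative model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```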

docs/getting_started/installation/gpu/rocm.inc.md

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ Currently, there are no pre-built ROCm wheels.
 This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation.
 
 !!! tip
-    - Triton flash attention is used by default. For benchmarking purposes, it is recommended to run a warm up step before collecting perf numbers.
+    - Triton flash attention is used by default. For benchmarking purposes, it is recommended to run a warm-up step before collecting perf numbers.
     - Triton flash attention does not currently support sliding window attention. If using half precision, please use CK flash-attention for sliding window support.
     - To use CK flash-attention or PyTorch naive attention, please use this flag `export VLLM_USE_TRITON_FLASH_ATTN=0` to turn off triton flash attention.
     - The ROCm version of PyTorch, ideally, should match the ROCm driver version.
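To make the tip concrete, here is a hedged Python sketch that turns off Triton flash attention via the `VLLM_USE_TRITON_FLASH_ATTN=0` flag quoted above and runs a warm-up generation before timing anything; the model name and prompts are illustrative.

```python
import os
import time

# Hedged sketch: set the flag from the tip above before vLLM is imported so it
# is picked up at initialization time. Model name and prompts are illustrative.
os.environ["VLLM_USE_TRITON_FLASH_ATTN"] = "0"

from vllm import LLM

llm = LLM(model="facebook/opt-125m")

# Warm-up step before collecting perf numbers, as the tip recommends.
llm.generate("warm-up prompt")

start = time.perf_counter()
outputs = llm.generate("The capital of France is")
print(f"generation took {time.perf_counter() - start:.3f}s")
```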

examples/tool_chat_template_phi4_mini.jinja

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@
 <|system|>
 {{ system_message }}
 {%- if tools %}
-In addition to plain text responses, you can chose to call one or more of the provided functions.
+In addition to plain text responses, you can choose to call one or more of the provided functions.
 
 Use the following rule to decide when to call a function:
 * if the response can be generated from your internal knowledge (e.g., as in the case of queries like "What is the capital of Poland?"), do so
@@ -19,7 +19,7 @@ If you decide to call functions:
 * prefix function calls with functools marker (no closing marker required)
 * all function calls should be generated in a single JSON list formatted as functools[{"name": [function name], "arguments": [function arguments as JSON]}, ...]
 * follow the provided JSON schema. Do not hallucinate arguments or values. Do to blindly copy values from the provided samples
-* respect the argument type formatting. E.g., if the type if number and format is float, write value 7 as 7.0
+* respect the argument type formatting. E.g., if the type is number and format is float, write value 7 as 7.0
 * make sure you pick the right functions that match the user intent
 
 
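To make the corrected rule concrete, here is a small illustration of an output that follows the format described in this template: a single JSON list prefixed with the functools marker, with number/float arguments written as floats. The function name and arguments are invented for the example.

```python
import json

# Hedged illustration of the format described above: one JSON list prefixed by
# the "functools" marker, with numeric float arguments written as 7.0 rather
# than 7. The function name and arguments here are invented for the example.
calls = [{"name": "set_thermostat", "arguments": {"temperature": 7.0}}]
model_output = "functools" + json.dumps(calls)
print(model_output)
# functools[{"name": "set_thermostat", "arguments": {"temperature": 7.0}}]
```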

tests/engine/test_executor.py

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ def collective_rpc(self,
                        timeout: Optional[float] = None,
                        args: tuple = (),
                        kwargs: Optional[dict] = None) -> list[Any]:
-        # Drop marker to show that this was ran
+        # Drop marker to show that this was run
         with open(".marker", "w"):
             ...
         return super().collective_rpc(method, timeout, args, kwargs)
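For readability, here is a hedged reconstruction of the test double this hunk sits in: an executor subclass whose `collective_rpc` drops a `.marker` file so the test can verify the override actually ran. The base class and its import path are assumptions, not copied from the file.

```python
from typing import Any, Callable, Optional, Union

# Assumed base class and import path, shown only for illustration; the real
# test may subclass a different executor.
from vllm.executor.uniproc_executor import UniProcExecutor


class CustomUniExecutor(UniProcExecutor):

    def collective_rpc(self,
                       method: Union[str, Callable],
                       timeout: Optional[float] = None,
                       args: tuple = (),
                       kwargs: Optional[dict] = None) -> list[Any]:
        # Drop marker to show that this was run
        with open(".marker", "w"):
            ...
        return super().collective_rpc(method, timeout, args, kwargs)
```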

tests/entrypoints/offline_mode/test_offline_mode.py

Lines changed: 2 additions & 2 deletions
@@ -79,7 +79,7 @@ def disable_connect(*args, **kwargs):
     )
 
     # Need to re-import huggingface_hub
-    # and friends to setup offline mode
+    # and friends to set up offline mode
     _re_import_modules()
     # Cached model files should be used in offline mode
     for model_config in MODEL_CONFIGS:
@@ -136,7 +136,7 @@ def disable_connect(*args, **kwargs):
         disable_connect,
     )
     # Need to re-import huggingface_hub
-    # and friends to setup offline mode
+    # and friends to set up offline mode
     _re_import_modules()
     engine_args = EngineArgs(model="facebook/opt-125m")
     LLM(**dataclasses.asdict(engine_args))
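As a rough sketch of the "re-import to set up offline mode" idea in these comments, assuming the offline switch is the standard `HF_HUB_OFFLINE` environment variable; the real `_re_import_modules()` helper in the test reloads a broader set of modules than shown here.

```python
import importlib
import os
import sys

# Hedged sketch: huggingface_hub reads HF_HUB_OFFLINE at import time, so after
# flipping the variable the already-imported modules are reloaded for the new
# value to take effect. The real _re_import_modules() covers more modules.
os.environ["HF_HUB_OFFLINE"] = "1"
for name in sorted(m for m in sys.modules if m.startswith("huggingface_hub")):
    importlib.reload(sys.modules[name])
```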

tests/kernels/utils.py

Lines changed: 1 addition & 1 deletion
@@ -1247,7 +1247,7 @@ def baseline_scaled_mm(a: torch.Tensor,
     #   then we would expand a to:
     #   a = [[1, 1, 2, 2],
     #        [3, 3, 4, 4]]
-    # NOTE this function this function does not explicitly broadcast dimensions
+    # NOTE this function does not explicitly broadcast dimensions
     # with an extent of 1, since this can be done implicitly by pytorch
     def group_broadcast(t, shape):
         for i, s in enumerate(shape):
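Here is a hedged sketch of the group-broadcast behavior the comment describes (the real helper may be implemented differently), using `torch.repeat_interleave` to reproduce the `[[1, 2], [3, 4]]` to `[[1, 1, 2, 2], [3, 3, 4, 4]]` expansion:

```python
import torch

def group_broadcast_sketch(t: torch.Tensor, shape) -> torch.Tensor:
    # Repeat elements along any dimension whose target extent is a multiple of
    # the source extent; dimensions with extent 1 are left to PyTorch's
    # implicit broadcasting, matching the NOTE above.
    for i, s in enumerate(shape):
        if t.shape[i] != s and t.shape[i] != 1:
            assert s % t.shape[i] == 0
            t = t.repeat_interleave(s // t.shape[i], dim=i)
    return t

a = torch.tensor([[1, 2], [3, 4]])
print(group_broadcast_sketch(a, (2, 4)))
# tensor([[1, 1, 2, 2],
#         [3, 3, 4, 4]])
```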

tests/models/language/generation/test_hybrid.py

Lines changed: 2 additions & 2 deletions
@@ -301,7 +301,7 @@ def test_fail_upon_inc_requests_and_finished_requests_lt_available_blocks(
     finished_requests_ids is larger than the maximum mamba block capacity.
 
     This could generally happen due to the fact that hybrid does support
-    statelessness mechanism where it can cleanup new incoming requests in
+    statelessness mechanism where it can clean up new incoming requests in
     a single step.
     """
     try:
@@ -322,7 +322,7 @@ def test_state_cleanup(
     This test is for verifying that the Hybrid state is cleaned up between
     steps.
 
-    If its not cleaned, an error would be expected.
+    If it's not cleaned, an error would be expected.
     """
     try:
         with vllm_runner(model, max_num_seqs=MAX_NUM_SEQS) as vllm_model:

tests/tpu/test_quantization_accuracy.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ def get_model_args(self) -> str:
         expected_value=0.76),  # no bias
     # NOTE(rob): We cannot re-initialize vLLM in the same process for TPU,
     # so only one of these tests can run in a single call to pytest. As
-    # a follow up, move this into the LM-EVAL section of the CI.
+    # a follow-up, move this into the LM-EVAL section of the CI.
     # GSM8KAccuracyTestConfig(
     #     model_name="neuralmagic/Qwen2-7B-Instruct-quantized.w8a8",
     #     expected_value=0.66),  # bias in QKV layers

vllm/distributed/parallel_state.py

Lines changed: 1 addition & 1 deletion
@@ -1117,7 +1117,7 @@ def initialize_model_parallel(
             "decode context model parallel group is already initialized")
     # Note(hc): In the current implementation of decode context parallel,
     # dcp_size must not exceed tp_size, because the world size does not
-    # change by DCP, it simply reuse the GPUs of TP group, and split one
+    # change by DCP, it simply reuses the GPUs of TP group, and split one
     # TP group into tp_size//dcp_size DCP groups.
     group_ranks = all_ranks.reshape(
         -1, decode_context_model_parallel_size).unbind(0)
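A small worked example of the arithmetic in the corrected comment: with 8 ranks and `decode_context_model_parallel_size = 2`, the `reshape(-1, dcp_size).unbind(0)` pattern yields `tp_size // dcp_size = 4` DCP groups, each reusing two of the existing TP GPUs. The flat rank layout below is simplified for illustration.

```python
import torch

# Simplified illustration of the reshape/unbind split shown above; the real
# all_ranks tensor is laid out according to the full parallel topology.
all_ranks = torch.arange(8)
decode_context_model_parallel_size = 2

group_ranks = all_ranks.reshape(-1, decode_context_model_parallel_size).unbind(0)
print([g.tolist() for g in group_ranks])
# [[0, 1], [2, 3], [4, 5], [6, 7]]  -> tp_size // dcp_size = 4 DCP groups
```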
