Commit 1bd007f

fix some typos (vllm-project#24071)
Signed-off-by: co63oc <[email protected]>
1 parent 136d853 commit 1bd007f

32 files changed: +39 -39 lines changed

benchmarks/benchmark_block_pool.py

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ def invoke_main() -> None:
         "--num-iteration",
         type=int,
         default=1000,
-        help="Number of iterations to run to stablize final data readings",
+        help="Number of iterations to run to stabilize final data readings",
     )
     parser.add_argument(
         "--allocate-blocks",

benchmarks/benchmark_ngram_proposer.py

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ def invoke_main() -> None:
         "--num-iteration",
         type=int,
         default=100,
-        help="Number of iterations to run to stablize final data readings",
+        help="Number of iterations to run to stabilize final data readings",
     )
     parser.add_argument(
         "--num-req", type=int, default=128, help="Number of requests in the batch"

csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ struct W4A8GemmKernel {
   auto A_ptr = static_cast<MmaType const*>(A.const_data_ptr());
   auto B_ptr = static_cast<QuantType const*>(B.const_data_ptr());
   auto D_ptr = static_cast<ElementD*>(D.data_ptr());
-  // can we avoid harcode the 8 here
+  // can we avoid hardcode the 8 here
   auto S_ptr =
       static_cast<cutlass::Array<ElementScale, ScalePackSize> const*>(
           group_scales.const_data_ptr());

docs/configuration/optimization.md

Lines changed: 2 additions & 2 deletions
@@ -210,7 +210,7 @@ vllm serve Qwen/Qwen2.5-VL-3B-Instruct --api-server-count 4 -dp 2
 
 !!! note
     API server scale-out disables [multi-modal IPC caching](#ipc-caching)
-    because it requires a one-to-one correspondance between API and engine core processes.
+    because it requires a one-to-one correspondence between API and engine core processes.
 
     This does not impact [multi-modal processor caching](#processor-caching).
 
@@ -227,7 +227,7 @@ to avoid repeatedly processing the same multi-modal inputs in `BaseMultiModalPro
 ### IPC Caching
 
 Multi-modal IPC caching is automatically enabled when
-there is a one-to-one correspondance between API (`P0`) and engine core (`P1`) processes,
+there is a one-to-one correspondence between API (`P0`) and engine core (`P1`) processes,
 to avoid repeatedly transferring the same multi-modal inputs between them.
 
 ### Configuration

docs/design/io_processor_plugins.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 IO Processor plugins are a feature that allows pre and post processing of the model input and output for pooling models. The idea is that users are allowed to pass a custom input to vLLM that is converted into one or more model prompts and fed to the model `encode` method. One potential use-case of such plugins is that of using vLLM for generating multi-modal data. Say users feed an image to vLLM and get an image in output.
 
-When performing an inference with IO Processor plugins, the prompt type is defined by the plugin and the same is valid for the final request output. vLLM does not perform any validation of input/output data, and it is up to the plugin to ensure the correct data is being fed to the model and returned to the user. As of now these plugins support only pooling models and can be triggerd via the `encode` method in `LLM` and `AsyncLLM`, or in online serving mode via the `/pooling` endpoint.
+When performing an inference with IO Processor plugins, the prompt type is defined by the plugin and the same is valid for the final request output. vLLM does not perform any validation of input/output data, and it is up to the plugin to ensure the correct data is being fed to the model and returned to the user. As of now these plugins support only pooling models and can be triggered via the `encode` method in `LLM` and `AsyncLLM`, or in online serving mode via the `/pooling` endpoint.
 
 ## Writing an IO Processor Plugin
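
As context for the paragraph patched above: in online serving mode the plugin is reached through the `/pooling` endpoint. Below is a minimal, hedged sketch of such a request; the host, port, and payload fields are illustrative assumptions, since the actual request and response schema is defined by whichever IO Processor plugin is installed.

import requests

# Hypothetical payload: the field names here are placeholders, because the
# plugin (not vLLM) defines and validates the input format.
payload = {"data": {"image_url": "https://example.com/input.tiff"}}

# Assumes a vLLM server with a pooling model and an IO Processor plugin
# is already running locally.
response = requests.post("http://localhost:8000/pooling", json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # plugin-defined output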

examples/offline_inference/prithvi_geospatial_mae_io_processor.py

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 # multimodal data. In this specific case this example will take a geotiff
 # image as input, process it using the multimodal data processor, and
 # perform inference.
-# Reuirement - install plugin at:
+# Requirement - install plugin at:
 # https://github.com/christian-pinto/prithvi_io_processor_plugin
 
examples/online_serving/prithvi_geospatial_mae.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 # multimodal data. In this specific case this example will take a geotiff
 # image as input, process it using the multimodal data processor, and
 # perform inference.
-# Reuirements :
+# Requirements :
 # - install plugin at:
 # https://github.com/christian-pinto/prithvi_io_processor_plugin
 # - start vllm in serving mode with the below args

tests/compile/piecewise/test_multiple_graphs.py

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ def __init__(self,
         # Test will fail without set_model_tag here with error:
         # "ValueError: too many values to unpack (expected 3)"
         # This is because CompiledAttention and CompiledAttentionTwo
-        # have different implmentations but the same torch.compile
+        # have different implementations but the same torch.compile
         # cache dir will be used as default prefix is 'model_tag'
         with set_model_tag("attn_one"):
             self.attn_one = CompiledAttention(

tests/kernels/moe/test_mxfp4_moe.py

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ def tg_mxfp4_moe(
     assert (w2_bias.dim() == 2 and w2_bias.shape[0] == num_experts
             and w2_bias.shape[1] == hidden_size)
 
-    # Swap w1 and w3 as the defenition of
+    # Swap w1 and w3 as the definition of
     # swiglu is different in the trtllm-gen
     w13_weight_scale_ = w13_weight_scale.clone()
     w13_weight_ = w13_weight.clone()
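
The comment fixed above records why the test swaps w1 and w3: trtllm-gen uses the opposite swiglu ordering. A rough sketch of such a swap is below, assuming w1 and w3 are stacked along the second dimension of a fused w13 tensor; the shapes are made up and need not match the test's real layout.

import torch

num_experts, intermediate_size, hidden_size = 2, 4, 8  # toy shapes
w13 = torch.randn(num_experts, 2 * intermediate_size, hidden_size)

w1 = w13[:, :intermediate_size, :]   # assumed gate half
w3 = w13[:, intermediate_size:, :]   # assumed up half
w13_swapped = torch.cat([w3, w1], dim=1)  # put w3 first for the other convention

assert w13_swapped.shape == w13.shape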

tests/models/multimodal/processing/test_mllama4.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ def test_profiling(model_id: str, max_model_len: int):
     chunks_per_image = prod(mm_data["patches_per_image"])
     total_num_patches = chunks_per_image * tokens_per_patch
     num_tiles = mm_data["aspect_ratios"][0][0] * mm_data["aspect_ratios"][0][
-        1]  # x-y seperator tokens
+        1]  # x-y separator tokens
     total_tokens = total_num_patches.item() + num_tiles.item(
     ) + 3  # image start, image, image end
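
The trailing context lines spell out the image token accounting: patch tokens, plus one separator token per x-y tile, plus three special tokens (image start, image, image end). A toy arithmetic check with made-up numbers, not the model's real configuration:

chunks_per_image = 4      # assumed
tokens_per_patch = 144    # assumed
aspect_ratio = (2, 2)     # assumed x-y tiling

total_num_patches = chunks_per_image * tokens_per_patch
num_tiles = aspect_ratio[0] * aspect_ratio[1]        # x-y separator tokens
total_tokens = total_num_patches + num_tiles + 3     # image start, image, image end
print(total_tokens)  # 576 + 4 + 3 = 583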
