
Commit 00976db

[Docs] Fix warnings in docs build (#22588)
Signed-off-by: Harry Mellor <[email protected]>
1 parent d411df0 commit 00976db

10 files changed: +80 −90 lines changed

docs/api/summary.md

Lines changed: 0 additions & 2 deletions
@@ -1,7 +1,5 @@
 # Summary
 
-[](){ #configuration }
-
 ## Configuration
 
 API documentation for vLLM's configuration classes.

docs/configuration/tpu.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ Although it’s common to do this with GPUs, don't try to fragment 2 or 8 differ
 
 ### Tune your workloads
 
-Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](../../benchmarks/auto_tune/README.md) to optimize your workloads for your use case.
+Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](gh-file:benchmarks/auto_tune/README.md) to optimize your workloads for your use case.
 
 ### Future Topics We'll Cover
 

docs/contributing/model/multimodal.md

Lines changed: 5 additions & 3 deletions
@@ -540,8 +540,10 @@ return a schema of the tensors outputted by the HF processor that are related to
 The shape of `image_patches` outputted by `FuyuImageProcessor` is therefore
 `(1, num_images, num_patches, patch_width * patch_height * num_channels)`.
 
-In order to support the use of [MultiModalFieldConfig.batched][] like in LLaVA,
-we remove the extra batch dimension by overriding [BaseMultiModalProcessor._call_hf_processor][]:
+In order to support the use of
+[MultiModalFieldConfig.batched][vllm.multimodal.inputs.MultiModalFieldConfig.batched]
+like in LLaVA, we remove the extra batch dimension by overriding
+[BaseMultiModalProcessor._call_hf_processor][vllm.multimodal.processing.BaseMultiModalProcessor._call_hf_processor]:
 
 ??? code
 
@@ -816,7 +818,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
 After you have defined [BaseProcessingInfo][vllm.multimodal.processing.BaseProcessingInfo] (Step 2),
 [BaseDummyInputsBuilder][vllm.multimodal.profiling.BaseDummyInputsBuilder] (Step 3),
 and [BaseMultiModalProcessor][vllm.multimodal.processing.BaseMultiModalProcessor] (Step 4),
-decorate the model class with [MULTIMODAL_REGISTRY.register_processor][vllm.multimodal.processing.MultiModalRegistry.register_processor]
+decorate the model class with [MULTIMODAL_REGISTRY.register_processor][vllm.multimodal.registry.MultiModalRegistry.register_processor]
 to register them to the multi-modal registry:
 
 ```diff
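
For orientation, the registration step whose cross-reference is fixed above is used roughly as follows. This is a minimal sketch under assumptions: all `My*` names are placeholders, and a real implementation must fill in the abstract methods of the base classes shown and implement the multimodal model interfaces.

```python
# Hedged sketch of the processor-registration step; "My*" names are placeholders.
from torch import nn

from vllm.multimodal import MULTIMODAL_REGISTRY
from vllm.multimodal.processing import BaseMultiModalProcessor, BaseProcessingInfo
from vllm.multimodal.profiling import BaseDummyInputsBuilder


class MyProcessingInfo(BaseProcessingInfo):
    ...  # Step 2: report supported modalities, token limits, etc.


class MyDummyInputsBuilder(BaseDummyInputsBuilder[MyProcessingInfo]):
    ...  # Step 3: build dummy inputs for memory profiling.


class MyMultiModalProcessor(BaseMultiModalProcessor[MyProcessingInfo]):
    ...  # Step 4: wrap the HF processor and define prompt updates.


@MULTIMODAL_REGISTRY.register_processor(
    MyMultiModalProcessor,
    info=MyProcessingInfo,
    dummy_inputs=MyDummyInputsBuilder,
)
class MyModelForConditionalGeneration(nn.Module):
    ...  # The model class the processor is registered against.
```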

docs/models/generative_models.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ vLLM provides first-class support for generative models, which covers most of LL
 
 In vLLM, generative models implement the [VllmModelForTextGeneration][vllm.model_executor.models.VllmModelForTextGeneration] interface.
 Based on the final hidden states of the input, these models output log probabilities of the tokens to generate,
-which are then passed through [Sampler][vllm.model_executor.layers.Sampler] to obtain the final text.
+which are then passed through [Sampler][vllm.model_executor.layers.sampler.Sampler] to obtain the final text.
 
 ## Configuration
 
@@ -19,7 +19,7 @@ Run a model in generation mode via the option `--runner generate`.
 ## Offline Inference
 
 The [LLM][vllm.LLM] class provides various methods for offline inference.
-See [configuration][configuration] for a list of options when initializing the model.
+See [configuration](../api/summary.md#configuration) for a list of options when initializing the model.
 
 ### `LLM.generate`
 
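For context, the offline-inference entry point this page documents looks roughly like the following minimal sketch; the model name is only an example.

```python
from vllm import LLM, SamplingParams

# Any generative model supported by vLLM works here; this one is small.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# LLM.generate consumes prompts plus sampling parameters and returns
# request outputs containing the generated completions.
outputs = llm.generate(["Hello, my name is"], params)
for output in outputs:
    print(output.outputs[0].text)
```
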
docs/models/pooling_models.md

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ which takes priority over both the model's and Sentence Transformers's defaults.
 ## Offline Inference
 
 The [LLM][vllm.LLM] class provides various methods for offline inference.
-See [configuration][configuration] for a list of options when initializing the model.
+See [configuration](../api/summary.md#configuration) for a list of options when initializing the model.
 
 ### `LLM.embed`
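
As a reminder of the API this page documents, a minimal `LLM.embed` sketch looks roughly like this; the model name is illustrative, and depending on your vLLM version you may need to select the pooling runner explicitly when constructing `LLM`.

```python
from vllm import LLM

# Illustrative embedding model; other pooling models follow the same API.
llm = LLM(model="intfloat/e5-mistral-7b-instruct")

# LLM.embed returns one output per prompt; each carries an embedding vector.
(output,) = llm.embed(["Hello, my name is"])
print(len(output.outputs.embedding))
```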

docs/models/supported_models.md

Lines changed: 1 addition & 1 deletion
@@ -770,7 +770,7 @@ The following table lists those that are tested in vLLM.
 Cross-encoder and reranker models are a subset of classification models that accept two prompts as input.
 These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) API.
 
-| Architecture | Models | Inputs | Example HF Models | [LoRA][lora-adapter] | [PP][parallelism-scaling] | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |-------------------------------------|--------------------|----------|--------------------------|------------------------|-----------------------------|-----------------------|
 | `JinaVLForSequenceClassification`   | JinaVL-based | T + I<sup>E+</sup> | `jinaai/jina-reranker-m0`, etc. | | | ✅︎ |
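
A minimal sketch of the `LLM.score` API mentioned above, using a text-only reranker for brevity; the model name is illustrative, and some versions may require selecting the pooling runner explicitly.

```python
from vllm import LLM

# A text-only cross-encoder keeps the example small; multimodal rerankers
# such as jina-reranker-m0 use the same scoring API.
llm = LLM(model="BAAI/bge-reranker-v2-m3")

# LLM.score takes a query and one or more candidate texts and returns a
# relevance score per (query, candidate) pair.
outputs = llm.score("What is the capital of France?",
                    ["Paris is the capital of France.",
                     "The Eiffel Tower is in Paris."])
for output in outputs:
    print(output.outputs.score)
```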

vllm/attention/layers/__init__.py

Whitespace-only changes.

vllm/inputs/__init__.py

Lines changed: 6 additions & 4 deletions
@@ -1,10 +1,11 @@
 # SPDX-License-Identifier: Apache-2.0
 # SPDX-FileCopyrightText: Copyright contributors to the vLLM project
 
-from .data import (DecoderOnlyInputs, EmbedsInputs, EncoderDecoderInputs,
-                   ExplicitEncoderDecoderPrompt, ProcessorInputs, PromptType,
-                   SingletonInputs, SingletonPrompt, TextPrompt, TokenInputs,
-                   TokensPrompt, build_explicit_enc_dec_prompt, embeds_inputs,
+from .data import (DecoderOnlyInputs, EmbedsInputs, EmbedsPrompt,
+                   EncoderDecoderInputs, ExplicitEncoderDecoderPrompt,
+                   ProcessorInputs, PromptType, SingletonInputs,
+                   SingletonPrompt, TextPrompt, TokenInputs, TokensPrompt,
+                   build_explicit_enc_dec_prompt, embeds_inputs,
                    to_enc_dec_tuple_list, token_inputs, zip_enc_dec_prompts)
 from .registry import (DummyData, InputContext, InputProcessingContext,
                        InputRegistry)
@@ -24,6 +25,7 @@
     "ExplicitEncoderDecoderPrompt",
     "TokenInputs",
     "EmbedsInputs",
+    "EmbedsPrompt",
     "token_inputs",
     "embeds_inputs",
     "DecoderOnlyInputs",

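The newly re-exported `EmbedsPrompt` sits alongside the other prompt types. A rough sketch of how these TypedDicts are constructed; the field names follow `vllm.inputs.data`, and the embedding shape is only illustrative.

```python
import torch

from vllm.inputs import EmbedsPrompt, TextPrompt, TokensPrompt

# Each prompt type is a TypedDict describing one way to feed a request.
text_prompt = TextPrompt(prompt="Hello, world!")
tokens_prompt = TokensPrompt(prompt_token_ids=[1, 15043, 29892, 3186])

# EmbedsPrompt carries pre-computed prompt embeddings instead of text or
# token IDs; the (seq_len, hidden_size) shape here is only illustrative.
embeds_prompt = EmbedsPrompt(prompt_embeds=torch.zeros(4, 4096))
```
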
vllm/model_executor/warmup/__init__.py

Whitespace-only changes.

vllm/sampling_params.py

Lines changed: 64 additions & 76 deletions
@@ -103,113 +103,89 @@ class SamplingParams(
     Overall, we follow the sampling parameters from the OpenAI text completion
     API (https://platform.openai.com/docs/api-reference/completions/create).
     In addition, we support beam search, which is not supported by OpenAI.
-
-    Args:
-        n: Number of output sequences to return for the given prompt.
-        best_of: Number of output sequences that are generated from the prompt.
-            From these `best_of` sequences, the top `n` sequences are returned.
-            `best_of` must be greater than or equal to `n`. By default,
-            `best_of` is set to `n`. Warning, this is only supported in V0.
-        presence_penalty: Float that penalizes new tokens based on whether they
-            appear in the generated text so far. Values > 0 encourage the model
-            to use new tokens, while values < 0 encourage the model to repeat
-            tokens.
-        frequency_penalty: Float that penalizes new tokens based on their
-            frequency in the generated text so far. Values > 0 encourage the
-            model to use new tokens, while values < 0 encourage the model to
-            repeat tokens.
-        repetition_penalty: Float that penalizes new tokens based on whether
-            they appear in the prompt and the generated text so far. Values > 1
-            encourage the model to use new tokens, while values < 1 encourage
-            the model to repeat tokens.
-        temperature: Float that controls the randomness of the sampling. Lower
-            values make the model more deterministic, while higher values make
-            the model more random. Zero means greedy sampling.
-        top_p: Float that controls the cumulative probability of the top tokens
-            to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
-        top_k: Integer that controls the number of top tokens to consider. Set
-            to 0 (or -1) to consider all tokens.
-        min_p: Float that represents the minimum probability for a token to be
-            considered, relative to the probability of the most likely token.
-            Must be in [0, 1]. Set to 0 to disable this.
-        seed: Random seed to use for the generation.
-        stop: list of strings that stop the generation when they are generated.
-            The returned output will not contain the stop strings.
-        stop_token_ids: list of tokens that stop the generation when they are
-            generated. The returned output will contain the stop tokens unless
-            the stop tokens are special tokens.
-        bad_words: list of words that are not allowed to be generated.
-            More precisely, only the last token of a corresponding
-            token sequence is not allowed when the next generated token
-            can complete the sequence.
-        include_stop_str_in_output: Whether to include the stop strings in
-            output text. Defaults to False.
-        ignore_eos: Whether to ignore the EOS token and continue generating
-            tokens after the EOS token is generated.
-        max_tokens: Maximum number of tokens to generate per output sequence.
-        min_tokens: Minimum number of tokens to generate per output sequence
-            before EOS or stop_token_ids can be generated
-        logprobs: Number of log probabilities to return per output token.
-            When set to None, no probability is returned. If set to a non-None
-            value, the result includes the log probabilities of the specified
-            number of most likely tokens, as well as the chosen tokens.
-            Note that the implementation follows the OpenAI API: The API will
-            always return the log probability of the sampled token, so there
-            may be up to `logprobs+1` elements in the response.
-            When set to -1, return all `vocab_size` log probabilities.
-        prompt_logprobs: Number of log probabilities to return per prompt token.
-        detokenize: Whether to detokenize the output. Defaults to True.
-        skip_special_tokens: Whether to skip special tokens in the output.
-        spaces_between_special_tokens: Whether to add spaces between special
-            tokens in the output. Defaults to True.
-        logits_processors: list of functions that modify logits based on
-            previously generated tokens, and optionally prompt tokens as
-            a first argument.
-        truncate_prompt_tokens: If set to -1, will use the truncation size
-            supported by the model. If set to an integer k, will use only
-            the last k tokens from the prompt (i.e., left truncation).
-            Defaults to None (i.e., no truncation).
-        guided_decoding: If provided, the engine will construct a guided
-            decoding logits processor from these parameters. Defaults to None.
-        logit_bias: If provided, the engine will construct a logits processor
-            that applies these logit biases. Defaults to None.
-        allowed_token_ids: If provided, the engine will construct a logits
-            processor which only retains scores for the given token ids.
-            Defaults to None.
-        extra_args: Arbitrary additional args, that can be used by custom
-            sampling implementations, plugins, etc. Not used by any in-tree
-            sampling implementations.
     """
 
     n: int = 1
+    """Number of output sequences to return for the given prompt."""
     best_of: Optional[int] = None
+    """Number of output sequences that are generated from the prompt. From
+    these `best_of` sequences, the top `n` sequences are returned. `best_of`
+    must be greater than or equal to `n`. By default, `best_of` is set to `n`.
+    Warning, this is only supported in V0."""
     _real_n: Optional[int] = None
     presence_penalty: float = 0.0
+    """Penalizes new tokens based on whether they appear in the generated text
+    so far. Values > 0 encourage the model to use new tokens, while values < 0
+    encourage the model to repeat tokens."""
     frequency_penalty: float = 0.0
+    """Penalizes new tokens based on their frequency in the generated text so
+    far. Values > 0 encourage the model to use new tokens, while values < 0
+    encourage the model to repeat tokens."""
     repetition_penalty: float = 1.0
+    """Penalizes new tokens based on whether they appear in the prompt and the
+    generated text so far. Values > 1 encourage the model to use new tokens,
+    while values < 1 encourage the model to repeat tokens."""
     temperature: float = 1.0
+    """Controls the randomness of the sampling. Lower values make the model
+    more deterministic, while higher values make the model more random. Zero
+    means greedy sampling."""
     top_p: float = 1.0
+    """Controls the cumulative probability of the top tokens to consider. Must
+    be in (0, 1]. Set to 1 to consider all tokens."""
     top_k: int = 0
+    """Controls the number of top tokens to consider. Set to 0 (or -1) to
+    consider all tokens."""
     min_p: float = 0.0
+    """Represents the minimum probability for a token to be considered,
+    relative to the probability of the most likely token. Must be in [0, 1].
+    Set to 0 to disable this."""
     seed: Optional[int] = None
+    """Random seed to use for the generation."""
     stop: Optional[Union[str, list[str]]] = None
+    """String(s) that stop the generation when they are generated. The returned
+    output will not contain the stop strings."""
     stop_token_ids: Optional[list[int]] = None
+    """Token IDs that stop the generation when they are generated. The returned
+    output will contain the stop tokens unless the stop tokens are special
+    tokens."""
     ignore_eos: bool = False
+    """Whether to ignore the EOS token and continue generating
+    tokens after the EOS token is generated."""
     max_tokens: Optional[int] = 16
+    """Maximum number of tokens to generate per output sequence."""
     min_tokens: int = 0
+    """Minimum number of tokens to generate per output sequence before EOS or
+    `stop_token_ids` can be generated"""
     logprobs: Optional[int] = None
+    """Number of log probabilities to return per output token. When set to
+    `None`, no probability is returned. If set to a non-`None` value, the
+    result includes the log probabilities of the specified number of most
+    likely tokens, as well as the chosen tokens. Note that the implementation
+    follows the OpenAI API: The API will always return the log probability of
+    the sampled token, so there may be up to `logprobs+1` elements in the
+    response. When set to -1, return all `vocab_size` log probabilities."""
     prompt_logprobs: Optional[int] = None
+    """Number of log probabilities to return per prompt token."""
     # NOTE: This parameter is only exposed at the engine level for now.
     # It is not exposed in the OpenAI API server, as the OpenAI API does
     # not support returning only a list of token IDs.
     detokenize: bool = True
+    """Whether to detokenize the output."""
     skip_special_tokens: bool = True
+    """Whether to skip special tokens in the output."""
     spaces_between_special_tokens: bool = True
+    """Whether to add spaces between special tokens in the output."""
     # Optional[list[LogitsProcessor]] type. We use Any here because
     # Optional[list[LogitsProcessor]] type is not supported by msgspec.
     logits_processors: Optional[Any] = None
+    """Functions that modify logits based on previously generated tokens, and
+    optionally prompt tokens as a first argument."""
     include_stop_str_in_output: bool = False
+    """Whether to include the stop strings in output text."""
     truncate_prompt_tokens: Optional[Annotated[int, msgspec.Meta(ge=1)]] = None
+    """If set to -1, will use the truncation size supported by the model. If
+    set to an integer k, will use only the last k tokens from the prompt
+    (i.e., left truncation). If set to `None`, truncation is disabled."""
     output_kind: RequestOutputKind = RequestOutputKind.CUMULATIVE
 
     # The below fields are not supposed to be used as an input.
@@ -219,12 +195,24 @@
 
     # Fields used to construct logits processors
     guided_decoding: Optional[GuidedDecodingParams] = None
+    """If provided, the engine will construct a guided decoding logits
+    processor from these parameters."""
     logit_bias: Optional[dict[int, float]] = None
+    """If provided, the engine will construct a logits processor that applies
+    these logit biases."""
     allowed_token_ids: Optional[list[int]] = None
+    """If provided, the engine will construct a logits processor which only
+    retains scores for the given token ids."""
     extra_args: Optional[dict[str, Any]] = None
+    """Arbitrary additional args, that can be used by custom sampling
+    implementations, plugins, etc. Not used by any in-tree sampling
+    implementations."""
 
     # Fields used for bad words
     bad_words: Optional[list[str]] = None
+    """Words that are not allowed to be generated. More precisely, only the
+    last token of a corresponding token sequence is not allowed when the next
+    generated token can complete the sequence."""
     _bad_words_token_ids: Optional[list[list[int]]] = None
 
     @staticmethod
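
The fields documented above combine in the usual way when constructing `SamplingParams`; a brief sketch with arbitrary example values:

```python
from vllm import SamplingParams

# temperature=0 means greedy sampling, as the field docstring above states.
greedy = SamplingParams(temperature=0.0, max_tokens=32)

# Nucleus/top-k sampling with stop strings, a presence penalty, and a fixed
# seed, all using fields documented above.
creative = SamplingParams(
    temperature=0.9,
    top_p=0.95,
    top_k=50,
    presence_penalty=0.5,
    stop=["\n\n"],
    seed=42,
    max_tokens=128,
)
```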
