
Commit be54a95

[Docs] Fix hardcoded links in docs (#21287)
Signed-off-by: Harry Mellor <[email protected]>
1 parent 042af0c commit be54a95

5 files changed: +6 -7 lines changed

docs/design/v1/metrics.md

Lines changed: 2 additions & 3 deletions
@@ -61,7 +61,7 @@ These are documented under [Inferencing and Serving -> Production Metrics](../..
 
 ### Grafana Dashboard
 
-vLLM also provides [a reference example](https://docs.vllm.ai/en/stable/examples/online_serving/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
+vLLM also provides [a reference example](../../examples/online_serving/prometheus_grafana.md) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
 
 The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:
 

@@ -672,8 +672,7 @@ v0 has support for OpenTelemetry tracing:
   `--collect-detailed-traces`
 - [OpenTelemetry blog
   post](https://opentelemetry.io/blog/2024/llm-observability/)
-- [User-facing
-  docs](https://docs.vllm.ai/en/latest/examples/opentelemetry.html)
+- [User-facing docs](../../examples/online_serving/opentelemetry.md)
 - [Blog
   post](https://medium.com/@ronen.schaffer/follow-the-trail-supercharging-vllm-with-opentelemetry-distributed-tracing-aa655229b46f)
 - [IBM product
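
For context on the page this hunk touches: the metrics the Grafana dashboard visualizes are exposed on the server's Prometheus endpoint. Below is a minimal sketch of pulling them directly, assuming a vLLM OpenAI-compatible server on localhost:8000 and the `vllm:` metric-name prefix; none of these specifics come from this commit.

```python
# Minimal sketch (not part of this commit): pull the Prometheus metrics the
# Grafana dashboard is built on. Assumes a vLLM OpenAI-compatible server is
# running on localhost:8000 and that metric names carry the "vllm:" prefix.
import requests

METRICS_URL = "http://localhost:8000/metrics"  # assumed default host/port


def show_vllm_metrics(prefix: str = "vllm:") -> None:
    """Print every exposed metric line whose name starts with the prefix."""
    response = requests.get(METRICS_URL, timeout=5)
    response.raise_for_status()
    for line in response.text.splitlines():
        if line.startswith(prefix):
            print(line)


if __name__ == "__main__":
    show_vllm_metrics()
```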

docs/features/multimodal_inputs.md

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ To substitute multiple images inside the same text prompt, you can pass in a lis
 
 Full example: <gh-file:examples/offline_inference/vision_language_multi_image.py>
 
-If using the [LLM.chat](https://docs.vllm.ai/en/stable/models/generative_models.html#llmchat) method, you can pass images directly in the message content using various formats: image URLs, PIL Image objects, or pre-computed embeddings:
+If using the [LLM.chat](../models/generative_models.md#llmchat) method, you can pass images directly in the message content using various formats: image URLs, PIL Image objects, or pre-computed embeddings:
 
 ```python
 from vllm import LLM
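
As a companion to the `LLM.chat` line changed above, here is a minimal sketch of passing an image URL in the message content; the model name and image URL are illustrative placeholders, not taken from the commit.

```python
# Minimal sketch (not part of this commit): pass an image URL in the message
# content via LLM.chat. The model name and image URL are illustrative
# placeholders; any vision-language model supported by vLLM should work.
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # placeholder vision model

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/duck.jpg"},
            },
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }
]

outputs = llm.chat(messages)
print(outputs[0].outputs[0].text)
```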

docs/features/quantization/bitblas.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more effic
 !!! note
     Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
     Most recent NVIDIA GPUs support `float16`, while `bfloat16` is more common on newer architectures like Ampere or Hopper.
-    For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
+    For details see [supported hardware](supported_hardware.md).
 
 Below are the steps to utilize BitBLAS with vLLM.
 
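To illustrate the dtype note in this hunk, here is a minimal sketch of choosing a dtype the GPU actually supports before constructing the engine; the model path is a placeholder and `quantization="bitblas"` is assumed from the page this diff touches.

```python
# Minimal sketch (not part of this commit): pick a dtype the GPU supports
# before building the engine, per the note above. The model path is a
# placeholder, and quantization="bitblas" is assumed from the page this
# diff touches.
import torch
from vllm import LLM

# Prefer bfloat16 on architectures that support it (e.g. Ampere, Hopper),
# otherwise fall back to float16.
dtype = "bfloat16" if torch.cuda.is_bf16_supported() else "float16"

llm = LLM(
    model="path/to/bitblas-quantized-model",  # placeholder checkpoint
    dtype=dtype,
    quantization="bitblas",  # assumed option name, per this page
)
```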
docs/features/tool_calling.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ specify the `name` of one of the tools in the `tool_choice` parameter of the cha
 
 ## Required Function Calling
 
-vLLM supports the `tool_choice='required'` option in the chat completion API. Similar to the named function calling, it also uses guided decoding, so this is enabled by default and will work with any supported model. The required guided decoding features (JSON schema with `anyOf`) are currently only supported in the V0 engine with the guided decoding backend `outlines`. However, support for alternative decoding backends are on the [roadmap](https://docs.vllm.ai/en/latest/usage/v1_guide.html#feature-model) for the V1 engine.
+vLLM supports the `tool_choice='required'` option in the chat completion API. Similar to the named function calling, it also uses guided decoding, so this is enabled by default and will work with any supported model. The required guided decoding features (JSON schema with `anyOf`) are currently only supported in the V0 engine with the guided decoding backend `outlines`. However, support for alternative decoding backends are on the [roadmap](../usage/v1_guide.md#features) for the V1 engine.
 
 When tool_choice='required' is set, the model is guaranteed to generate one or more tool calls based on the specified tool list in the `tools` parameter. The number of tool calls depends on the user's query. The output format strictly follows the schema defined in the `tools` parameter.
 
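To make the `tool_choice='required'` paragraph above concrete, here is a minimal sketch of a required tool call through the OpenAI client against a running vLLM server; the base URL, model name, and tool schema are illustrative assumptions.

```python
# Minimal sketch (not part of this commit): force at least one tool call with
# tool_choice="required" via the OpenAI client. The base URL, model name, and
# tool schema are illustrative placeholders for a running vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="required",  # the option this section documents
)
print(response.choices[0].message.tool_calls)
```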
docs/models/extensions/tensorizer.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ shorter Pod startup times and CPU memory usage. Tensor encryption is also suppor
 
 For more information on CoreWeave's Tensorizer, please refer to
 [CoreWeave's Tensorizer documentation](https://github.com/coreweave/tensorizer). For more information on serializing a vLLM model, as well a general usage guide to using Tensorizer with vLLM, see
-the [vLLM example script](https://docs.vllm.ai/en/latest/examples/others/tensorize_vllm_model.html).
+the [vLLM example script](../../examples/others/tensorize_vllm_model.md).
 
 !!! note
     Note that to use this feature you will need to install `tensorizer` by running `pip install vllm[tensorizer]`.
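
As a companion to the example-script link updated above, here is a minimal sketch of loading a model that was already serialized with Tensorizer; the model name and S3 URI are placeholders, and the `TensorizerConfig` import path and fields are assumptions to verify against the linked example.

```python
# Minimal sketch (not part of this commit): load weights that were previously
# serialized with tensorizer. Requires `pip install vllm[tensorizer]`. The
# model name and URI are placeholders, and the TensorizerConfig import path
# and fields are assumptions; verify them against the linked example script.
from vllm import LLM
from vllm.model_executor.model_loader.tensorizer import TensorizerConfig

llm = LLM(
    model="facebook/opt-125m",  # placeholder model name
    load_format="tensorizer",
    model_loader_extra_config=TensorizerConfig(
        tensorizer_uri="s3://my-bucket/opt-125m/model.tensors",  # placeholder URI
    ),
)
```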
