
Commit be54a95

[Docs] Fix hardcoded links in docs (#21287)
Signed-off-by: Harry Mellor <[email protected]>
1 parent 042af0c commit be54a95

5 files changed: +6 -7 lines changed

docs/design/v1/metrics.md

Lines changed: 2 additions & 3 deletions
@@ -61,7 +61,7 @@ These are documented under [Inferencing and Serving -> Production Metrics](../..
 
 ### Grafana Dashboard
 
-vLLM also provides [a reference example](https://docs.vllm.ai/en/stable/examples/online_serving/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
+vLLM also provides [a reference example](../../examples/online_serving/prometheus_grafana.md) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
 
 The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:
 

@@ -672,8 +672,7 @@ v0 has support for OpenTelemetry tracing:
   `--collect-detailed-traces`
 - [OpenTelemetry blog
   post](https://opentelemetry.io/blog/2024/llm-observability/)
-- [User-facing
-  docs](https://docs.vllm.ai/en/latest/examples/opentelemetry.html)
+- [User-facing docs](../../examples/online_serving/opentelemetry.md)
 - [Blog
   post](https://medium.com/@ronen.schaffer/follow-the-trail-supercharging-vllm-with-opentelemetry-distributed-tracing-aa655229b46f)
 - [IBM product
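
For context on the page this hunk touches: the metrics the Grafana dashboard visualizes are exposed on the server's Prometheus endpoint. Below is a minimal sketch of pulling them directly, assuming a vLLM OpenAI-compatible server on localhost:8000 and the `vllm:` metric-name prefix; none of these specifics come from this commit.

```python
# Minimal sketch (not part of this commit): pull the Prometheus metrics the
# Grafana dashboard is built on. Assumes a vLLM OpenAI-compatible server is
# running on localhost:8000 and that metric names carry the "vllm:" prefix.
import requests

METRICS_URL = "http://localhost:8000/metrics"  # assumed default host/port


def show_vllm_metrics(prefix: str = "vllm:") -> None:
    """Print every exposed metric line whose name starts with the prefix."""
    response = requests.get(METRICS_URL, timeout=5)
    response.raise_for_status()
    for line in response.text.splitlines():
        if line.startswith(prefix):
            print(line)


if __name__ == "__main__":
    show_vllm_metrics()
```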

docs/features/multimodal_inputs.md

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ To substitute multiple images inside the same text prompt, you can pass in a lis
 
 Full example: <gh-file:examples/offline_inference/vision_language_multi_image.py>
 
-If using the [LLM.chat](https://docs.vllm.ai/en/stable/models/generative_models.html#llmchat) method, you can pass images directly in the message content using various formats: image URLs, PIL Image objects, or pre-computed embeddings:
+If using the [LLM.chat](../models/generative_models.md#llmchat) method, you can pass images directly in the message content using various formats: image URLs, PIL Image objects, or pre-computed embeddings:
 
 ```python
 from vllm import LLM
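
As a companion to the `LLM.chat` line changed above, here is a minimal sketch of passing an image URL in the message content; the model name and image URL are illustrative placeholders, not taken from the commit.

```python
# Minimal sketch (not part of this commit): pass an image URL in the message
# content via LLM.chat. The model name and image URL are illustrative
# placeholders; any vision-language model supported by vLLM should work.
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # placeholder vision model

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/duck.jpg"},
            },
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }
]

outputs = llm.chat(messages)
print(outputs[0].outputs[0].text)
```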

docs/features/quantization/bitblas.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more effic
 !!! note
     Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
     Most recent NVIDIA GPUs support `float16`, while `bfloat16` is more common on newer architectures like Ampere or Hopper.
-    For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
+    For details see [supported hardware](supported_hardware.md).
 
 Below are the steps to utilize BitBLAS with vLLM.
 
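To illustrate the dtype note in this hunk, here is a minimal sketch of choosing a dtype the GPU actually supports before constructing the engine; the model path is a placeholder and `quantization="bitblas"` is assumed from the page this diff touches.

```python
# Minimal sketch (not part of this commit): pick a dtype the GPU supports
# before building the engine, per the note above. The model path is a
# placeholder, and quantization="bitblas" is assumed from the page this
# diff touches.
import torch
from vllm import LLM

# Prefer bfloat16 on architectures that support it (e.g. Ampere, Hopper),
# otherwise fall back to float16.
dtype = "bfloat16" if torch.cuda.is_bf16_supported() else "float16"

llm = LLM(
    model="path/to/bitblas-quantized-model",  # placeholder checkpoint
    dtype=dtype,
    quantization="bitblas",  # assumed option name, per this page
)
```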
docs/features/tool_calling.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ specify the `name` of one of the tools in the `tool_choice` parameter of the cha
 
 ## Required Function Calling
 
-vLLM supports the `tool_choice='required'` option in the chat completion API. Similar to the named function calling, it also uses guided decoding, so this is enabled by default and will work with any supported model. The required guided decoding features (JSON schema with `anyOf`) are currently only supported in the V0 engine with the guided decoding backend `outlines`. However, support for alternative decoding backends are on the [roadmap](https://docs.vllm.ai/en/latest/usage/v1_guide.html#feature-model) for the V1 engine.
+vLLM supports the `tool_choice='required'` option in the chat completion API. Similar to the named function calling, it also uses guided decoding, so this is enabled by default and will work with any supported model. The required guided decoding features (JSON schema with `anyOf`) are currently only supported in the V0 engine with the guided decoding backend `outlines`. However, support for alternative decoding backends are on the [roadmap](../usage/v1_guide.md#features) for the V1 engine.
 
 When tool_choice='required' is set, the model is guaranteed to generate one or more tool calls based on the specified tool list in the `tools` parameter. The number of tool calls depends on the user's query. The output format strictly follows the schema defined in the `tools` parameter.
 
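To make the `tool_choice='required'` paragraph above concrete, here is a minimal sketch of a required tool call through the OpenAI client against a running vLLM server; the base URL, model name, and tool schema are illustrative assumptions.

```python
# Minimal sketch (not part of this commit): force at least one tool call with
# tool_choice="required" via the OpenAI client. The base URL, model name, and
# tool schema are illustrative placeholders for a running vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="required",  # the option this section documents
)
print(response.choices[0].message.tool_calls)
```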
docs/models/extensions/tensorizer.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ shorter Pod startup times and CPU memory usage. Tensor encryption is also suppor
 
 For more information on CoreWeave's Tensorizer, please refer to
 [CoreWeave's Tensorizer documentation](https://github.com/coreweave/tensorizer). For more information on serializing a vLLM model, as well a general usage guide to using Tensorizer with vLLM, see
-the [vLLM example script](https://docs.vllm.ai/en/latest/examples/others/tensorize_vllm_model.html).
+the [vLLM example script](../../examples/others/tensorize_vllm_model.md).
 
 !!! note
     Note that to use this feature you will need to install `tensorizer` by running `pip install vllm[tensorizer]`.
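
As a companion to the example-script link updated above, here is a minimal sketch of loading a model that was already serialized with Tensorizer; the model name and S3 URI are placeholders, and the `TensorizerConfig` import path and fields are assumptions to verify against the linked example.

```python
# Minimal sketch (not part of this commit): load weights that were previously
# serialized with tensorizer. Requires `pip install vllm[tensorizer]`. The
# model name and URI are placeholders, and the TensorizerConfig import path
# and fields are assumptions; verify them against the linked example script.
from vllm import LLM
from vllm.model_executor.model_loader.tensorizer import TensorizerConfig

llm = LLM(
    model="facebook/opt-125m",  # placeholder model name
    load_format="tensorizer",
    model_loader_extra_config=TensorizerConfig(
        tensorizer_uri="s3://my-bucket/opt-125m/model.tensors",  # placeholder URI
    ),
)
```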
