NVIDIA-NeMo · Kipok · Jan 5, 2026 · Jan 5, 2026
diff --git a/docs/evaluation/index.md b/docs/evaluation/index.md
@@ -9,7 +9,7 @@ We support many popular benchmarks and it's easy to add new in the future. The f
 - [**Instruction following**](./instruction-following.md): e.g. [ifbench](./instruction-following.md#ifbench), [ifeval](./instruction-following.md#ifeval)
 - [**Long-context**](./long-context.md): e.g. [ruler](./long-context.md#ruler), [mrcr](./long-context.md#mrcr)
 - [**Tool-calling**](./tool-calling.md): e.g. [bfcl_v3](./tool-calling.md#bfcl_v3)
-- [**Multilingual**](./multilingual.md): e.g. [mmlu-prox](./multilingual.md#mmlu-prox), [flores-200](./multilingual.md#FLORES-200), [wmt24pp](./multilingual.md#wmt24pp)
+- [**Multilingual**](./multilingual.md): e.g. [mmlu-prox](./multilingual.md#mmlu-prox), [flores-200](./multilingual.md#flores-200), [wmt24pp](./multilingual.md#wmt24pp)
 - [**Speech & Audio**](./speech-audio.md): e.g. [asr-leaderboard](./speech-audio.md#asr-leaderboard), [mmau-pro](./speech-audio.md#mmau-pro)
 
 See [nemo_skills/dataset](https://github.com/NVIDIA-NeMo/Skills/blob/main/nemo_skills/dataset) where each folder is a benchmark we support.
@@ -177,7 +177,7 @@ code execution timeout for scicode benchmark
 !!! tip "Passing Main Arguments with Config Files"
 
     For parameters that are difficult to escape on the command line (like `end_reasoning_string='</think>'`),
-    you can use YAML config files instead. See [Passing Main Arguments with Config Files](../pipelines/index.md###passing-main-arguments-with-config-files) for details.
+    you can use YAML config files instead. See [Passing Main Arguments with Config Files](../pipelines/index.md#passing-main-arguments-with-config-files) for details.
 
 
 ## Using data on cluster

diff --git a/docs/index.md b/docs/index.md
@@ -21,7 +21,7 @@ Here are some of the features we support:
         - [**Instruction following**](./evaluation/instruction-following.md): e.g. [ifbench](./evaluation/instruction-following.md#ifbench), [ifeval](./evaluation/instruction-following.md#ifeval)
         - [**Long-context**](./evaluation/long-context.md): e.g. [ruler](./evaluation/long-context.md#ruler), [mrcr](./evaluation/long-context.md#mrcr)
         - [**Tool-calling**](./evaluation/tool-calling.md): e.g. [bfcl_v3](./evaluation/tool-calling.md#bfcl_v3)
-        - [**Multilingual capabilities**](./evaluation/multilingual.md): e.g. [mmlu-prox](./evaluation/multilingual.md#mmlu-prox), [flores-200](./evaluation/multilingual.md#FLORES-200), [wmt24pp](./evaluation/multilingual.md#wmt24pp)
+        - [**Multilingual capabilities**](./evaluation/multilingual.md): e.g. [mmlu-prox](./evaluation/multilingual.md#mmlu-prox), [flores-200](./evaluation/multilingual.md#flores-200), [wmt24pp](./evaluation/multilingual.md#wmt24pp)
         - [**Speech & Audio**](./evaluation/speech-audio.md): e.g. [asr-leaderboard](./evaluation/speech-audio.md#asr-leaderboard), [mmau-pro](./evaluation/speech-audio.md#mmau-pro)
         - [**Robustness evaluation**](./evaluation/robustness.md): Evaluate model sensitvity against changes in prompt.
     - Easily parallelize each evaluation across many Slurm jobs, self-host LLM judges, bring your own prompts or change benchmark configuration in any other way.
@@ -36,4 +36,3 @@ You can find more examples of how to use Nemo-Skills in the [tutorials](./tutori
 We've built and released many popular models and datasets using Nemo-Skills. See all of them in the [Papers & Releases](./releases/index.md) documentation.
 
 We support many popular benchmarks and it's easy to add new in the future. The following categories of benchmarks are supported
-
diff --git a/docs/pipelines/generation.md b/docs/pipelines/generation.md
@@ -98,7 +98,7 @@ See [nemo_skills/inference/generate.py](https://github.com/NVIDIA-NeMo/Skills/bl
 !!! tip "Passing Main Arguments with Config Files"
 
     For parameters that are difficult to escape on the command line (like `end_reasoning_string='</think>'`),
-    you can use YAML config files instead. See [Passing Main Arguments with Config Files](index.md###passing-main-arguments-with-config-files) for details.
+    you can use YAML config files instead. See [Passing Main Arguments with Config Files](index.md#passing-main-arguments-with-config-files) for details.
 
 
 ## Sampling multiple generations
@@ -470,5 +470,3 @@ We support three methods for automatic trimming of generation budget or context:
         ++server.enable_soft_fail=True
         ++server.context_limit_retry_strategy=reduce_prompt_from_end
     ```
-
-
diff --git a/docs/pipelines/start-server.md b/docs/pipelines/start-server.md
@@ -64,7 +64,7 @@ Similarly, the local port for the sandbox server can be changed using `--sandbox
 
 ## Using the Server
 
-To use this started server in [Evaluation](/Skills/pipelines/evaluation/) or [Generation](/Skills/pipelines/generation/),
+To use this started server in [Evaluation](evaluation.md) or [Generation](generation.md),
 all the model-related arguments can now be replaced with `--server_type=openai` and `server_address` arguments.
 
 For instance, for the vLLM model server above, the `eval` pipeline arguments can be modified as,

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -1,3 +1,4 @@
+strict: true
 site_name: Nemo-Skills
 site_url: https://nvidia-nemo.github.io/Skills
 extra_css:
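
The link fixes in this diff fall into three patterns: an anchor with uppercase characters (`#FLORES-200`), a repeated `#` before the anchor (`index.md###...`), and an absolute site path used instead of a relative `.md` file (`/Skills/pipelines/evaluation/`). A minimal sketch of a standalone check for those patterns follows. It is not part of this PR or of MkDocs; the function name `suspicious_links` and the regex are hypothetical, and the uppercase rule assumes heading anchors are generated in lowercase, as the corrected links suggest.

```python
import re

# Matches inline markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def suspicious_links(markdown_text):
    """Return (target, reason) pairs for link targets that resemble the ones fixed in this diff.

    Hypothetical helper for illustration only; it does not resolve files or headings.
    """
    findings = []
    for target in LINK_RE.findall(markdown_text):
        # Absolute site paths depend on the deployed URL layout rather than the docs tree.
        if target.startswith("/"):
            findings.append((target, "absolute site path instead of relative .md file"))
            continue
        _path, sep, fragment = target.partition("#")
        if not sep:
            continue  # no anchor, nothing more to check
        if fragment.startswith("#"):
            findings.append((target, "repeated '#' before anchor"))
        elif fragment != fragment.lower():
            # Assumption: generated heading anchors are lowercase.
            findings.append((target, "anchor contains uppercase characters"))
    return findings
```

Because the check is purely textual, it would also flag external URLs whose fragments legitimately contain uppercase characters, so its output is a list of candidates to review rather than confirmed broken links.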