kubernetes-sigs · letmerecall · Nov 12, 2025 · Nov 12, 2025
diff --git a/site-src/concepts/roles-and-personas.md b/site-src/concepts/roles-and-personas.md
@@ -6,21 +6,21 @@ Before diving into the details of the API, descriptions of the personas these AP
 
 The Inference Platform Admin creates and manages the infrastructure necessary to run LLM workloads, including handling Ops for:
 
-  - Hardware
-  - Model Server
-  - Base Model
-  - Resource Allocation for Workloads
-  - Gateway configuration
-  - etc
+- Hardware
+- Model Server
+- Base Model
+- Resource Allocation for Workloads
+- Gateway configuration
+- etc
 
 ## Inference Workload Owner
 
 An Inference Workload Owner persona owns and manages one or many Generative AI Workloads (LLM focused *currently*). This includes:
 
 - Defining priority
 - Managing fine-tunes
-  - LoRA Adapters
-  - System Prompts
-  - Prompt Cache
-  - etc.
+    - LoRA Adapters
+    - System Prompts
+    - Prompt Cache
+    - etc.
 - Managing rollout of adapters
diff --git a/site-src/guides/epp-configuration/config-text.md b/site-src/guides/epp-configuration/config-text.md
@@ -74,9 +74,9 @@ The fields in a schedulingProfile entry are:
 - *name* specifies the scheduling profile's name.
 - *plugins* specifies the set of plugins to be used when this scheduling profile is chosen for a request.
 Each entry in the schedulingProfile's plugins section has the following fields:
-  - *pluginRef* is a reference to the name of the plugin instance to be used
-  - *weight* is the weight to be used if the referenced plugin is a scorer. If omitted, a weight of one
-    will be used.
+    - *pluginRef* is a reference to the name of the plugin instance to be used
+    - *weight* is the weight to be used if the referenced plugin is a scorer. If omitted, a weight of one
+      will be used.
 
 A complete configuration might look like this:
 ```yaml
@@ -201,12 +201,12 @@ Scores pods based on the amount of the prompt is believed to be in the pod's KvC
 
 - *Type*: prefix-cache-scorer
 - *Parameters*:
-  - `blockSize` specified the size of the blocks to break up the input prompt when
-    calculating the block hashes. If not specified defaults to `64`
-  - `maxPrefixBlocksToMatch` specifies the maximum number of prefix blocks to match. If
-   not specified defaults to `256`
-  - `lruCapacityPerServer` specifies the capacity of the LRU indexer in number of entries
-    per server (pod). If not specified defaults to `31250`
+    - `blockSize` specifies the size of the blocks to break up the input prompt when
+      calculating the block hashes. If not specified defaults to `64`
+    - `maxPrefixBlocksToMatch` specifies the maximum number of prefix blocks to match. If
+      not specified defaults to `256`
+    - `lruCapacityPerServer` specifies the capacity of the LRU indexer in number of entries
+      per server (pod). If not specified defaults to `31250`
 
 #### **LoRAAffinityScorer**
 
@@ -222,27 +222,27 @@ Picks the pod with the maximum score from the list of candidates. This is the de
 if not specified.
 
 - *Type*: max-score-picker
-- *Parameters*: 
-  - `maxNumOfEndpoints`: Maximum number of endpoints to pick from the list of candidates, based on
-    the scores of those endpoints. If not specified defaults to `1`.
+- *Parameters*:
+    - `maxNumOfEndpoints`: Maximum number of endpoints to pick from the list of candidates, based on
+      the scores of those endpoints. If not specified defaults to `1`.
 
 #### **RandomPicker**
 
 Picks a random pod from the list of candidates.
 
 - *Type*: random-picker
-- *Parameters*: 
-  - `maxNumOfEndpoints`: Maximum number of endpoints to pick from the list of candidates. If not
-    specified defaults to `1`.
+- *Parameters*:
+    - `maxNumOfEndpoints`: Maximum number of endpoints to pick from the list of candidates. If not
+      specified defaults to `1`.
 
 #### **WeightedRandomPicker**
 
 Picks pod(s) from the list of candidates based on weighted random sampling using A-Res algorithm.
 
 - *Type*: weighted-random-picker
 - *Parameters*:
-  - `maxNumOfEndpoints`: Maximum number of endpoints to pick from the list of candidates. If not
-    specified defaults to `1`.
+    - `maxNumOfEndpoints`: Maximum number of endpoints to pick from the list of candidates. If not
+      specified defaults to `1`.
 
 #### **KvCacheScorer**
 

diff --git a/site-src/implementations/gateways.md b/site-src/implementations/gateways.md
@@ -3,12 +3,12 @@
 This project has several implementations that are planned or in progress:
 
 - [Gateway Implementations](#gateway-implementations)
-  - [Alibaba Cloud Container Service for Kubernetes](#alibaba-cloud-container-service-for-kubernetes)
-  - [Envoy AI Gateway](#envoy-ai-gateway)
-  - [Google Kubernetes Engine](#google-kubernetes-engine)
-  - [Istio](#istio)
-  - [Kgateway](#kgateway)
-  - [Kubvernor](#kubvernor)
+    - [Alibaba Cloud Container Service for Kubernetes](#alibaba-cloud-container-service-for-kubernetes)
+    - [Envoy AI Gateway](#envoy-ai-gateway)
+    - [Google Kubernetes Engine](#google-kubernetes-engine)
+    - [Istio](#istio)
+    - [Kgateway](#kgateway)
+    - [Kubvernor](#kubvernor)
 
 [1]:#alibaba-cloud-container-service-for-kubernetes
 [2]:#envoy-ai-gateway

diff --git a/site-src/performance/regression-testing/index.md b/site-src/performance/regression-testing/index.md
@@ -60,9 +60,9 @@ Refer to example manifest:
 - **Model:** [Llama 3 (8B)](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
 - **LoRA Adapters:** 15 adapters (`nvidia/llama-3.1-nemoguard-8b-topic-control`, rank 8, critical)
 - **Traffic Distribution:**  
-  - 60 % on first 5 adapters (12 % each)  
-  - 30 % on next 5 adapters (6 % each)  
-  - 10 % on last 5 adapters (2 % each)  
+    - 60 % on first 5 adapters (12 % each)  
+    - 30 % on next 5 adapters (6 % each)  
+    - 10 % on last 5 adapters (2 % each)  
 - **Max LoRA:** 3
 - **Replicas:** 10 (vLLM)
 - **Request Rates:** 20–200 QPS (increments of 20)
@@ -99,8 +99,8 @@ Use the provided Jupyter notebook (`./tools/benchmark/benchmark.ipynb`) to analy
 - Update benchmark IDs to `regression-before` and `regression-after`.
 - Compare latency and throughput metrics, performing regression analysis.
 - Check R² values specifically:
-  - **Prompts Attempted/Succeeded:** Expect R² ≈ 1
-  - **Output Tokens per Minute, P90 per Output Token Latency, P90 Latency:** Expect R² close to 1 (allow minor variance).
+    - **Prompts Attempted/Succeeded:** Expect R² ≈ 1
+    - **Output Tokens per Minute, P90 per Output Token Latency, P90 Latency:** Expect R² close to 1 (allow minor variance).
 
 Identify significant deviations, investigate causes, and confirm performance meets expected standards.
 

diff --git a/site-src/reference/spec.md b/site-src/reference/spec.md
@@ -32,11 +32,13 @@ Invalid values include:
 * "foo.example.com" - must include path
 
 _Validation:_
+
 - MaxLength: 253
 - MinLength: 1
 - Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*\/[A-Za-z0-9\/\-._~%!$&'()*+,;=:]+$`
 
 _Appears in:_
+
 - [ParentStatus](#parentstatus)
 
 
@@ -49,9 +51,11 @@ EndpointPickerFailureMode defines the options for how the parent handles the cas
 Endpoint Picker extension is non-responsive.
 
 _Validation:_
+
 - Enum: [FailOpen FailClose]
 
 _Appears in:_
+
 - [EndpointPickerRef](#endpointpickerref)
 
 | Field | Description |
@@ -70,6 +74,7 @@ associated configuration.
 
 
 _Appears in:_
+
 - [InferencePoolSpec](#inferencepoolspec)
 
 | Field | Description | Default | Validation |
@@ -102,11 +107,13 @@ Invalid values include:
 * "example.com/bar" - "/" is an invalid character
 
 _Validation:_
+
 - MaxLength: 253
 - MinLength: 0
 - Pattern: `^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
 
 _Appears in:_
+
 - [EndpointPickerRef](#endpointpickerref)
 - [ParentReference](#parentreference)
 
@@ -145,6 +152,7 @@ InferencePoolSpec defines the desired state of the InferencePool.
 
 
 _Appears in:_
+
 - [InferencePool](#inferencepool)
 
 | Field | Description | Default | Validation |
@@ -163,6 +171,7 @@ InferencePoolStatus defines the observed state of the InferencePool.
 
 
 _Appears in:_
+
 - [InferencePool](#inferencepool)
 
 | Field | Description | Default | Validation |
@@ -186,11 +195,13 @@ Invalid values include:
 * "invalid/kind" - "/" is an invalid character
 
 _Validation:_
+
 - MaxLength: 63
 - MinLength: 1
 - Pattern: `^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$`
 
 _Appears in:_
+
 - [EndpointPickerRef](#endpointpickerref)
 - [ParentReference](#parentreference)
 
@@ -220,11 +231,13 @@ Invalid values include:
 * example.com. - can not start or end with "."
 
 _Validation:_
+
 - MaxLength: 253
 - MinLength: 1
 - Pattern: `^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$`
 
 _Appears in:_
+
 - [LabelSelector](#labelselector)
 
 
@@ -239,6 +252,7 @@ This simplified version uses only the matchLabels field.
 
 
 _Appears in:_
+
 - [InferencePoolSpec](#inferencepoolspec)
 
 | Field | Description | Default | Validation |
@@ -263,11 +277,13 @@ Valid values include:
 * 123-my-value
 
 _Validation:_
+
 - MaxLength: 63
 - MinLength: 0
 - Pattern: `^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`
 
 _Appears in:_
+
 - [LabelSelector](#labelselector)
 
 
@@ -293,11 +309,13 @@ Invalid values include:
 * "example.com" - "." is an invalid character
 
 _Validation:_
+
 - MaxLength: 63
 - MinLength: 1
 - Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`
 
 _Appears in:_
+
 - [ParentReference](#parentreference)
 
 
@@ -311,10 +329,12 @@ Object names can have a variety of forms, including RFC 1123 subdomains,
 RFC 1123 labels, or RFC 1035 labels.
 
 _Validation:_
+
 - MaxLength: 253
 - MinLength: 1
 
 _Appears in:_
+
 - [EndpointPickerRef](#endpointpickerref)
 - [ParentReference](#parentreference)
 
@@ -330,6 +350,7 @@ parent resource, such as a Gateway.
 
 
 _Appears in:_
+
 - [ParentStatus](#parentstatus)
 
 | Field | Description | Default | Validation |
@@ -349,6 +370,7 @@ ParentStatus defines the observed state of InferencePool from a Parent, i.e. Gat
 
 
 _Appears in:_
+
 - [InferencePoolStatus](#inferencepoolstatus)
 
 | Field | Description | Default | Validation |
@@ -367,6 +389,7 @@ Port defines the network port that will be exposed by this InferencePool.
 
 
 _Appears in:_
+
 - [EndpointPickerRef](#endpointpickerref)
 - [InferencePoolSpec](#inferencepoolspec)
 
@@ -382,11 +405,10 @@ _Underlying type:_ _integer_
 PortNumber defines a network port.
 
 _Validation:_
+
 - Maximum: 65535
 - Minimum: 1
 
 _Appears in:_
-- [Port](#port)
-
-
 
+- [Port](#port)
diff --git a/site-src/reference/x-v1a1-spec.md b/site-src/reference/x-v1a1-spec.md
@@ -22,10 +22,12 @@ _Underlying type:_ _string_
 ClusterName is the name of a cluster that exported the InferencePool.
 
 _Validation:_
+
 - MaxLength: 253
 - MinLength: 1
 
 _Appears in:_
+
 - [ExportingCluster](#exportingcluster)
 
 
@@ -38,19 +40,21 @@ ControllerName is the name of a controller that manages a resource. It must be a
 
 Valid values include:
 
-  - "example.com/bar"
+- "example.com/bar"
 
 Invalid values include:
 
-  - "example.com" - must include path
-  - "foo.example.com" - must include path
+- "example.com" - must include path
+- "foo.example.com" - must include path
 
 _Validation:_
+
 - MaxLength: 253
 - MinLength: 1
 - Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*\/[A-Za-z0-9\/\-._~%!$&'()*+,;=:]+$`
 
 _Appears in:_
+
 - [ImportController](#importcontroller)
 
 
@@ -64,6 +68,7 @@ ExportingCluster defines a cluster that exported the InferencePool that backs th
 
 
 _Appears in:_
+
 - [ImportController](#importcontroller)
 
 | Field | Description | Default | Validation |
@@ -80,6 +85,7 @@ ImportController defines a controller that is responsible for managing the Infer
 
 
 _Appears in:_
+
 - [InferencePoolImportStatus](#inferencepoolimportstatus)
 
 | Field | Description | Default | Validation |
@@ -117,10 +123,9 @@ InferencePoolImportStatus defines the observed state of the InferencePoolImport.
 
 
 _Appears in:_
+
 - [InferencePoolImport](#inferencepoolimport)
 
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
 | `controllers` _[ImportController](#importcontroller) array_ | Controllers is a list of controllers that are responsible for managing the InferencePoolImport. |  | MaxItems: 8 <br />Required: \{\} <br /> |
-
-