Added documentation to the configuration guide

shmuelk · shmuelk · commit 8f16d985538b · 2025-08-28T13:16:09.000+03:00
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -70,7 +70,7 @@ nav:
         - InferencePool Rollout: guides/inferencepool-rollout.md
       - Metrics and Observability: guides/metrics-and-observability.md
       - Configuration Guide:
-          - Configuring the plugins via configuration files or text: guides/epp-configuration/config-text.md    
+          - Configuring the EndPoint Picker via configuration files or text: guides/epp-configuration/config-text.md    
           - Prefix Cache Aware Plugin: guides/epp-configuration/prefix-aware.md
       - Troubleshooting Guide: guides/troubleshooting.md
     - Implementer Guides:
diff --git a/site-src/guides/epp-configuration/config-text.md b/site-src/guides/epp-configuration/config-text.md
@@ -1,26 +1,14 @@
-# Configuring Plugins via text
+# Configuring via text
 
-The set of lifecycle hooks (plugins) that are used by the Inference Gateway (IGW) is determined by how
-it is configured. The IGW can be configured in several ways, either by code or via text.
-
-If configured by code either a set of predetermined environment variables must be used or one must
-fork the IGW and change code.
+The Inference Gateway (IGW) can be configured via a text-based configuration.
 
-A simpler way to congigure the IGW is to use a text based configuration. This text is in YAML format
-and can either be in a file or specified in-line as a parameter. The configuration defines the set of
-plugins to be instantiated along with their parameters. Each plugin can also be given a name, enabling
-the same plugin type to be instantiated multiple times, if needed.
+At this time the text-based configuration allows for:
 
-Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling a request. If one is not defailed, a default one names `default` will be added and will reference all of the
-instantiated plugins.
-
-The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
-will be used for a particular request. A Profile Handler must be specified, unless the configuration only
-contains one profile, in which case the `SingleProfileHandler` will be used.
+1. The configuration of the lifecycle hooks (plugins) that are used by the IGW.
+2. The configuration of the saturation detector.
+3. A set of feature gates that are used to enable experimental features.
 
-In addition, the set of instantiated plugins can also include a picker, which chooses the actual pod to which
-the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
-instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
+The configuration text is in YAML format and can either be in a file or specified in-line as a parameter.
 
 It should be noted that while the configuration text looks like a Kubernetes Custom Resource, it is
 **NOT** a Kubernetes Custom Resource. Kubernetes infrastructure is used to load the configuration
@@ -39,10 +27,49 @@ plugins:
 schedulingProfiles:
 - ....
 - ....
+saturationDetector:
+  ...
+featureGates:
+  ...
 ```
 
 The first two lines of the configuration are constant and must appear as is.
 
+The plugins section defines the set of plugins that will be instantiated and their parameters. This section is described in more detail in the section [Configuring Plugins via text](#configuring-plugins-via-text).
+
+The schedulingProfiles section defines the set of scheduling profiles that can be used in scheduling
+requests to pods. This section is described in more detail in the section [Configuring Plugins via text](#configuring-plugins-via-text).
+
+The saturationDetector section configures the saturation detector, which is used to determine if special
+action needs to be taken due to the system being overloaded or saturated. This section is described in more detail in the section [Saturation Detector configuration](#saturation-detector-configuration).
+
+The featureGates section allows the enablement of experimental features of the IGW. This section is
+described in more detail in the section [Feature Gates](#feature-gates).
+
+## Configuring Plugins via text
+
+The set of plugins that are used by the IGW is determined by how
+it is configured. The IGW can be configured in several ways, either by code or via text.
+
+If configured by code either a set of predetermined environment variables must be used or one must
+fork the IGW and change code.
+
+A simpler way to configure the IGW is to use a text based configuration. The configuration defines the
+set of plugins to be instantiated along with their parameters. Each plugin can also be given a name,
+enabling the same plugin type to be instantiated multiple times, if needed.
+
+Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling
+a request. If one is not defined, a default one named `default` will be added and will reference all of
+the instantiated plugins.
+
+The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
+will be used for a particular request. A Profile Handler must be specified, unless the configuration only
+contains one profile, in which case the `SingleProfileHandler` will be used.
+
+In addition, the set of instantiated plugins can also include a picker, which chooses the actual pod to which
+the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
+instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
+
 The plugins section defines the set of plugins that will be instantiated and their parameters.
 Each entry in this section has the following form:
 
@@ -190,7 +217,7 @@ schedulingProfiles:
   -pluginRef: max-score-picker
 ```
 
-## Plugin Configuration
+### Plugin Configuration
 
 This section describes how to setup the various plugins that are available with the IGW.
 
@@ -266,3 +293,58 @@ scored higher (since it's more available to serve new request).
 
 - *Type*: lora-affinity-scorer
 - *Parameters*: none
+
+## Saturation Detector configuration
+
+The Saturation Detector is used to determine if the cluster is overloaded, i.e. saturated. When
+the cluster is saturated, special actions will be taken depending on what has been enabled. At this time, sheddable requests will be dropped.
+
+The Saturation Detector determines that the cluster is saturated by looking at the following metrics provided by the inference servers:
+
+- Backend waiting queue size
+- KV cache utilization
+- Metrics staleness
+
+The Saturation Detector is configured via the saturationDetector section of the overall configuration.
+It has the following form:
+
+```yaml
+saturationDetector:
+  queueDepthThreshold: 8
+  kvCacheUtilThreshold: 0.75
+  metricsStalenessThreshold: 150ms
+```
+
+The various sub-fields of the saturationDetector section are:
+
+- The `queueDepthThreshold` field defines the backend waiting queue size above which a
+pod is considered to have insufficient capacity for new requests. This field is optional; if
+omitted, a value of `5` will be used.
+- The `kvCacheUtilThreshold` field defines the KV cache utilization (0.0 to 1.0) above
+which a pod is considered to have insufficient capacity. This field is optional; if omitted,
+a value of `0.8` will be used.
+- The `metricsStalenessThreshold` field defines how old a pod's metrics can be. If a pod's
+metrics are older than this, it might be excluded from "good capacity" considerations or treated
+as having no capacity for safety. This field is optional; if omitted, a value of `200ms` will be used.
+
+## Feature Gates
+
+The Feature Gates section allows for the enabling of experimental features of the IGW. These experimental
+features are all disabled unless you explicitly enable them one by one.
+
+The Feature Gates section has the following form:
+
+```yaml
+featureGates:
+  enableDataLayer: true
+  enableFlowControl: false
+```
+
+Each sub-field of the Feature Gates section enables one experimental feature. The sub-fields are:
+
+- `enableDataLayer`, which, if present and set to true, enables the experimental Datalayer APIs.
+- `enableFlowControl`, which, if present and set to true, enables the experimental FlowControl
+feature.
+
+In all cases if the sub-field isn't present or has a value of false, that experimental feature will
+be disabled.
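
Note on the saturation detector thresholds documented in this patch: the per-pod rules can be illustrated with a minimal Python sketch. This is not the IGW implementation. The class and function names (`SaturationConfig`, `PodMetrics`, `pod_has_capacity`, `is_saturated`) are hypothetical, and two points are assumptions rather than statements from the patch: that a pod with stale metrics is treated as having no capacity, and that the cluster counts as saturated when no pod has capacity. Only the three field meanings and their defaults (`5`, `0.8`, `200ms`) come from the documentation above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SaturationConfig:
    """Mirrors the documented saturationDetector fields and their defaults."""
    queue_depth_threshold: int = 5
    kv_cache_util_threshold: float = 0.8
    metrics_staleness_threshold: timedelta = timedelta(milliseconds=200)


@dataclass
class PodMetrics:
    """Hypothetical per-pod snapshot; field names are illustrative only."""
    waiting_queue_size: int
    kv_cache_utilization: float  # expected range: 0.0 to 1.0
    metrics_age: timedelta       # time since the metrics were last refreshed


def pod_has_capacity(pod: PodMetrics, cfg: SaturationConfig) -> bool:
    # Assumption: stale metrics are treated as "no capacity" for safety.
    if pod.metrics_age > cfg.metrics_staleness_threshold:
        return False
    # Values strictly above a threshold mean insufficient capacity.
    if pod.waiting_queue_size > cfg.queue_depth_threshold:
        return False
    if pod.kv_cache_utilization > cfg.kv_cache_util_threshold:
        return False
    return True


def is_saturated(pods: list[PodMetrics], cfg: SaturationConfig) -> bool:
    # Assumption: the cluster is saturated when no pod has capacity
    # (an empty pod list therefore also counts as saturated).
    return not any(pod_has_capacity(p, cfg) for p in pods)
```

With the default configuration, a pod reporting a waiting queue of 9 requests would be considered out of capacity, while the same pod would still be out of capacity under the example configuration in the patch (`queueDepthThreshold: 8`), since 9 is above both thresholds.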