
Commit 8f16d98

Added documentation to the configuration guide
Signed-off-by: Shmuel Kallner <[email protected]>
1 parent a2f557b commit 8f16d98

2 files changed (+103, -21 lines)

mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ nav:
 - InferencePool Rollout: guides/inferencepool-rollout.md
 - Metrics and Observability: guides/metrics-and-observability.md
 - Configuration Guide:
-- Configuring the plugins via configuration files or text: guides/epp-configuration/config-text.md
+- Configuring the EndPoint Picker via configuration files or text: guides/epp-configuration/config-text.md
 - Prefix Cache Aware Plugin: guides/epp-configuration/prefix-aware.md
 - Troubleshooting Guide: guides/troubleshooting.md
 - Implementer Guides:

site-src/guides/epp-configuration/config-text.md

Lines changed: 102 additions & 20 deletions
@@ -1,26 +1,14 @@
-# Configuring Plugins via text
+# Configuring via text
 
-The set of lifecycle hooks (plugins) that are used by the Inference Gateway (IGW) is determined by how
-it is configured. The IGW can be configured in several ways, either by code or via text.
-
-If configured by code either a set of predetermined environment variables must be used or one must
-fork the IGW and change code.
+The Inference Gateway (IGW) can be configured via a text based configuration.
 
-A simpler way to congigure the IGW is to use a text based configuration. This text is in YAML format
-and can either be in a file or specified in-line as a parameter. The configuration defines the set of
-plugins to be instantiated along with their parameters. Each plugin can also be given a name, enabling
-the same plugin type to be instantiated multiple times, if needed.
+At this time, the text based configuration allows for:
 
-Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling a request. If one is not defailed, a default one names `default` will be added and will reference all of the
-instantiated plugins.
-
-The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
-will be used for a particular request. A Profile Handler must be specified, unless the configuration only
-contains one profile, in which case the `SingleProfileHandler` will be used.
+1. The configuration of the lifecycle hooks (plugins) that are used by the IGW.
+2. The configuration of the saturation detector.
+3. A set of feature gates that are used to enable experimental features.
 
-In addition, the set of instantiated plugins can also include a picker, which chooses the actual pod to which
-the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
-instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
+The configuration text is in YAML format and can either be in a file or specified in-line as a parameter.
 
 It should be noted that while the configuration text looks like a Kubernetes Custom Resource, it is
 **NOT** a Kubernetes Custom Resource. Kubernetes infrastructure is used to load the configuration
@@ -39,10 +27,49 @@ plugins:
 schedulingProfiles:
 - ....
 - ....
+saturationDetector:
+  ...
+featureGates:
+  ...
 ```
 
 The first two lines of the configuration are constant and must appear as is.
 
+The plugins section defines the set of plugins that will be instantiated and their parameters. This section is described in more detail in the section [Configuring Plugins via text](#configuring-plugins-via-text).
+
+The schedulingProfiles section defines the set of scheduling profiles that can be used in scheduling
+requests to pods. This section is described in more detail in the section [Configuring Plugins via text](#configuring-plugins-via-text).
+
+The saturationDetector section configures the saturation detector, which is used to determine if special
+action needs to be taken due to the system being overloaded or saturated. This section is described in more detail in the section [Saturation Detector configuration](#saturation-detector-configuration).
+
+The featureGates section allows the enablement of experimental features of the IGW. This section is
+described in more detail in the section [Feature Gates](#feature-gates).
+
+## Configuring Plugins via text
+
+The set of plugins that are used by the IGW is determined by how
+it is configured. The IGW can be configured in several ways, either by code or via text.
+
+If configured by code, either a set of predetermined environment variables must be used or one must
+fork the IGW and change its code.
+
+A simpler way to configure the IGW is to use a text based configuration. The configuration defines the
+set of plugins to be instantiated along with their parameters. Each plugin can also be given a name,
+enabling the same plugin type to be instantiated multiple times, if needed.
+
+Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling
+a request. If none is defined, a default one named `default` will be added and will reference all of
+the instantiated plugins.
+
+The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
+will be used for a particular request. A Profile Handler must be specified, unless the configuration only
+contains one profile, in which case the `SingleProfileHandler` will be used.
+
+In addition, the set of instantiated plugins can also include a picker, which chooses the actual pod to which
+the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
+instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
+
 The plugins section defines the set of plugins that will be instantiated and their parameters.
 Each entry in this section has the following form:
 
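The defaulting rules in the new "Configuring Plugins via text" section can be illustrated with a minimal sketch that instantiates one plugin and leaves out `schedulingProfiles`, the Profile Handler and the picker entirely. It assumes each plugin entry uses `name` and `type` keys (the entry form itself lies outside this diff); the instance name `lora-scorer` is hypothetical, and the two constant header lines are omitted.

```yaml
# The two constant header lines are omitted (they are not shown in this diff).
plugins:
- name: lora-scorer            # hypothetical instance name
  type: lora-affinity-scorer   # documented above as taking no parameters
# No schedulingProfiles section: a profile named `default` is added that
# references every instantiated plugin (here, only lora-scorer).
# No Profile Handler: with a single profile, SingleProfileHandler is used.
# No picker referenced: a MaxScorePicker instance is added to `default`.
```

Under the rules as described, this should behave like an explicit `default` profile referencing `lora-scorer` plus a `MaxScorePicker`, selected by the `SingleProfileHandler`.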

@@ -190,7 +217,7 @@ schedulingProfiles:
   - pluginRef: max-score-picker
 ```
 
-## Plugin Configuration
+### Plugin Configuration
 
 This section describes how to setup the various plugins that are available with the IGW.
 
@@ -266,3 +293,58 @@ scored higher (since it's more available to serve new request).
 
 - *Type*: lora-affinity-scorer
 - *Parameters*: none
+
+## Saturation Detector configuration
+
+The Saturation Detector is used to determine if the cluster is overloaded, i.e. saturated. When
+the cluster is saturated, special actions will be taken depending on what has been enabled. At this time, sheddable requests will be dropped.
+
+The Saturation Detector determines that the cluster is saturated by looking at the following metrics provided by the inference servers:
+
+- Backend waiting queue size
+- KV cache utilization
+- Metrics staleness
+
+The Saturation Detector is configured via the saturationDetector section of the overall configuration.
+It has the following form:
+
+```yaml
+saturationDetector:
+  queueDepthThreshold: 8
+  kvCacheUtilThreshold: 0.75
+  metricsStalenessThreshold: 150ms
+```
+
+The various sub-fields of the saturationDetector section are:
+
+- The `queueDepthThreshold` field, which defines the backend waiting queue size above which a
+  pod is considered to have insufficient capacity for new requests. This field is optional; if
+  omitted, a value of `5` will be used.
+- The `kvCacheUtilThreshold` field, which defines the KV cache utilization (0.0 to 1.0) above
+  which a pod is considered to have insufficient capacity. This field is optional; if omitted,
+  a value of `0.8` will be used.
+- The `metricsStalenessThreshold` field, which defines how old a pod's metrics can be. If a pod's
+  metrics are older than this, it might be excluded from "good capacity" considerations or treated
+  as having no capacity for safety. This field is optional; if omitted, a value of `200ms` will be used.
+
+## Feature Gates
+
+The Feature Gates section allows for the enabling of experimental features of the IGW. These experimental
+features are all disabled unless you explicitly enable them one by one.
+
+The Feature Gates section has the following form:
+
+```yaml
+featureGates:
+  enableDataLayer: true
+  enableFlowControl: false
+```
+
+Each sub-field of the Feature Gates section enables one experimental feature. The sub-fields are:
+
+- `enableDataLayer` which, if present with a value of true, enables the experimental Datalayer APIs.
+- `enableFlowControl` which, if present with a value of true, enables the experimental FlowControl
+  feature.
+
+In all cases, if the sub-field isn't present or has a value of false, that experimental feature will
+be disabled.
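Because every sub-field of both new sections is optional, a configuration only needs to list the values it changes. A minimal sketch, assuming omitted saturationDetector sub-fields fall back individually to the defaults listed in the diff above and that an absent feature gate counts as `false`; the threshold value `10` is arbitrary:

```yaml
saturationDetector:
  queueDepthThreshold: 10    # backend waiting queue size above which a pod lacks capacity
  # kvCacheUtilThreshold omitted: 0.8 is used
  # metricsStalenessThreshold omitted: 200ms is used
featureGates:
  enableFlowControl: true    # opt in to the experimental FlowControl feature
  # enableDataLayer omitted: the experimental Datalayer APIs stay disabled
```

With `queueDepthThreshold: 10`, a pod reporting 7 waiting requests is not over the queue threshold, whereas under the default of `5` it would be.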
