diff --git a/docs/scheduler-flowchart.png b/docs/scheduler-flowchart.png
deleted file mode 100644
index 4459ef1ca..000000000
Binary files a/docs/scheduler-flowchart.png and /dev/null differ
diff --git a/pkg/epp/README.md b/pkg/epp/README.md
index df5c21375..966aed5f2 100644
--- a/pkg/epp/README.md
+++ b/pkg/epp/README.md
@@ -20,9 +20,4 @@ An EPP instance handles a single `InferencePool` (and so for each `InferencePool
 - The EPP generates metrics to enhance observability.
 - It reports InferenceObjective-level metrics, further broken down by target model.
 - Detailed information regarding metrics can be found on the [website](https://gateway-api-inference-extension.sigs.k8s.io/guides/metrics/).
-
-
-## Scheduling Algorithm
-The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request. The following flow chart summarizes the current scheduling algorithm
-
-Scheduling Algorithm
+
\ No newline at end of file