Commit db08577
Experimental SLO-Aware Routing and Latency Prediction (#1568)
* add latency predictor
* add cv in model and update epp deployment
* bug fix
* track mape for predictions
* add running queue size to metrics
* add xgboost regressor and update tpot sampling logic
* emit predicted and actual ttft tpot in body
* separate servers for training and prediction
* add latency predictor
put the predictor functions in director in a helper function
add scores to reqcxt
record prediction duration metrics
add prefix cache score to model input
slo based routing changes
retrieve request priority queue from the datastore
update scoring logic
* better initial implementation
Add scheduling profile, working state
remove latencypredictor from director
Move all latency prediction logic out of director and into scheduling profile. Make all Request/Response plugins take in RequestContext
* progress towards fixing up merge conflicts from latency predictor merge
* More refactor progress, fixing and adding tests
* working state, latency prediction
* Clean up changes, remove unneeded files, working functionality without latency flag and scheduling plugins
* Rebase cleanup, remove duplicate lines
* Integrate new alpha-beta slo scoring into scoring plugin
* Fix prefix cache scoring for slo-aware routing
* Add pycache or latency predictor to gitignore
* Rebase with main
* Fix prefix cache scoring being piped to latencyprediction_helper
* add dependencies in scorer
* change to single profile
* change to single profile
* restore two profiles
* restore two profiles
* restore two profiles
* update admit request to shed based on predictions
* add TODOs for future changes
* Change artifact registry references to personal compiled images
* Fix existing non-slo aware routing unit tests
* update latency predictor with better eval metrics
* Fix saturation detector unit test
* Change naming of SLO headers and prediction based routing header
* Remove port 9002 service on InferencePool causing make test to fail
* Fix epp hermetic integration test to expect ProcessingMode Send in response header
---------
Co-authored-by: kaushikmitr <[email protected]>
1 parent 8b154ba commit db08577
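The "track mape for predictions" item refers to mean absolute percentage error, the average of |actual - predicted| / |actual| expressed as a percentage. A minimal sketch of that metric follows. It is not code from this commit: the function name `mape`, the choice to skip samples whose actual value is zero, and the sample TTFT values are all assumptions made for illustration.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Pairs whose actual value is zero are skipped (an assumption of this
    sketch), because the relative error is undefined for them.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("no samples with a non-zero actual value")
    return 100.0 * sum(errors) / len(errors)


# Hypothetical TTFT samples in milliseconds, not measurements from this commit.
actual_ttft_ms = [100.0, 200.0, 400.0]
predicted_ttft_ms = [110.0, 180.0, 400.0]
ttft_mape = mape(actual_ttft_ms, predicted_ttft_ms)
print(f"TTFT MAPE: {ttft_mape:.2f}%")
```

The same function could be applied to TPOT samples. A lower value means the predicted latencies track the observed ones more closely.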
File tree
70 files changed, +13567 -257 lines changed
- cmd/epp/runner
- config/manifests
- gateway/gke
- vllm
- latencypredictor-v1
- __pycache__
- manifests
- pkg/epp
- backend/metrics
- config/loader
- datalayer
- metrics
- datastore
- handlers
- latencypredictorasync
- metrics
- testdata
- requestcontrol
- plugins/slorequest
- saturationdetector
- scheduling
- framework
- plugins
- multi/prefix
- profile
- scorer
- types
- server
- util/request
- test/integration