Feat/workers per api instance #646
Conversation
- `_resolve_workers_per_device_config`: normalizes `workers_per_device` into `dict[api_path, workers_per_device_int]`
- `_inference_workers_config_for_api`: instantiates workers per device
The CI and e2e tests are stuck.
…id` by iterating through APIs since the number of workers can now vary per API
for more information, see https://pre-commit.ci
Pre-commit and tests are successful. I hope this PR can get reviewed.
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
##           main    #646   +/- ##
===================================
- Coverage    85%     85%    -0%
===================================
  Files        39      39
  Lines      3212    3261    +49
===================================
+ Hits       2721    2762    +41
- Misses      491     499     +8
Hmm, it failed on macOS. Any suggestions?
Requesting review. Need approval for pending checks, cc @bhimrazy @aniketmaurya
I've seen that all tests (automatic and dispatch) and Codecov have passed. Looking forward to a review of this PR.
Weird, it passed before on 3.12. Can you elaborate, @bhimrazy?
@bhimrazy looks like it passes on main now, but somehow it only fails on Ubuntu 3.10
yeah, it could be another flaky test 😅
I actually have no clue here; should I make some changes, or something else? @bhimrazy
Maybe try triggering the CI again with an empty commit. BTW, we're still waiting on a review/decision from @andyland @aniketmaurya.
Alright, I've added a minimal comment-only commit to trigger the tests. Can you retrigger the pending checks? @bhimrazy
Seems like it passed all the required checks; looking for a review from the code owner. cc @bhimrazy @aniketmaurya
andyland
left a comment
The PR description is inadequate; could you at least detail how the interface changes, with some example usage of the new functionality?
Hi @andyland, I've updated the details in the PR message. More examples of what this PR covers are as follows:
server = ls.LitServer(
[sentiment_api, generate_api],
accelerator="cuda",
devices=[0, 1],
workers_per_device=2, # same for all routes
)
server = ls.LitServer(
[sentiment_api, generate_api],
accelerator="cuda",
devices=[0, 1],
workers_per_device={
"/sentiment": 2, # 2 workers per GPU for sentiment
"/generate": 3, # 3 workers per GPU for generation
},
)
What this means in worker counts: 2 * len(devices) = 4 inference workers for /sentiment and 3 * len(devices) = 6 for /generate.
server = ls.LitServer(
[sentiment_api, generate_api],
accelerator="cuda",
devices=[0, 1],
workers_per_device=[2, 3], # sentiment then generate (same order as API list)
)
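The resulting per-route worker counts from the dict example above can be sketched as plain arithmetic (illustrative only; LitServe computes this internally):

```python
# Total inference workers per route = workers_per_device * number of devices.
devices = [0, 1]
workers_per_device = {"/sentiment": 2, "/generate": 3}

total_workers = {path: n * len(devices) for path, n in workers_per_device.items()}
print(total_workers)  # {'/sentiment': 4, '/generate': 6}
```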
…te and per-api position as requested
What does this PR do?
Added the feature requested in #572: the ability to control workers_per_device per API instance.
Interface change: workers_per_device now accepts either:
- int (previous behavior, applied to all APIs), or
- list[int] (per-API, in connector order), or
- dict[str, int] mapping api_path -> workers_per_device (per-route)
Example:
This starts 2 * len(devices) = 4 inference workers for /sentiment and 3 * len(devices) = 6 for /generate. This directly addresses #572 by allowing a single endpoint (e.g. /generate) to be backed by multiple worker processes without duplicating API instances into multiple routes.
Before submitting
Tests are passing locally; I added coverage in tests/unit/test_lit_server.py and verified it.
The PR adds unit coverage in tests/unit/test_lit_server.py:
- test_workers_per_device_can_be_configured_per_route validates that {"/sentiment": 2, "/generate": 3} with devices=[0, 1] resolves, per api_path, to {"/sentiment": 4, "/generate": 6} total workers
- test_workers_per_device_per_route_raises_on_unknown_route validates that every key must match a known api_path, otherwise a ValueError is raised
On my device (4 x T4) the tests result in:
Repro (4× T4):
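For reference, a hedged sketch of how the two tests described above might look. The resolver below is a stand-in for LitServe's internal helper, not its actual implementation, and the test bodies are illustrative:

```python
def resolve_per_route(workers_per_device: dict, api_paths: list) -> dict:
    # Assumed behavior: every key must name a known api_path.
    unknown = set(workers_per_device) - set(api_paths)
    if unknown:
        raise ValueError(f"unknown api_path(s): {unknown}")
    return dict(workers_per_device)

def test_workers_per_device_can_be_configured_per_route():
    per_device = resolve_per_route(
        {"/sentiment": 2, "/generate": 3}, ["/sentiment", "/generate"]
    )
    devices = [0, 1]
    # Total workers per route = per-device count times number of devices.
    totals = {path: n * len(devices) for path, n in per_device.items()}
    assert totals == {"/sentiment": 4, "/generate": 6}

def test_workers_per_device_per_route_raises_on_unknown_route():
    try:
        resolve_per_route({"/unknown": 2}, ["/sentiment"])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for unknown api_path")
```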
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Definitely!