Multi model deployment #208
Draft
TosinSeg wants to merge 74 commits into deepspeedai:main from TosinSeg:multi-model-deployment.
Changes from all 74 commits:
4eac006  Removing load balancing config (TosinSeg)
c68e999  Reformatting tests (TosinSeg)
5ce1a92  Fixed the formatting (TosinSeg)
fa10e19  Removed print statement (TosinSeg)
f9cbd74  Merging main (TosinSeg)
8970f4e  Removing unused import (TosinSeg)
517bea8  Fixing tests (TosinSeg)
58dd2b2  Fixing merge issue (TosinSeg)
bb0d551  Creating hostfile when one is not provided (TosinSeg)
e2bb9d5  Merge branch 'main' into Always_enable_load_balancing (TosinSeg)
3823534  Fixing import statements removed by merge (TosinSeg)
6f9b4ad  Removing load_balancing check (TosinSeg)
499b9ad  Removing redundant definitions (TosinSeg)
5419ef6  Removing hostfile from test (TosinSeg)
a70b6de  Removing hostfile from non-persistent test (TosinSeg)
eea658b  Initial changes (TosinSeg)
20f0878  Merge branch 'main' into multi-model-deployment (TosinSeg)
c21c31b  Maintaining current behavior (TosinSeg)
f525329  Reading from score file (TosinSeg)
3c0937f  Fixing syntax errors (TosinSeg)
156ac83  Fixing more syntax errors (TosinSeg)
38e270e  Fixing more syntax issues (TosinSeg)
4d4e0d8  Initial load balancing changes (TosinSeg)
01c8e59  Merge branch 'main' into multi-model-deployment (TosinSeg)
f801b36  More load balancing changes (TosinSeg)
fd4e2ed  Load balancing changes and syntax (TosinSeg)
0a3b7e5  Refactor client, and unpack request in load balancer (TosinSeg)
6523c04  First working queries (TosinSeg)
06b40f5  Fixing conversational and q&a args (TosinSeg)
96d0dcb  Updates to _allocate_processes and fixing example (TosinSeg)
ab41d24  Adding host map for allocating processes and formatting (TosinSeg)
8673a9a  Fixing terminate functionality (TosinSeg)
8d09b37  Refactored client (TosinSeg)
7a136d6  More refactoring and q/a example (TosinSeg)
2c6ec08  Reformatting to maintain previous syntax (TosinSeg)
0cb88a9  Removing print/debug statements (TosinSeg)
7c0ee12  Fixing non-persistent deployments (TosinSeg)
7a956d5  Refactoring load balancer launch (TosinSeg)
f8cfe28  Fixing restful gateway client (TosinSeg)
079807d  Fixing replica issue (TosinSeg)
ea1e47e  Fixing non-persistent client (TosinSeg)
98b6129  Adding trust_remote_code support (#203) (msinha251)
daab5e6  Refactoring (TosinSeg)
84073f9  Update mii/models/score/generate.py (TosinSeg)
3ee3410  Merge branch 'multi-model-deployment' of github.com:TosinSeg/DeepSpee…
b4edc2b  Refactoring Load Balancer and request_proto
6346194  Formatting
94b6699  Fixing the client
710c20b  Initial partial deployment commit
c2636b7  More partial deploy updates
189e75c  Partial deploy started
adee843  Fixing add-deploy API queries
a145be5  Support for empty deployment 'group'
082c05e  Support for empty deployment 'group'
3ce77d2  Partial termination
b40ecbd  Refactoring
72dd95c  Formatting
a4e3d56  Fixing bug for partial termination
4b5bb47  Removing comments
30d2b03  Including GPU index map in score file
c5d5996  Refactoring deployment
3ae1781  Refactoring and formatting
4b8f02f  Refactoring
c51ce37  Fixing README
43479db  Refactoring gRPC
e1b6d23  Fixing LB process not terminating
1675bd8  Adding multi_deployment and partial deploy/terminate unit tests
8684a61  Removing comments
56a7fce  Fixing spelling issues
fb70c3d  Update mii/client.py (TosinSeg)
e2cfe8a  Update mii/client.py (TosinSeg)
1312738  Removing AML from addDeploy
b0f0da4  Refactoring MIIConfig and DeploymentConfig
b78068e  Partial deploy/termination example
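The first new file is an example of the partial-deployment flow added in this PR: it attaches to an already-running "multi_models" deployment group, registers an extra bloom-560m text-generation deployment with add_models, queries it, and then removes just that deployment with delete_model.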
New file (+32 lines):

```python
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import mii

deployments = []
results = []
name = 'bigscience/bloom-560m'
mii_configs1 = {"tensor_parallel": 1, "dtype": "fp16"}
deployments.append(
    mii.DeploymentConfig(task='text-generation',
                         model=name,
                         deployment_name=name + "_deployment5",
                         mii_configs=mii.config.MIIConfig(**mii_configs1)
                         ))

generator = mii.mii_query_handle("multi_models")
generator.add_models(deployments=deployments)

result = generator.query(
    {
        "query": ["DeepSpeed is",
                  "Seattle is"],
        "deployment_name": "bigscience/bloom-560m_deployment5"
    },
    do_sample=True,
    max_new_tokens=30,
)
print(result)
generator.delete_model("bigscience/bloom-560m_deployment5")
```
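The second new file is the main multi-model deployment example: four DeploymentConfig entries (text generation, text classification, conversational, and question answering) are each pinned to specific GPUs on the 'master' host through a GPU_index_map and deployed together under the deployment tag "multi_models".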
New file (+49 lines):

```python
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
import mii

gpu_index_map1 = {'master': [0]}
gpu_index_map2 = {'master': [1]}
gpu_index_map3 = {'master': [0, 1]}

deployments = []

mii_configs1 = {"tensor_parallel": 2, "dtype": "fp16"}
mii_configs2 = {"tensor_parallel": 1}

name = "bigscience/bloom-560m"
deployments.append(
    mii.DeploymentConfig(task='text-generation',
                         model=name,
                         deployment_name=name + "_deployment",
                         GPU_index_map=gpu_index_map3,
                         tensor_parallel=2,
                         dtype="fp16"))

# gpt2
name = "microsoft/DialogRPT-human-vs-rand"
deployments.append(
    mii.DeploymentConfig(task='text-classification',
                         model=name,
                         deployment_name=name + "_deployment",
                         GPU_index_map=gpu_index_map2))

name = "microsoft/DialoGPT-large"
deployments.append(
    mii.DeploymentConfig(
        task='conversational',
        model=name,
        deployment_name=name + "_deployment",
        GPU_index_map=gpu_index_map1,
    ))

name = "deepset/roberta-large-squad2"
deployments.append(
    mii.DeploymentConfig(task="question-answering",
                         model=name,
                         deployment_name=name + "-qa-deployment",
                         GPU_index_map=gpu_index_map2))

mii.deploy(deployment_tag="multi_models", deployments=deployments)
```
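The third new file queries all four deployments through a single mii_query_handle("multi_models") handle; each request selects its target model with the "deployment_name" field, and the remaining keys follow the usual query schema for that task.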
New file (+50 lines):

```python
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import mii

results = []
generator = mii.mii_query_handle("multi_models")
result = generator.query(
    {
        "query": ["DeepSpeed is",
                  "Seattle is"],
        "deployment_name": "bigscience/bloom-560m_deployment"
    },
    do_sample=True,
    max_new_tokens=30,
)
results.append(result)
print(result)

result = generator.query({
    'query':
    "DeepSpeed is the greatest",
    "deployment_name":
    "microsoft/DialogRPT-human-vs-rand_deployment"
})
results.append(result)
print(result)

result = generator.query({
    'text': "DeepSpeed is the greatest",
    'conversation_id': 3,
    'past_user_inputs': [],
    'generated_responses': [],
    "deployment_name": "microsoft/DialoGPT-large_deployment"
})
results.append(result)
print(result)

result = generator.query({
    'question':
    "What is the greatest?",
    'context':
    "DeepSpeed is the greatest",
    "deployment_name":
    "deepset/roberta-large-squad2" + "-qa-deployment"
})
results.append(result)
print(result)
```
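The last new file shuts down the whole deployment group by its tag.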
New file (+7 lines):

```python
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
import mii

mii.terminate("multi_models")
```
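Taken together, the examples illustrate the difference between partial and full termination. The following is a minimal sketch, not part of the PR's diff, that combines only calls already shown above (mii_query_handle, delete_model, and terminate); the deployment name and tag are the ones used in the deploy example.

```python
import mii

# Attach to the running multi-model deployment group by its tag.
generator = mii.mii_query_handle("multi_models")

# Partial termination: remove only the question-answering deployment,
# leaving the other deployments under "multi_models" serving.
generator.delete_model("deepset/roberta-large-squad2-qa-deployment")

# Full termination: shut down everything remaining under the tag.
mii.terminate("multi_models")
```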