Commit 86fef36

fix: update project name, version, system data support matrix (#3)
1 parent 3853d3b commit 86fef36

5 files changed: +18 −9 lines changed

README.md

Lines changed: 12 additions & 3 deletions

````diff
@@ -3,14 +3,14 @@ SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All
 SPDX-License-Identifier: Apache-2.0
 -->
 
-# aiconfigurator
+# AIConfigurator
 
 Today, in disaggregated serving, it's quite difficult to find a proper config
 to get benefits from disaggregation such as how many prefill workers and decode workers
 do I need and what about the parallelism for each worker. Combined with SLA:
 TTFT(Time-To-First-Token) and TPOT(Time-Per-Output-Token), it becomes even more complicated
 to solve the throughput @ latency problem.
 
-We're introducing aiconfigurator to help you find a good reference to start with in your
+We're introducing AIConfigurator to help you find a good reference to start with in your
 disaggregated serving journey. The tool will try to search the space to get a good deployment config
 based on your requirement including which model you want to serve, how many GPUs you have and what's
 the GPU. Automatically generate the config files for you to deploy with Dynamo.
@@ -52,7 +52,7 @@ With **-h**, you can have more information about optional args to customize your
 
 ```
 ********************************************************************************
-*                     Dynamo aiconfigurator Final Results                      *
+*                     Dynamo AIConfigurator Final Results                      *
 ********************************************************************************
 ----------------------------------------------------------------------------
 Input Configuration & SLA Target:
@@ -205,6 +205,15 @@ TRTLLM Versions: 0.20.0, 1.0.0rc3
 Parallel modes: Tensor-parallel; Pipeline-parallel; Expert Tensor-parallel/Expert-parallell; Attention DP for DEEPSEEK and MoE
 Scheduling: Static; IFB(continuous batching); Disaggregated serving; MTP for DEEPSEEK
 
+### System Data Support Matrix
+
+| System | Framework(Version) | Status |
+|--------|-------------------|--------|
+| h100_sxm | TRTLLM(0.20.0, 1.0.0rc3) ||
+| h200_sxm | TRTLLM(0.20.0, 1.0.0rc3) ||
+| b200_sxm | TRTLLM(NA) | 🚧 |
+
+
 ## Data Collection
 Data collection is a standalone process for collecting the database for aiconfigurator. By default, you don't have to collect the data by yourself.
 Small versions of database will not introduce huge perf difference. Say, you can use 1.0.0rc3 data of trtllm on h200_sxm and deploy the generated
````

pyproject.toml

Lines changed: 1 addition & 1 deletion

```diff
@@ -3,7 +3,7 @@
 
 [project]
 name = "aiconfigurator"
-version = "0.1.0"
+version = "0.1.1"
 authors = [
     { name = "NVIDIA Inc.", email = "sw-dl-dynamo@nvidia.com" },
 ]
```

src/aiconfigurator/cli/main.py

Lines changed: 3 additions & 3 deletions

```diff
@@ -89,7 +89,7 @@ class AIConfiguratorConfig:
 
 @dataclass
 class AIConfiguratorResult:
-    """Result of Dynamo aiconfigurator"""
+    """Result of Dynamo AIConfigurator"""
     chosen_system_type: str  # "agg" or "disagg" or "none"
     has_disagg_benefit: bool
     benefit_ratio: float  # disagg_throughput / agg_throughput
@@ -997,7 +997,7 @@ def main(args):
     logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO,
                         format='%(levelname)s %(asctime)s %(filename)s:%(lineno)d] %(message)s')
 
-    logger.info(f"Loading Dynamo aiconfigurator version: {__version__}")
+    logger.info(f"Loading Dynamo AIConfigurator version: {__version__}")
 
     # Create aiconfigurator and its config
     aiconfigurator = AIConfigurator()
@@ -1037,7 +1037,7 @@
     logger.info(f"Configuration completed in {end_time - start_time:.2f} seconds")
 
 if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Dynamo aiconfigurator for Disaggregated Serving Deployment")
+    parser = argparse.ArgumentParser(description="Dynamo AIConfigurator for Disaggregated Serving Deployment")
     configure_parser(parser)
     args = parser.parse_args()
     main(args)
```

src/aiconfigurator/main.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -8,7 +8,7 @@
 
 def main():
     parser = argparse.ArgumentParser(
-        description='Dynamo aiconfigurator for disaggregated serving deployment.'
+        description='Dynamo AIConfigurator for disaggregated serving deployment.'
     )
     subparsers = parser.add_subparsers(dest='command', help='Command to run', required=True)
 
```

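The `main.py` hunk above uses argparse's subcommand dispatch (`add_subparsers` with `dest='command'` and `required=True`). A minimal runnable sketch of the same pattern; the `cli` and `sdk` subcommand names are illustrative placeholders, not taken from this diff:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Same shape as src/aiconfigurator/main.py: a top-level parser that
    # requires exactly one subcommand and records its name in args.command.
    parser = argparse.ArgumentParser(
        description='Dynamo AIConfigurator for disaggregated serving deployment.'
    )
    subparsers = parser.add_subparsers(dest='command', help='Command to run', required=True)
    # Hypothetical subcommands, for illustration only.
    subparsers.add_parser('cli', help='run the config search from the command line')
    subparsers.add_parser('sdk', help='use the programmatic interface')
    return parser

if __name__ == '__main__':
    args = build_parser().parse_args(['cli'])
    print(args.command)  # cli
```

Because `required=True`, invoking the program with no subcommand exits with a usage error rather than a `None` command.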
tools/simple_sdk_demo/sla_service/sla_service.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -27,7 +27,7 @@ class PrettyJSONResponse(Response):
     def render(self, content: Any) -> bytes:
         return orjson.dumps(content, option=orjson.OPT_INDENT_2)
 
-app = FastAPI(title="Dynamo aiconfigurator SLA API", description="Dynamo aiconfigurator SLA API", default_response_class=PrettyJSONResponse)
+app = FastAPI(title="Dynamo AIConfigurator SLA API", description="Dynamo AIConfigurator SLA API", default_response_class=PrettyJSONResponse)
 
 @app.get("/sla/supported_models")
 def get_supported_models():
```

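`PrettyJSONResponse` in the hunk above renders responses with `orjson.dumps(content, option=orjson.OPT_INDENT_2)`, i.e. 2-space-indented JSON as UTF-8 bytes. Where orjson is unavailable, the stdlib `json` module yields equivalent output; a stdlib-only approximation (a sketch, not the repo's code):

```python
import json

def render_pretty(content) -> bytes:
    # Approximates orjson.dumps(content, option=orjson.OPT_INDENT_2):
    # 2-space indentation, returned as UTF-8 encoded bytes.
    return json.dumps(content, indent=2).encode('utf-8')

print(render_pretty({'status': 'ok'}).decode())
```

Setting `default_response_class=PrettyJSONResponse` on the `FastAPI` app makes every endpoint return this indented form by default, which is convenient for a demo API browsed by humans.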