Commit 85c7acc

Merge branch 'main' into unify-dist
2 parents 898db5c + ff7eb93 commit 85c7acc

49 files changed: 857 additions and 232 deletions


docs/source/blogs/tech_blog/blog15_Optimizing_DeepSeek_V32_on_NVIDIA_Blackwell_GPUs.md

Lines changed: 423 additions & 0 deletions
Large diffs are not rendered by default.

examples/layer_wise_benchmarks/run.py

Lines changed: 4 additions & 1 deletion
@@ -10,10 +10,11 @@
 import yaml
 
 from tensorrt_llm._torch.autotuner import AutoTuner, autotune
+from tensorrt_llm._torch.distributed import MPIDist, TorchDist
 from tensorrt_llm._torch.modules.fused_moe.fused_moe_cutlass import CutlassFusedMoE
 from tensorrt_llm._torch.modules.fused_moe.interface import AlltoallMethodType
 from tensorrt_llm._torch.modules.multi_stream_utils import with_multi_stream
-from tensorrt_llm._utils import local_mpi_rank, mpi_rank, mpi_world_size
+from tensorrt_llm._utils import local_mpi_rank, mpi_disabled, mpi_rank, mpi_world_size
 from tensorrt_llm.logger import logger
 from tensorrt_llm.tools.layer_wise_benchmarks import BalanceMethod, get_runner_cls, mark_ranges
 
@@ -173,6 +174,8 @@ def comma_separated_floats(s):
 )
 if args.enable_autotuner:
     cache_path = os.getenv("TLLM_AUTOTUNER_CACHE_PATH") or None
+    dist = TorchDist(mapping=mapping) if mpi_disabled() else MPIDist(mapping=mapping)
+    AutoTuner.get().setup_distributed_state(mapping, dist)
     with autotune(cache_path=cache_path):
         run_pack()
 else:
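
Note on the hunk above: the two added lines pick a distributed backend from a single predicate and hand it to the autotuner before autotuning starts. The following is a minimal sketch of that selection pattern only, runnable without TensorRT-LLM. Every name in it (StubMapping, StubTorchDist, StubMPIDist, stub_mpi_disabled, select_dist, and the STUB_DISABLE_MPI variable) is a hypothetical stand-in, and the way the real mpi_disabled() makes its decision is an assumption, not something this commit shows.

```python
import os
from dataclasses import dataclass


@dataclass
class StubMapping:
    """Hypothetical stand-in for the benchmark's `mapping` object."""
    world_size: int = 1
    rank: int = 0


class StubTorchDist:
    """Hypothetical stand-in for TorchDist; it only records its mapping."""

    def __init__(self, mapping):
        self.mapping = mapping


class StubMPIDist:
    """Hypothetical stand-in for MPIDist; it only records its mapping."""

    def __init__(self, mapping):
        self.mapping = mapping


def stub_mpi_disabled(env=None):
    """Assumption: an environment flag switches MPI off.

    The real mpi_disabled() may use a different mechanism entirely.
    """
    env = os.environ if env is None else env
    return env.get("STUB_DISABLE_MPI", "0") == "1"


def select_dist(mapping, env=None):
    """Mirror the conditional expression added in the diff."""
    if stub_mpi_disabled(env):
        return StubTorchDist(mapping=mapping)
    return StubMPIDist(mapping=mapping)


if __name__ == "__main__":
    mapping = StubMapping(world_size=2, rank=0)
    print(type(select_dist(mapping, {"STUB_DISABLE_MPI": "1"})).__name__)
    print(type(select_dist(mapping, {})).__name__)
```

In the diff itself the chosen object is passed to AutoTuner.get().setup_distributed_state(mapping, dist) before the autotune context is entered; the sketch stops at the selection step because the autotuner's behaviour is not visible in this commit.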

requirements-dev.txt

Lines changed: 1 addition & 0 deletions
@@ -37,3 +37,4 @@ opentelemetry-exporter-otlp>=1.26.0
 opentelemetry-semantic-conventions-ai>=0.4.1
 fuzzywuzzy==0.18.0
 aiperf==0.3.0
+nanobind>=2.9.0

security_scanning/examples/auto_deploy/poetry.lock

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

security_scanning/examples/models/contrib/stdit/poetry.lock

Lines changed: 5 additions & 5 deletions
Some generated files are not rendered by default.

security_scanning/examples/models/core/qwen/poetry.lock

Lines changed: 20 additions & 20 deletions
Some generated files are not rendered by default.
