6 files changed: +10 -3
Changed directories: tensorrt_llm/_torch/auto_deploy/custom_ops (1 file), tests/integration/test_lists/test-db (5 files)

tensorrt_llm/_torch/auto_deploy/custom_ops:
@@ -476,9 +476,6 @@ def update_input_ids_with_new_tokens(
         idx = self.previous_batch_indices_cuda[: len(previous_batch_indices)]
         idx.copy_(host_idx, non_blocking=True)

-        # sort them so that masked_scatter_ lines up correctly
-        idx, _ = idx.sort()
-
         # gather the exact values you want to write
         src = new_tokens[0, idx, 0]

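Note on the hunk above: masked_scatter_ writes its source values into the True positions of the mask in ascending position order, which is why the old code sorted idx to keep the gathered src aligned. A direct indexed write pairs src[i] with idx[i] by construction, so the sort becomes redundant. A minimal sketch of the difference (toy tensors and names, not the PR's actual code):

    import torch

    dst = torch.zeros(6, dtype=torch.long)
    idx = torch.tensor([4, 1, 3])     # unsorted indices, as they may arrive here
    src = torch.tensor([40, 10, 30])  # values gathered with that same idx

    # masked_scatter_ fills True positions in ascending order, so with an
    # unsorted idx the (index, value) pairs misalign:
    mask = torch.zeros(6, dtype=torch.bool)
    mask[idx] = True
    a = dst.clone()
    a.masked_scatter_(mask, src)      # positions 1, 3, 4 receive 40, 10, 30

    # A direct indexed write keeps src[i] paired with idx[i], no sort required:
    b = dst.clone()
    b[idx] = src                      # positions 4, 1, 3 receive 40, 10, 30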

tests/integration/test_lists/test-db (l0_b200):
@@ -69,6 +69,8 @@ l0_b200:
   - unittest/_torch/modeling -k "modeling_deepseek"
   - unittest/_torch/modeling -k "modeling_gpt_oss"
   - unittest/_torch/auto_deploy/unit/singlegpu -k "not test_trtllm_bench_backend_comparison"
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:

tests/integration/test_lists/test-db (l0_dgx_b200):
@@ -89,3 +89,5 @@ l0_dgx_b200:
   - disaggregated/test_disaggregated.py::test_disaggregated_benchmark_on_diff_backends[DeepSeek-V3-Lite-fp8]
   - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_nixl_backend
   - accuracy/test_disaggregated_serving.py::TestDeepSeekV3Lite::test_nixl_backend
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype

tests/integration/test_lists/test-db (l0_dgx_h100):
@@ -61,6 +61,8 @@ l0_dgx_h100:
   - test_e2e.py::test_ptp_quickstart_advanced_bs1
   - test_e2e.py::test_ptp_quickstart_advanced_deepseek_v3_lite_4gpus_adp_balance[DeepSeek-V3-Lite-FP8-DeepSeek-V3-Lite/fp8]
   - unittest/_torch/modeling/test_modeling_pixtral.py::test_tensor_parallelism
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:

tests/integration/test_lists/test-db (l0_dgx_h200):
@@ -34,6 +34,8 @@ l0_dgx_h200:
   - unittest/_torch/multi_gpu_modeling/test_llama4.py::test_llama4[pp1-ep1-disable_adp-enable_graph-tp8-trtllm-scout]
   - unittest/_torch/multi_gpu_modeling/test_llama4.py::test_llama4[pp1-ep4-enable_adp-enable_graph-tp8-trtllm-scout]
   - unittest/llmapi/test_llm_pytorch.py::test_nemotron_nas_lora
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:

tests/integration/test_lists/test-db (l0_h100):
@@ -102,6 +102,8 @@ l0_h100:
   - test_e2e.py::test_trtllm_bench_request_rate_and_concurrency[enable_concurrency-enable_request_rate] # negative test
   - test_e2e.py::test_trtllm_bench_help_sanity[meta-llama/Llama-3.1-8B]
   - test_e2e.py::test_ptp_quickstart_multimodal[gemma-3-27b-it-gemma/gemma-3-27b-it-image-True]
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count: