@@ -21,7 +21,10 @@ Please refer to [this guide](https://nvidia.github.io/TensorRT-LLM/installation/
   - [Quick Start](#quick-start)
   - [Run a single inference](#run-a-single-inference)
   - [Multi-Token Prediction (MTP)](#multi-token-prediction-mtp)
+  - [Relaxed acceptance](#relaxed-acceptance)
   - [Long context support](#long-context-support)
+  - [ISL-64k-OSL-1024](#isl-64k-osl-1024)
+  - [ISL-128k-OSL-1024](#isl-128k-osl-1024)
   - [Evaluation](#evaluation)
   - [Serving](#serving)
   - [Use trtllm-serve](#use-trtllm-serve)
@@ -36,6 +39,7 @@ Please refer to [this guide](https://nvidia.github.io/TensorRT-LLM/installation/
   - [FP8 KV Cache and MLA](#fp8-kv-cache-and-mla)
   - [W4AFP8](#w4afp8)
   - [Notes and Troubleshooting](#notes-and-troubleshooting)
+  - [Known Issues](#known-issues)
 
 
 ## Hardware Requirements
@@ -136,7 +140,6 @@ python /app/tensorrt_llm/benchmarks/cpp/prepare_dataset.py \
 
 cat << EOF > /tmp/extra-llm-api-config.yml
 pytorch_backend_config:
-  enable_overlap_scheduler: true
   use_cuda_graph: true
   cuda_graph_padding_enabled: true
   cuda_graph_batch_sizes: [1, 4, 8, 12]
@@ -165,7 +168,6 @@ python /app/tensorrt_llm/benchmarks/cpp/prepare_dataset.py \
 
 cat << EOF > /tmp/extra-llm-api-config.yml
 pytorch_backend_config:
-  enable_overlap_scheduler: true
   use_cuda_graph: true
   cuda_graph_padding_enabled: true
   cuda_graph_batch_sizes: [1, 2]
@@ -192,7 +194,6 @@ Evaluate the model accuracy using `trtllm-eval`.
 cat > ./extra-llm-api-config.yml << EOF
 pytorch_backend_config:
   use_cuda_graph: true
-  enable_overlap_scheduler: true
 enable_attention_dp: true
 EOF
 ```
@@ -249,7 +250,6 @@ pytorch_backend_config:
   - 256
   - 384
   print_iter_log: true
-  enable_overlap_scheduler: true
 enable_attention_dp: true
 EOF
 
@@ -441,7 +441,6 @@ pytorch_backend_config:
   - 256
   - 384
   print_iter_log: true
-  enable_overlap_scheduler: true
 enable_attention_dp: true
 EOF
 ```