Description
System Info
GPU: 5090d
Library version: tensorrt-llm 1.2.0rc6
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Running script:

trtllm-bench --model mbox_qwen_3b_2_raw \
  --model_path mbox_qwen_3b_2_raw \
  throughput \
  --dataset dataset.jsonl \
  --backend pytorch \
  --num_requests 500 \
  --max_batch_size 1 \
  --extra_llm_api_options eagle_draft_tree_chain4.yml
eagle_draft_tree_chain4.yml:

enable_attention_dp: false
enable_chunked_prefill: false
speculative_config:
  max_draft_len: 4
  decoding_type: Eagle
  speculative_model_dir: ./draft_path
  eagle_choices: [[0], [0, 0], [0, 0, 0], [0, 0, 0, 0]]
  eagle3_one_model: false
  use_dynamic_tree: false
disable_overlap_scheduler: true
max_seq_len: 2048
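As a side note, the eagle_choices above form a degenerate tree: each path extends the previous one by child index 0, so the draft tree is a single chain of depth 4, matching max_draft_len. A minimal sketch checking this (plain Python; the prefix-chain interpretation of eagle_choices paths is an assumption for illustration):

```python
# Each entry in eagle_choices is a path of child indices from the root.
# For this config every path is a strict prefix of the next, i.e. the
# draft "tree" is a linear chain of depth 4.
eagle_choices = [[0], [0, 0], [0, 0, 0], [0, 0, 0, 0]]
max_draft_len = 4

# True when each path extends the previous one by exactly one node.
is_chain = all(eagle_choices[i] == eagle_choices[i + 1][:-1]
               for i in range(len(eagle_choices) - 1))
depth = max(len(path) for path in eagle_choices)

print(is_chain, depth == max_draft_len)  # True True
```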
Expected behavior
Program completes without exception.
Actual behavior
Additional notes
I suspect the step parameter passed to finish_if_reason in sampler.py (TorchSampler::_process_draft_tokens_tree) is wrong.
After adding logging, I found that the shape of finish_reasons is (1, 5, 1). If we accept all 4 draft tokens in our case, num_accepted_draft_tokens ends up as 5 (1 target token plus 4 draft tokens). However, 5 is not a valid index into the second dimension of finish_reasons.
Should the step argument passed to finish_if_reason be num_accepted_draft_tokens - 1 instead?
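A minimal sketch of the suspected off-by-one, using only the shape observed from the logging above (finish_reasons of shape (batch, max_draft_len + 1, beam); the indexing pattern is an assumption for illustration, not the actual sampler code):

```python
import numpy as np

max_draft_len = 4
# Shape observed via logging: (1, max_draft_len + 1, 1) == (1, 5, 1)
finish_reasons = np.zeros((1, max_draft_len + 1, 1), dtype=np.int32)

# Accepting all 4 draft tokens: the count includes the target token.
num_accepted_draft_tokens = 1 + max_draft_len  # == 5

# Passing the raw count as `step` indexes past the second dimension.
try:
    _ = finish_reasons[0, num_accepted_draft_tokens, 0]
    print("no error")
except IndexError:
    print("IndexError at step", num_accepted_draft_tokens)

# The proposed fix indexes the last accepted position instead.
_ = finish_reasons[0, num_accepted_draft_tokens - 1, 0]  # index 4: valid
print("step", num_accepted_draft_tokens - 1, "is valid")
```

This reproduces the index-out-of-bounds on step 5 while step 4 (num_accepted_draft_tokens - 1) stays within bounds.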
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
