Description
System Info
- CPU: x86_64
- GPU: NVIDIA H100
- OS: Ubuntu 22.04
- docker: nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6
- tensorrt_llm Version: 1.2.0rc6
- tensorrt Version: 10.13.3.9
- torch Version: 2.9.0a0+145a3a7bda.nv25.10
- nvidia-smi: NVIDIA-SMI 580.65.06 Driver Version: 580.65.06 CUDA Version: 13.0
- nvcc --version
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- start the Docker container with:
sudo docker run -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all --name=trtllm nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 /bin/bash
- download the models from Hugging Face:
huggingface-cli download Qwen/Qwen3-4B --local-dir Qwen/Qwen3-4B
huggingface-cli download zhuyksir/EAGLE3-Qwen3-4B-DenseHead --local-dir zhuyksir/EAGLE3-Qwen3-4B-DenseHead
- launch the Qwen baseline and get the result
launch with: trtllm-serve Qwen/Qwen3-4B/
test with:
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": " Qwen3-4B",
"messages": [{"role": "user", "content": "Tell me a long story"}],
"max_tokens": 500,
"temperature": 0
}'
- launch Qwen + Eagle3 and get the result
launch with: trtllm-serve Qwen/Qwen3-4B/ --extra_llm_api_options zhuyksir/EAGLE3-Qwen3-4B-DenseHead/extra-llm-api-config.yml
the content of extra-llm-api-config.yml:
speculative_config:
  decoding_type: Eagle
  max_draft_len: 4
  speculative_model_dir: zhuyksir/EAGLE3-Qwen3-4B-DenseHead/
  eagle3_one_model: true
test with:
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": " Qwen3-4B",
"messages": [{"role": "user", "content": "Tell me a long story"}],
"max_tokens": 500,
"temperature": 0
}'
- compare the generated results
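The comparison step can be scripted rather than done by eye. The following is a minimal sketch, not part of the original reproduction: `first_divergence` is a hypothetical helper, and the two sample strings are short excerpts taken from the outputs quoted in this report.

```python
from itertools import zip_longest
from typing import Optional, Tuple


def first_divergence(a: str, b: str, context: int = 40) -> Optional[Tuple[int, str, str]]:
    """Return None if the strings are identical, otherwise the index of the
    first differing character plus a window of text around it from each string."""
    for i, (ca, cb) in enumerate(zip_longest(a, b)):
        if ca != cb:  # also triggers when one string is a strict prefix of the other
            start = max(0, i - context)
            return i, a[start:i + context], b[start:i + context]
    return None


# Short excerpts from the two "content" fields in this report (illustration only).
baseline = "include some emotional moments. Maybe Lira learns something about herself"
eagle3 = "include some emotional moments. Maybe Lira learns to appreciate time more"

result = first_divergence(baseline, eagle3, context=18)
if result is None:
    print("outputs are identical")
else:
    index, left, right = result
    print(f"first difference at character {index}")
    print(f"  baseline: ...{left!r}")
    print(f"  eagle3:   ...{right!r}")
```

In practice the two strings would be the full `content` fields of the two responses. A character-level comparison is enough here because both runs use the same tokenizer and the same prompt.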
Expected behavior
Eagle3 should NOT change the output when we do greedy decoding.
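The reasoning behind this expectation can be illustrated with a toy model. The sketch below is NOT TensorRT-LLM's implementation: `target` and `draft` are made-up deterministic functions standing in for the base model and the draft head, and verification runs token by token instead of in one batched forward pass. It only shows that, when a draft token is accepted solely if it equals the target's greedy choice, the speculative output is identical to plain greedy decoding.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Plain greedy decoding: one target call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_greedy_decode(target: NextToken, draft: NextToken,
                              prompt: List[Token], n: int,
                              max_draft_len: int = 4) -> List[Token]:
    """Toy draft-then-verify loop with greedy acceptance."""
    out = list(prompt)
    produced = 0
    while produced < n:
        # 1. The draft proposes max_draft_len tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(max_draft_len):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each proposed token against its own greedy choice.
        accepted, ctx = [], list(out)
        for tok in proposal:
            expected = target(ctx)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all drafts accepted: one bonus token
        accepted = accepted[: n - produced]
        out.extend(accepted)
        produced += len(accepted)
    return out[len(prompt):]


# Hypothetical stand-in "models": deterministic functions of the context.
def target(ctx: List[Token]) -> Token:
    return (sum(ctx) * 31 + len(ctx)) % 17


def draft(ctx: List[Token]) -> Token:
    # Agrees with the target most of the time, wrong on every third position.
    guess = target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 17


prompt = [3, 1, 4]
reference = greedy_decode(target, prompt, 50)
speculative = speculative_greedy_decode(target, draft, prompt, 50, max_draft_len=4)
print("identical:", reference == speculative)
```

In a real system the verification pass scores several draft tokens at once, so the target logits are computed by different kernels and batch shapes than in the baseline. Small floating-point differences there could in principle flip an argmax between near-tied tokens; whether that explains the divergence reported here has not been established.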
Actual behavior
Eagle3 produces DIFFERENT output under greedy decoding.
For example, the two outputs are identical up to "Maybe Lira learns" and diverge from there:
- Qwen baseline (without Eagle3) output:
{"id":"chatcmpl-xxxxx","object":"chat.completion","created":1766737434,"model":"Qwen3-4B","choices":[{"index":0,"message":{"role":"assistant","content":"\nOkay, the user wants a long story. Let me think about what kind of story would be engaging. Maybe something with a unique setting and characters. I should start with a strong opening to hook the reader. Maybe a fantasy or sci-fi element? Or perhaps a more realistic story with a twist.\n\nHmm, a fantasy story could be interesting. Let's set it in a world with some magical elements. Maybe a kingdom with a unique feature. Oh, what about a place where time is a currency? That's an original concept. Time as a resource... that could lead to interesting conflicts.\n\nSo, the main character could be someone who discovers this secret. Maybe a young person, like a girl named Lira. She's curious and adventurous. She finds a hidden village where time is traded. The villagers use time to grow food, create art, and even age. But there's a catch—time is finite, and the village is in danger of running out.\n\nConflict: The village is facing a crisis because their time supply is depleting. They need to find a way to replenish it. Lira has to go on a quest to find the source of time. Maybe there's a magical entity or a hidden place where time is stored. Along the way, she meets allies and faces challenges.\n\nThemes could include the value of time, the consequences of greed, and the importance of community. Maybe the villagers are hoarding time, leading to their downfall. Lira has to teach them to value time differently, perhaps by sharing it or finding a sustainable source.\n\nI need to build the world around this. The village's society, how they use time, the magic system. Maybe the time is drawn from a sacred place, like a crystal or a tree. The climax could involve a ritual or a battle to restore the balance. 
The resolution would be about harmony and understanding.\n\nI should make sure the story has a clear beginning, middle, and end. Develop the characters, show their growth, and include some emotional moments. Maybe Lira learns something about herself and the importance of time in her life. Also, include some descriptive details to make the world vivid.\n\nCheck for plot holes. How does time work in this world? What are the rules? Make sure the magic system is consistent. Also, the conflict needs to be resolved in a satisfying way. Maybe the villagers realize they need to use time wisely, not hoard it. The ending could be hopeful","reasoning_content":"","reasoning":null,"tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null,"mm_embedding_handle":null,"disaggregated_params":null,"avg_decoded_tokens_per_iter":1.0}],"usage":{"prompt_tokens":13,"total_tokens":513,"completion_tokens":500,"prompt_tokens_details":{"cached_tokens":0}},"prompt_token_ids":null}
- Qwen + Eagle3 output:
{"id":"chatcmpl-xxxxx","object":"chat.completion","created":1766737371,"model":"Qwen3-4B","choices":[{"index":0,"message":{"role":"assistant","content":"\nOkay, the user wants a long story. Let me think about what kind of story would be engaging. Maybe something with a unique setting and characters. I should start with a strong opening to hook the reader. Maybe a fantasy or sci-fi element? Or perhaps a more realistic story with a twist.\n\nHmm, a fantasy story could be interesting. Let's set it in a world with some magical elements. Maybe a kingdom with a unique feature. Oh, what about a place where time is a currency? That's an original concept. Time as a resource... that could lead to interesting conflicts.\n\nSo, the main character could be someone who discovers this secret. Maybe a young person, like a girl named Lira. She's curious and adventurous. She finds a hidden village where time is traded. The villagers use time to grow food, create art, and even age. But there's a catch—time is finite, and the village is in danger of running out.\n\nConflict: The village is facing a crisis because their time supply is depleting. They need to find a way to replenish it. Lira has to go on a quest to find the source of time. Maybe there's a magical entity or a hidden place where time is stored. Along the way, she meets allies and faces challenges.\n\nThemes could include the value of time, the consequences of greed, and the importance of community. Maybe the villagers are hoarding time, leading to their downfall. Lira has to teach them to value time differently, perhaps by sharing it or finding a sustainable source.\n\nI need to build the world around this. The village's society, how they use time, the magic system. Maybe the time is drawn from a sacred place, like a crystal or a tree. The climax could involve a ritual or a battle to restore the balance. 
The resolution would be about harmony and understanding.\n\nI should make sure the story has a clear beginning, middle, and end. Develop the characters, show their growth, and include some emotional moments. Maybe Lira learns to appreciate time more, and the village changes for the better. Add some suspense and adventure elements to keep it engaging.\n\nLet me outline the story structure. Start with Lira discovering the village, then the problem they face, her journey to find the solution, the challenges she encounters, and the resolution. Include some magical elements and maybe a mentor figure or a rival.\n\nI need to check for consistency in the world-building","reasoning_content":"","reasoning":null,"tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null,"mm_embedding_handle":null,"disaggregated_params":null,"avg_decoded_tokens_per_iter":1.5873016119003296}],"usage":{"prompt_tokens":13,"total_tokens":513,"completion_tokens":500,"prompt_tokens_details":{"cached_tokens":12}},"prompt_token_ids":null}
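Besides the generated text, the two responses differ in a few metadata fields that are worth checking side by side. The sketch below pulls them out of a response body; `summarize_response` is a hypothetical helper, and the sample JSON is the Eagle3 response from this report with the `content` field shortened for illustration.

```python
import json


def summarize_response(raw: str) -> dict:
    """Extract the fields that are useful when comparing two runs."""
    resp = json.loads(raw)
    choice = resp["choices"][0]
    usage = resp["usage"]
    details = usage.get("prompt_tokens_details") or {}  # tolerate a missing or null field
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "avg_decoded_tokens_per_iter": choice.get("avg_decoded_tokens_per_iter"),
        "completion_tokens": usage["completion_tokens"],
        "cached_tokens": details.get("cached_tokens"),
    }


# Eagle3 response from this report; "content" is shortened for illustration.
sample = (
    '{"id":"chatcmpl-xxxxx","object":"chat.completion","created":1766737371,'
    '"model":"Qwen3-4B","choices":[{"index":0,"message":{"role":"assistant",'
    '"content":"(shortened)","reasoning_content":"","reasoning":null,"tool_calls":[]},'
    '"logprobs":null,"finish_reason":"length","stop_reason":null,'
    '"avg_decoded_tokens_per_iter":1.5873016119003296}],'
    '"usage":{"prompt_tokens":13,"total_tokens":513,"completion_tokens":500,'
    '"prompt_tokens_details":{"cached_tokens":12}},"prompt_token_ids":null}'
)

summary = summarize_response(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

For the two responses above, `avg_decoded_tokens_per_iter` is 1.0 for the baseline and about 1.59 with Eagle3, which confirms that draft tokens were being accepted in the second run. The runs also differ in `cached_tokens` (0 versus 12), so they were not identical in every respect other than speculative decoding.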
Additional notes
Not sure whether this is related to the Eagle-specific parallel verification kernels.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.