[Eagle3] Accuracy Issue

### System Info

- CPU Architecture: x86_64
- GPU Properties:
  - Name: H100
  - Memory: 80GB
- TensorRT-LLM Branch: v1.0.0rc1
- Versions: CUDA 12.8

### Who can help?

@kaiyux 

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

- My Script
```
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0"
base_model=/mnt/modelops/models/Qwen3-30B-A3B/Qwen__Qwen3-30B-A3B
eagle_model=/mnt/modelops/models/qwen3_30b_moe_eagle3_8b1ecfe164a4efb3

max_batch_size=32
max_num_tokens=32768
tp_size=8

cat <<EOF > extra_llm_api_options.yaml
kv_cache_config:
  enable_block_reuse: False
  free_gpu_memory_fraction: 0.85
use_cuda_graph: True
cuda_graph_max_batch_size: 1
cuda_graph_padding_enabled: True
attn_backend: TRTLLM
enable_iter_perf_stats: False
enable_iter_req_stats: False
print_iter_log: False
enable_chunked_prefill: False
disable_overlap_scheduler: False
dtype: auto
# EAGLE-3
speculative_config:
    decoding_type: Eagle
    max_draft_len: 4
    pytorch_weights_path: ${eagle_model}

#
EOF

trtllm-serve \
    ${base_model} \
    --host 127.0.0.1 \
    --port 9122 \
    --tp_size ${tp_size} \
    --max_batch_size ${max_batch_size} \
    --max_num_tokens ${max_num_tokens} \
    --kv_cache_free_gpu_memory_fraction 0.85 \
    --log_level info \
    --trust_remote_code \
    --backend pytorch \
    --extra_llm_api_options extra_llm_api_options.yaml
```

- curl command

```
curl http://127.0.0.1:9122/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "test",
        "messages":[{"role": "user", "content": "Python bubble sort code."}],
        "max_tokens": 128,
        "stream":false
    }'
```
- bad case

```
{"id":"chatcmpl-18967b0c68294089bd6188da8b8eac77","object":"chat.completion","created":1751878126,"model":"test","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\n[…]\nOkay,颧, I控制 the user is askingDTV for Python bubble[method for bubble \nOkay, dri, Iyre, I need \n\n\nOkay,licted, the user ///</span> \n\nOkay, vk, the user站 is asking|\n\nOkay,一个职业, the \n\nOkay,ANTA, the站 is|\n\nOkay \nOkay, Tomas, the cott, theiris, the \n\nOkay,[method for bubble奶粉, \n\nOkay ///</span> \n\nOkay站 is|\n\nOkay \nOkay,iris, \n\nOkay,OAD, \n\nOkay站 is|\n\nOkay \n\nOkay,[method for \n\nOkay站 is","reasoning_content":null,"tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":13,"total_tokens":137,"completion_tokens":124},"prompt_token_ids":null}
```
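For scripted or repeated runs, the curl request above can also be issued from Python. This is a minimal sketch using only the standard library; the helper names `build_payload` and `send` are hypothetical, and the URL assumes the server started by the script above is listening on 127.0.0.1:9122.

```python
import json
import urllib.request

# Endpoint taken from the trtllm-serve command above (host 127.0.0.1, port 9122).
URL = "http://127.0.0.1:9122/v1/chat/completions"


def build_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build the same request body as the curl command (hypothetical helper)."""
    return {
        "model": "test",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }


def send(payload: dict, url: str = URL, timeout: float = 60.0) -> dict:
    """POST the payload as JSON and return the decoded response (hypothetical helper)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_payload("Python bubble sort code."))` should return the same kind of `chat.completion` object shown in the bad case, which makes it easier to repeat the request several times and compare outputs.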

### Expected behavior
Correct output:
{"id":"chatcmpl-eb60fe9f6ea44a079a0affc9b63044a8","object":"chat.completion","created":1751879316,"model":"test","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\nOkay, I need to write a Python code for bubble sort. Let me think about how bubble sort works. From what I remember, bubble sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. The pass through the list is repeated until the list is sorted.\n\nSo the basic steps are: iterate through the list, compare each pair of adjacent elements, and swap them if they are not in the correct order. Each pass moves the largest unsorted element to its correct position at the end of the list. That's why it's called bubble sort","reasoning_content":null,"tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":13,"total_tokens":141,"completion_tokens":128},"prompt_token_ids":null}

### actual behavior

{"id":"chatcmpl-18967b0c68294089bd6188da8b8eac77","object":"chat.completion","created":1751878126,"model":"test","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\n[…]\nOkay,颧, I控制 the user is askingDTV for Python bubble[method for bubble \nOkay, dri, Iyre, I need \n\n\nOkay,licted, the user ///</span> \n\nOkay, vk, the user站 is asking|\n\nOkay,一个职业, the \n\nOkay,ANTA, the站 is|\n\nOkay \nOkay, Tomas, the cott, theiris, the \n\nOkay,[method for bubble奶粉, \n\nOkay ///</span> \n\nOkay站 is|\n\nOkay \nOkay,iris, \n\nOkay,OAD, \n\nOkay站 is|\n\nOkay \n\nOkay,[method for \n\nOkay站 is","reasoning_content":null,"tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":13,"total_tokens":137,"completion_tokens":124},"prompt_token_ids":null}
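The difference between the two responses can be flagged automatically with a rough heuristic. The sketch below assumes that an English-only prompt should produce almost pure ASCII text, so a noticeable share of non-ASCII characters hints at corrupted tokens. The function names and the 1% threshold are assumptions chosen for illustration, not part of TensorRT-LLM, and legitimate output containing non-ASCII punctuation could trigger a false positive.

```python
import json


def extract_content(response_body: str) -> str:
    """Return the assistant message text from a chat.completion JSON body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def non_ascii_ratio(text: str) -> float:
    """Fraction of characters outside the ASCII range (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(ord(ch) > 127 for ch in text) / len(text)


def looks_garbled(text: str, threshold: float = 0.01) -> bool:
    """Heuristic check: True if the non-ASCII share exceeds the threshold.

    The threshold is an arbitrary assumption; tune it for other prompts.
    """
    return non_ascii_ratio(text) > threshold


# Short excerpts copied from the two responses in this issue.
good = "<think>\nOkay, I need to write a Python code for bubble sort."
bad = "Okay,颧, I控制 the user is askingDTV for Python bubble[method for bubble"

print(looks_garbled(good), looks_garbled(bad))
```

Passing the full response body through `extract_content` first makes it possible to run the same check on every reply when the request is repeated.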

### additional notes

none

[Eagle3] Accuracy Issue #5791