Closed
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
Description
System Info
Hi @kaiyux,
I ran into an issue and need your help.
Following the documentation, I used the PyTorch backend to test DeepSeek-R1 on an H20 machine. The run completed without errors, but the generated text was garbled.
My system configuration and software versions are as follows:
GPU: 8 × H20 (96 GB)
TensorRT-LLM: built from source on Linux, version 0.19.0.dev2025031800
The command used to reproduce the issue:
python quickstart_advanced.py --model_dir ${MyDir}/DeepSeek-R1 --tp_size 8
It produced these logs:
2025-03-19 09:35:03,175 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
Using MpiPoolSession to spawn MPI processes
2025-03-19 09:35:14,136 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,136 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,213 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,217 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,225 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,225 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,233 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,238 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.30s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.47s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3576.37s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.67s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.70s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.71s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.74s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.67s
2025-03-19 10:34:56,390 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:15,008 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,126 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,128 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,141 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,192 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,243 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,292 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,343 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,395 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,446 - INFO - flashinfer.jit: Finished loading JIT ops: norm
[TensorRT-LLM][INFO] Detecting local TP group for rank 2
[TensorRT-LLM][INFO] Detecting local TP group for rank 5
[TensorRT-LLM][INFO] Detecting local TP group for rank 0
[TensorRT-LLM][INFO] Detecting local TP group for rank 7
[TensorRT-LLM][INFO] Detecting local TP group for rank 4
[TensorRT-LLM][INFO] Detecting local TP group for rank 3
[TensorRT-LLM][INFO] Detecting local TP group for rank 6
[TensorRT-LLM][INFO] Detecting local TP group for rank 1
[TensorRT-LLM][INFO] TP group is intra-node for rank 3
[TensorRT-LLM][INFO] TP group is intra-node for rank 5
[TensorRT-LLM][INFO] TP group is intra-node for rank 2
[TensorRT-LLM][INFO] TP group is intra-node for rank 6
[TensorRT-LLM][INFO] TP group is intra-node for rank 1
[TensorRT-LLM][INFO] TP group is intra-node for rank 0
[TensorRT-LLM][INFO] TP group is intra-node for rank 4
[TensorRT-LLM][INFO] TP group is intra-node for rank 7
2025-03-19 10:35:57,860 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,873 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,873 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,901 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,902 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,903 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,904 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,905 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:36:12,844 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,873 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,893 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,922 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,942 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,991 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:13,042 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:13,093 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
Processed requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00, 1.09it/s]
[0] Prompt: 'Hello, my name is', Generated text: '\n\n# ( Doctor, holiness,1 _., and , cruz? ......\nAlright, my角和, ......\n\n.\nOkay\n\n,,\n ......\n ,,\n ,##Hello,,\n,,角和s<think>Fan ......\n, , '
[1] Prompt: 'The president of the United States is', Generated text: ' the●● is角和000\\( isDonald1.. The,,。,000'
[2] Prompt: 'The capital of France is', Generated text: ' Paris. I Paris##, _ _ _\n\n\r\n\n ur,,000ాన\n\nOkay: 角和:'
[3] Prompt: 'The future of AI is', Generated text: ' a000\n editor. AI _AIAI[\n\n#AI000AI культуры/Hr000000000000000\n\n>0065\r0000000000000000000000000000000000000\r\nUrls/Hr0000000000000000000角和000\n0000000000000000000000000000'
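As a side note for anyone triaging similar reports: since all four prompts are English, the garbling is easy to quantify with a small heuristic. This is a minimal sketch (the `nonascii_ratio` helper is hypothetical, not part of TensorRT-LLM or quickstart_advanced.py):

```python
# Minimal sketch: quantify garbled output from English prompts by measuring
# the fraction of non-ASCII characters. `nonascii_ratio` is a hypothetical
# helper, not part of TensorRT-LLM.

def nonascii_ratio(text: str) -> float:
    """Fraction of characters outside the 7-bit ASCII range."""
    if not text:
        return 0.0
    return sum(ord(ch) > 127 for ch in text) / len(text)

clean = "The capital of France is Paris."
garbled = " the\u25cf\u25cf is\u89d2\u548c000\\( isDonald1.. The,,\u3002,000"

print(f"clean:   {nonascii_ratio(clean):.3f}")    # 0.000
print(f"garbled: {nonascii_ratio(garbled):.3f}")  # well above 0.0
```

Every generation above scores noticeably non-zero despite purely English prompts, which suggests corrupted sampling/decoding rather than a prompt-side problem.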
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
python quickstart_advanced.py --model_dir /mnt/newdisk/models/DeepSeek-R1 --tp_size 8
Expected behavior
N/A
Actual behavior
N/A
Additional notes
N/A