
DeepSeek-R1 generates garbled characters #2968

@RubberDuck-whiteEaves

Description


System Info

Hi @kaiyux

I encountered an issue and need your help.

Following the documentation, I tested DeepSeek-R1 with the PyTorch backend on an H20 machine. The run completed without errors, but the generated text came out garbled.

My system configuration and software versions are as follows:

GPU: 8 × H20 (96 GB)
TensorRT-LLM: built from source on Linux; version 0.19.0.dev2025031800

The command used to reproduce the error was:

python quickstart_advanced.py --model_dir ${MyDir}/DeepSeek-R1 --tp_size 8

And got these logs:

2025-03-19 09:35:03,175 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
Using MpiPoolSession to spawn MPI processes
2025-03-19 09:35:14,136 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,136 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,213 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,217 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,225 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,225 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,233 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,238 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
(identical line printed by each of the 8 ranks)
[TensorRT-LLM][INFO] Refreshed the MPI local session
(identical line printed by each of the 8 ranks)
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00,  2.14s/it]
Model init total -- 3575.30s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00,  2.14s/it]
Model init total -- 3575.47s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00,  2.14s/it]
Model init total -- 3576.37s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00,  2.14s/it]
Model init total -- 3575.67s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00,  2.14s/it]
Model init total -- 3575.70s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00,  2.14s/it]
Model init total -- 3575.71s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00,  2.14s/it]
Model init total -- 3575.74s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00,  2.14s/it]
Model init total -- 3575.67s
2025-03-19 10:34:56,390 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:15,008 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,126 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,128 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,141 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,192 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,243 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,292 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,343 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,395 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,446 - INFO - flashinfer.jit: Finished loading JIT ops: norm
[TensorRT-LLM][INFO] Detecting local TP group for rank 2
[TensorRT-LLM][INFO] Detecting local TP group for rank 5
[TensorRT-LLM][INFO] Detecting local TP group for rank 0
[TensorRT-LLM][INFO] Detecting local TP group for rank 7
[TensorRT-LLM][INFO] Detecting local TP group for rank 4
[TensorRT-LLM][INFO] Detecting local TP group for rank 3
[TensorRT-LLM][INFO] Detecting local TP group for rank 6
[TensorRT-LLM][INFO] Detecting local TP group for rank 1
[TensorRT-LLM][INFO] TP group is intra-node for rank 3
[TensorRT-LLM][INFO] TP group is intra-node for rank 5
[TensorRT-LLM][INFO] TP group is intra-node for rank 2
[TensorRT-LLM][INFO] TP group is intra-node for rank 6
[TensorRT-LLM][INFO] TP group is intra-node for rank 1
[TensorRT-LLM][INFO] TP group is intra-node for rank 0
[TensorRT-LLM][INFO] TP group is intra-node for rank 4
[TensorRT-LLM][INFO] TP group is intra-node for rank 7
2025-03-19 10:35:57,860 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,873 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,873 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,901 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,902 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,903 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,904 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,905 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:36:12,844 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,873 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,893 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,922 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,942 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,991 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:13,042 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:13,093 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
(the two lines above printed by each of the 8 ranks)
Processed requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  1.09it/s]
[0] Prompt: 'Hello, my name is', Generated text: '\n\n# (  Doctor, holiness,1 _., and ,  cruz? ......\nAlright, my角和, ......\n\n.\nOkay\n\n,,\n ......\n  ,,\n  ,##Hello,,\n,,角和s<think>Fan  ......\n, ,  '
[1] Prompt: 'The president of the United States is', Generated text: ' the●● is角和000\\( isDonald1.. The,,。,000'
[2] Prompt: 'The capital of France is', Generated text: ' Paris. I Paris##, _ _ _\n\n\r\n\n  ur,,000ాన\n\nOkay: 角和:'
[3] Prompt: 'The future of AI is', Generated text: ' a000\n editor. AI _AIAI[\n\n#AI000AI культуры/Hr000000000000000\n\n>0065\r0000000000000000000000000000000000000\r\nUrls/Hr0000000000000000000角和000\n0000000000000000000000000000'

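For what it's worth, garbled output of this kind usually means either that the sampled token IDs themselves are wrong (e.g. corrupted logits or weights) or that correct IDs are being decoded against the wrong vocabulary. A toy sketch with a hypothetical vocabulary table (illustration only, not TensorRT-LLM code) shows how even a one-slot vocab mismatch turns a correct ID sequence into nonsense:

```python
# Toy illustration with a hypothetical vocabulary -- not TensorRT-LLM code.
# Correct token IDs decoded against a shifted vocabulary come out garbled,
# similar in flavor to the generations above.
vocab = [" Paris", ".", " is", " the", " capital", " of", " France"]

def decode(ids, table):
    """Map each token ID to its surface string and concatenate."""
    return "".join(table[i] for i in ids)

good_ids = [3, 4, 5, 6, 2, 0, 1]
print(decode(good_ids, vocab))      # " the capital of France is Paris."

# Simulate a vocab/checkpoint mismatch by rotating the table one slot:
shifted = vocab[1:] + vocab[:1]
print(decode(good_ids, shifted))    # " capital of France Paris the. is"
```

Here the cause may well be numerical rather than tokenizer-related, but verifying that the tokenizer files in the model directory match the downloaded checkpoint is a cheap first check.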

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python quickstart_advanced.py --model_dir /mnt/newdisk/models/DeepSeek-R1 --tp_size 8

Expected behavior

N/A

Actual behavior

N/A

Additional notes

N/A


Labels

bug (Something isn't working), triaged (Issue has been triaged by maintainers)
