Closed
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
Description
System Info
Hi @kaiyux,
I ran into an issue and need your help.
Following the documentation, I used the PyTorch backend to test DeepSeek-R1 on an H20 machine. The run completed without errors, but the generated text was garbled.
My system configuration and software versions are as follows:
GPU: 8 × H20 (96 GB)
TensorRT-LLM: built from source on Linux, version 0.19.0.dev2025031800
The command used to reproduce the issue:
python quickstart_advanced.py --model_dir ${MyDir}/DeepSeek-R1 --tp_size 8
It produced these logs:
2025-03-19 09:35:03,175 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
Using MpiPoolSession to spawn MPI processes
2025-03-19 09:35:14,136 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,136 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,213 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,217 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,225 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,225 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,233 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
2025-03-19 09:35:14,238 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM] TensorRT-LLM version: 0.19.0.dev2025031800
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.30s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.47s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3576.37s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.67s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.70s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.71s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.74s
Loading weights: 100%|██████████| 1640/1640 [58:23<00:00, 2.14s/it]
Model init total -- 3575.67s
2025-03-19 10:34:56,390 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:15,008 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,126 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,128 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,130 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-19 10:35:54,141 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,192 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,243 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,292 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,343 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,395 - INFO - flashinfer.jit: Finished loading JIT ops: norm
2025-03-19 10:35:54,446 - INFO - flashinfer.jit: Finished loading JIT ops: norm
[TensorRT-LLM][INFO] Detecting local TP group for rank 2
[TensorRT-LLM][INFO] Detecting local TP group for rank 5
[TensorRT-LLM][INFO] Detecting local TP group for rank 0
[TensorRT-LLM][INFO] Detecting local TP group for rank 7
[TensorRT-LLM][INFO] Detecting local TP group for rank 4
[TensorRT-LLM][INFO] Detecting local TP group for rank 3
[TensorRT-LLM][INFO] Detecting local TP group for rank 6
[TensorRT-LLM][INFO] Detecting local TP group for rank 1
[TensorRT-LLM][INFO] TP group is intra-node for rank 3
[TensorRT-LLM][INFO] TP group is intra-node for rank 5
[TensorRT-LLM][INFO] TP group is intra-node for rank 2
[TensorRT-LLM][INFO] TP group is intra-node for rank 6
[TensorRT-LLM][INFO] TP group is intra-node for rank 1
[TensorRT-LLM][INFO] TP group is intra-node for rank 0
[TensorRT-LLM][INFO] TP group is intra-node for rank 4
[TensorRT-LLM][INFO] TP group is intra-node for rank 7
2025-03-19 10:35:57,860 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,873 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,873 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,901 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,902 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,903 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,904 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:35:57,905 - INFO - flashinfer.jit: Loading JIT ops: silu_and_mul
2025-03-19 10:36:12,844 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,873 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,893 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,922 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,942 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:12,991 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:13,042 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
2025-03-19 10:36:13,093 - INFO - flashinfer.jit: Finished loading JIT ops: silu_and_mul
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.03 GiB for max tokens in paged KV cache (76800).
Processed requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00, 1.09it/s]
[0] Prompt: 'Hello, my name is', Generated text: '\n\n# ( Doctor, holiness,1 _., and , cruz? ......\nAlright, my角和, ......\n\n.\nOkay\n\n,,\n ......\n ,,\n ,##Hello,,\n,,角和s<think>Fan ......\n, , '
[1] Prompt: 'The president of the United States is', Generated text: ' the●● is角和000\\( isDonald1.. The,,。,000'
[2] Prompt: 'The capital of France is', Generated text: ' Paris. I Paris##, _ _ _\n\n\r\n\n ur,,000ాన\n\nOkay: 角和:'
[3] Prompt: 'The future of AI is', Generated text: ' a000\n editor. AI _AIAI[\n\n#AI000AI культуры/Hr000000000000000\n\n>0065\r0000000000000000000000000000000000000\r\nUrls/Hr0000000000000000000角和000\n0000000000000000000000000000'
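As a side note for anyone triaging similar reports: since all four prompts are English, the garbling is easy to quantify with a small heuristic. This is a minimal sketch (the `nonascii_ratio` helper is hypothetical, not part of TensorRT-LLM or quickstart_advanced.py):

```python
# Minimal sketch: quantify garbled output from English prompts by measuring
# the fraction of non-ASCII characters. `nonascii_ratio` is a hypothetical
# helper, not part of TensorRT-LLM.

def nonascii_ratio(text: str) -> float:
    """Fraction of characters outside the 7-bit ASCII range."""
    if not text:
        return 0.0
    return sum(ord(ch) > 127 for ch in text) / len(text)

clean = "The capital of France is Paris."
garbled = " the\u25cf\u25cf is\u89d2\u548c000\\( isDonald1.. The,,\u3002,000"

print(f"clean:   {nonascii_ratio(clean):.3f}")    # 0.000
print(f"garbled: {nonascii_ratio(garbled):.3f}")  # well above 0.0
```

Every generation above scores noticeably non-zero despite purely English prompts, which suggests corrupted sampling/decoding rather than a prompt-side problem.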
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
python quickstart_advanced.py --model_dir /mnt/newdisk/models/DeepSeek-R1 --tp_size 8
Expected behavior
N/A
Actual behavior
N/A
Additional notes
N/A