[Bug]: Kimi K2 produces (gibberish?) #9930

@Shang-Pin

Description

System Info

- GPU name: B200
- TensorRT-LLM commit: ff0ef19

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce the problem:
Here is the command to start the server.

trtllm-serve serve /data/weights/modelopt-moonshotai--Kimi-K2-Thinking-FP4/001 --tp_size=4 --backend=pytorch --host=0.0.0.0 --port=8000 --max_seq_len=262144 --max_num_tokens=32768 --max_batch_size=96 --ep_size=4 --extra_llm_api_options=extra.yml --enable_chunked_prefill

extra.yml

print_iter_log: true
enable_iter_perf_stats: true
kv_cache_config:
  free_gpu_memory_fraction: 0.8
  event_buffer_max_size: 10000000
moe_config:
  backend: TRTLLM

Expected behavior

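The model produces a normal, coherent response.

Actual behavior

The model produces gibberish: stray tokens, mostly non-English, are interleaved with otherwise sensible text. Request and response: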

curl "http://localhost:8000/v1/chat/completions"   -H "Content-Type: application/json"   -H "Authorization: Bearer "   -d '{
      "model": "",
      "messages": [
        {
          "role": "user",
          "content": "Write bfs in python."
        }
      ],"temperature":1
    }'

{"id":"chatcmpl-4d3ab8026c944db89b87ecb61edbf962","object":"chat.completion","created":1765328529,"model":"deepseek-ai/DeepSeek-V3.2","choices":[{"index":0,"message":{"role":"assistant","content":" The user is asking me to write a BFS (Breadth-First Search) implementation in Python. This is a common algorithm for graph traversal. I should provide a clear, well-commented implementation that demonstrates the key concepts of BFS.\n\nLet me think about what是重要的 to include:\n1. A clear explanation of BFS\n2. A practical implementation that works on a graph\n3. Use of a queue (collections.deque is perfect for this)\n4. Tracking visited nodes to avoid cycles\n5. Maybe show both tree/graph traversal and pathfinding versions\n6. Include an example graph to demonstrate usage\n\nI'll structure the response with:\n- Brief explanation of BFS\n- Implementation for graph traversal (simple version)\n- Implementation that also tracks paths/parent nodes\n- Example usage with a sample graph\n- Complexity analysis\n- Use cases\n\nLet me write the code:\n\npython\nfrom collections import deque\n\ndef bfs(graph, start):\n \"\"\"\n Perform BFS traversal on a graph.\n \n Args:\n graph: Dictionary representing adjacency list\n start: Starting node\n \n Returns:\n List of nodes in BFS order\n \"\"\"\n visited = set()\n 出土的 =萨德和()\n queue.append(start)\n visited.add(start)\n result = []\n \n while queue:\n node = queue.popleft()\n result.append(node)\n \n # Process neighbors\n for neighbor in graph[node]:\n if neighbor not in visited:\n visited.add(neighbor)\n queue.append(neighbor)\n \n return result\n\n# For finding shortest path\ndef bfs_shortest_path(graph, start, goal):\n \"\"\"\n Find shortest path between start and goal using BFS.\n \n Returns:\n Path as list of nodes, or None if no path exists\n \"\"\"\n queue = deque([[start]])\n visited = {start}\n \n if start == goal:\n return [start]\n \n while queue:\n path = queue.popleft()\n node = path[-1]\n \n for neighbor in graph[node]:\n 
if neighbor not in visited:\n new_path = list(path)\n new_path.append(neighbor)\n \n if neighbor == goal:\n return new_path\n \n visited.add(neighbor)\n queue.append(new_path)\n \n return None\n\n\nWait, I should also include a more robust versionlər that handles disconnected graphs and shows the level-by-level nature. Let me refine this.\n\nActually,薹上面代码已经不错,但我可以添加更多注释和变体。让我重新组织:\n\n1. 基本BFS遍历\n2. BFS查找 villa路径\n3. BFS分层遍历(记录每个节点的距离)\n4. 完整的示例\n\nI'll make sure to use deque for efficient queue operations, set for visited tracking, and provide clear examples. Here's a clean implementation of Breadth-First Search (BFS) in Python with explanations:\n\npython\nfrom collections import deque\n\ndef bfs_traversal(graph, start):\n \"\"\"\n Perform BFS traversal and return visited nodes in order.\n \n Args:\n graph: Dictionary representing adjacency list {node: [neighbors]}\n start: Starting node\n \n Returns:\n List of nodes in BFS order\n \"\"\"\n visited = set()\n queue = deque([start])\n visited.add(start)\n result = []\n \n while queue:\n node = queue.popleft()\n result.append(node)\n \n # Explore neighbors\n for neighbor in graph[node]:\n if neighbor not in visited:\n visited.add(neighbor看不出 queue.append(neighbor)\n \n return result\n\ndef bfs_shortest_path(graph, start, goal):\n \"\"\"\n Find shortest path from start to goal using BFS.\n \n Returns:\n List of nodes representing the shortest path,\n or None if no path exists\n \"\"\"\n queue = deque([[start]])\n visitedLanding = {起}\n \n while queue on\nDon't forget to handle the case where start == goal:\n if start == goal:\n return [start Vernon being seen forwarded queue = deque([[步就]])\n visited = {力}\n \n while queue:\n path =肢体语言.popleft()\n node =我们可以[-1]\n \n if node == goal:\n return网游\n \n for neighbor in graph[node]:\n if neighbor not in visited:\n visited.add(neighbor)\n new_path = list(path)\n new_path.append(neighbor)\n queue.append(new_path)\n \n return None\n\n# Example usage\nif __name__ == 
\"__main__\":\n # Sample graph (adjacency list)\n graph = {\n 'A': ['出', 'C'],\n 'B': ['A', 'D', 'E'],\n 'C': ['A', 'F'],\n ' clumsiness': [],\n 'E': ['B', 'F'],\n 'F': ['C', 'E']\n }\n \n print(\"BFS Traversal starting from A:\")\n投标人 printeasygoing_bfs_traversal(graph, 'A')) # Output: ['A', 'B', 'C', 'D', 'E', 'F']\n \n print(\"\\nShortest path from A to F:\")\n path = bfs_shortest_path(graph, 'A', 'F')\n print(path) # Output: ['A', 'C', 'F']\n \n print(\"\\\\nShortest path from A内蒙古自治区 D gigs not exist:\")\n path +招标文件.bfs_shortest_path(graph, 'A', 'H')\n print(path_notifications) # Output: None\n\n\n## Key Features:\n\n1. Uses collections.deque:真的没有 O(1) 的 popleft() 操作 podium队列\n2. Hash有多个set跟踪已访问节点: 防止重复访问和无限循环\n3强 ** adjacency list表示**: 高效且 Python货比三家\n4. Two variants:\n - bfs_traversal: Simple traversal returning visit order\n - bfs_shortest_path: Finds shortest path (fewest edges)\n\n## Complexity:\n- Time: O(V + E) where V is vertices and E is edges\n- Space: O(V) for queue and visited set<|im_end|>","reasoning_content":"","reasoning":null,"tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null,"mm_embedding_handle":null,"disaggregated_params":null,"avg_decoded_tokens_per_iter":1.0}],"usage":{"prompt_tokens":13,"total_tokens":1302,"completion_tokens":1289,"prompt_tokens_details":{"cached_tokens":12}},"prompt_token_ids":null}


### Additional notes

I have tried switching moe_backend to CUTLASS and turning off fuse_routing_kernel. Neither changes the output.
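
When comparing configurations like this, it may help to quantify the corruption rather than eyeball it. Below is a minimal sketch, assuming the server from the reproduction steps is listening on `localhost:8000`. The helper names (`non_ascii_runs`, `non_ascii_ratio`, `query`) are hypothetical and not part of TensorRT-LLM, and treating non-ASCII characters as a sign of corruption is only a heuristic that fits an English prompt like the one above.

```python
import json
import re
import urllib.request

# Assumed endpoint, copied from the curl command in the reproduction steps.
URL = "http://localhost:8000/v1/chat/completions"


def non_ascii_runs(text):
    """Return every maximal run of non-ASCII characters found in text."""
    return re.findall(r"[^\x00-\x7f]+", text)


def non_ascii_ratio(text):
    """Return the fraction of characters in text that are non-ASCII."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 127) / len(text)


def query(prompt, url=URL, temperature=1.0):
    """Send one chat completion request and return the assistant message text.

    The payload mirrors the curl command; the empty Authorization header
    from that command is omitted here.
    """
    payload = {
        "model": "",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires the running server; repeat after each configuration change.
    content = query("Write bfs in python.")
    print("non-ASCII ratio:", non_ascii_ratio(content))
    print("non-ASCII runs:", non_ascii_runs(content))
```

Running this once per configuration (for example, before and after changing the MoE backend) gives a rough number to compare. Since the request uses temperature 1, a single run is noisy, so several samples per configuration are probably needed.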

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.

Metadata

Labels

Decoding/Sampling, bug
