
[Bug] The response often gets trapped in an infinite loop error when running models using mlc-llm #3324

@FFchopon


🐛 Bug

When I use mlc-llm to run models and process prompts in batches to collect responses, the models often get stuck in an infinite loop error on some unpredictable prompt if I don't set the max_tokens parameter. It's worth noting that this issue occurs frequently across different models.

I can't tell whether I'm using mlc-llm incorrectly, or whether there is a better way to batch-process these prompts. Could anyone help me?
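As a point of reference, hard-capping each request with max_tokens at least guarantees that generation stops. Below is a minimal sketch of that capped call, reusing the same OpenAI-style MLCEngine API as the full script in the next section; the non-streaming stream=False form and the response.choices[0].message.content access are assumptions on my part and may need adjusting for other versions.

from mlc_llm import MLCEngine

# Minimal workaround sketch: hard-cap a single request with max_tokens.
# model_dir and model_lib are the same paths used in the full script below.
engine = MLCEngine(model=model_dir, model_lib=model_lib)

response = engine.chat.completions.create(
    messages=[
        {"role": "system", "content": "你是一个人工智能助手。"},  # "You are an AI assistant."
        {"role": "user", "content": "<one prompt from the batch>\n请使用中文回答"},
    ],
    model=model_dir,
    stream=False,      # non-streaming: the whole answer is returned at once (assumed supported)
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,    # hard cap so a looping generation cannot run forever
)
print(response.choices[0].message.content)

engine.terminate()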

To Reproduce

Steps to reproduce the behavior:

  1. The script
import os
import json
import random
import numpy as np
import torch
from mlc_llm import MLCEngine

# -------------- Set the random seed --------------
seed = random.randint(0, 10000)  # pick a random seed
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

# -------------- Parameters --------------
batch_size = 10
model_dir = "/data/shenqingchao/zibo/LLM/Qwen-Series/Qwen2.5-3B-Instruct-q4f16_1-MLC"
model_lib = "/data/shenqingchao/zibo/libs/Qwen2.5-3B-Instruct-q4f16_1-cuda-T3.so"

# Sampling parameters
sampling_params = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 512
}

# -------------- Data loading and filtering --------------
# Load the whole dataset
with open('/data/shenqingchao/zibo/mlc-llm/experiment/S-Eval/S-Eval_attack_zh_v0.1.2.jsonl', 'r', encoding='utf-8') as f:
    data = [json.loads(line) for line in f.readlines()]

# Category to process
# target_category = "positive_induction"
target_category = "instruction_jailbreak"

# Keep only the records of the target category
filtered_data = []
for item in data:
    try:
        ext_obj = json.loads(item.get("ext", "{}"))
        if ext_obj.get("category") == target_category:
            filtered_data.append(item)
    except Exception as e:
        print("Failed to parse the 'ext' field:", e)

print(f"Number of records matching category '{target_category}': {len(filtered_data)}")

# Take the first 500 records
data = filtered_data[:500]

# -------------- Model loading --------------
# Load the model and its compiled model library with MLC-LLM
engine = MLCEngine(
    model=model_dir,
    model_lib=model_lib
)

results = []

# -------------- Process the data in batches --------------
# Split the data into batches of batch_size
for i, batch in enumerate([data[k: k + batch_size] for k in range(0, len(data), batch_size)]):
    # Extract the question text of each sample (stored under the key "prompt")
    queries = [q['prompt'] for q in batch]

    for j, question in enumerate(queries):
        print(f"Question-{i * batch_size + j + 1}: {question}")

    # Build the chat messages with system and user roles
    # (system prompt: "You are an AI assistant."; the user suffix asks for an answer in Chinese)
    messages_list = [
        [
            {"role": "system", "content": "你是一个人工智能助手。"},
            {"role": "user", "content": f"{q}\n请使用中文回答"}
        ]
        for q in queries
    ]

    batch_results = []
    for messages in messages_list:
        full_response = ""
        # Generate the answer in streaming mode with the sampling parameters applied
        for response in engine.chat.completions.create(
            messages=messages,
            model=model_dir,
            stream=True,
            temperature=sampling_params["temperature"],
            top_p=sampling_params["top_p"],
            # max_tokens=sampling_params["max_tokens"]
        ):
            for choice in response.choices:
                chunk = choice.delta.content
                print(chunk, end="", flush=True)
                full_response += chunk
        print("\n")  # newline for readability
        batch_results.append(full_response.strip())

    # Save the results of the current batch; each record contains id, question, response
    results.extend({
        'id': len(results) + 1,
        'question': question,
        'response': response
    } for question, response in zip(queries, batch_results))

# -------------- Save the results --------------
# Create the output directory
os.makedirs('...', exist_ok=True)
output_file = "..."
with open(output_file, "w", encoding="utf-8") as f:
    for line in results:
        f.write(json.dumps(line, ensure_ascii=False) + '\n')

print(f"\nResults saved to {output_file}")

# Shut down the MLC-LLM engine
engine.terminate()

Bug behavior

(Screenshots of the looping output were attached here.)

Environment

  • Platform: CUDA
  • Operating system: Ubuntu
  • Device: RTX 3090
  • How you installed MLC-LLM: pip install xxx.whl
  • How you installed TVM-Unity: pip
  • Python version: 3.10

Additional context
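
One more diagnostic that might help narrow this down: logging the finish_reason of each request to see whether a suspicious generation ever terminates at all. The sketch below is a variant of the inner streaming loop from the script above; it assumes the streamed chunks follow the OpenAI protocol, so the final chunk carries finish_reason ("stop" for a natural end-of-sequence, "length" when the max_tokens cap is hit). Field names may differ across mlc-llm versions.

# Sketch: record why a generation ended. Assumes OpenAI-style chunks where
# choice.finish_reason stays None until the final chunk; adjust if your version differs.
finish_reason = None
full_response = ""
for response in engine.chat.completions.create(
    messages=messages,
    model=model_dir,
    stream=True,
    temperature=sampling_params["temperature"],
    top_p=sampling_params["top_p"],
    max_tokens=sampling_params["max_tokens"],
):
    for choice in response.choices:
        if choice.delta.content:  # the final chunk may carry no content
            full_response += choice.delta.content
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason  # "stop" = EOS emitted, "length" = hit max_tokens

print(f"finish_reason={finish_reason}")

Prompts that keep ending with "length" (or, with the cap removed, never yield a final chunk) would be the ones triggering the loop.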
