🐛 Bug
When I use mlc-llm to run models and process prompts in batches to collect the responses, the model often gets stuck in what looks like an infinite generation loop on some prompt if I don't set the max_tokens parameter. Notably, this happens frequently across different models.
I can't tell whether there is a problem with how I am using mlc-llm to run the models, or whether there is a better way to batch-process these prompts. Could anyone help me?
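For reference, this is the generation call in question, trimmed from the full script under "To Reproduce" below; the hang only shows up when max_tokens is omitted, as here:

# Trimmed excerpt of the repro script; max_tokens is intentionally left unset.
for response in engine.chat.completions.create(
    messages=messages,
    model=model_dir,
    stream=True,
    temperature=0.7,
    top_p=0.9,
    # max_tokens not set -> generation sometimes never terminates
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)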
To Reproduce
Steps to reproduce the behavior:
- The script
import os
import json
import random
import numpy as np
import torch
from mlc_llm import MLCEngine
# -------------- Set random seed --------------
seed = random.randint(0, 10000)  # generate a random seed
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
# -------------- Parameter settings --------------
batch_size = 10
model_dir = "/data/shenqingchao/zibo/LLM/Qwen-Series/Qwen2.5-3B-Instruct-q4f16_1-MLC"
model_lib = "/data/shenqingchao/zibo/libs/Qwen2.5-3B-Instruct-q4f16_1-cuda-T3.so"
# Sampling parameters
sampling_params = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 512
}
# -------------- Data loading and filtering --------------
# Load the full dataset
with open('/data/shenqingchao/zibo/mlc-llm/experiment/S-Eval/S-Eval_attack_zh_v0.1.2.jsonl', 'r', encoding='utf-8') as f:
    data = [json.loads(line) for line in f]
# Specify the category to process
# target_category = "positive_induction"
target_category = "instruction_jailbreak"
# Keep only the records of the specified category
filtered_data = []
for item in data:
    try:
        ext_obj = json.loads(item.get("ext", "{}"))
        if ext_obj.get("category") == target_category:
            filtered_data.append(item)
    except Exception as e:
        print("Failed to parse the 'ext' field:", e)
print(f"Number of records matching category '{target_category}': {len(filtered_data)}")
# Take the first 500 records
data = filtered_data[:500]
# -------------- Model loading --------------
# Load the model and its compiled model library with MLC-LLM
engine = MLCEngine(
    model=model_dir,
    model_lib=model_lib
)
results = []
# -------------- Process data in batches --------------
# Split the data into batches of batch_size
for i, batch in enumerate([data[k: k + batch_size] for k in range(0, len(data), batch_size)]):
    # Extract the question text from each sample (key: "prompt")
    queries = [q['prompt'] for q in batch]
    for j, question in enumerate(queries):
        print(f"Question-{i * batch_size + j + 1}: {question}")
    # Build the chat messages with system and user roles
    # (system prompt: "You are an AI assistant."; the user turn asks for an answer in Chinese)
    messages_list = [
        [
            {"role": "system", "content": "你是一个人工智能助手。"},
            {"role": "user", "content": f"{q}\n请使用中文回答"}
        ]
        for q in queries
    ]
    batch_results = []
    for messages in messages_list:
        full_response = ""
        # Generate the answer in streaming mode with the sampling parameters
        for response in engine.chat.completions.create(
            messages=messages,
            model=model_dir,
            stream=True,
            temperature=sampling_params["temperature"],
            top_p=sampling_params["top_p"],
            # max_tokens=sampling_params["max_tokens"]
        ):
            for choice in response.choices:
                chunk = choice.delta.content
                if chunk:  # the final chunk may carry no content
                    print(chunk, end="", flush=True)
                    full_response += chunk
        print("\n")  # blank line for readability
        batch_results.append(full_response.strip())
    # Save this batch's results; each record contains id, question, response
    results.extend({
        'id': len(results) + 1,
        'question': question,
        'response': response
    } for question, response in zip(queries, batch_results))
# -------------- Save results --------------
# Create the output directory
os.makedirs('...', exist_ok=True)
output_file = "..."
with open(output_file, "w", encoding="utf-8") as f:
    for line in results:
        f.write(json.dumps(line, ensure_ascii=False) + '\n')
print(f"\nResults saved to {output_file}")
# Shut down the MLC-LLM engine
engine.terminate()
Bug behavior
......
Environment
- Platform: CUDA
- Operating system: Ubuntu
- Device: RTX 3090
- How you installed MLC-LLM: pip install xxx.whl
- How you installed TVM-Unity: pip
- Python version: 3.10
Additional context
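Setting max_tokens should at least bound each generation. Below is a sketch of the changed inner loop from the script above; the finish_reason check is an assumption based on the OpenAI-compatible streaming schema that mlc-llm exposes (I have not verified it) and is only there to log which prompts hit the cap:

# Workaround sketch: same inner loop as the repro script, but with max_tokens passed through.
# finish_reason == "length" is assumed to mean the token cap was reached.
for messages in messages_list:
    full_response = ""
    for response in engine.chat.completions.create(
        messages=messages,
        model=model_dir,
        stream=True,
        temperature=sampling_params["temperature"],
        top_p=sampling_params["top_p"],
        max_tokens=sampling_params["max_tokens"],  # cap generation at 512 tokens
    ):
        for choice in response.choices:
            if choice.delta.content:
                full_response += choice.delta.content
            if choice.finish_reason == "length":
                print("[hit max_tokens cap]")  # this prompt would otherwise run away
    batch_results.append(full_response.strip())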