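Problem overview
The 45-second MP3 below (a clip from The KK Show 324) triggers heavily repeated output from GLM-ASR-Nano. I have published the original MP3 so the problem is easy to reproduce.
Reproduction audio (45 s MP3)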
- File: pod_822b3874_4956.mp3
- Download link: https://gist.githubusercontent.com/dukebw/3aeab91243cafbed84399c1e08badead/raw/pod_822b3874_4956.mp3
- SHA256: 74dd2588a065e644e33e395693fc8a3aff1a9f8386e91c53554e8d72a07a7372
- Size: 576 KiB
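Before running the model, it may help to confirm that the downloaded file matches the SHA256 listed above. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_expected` are hypothetical and not part of the original report.

```python
import hashlib
from pathlib import Path

# Checksum copied from the list above.
EXPECTED_SHA256 = "74dd2588a065e644e33e395693fc8a3aff1a9f8386e91c53554e8d72a07a7372"


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected checksum."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    audio = Path("pod_822b3874_4956.mp3")
    if audio.exists():
        print("checksum OK" if matches_expected(audio) else "checksum MISMATCH")
    else:
        print(f"{audio} not found; download it first")
```

A mismatch would suggest a truncated or altered download rather than a model problem, so it is worth ruling out first.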
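Reproduction steps (single script, downloads the audio automatically)
If your Python is missing the _lzma module (common with pyenv builds), run this first:
pip install backports.lzma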
python - <<'PY'
import sys
import urllib.request
from pathlib import Path

AUDIO_URL = "https://gist.githubusercontent.com/dukebw/3aeab91243cafbed84399c1e08badead/raw/pod_822b3874_4956.mp3"
AUDIO_PATH = Path("pod_822b3874_4956.mp3")

# Fall back to backports.lzma when the stdlib lzma module is missing
try:
    import lzma  # noqa: F401
except Exception:
    from backports import lzma as backports_lzma
    sys.modules["lzma"] = backports_lzma

# Download the audio if it is not already present
if not AUDIO_PATH.exists() or AUDIO_PATH.stat().st_size == 0:
    req = urllib.request.Request(AUDIO_URL, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp, open(AUDIO_PATH, "wb") as f:
        while True:
            chunk = resp.read(1024 * 1024)
            if not chunk:
                break
            f.write(chunk)

import torch
from transformers import AutoModelForSeq2SeqLM, AutoProcessor

model_id = "zai-org/GLM-ASR-Nano-2512"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Pick the best available device
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
model.to(device)
dtype = torch.float16 if device != "cpu" else torch.float32

inputs = processor.apply_transcription_request(str(AUDIO_PATH))
inputs = inputs.to(model.device, dtype=getattr(model, "dtype", None) or dtype)

# Greedy decoding, capped at 512 new tokens
with torch.inference_mode():
    outputs = model.generate(**inputs, do_sample=False, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
text = processor.batch_decode(outputs[:, prompt_len:], skip_special_tokens=True)
print(text[0] if text else "")
PY

Actual output (excerpt)
…我宁愿不应该不应该这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样…
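To put a number on this kind of looping, a small character-level check can report the longest back-to-back repetition in a transcript. This is only a sketch: `longest_repeat_run` is a hypothetical helper, it works on characters because the Chinese output has no word boundaries, and the sample string is a shortened stand-in for the excerpt above, not the full model output.

```python
def longest_repeat_run(text, max_unit=8):
    """Return (unit, count) for the longest run of one unit repeated back to back.

    Runs are ranked by the number of characters they cover; ties keep the
    shorter unit. Returns ("", 0) when nothing repeats consecutively.
    """
    best_unit, best_count = "", 0
    for n in range(1, max_unit + 1):
        for i in range(len(text) - n + 1):
            unit = text[i:i + n]
            count = 1
            j = i + n
            # Count how many times the same unit follows immediately.
            while text[j:j + n] == unit:
                count += 1
                j += n
            if count > 1 and count * n > best_count * len(best_unit):
                best_unit, best_count = unit, count
    return best_unit, best_count


if __name__ == "__main__":
    sample = "不应该不应该" + "这样" * 5  # shortened stand-in, not the real output
    print(longest_repeat_run(sample))  # ('这样', 5)
```

Running the same function on the decoded `text[0]` from the script would give a simple pass/fail signal for regression checks, for example by flagging any run above a chosen threshold.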
Environment
- macOS (Apple Silicon)
- Python 3.12.12
- torch 2.8.0
- transformers 5.0.0.dev0
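For anyone comparing results on another machine, the same details can be gathered programmatically. The sketch below uses only the standard library; `collect_versions` is a hypothetical helper name, and it reports "not installed" for any package that is absent rather than failing.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers")):
    """Return platform, Python, and package versions as a dict of strings."""
    info = {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```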