GLM-ASR produces repeated output on an MP3 (45s MP3 attached) #26

@dukebw

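Problem summary

The 45-second MP3 below (a clip from The KK Show 324) triggers heavy repetition in the output of GLM-ASR-Nano. I have made the original MP3 public so the issue is easy to reproduce.

Reproduction audio (45s MP3)

Reproduction steps (single script, downloads the audio automatically)

If your Python is missing _lzma (common with pyenv builds), first run: pip install backports.lzma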

python - <<'PY'
import sys
import urllib.request
from pathlib import Path

AUDIO_URL = "https://gist.githubusercontent.com/dukebw/3aeab91243cafbed84399c1e08badead/raw/pod_822b3874_4956.mp3"
AUDIO_PATH = Path("pod_822b3874_4956.mp3")

# Fall back to backports.lzma when the stdlib lzma module is missing
try:
    import lzma  # noqa: F401
except Exception:
    from backports import lzma as backports_lzma
    sys.modules["lzma"] = backports_lzma

# Download the audio (skipped if a non-empty file already exists)
if not AUDIO_PATH.exists() or AUDIO_PATH.stat().st_size == 0:
    req = urllib.request.Request(AUDIO_URL, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp, open(AUDIO_PATH, "wb") as f:
        while True:
            chunk = resp.read(1024 * 1024)
            if not chunk:
                break
            f.write(chunk)

import torch
from transformers import AutoModelForSeq2SeqLM, AutoProcessor

model_id = "zai-org/GLM-ASR-Nano-2512"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

model.to(device)
dtype = torch.float16 if device != "cpu" else torch.float32

inputs = processor.apply_transcription_request(str(AUDIO_PATH))
inputs = inputs.to(model.device, dtype=getattr(model, "dtype", None) or dtype)

with torch.inference_mode():
    outputs = model.generate(**inputs, do_sample=False, max_new_tokens=512)

prompt_len = inputs["input_ids"].shape[1]
text = processor.batch_decode(outputs[:, prompt_len:], skip_special_tokens=True)
print(text[0] if text else "")
PY
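
One way to narrow this down is to rerun the same `generate` call with the standard anti-repetition settings in transformers and compare the transcripts. The sketch below is a diagnostic only, not a confirmed fix: the helper name `build_generate_kwargs` is hypothetical, the values 1.2 and 3 are untuned illustrations, and it assumes the model goes through the standard `generate()` path so that `repetition_penalty` and `no_repeat_ngram_size` are honored.

```python
def build_generate_kwargs(repetition_penalty=1.2, no_repeat_ngram_size=3,
                          max_new_tokens=512):
    """Hypothetical helper: greedy decoding plus two anti-repetition settings.

    The defaults are illustrative assumptions, not recommended values.
    """
    return {
        "do_sample": False,                 # same greedy decoding as the repro script
        "max_new_tokens": max_new_tokens,   # same token budget as the repro script
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": no_repeat_ngram_size,
    }


# Usage with the objects from the repro script (not executed here):
# outputs = model.generate(**inputs, **build_generate_kwargs())
```

Note that `no_repeat_ngram_size` forbids any repeated token n-gram of that size, so it can also suppress legitimate repeats in the speech; it is useful for comparison, less so as a default.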

Actual output (excerpt)

…我宁愿不应该不应该这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样这样…
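
To put a number on the repetition rather than eyeballing it, a small character-level check can report the substring that repeats back-to-back the most times. This is a minimal sketch using only the standard library; `longest_repeat_run` and the `max_unit_len` default are hypothetical names and values chosen for illustration, and working on characters (not tokens) is an assumption that suits Chinese text.

```python
def longest_repeat_run(text, max_unit_len=8):
    """Return (unit, count) for the substring repeated back-to-back most often.

    Only units of 1..max_unit_len characters are considered. On ties, the
    first unit found (shortest length, earliest position) is kept.
    """
    best_unit, best_count = "", 0
    for unit_len in range(1, max_unit_len + 1):
        for start in range(len(text) - unit_len + 1):
            unit = text[start:start + unit_len]
            count = 1
            pos = start + unit_len
            # Count how many times the unit immediately follows itself.
            while text[pos:pos + unit_len] == unit:
                count += 1
                pos += unit_len
            if count > best_count:
                best_unit, best_count = unit, count
    return best_unit, best_count


if __name__ == "__main__":
    # Shortened, synthetic stand-in for the excerpt above.
    sample = "我宁愿" + "不应该" * 2 + "这样" * 5
    print(longest_repeat_run(sample))
```

In the reproduction script this could be applied to `text[0]` after decoding; a large count for a short unit is a simple signal that decoding has fallen into a loop.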

Environment

  • macOS (Apple Silicon)
  • Python 3.12.12
  • torch 2.8.0
  • transformers 5.0.0.dev0
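
For anyone comparing against the environment above, the versions can be collected with the standard library alone. A minimal sketch; `package_version` and `environment_report` are hypothetical helper names, and the package list is an assumption based on what the script imports.

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("torch", "transformers")):
    """Collect Python, platform, and package versions into a dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```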
