name	faster-whisper-zh
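description	Local Chinese speech recognition with faster-whisper. No API key is required and it runs on CPU alone. Suited to transcribing voice messages, meeting recordings, and podcasts. Supports multiple audio formats (.wav, .mp3, .ogg, .m4a, .flac) and output formats (plain text, JSON, SRT subtitles). Uses the CTranslate2 backend, which is faster than the original Whisper implementation.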

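Local Chinese Speech Recognition (faster-whisper)

Fast, accurate Chinese speech-to-text with faster-whisper. No API key is required, it runs locally on CPU, and the defaults are tuned for Chinese recognition.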

Quick start

Single-file transcription

# Basic usage
scripts/transcribe.sh voice.ogg

# Choose a model and an output format
scripts/transcribe.sh voice.mp3 --model medium --format json

# Save the result to a file
scripts/transcribe.sh voice.wav --output transcript.txt

Batch processing

# Transcribe every audio file in a directory
scripts/transcribe.sh --batch /path/to/audio/files --format srt

# Batch process and save the results to a given directory
scripts/transcribe.sh --batch ./audio --output ./transcripts --format json
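For readers who want to call the library directly, the following is a minimal sketch of what a single-file transcription might look like with the faster-whisper Python API. It is not taken from transcribe.py, whose internals are not shown here. The names `Segment`, `join_segments`, and `transcribe_file` are hypothetical, and the defaults (base model, CPU, int8, language "zh") are assumptions based on the options described in this document.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Stand-in with the fields used here; faster-whisper segments expose the same names."""
    start: float
    end: float
    text: str


def join_segments(segments: Iterable[Segment]) -> str:
    """Concatenate segment texts into one transcript (no spaces, as is usual for Chinese)."""
    return "".join(seg.text.strip() for seg in segments)


def transcribe_file(path: str, model_size: str = "base") -> str:
    """Transcribe one audio file to Chinese text (assumed defaults, see lead-in)."""
    # Imported lazily so join_segments stays usable without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(path, language="zh")
    return join_segments(segments)
```

Calling `transcribe_file("voice.ogg")` would download the model on first use, so it needs network access and the faster-whisper package.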

Key features

1. Multi-format support

Input: WAV, MP3, OGG, M4A, FLAC, AAC
Output: plain text, JSON (with timestamps), SRT subtitles
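As an illustration of how a batch run could pick out supported inputs, here is a small sketch that filters a directory by file extension. The helper name `find_audio_files` is hypothetical, and the extension set simply mirrors the input list above. The real script may discover files differently.

```python
from pathlib import Path
from typing import List

# Extensions mirror the input formats listed above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg", ".m4a", ".flac", ".aac"}


def find_audio_files(directory: str) -> List[Path]:
    """Return supported audio files directly inside `directory`, sorted by path."""
    root = Path(directory)
    return sorted(
        p for p in root.iterdir()
        # Compare the suffix case-insensitively so "VOICE.MP3" is also matched.
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```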

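2. Chinese optimization

Chinese is the default recognition language, which gives high accuracy
Automatic language detection is supported
Chinese word segmentation and punctuation are improved

3. Flexible configuration

Five model sizes: tiny, base, small, medium, large-v3
Runs on CPU or GPU
Adjustable compute precision: int8/float16/float32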

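Model selection guide

Model	File size	Speed	Accuracy	Best for
tiny	~40MB	Fastest	Lower	Quick previews
base	~140MB	Fast	Good	Recommended for everyday use
small	~460MB	Medium	Very good	High-quality transcription
medium	~1.4GB	Slower	Excellent	Professional work
large-v3	~2.9GB	Slowest	Best	Highest accuracy requirements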
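The table can also be read as a simple lookup. The sketch below picks the largest model whose download size fits a given budget. The sizes are copied from the table and are approximate, the helper `largest_model_within` is hypothetical, and file size is only a rough proxy for memory use at run time.

```python
from typing import Optional

# Approximate file sizes in MB, copied from the table above (smallest to largest).
MODEL_SIZES_MB = {
    "tiny": 40,
    "base": 140,
    "small": 460,
    "medium": 1400,
    "large-v3": 2900,
}


def largest_model_within(budget_mb: int) -> Optional[str]:
    """Return the largest model whose file size fits `budget_mb`, or None if none fits."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    # Dicts preserve insertion order, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None
```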

Output format examples

Text format (default)

今天天气很好，我们一起去公园散步吧。

JSON format (with timestamps)

{
  "language": "zh",
  "language_probability": 0.98,
  "duration": 3.2,
  "segments": [
    {
      "start": 0.0,
      "end": 3.2,
      "text": "今天天气很好，我们一起去公园散步吧。"
    }
  ]
}
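A structure like the one above could be assembled as follows. This is a sketch under assumptions: the field names follow the sample output, while `Segment` and `to_json` are hypothetical names, and the rounding to two decimals is an illustrative choice rather than documented behavior.

```python
import json
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Stand-in for a transcription segment (start and end in seconds)."""
    start: float
    end: float
    text: str


def to_json(language: str, probability: float, duration: float,
            segments: Iterable[Segment]) -> str:
    """Serialize a transcription result in the shape of the sample above."""
    payload = {
        "language": language,
        "language_probability": round(probability, 2),
        "duration": round(duration, 2),
        "segments": [
            {"start": round(s.start, 2), "end": round(s.end, 2), "text": s.text.strip()}
            for s in segments
        ],
    }
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```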

SRT subtitle format

1
00:00:00,000 --> 00:00:03,200
今天天气很好，我们一起去公园散步吧。
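The SRT timestamp uses the form HH:MM:SS,mmm with a comma before the milliseconds. The sketch below shows one way to render segments in that layout. `Segment`, `srt_timestamp`, and `to_srt` are hypothetical names and do not describe how transcribe.py is written.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Stand-in for a transcription segment (start and end in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.2 -> 00:00:03,200."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```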

Common use cases

Voice message to text

scripts/transcribe.sh wechat_voice.ogg --format text

Meeting recording transcription

scripts/transcribe.sh meeting.wav --model medium --format json --output meeting_transcript.json

Podcast subtitles

scripts/transcribe.sh podcast.mp3 --format srt --output podcast.srt

Batch processing of voice files

scripts/transcribe.sh --batch ./voice_messages --format text --output ./transcripts

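Installation and dependencies

The script checks for and installs its dependencies automatically:

Python 3.8+
faster-whisper
Required system libraries

On first run, model files are downloaded automatically and cached in ~/.cache/huggingface/hub/.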
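To check the Python-side requirements by hand before running the wrapper, something like the following can be used. It is a sketch only: `check_environment` is a hypothetical helper, it does not install anything, and it does not cover the system libraries mentioned above.

```python
import importlib.util
import sys
from typing import Dict, Tuple


def check_environment(min_version: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report whether the interpreter and the faster-whisper package look usable."""
    return {
        # sys.version_info compares element-wise against a plain tuple.
        "python_ok": sys.version_info >= min_version,
        # find_spec returns None when a top-level package cannot be found.
        "faster_whisper_installed": importlib.util.find_spec("faster_whisper") is not None,
    }
```

If `faster_whisper_installed` is false, the package can usually be installed with `pip install faster-whisper`.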

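Performance tuning

CPU optimization

Use --compute-type int8 (the default) for the best speed
For larger models, --compute-type float16 is recommended; note that CTranslate2 generally provides efficient float16 only on GPU, so this mainly applies together with --device cuda

GPU acceleration

scripts/transcribe.sh voice.mp3 --device cuda --compute-type float16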
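The pairing of device and precision described above can be captured in a small lookup when the wrapper is driven from another program. This sketch assumes that `--device cpu` is accepted alongside the documented `--device cuda`. `DEFAULT_COMPUTE_TYPES` and `build_command` are hypothetical names.

```python
from typing import List

# Precision per device as suggested in this section: int8 on CPU, float16 on GPU.
DEFAULT_COMPUTE_TYPES = {"cpu": "int8", "cuda": "float16"}


def build_command(audio_path: str, device: str = "cpu") -> List[str]:
    """Build the argument list for scripts/transcribe.sh with a matching compute type."""
    if device not in DEFAULT_COMPUTE_TYPES:
        raise ValueError(f"unsupported device: {device!r}")
    return [
        "scripts/transcribe.sh", audio_path,
        "--device", device,
        "--compute-type", DEFAULT_COMPUTE_TYPES[device],
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`.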

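Batch processing optimization

Batch mode loads the model once and reuses it for every file, which is much faster than running the script separately for each file.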
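The saving comes from paying the model load cost once instead of once per file. A minimal sketch of that pattern follows, with the loader and the per-file step passed in as callables so the idea is independent of any particular library. `transcribe_batch` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, Iterable, TypeVar

M = TypeVar("M")


def transcribe_batch(
    paths: Iterable[str],
    load_model: Callable[[], M],
    transcribe_one: Callable[[M, str], str],
) -> Dict[str, str]:
    """Load the model a single time, then reuse it for every path."""
    model = load_model()  # the expensive step, done once
    return {path: transcribe_one(model, path) for path in paths}
```

With faster-whisper, `load_model` could be a function that constructs a `WhisperModel`, and `transcribe_one` a function that calls `model.transcribe(path, language="zh")` and joins the segment texts.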

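Troubleshooting

Slow model downloads

Model files are downloaded automatically from Hugging Face. If the network is slow, you can:

Use a smaller model (tiny/base)
Configure a proxy or mirror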
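One way to configure a proxy or mirror is through environment variables passed to the wrapper. The sketch below assumes the download goes through huggingface_hub, which reads `HF_ENDPOINT`, and that the HTTP client honors the standard `HTTPS_PROXY`/`HTTP_PROXY` variables. The helper `download_env` is hypothetical, and the mirror and proxy URLs in the usage note are placeholders.

```python
import os
from typing import Dict, Mapping, Optional


def download_env(
    mirror: Optional[str] = None,
    proxy: Optional[str] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with mirror and proxy settings applied."""
    env = dict(os.environ if base is None else base)
    if mirror:
        # Assumption: the model download uses huggingface_hub, which honors HF_ENDPOINT.
        env["HF_ENDPOINT"] = mirror
    if proxy:
        env["HTTPS_PROXY"] = proxy
        env["HTTP_PROXY"] = proxy
    return env
```

For example, `subprocess.run(["scripts/transcribe.sh", "voice.ogg"], env=download_env(mirror="https://mirror.example"), check=True)` would run the wrapper against a placeholder mirror.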

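Out of memory

Use a smaller model
Lower the compute precision with --compute-type int8
Make sure no other large programs are using memory

Poor recognition quality

Try a larger model (medium/large-v3)
Check the audio quality and make sure the speech is clear
For non-Chinese audio, specify the correct language code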

Scripts

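transcribe.sh

Bash wrapper script that handles dependency installation and environment checks automatically. Recommended for everyday use.

transcribe.py

Core Python script that provides the full transcription functionality. Supports single-file and batch processing and multiple output formats.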
