Commit a5baaba

Add test audio files
1 parent 6510895 commit a5baaba

5 files changed: +130, -2 lines

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+__pycache__
+logs
+.voiceprint.yaml

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -186,7 +186,7 @@
 same "printed page" as the copyright notice for easier
 identification within third-party archives.
 
-Copyright [yyyy] [name of copyright owner]
+Copyright 2025 xinnan-tech
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

README.md

Lines changed: 63 additions & 1 deletion
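@@ -1,2 +1,64 @@
 # voiceprint-api
-Voiceprint recognition API
+
+Voiceprint recognition API service based on 3D-Speaker
+
+## Overview
+
+This project is an HTTP voiceprint recognition service built with FastAPI. It uses the 3D-Speaker model for speaker recognition, supports voiceprint registration and identification, and provides complete API documentation.
+
+It is currently used by the [xiaozhi-esp32-server](https://github.com/xinnan-tech/xiaozhi-esp32-server) project to identify who is speaking to a Xiaozhi device.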
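+
+## Features
+
+1. Voiceprint registration
+- Input: a speaker ID and a WAV audio file
+- Output: registration success status
+
+2. Voiceprint identification
+- Input: a comma-separated list of candidate speaker IDs and a WAV audio file
+- Output: the identified speaker ID (empty if no speaker is recognized)
+
+## Tech stack
+
+- FastAPI: web framework
+- 3D-Speaker: speaker recognition model
+- MySQL: data storage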
+
+## Installation
+
+1. Clone the project
+```bash
+git clone https://github.com/xinnan-tech/voiceprint-api.git
+cd voiceprint-api
+```
+
+2. Install dependencies
+```bash
+conda remove -n voiceprint-api --all -y
+conda create -n voiceprint-api python=3.11 -y
+conda activate voiceprint-api
+
+pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
+pip install -r requirements.txt
+```
+
+3. Configure the database
+- Create the database
+```
+CREATE DATABASE voiceprint_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
+```
+- Create the table
+```
+CREATE TABLE voiceprints (
+    id INT AUTO_INCREMENT PRIMARY KEY,
+    speaker_id VARCHAR(50) UNIQUE,
+    feature_vector LONGBLOB NOT NULL,
+    INDEX idx_speaker_id (speaker_id)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
+```
+- Copy `voiceprint.yaml` to `data/.voiceprint.yaml`
+
+4. Start the service
+```
+python app.py
+```
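The README's `voiceprints` table stores each speaker embedding in a `feature_vector LONGBLOB` column, but this commit does not yet contain the code that writes to it (app.py still keeps voiceprints in an in-memory dict). A minimal sketch of one possible encoding follows, assuming embeddings are stored as raw little-endian float32 bytes; that format and the helper names `embedding_to_blob` / `blob_to_embedding` are hypothetical, not defined by the project.

```python
import numpy as np

# Assumption: feature_vector holds raw little-endian float32 bytes.
# The commit does not define an on-disk format; this is illustrative only.
DTYPE = np.dtype("<f4")


def embedding_to_blob(emb) -> bytes:
    """Flatten an embedding and encode it as float32 bytes for a LONGBLOB column."""
    return np.asarray(emb, dtype=DTYPE).ravel().tobytes()


def blob_to_embedding(blob: bytes) -> np.ndarray:
    """Decode bytes read from feature_vector back into a 1-D float32 array."""
    # np.frombuffer returns a read-only view; copy() yields an independent, writable array.
    return np.frombuffer(blob, dtype=DTYPE).copy()


if __name__ == "__main__":
    # 192 is an illustrative embedding size, not read from the model here.
    emb = np.random.default_rng(0).standard_normal(192).astype(np.float32)
    blob = embedding_to_blob(emb)
    restored = blob_to_embedding(blob)
    print(len(blob), np.array_equal(emb, restored))
```

Fixing the byte order in `DTYPE` keeps the stored bytes portable across machines. The resulting `bytes` object can be passed as a parameter to a parameterized `INSERT INTO voiceprints (speaker_id, feature_vector) VALUES (...)` with whichever MySQL driver the service ends up using; no driver is listed in requirements.txt in this commit.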

app.py

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+import numpy as np
+import torch
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+# Initialize the speaker-verification pipeline
+sv_pipeline = pipeline(
+    task=Tasks.speaker_verification, model="iic/speech_campplus_sv_zh-cn_3dspeaker_16k"
+)
+
+voiceprints = {}
+
+
+def _to_numpy(x):
+    return x.cpu().numpy() if torch.is_tensor(x) else np.asarray(x)
+
+
+def register_voiceprint(name, audio_path):
+    """Register a speaker's voiceprint embedding"""
+    result = sv_pipeline([audio_path], output_emb=True)
+    emb = _to_numpy(result["embs"][0])  # one input file, so take only embedding 0
+    voiceprints[name] = emb
+    print(f"Registered: {name}")
+
+
+def identify_speaker(audio_path):
+    """Identify which registered speaker the audio belongs to"""
+    test_result = sv_pipeline([audio_path], output_emb=True)
+    test_emb = _to_numpy(test_result["embs"][0])
+
+    similarities = {}
+    for name, emb in voiceprints.items():
+        cos_sim = np.dot(test_emb, emb) / (
+            np.linalg.norm(test_emb) * np.linalg.norm(emb)
+        )
+        similarities[name] = cos_sim
+
+    match_name = max(similarities, key=similarities.get)
+    return match_name, similarities[match_name], similarities
+
+
+if __name__ == "__main__":
+    register_voiceprint("max_output_size", "test//test0.wav")
+    register_voiceprint("tts1", "test//test1.wav")
+
+    test_file = "test//test2.wav"
+    match_name, match_score, all_scores = identify_speaker(test_file)
+
+    print(f"\nIdentification result: {test_file} belongs to {match_name}")
+    print(f"Match score: {match_score:.4f}")
+    print("\nSimilarity scores against all voiceprints:")
+    for name, score in all_scores.items():
+        print(f"{name}: {score:.4f}")
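As committed, `identify_speaker` scores the test embedding against every registered voiceprint and always returns the top match: it has no rejection threshold, and `max()` raises `ValueError` when nothing has been registered. The README instead describes identification over a comma-separated list of candidate IDs that returns an empty result when no speaker is recognized. The sketch below shows one way to express that behavior with the same cosine similarity, operating on precomputed embeddings so it needs no model. `identify_among` is a hypothetical name, and the 0.5 threshold is an illustrative assumption, not a value from this project.

```python
import numpy as np

# Assumption: illustrative cutoff only; a real threshold should be tuned
# on held-out recordings for the chosen model.
DEFAULT_THRESHOLD = 0.5


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two embeddings, flattened to 1-D."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Return 0.0 for a zero vector instead of dividing by zero.
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def identify_among(
    test_emb,
    voiceprints: dict,
    candidate_ids: str,
    threshold: float = DEFAULT_THRESHOLD,
) -> str:
    """Best-matching ID among comma-separated candidates, or "" if none passes."""
    candidates = [c.strip() for c in candidate_ids.split(",") if c.strip()]
    # Score only candidates that actually have a registered voiceprint.
    scores = {
        sid: cosine_similarity(test_emb, voiceprints[sid])
        for sid in candidates
        if sid in voiceprints
    }
    if not scores:
        # Nothing to compare against: avoid max() on an empty dict.
        return ""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else ""


if __name__ == "__main__":
    prints = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
    print(identify_among(np.array([0.9, 0.1]), prints, "alice, bob"))
    print(repr(identify_among(np.array([-1.0, 0.0]), prints, "alice,bob")))
```

Returning an empty string rather than raising matches the README's "empty if not recognized" output and lets a caller treat "no candidates registered" and "no candidate close enough" the same way.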

requirements.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+modelscope==1.11.0
+datasets==2.14.5
+numpy==1.23.5
+packaging==21.3
+addict==2.4.0
+transformers==4.52.4
+torch==2.2.2
+sentencepiece==0.2.0
+soundfile==0.13.1
+torchaudio==2.2.2
