Commit 2c9a5ba

add speech transcription doc (#107)
1 parent c1159e7 commit 2c9a5ba

4 files changed, with 146 additions and 2 deletions

docs/.vuepress/notes/en/guide.ts

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ export const Guide: ThemeNote = defineNoteConfig({
 'prompted_vqa',
 'mathquestion_extract',
 'knowledge_cleaning',
-'quick_general_text_evaluation'
+'quick_general_text_evaluation',
+'speech_transcription',
 ],
 },
 // {

docs/.vuepress/notes/zh/guide.ts

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ export const Guide: ThemeNote = defineNoteConfig({
 "prompted_vqa",
 "mathquestion_extract",
 'knowledge_cleaning',
-'quick_general_text_evaluation'
+'quick_general_text_evaluation',
+'speech_transcription',
 ],
 },
 // {
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
---
title: Case 9. Speech transcription
createTime: 2025/08/22 16:38:49
permalink: /en/guide/5pdipkiv/
icon: fad:headphones
---

This example demonstrates how to use the SpeechTranscriptor operator for speech-to-text transcription.

## Speech Transcription
### Step 1: Install the DataFlow Environment
```bash
pip install "open-dataflow[vllm]"
```

### Step 2: Create a New DataFlow Working Directory
```shell
mkdir run_dataflow
cd run_dataflow
```

### Step 3: Initialize DataFlow
```shell
dataflow init
```
After this step, you should see:
```shell
run_dataflow/gpu_pipelines/speechtranscription_pipeline.py
```

### Step 4: Prepare the data to be transcribed
```python
self.storage = FileStorage(
    first_entry_file_name="../example_data/SpeechTranscription/pipeline_speechtranscription.jsonl",  # your data path
    cache_path="./cache",
    file_name_prefix="dataflow_cache_step",
    cache_type="jsonl",
)
```

The input is a JSONL file. Each line is a JSON object whose `raw_content` field holds the local path or URL of an audio file:
```jsonl
{"raw_content": "../example_data/SpeechTranscription/audio/test.wav"}
{"raw_content": "https://raw.githubusercontent.com/FireRedTeam/FireRedASR/main/examples/wav/IT0011W0001.wav"}
```
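
If your audio sources live in a Python list rather than an existing file, an input file in this format can be generated with a few lines of standard-library code. The sketch below is only an illustration: `write_manifest` and `is_url` are hypothetical helper names, not part of DataFlow, and the check for local files assumes relative paths are resolved against the current working directory.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Return True if the source looks like an http(s) URL."""
    return urlparse(source).scheme in ("http", "https")


def write_manifest(sources, manifest_path, key="raw_content"):
    """Write one JSON object per line, each holding a single audio source.

    Local paths that do not point to an existing file are skipped and
    returned to the caller; URLs are written as-is without being fetched.
    """
    missing = []
    manifest_path = Path(manifest_path)
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    with manifest_path.open("w", encoding="utf-8") as f:
        for source in sources:
            if not is_url(source) and not Path(source).is_file():
                missing.append(source)
                continue
            f.write(json.dumps({key: source}, ensure_ascii=False) + "\n")
    return missing
```

The returned list makes it easy to report unreadable local files before the pipeline starts, instead of finding out during inference.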

### Step 5: Launch serving
```python
self.llm_serving = LocalModelLALMServing_vllm(
    hf_model_name_or_path='Qwen/Qwen2-Audio-7B-Instruct',  # your model path
    vllm_tensor_parallel_size=4,
    vllm_max_tokens=8192,
)
```
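
The example above sets `vllm_tensor_parallel_size=4`. Assuming this value is passed through to vLLM's tensor-parallel setting, it should not exceed the number of GPUs available to the process. The sketch below derives a value from `CUDA_VISIBLE_DEVICES`; both helper names are hypothetical, and rounding down to a power of two is a conservative heuristic of this sketch, not a rule stated by DataFlow or vLLM.

```python
import os


def visible_gpu_count(env=None):
    """Count the GPU ids listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, because the real GPU count
    cannot be determined from the environment alone in that case.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


def pick_tensor_parallel_size(gpu_count, preferred=4):
    """Largest power of two that is <= min(gpu_count, preferred), at least 1."""
    limit = max(1, min(gpu_count, preferred))
    size = 1
    while size * 2 <= limit:
        size *= 2
    return size
```

With three visible GPUs, for instance, `pick_tensor_parallel_size(3)` falls back to 2 rather than requesting 4 devices.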

### Step 6: Speech transcription operator
```python
self.speech_transcriptor = SpeechTranscriptor(
    llm_serving=self.llm_serving,
    system_prompt="You are a professional translator; your task is to transcribe speech into text and then translate it into English."  # model system prompt
)
```

### Step 7: Run the operator
```python
self.speech_transcriptor.run(
    storage=self.storage.step(),
    input_key="raw_content"
)
```
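
Once the operator has run, the results should be available under the `cache_path` configured in Step 4. This page does not state the exact file names DataFlow writes or the name of the output field, so the following sketch only assumes that the results are JSONL files whose names start with the configured `file_name_prefix`. It loads the most recently modified one; `load_latest_cache` is a hypothetical helper, not a DataFlow API.

```python
import json
from pathlib import Path


def load_latest_cache(cache_path="./cache", prefix="dataflow_cache_step"):
    """Load the most recently modified JSONL file whose name starts with prefix."""
    candidates = sorted(
        Path(cache_path).glob(f"{prefix}*.jsonl"),
        key=lambda p: p.stat().st_mtime,
    )
    if not candidates:
        raise FileNotFoundError(f"no {prefix}*.jsonl files under {cache_path}")
    # The last candidate has the newest modification time.
    with candidates[-1].open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `load_latest_cache()` from the directory where the pipeline was run returns one dictionary per input line, which can be printed to find the field that holds the transcription.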
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
---
title: Case 9. Speech to Text
createTime: 2025/08/22 16:37:30
permalink: /zh/guide/du2akut8/
icon: fad:headphones
---

This example shows how to use the SpeechTranscriptor operator to convert speech to text.

## Speech to Text
### Step 1: Install the DataFlow Environment
```bash
pip install "open-dataflow[vllm]"
```

### Step 2: Create a New DataFlow Working Directory
```shell
mkdir run_dataflow
cd run_dataflow
```

### Step 3: Initialize DataFlow
```shell
dataflow init
```
You should now see:
```shell
run_dataflow/gpu_pipelines/speechtranscription_pipeline.py
```

### Step 4: Prepare the Data to Be Transcribed
```python
self.storage = FileStorage(
    first_entry_file_name="../example_data/SpeechTranscription/pipeline_speechtranscription.jsonl",  # put your data path here
    cache_path="./cache",
    file_name_prefix="dataflow_cache_step",
    cache_type="jsonl",
)
```

The data format is as follows:
```jsonl
{"raw_content": "../example_data/SpeechTranscription/audio/test.wav"}
{"raw_content": "https://raw.githubusercontent.com/FireRedTeam/FireRedASR/main/examples/wav/IT0011W0001.wav"}
```

### Step 5: Launch Serving
```python
self.llm_serving = LocalModelLALMServing_vllm(
    hf_model_name_or_path='Qwen/Qwen2-Audio-7B-Instruct',  # fill in your model path
    vllm_tensor_parallel_size=4,
    vllm_max_tokens=8192,
)
```

### Step 6: Speech-to-Text Operator
```python
self.speech_transcriptor = SpeechTranscriptor(
    llm_serving=self.llm_serving,
    # Model system prompt. In English: "You are a professional translator; you need to transcribe speech into text."
    system_prompt="你是一个专业的翻译员,你需要将语音转录为文本。"
)
```

### Step 7: Run the Operator
```python
self.speech_transcriptor.run(
    storage=self.storage.step(),
    input_key="raw_content"
)
```
