Commit 3013af8

Update README.md
1 parent 7ac85ef commit 3013af8

File tree

1 file changed

+63
-0
lines changed

docs/whisper_transcription/README.md

Lines changed: 63 additions & 0 deletions
@@ -3,6 +3,10 @@

This project provides a high-performance pipeline for **audio/video transcription**, **speaker diarization**, and **summarization** using [Faster-Whisper](https://github.com/guillaumekln/faster-whisper), Hugging Face LLMs (e.g. Mistral), and [pyannote.audio](https://github.com/pyannote/pyannote-audio). It exposes a **FastAPI-based REST API** and supports CLI usage as well.

---

## System Architecture

The architecture consists of several key stages. First, audio is converted with ffmpeg and optionally denoised using a hybrid method that combines Demucs (for structured background removal) with either noisereduce or DeepFilterNet (for static noise). Next, silence-aware chunking with pydub segments speech cleanly without breaking mid-sentence. The Whisper model then transcribes each chunk, optionally followed by speaker diarization using pyannote-audio. Finally, if summarization is enabled, an instruction-tuned LLM such as Mistral-7B generates concise, structured summaries. Outputs are written to .txt, .log, and .json files, optionally embedded with speaker turns and summaries.

![image](https://github.com/user-attachments/assets/6a8b55f0-9de5-46e9-9ef0-80e904f61a7d)
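The silence-aware chunking stage can be sketched in pure Python. This is only an illustrative energy-threshold version of the idea; the project itself uses pydub's silence detection, and the function name and parameters below are hypothetical:

```python
def split_on_silence(samples, silence_thresh=0.01, min_silence_len=3):
    """Split a sequence of amplitude values into chunks at silent runs.

    A 'silent run' is at least `min_silence_len` consecutive samples whose
    absolute amplitude falls below `silence_thresh`; chunks therefore only
    end at silence boundaries, never inside a loud (speech) region.
    """
    chunks, current, silent_run = [], [], 0
    for s in samples:
        current.append(s)
        if abs(s) < silence_thresh:
            silent_run += 1
            if silent_run >= min_silence_len:
                # Close the current chunk at the start of the silent run.
                chunk = current[:-silent_run]
                if chunk:
                    chunks.append(chunk)
                current, silent_run = [], 0
        else:
            silent_run = 0
    # Keep a trailing chunk only if it actually contains speech.
    if current and any(abs(s) >= silence_thresh for s in current):
        chunks.append(current)
    return chunks
```

pydub's real `split_on_silence` works on `AudioSegment` objects and dBFS thresholds, but the boundary logic is the same: cuts happen only inside detected silence.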

## Features

@@ -99,6 +103,65 @@ curl -X POST http://<YOUR_IP>:8000/transcribe \
  -F "max_speakers=2"
```

### Start Docker

```bash
docker run --gpus all -p 8000:8000 whisper_api
```

### Example Docker Call

```bash
curl -X POST http://<YOUR_IP>:8000/transcribe \
  -F "model=medium" \
  -F "summary=true" \
  -F "speaker=true" \
  -F "denoise=false" \
  -F "streaming=true" \
  -F "hf_token=hf_xxx" \
  -F "max_speakers=2"
```

### Start Blueprint Deployment

In the deployment part of Blueprint, add a recipe such as the following:

```json
{
  "recipe_id": "whisper transcription",
  "recipe_mode": "service",
  "deployment_name": "whisper-transcription-a10",
  "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:whisper_transcription_v2",
  "recipe_node_shape": "VM.GPU.A10.2",
  "recipe_replica_count": 1,
  "recipe_container_port": "8000",
  "recipe_nvidia_gpu_count": 2,
  "recipe_node_pool_size": 1,
  "recipe_node_boot_volume_size_in_gbs": 200,
  "recipe_ephemeral_storage_size": 100,
  "recipe_shared_memory_volume_size_limit_in_mb": 200
}
```
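A recipe like the one above can be sanity-checked locally before submitting it. A minimal sketch, assuming the field set below is the required one (the required-field list is illustrative, not taken from the project):

```python
import json

# Fields the deployment recipe is assumed to require (illustrative only).
REQUIRED_FIELDS = {
    "recipe_id", "recipe_mode", "deployment_name",
    "recipe_image_uri", "recipe_node_shape", "recipe_replica_count",
}

def check_recipe(recipe_text):
    """Parse a recipe JSON string and report any missing required fields."""
    recipe = json.loads(recipe_text)  # raises ValueError on malformed JSON
    missing = sorted(REQUIRED_FIELDS - recipe.keys())
    return recipe, missing

recipe, missing = check_recipe("""{
    "recipe_id": "whisper transcription",
    "recipe_mode": "service",
    "deployment_name": "whisper-transcription-a10",
    "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:whisper_transcription_v2",
    "recipe_node_shape": "VM.GPU.A10.2",
    "recipe_replica_count": 1
}""")
```

Catching a typo (or a trailing comma, which `json.loads` rejects) locally is cheaper than a failed deployment round-trip.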

### Example Blueprint

```bash
curl -L -X POST http://<YOUR_IP>:8000/transcribe \
  -F "model=medium" \
  -F "summary=true" \
  -F "speaker=true" \
  -F "denoise=false" \
  -F "streaming=true" \
  -F "hf_token=hf_xxx" \
  -F "max_speakers=2"
```

To watch the streaming output, run:

```bash
curl -N http://<YOUR_IP>:8000/stream_log/<YOUR_Log_Name>1.log
```

The log name is printed once the transcription curl command starts running.
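What `curl -N` does here — reading the log line by line as it grows, without buffering — can also be sketched in Python. The `follow` helper below is hypothetical, and the demo uses an in-memory stream standing in for the HTTP response:

```python
import io

def follow(stream):
    """Yield decoded lines from a binary file-like stream as they arrive.

    `stream` can be any object that yields bytes line by line, e.g. the
    response returned by urllib.request.urlopen() for the /stream_log URL.
    """
    for raw in stream:
        yield raw.decode("utf-8", errors="replace").rstrip("\n")

# Demonstration with an in-memory stream in place of the HTTP response:
demo = io.BytesIO(b"chunk 1 transcribed\nchunk 2 transcribed\n")
lines = list(follow(demo))
```

Against the live endpoint, `follow(urllib.request.urlopen(url))` would yield log lines as the server flushes them, since `HTTPResponse` objects support line iteration.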

---

## Outputs
