This project provides a high-performance pipeline for **audio/video transcription**, **speaker diarization**, and **summarization** using [Faster-Whisper](https://github.com/guillaumekln/faster-whisper), Hugging Face LLMs (e.g. Mistral), and [pyannote.audio](https://github.com/pyannote/pyannote-audio). It exposes a **FastAPI-based REST API** and supports CLI usage as well.

---
## System Architecture

The pipeline consists of several key stages:

1. **Conversion and denoising** – audio is converted with ffmpeg and optionally denoised using a hybrid method that combines Demucs (for structured background removal) with either noisereduce or DeepFilterNet (for static noise).
2. **Silence-aware chunking** – pydub segments speech at natural pauses so chunks do not break mid-sentence (see the sketch below).
3. **Transcription and diarization** – the Whisper model transcribes each chunk, optionally followed by speaker diarization using pyannote.audio.
4. **Summarization** – if enabled, an instruction-tuned LLM such as Mistral-7B generates concise, structured summaries.

Outputs are written to `.txt`, `.log`, and `.json` files, optionally embedded with speaker turns and summaries.
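
As a rough illustration of the chunking stage, here is a minimal sketch using pydub's `split_on_silence`; the file names and threshold values are illustrative assumptions, not the project's actual defaults:

```python
# Minimal sketch of silence-aware chunking with pydub (illustrative values).
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_wav("converted.wav")  # hypothetical ffmpeg output

# Cut where the signal stays below -40 dBFS for at least 700 ms, keeping
# 200 ms of padding so words are not clipped at chunk boundaries.
chunks = split_on_silence(
    audio,
    min_silence_len=700,
    silence_thresh=-40,
    keep_silence=200,
)

for i, chunk in enumerate(chunks):
    chunk.export(f"chunk_{i}.wav", format="wav")  # each chunk is then transcribed
```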

## Features

...

```bash
curl -X POST http://<YOUR_IP>:8000/transcribe \
  ...
  -F "max_speakers=2"
```
### Start Docker

Assuming the `whisper_api` image has already been built from the project's Dockerfile (e.g. `docker build -t whisper_api .`), start the container with:

```bash
docker run --gpus all -p 8000:8000 whisper_api
```
### Example Docker Call

```bash
curl -X POST http://<YOUR_IP>:8000/transcribe \
  -F "model=medium" \
  -F "summary=true" \
  -F "speaker=true" \
  -F "denoise=false" \
  -F "streaming=true" \
  -F "hf_token=hf_xxx" \
  -F "max_speakers=2"
```
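
For programmatic access, the following is a minimal Python sketch equivalent to the curl call above; the endpoint and field names are taken from the example, and the fields are sent as multipart form data to mirror curl's `-F` flags:

```python
# Hypothetical Python client for the /transcribe endpoint (mirrors the curl example).
import requests

fields = {
    "model": "medium",
    "summary": "true",
    "speaker": "true",
    "denoise": "false",
    "streaming": "true",
    "hf_token": "hf_xxx",
    "max_speakers": "2",
}

# curl -F sends multipart/form-data; passing (None, value) tuples through
# `files` makes requests do the same for plain form fields.
response = requests.post(
    "http://<YOUR_IP>:8000/transcribe",
    files={name: (None, value) for name, value in fields.items()},
)
print(response.text)
```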

### Start Blueprint Deployment

In the deployment section of Blueprint, add a recipe such as the following:

```json
{
  "recipe_id": "whisper transcription",
  "recipe_mode": "service",
  "deployment_name": "whisper-transcription-a10",
  "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:whisper_transcription_v2",
  "recipe_node_shape": "VM.GPU.A10.2",
  "recipe_replica_count": 1,
  "recipe_container_port": "8000",
  "recipe_nvidia_gpu_count": 2,
  "recipe_node_pool_size": 1,
  "recipe_node_boot_volume_size_in_gbs": 200,
  "recipe_ephemeral_storage_size": 100,
  "recipe_shared_memory_volume_size_limit_in_mb": 200
}
```
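
A note on the sizing above: `recipe_nvidia_gpu_count` is set to 2 to match the two A10 GPUs of the `VM.GPU.A10.2` shape, and `recipe_container_port` matches the port the FastAPI server listens on (8000, as in the Docker examples above).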

### Example Blueprint Call

```bash
curl -L -X POST http://<YOUR_IP>:8000/transcribe \
  -F "model=medium" \
  -F "summary=true" \
  -F "speaker=true" \
  -F "denoise=false" \
  -F "streaming=true" \
  -F "hf_token=hf_xxx" \
  -F "max_speakers=2"
```
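
Note the `-L` flag, which makes curl follow HTTP redirects; this can be necessary when the Blueprint deployment sits behind a load balancer that redirects requests.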

To follow the streaming output, tail the log endpoint (the log name is printed once the transcription curl command is running):

```bash
curl -N http://<YOUR_IP>:8000/stream_log/<YOUR_Log_Name>1.log
```
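
As an alternative to curl, a small Python sketch can consume the same stream; the URL uses the placeholder from above, and `stream=True` with `iter_lines` is standard requests usage:

```python
# Hypothetical streaming log reader for the /stream_log endpoint.
import requests

url = "http://<YOUR_IP>:8000/stream_log/<YOUR_Log_Name>1.log"

# stream=True keeps the connection open; iter_lines yields log lines
# as the server flushes them.
with requests.get(url, stream=True) as response:
    for line in response.iter_lines(decode_unicode=True):
        if line:
            print(line)
```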

---

## Outputs