
Commit acc850f

fix whisper transcription readme and remove unneeded file
1 parent 68f6ce3 commit acc850f

File tree

2 files changed: +50, -39 lines changed

consolidated_bluperints_2.json

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/sample_blueprints/other/whisper_transcription/README.md

Lines changed: 50 additions & 38 deletions
@@ -1,66 +1,75 @@
# Whisper Transcription API

#### Transcription + Summarization + Diarization Pipeline (FastAPI-powered)

This blueprint provides a complete solution for running **audio/video transcription**, **speaker diarization**, and **summarization** via a RESTful API. It integrates [Faster-Whisper](https://github.com/guillaumekln/faster-whisper) for efficient transcription, [pyannote.audio](https://github.com/pyannote/pyannote-audio) for diarization, and Hugging Face instruction-tuned LLMs (e.g., Mistral-7B) for summarization. It supports multi-GPU acceleration, real-time streaming logs, and JSON/text output formats.

---

## Pre-Filled Samples

Below are pre-configured blueprints for deploying Whisper transcription using different GPU configurations on Oracle Cloud Infrastructure.

| Feature Showcase                                                     | Title              | Description                                                 | Blueprint File                                                     |
| -------------------------------------------------------------------- | ------------------ | ----------------------------------------------------------- | ------------------------------------------------------------------ |
| Deploy Whisper transcription on A10 GPU for real-time speech-to-text | A10 Transcription  | Real-time audio transcription with Whisper on BM.GPU.A10.8  | [whisper-transcription-A10.json](whisper-transcription-A10.json)   |
| Deploy Whisper transcription on A100 GPU for high-speed processing   | A100 Transcription | High-performance Whisper transcription using BM.GPU.A100.8  | [whisper-transcription-A100.json](whisper-transcription-A100.json) |
| Deploy Whisper transcription on H100 GPU for next-gen AI workloads   | H100 Transcription | Ultra-fast Whisper transcription on BM.GPU.H100.8           | [whisper-transcription-H100.json](whisper-transcription-H100.json) |

---

## In-Depth Feature Overview

| Capability           | Description                                                                                     |
| -------------------- | ----------------------------------------------------------------------------------------------- |
| Transcription        | Fast, multi-GPU inference with Faster-Whisper                                                   |
| Summarization        | Uses Mistral-7B (or other HF models) to create summaries of long transcripts                    |
| Speaker Diarization  | Global speaker labeling via pyannote.audio                                                      |
| Denoising            | Hybrid removal of background noise using Demucs and noisereduce                                 |
| Real-Time Streaming  | Logs stream live via HTTP if enabled                                                            |
| Format Compatibility | Supports `.mp3`, `.wav`, `.flac`, `.aac`, `.m4a`, `.mp4`, `.webm`, `.mov`, `.mkv`, `.avi`, etc. |

---

## Deployment on OCI Blueprint

### Sample Recipe (Service Mode)
See [whisper-transcription-A10.json](whisper-transcription-A10.json) for an example recipe.

### Endpoint

```
POST https://<YOUR_DEPLOYMENT>.nip.io/transcribe
```

**Example:**
`https://whisper-transcription-a10-6666.130-162-199-33.nip.io/transcribe`

---

## API Parameters

| Name               | Type   | Description                                                                                      |
| ------------------ | ------ | ------------------------------------------------------------------------------------------------ |
| `audio_url`        | string | URL to the audio file in OCI Object Storage (requires a PAR)                                      |
| `model`            | string | Whisper model to use: `base`, `medium`, `large`, `turbo`, etc.                                    |
| `summary`          | bool   | Whether to generate a summary (default: false). Requires `hf_token` if no model path is provided  |
| `speaker`          | bool   | Whether to run diarization (default: false). Requires `hf_token`                                  |
| `max_speakers`     | int    | (Optional) Maximum number of speakers expected for diarization                                    |
| `denoise`          | bool   | Whether to apply noise reduction                                                                  |
| `streaming`        | bool   | Enables real-time logs via the `/stream_log` endpoint                                             |
| `hf_token`         | string | Hugging Face access token (required for diarization or HF-hosted summarizers)                     |
| `prop_decrease`    | float  | (Optional) Level of noise suppression. Range: 0.0–1.0 (default: 0.7)                              |
| `summarized_model` | string | (Optional) Path or HF model ID for the summarizer. Default: `mistralai/Mistral-7B-Instruct-v0.1`  |
| `ground_truth`     | string | (Optional) Path to a reference transcript file used to compute WER                                |

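For programmatic use, the multipart form fields above can be assembled in Python. This is a minimal sketch, not part of the blueprint: `build_transcribe_form` is a hypothetical helper, and the lowercase boolean-string convention simply mirrors the `-F` flags in the cURL example below.

```python
def build_transcribe_form(audio_url, model="base", summary=False,
                          speaker=False, denoise=False, streaming=False,
                          hf_token=None):
    """Assemble multipart form fields for POST /transcribe.

    Booleans are sent as lowercase strings, mirroring the -F flags in
    the cURL example. (Hypothetical helper, for illustration only.)
    """
    fields = {
        "audio_url": audio_url,
        "model": model,
        "summary": str(summary).lower(),
        "speaker": str(speaker).lower(),
        "denoise": str(denoise).lower(),
        "streaming": str(streaming).lower(),
    }
    if hf_token:
        fields["hf_token"] = hf_token
    # (None, value) tells `requests` to send a plain form field, not a file upload
    return {k: (None, v) for k, v in fields.items()}

# Usage with `requests` (needs a live deployment, so commented out):
# import requests
# resp = requests.post(
#     "https://<YOUR_DEPLOYMENT>.nip.io/transcribe",
#     files=build_transcribe_form("<YOUR_PAR_URL>", model="large"),
#     verify=False,  # mirrors curl -k
# )
# print(resp.text)
```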
---

## Example cURL Command

```bash
curl -k -N -L -X POST https://<YOUR_DEPLOYMENT>.nip.io/transcribe \
  -F "audio_url=<YOUR_PAR_URL>" \
```

@@ -88,13 +97,16 @@ Each processed audio generates the following:

## Streaming Logs

If `streaming=true`, the response will contain a log filename:

```json
{
  "meta": "logfile_name",
  "logfile": "transcription_log_remote_audio_<timestamp>.log"
}
```

To stream logs in real-time:

```bash
curl -N https://<YOUR_DEPLOYMENT>.nip.io/stream_log/<log_filename>
```
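The two-step flow above (read `logfile` from the JSON response, then request `/stream_log/<log_filename>`) can be sketched in Python. `stream_log_url` is a hypothetical helper for illustration, not part of the blueprint:

```python
import json

def stream_log_url(base_url, transcribe_response_text):
    """Extract the log filename from the /transcribe JSON response
    and build the matching /stream_log URL."""
    payload = json.loads(transcribe_response_text)
    return f"{base_url}/stream_log/{payload['logfile']}"

# With the response shape shown above:
response_text = '{"meta": "logfile_name", "logfile": "transcription_log_remote_audio_123.log"}'
url = stream_log_url("https://<YOUR_DEPLOYMENT>.nip.io", response_text)
print(url)  # https://<YOUR_DEPLOYMENT>.nip.io/stream_log/transcription_log_remote_audio_123.log
```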
@@ -113,15 +125,15 @@ https://huggingface.co/settings/tokens

## Dependencies

| Package              | Purpose                        |
| -------------------- | ------------------------------ |
| `faster-whisper`     | Core transcription engine      |
| `transformers`       | Summarization via Hugging Face |
| `pyannote.audio`     | Speaker diarization            |
| `pydub`, `librosa`   | Audio chunking and processing  |
| `demucs`             | Vocal separation / denoising   |
| `fastapi`, `uvicorn` | REST API server                |
| `jiwer`              | WER evaluation                 |

---

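The `ground_truth` parameter reports WER through `jiwer`. Conceptually, WER is a word-level Levenshtein distance (substitutions + insertions + deletions) divided by the reference word count; the following self-contained sketch of that computation is illustrative only, not the blueprint's actual code:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance (what `jiwer` reports)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sat"))  # 0.0
print(word_error_rate("the cat sat", "the bat sat"))  # ≈ 0.333 (1 substitution / 3 words)
```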