Commit 09b403d

Merge pull request #12 from aj47/pr-11
Groq provider option
2 parents 6ff4b1d + a8ef98b commit 09b403d

7 files changed, +852 -9 lines changed

README.md

Lines changed: 75 additions & 1 deletion
@@ -79,6 +79,10 @@ transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device insane
 # Mac Apple Silicon accelerated
 transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device mlx
 
+# Groq API (fastest, requires API key)
+export GROQ_API_KEY="your_groq_api_key_here"
+transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device groq
+
 # Advanced options (see Advanced Options section below for full details)
 transcribe-anything video.mp4 --device mlx --batch_size 16 --verbose
 transcribe-anything video.mp4 --device insane --batch-size 8 --flash True
@@ -97,18 +101,27 @@ transcribe_anything(
     device="cuda"
 )
 
+# Using Groq API for fastest transcription
+transcribe_anything(
+    url_or_file="video.mp4",
+    output_dir="output_dir",
+    device="groq",
+    groq_api_key="your_groq_api_key" # or set GROQ_API_KEY env var
+)
+
 # Full function signiture:
 def transcribe(
     url_or_file: str,
     output_dir: Optional[str] = None,
     model: Optional[str] = None, # tiny,small,medium,large
     task: Optional[str] = None, # transcribe or translate
     language: Optional[str] = None, # auto detected if none, "en" for english...
-    device: Optional[str] = None, # cuda,cpu,insane,mlx
+    device: Optional[str] = None, # cuda,cpu,insane,mlx,groq
     embed: bool = False, # Produces a video.mp4 with the subtitles burned in.
     hugging_face_token: Optional[str] = None, # If you want a speaker.json
     other_args: Optional[list[str]] = None, # Other args to be passed to to the whisper backend
     initial_prompt: Optional[str] = None, # Custom prompt for better recognition of specific terms
+    groq_api_key: Optional[str] = None, # Groq API key for speech-to-text (or set GROQ_API_KEY env var)
 ) -> str:
 
 ```
@@ -197,12 +210,73 @@ Mac:
 
 - Use `--device mlx`
 
+# Groq API Integration
+
+For the fastest transcription speeds, you can use Groq's speech-to-text API. This requires a Groq API key but provides near-instant transcription results.
+
+## Setup
+
+1. Get a free API key from [Groq Console](https://console.groq.com/)
+2. Set your API key as an environment variable:
+
+```bash
+export GROQ_API_KEY="your_groq_api_key_here"
+```
+
+Or pass it directly:
+
+```bash
+transcribe-anything video.mp4 --device groq --groq_api_key "your_api_key"
+```
+
+## Supported Models
+
+- `whisper-large-v3` - Best accuracy, multilingual
+- `whisper-large-v3-turbo` - Faster, multilingual (default mapping for most models)
+- `distil-whisper-large-v3-en` - Fastest, English-only
+
+## Features
+
+- **Speed**: Near-instant transcription (189-250x real-time)
+- **File Size**: Automatic chunking for files larger than 90MB
+- **Languages**: Multilingual support with automatic detection
+- **Custom Prompts**: Support for domain-specific vocabulary
+- **Output Formats**: Same SRT, VTT, TXT, and JSON outputs as other backends
+- **Smart Chunking**: Large files are automatically split into chunks and reassembled
+
+## Usage Examples
+
+```bash
+# Basic Groq transcription
+transcribe-anything video.mp4 --device groq
+
+# With custom model
+transcribe-anything audio.wav --device groq --model whisper-large-v3
+
+# With custom prompt for better accuracy
+transcribe-anything meeting.mp3 --device groq --initial_prompt "This is a technical discussion about AI and machine learning"
+
+# Translate to English
+transcribe-anything foreign_audio.mp4 --device groq --task translate
+
+# Large file (will be automatically chunked)
+transcribe-anything large_podcast.mp3 --device groq --model whisper-large-v3-turbo
+```
+
+## Limitations
+
+- Requires internet connection
+- API usage limits apply (see Groq pricing)
+- Large files are automatically chunked (may have slight timing gaps between chunks)
+- Requires `ffmpeg` for audio chunking of large files
+
 # Advanced Options and Backend-Specific Arguments
 
 ## Quick Reference
 
 | Backend | Device Flag | Key Arguments | Best For |
 |---------|-------------|---------------|----------|
+| **Groq API** | `--device groq` | `--groq_api_key`, `--initial_prompt` | Fastest transcription (cloud) |
 | **MLX** | `--device mlx` | `--batch_size`, `--verbose`, `--initial_prompt` | Mac Apple Silicon |
 | **Insanely Fast** | `--device insane` | `--batch-size`, `--hf_token`, `--flash`, `--timestamp` | Windows/Linux GPU |
 | **CPU** | `--device cpu` | Standard whisper args | Universal compatibility |
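
The README text added above says files larger than 90MB are chunked automatically and reassembled. The chunking code itself (`groq_whisper.py`) is not shown on this page, so the following is only a minimal sketch of how chunk boundaries might be planned. `plan_chunks` and its parameters are hypothetical names, and the constant-bitrate assumption is mine, not the author's.

```python
import math

# 90 MB threshold quoted in the README; everything else here is an assumption.
MAX_CHUNK_BYTES = 90 * 1024 * 1024


def plan_chunks(file_size_bytes: int, duration_s: float,
                max_bytes: int = MAX_CHUNK_BYTES) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for each chunk.

    Assumes a constant bitrate, so size is proportional to duration
    (exact for uncompressed WAV, approximate for compressed audio).
    """
    if file_size_bytes <= 0 or duration_s <= 0:
        raise ValueError("file size and duration must be positive")
    # Smallest number of equal-length chunks that each fit under the limit.
    n_chunks = max(1, math.ceil(file_size_bytes / max_bytes))
    chunk_len = duration_s / n_chunks
    return [(i * chunk_len, min((i + 1) * chunk_len, duration_s))
            for i in range(n_chunks)]
```

Each `(start, end)` pair could then be cut with `ffmpeg` (for example with its `-ss` and `-to` options), and the chunk's start time added back to every subtitle timestamp when the pieces are merged. The "slight timing gaps" listed under Limitations would arise at these boundaries.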

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ dependencies = [
     "webvtt-py==0.4.6",
     "uv-iso-env>=1.0.43",
     "python-dotenv>=1.0.1",
+    "groq>=0.11.0",
 ]
 # VERSION
 version = "3.2.0" # Update this manually or configure setuptools-scm for automatic versioning

src/transcribe_anything/_cmd.py

Lines changed: 8 additions & 2 deletions
@@ -100,12 +100,12 @@ def parse_arguments() -> argparse.Namespace:
         default=None,
         choices=[None] + whisper_options["language"],
     )
-    choices = [None, "cpu", "cuda", "insane"]
+    choices = [None, "cpu", "cuda", "insane", "groq"]
     if platform.system() == "Darwin":
         choices.extend(["mlx", "mps"]) # mps for backward compatibility
     parser.add_argument(
         "--device",
-        help="device to use for processing, None will auto select CUDA if available or else CPU",
+        help="device to use for processing, None will auto select CUDA if available or else CPU. Use 'groq' for Groq API",
         default=None,
         choices=choices,
     )
@@ -119,6 +119,11 @@ def parse_arguments() -> argparse.Namespace:
         help="save huggingface token to a file for future use",
         action="store_true",
     )
+    parser.add_argument(
+        "--groq_api_key",
+        help="Groq API key for speech-to-text (can also be set via GROQ_API_KEY environment variable)",
+        default=None,
+    )
     parser.add_argument(
         "--diarization_model",
         help=("Name of the pretrained model/ checkpoint to perform diarization." + " (default: pyannote/speaker-diarization). Only works for --device insane."),
@@ -254,6 +259,7 @@ def main() -> int:
             embed=args.embed,
             hugging_face_token=args.hf_token,
             other_args=unknown,
+            groq_api_key=args.groq_api_key,
         )
     except KeyboardInterrupt:
         print("KeyboardInterrupt")
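
The `--groq_api_key` flag defaults to `None`, and its help text says the key can also come from the `GROQ_API_KEY` environment variable. How the backend chooses between the two is not visible in this diff. A minimal sketch of one plausible resolution order follows (explicit argument first, then the environment). `resolve_groq_api_key` is a hypothetical helper, and the precedence and error message are assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_groq_api_key(explicit_key: Optional[str] = None,
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the API key: explicit argument first, then GROQ_API_KEY.

    The precedence is an assumption; the commit only states that both
    sources are accepted.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("GROQ_API_KEY")
    if not key:
        raise ValueError(
            "No Groq API key: pass --groq_api_key or set GROQ_API_KEY"
        )
    return key
```

Taking `env` as a parameter keeps the helper testable without touching the real process environment.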

src/transcribe_anything/api.py

Lines changed: 23 additions & 2 deletions
@@ -23,6 +23,7 @@
 from appdirs import user_config_dir # type: ignore
 
 from transcribe_anything.audio import fetch_audio
+from transcribe_anything.groq_whisper import run_groq_whisper
 from transcribe_anything.insanely_fast_whisper import run_insanely_fast_whisper
 from transcribe_anything.logger import log_error
 from transcribe_anything.util import chop_double_extension, sanitize_filename
@@ -53,6 +54,7 @@ class Device(Enum):
     CUDA = "cuda"
     INSANE = "insane"
     MLX = "mlx"
+    GROQ = "groq"
 
     def __str__(self) -> str:
         return self.value
@@ -73,6 +75,8 @@ def from_str(device: str) -> "Device":
             if sys.platform != "darwin":
                 raise ValueError("MLX is only supported on macOS.")
             return Device.MLX
+        if device == "groq":
+            return Device.GROQ
         # Backward compatibility: accept 'mps' as alias for 'mlx'
         if device == "mps":
             if sys.platform != "darwin":
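
The two hunks above add a `GROQ` member and a matching branch in `from_str`. Unlike `mlx`, the new branch has no platform check. A self-contained sketch of that parsing logic follows. It is a reconstruction for illustration, not the file's actual code: the `CPU` member, the standalone `device_from_str` function, and its `platform` parameter are assumptions.

```python
import sys
from enum import Enum


class Device(Enum):
    CPU = "cpu"  # assumed member; not visible in the hunk
    CUDA = "cuda"
    INSANE = "insane"
    MLX = "mlx"
    GROQ = "groq"

    def __str__(self) -> str:
        return self.value


def device_from_str(device: str, platform: str = sys.platform) -> Device:
    # 'mps' is accepted as a backward-compatible alias for 'mlx'.
    name = "mlx" if device == "mps" else device
    # Only MLX is platform-gated; 'groq' is a cloud API and works anywhere.
    if name == "mlx" and platform != "darwin":
        raise ValueError("MLX is only supported on macOS.")
    try:
        return Device(name)
    except ValueError:
        raise ValueError(f"Unknown device {device}") from None
```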
@@ -174,6 +178,7 @@ def transcribe(
     hugging_face_token: Optional[str] = None,
     other_args: Optional[list[str]] = None,
     initial_prompt: Optional[str] = None,
+    groq_api_key: Optional[str] = None,
 ) -> str:
     """
     Runs the transcription program.
@@ -184,13 +189,14 @@ def transcribe(
         model: Whisper model to use (tiny, small, medium, large, etc.)
         task: Task to perform (transcribe or translate)
         language: Language of the audio (auto-detected if None)
-        device: Device to use (cuda, cpu, insane, mlx)
+        device: Device to use (cuda, cpu, insane, mlx, groq)
         embed: Whether to embed subtitles into video file
         hugging_face_token: Token for speaker diarization
         other_args: Additional arguments to pass to Whisper backend
         initial_prompt: Initial prompt to provide context for transcription.
                         Useful for custom vocabulary, names, or domain-specific terms.
                         Example: "The speaker discusses AI, machine learning, and neural networks."
+        groq_api_key: API key for Groq speech-to-text service (can also be set via GROQ_API_KEY env var)
 
     Returns:
         Path to the output directory containing transcription files
@@ -244,6 +250,10 @@ def transcribe(
         print("#####################################")
         print("####### MAC MLX GPU MODE! ###########")
         print("#####################################")
+    elif device_enum == Device.GROQ:
+        print("#####################################")
+        print("####### GROQ API MODE! ###############")
+        print("#####################################")
     else:
         raise ValueError(f"Unknown device {device}")
     print(f"Using device {device}")
@@ -260,7 +270,18 @@ def transcribe(
 
     print(f"Running whisper on {tmp_wav} (will install models on first run)")
     with tempfile.TemporaryDirectory() as tmpdir:
-        if device_enum == Device.INSANE:
+        if device_enum == Device.GROQ:
+            run_groq_whisper(
+                input_wav=Path(tmp_wav),
+                model=model_str,
+                output_dir=Path(tmpdir),
+                task=task_str,
+                language=language_str,
+                api_key=groq_api_key,
+                initial_prompt=initial_prompt,
+                other_args=other_args,
+            )
+        elif device_enum == Device.INSANE:
             run_insanely_fast_whisper(
                 input_wav=Path(tmp_wav),
                 model=model_str,
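
The new branch forwards `model_str` to `run_groq_whisper` unchanged, while the README states that `whisper-large-v3-turbo` is the "default mapping for most models". The mapping itself lives in `groq_whisper.py`, which this page does not show. The sketch below is therefore a guess at its shape. `to_groq_model`, the `large-v3` alias, and the pass-through rule are all hypothetical.

```python
from typing import Optional

# Model ids listed under "Supported Models" in the README.
GROQ_MODELS = {
    "whisper-large-v3",
    "whisper-large-v3-turbo",
    "distil-whisper-large-v3-en",
}
DEFAULT_GROQ_MODEL = "whisper-large-v3-turbo"


def to_groq_model(model: Optional[str]) -> str:
    """Map a local Whisper model name to a Groq model id (assumed rules)."""
    if model is not None and model in GROQ_MODELS:
        return model  # already a Groq id, pass through
    if model == "large-v3":  # assumed alias for the most accurate model
        return "whisper-large-v3"
    # tiny, small, medium, large, None, ... fall back to the default
    return DEFAULT_GROQ_MODEL
```

Under these assumptions, `--model tiny` and an omitted `--model` both resolve to `whisper-large-v3-turbo`, which matches the README's description of the default.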
