Commit d4f37ee

Merge pull request #35 from vcon-dev/groq_whisper
Add Groq Whisper integration for audio transcription
2 parents 1404092 + 51c2f37 commit d4f37ee

6 files changed (+693, -19 lines changed)

.env.example

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,3 @@
-
 REDIS_URL=redis://redis
 
 # Leave this blank to disable API security
@@ -9,3 +8,6 @@ CONSERVER_API_TOKEN=
 # modify the values in config.yml as needed
 # and set CONSERVER_CONFIG_FILE to ./config.yml below
 CONSERVER_CONFIG_FILE=
+
+# Groq API key for Whisper transcription
+GROQ_API_KEY=your_groq_api_key_here

poetry.lock

Lines changed: 38 additions & 18 deletions
Generated file; diff not rendered by default.

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ slack-sdk = "^3.27.1"
 boto3 = "^1.34.52"
 deepgram-sdk = "^3.1.5"
 openai = ">=1.54.3"
+groq = "^0.4.0"
 psycopg2-binary = "^2.9.9"
 pymongo = "^4.6.2"
 elasticsearch = "^8.13.1"
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# Groq Whisper Link

A vCon-server link that automatically transcribes audio content using Groq's implementation of Whisper ASR.

## Overview

This link processes vCon objects containing audio recordings and transcribes them using Groq's Whisper API. The transcription results are added back to the vCon as analysis entries.

## Requirements

- Python 3.12+
- A valid Groq API key
- The `groq` Python package

## Installation

1. Install the required dependencies:

   ```bash
   poetry add groq
   ```

2. Set your Groq API key in the environment:

   ```bash
   export GROQ_API_KEY=your_groq_api_key_here
   ```

   Alternatively, you can add the API key to your `.env` file:

   ```
   GROQ_API_KEY=your_groq_api_key_here
   ```

## Configuration

The link accepts the following configuration options:

| Option | Description | Default |
|--------|-------------|---------|
| `API_KEY` | Groq API key for authentication | From the `GROQ_API_KEY` environment variable |
| `minimum_duration` | Minimum duration (in seconds) of audio to transcribe | 30 |

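As a minimal sketch of how these options might be resolved, the snippet below overlays per-call overrides on the documented defaults. The names `DEFAULT_OPTIONS` and `resolve_options` are hypothetical and are not taken from the link's source.

```python
import os

# Hypothetical defaults mirroring the table above; not the link's actual code.
DEFAULT_OPTIONS = {
    "API_KEY": os.environ.get("GROQ_API_KEY"),
    "minimum_duration": 30,
}


def resolve_options(opts=None):
    """Return the documented defaults overlaid with caller-supplied overrides."""
    merged = dict(DEFAULT_OPTIONS)  # copy so the defaults are never mutated
    merged.update(opts or {})
    return merged
```

Calling `resolve_options({"minimum_duration": 60})` keeps the default `API_KEY` and changes only the duration threshold.
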
## Usage

To use this link in a vCon processing chain:

```python
from server.links.groq_whisper import run

result = run(
    vcon_uuid="your-vcon-uuid",
    link_name="groq_whisper",
    opts={
        "minimum_duration": 60  # Optional override
    }
)
```

## How It Works

1. The link retrieves the vCon object from Redis
2. For each recording dialog in the vCon:
   - Skips dialogs shorter than the minimum duration
   - Skips dialogs that already have a transcript
   - Extracts audio content (from inline base64 or external URL)
   - Sends the audio to Groq's Whisper API for transcription
   - Adds transcription results as a new analysis entry
3. Stores the updated vCon back to Redis

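The filtering and extraction in step 2 can be sketched roughly as follows. This is an illustration over a plain-dict vCon, not the link's actual implementation: the field names (`dialog`, `analysis`, `type`, `duration`, `body`) are assumed from common vCon conventions, and both helper names are hypothetical.

```python
import base64


def dialogs_to_transcribe(vcon, minimum_duration=30):
    """Yield (index, dialog) pairs that would be sent for transcription."""
    # Indexes of dialogs that already have a transcript analysis entry.
    transcribed = {
        entry.get("dialog")
        for entry in vcon.get("analysis", [])
        if entry.get("type") == "transcript"
    }
    for index, dialog in enumerate(vcon.get("dialog", [])):
        if dialog.get("type") != "recording":
            continue  # only recordings carry audio
        if dialog.get("duration", 0) < minimum_duration:
            continue  # too short to be worth transcribing
        if index in transcribed:
            continue  # already has a transcript
        yield index, dialog


def inline_audio_bytes(dialog):
    """Decode inline base64 audio, or return None if the audio is external."""
    body = dialog.get("body")
    if body is None:
        return None  # caller would fetch the dialog's URL instead
    padded = body + "=" * (-len(body) % 4)  # restore any stripped padding
    return base64.urlsafe_b64decode(padded)
```

Fetching audio from an external URL and the call to Groq are left out on purpose, since both depend on details the README does not specify.
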
## Testing

To run the tests:

```bash
# Set a dummy API key for testing
export GROQ_API_KEY=test_api_key_for_testing

# Run the tests
pytest server/links/groq_whisper/test_groq_whisper.py -v
```

## Response Format

The Groq Whisper API returns transcription results in the following format:

```json
{
  "text": "The complete transcription text.",
  "chunks": [
    {
      "text": "Chunk of transcription",
      "timestamp": [0.0, 5.0]
    },
    {
      "text": "Another chunk",
      "timestamp": [5.1, 10.0]
    }
  ],
  "language": "en"
}
```

This response is stored in the vCon's analysis section as a transcript entry.
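
One way such an entry might be assembled is sketched below. The field names (`type`, `dialog`, `vendor`, `body`, `encoding`) are assumed from common vCon analysis conventions, and the helper name and vendor string are hypothetical. The link's actual entry may differ.

```python
def build_transcript_analysis(dialog_index, response):
    """Wrap a transcription response as a transcript analysis entry (sketch)."""
    return {
        "type": "transcript",
        "dialog": dialog_index,    # index of the dialog that was transcribed
        "vendor": "groq_whisper",  # hypothetical vendor label
        "body": response,          # the response dict, stored unchanged
        "encoding": "none",
    }


# Example using the response shape documented above.
response = {
    "text": "The complete transcription text.",
    "chunks": [{"text": "Chunk of transcription", "timestamp": [0.0, 5.0]}],
    "language": "en",
}
entry = build_transcript_analysis(0, response)
```
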
