
Commit 690c376

feat: Integrate AssemblyAI for transcription services
- Add `ASSEMBLYAI_API_KEY` to the config options
- Update usage instructions for the `transcribe-me` tool
- Change `.env.example` to `.env.dev` in the `init` target in the Makefile
- Add a flag to use AssemblyAI for transcription in `.transcribe.yaml`
- Include features related to AssemblyAI outputs and transcription in README.md
1 parent 797ec15 · commit 690c376

File tree: 9 files changed (+217, -73 lines)


.env.dev

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+TWINE_PASSWORD=pypi_api_token
+GITHUB_TOKEN=github_api_token
+GITHUB_REPOSITORY=echohello-dev/transcribe-me
+GITHUB_ACTOR=echohello-dev

.env.example

Lines changed: 3 additions & 4 deletions
@@ -1,4 +1,3 @@
-TWINE_PASSWORD=pypi_api_token
-GITHUB_TOKEN=github_api_token
-GITHUB_REPOSITORY=echohello-dev/transcribe-me
-GITHUB_ACTOR=echohello-dev
+OPENAI_API_KEY=your_openai_api_key_here
+ANTHROPIC_API_KEY=your_anthropic_api_key_here
+ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
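
The new `ASSEMBLYAI_API_KEY` entry follows the same pattern as the existing keys. This diff doesn't show how the keys are wired into the clients inside `transcribe_me`, so the following is only a minimal sketch, assuming the keys are exported as environment variables as the README describes and that the AssemblyAI SDK is configured via its module-level settings:

```python
import os

import assemblyai as aai

# Provider keys documented in .env.example; `transcribe-me install` prompts
# for any that are missing from the environment.
openai_key = os.getenv("OPENAI_API_KEY")
anthropic_key = os.getenv("ANTHROPIC_API_KEY")
assemblyai_key = os.getenv("ASSEMBLYAI_API_KEY")

# The AssemblyAI SDK reads its key from module-level settings, which is what
# allows aai.Transcriber() to be constructed with no arguments later on.
if assemblyai_key:
    aai.settings.api_key = assemblyai_key
```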

.transcribe.yaml

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+use_assemblyai: true
+
 openai:
   models:
     - temperature: 0.1
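
The new flag is a plain top-level boolean next to the existing provider sections. A minimal sketch of how it would be read, assuming PyYAML and an illustrative `load_config` helper (the actual loader in `transcribe_me` may differ); it mirrors the `config.get("use_assemblyai", False)` lookup added to `transcription.py` below:

```python
import yaml  # assumption: PyYAML is available for parsing .transcribe.yaml


def load_config(path: str = ".transcribe.yaml") -> dict:
    """Parse the transcription config file into a plain dict."""
    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


config = load_config()
# Same default as transcription.py: a missing or false flag keeps the OpenAI Whisper path.
use_assemblyai = config.get("use_assemblyai", False)
```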

Makefile

Lines changed: 4 additions & 4 deletions
@@ -8,7 +8,7 @@ VERSION ?= $(shell git describe --tags --always)
 export
 
 init:
-	cp .env.example .env
+	cp .env.dev .env
 
 check-ffmpeg:
 ifeq (, $(shell which ffmpeg))
@@ -79,13 +79,13 @@ else
 	docker compose build --push
 endif
 
-transcribe: install
+transcribe:
 	$(VENV) python -m transcribe_me.main
 
-transcribe-archive: install
+transcribe-archive:
 	$(VENV) python -m transcribe_me.main archive
 
-transcribe-install: install
+transcribe-install:
 	$(VENV) python -m transcribe_me.main install
 
 release-version:

README.md

Lines changed: 31 additions & 21 deletions
@@ -4,28 +4,33 @@
 
 [![Build](https://github.com/echohello-dev/transcribe-me/actions/workflows/build.yaml/badge.svg)](https://github.com/echohello-dev/transcribe-me/actions/workflows/build.yaml)
 
-Transcribe Me is a CLI-driven Python application that transcribes audio files using the OpenAI Whisper API and generates summaries of the transcriptions using both OpenAI's GPT-4 and Anthropic's Claude models.
+Transcribe Me is a CLI-driven Python application that transcribes audio files using either the OpenAI Whisper API or AssemblyAI, and generates summaries of the transcriptions using OpenAI's GPT-4 and Anthropic's Claude models.
 
 ```mermaid
 graph TD
 A[Load Config] --> B[Get Audio Files]
 B --> C{Audio File Exists?}
-C --Yes--> D[Transcribe Audio File]
-D --> E[Generate Summaries]
-E --> F[Save Transcription]
-F --> G[Save Summaries]
-G --> H[Clean Up Temporary Files]
-H --> B
-C --No--> I[Print Warning]
-I --> B
+C --Yes--> D{Use AssemblyAI?}
+D --Yes--> E[Transcribe with AssemblyAI]
+D --No--> F[Transcribe with OpenAI]
+E --> G[Generate Additional Outputs]
+F --> H[Generate Summaries]
+G --> I[Save Transcription and Outputs]
+H --> J[Save Transcription and Summaries]
+I --> K[Clean Up Temporary Files]
+J --> K
+K --> B
+C --No--> L[Print Warning]
+L --> B
 ```
 
 ## :key: Key Features
 
-- **Audio Transcription**: Transcribes audio files using the OpenAI Whisper API. It supports both MP3 and M4A formats and can handle large files by splitting them into smaller chunks for transcription.
-- **Summary Generation**: Generates summaries of the transcriptions using both OpenAI's GPT-4 and Anthropic's Claude models. The summaries are saved in Markdown format and include key points in bold and a "Next Steps" section.
+- **Audio Transcription**: Transcribes audio files using either the OpenAI Whisper API or AssemblyAI. It supports both MP3 and M4A formats.
+- **Summary Generation**: Generates summaries of the transcriptions using both OpenAI's GPT-4 and Anthropic's Claude models when using OpenAI for transcription.
+- **AssemblyAI Features**: When using AssemblyAI, provides additional outputs including Speaker Diarization, Summary, Sentiment Analysis, Key Phrases, and Topic Detection.
 - **Configurable Models**: Supports multiple models for OpenAI and Anthropic, with configurable temperature, max_tokens, and system prompts.
-- **Supports Audio Files**: Supports audio files `.m4a` and `.mp3` formats.
+- **Supports Audio Files**: Supports audio files in `.m4a` and `.mp3` formats.
 - **Supports Docker**: Can be run in a Docker container for easy deployment and reproducibility.
 
 ## :package: Installation
@@ -65,11 +70,12 @@ This has been tested with macOS, your mileage may vary on other operating system
 transcribe-me install
 ```
 
-This command will also prompt you to enter your API keys for OpenAI and Anthropic if they are not already provided in environment variables. You can also set the API keys in environment variables:
+This command will prompt you to enter your API keys for OpenAI, Anthropic, and AssemblyAI if they are not already provided in environment variables. You can also set the API keys in environment variables:
 
 ```bash
 export OPENAI_API_KEY=your_api_key
 export ANTHROPIC_API_KEY=your_api_key
+export ASSEMBLYAI_API_KEY=your_api_key
 ```
 
 2. Place your audio files in the `input` directory (or any other directory specified in the configuration).
@@ -117,6 +123,7 @@ You can also run the application using Docker:
   --rm \
   -e OPENAI_API_KEY \
   -e ANTHROPIC_API_KEY \
+  -e ASSEMBLYAI_API_KEY \
   -v $(pwd)/archive:/app/archive \
   -v $(pwd)/input:/app/input \
   -v $(pwd)/output:/app/output \
@@ -136,6 +143,7 @@ You can also run the application using Docker:
 environment:
   - OPENAI_API_KEY
   - ANTHROPIC_API_KEY
+  - ASSEMBLYAI_API_KEY
 volumes:
   - ./input:/app/input
   - ./output:/app/output
@@ -151,7 +159,7 @@ You can also run the application using Docker:
 
 This command mounts the `input`, `output`, `archive`, and `.transcribe.yaml` configuration file into the Docker container. See [`compose.example.yaml`](./compose.example.yaml) for an example configuration.
 
-Make sure to replace `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` with your actual API keys. Also make sure to create the `.transcribe.yaml` configuration file in the same directory as the `docker-compose.yml` file.
+Make sure to replace `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `ASSEMBLYAI_API_KEY` with your actual API keys. Also make sure to create the `.transcribe.yaml` configuration file in the same directory as the `docker-compose.yml` file.
 
 ## :rocket: How it Works
 
@@ -160,21 +168,23 @@ The Transcribe Me application follows a straightforward workflow:
 1. **Load Configuration**: The application loads the configuration from the `.transcribe.yaml` file, which includes settings for input/output directories, models, and their configurations.
 2. **Get Audio Files**: The application gets a list of audio files from the input directory specified in the configuration.
 3. **Check Existing Transcriptions**: For each audio file, the application checks if there is an existing transcription file. If a transcription file exists, it skips to the next audio file.
-4. **Transcribe Audio File**: If no transcription file exists, the application transcribes the audio file using the OpenAI Whisper API. It splits the audio file into smaller chunks for efficient transcription.
-5. **Generate Summaries**: After transcription, the application generates summaries of the transcription using the configured models (OpenAI GPT-4 and Anthropic Claude).
-6. **Save Transcription and Summaries**: The application saves the transcription to a text file and the summaries from each configured model to separate Markdown files in the output directory.
+4. **Transcribe Audio File**: If no transcription file exists, the application transcribes the audio file using either the OpenAI Whisper API or AssemblyAI, based on the configuration.
+5. **Generate Outputs**:
+   - For OpenAI: The application generates summaries of the transcription using the configured models (OpenAI GPT-4 and Anthropic Claude).
+   - For AssemblyAI: The application generates additional outputs including Speaker Diarization, Summary, Sentiment Analysis, Key Phrases, and Topic Detection.
+6. **Save Transcription and Outputs**: The application saves the transcription and all generated outputs to separate files in the output directory.
 7. **Clean Up Temporary Files**: The application removes any temporary files generated during the transcription process.
 8. **Repeat**: The process repeats for each audio file in the input directory.
 
 ## :gear: Configuration
 
 The application uses a configuration file (`.transcribe.yaml`) to specify settings such as input/output directories, API keys, models, and their configurations. The configuration file is created automatically when you run the `transcribe-me install` command.
 
-> `max_tokens` is the maximum number of tokens to generate in the summary. The default is dynamic based on the model.
-
 Here is an example configuration file:
 
 ```yaml
+use_assemblyai: false # Set to true to use AssemblyAI instead of OpenAI for transcription
+
 openai:
   models:
     - temperature: 0.1
@@ -226,7 +236,7 @@ output_folder: output
 make install
 ```
 
-3. Run the `transcribe-me install` command to create the `.transcribe.yaml` configuration file and provide your API keys for OpenAI and Anthropic:
+3. Run the `transcribe-me install` command to create the `.transcribe.yaml` configuration file and provide your API keys for OpenAI, Anthropic, and AssemblyAI:
 
 ```bash
 make transcribe-install
@@ -277,4 +287,4 @@ To release a new version:
 
 ## Star History
 
-[![Star History Chart](https://api.star-history.com/svg?repos=echohello-dev/transcribe-me&type=Date)](https://star-history.com/#echohello-dev/transcribe-me&Date)
+[![Star History Chart](https://api.star-history.com/svg?repos=echohello-dev/transcribe-me&type=Date)](https://star-history.com/#echohello-dev/transcribe-me&Date)

requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -2,7 +2,9 @@ annotated-types==0.6.0
 anthropic==0.21.3
 anyio==4.4.0
 argcomplete==3.2.3
+assemblyai==0.34.0
 astroid==3.1.0
+autopep8==2.3.1
 black==24.4.0
 build==1.2.1
 certifi==2024.7.4
@@ -72,5 +74,6 @@ twine==5.0.0
 typing_extensions==4.10.0
 urllib3==2.2.1
 wcwidth==0.2.13
+websockets==13.1
 yamale==5.1.0
 zipp==3.19.1

transcribe_me/audio/transcription.py

Lines changed: 94 additions & 7 deletions
@@ -2,6 +2,7 @@
 from glob import glob
 from typing import Dict, Any
 import openai
+import assemblyai as aai
 from tqdm import tqdm
 from colorama import Fore
 from tenacity import retry, wait_exponential, stop_after_attempt
@@ -29,20 +30,33 @@ def transcribe_chunk(file_path: str) -> str:
         raise e
 
 
-def transcribe_audio(file_path: str, output_path: str) -> None:
+def transcribe_audio(file_path: str, output_path: str, config: Dict[str, Any]) -> None:
     """
-    Transcribe an audio file using the OpenAI Whisper API.
+    Transcribe an audio file using either OpenAI Whisper API or AssemblyAI.
 
     Args:
         file_path (str): Path to the audio file to transcribe.
         output_path (str): Path to the output file for the transcription.
+        config (Dict[str, Any]): Configuration dictionary.
+    """
+    use_assemblyai = config.get("use_assemblyai", False)
+
+    if use_assemblyai:
+        transcribe_with_assemblyai(file_path, output_path, config)
+    else:
+        transcribe_with_openai(file_path, output_path)
+
+
+def transcribe_with_openai(file_path: str, output_path: str) -> None:
+    """
+    Transcribe an audio file using the OpenAI Whisper API.
     """
     chunk_files = split_audio(file_path)
     full_transcription = ""
 
     progress_bar = tqdm(
         chunk_files,
-        desc="Transcribing",
+        desc="Transcribing with OpenAI",
         unit="chunk",
         bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt}",
     )
@@ -61,6 +75,78 @@ def transcribe_audio(file_path: str, output_path: str) -> None:
         file.write(full_transcription)
 
 
+def transcribe_with_assemblyai(
+    file_path: str, output_path: str, config: Dict[str, Any]
+) -> None:
+    """
+    Transcribe an audio file using AssemblyAI.
+    """
+    transcription_config = aai.TranscriptionConfig(
+        speaker_labels=True,
+        summarization=True,
+        sentiment_analysis=True,
+        auto_highlights=True,
+        iab_categories=True,
+    )
+    transcriber = aai.Transcriber()
+
+    transcript = transcriber.transcribe(file_path, config=transcription_config)
+
+    # Write transcription to file
+    with open(output_path, "w", encoding="utf-8") as file:
+        file.write(transcript.text)
+
+    # Write additional information to separate files
+    base_name = os.path.splitext(output_path)[0]
+
+    # Speaker Diarization
+    with open(f"{base_name}_speakers.txt", "w", encoding="utf-8") as file:
+        for utterance in transcript.utterances:
+            file.write(f"Speaker {utterance.speaker}: {utterance.text}\n")
+
+    # Auto Highlights
+    with open(f"{base_name}_auto_highlights.txt", "w", encoding="utf-8") as file:
+        for highlight in transcript.auto_highlights_result.results:
+            file.write(f"{highlight.text}\n")
+
+    # Summary
+    with open(f"{base_name}_summary.txt", "w", encoding="utf-8") as file:
+        file.write(transcript.summary)
+
+    # Sentiment Analysis
+    if transcript.sentiment_analysis:
+        with open(f"{base_name}_sentiment.txt", "w", encoding="utf-8") as file:
+            for result in transcript.sentiment_analysis:
+                file.write(f"Text: {result.text}\n")
+                file.write(f"Sentiment: {result.sentiment}\n")
+                file.write(f"Confidence: {result.confidence}\n")
+                file.write(f"Timestamp: {result.start} - {result.end}\n\n")
+
+    # Key Phrases
+    with open(f"{base_name}_key_phrases.txt", "w", encoding="utf-8") as file:
+        for phrase in transcript.auto_highlights_result.results:
+            file.write(f"{phrase.text}\n")
+
+    # Topic Detection
+    if transcript.iab_categories:
+        with open(f"{base_name}_topics.txt", "w", encoding="utf-8") as file:
+            # Detailed results
+            file.write("Detailed Topic Results:\n")
+            for result in transcript.iab_categories.results:
+                file.write(f"Text: {result.text}\n")
+                file.write(
+                    f"Timestamp: {result.timestamp.start} - {result.timestamp.end}\n"
+                )
+                for label in result.labels:
+                    file.write(f" {label.label} (Relevance: {label.relevance})\n")
+                file.write("\n")
+
+            # Summary of all topics
+            file.write("\nTopic Summary:\n")
+            for topic, relevance in transcript.iab_categories.summary.items():
+                file.write(f"Audio is {relevance * 100:.2f}% relevant to {topic}\n")
+
+
 def process_audio_files(
     input_folder: str, output_folder: str, config: Dict[str, Any]
 ) -> None:
@@ -84,11 +170,12 @@ def process_audio_files(
         try:
             if not os.path.exists(output_file):
                 print(f"{Fore.BLUE}Transcribing audio file: {file_path}\n")
-                transcribe_audio(file_path, output_file)
+                transcribe_audio(file_path, output_file, config)
         except Exception as e:
             print(f"{Fore.RED}An error occurred while processing {file_path}: {e}")
             raise e
         finally:
-            # Delete the _part* MP3 files
-            for file in glob(f"{file_path.partition('.')[0]}_part*.mp3"):
-                os.remove(file)
+            # Delete the _part* MP3 files if using OpenAI
+            if not config.get("use_assemblyai", False):
+                for file in glob(f"{file_path.partition('.')[0]}_part*.mp3"):
+                    os.remove(file)
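
With the new signature, callers only need to thread the parsed config dict through; below is a hedged usage sketch (the file names are illustrative, and the config shown is trimmed to the flag rather than the full `.transcribe.yaml` contents):

```python
from transcribe_me.audio.transcription import process_audio_files, transcribe_audio

# Route every audio file in a folder through the backend chosen by the flag.
config = {"use_assemblyai": True}
process_audio_files("input", "output", config)

# Or transcribe a single file; with the flag absent or false, the chunked
# OpenAI Whisper path is used and the temporary _part*.mp3 chunks are cleaned up.
transcribe_audio("input/meeting.m4a", "output/meeting.txt", {"use_assemblyai": False})
```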
