feat: Integrate AssemblyAI for transcription services
- Add `ASSEMBLYAI_API_KEY` to the config options
- Update usage instructions for the `transcribe-me` tool
- Changed `.env.example` to `.env.dev` in the `init` target in Makefile
- Add a flag to use AssemblyAI for transcription in `.transcribe.yaml`
- Include features related to AssemblyAI outputs and transcription in README.md
-Transcribe Me is a CLI-driven Python application that transcribes audio files using the OpenAI Whisper API and generates summaries of the transcriptions using both OpenAI's GPT-4 and Anthropic's Claude models.
+Transcribe Me is a CLI-driven Python application that transcribes audio files using either the OpenAI Whisper API or AssemblyAI, and generates summaries of the transcriptions using OpenAI's GPT-4 and Anthropic's Claude models.

 ```mermaid
 graph TD
 A[Load Config] --> B[Get Audio Files]
 B --> C{Audio File Exists?}
-C --Yes--> D[Transcribe Audio File]
-D --> E[Generate Summaries]
-E --> F[Save Transcription]
-F --> G[Save Summaries]
-G --> H[Clean Up Temporary Files]
-H --> B
-C --No--> I[Print Warning]
-I --> B
+C --Yes--> D{Use AssemblyAI?}
+D --Yes--> E[Transcribe with AssemblyAI]
+D --No--> F[Transcribe with OpenAI]
+E --> G[Generate Additional Outputs]
+F --> H[Generate Summaries]
+G --> I[Save Transcription and Outputs]
+H --> J[Save Transcription and Summaries]
+I --> K[Clean Up Temporary Files]
+J --> K
+K --> B
+C --No--> L[Print Warning]
+L --> B
 ```

 ## :key: Key Features

-- **Audio Transcription**: Transcribes audio files using the OpenAI Whisper API. It supports both MP3 and M4A formats and can handle large files by splitting them into smaller chunks for transcription.
-- **Summary Generation**: Generates summaries of the transcriptions using both OpenAI's GPT-4 and Anthropic's Claude models. The summaries are saved in Markdown format and include key points in bold and a "Next Steps" section.
+- **Audio Transcription**: Transcribes audio files using either the OpenAI Whisper API or AssemblyAI. It supports both MP3 and M4A formats.
+- **Summary Generation**: Generates summaries of the transcriptions using both OpenAI's GPT-4 and Anthropic's Claude models when using OpenAI for transcription.
+- **AssemblyAI Features**: When using AssemblyAI, provides additional outputs including Speaker Diarization, Summary, Sentiment Analysis, Key Phrases, and Topic Detection.
 - **Configurable Models**: Supports multiple models for OpenAI and Anthropic, with configurable temperature, max_tokens, and system prompts.
-- **Supports Audio Files**: Supports audio files `.m4a` and `.mp3` formats.
+- **Supports Audio Files**: Supports audio files in `.m4a` and `.mp3` formats.
 - **Supports Docker**: Can be run in a Docker container for easy deployment and reproducibility.

 ## :package: Installation
@@ -65,11 +70,12 @@ This has been tested with macOS, your mileage may vary on other operating system
 transcribe-me install
 ```

-This command will also prompt you to enter your API keys for OpenAI and Anthropic if they are not already provided in environment variables. You can also set the API keys in environment variables:
+This command will prompt you to enter your API keys for OpenAI, Anthropic, and AssemblyAI if they are not already provided in environment variables. You can also set the API keys in environment variables:

 ```bash
 export OPENAI_API_KEY=your_api_key
 export ANTHROPIC_API_KEY=your_api_key
+export ASSEMBLYAI_API_KEY=your_api_key
 ```

 2. Place your audio files in the `input` directory (or any other directory specified in the configuration).
@@ -117,6 +123,7 @@ You can also run the application using Docker:
 --rm \
 -e OPENAI_API_KEY \
 -e ANTHROPIC_API_KEY \
+-e ASSEMBLYAI_API_KEY \
 -v $(pwd)/archive:/app/archive \
 -v $(pwd)/input:/app/input \
 -v $(pwd)/output:/app/output \
@@ -136,6 +143,7 @@ You can also run the application using Docker:
 environment:
 - OPENAI_API_KEY
 - ANTHROPIC_API_KEY
+- ASSEMBLYAI_API_KEY
 volumes:
 - ./input:/app/input
 - ./output:/app/output
@@ -151,7 +159,7 @@ You can also run the application using Docker:

 This command mounts the `input`, `output`, `archive`, and `.transcribe.yaml` configuration file into the Docker container. See [`compose.example.yaml`](./compose.example.yaml) for an example configuration.

-Make sure to replace `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` with your actual API keys. Also make sure to create the `.transcribe.yaml` configuration file in the same directory as the `docker-compose.yml` file.
+Make sure to replace `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `ASSEMBLYAI_API_KEY` with your actual API keys. Also make sure to create the `.transcribe.yaml` configuration file in the same directory as the `docker-compose.yml` file.

 ## :rocket: How it Works

@@ -160,21 +168,23 @@ The Transcribe Me application follows a straightforward workflow:
 1. **Load Configuration**: The application loads the configuration from the `.transcribe.yaml` file, which includes settings for input/output directories, models, and their configurations.
 2. **Get Audio Files**: The application gets a list of audio files from the input directory specified in the configuration.
 3. **Check Existing Transcriptions**: For each audio file, the application checks if there is an existing transcription file. If a transcription file exists, it skips to the next audio file.
-4. **Transcribe Audio File**: If no transcription file exists, the application transcribes the audio file using the OpenAI Whisper API. It splits the audio file into smaller chunks for efficient transcription.
-5. **Generate Summaries**: After transcription, the application generates summaries of the transcription using the configured models (OpenAI GPT-4 and Anthropic Claude).
-6. **Save Transcription and Summaries**: The application saves the transcription to a text file and the summaries from each configured model to separate Markdown files in the output directory.
+4. **Transcribe Audio File**: If no transcription file exists, the application transcribes the audio file using either the OpenAI Whisper API or AssemblyAI, based on the configuration.
+5. **Generate Outputs**:
+   - For OpenAI: The application generates summaries of the transcription using the configured models (OpenAI GPT-4 and Anthropic Claude).
+   - For AssemblyAI: The application generates additional outputs including Speaker Diarization, Summary, Sentiment Analysis, Key Phrases, and Topic Detection.
+6. **Save Transcription and Outputs**: The application saves the transcription and all generated outputs to separate files in the output directory.
 7. **Clean Up Temporary Files**: The application removes any temporary files generated during the transcription process.
 8. **Repeat**: The process repeats for each audio file in the input directory.
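
The workflow steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `process_audio_files`, the injected `transcribe_openai` and `transcribe_assemblyai` callables, the `input_folder` key, and the `<stem>.txt` naming are all assumptions made for the example. Only `use_assemblyai` and `output_folder` appear in this diff.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stand-in type for a transcription backend; the real function
# names used by transcribe-me are not shown in this diff.
Transcriber = Callable[[Path], str]


def process_audio_files(
    config: Dict,
    transcribe_openai: Transcriber,
    transcribe_assemblyai: Transcriber,
) -> List[Path]:
    """Transcribe each new audio file and return the transcription paths written."""
    # `input_folder` is an assumed key; `output_folder` appears in the diff.
    input_dir = Path(config.get("input_folder", "input"))
    output_dir = Path(config.get("output_folder", "output"))
    output_dir.mkdir(parents=True, exist_ok=True)

    # Step 4: choose the backend from the `use_assemblyai` flag (default: OpenAI).
    use_assemblyai = config.get("use_assemblyai", False)
    transcribe = transcribe_assemblyai if use_assemblyai else transcribe_openai

    written: List[Path] = []
    # Step 2: collect supported audio files (.mp3 and .m4a).
    audio_files = sorted(
        p for p in input_dir.iterdir() if p.suffix.lower() in {".mp3", ".m4a"}
    )
    for audio in audio_files:
        # Assumed naming scheme: one text file per audio file.
        target = output_dir / f"{audio.stem}.txt"
        if target.exists():
            continue  # Step 3: skip files that already have a transcription.
        target.write_text(transcribe(audio), encoding="utf-8")  # Step 6
        written.append(target)
    return written
```

Passing the backends in as callables keeps the selection logic testable without network access; summary generation and the AssemblyAI-specific outputs from step 5 are left out for brevity.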
 ## :gear: Configuration

 The application uses a configuration file (`.transcribe.yaml`) to specify settings such as input/output directories, API keys, models, and their configurations. The configuration file is created automatically when you run the `transcribe-me install` command.
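
As a rough illustration of how the configuration and the API key environment variables might interact, the sketch below reports which keys are missing for the selected backend. The helper `missing_api_keys` is hypothetical, and so is the rule it encodes: it assumes AssemblyAI mode needs only `ASSEMBLYAI_API_KEY`, and that OpenAI mode needs `OPENAI_API_KEY` plus `ANTHROPIC_API_KEY` when an `anthropic` section is configured. The real tool may check keys differently.

```python
from typing import Dict, List, Mapping


def missing_api_keys(config: Dict, env: Mapping[str, str]) -> List[str]:
    """Return the names of required API key variables that are unset or empty."""
    if config.get("use_assemblyai", False):
        # Assumption: AssemblyAI handles transcription and its own outputs.
        required = ["ASSEMBLYAI_API_KEY"]
    else:
        required = ["OPENAI_API_KEY"]
        # Assumption: an `anthropic` section means Claude summaries are wanted.
        if config.get("anthropic"):
            required.append("ANTHROPIC_API_KEY")
    return [name for name in required if not env.get(name)]
```

A caller could pass the parsed `.transcribe.yaml` contents and `os.environ`, then prompt for whichever names come back.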
-> `max_tokens` is the maximum number of tokens to generate in the summary. The default is dynamic based on the model.
-
 Here is an example configuration file:

 ```yaml
+use_assemblyai: false  # Set to true to use AssemblyAI instead of OpenAI for transcription
+
 openai:
   models:
     - temperature: 0.1
@@ -226,7 +236,7 @@ output_folder: output
 make install
 ```

-3. Run the `transcribe-me install` command to create the `.transcribe.yaml` configuration file and provide your API keys for OpenAI and Anthropic:
+3. Run the `transcribe-me install` command to create the `.transcribe.yaml` configuration file and provide your API keys for OpenAI, Anthropic, and AssemblyAI:

 ```bash
 make transcribe-install
@@ -277,4 +287,4 @@ To release a new version:

 ## Star History

-[](https://star-history.com/#echohello-dev/transcribe-me&Date)
+[](https://star-history.com/#echohello-dev/transcribe-me&Date)