WhisperWriter is a small speech-to-text app that uses [OpenAI's Whisper model](https://openai.com/research/whisper) to auto-transcribe recordings from a user's microphone.
Once started, the script runs in the background and waits for a keyboard shortcut to be pressed (`ctrl+shift+space` by default). When the shortcut is pressed, the app starts recording from your microphone. There are three options to stop recording:

- `voice_activity_detection`: stops recording once it detects a long enough pause in your speech.
- `press_to_toggle`: stops recording when the activation key is pressed again.
- `hold_to_record`: stops recording when the activation key is released.

You can change the activation key and recording mode in the [Configuration Options](#configuration-options). While recording and transcribing, a small status window is displayed that shows the current stage of the process (but this can be turned off). Once the transcription is complete, the transcribed text will be automatically written to the active window.
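For example, assuming the option names documented in the [Configuration Options](#configuration-options), switching to hold-to-record with a custom shortcut is a two-key change in `src\config.json` (the `ctrl+alt+space` combo here is only an illustration):

```
{
    "activation_key": "ctrl+alt+space",
    "recording_mode": "hold_to_record"
}
```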
The transcription can either be done locally through the [faster-whisper Python package](https://github.com/SYSTRAN/faster-whisper/) or through a request to [OpenAI's API](https://platform.openai.com/docs/guides/speech-to-text). By default, the app will use a local model, but you can change this in the [Configuration Options](#configuration-options). If you choose to use the API, you will need to provide your OpenAI API key in a `.env` file.
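The `.env` file itself is a single key-value line. `OPENAI_API_KEY` is the conventional variable name for OpenAI clients and is assumed here; check the project's documentation for the exact name it reads:

```
OPENAI_API_KEY=sk-...
```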
**Fun fact:** Almost the entirety of the initial release of the project was pair-programmed with [ChatGPT-4](https://openai.com/product/gpt-4) and [GitHub Copilot](https://github.com/features/copilot) using VS Code. Practically every line, including most of this README, was written by AI. After the initial prototype was finished, WhisperWriter was used to write a lot of the prompts as well!
## Getting Started
If you want to run `faster-whisper` on your GPU, you'll also need to install the following NVIDIA libraries:

- [cuBLAS for CUDA 11](https://developer.nvidia.com/cublas)
- [cuDNN 8 for CUDA 11](https://developer.nvidia.com/cudnn)

### Installation
To set up and run the project, follow these steps:

```
# ... (clone the repository, create and activate a virtual environment) ...
pip install -r requirements.txt
```

To switch between running Whisper locally and using the OpenAI API, you need to modify the `src\config.json` file:
- If you prefer using the OpenAI API, set `"use_api"` to `true`. You will also need to set up your OpenAI API key in the next step.
- If you prefer using a local Whisper model, set `"use_api"` to `false`. You may also want to change the device that the model uses; see the [Model Options](#model-options). Note that you need to have the [NVIDIA libraries installed](https://github.com/SYSTRAN/faster-whisper/#gpu) to run the model on your GPU.
WhisperWriter uses a configuration file to customize its behaviour:

```
{
    ...
        "vad_filter": false
    },
    "activation_key": "ctrl+shift+space",
    "recording_mode": "voice_activity",
    "sound_device": null,
    "sample_rate": 16000,
    "silence_duration": 900,
    ...
}
```
- `vad_filter`: Set to `true` to use [a voice activity detection (VAD) filter](https://github.com/snakers4/silero-vad) to remove silence from the recording. (Default: `false`)

#### Customization Options

- `activation_key`: The keyboard shortcut to activate the recording and transcribing process. (Default: `"ctrl+shift+space"`)
- `recording_mode`: The recording mode to use: `voice_activity_detection` (stop when a long enough pause is detected in your speech), `press_to_toggle` (stop when the activation key is pressed again), or `hold_to_record` (record only while the activation key is held down). (Default: `"voice_activity"`)
- `sound_device`: The name of the sound device to use for recording. Set to `null` to let the system automatically choose the default device. To find a device number, run `python -m sounddevice`. (Default: `null`)
- `sample_rate`: The sample rate in Hz to use for recording. (Default: `16000`)
- `silence_duration`: The duration in milliseconds to wait for silence before stopping the recording. (Default: `900`)
- `add_trailing_space`: Set to `true` to add a trailing space to the transcribed text. (Default: `true`)
- `remove_capitalization`: Set to `true` to convert the transcribed text to lowercase. (Default: `false`)
- `print_to_terminal`: Set to `true` to print the script status and transcribed text to the terminal. (Default: `true`)
- `hide_window`: Set to `true` to hide the status window.

If any of the configuration options are invalid or not provided, the program will use the default values.
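The fallback behaviour can be pictured as a dictionary merge over the defaults. This is an illustrative sketch, not WhisperWriter's actual loader; `DEFAULT_CONFIG` and `load_config` are hypothetical names, and only a few of the options above are included:

```python
# Illustrative defaults mirroring a few of the documented options.
DEFAULT_CONFIG = {
    "activation_key": "ctrl+shift+space",
    "recording_mode": "voice_activity",
    "sound_device": None,
    "sample_rate": 16000,
    "silence_duration": 900,
}

def load_config(user_config=None):
    """Overlay recognised user settings on the defaults.

    Missing keys fall back to their defaults; unrecognised keys are ignored.
    """
    merged = dict(DEFAULT_CONFIG)
    for key, value in (user_config or {}).items():
        if key in DEFAULT_CONFIG:
            merged[key] = value
    return merged
```

So `load_config({"recording_mode": "press_to_toggle"})` keeps every default except the single override.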
Below are features I am planning to add in the near future:

- [ ] Updating GUI
- [ ] Creating standalone executable file

Below are features I plan on investigating and may end up adding in the future:
Below are features not currently planned:
- [ ] Pipelining audio files
Contributions are welcome! I created this project for my own personal use.

## Credits
- [OpenAI](https://openai.com/) for creating the Whisper model and providing the API. Plus [ChatGPT](https://chat.openai.com/), which was used to write a lot of the initial code for this project.
- [Guillaume Klein](https://github.com/guillaumekln) for creating the [faster-whisper Python package](https://github.com/SYSTRAN/faster-whisper).
- All of our [contributors](https://github.com/savbell/whisper-writer/graphs/contributors)!

```python
# Report which transcription backend is active.
model_method = 'OpenAI\'s API' if config['use_api'] else 'a local model'
print(f'Script activated. Whisper is set to run using {model_method}. To change this, modify the "use_api" value in the src\\config.json file.')

# Set up local model if needed
local_model = None
if not config['use_api']:
    print('Creating local model...')
    local_model = create_local_model(config)
    print('Local model created.')

print(f'WhisperWriter is set to record using {config["recording_mode"]}. To change this, modify the "recording_mode" value in the src\\config.json file.')
print(f'The activation key combo is set to {format_keystrokes(config["activation_key"])}.', end='')
```