AGENTS.md: 35 additions & 20 deletions (+35 -20)
@@ -4,7 +4,7 @@
## Project Overview
-
AudioType is a **native macOS menu bar app** for voice-to-text. Users hold the `fn` key to record, release to transcribe, and the result is typed into the focused app. It supports two transcription backends: **Groq Whisper** (cloud) and **Apple Speech** (on-device). If no Groq API key is configured, the app falls back to Apple's on-device `SFSpeechRecognizer` automatically. It runs as an `LSUIElement` (no dock icon), built with Swift Package Manager (not Xcode projects).
+
AudioType is a **native macOS menu bar app** for voice-to-text. Users hold the `fn` key to record, release to transcribe, and the result is typed into the focused app. It supports three transcription backends: **Groq Whisper** (cloud), **OpenAI Whisper** (cloud), and **Apple Speech** (on-device). If no cloud API key is configured, the app falls back to Apple's on-device `SFSpeechRecognizer` automatically. It runs as an `LSUIElement` (no dock icon), built with Swift Package Manager (not Xcode projects).
## Build Commands
@@ -72,23 +72,26 @@ Releases (`.github/workflows/release.yml`) trigger on `v*` tags and produce `Aud
### Transcription Engine System
-
The app uses a **protocol-based engine abstraction** to support multiple speech-to-text backends:
+
The app uses a **protocol-based engine abstraction** with a shared base class to support multiple speech-to-text backends:
```
TranscriptionEngine (protocol)
-
├── GroqEngine — Cloud-based, Groq Whisper API, requires API key
```
@@ -122,7 +125,7 @@ fn key released → TranscriptionManager.stopRecordingAndTranscribe()
| Accessibility | Keyboard simulation (TextInserter) | Granted via System Settings |
| Speech Recognition | Apple Speech engine (on-device) | `NSSpeechRecognitionUsageDescription` |
-
Speech recognition permission is requested on-demand the first time the Apple Speech engine is used. The Groq engine does not require this permission.
+
Speech recognition permission is requested on-demand the first time the Apple Speech engine is used. Cloud engines (Groq, OpenAI) do not require this permission.
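The on-demand behavior described above could be sketched roughly as follows. This is a hypothetical illustration, not the project's code: only the type name `TranscriptionEngineType` appears in AGENTS.md, while its cases, the `SpeechPermissionGate` class, and the injected `request` closure are assumptions. In the real app the request would presumably go through the system speech-recognition prompt rather than a closure.

```swift
// Hypothetical sketch. `TranscriptionEngineType` is named in AGENTS.md,
// but its cases below are assumed for illustration.
enum TranscriptionEngineType {
    case groq, openAI, appleSpeech

    // Only the on-device Apple Speech engine needs speech recognition permission.
    var requiresSpeechPermission: Bool {
        switch self {
        case .appleSpeech: return true
        case .groq, .openAI: return false
        }
    }
}

// Hypothetical helper: asks for permission lazily, at most once.
final class SpeechPermissionGate {
    private var granted: Bool?
    private let request: () -> Bool   // stand-in for the system permission prompt

    init(request: @escaping () -> Bool) {
        self.request = request
    }

    func canUse(_ engine: TranscriptionEngineType) -> Bool {
        // Cloud engines never trigger the prompt.
        guard engine.requiresSpeechPermission else { return true }
        // Reuse the cached answer after the first request.
        if let granted = granted { return granted }
        let result = request()
        granted = result
        return result
    }
}
```

Injecting the request as a closure keeps the sketch runnable without the Speech framework; it is a design convenience for the example, not something the source confirms.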
- **Protocols** for abstractions with multiple implementations: `TranscriptionEngine`
-
- **Classes** for stateful objects with reference semantics: `TranscriptionManager`, `AudioRecorder`, `GroqEngine`, `AppleSpeechEngine`
-
- **Enums** for namespaced constants and error types: `AudioTypeTheme`, `GroqEngineError`, `AppleSpeechError`, `TranscriptionEngineType`, `KeychainHelper`
-
- **Structs** for SwiftUI views: `RecordingOverlay`, `SettingsView`
+
- **Classes** for stateful objects with reference semantics: `TranscriptionManager`, `AudioRecorder`, `WhisperAPIEngine`, `GroqEngine`, `OpenAIEngine`, `AppleSpeechEngine`
+
- **Enums** for namespaced constants and error types: `AudioTypeTheme`, `WhisperAPIError`, `AppleSpeechError`, `TranscriptionEngineType`, `KeychainHelper`
+
- **Structs** for SwiftUI views and config: `RecordingOverlay`, `SettingsView`, `WhisperAPIConfig`
- camelCase for properties/methods, PascalCase for types
- Identifier names: min 1 char, max 50 chars; `x`, `y`, `i`, `j`, `k` are allowed
@@ -184,7 +189,8 @@ Resources/
- Errors shown to user go through `TranscriptionState.error(String)`
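As a rough illustration of routing user-visible errors through `TranscriptionState.error(String)`, consider the sketch below. Only `TranscriptionState.error(String)` and the name `WhisperAPIError` come from AGENTS.md; the other state cases, the error cases, the message strings, and the `userFacingState(for:)` helper are assumptions made for the example.

```swift
// Hypothetical sketch. The cases of WhisperAPIError are assumed.
enum WhisperAPIError: Error {
    case missingAPIKey
    case httpStatus(Int)
}

// Only `.error(String)` is documented; `.idle` and `.recording` are assumed.
enum TranscriptionState: Equatable {
    case idle
    case recording
    case error(String)
}

// Hypothetical helper: turns any thrown error into the single
// user-facing error state instead of surfacing it directly.
func userFacingState(for error: Error) -> TranscriptionState {
    guard let apiError = error as? WhisperAPIError else {
        return .error("Transcription failed")
    }
    switch apiError {
    case .missingAPIKey:
        return .error("No API key configured")
    case .httpStatus(let code):
        return .error("Transcription request failed (HTTP \(code))")
    }
}
```

Funneling every failure into one state case means the UI only needs to render a string, whichever engine produced the error.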
### Patterns Used
-
- **Protocol abstraction**: `TranscriptionEngine` with `GroqEngine` and `AppleSpeechEngine` implementations
+
- **Protocol abstraction**: `TranscriptionEngine` with `WhisperAPIEngine` base class and `AppleSpeechEngine`
+
- **Inheritance for shared logic**: `WhisperAPIEngine` base class handles WAV encoding, multipart HTTP, response parsing; `GroqEngine` and `OpenAIEngine` are thin subclasses supplying config
- **Resolver pattern**: `EngineResolver.resolve()` picks the engine at runtime based on config
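
The three patterns listed above might fit together roughly as in this sketch. It is an assumption-heavy illustration rather than the project's implementation: the type names and `EngineResolver.resolve` come from AGENTS.md, but every signature, every field of `WhisperAPIConfig`, the placeholder endpoints and model names, and the order in which cloud keys are checked are invented for the example. Networking is stubbed out.

```swift
// Hypothetical sketch; signatures and fields are assumed, not taken from the repo.
struct WhisperAPIConfig {
    let endpoint: String   // placeholder URL, not a real API endpoint
    let model: String      // placeholder model name
}

protocol TranscriptionEngine {
    func transcribe(samples: [Float]) throws -> String
}

// Shared base class. AGENTS.md says it handles WAV encoding, multipart HTTP,
// and response parsing; all of that is replaced by a stub here.
class WhisperAPIEngine: TranscriptionEngine {
    let config: WhisperAPIConfig
    let apiKey: String

    init(config: WhisperAPIConfig, apiKey: String) {
        self.config = config
        self.apiKey = apiKey
    }

    func transcribe(samples: [Float]) throws -> String {
        return "[\(config.model): \(samples.count) samples]"
    }
}

// Thin subclasses that only supply configuration.
final class GroqEngine: WhisperAPIEngine {
    init(apiKey: String) {
        super.init(config: WhisperAPIConfig(endpoint: "https://example.invalid/groq",
                                            model: "groq-placeholder"),
                   apiKey: apiKey)
    }
}

final class OpenAIEngine: WhisperAPIEngine {
    init(apiKey: String) {
        super.init(config: WhisperAPIConfig(endpoint: "https://example.invalid/openai",
                                            model: "openai-placeholder"),
                   apiKey: apiKey)
    }
}

// On-device engine; conforms to the protocol directly.
final class AppleSpeechEngine: TranscriptionEngine {
    func transcribe(samples: [Float]) throws -> String {
        return "[on-device: \(samples.count) samples]"
    }
}

enum EngineResolver {
    // Assumed precedence: Groq, then OpenAI, then the on-device fallback.
    static func resolve(groqKey: String?, openAIKey: String?) -> TranscriptionEngine {
        if let key = groqKey, !key.isEmpty { return GroqEngine(apiKey: key) }
        if let key = openAIKey, !key.isEmpty { return OpenAIEngine(apiKey: key) }
        return AppleSpeechEngine()
    }
}
```

With no cloud key supplied, `EngineResolver.resolve(groqKey: nil, openAIKey: nil)` returns an `AppleSpeechEngine`, mirroring the fallback described in the Project Overview.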