Add Apple Speech as fallback when no Groq API key is configured (#5)
* Add AGENTS.md with build, lint, style, and architecture guidelines
* Add language dropdown with auto-detect and 26 languages
Support multilingual transcription via Groq's Whisper API. Default is
auto-detect (omits language param, letting Whisper infer). Users can
pin a specific language in Settings for better accuracy and latency.
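
The auto-detect rule described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the helper name, parameter names, and the `"auto"` sentinel are assumptions; only the behavior (omit the `language` field unless the user pinned one) comes from the commit message.

```swift
// Sketch (illustrative names): build the transcription request's form fields,
// omitting `language` entirely when auto-detect is selected so Whisper infers it.
func transcriptionFormFields(model: String, language: String?) -> [String: String] {
    var fields = ["model": model]
    if let code = language, code != "auto" {
        fields["language"] = code   // pinned language, e.g. "en", "de", "ja"
    }
    return fields
}
```

With `language: nil`, no `language` field is sent and Whisper infers the language; with a pinned code, the field is included for better accuracy and latency.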
* Add Apple Speech as fallback transcription engine when no Groq API key is configured
Introduce a TranscriptionEngine protocol to abstract speech-to-text backends,
with GroqEngine (cloud) and AppleSpeechEngine (on-device via SFSpeechRecognizer)
as implementations. In Auto mode, Groq is preferred when a key exists; otherwise
Apple Speech is used — making the app fully functional without any API key.
- Add TranscriptionEngine protocol and EngineResolver
- Add AppleSpeechEngine using SFSpeechAudioBufferRecognitionRequest
- Make API key optional in onboarding (skip to use Apple Speech)
- Add engine picker in Settings (Auto / Groq / Apple Speech)
- Add Speech framework linking and NSSpeechRecognitionUsageDescription
- Add speech recognition permission handling in Permissions
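
The speech-recognition permission step can be sketched with the Speech framework's authorization API. The wrapper function name and call site are assumptions; `SFSpeechRecognizer.requestAuthorization` and the `NSSpeechRecognitionUsageDescription` Info.plist key it requires are the real framework API.

```swift
import Foundation
import Speech

// Sketch: request speech-recognition authorization before constructing any
// recognition requests. The callback arrives on an arbitrary queue, so hop
// back to main before touching UI state.
func requestSpeechPermission(_ completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        DispatchQueue.main.async {
            completion(status == .authorized)
        }
    }
}
```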
* Update AGENTS.md with engine abstraction architecture and Apple Speech docs
* Update architecture.md with dual-engine system and Apple Speech documentation
**AGENTS.md** — 81 additions, 13 deletions
@@ -4,7 +4,7 @@
 ## Project Overview

-AudioType is a **native macOS menu bar app** for voice-to-text. Users hold the `fn` key to record, release to transcribe via Groq's Whisper API, and the result is typed into the focused app. It runs as an `LSUIElement` (no dock icon), built with Swift Package Manager (not Xcode projects).
+AudioType is a **native macOS menu bar app** for voice-to-text. Users hold the `fn` key to record, release to transcribe, and the result is typed into the focused app. It supports two transcription backends: **Groq Whisper** (cloud) and **Apple Speech** (on-device). If no Groq API key is configured, the app falls back to Apple's on-device `SFSpeechRecognizer` automatically. It runs as an `LSUIElement` (no dock icon), built with Swift Package Manager (not Xcode projects).

 ## Build Commands
@@ -68,6 +68,62 @@ CI runs on every push/PR to `main` (`.github/workflows/ci.yml`):

Releases (`.github/workflows/release.yml`) trigger on `v*` tags and produce `AudioType.dmg` + `AudioType.zip`.

## Architecture

### Transcription Engine System

The app uses a **protocol-based engine abstraction** to support multiple speech-to-text backends:

```
TranscriptionEngine (protocol)
├── GroqEngine — Cloud-based, Groq Whisper API, requires API key
└── AppleSpeechEngine — On-device, Apple SFSpeechRecognizer, no API key needed
```
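
A minimal Swift sketch of this abstraction: only `transcribe(samples:)` is taken from the document; the stored property and the stubbed bodies are placeholders, not the app's implementation.

```swift
// Protocol-based engine abstraction (sketch; bodies are placeholders).
protocol TranscriptionEngine {
    func transcribe(samples: [Float]) async throws -> String
}

struct GroqEngine: TranscriptionEngine {
    let apiKey: String   // cloud backend requires a key (assumed property name)
    func transcribe(samples: [Float]) async throws -> String {
        // upload 16 kHz mono PCM to Groq's Whisper endpoint (omitted)
        fatalError("placeholder")
    }
}

struct AppleSpeechEngine: TranscriptionEngine {
    func transcribe(samples: [Float]) async throws -> String {
        // feed samples to on-device SFSpeechRecognizer (omitted)
        fatalError("placeholder")
    }
}
```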
**`EngineResolver`** selects the active engine at runtime based on user preference (`TranscriptionEngineType`):

| Mode | Behavior |
|------|----------|
| **Auto** (default) | Groq if API key exists, otherwise Apple Speech |
| **Groq Whisper** | Always use Groq (fails if no key) |
| **Apple Speech** | Always use on-device recognition |
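
The resolution rules in that table reduce to a small switch. This is a self-contained sketch: `ResolvedEngine`, `resolveEngine`, and `EngineError` are illustrative names, not `EngineResolver`'s actual API.

```swift
// Sketch of the engine-resolution rules (illustrative types and names).
enum TranscriptionEngineType { case auto, groq, appleSpeech }
enum ResolvedEngine { case groq, appleSpeech }
enum EngineError: Error { case missingAPIKey }

func resolveEngine(mode: TranscriptionEngineType, apiKey: String?) throws -> ResolvedEngine {
    let hasKey = !(apiKey ?? "").isEmpty
    switch mode {
    case .auto:
        return hasKey ? .groq : .appleSpeech   // prefer cloud when a key exists
    case .groq:
        guard hasKey else { throw EngineError.missingAPIKey }
        return .groq
    case .appleSpeech:
        return .appleSpeech                    // always on-device
    }
}
```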
Both engines implement a single method: `transcribe(samples: [Float]) async throws -> String` — accepting 16 kHz mono Float32 PCM samples from `AudioRecorder`.
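
For the on-device path, those raw `[Float]` samples have to be wrapped in an `AVAudioPCMBuffer` before they can be appended to an `SFSpeechAudioBufferRecognitionRequest`. A sketch under that assumption (the helper name is hypothetical; the AVFoundation/Speech calls are real framework API):

```swift
import AVFoundation
import Speech

// Sketch: wrap 16 kHz mono Float32 samples in an AVAudioPCMBuffer so an
// on-device engine can feed them to SFSpeechAudioBufferRecognitionRequest.
func makeBuffer(from samples: [Float]) -> AVAudioPCMBuffer? {
    guard let format = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: 16_000,
                                     channels: 1,
                                     interleaved: false),
          let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: AVAudioFrameCount(samples.count))
    else { return nil }
    buffer.frameLength = AVAudioFrameCount(samples.count)
    samples.withUnsafeBufferPointer { src in
        buffer.floatChannelData![0].update(from: src.baseAddress!, count: samples.count)
    }
    return buffer
}

// Usage inside the engine (sketch):
//   let request = SFSpeechAudioBufferRecognitionRequest()
//   request.append(buffer)
//   request.endAudio()
```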
### Data Flow

```
fn key held → HotKeyManager → TranscriptionManager.startRecording()
        ↓
AudioRecorder (AVAudioEngine, 16kHz mono PCM)
        ↓
fn key released → TranscriptionManager.stopRecordingAndTranscribe()
```