Commit b1ba14b

Add language dropdown with auto-detect and 26 languages (#4)
* Add AGENTS.md with build, lint, style, and architecture guidelines
* Add language dropdown with auto-detect and 26 languages

Support multilingual transcription via Groq's Whisper API. Default is auto-detect (omits language param, letting Whisper infer). Users can pin a specific language in Settings for better accuracy and latency.
1 parent 114c96e commit b1ba14b

3 files changed, +260 -2 lines changed


AGENTS.md

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
+# AGENTS.md — AudioType
+
+> Guidelines for AI coding agents working in this repository.
+
+## Project Overview
+
+AudioType is a **native macOS menu bar app** for voice-to-text. Users hold the `fn` key to record, release to transcribe via Groq's Whisper API, and the result is typed into the focused app. It runs as an `LSUIElement` (no dock icon), built with Swift Package Manager (not Xcode projects).
+
+## Build Commands
+
+```bash
+# Debug build (used during development)
+swift build
+
+# Release build
+swift build -c release
+
+# Build + create .app bundle (debug)
+make app
+
+# Full dev cycle: kill running app, reset Accessibility, rebuild, install to /Applications, launch
+make dev
+
+# Clean all build artifacts
+make clean
+```
+
+**There is no Xcode project.** The app is built entirely via `Package.swift` + `Makefile`. Do not create `.xcodeproj` or `.xcworkspace` files.
+
+## Lint
+
+```bash
+# Run SwiftLint (must pass before merge — CI blocks on failure)
+swiftlint lint AudioType
+
+# Format code with swift-format
+swift-format -i -r AudioType
+
+# Install lint tools
+make setup
+```
+
+SwiftLint config is in `.swiftlint.yml`. Key rules:
+- **`force_cast`** is an error — always use `as?` with guard/if-let
+- **`opening_brace`** — opening braces must be on the same line, preceded by a space
+- `trailing_whitespace`, `line_length`, `function_body_length`, `file_length`, `type_body_length` are **disabled**
+- See `.swiftlint.yml` for the full opt-in rule list
+
+## Tests
+
+```bash
+# Run all tests
+swift test
+
+# Run a single test (by filter)
+swift test --filter TestClassName
+swift test --filter TestClassName/testMethodName
+```
+
+Note: the test target may be empty. CI runs `swift test` with `continue-on-error: true`.
+
+## CI Pipeline
+
+CI runs on every push/PR to `main` (`.github/workflows/ci.yml`):
+1. **Lint**: `swiftlint lint AudioType` (must pass; blocks Build and Test)
+2. **Build**: `swift build` (debug) + `swift build -c release` + `make app` + codesign verify
+3. **Test**: `swift test` (soft failure allowed)
+
+Releases (`.github/workflows/release.yml`) trigger on `v*` tags and produce `AudioType.dmg` + `AudioType.zip`.
+
+## Project Structure
+
+```
+AudioType/
+  App/                          # App entry point, menu bar controller, transcription orchestration
+    AudioTypeApp.swift          # @main, AppDelegate, onboarding flow
+    MenuBarController.swift     # NSStatusItem, state-driven icon tinting, overlay windows
+    TranscriptionManager.swift  # State machine (idle→recording→processing→idle/error)
+  Core/                         # Business logic
+    AudioRecorder.swift         # AVAudioEngine capture, PCM→16kHz resampling, RMS level
+    GroqEngine.swift            # Groq Whisper API client, WAV encoding, multipart upload
+    HotKeyManager.swift         # CGEventTap for fn key hold detection
+    TextInserter.swift          # CGEvent keyboard simulation to type into focused app
+    TextPostProcessor.swift     # Post-transcription corrections (tech terms, punctuation)
+  UI/                           # SwiftUI views
+    RecordingOverlay.swift      # Floating waveform (recording) / thinking dots (processing)
+    OnboardingView.swift        # First-launch permission + API key setup
+    SettingsView.swift          # API key, model picker, permissions, launch-at-login
+    Theme.swift                 # Brand color system (coral palette, adaptive dark/light)
+  Utilities/
+    Permissions.swift           # Microphone + Accessibility permission helpers
+    KeychainHelper.swift        # File-based secret storage (Application Support, 0600 perms)
+  Resources/
+    Assets.xcassets/            # Asset catalog (currently empty)
+Resources/
+  Info.plist                    # Bundle config (LSUIElement, mic usage description)
+  AppIcon.icns                  # App icon (coral gradient)
+```
+
+## Code Style
+
+### Imports
+- Sort alphabetically: `import AppKit`, `import Foundation`, `import SwiftUI`
+- Use specific submodule imports where appropriate: `import os.log` (not `import os`)
+- Only import what the file actually uses
+
+### Formatting
+- **2-space indentation** (no tabs)
+- Opening braces on the **same line** as the declaration
+- No trailing whitespace (rule disabled in linter, but keep it clean)
+- Use `// MARK: -` sections to organize classes (`// MARK: - Private`, `// MARK: - Transcription`)
+
+### Types & Naming
+- **Classes** for stateful objects with reference semantics: `TranscriptionManager`, `AudioRecorder`
+- **Enums** for namespaced constants and error types: `AudioTypeTheme`, `GroqEngineError`, `KeychainHelper`
+- **Structs** for SwiftUI views: `RecordingOverlay`, `SettingsView`
+- camelCase for properties/methods, PascalCase for types
+- Identifier names: min 1 char, max 50 chars; `x`, `y`, `i`, `j`, `k` are allowed
+
+### Error Handling
+- Define domain-specific error enums conforming to `Error, LocalizedError`
+- Provide human-readable `errorDescription` for every case
+- Use `do/catch` or `try?` — never `try!`
+- Never force-cast (`as!`) — use `as?` with guard/if-let
+- Errors shown to user go through `TranscriptionState.error(String)`
+
+### Patterns Used
+- **Singleton**: `TranscriptionManager.shared`, `TextPostProcessor.shared`, `AudioLevelMonitor.shared`
+- **`@MainActor`** on `TranscriptionManager` — all state mutations on main thread
+- **NotificationCenter** for decoupled state communication (`transcriptionStateChanged`, `audioLevelChanged`)
+- **`@Published` + ObservableObject** for SwiftUI reactivity
+- **Closures** for callbacks: `HotKeyManager(callback:)`, `audioRecorder.onLevelUpdate`
+- **`os.log` Logger** with subsystem `"com.audiotype"` — use per-class categories
+
+### Colors & Theming
+All colors live in `AudioType/UI/Theme.swift` (`AudioTypeTheme` enum). Never use hardcoded color literals in views. The palette:
+- **Coral** `#FF6B6B` — brand color, waveform bars, accents, checkmarks
+- **Amber** `#FFB84D` — processing state (thinking dots, menu bar icon)
+- **Recording red** `#FF4D4D` — menu bar icon while recording
+- Adaptive variants for dark mode (coralLight, amberLight)
+
+### Menu Bar Icon States
+- **Idle**: SF Symbol `waveform.circle.fill`, `isTemplate = true` (follows OS appearance)
+- **Recording**: same symbol, tinted `nsRecordingRed`, `isTemplate = false`
+- **Processing**: `ellipsis.circle.fill`, tinted `nsAmber`, `isTemplate = false`
+- **Error**: `exclamationmark.triangle.fill`, tinted `.systemRed`
+
+### Security
+- API keys stored in `~/Library/Application Support/AudioType/.secrets` with `0600` permissions
+- Never commit `.env`, credentials, or API keys
+- Audio is recorded in-memory only — never written to disk
+
+## App Bundle
+The `.app` bundle is assembled by `make app` (not Xcode):
+- Binary: `.build/debug/AudioType` → `AudioType.app/Contents/MacOS/AudioType`
+- Plist: `Resources/Info.plist` → `AudioType.app/Contents/Info.plist`
+- Icon: `Resources/AppIcon.icns` → `AudioType.app/Contents/Resources/AppIcon.icns`
+- Ad-hoc codesigned: `codesign --force --deep --sign -`
+
+When updating the version, change **both** `CFBundleVersion` and `CFBundleShortVersionString` in `Resources/Info.plist` **and** the display string in `SettingsView.swift`.
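
Editor's note on the Error Handling rules in AGENTS.md (domain-specific enums conforming to `Error, LocalizedError`, an `errorDescription` for every case, `do/catch` instead of `try!`): a minimal sketch of what they might look like in practice. `SketchUploadError`, `SketchState`, `validate(_:)`, and `stateAfterValidating(_:)` are hypothetical names introduced here for illustration; only the `error(String)` state shape is taken from the guidelines.

```swift
import Foundation

// Hypothetical error type for illustration; not part of the AudioType sources.
enum SketchUploadError: Error, LocalizedError {
  case emptyAudio
  case httpStatus(Int)

  // Human-readable description for every case, as the guidelines ask.
  var errorDescription: String? {
    switch self {
    case .emptyAudio:
      return "No audio was recorded."
    case .httpStatus(let code):
      return "The server returned HTTP \(code)."
    }
  }
}

// Hypothetical stand-in that mirrors the `TranscriptionState.error(String)` shape.
enum SketchState: Equatable {
  case idle
  case error(String)
}

// Throws instead of crashing; callers use do/catch or try?, never try!.
func validate(_ samples: [Float]) throws -> Int {
  guard !samples.isEmpty else { throw SketchUploadError.emptyAudio }
  return samples.count
}

// Maps any thrown error to a user-facing state via its localized description.
func stateAfterValidating(_ samples: [Float]) -> SketchState {
  do {
    _ = try validate(samples)
    return .idle
  } catch {
    return .error(error.localizedDescription)
  }
}
```

Routing every failure through one `error(String)` case keeps the UI layer unaware of individual error types; the trade-off is that the original error value is lost once it becomes a string.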

AudioType/Core/GroqEngine.swift

Lines changed: 88 additions & 2 deletions
@@ -28,6 +28,90 @@ enum GroqModel: String, CaseIterable {
   }
 }
 
+// MARK: - Transcription Language
+
+enum TranscriptionLanguage: String, CaseIterable, Identifiable {
+  case auto = "auto"
+  case english = "en"
+  case spanish = "es"
+  case french = "fr"
+  case german = "de"
+  case italian = "it"
+  case portuguese = "pt"
+  case dutch = "nl"
+  case russian = "ru"
+  case chinese = "zh"
+  case japanese = "ja"
+  case korean = "ko"
+  case arabic = "ar"
+  case hindi = "hi"
+  case turkish = "tr"
+  case polish = "pl"
+  case swedish = "sv"
+  case danish = "da"
+  case norwegian = "no"
+  case finnish = "fi"
+  case czech = "cs"
+  case ukrainian = "uk"
+  case indonesian = "id"
+  case malay = "ms"
+  case thai = "th"
+  case vietnamese = "vi"
+  case gujarati = "gu"
+
+  var id: String { rawValue }
+
+  var displayName: String {
+    switch self {
+    case .auto: return "Auto-detect"
+    case .english: return "English"
+    case .spanish: return "Spanish"
+    case .french: return "French"
+    case .german: return "German"
+    case .italian: return "Italian"
+    case .portuguese: return "Portuguese"
+    case .dutch: return "Dutch"
+    case .russian: return "Russian"
+    case .chinese: return "Chinese"
+    case .japanese: return "Japanese"
+    case .korean: return "Korean"
+    case .arabic: return "Arabic"
+    case .hindi: return "Hindi"
+    case .turkish: return "Turkish"
+    case .polish: return "Polish"
+    case .swedish: return "Swedish"
+    case .danish: return "Danish"
+    case .norwegian: return "Norwegian"
+    case .finnish: return "Finnish"
+    case .czech: return "Czech"
+    case .ukrainian: return "Ukrainian"
+    case .indonesian: return "Indonesian"
+    case .malay: return "Malay"
+    case .thai: return "Thai"
+    case .vietnamese: return "Vietnamese"
+    case .gujarati: return "Gujarati"
+    }
+  }
+
+  /// ISO-639-1 code sent to the API, or `nil` for auto-detect.
+  var isoCode: String? {
+    self == .auto ? nil : rawValue
+  }
+
+  static var current: TranscriptionLanguage {
+    get {
+      if let saved = UserDefaults.standard.string(forKey: "transcriptionLanguage"),
+        let lang = TranscriptionLanguage(rawValue: saved) {
+        return lang
+      }
+      return .auto
+    }
+    set {
+      UserDefaults.standard.set(newValue.rawValue, forKey: "transcriptionLanguage")
+    }
+  }
+}
+
 // MARK: - Errors
 
 enum GroqEngineError: Error, LocalizedError {
@@ -118,8 +202,10 @@ class GroqEngine {
       contentType: "audio/wav", data: wavData)
     // model field
     body.appendMultipart(boundary: boundary, name: "model", value: model.rawValue)
-    // language field
-    body.appendMultipart(boundary: boundary, name: "language", value: "en")
+    // language field (omit for auto-detect — Whisper infers the language)
+    if let langCode = TranscriptionLanguage.current.isoCode {
+      body.appendMultipart(boundary: boundary, name: "language", value: langCode)
+    }
     // response_format field
     body.appendMultipart(boundary: boundary, name: "response_format", value: "json")
     // temperature
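
Editor's note on the hunk above: the `language` form field is now written only when `isoCode` is non-nil. The sketch below shows that conditional-field pattern in isolation. The project's `appendMultipart(boundary:name:value:)` helper is not shown in this diff, so `appendFormField` and `makeFormFields` are hypothetical stand-ins, and the exact bytes the real helper writes may differ.

```swift
import Foundation

// Hypothetical stand-in for the project's `appendMultipart(boundary:name:value:)`,
// whose implementation is not part of this diff.
extension Data {
  mutating func appendFormField(boundary: String, name: String, value: String) {
    let part = "--\(boundary)\r\n"
      + "Content-Disposition: form-data; name=\"\(name)\"\r\n\r\n"
      + "\(value)\r\n"
    append(Data(part.utf8))
  }
}

// Builds only the text fields of a multipart body. A nil `languageCode`
// leaves the `language` field out entirely instead of sending a placeholder.
func makeFormFields(boundary: String, model: String, languageCode: String?) -> Data {
  var body = Data()
  body.appendFormField(boundary: boundary, name: "model", value: model)
  if let code = languageCode {
    body.appendFormField(boundary: boundary, name: "language", value: code)
  }
  body.appendFormField(boundary: boundary, name: "response_format", value: "json")
  // Closing delimiter that terminates the multipart body.
  body.append(Data("--\(boundary)--\r\n".utf8))
  return body
}
```

Modeling auto-detect as `nil` rather than as a sentinel string means the `"auto"` raw value, which exists only for persistence, cannot leak into the request by accident.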

AudioType/UI/SettingsView.swift

Lines changed: 12 additions & 0 deletions
@@ -5,6 +5,7 @@ import SwiftUI
 struct SettingsView: View {
   @AppStorage("launchAtLogin") private var launchAtLogin = false
   @State private var selectedModel = GroqModel.current
+  @State private var selectedLanguage = TranscriptionLanguage.current
   @State private var apiKey: String = ""
   @State private var isApiKeySet: Bool = GroqEngine.isConfigured
   @State private var apiKeySaveError: String?
@@ -64,6 +65,17 @@ struct SettingsView: View {
         .onChange(of: selectedModel) { newModel in
           GroqModel.current = newModel
         }
+
+        Picker("Language", selection: $selectedLanguage) {
+          ForEach(TranscriptionLanguage.allCases) { lang in
+            Text(lang.displayName)
+              .tag(lang)
+          }
+        }
+        .pickerStyle(.menu)
+        .onChange(of: selectedLanguage) { newLang in
+          TranscriptionLanguage.current = newLang
+        }
       } header: {
         Text("Transcription (Groq)")
       }
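
Editor's note on the picker above: each selection is written to `UserDefaults` as a raw string and read back with a fallback to `.auto`. The sketch below isolates that round trip so it can be exercised without SwiftUI. `SketchLanguage` and `SketchLanguageStore` are hypothetical names; passing the `UserDefaults` instance in as a parameter is an assumption made here for testability, whereas the commit itself reads `UserDefaults.standard` directly.

```swift
import Foundation

// Trimmed-down, hypothetical copy of the enum (3 of the 27 cases in the commit).
enum SketchLanguage: String, CaseIterable {
  case auto = "auto"
  case english = "en"
  case german = "de"

  // nil means "omit the language field", matching `isoCode` in the commit.
  var isoCode: String? { self == .auto ? nil : rawValue }
}

// Hypothetical store that receives its UserDefaults from the caller, so a
// throwaway suite can be used in place of `.standard`.
struct SketchLanguageStore {
  let defaults: UserDefaults
  let key = "transcriptionLanguage"

  var current: SketchLanguage {
    get {
      // A missing or unrecognized raw value falls back to auto-detect.
      defaults.string(forKey: key).flatMap(SketchLanguage.init(rawValue:)) ?? .auto
    }
    nonmutating set {
      defaults.set(newValue.rawValue, forKey: key)
    }
  }
}
```

The fallback matters if a later release removes or renames a case: a stale stored value then degrades to auto-detect instead of failing.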
