160 changes: 160 additions & 0 deletions AGENTS.md
@@ -0,0 +1,160 @@
# AGENTS.md — AudioType

> Guidelines for AI coding agents working in this repository.

## Project Overview

AudioType is a **native macOS menu bar app** for voice-to-text. Users hold the `fn` key to record and release it to transcribe via Groq's Whisper API; the result is typed into the focused app. It runs as an `LSUIElement` (no dock icon), built with Swift Package Manager (not Xcode projects).

## Build Commands

```bash
# Debug build (used during development)
swift build

# Release build
swift build -c release

# Build + create .app bundle (debug)
make app

# Full dev cycle: kill running app, reset Accessibility, rebuild, install to /Applications, launch
make dev

# Clean all build artifacts
make clean
```

**There is no Xcode project.** The app is built entirely via `Package.swift` + `Makefile`. Do not create `.xcodeproj` or `.xcworkspace` files.
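For orientation, a minimal `Package.swift` for this kind of SPM-only menu bar app could look like the sketch below. The tools version, platform floor, and target paths are assumptions for illustration, not copied from the repository:

```swift
// swift-tools-version:5.9
// Hypothetical sketch of an SPM-only manifest; the actual Package.swift
// may differ in tools version, platforms, and target layout.
import PackageDescription

let package = Package(
  name: "AudioType",
  platforms: [.macOS(.v13)],
  targets: [
    .executableTarget(name: "AudioType", path: "AudioType"),
    .testTarget(name: "AudioTypeTests", dependencies: ["AudioType"])
  ]
)
```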

## Lint

```bash
# Run SwiftLint (must pass before merge — CI blocks on failure)
swiftlint lint AudioType

# Format code with swift-format
swift-format -i -r AudioType

# Install lint tools
make setup
```

SwiftLint config is in `.swiftlint.yml`. Key rules:
- **`force_cast`** is an error — always use `as?` with guard/if-let
- **`opening_brace`** — opening braces must be on the same line, preceded by a space
- `trailing_whitespace`, `line_length`, `function_body_length`, `file_length`, `type_body_length` are **disabled**
- See `.swiftlint.yml` for the full opt-in rule list
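As a concrete example of the `force_cast` rule, a cast like the one below must go through `as?` with guard/if-let; the function and variable names here are illustrative, not from the codebase:

```swift
import Foundation

// Illustrates the force_cast rule: `value as! String` would be a lint
// error, so unwrap safely with guard/if-let instead.
func stringValue(from value: Any) -> String {
  guard let text = value as? String else { return "" }
  return text
}
```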

## Tests

```bash
# Run all tests
swift test

# Run a single test (by filter)
swift test --filter TestClassName
swift test --filter TestClassName/testMethodName
```

Note: the test target may be empty. CI runs `swift test` with `continue-on-error: true`.

## CI Pipeline

CI runs on every push/PR to `main` (`.github/workflows/ci.yml`):
1. **Lint** — `swiftlint lint AudioType` (must pass; blocks Build and Test)
2. **Build** — `swift build` (debug) + `swift build -c release` + `make app` + codesign verify
3. **Test** — `swift test` (soft failure allowed)

Releases (`.github/workflows/release.yml`) trigger on `v*` tags and produce `AudioType.dmg` + `AudioType.zip`.

## Project Structure

```
AudioType/
App/ # App entry point, menu bar controller, transcription orchestration
AudioTypeApp.swift # @main, AppDelegate, onboarding flow
MenuBarController.swift # NSStatusItem, state-driven icon tinting, overlay windows
TranscriptionManager.swift # State machine (idle→recording→processing→idle/error)
Core/ # Business logic
AudioRecorder.swift # AVAudioEngine capture, PCM→16kHz resampling, RMS level
GroqEngine.swift # Groq Whisper API client, WAV encoding, multipart upload
HotKeyManager.swift # CGEventTap for fn key hold detection
TextInserter.swift # CGEvent keyboard simulation to type into focused app
TextPostProcessor.swift # Post-transcription corrections (tech terms, punctuation)
UI/ # SwiftUI views
RecordingOverlay.swift # Floating waveform (recording) / thinking dots (processing)
OnboardingView.swift # First-launch permission + API key setup
SettingsView.swift # API key, model picker, permissions, launch-at-login
Theme.swift # Brand color system (coral palette, adaptive dark/light)
Utilities/
Permissions.swift # Microphone + Accessibility permission helpers
KeychainHelper.swift # File-based secret storage (Application Support, 0600 perms)
Resources/
Assets.xcassets/ # Asset catalog (currently empty)
Resources/
Info.plist # Bundle config (LSUIElement, mic usage description)
AppIcon.icns # App icon (coral gradient)
```

## Code Style

### Imports
- Sort alphabetically: `import AppKit`, `import Foundation`, `import SwiftUI`
- Use specific submodule imports where appropriate: `import os.log` (not `import os`)
- Only import what the file actually uses

### Formatting
- **2-space indentation** (no tabs)
- Opening braces on the **same line** as the declaration
- No trailing whitespace (rule disabled in linter, but keep it clean)
- Use `// MARK: -` sections to organize classes (`// MARK: - Private`, `// MARK: - Transcription`)

### Types & Naming
- **Classes** for stateful objects with reference semantics: `TranscriptionManager`, `AudioRecorder`
- **Enums** for namespaced constants and error types: `AudioTypeTheme`, `GroqEngineError`, `KeychainHelper`
- **Structs** for SwiftUI views: `RecordingOverlay`, `SettingsView`
- camelCase for properties/methods, PascalCase for types
- Identifier names: 1–50 characters; short names like `x`, `y`, `i`, `j`, `k` are explicitly allowed

### Error Handling
- Define domain-specific error enums conforming to `Error, LocalizedError`
- Provide human-readable `errorDescription` for every case
- Use `do/catch` or `try?` — never `try!`
- Never force-cast (`as!`) — use `as?` with guard/if-let
- Errors shown to user go through `TranscriptionState.error(String)`
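The sketch below shows the shape these error enums take. The case names are invented for illustration and are not the real `GroqEngineError` cases:

```swift
import Foundation

// Hypothetical domain error following the project's convention:
// conform to Error + LocalizedError and describe every case.
enum ExampleEngineError: Error, LocalizedError {
  case missingAPIKey
  case httpStatus(Int)

  var errorDescription: String? {
    switch self {
    case .missingAPIKey:
      return "No API key configured. Add one in Settings."
    case .httpStatus(let code):
      return "Server returned HTTP \(code)."
    }
  }
}
```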

### Patterns Used
- **Singleton**: `TranscriptionManager.shared`, `TextPostProcessor.shared`, `AudioLevelMonitor.shared`
- **`@MainActor`** on `TranscriptionManager` — all state mutations on main thread
- **NotificationCenter** for decoupled state communication (`transcriptionStateChanged`, `audioLevelChanged`)
- **`@Published` + ObservableObject** for SwiftUI reactivity
- **Closures** for callbacks: `HotKeyManager(callback:)`, `audioRecorder.onLevelUpdate`
- **`os.log` Logger** with subsystem `"com.audiotype"` — use per-class categories
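A stripped-down sketch of the singleton + NotificationCenter combination (the real `TranscriptionManager` is `@MainActor`-isolated and publishes richer state; the names below are simplified stand-ins):

```swift
import Foundation

extension Notification.Name {
  // Illustrative notification name, not the app's actual constant.
  static let exampleStateChanged = Notification.Name("exampleStateChanged")
}

// Minimal singleton that broadcasts state changes, mirroring the
// decoupled-communication pattern described above.
final class ExampleStateManager {
  static let shared = ExampleStateManager()
  private(set) var isRecording = false

  private init() {}

  func setRecording(_ recording: Bool) {
    isRecording = recording
    NotificationCenter.default.post(name: .exampleStateChanged, object: nil)
  }
}
```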

### Colors & Theming
All colors live in `AudioType/UI/Theme.swift` (`AudioTypeTheme` enum). Never use hardcoded color literals in views. The palette:
- **Coral** `#FF6B6B` — brand color, waveform bars, accents, checkmarks
- **Amber** `#FFB84D` — processing state (thinking dots, menu bar icon)
- **Recording red** `#FF4D4D` — menu bar icon while recording
- Adaptive variants for dark mode (coralLight, amberLight)
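As a sanity check on those hex values, a small hypothetical helper shows how a literal like `#FF6B6B` decomposes into RGB components; the real theme exposes `Color`/`NSColor` values rather than raw tuples:

```swift
import Foundation

// Hypothetical hex-to-RGB helper for checking palette values like #FF6B6B.
func rgbComponents(fromHex hex: String) -> (r: Int, g: Int, b: Int)? {
  let digits = hex.hasPrefix("#") ? String(hex.dropFirst()) : hex
  guard digits.count == 6, let value = Int(digits, radix: 16) else { return nil }
  return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)
}
```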

### Menu Bar Icon States
- **Idle**: SF Symbol `waveform.circle.fill`, `isTemplate = true` (follows OS appearance)
- **Recording**: same symbol, tinted `nsRecordingRed`, `isTemplate = false`
- **Processing**: `ellipsis.circle.fill`, tinted `nsAmber`, `isTemplate = false`
- **Error**: `exclamationmark.triangle.fill`, tinted `.systemRed`

### Security
- API keys stored in `~/Library/Application Support/AudioType/.secrets` with `0600` permissions
- Never commit `.env`, credentials, or API keys
- Audio is recorded in-memory only — never written to disk
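A sketch of the 0600-permission write described above; the function name and call shape are illustrative, and the real logic lives in `KeychainHelper`:

```swift
import Foundation

// Writes a secret to disk, then tightens permissions to owner
// read/write only (0600), mirroring the storage approach above.
func writeSecret(_ secret: String, to url: URL) throws {
  try Data(secret.utf8).write(to: url, options: .atomic)
  try FileManager.default.setAttributes(
    [.posixPermissions: 0o600], ofItemAtPath: url.path)
}
```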

## App Bundle
The `.app` bundle is assembled by `make app` (not Xcode):
- Binary: `.build/debug/AudioType` → `AudioType.app/Contents/MacOS/AudioType`
- Plist: `Resources/Info.plist` → `AudioType.app/Contents/Info.plist`
- Icon: `Resources/AppIcon.icns` → `AudioType.app/Contents/Resources/AppIcon.icns`
- Ad-hoc codesigned: `codesign --force --deep --sign -`

When updating the version, change **both** `CFBundleVersion` and `CFBundleShortVersionString` in `Resources/Info.plist` **and** the display string in `SettingsView.swift`.
90 changes: 88 additions & 2 deletions AudioType/Core/GroqEngine.swift
@@ -28,6 +28,90 @@ enum GroqModel: String, CaseIterable {
}
}

// MARK: - Transcription Language

enum TranscriptionLanguage: String, CaseIterable, Identifiable {
case auto = "auto"
case english = "en"
case spanish = "es"
case french = "fr"
case german = "de"
case italian = "it"
case portuguese = "pt"
case dutch = "nl"
case russian = "ru"
case chinese = "zh"
case japanese = "ja"
case korean = "ko"
case arabic = "ar"
case hindi = "hi"
case turkish = "tr"
case polish = "pl"
case swedish = "sv"
case danish = "da"
case norwegian = "no"
case finnish = "fi"
case czech = "cs"
case ukrainian = "uk"
case indonesian = "id"
case malay = "ms"
case thai = "th"
case vietnamese = "vi"
case gujarati = "gu"

var id: String { rawValue }

var displayName: String {
switch self {
case .auto: return "Auto-detect"
case .english: return "English"
case .spanish: return "Spanish"
case .french: return "French"
case .german: return "German"
case .italian: return "Italian"
case .portuguese: return "Portuguese"
case .dutch: return "Dutch"
case .russian: return "Russian"
case .chinese: return "Chinese"
case .japanese: return "Japanese"
case .korean: return "Korean"
case .arabic: return "Arabic"
case .hindi: return "Hindi"
case .turkish: return "Turkish"
case .polish: return "Polish"
case .swedish: return "Swedish"
case .danish: return "Danish"
case .norwegian: return "Norwegian"
case .finnish: return "Finnish"
case .czech: return "Czech"
case .ukrainian: return "Ukrainian"
case .indonesian: return "Indonesian"
case .malay: return "Malay"
case .thai: return "Thai"
case .vietnamese: return "Vietnamese"
case .gujarati: return "Gujarati"
}
}

/// ISO-639-1 code sent to the API, or `nil` for auto-detect.
var isoCode: String? {
self == .auto ? nil : rawValue
}

static var current: TranscriptionLanguage {
get {
if let saved = UserDefaults.standard.string(forKey: "transcriptionLanguage"),
let lang = TranscriptionLanguage(rawValue: saved) {
return lang
}
return .auto
}
set {
UserDefaults.standard.set(newValue.rawValue, forKey: "transcriptionLanguage")
}
}
}

// MARK: - Errors

enum GroqEngineError: Error, LocalizedError {
@@ -118,8 +202,10 @@ class GroqEngine {
contentType: "audio/wav", data: wavData)
// model field
body.appendMultipart(boundary: boundary, name: "model", value: model.rawValue)
// language field
body.appendMultipart(boundary: boundary, name: "language", value: "en")
// language field (omit for auto-detect — Whisper infers the language)
if let langCode = TranscriptionLanguage.current.isoCode {
body.appendMultipart(boundary: boundary, name: "language", value: langCode)
}
// response_format field
body.appendMultipart(boundary: boundary, name: "response_format", value: "json")
// temperature
12 changes: 12 additions & 0 deletions AudioType/UI/SettingsView.swift
@@ -5,6 +5,7 @@ import SwiftUI
struct SettingsView: View {
@AppStorage("launchAtLogin") private var launchAtLogin = false
@State private var selectedModel = GroqModel.current
@State private var selectedLanguage = TranscriptionLanguage.current
@State private var apiKey: String = ""
@State private var isApiKeySet: Bool = GroqEngine.isConfigured
@State private var apiKeySaveError: String?
@@ -64,6 +65,17 @@ struct SettingsView: View {
.onChange(of: selectedModel) { newModel in
GroqModel.current = newModel
}

Picker("Language", selection: $selectedLanguage) {
ForEach(TranscriptionLanguage.allCases) { lang in
Text(lang.displayName)
.tag(lang)
}
}
.pickerStyle(.menu)
.onChange(of: selectedLanguage) { newLang in
TranscriptionLanguage.current = newLang
}
} header: {
Text("Transcription (Groq)")
}