Commit b16b5f1

Add OpenAI as second cloud transcription provider (#7)
Introduce WhisperAPIEngine base class that handles WAV encoding, multipart HTTP, and response parsing. GroqEngine and OpenAIEngine are thin subclasses that only supply provider-specific config (URL, models, keychain key).

Engine changes:
- WAVEncoder.swift: shared base class, WAVEncoder, WhisperAPIConfig, errors
- OpenAIEngine.swift: OpenAI subclass with whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe (default) models
- GroqEngine.swift: rewritten as WhisperAPIEngine subclass (removes ~150 lines of duplicated HTTP/WAV/multipart code)
- TranscriptionEngine.swift: add .openAI case, update EngineResolver (auto priority: Groq -> OpenAI -> Apple Speech)

UI changes:
- SettingsView: add OpenAI section (API key, model picker), extract shared apiKeyField ViewBuilder
- OnboardingView: rename apiKeyConfigured to anyCloudKeyConfigured, update engine badge to reflect all three providers
- TranscriptionManager: generalize error messages for multiple cloud providers

Docs:
- README.md: rewrite for three-backend support
- AGENTS.md: update architecture, patterns, project structure
- architecture.md: add WhisperAPIEngine hierarchy, OpenAI engine section
- release.yml: update release notes template
1 parent c7266f3 commit b16b5f1
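As background on what the new base class centralizes, a multipart/form-data upload body for a Whisper-style transcription endpoint can be sketched as follows. This is illustrative only, not the commit's actual WhisperAPIEngine code; the `file` and `model` field names follow the OpenAI/Groq transcription API convention.

```swift
import Foundation

/// Minimal sketch of a multipart/form-data body for a Whisper-style
/// /audio/transcriptions endpoint. Not the project's actual code.
func makeMultipartBody(wavData: Data, model: String, boundary: String) -> Data {
    var body = Data()
    let crlf = "\r\n"
    // Model field
    body.append("--\(boundary)\(crlf)".data(using: .utf8)!)
    body.append("Content-Disposition: form-data; name=\"model\"\(crlf)\(crlf)".data(using: .utf8)!)
    body.append("\(model)\(crlf)".data(using: .utf8)!)
    // Audio file field (the WAV bytes produced by the encoder)
    body.append("--\(boundary)\(crlf)".data(using: .utf8)!)
    body.append("Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\(crlf)".data(using: .utf8)!)
    body.append("Content-Type: audio/wav\(crlf)\(crlf)".data(using: .utf8)!)
    body.append(wavData)
    body.append(crlf.data(using: .utf8)!)
    // Closing boundary
    body.append("--\(boundary)--\(crlf)".data(using: .utf8)!)
    return body
}
```

Since this body format is identical across Groq and OpenAI, hoisting it into one base class is what removes the ~150 duplicated lines mentioned above.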

File tree

12 files changed: +776 −406 lines changed


.github/workflows/release.yml

Lines changed: 8 additions & 8 deletions

@@ -98,25 +98,25 @@ jobs:
           body: |
             ## AudioType v${{ steps.version.outputs.version }}

-            Voice-to-text for macOS powered by Groq cloud transcription (Whisper Large V3).
+            Voice-to-text for macOS with multiple transcription backends: Groq Whisper, OpenAI Whisper, and Apple Speech (on-device).

             ### What's New
-            - Cloud-powered transcription via Groq API for significantly better accuracy
-            - Self-serve: bring your own free Groq API key
-            - Simplified build — no more whisper.cpp compilation required
+            - Cloud-powered transcription via Groq or OpenAI Whisper APIs
+            - On-device fallback via Apple Speech — works without an API key
+            - Self-serve: bring your own Groq (free tier) or OpenAI API key

             ### Installation
             1. Download `AudioType.dmg` or `AudioType.zip`
             2. Extract and move `AudioType.app` to your Applications folder
-            3. Open the app and grant Microphone and Accessibility permissions
-            4. Enter your free Groq API key (get one at https://console.groq.com/keys)
+            3. Open the app and grant Microphone, Accessibility, and Speech Recognition permissions
+            4. Optionally enter a Groq or OpenAI API key for cloud transcription
             5. Hold the fn key to dictate

             ### Requirements
             - macOS 14.0 or later
             - Apple Silicon or Intel Mac
-            - Internet connection
-            - Free Groq API key
+            - Internet connection (for cloud engines; not needed for Apple Speech)
+            - API key optional (Groq free tier or OpenAI)

             > Looking for the offline/local version? See [v1.1.1](https://github.com/PatelUtkarsh/audio-type/releases/tag/v1.1.1)
           files: |

AGENTS.md

Lines changed: 35 additions & 20 deletions
@@ -4,7 +4,7 @@

 ## Project Overview

-AudioType is a **native macOS menu bar app** for voice-to-text. Users hold the `fn` key to record, release to transcribe, and the result is typed into the focused app. It supports two transcription backends: **Groq Whisper** (cloud) and **Apple Speech** (on-device). If no Groq API key is configured, the app falls back to Apple's on-device `SFSpeechRecognizer` automatically. It runs as an `LSUIElement` (no dock icon), built with Swift Package Manager (not Xcode projects).
+AudioType is a **native macOS menu bar app** for voice-to-text. Users hold the `fn` key to record, release to transcribe, and the result is typed into the focused app. It supports three transcription backends: **Groq Whisper** (cloud), **OpenAI Whisper** (cloud), and **Apple Speech** (on-device). If no cloud API key is configured, the app falls back to Apple's on-device `SFSpeechRecognizer` automatically. It runs as an `LSUIElement` (no dock icon), built with Swift Package Manager (not Xcode projects).

 ## Build Commands

@@ -72,23 +72,26 @@ Releases (`.github/workflows/release.yml`) trigger on `v*` tags and produce `Aud

 ### Transcription Engine System

-The app uses a **protocol-based engine abstraction** to support multiple speech-to-text backends:
+The app uses a **protocol-based engine abstraction** with a shared base class to support multiple speech-to-text backends:

 ```
 TranscriptionEngine (protocol)
-├── GroqEngine — Cloud-based, Groq Whisper API, requires API key
+├── WhisperAPIEngine (base class) — shared WAV encoding, multipart HTTP, response parsing
+│   ├── GroqEngine — Cloud, Groq Whisper API, requires API key
+│   └── OpenAIEngine — Cloud, OpenAI Whisper/GPT-4o API, requires API key
 └── AppleSpeechEngine — On-device, Apple SFSpeechRecognizer, no API key needed
 ```

 **`EngineResolver`** selects the active engine at runtime based on user preference (`TranscriptionEngineType`):

 | Mode | Behavior |
 |------|----------|
-| **Auto** (default) | Groq if API key exists, otherwise Apple Speech |
+| **Auto** (default) | Groq if key exists → OpenAI if key exists → Apple Speech |
 | **Groq Whisper** | Always use Groq (fails if no key) |
+| **OpenAI Whisper** | Always use OpenAI (fails if no key) |
 | **Apple Speech** | Always use on-device recognition |

-Both engines implement a single method: `transcribe(samples: [Float]) async throws -> String` — accepting 16 kHz mono Float32 PCM samples from `AudioRecorder`.
+All engines implement a single method: `transcribe(samples: [Float]) async throws -> String` — accepting 16 kHz mono Float32 PCM samples from `AudioRecorder`.

 ### Data Flow
@@ -100,11 +103,11 @@ fn key held → HotKeyManager → TranscriptionManager.startRecording()
 fn key released → TranscriptionManager.stopRecordingAndTranscribe()

 EngineResolver.resolve() → TranscriptionEngine
-
-GroqEngine                AppleSpeechEngine
-(HTTP multipart →         (SFSpeechAudioBuffer-
- Groq Whisper API)         RecognitionRequest)
-
+
+GroqEngine           OpenAIEngine          AppleSpeechEngine
+(WhisperAPIEngine    (WhisperAPIEngine     (SFSpeechAudioBuffer-
+ → Groq API)          → OpenAI API)         RecognitionRequest)
+
 transcribed text

 TextPostProcessor (corrections)
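The resolver step in the flow above can be sketched in Swift. This is a hypothetical shape: only `transcribe(samples:)`, `displayName`, `isAvailable`, and the Groq → OpenAI → Apple Speech auto order come from the docs in this commit; the rest is assumed for illustration.

```swift
// Sketch of the engine abstraction described in AGENTS.md.
// Names match the docs; signatures beyond transcribe(samples:) are assumptions.
protocol TranscriptionEngine {
    var displayName: String { get }
    var isAvailable: Bool { get }
    /// 16 kHz mono Float32 PCM samples in, transcribed text out.
    func transcribe(samples: [Float]) async throws -> String
}

enum EngineResolverSketch {
    /// Auto mode: first cloud engine with a configured key wins,
    /// otherwise fall back to on-device Apple Speech.
    static func resolveAuto() -> TranscriptionEngine {
        if GroqEngine.isConfigured { return GroqEngine() }
        if OpenAIEngine.isConfigured { return OpenAIEngine() }
        return AppleSpeechEngine()
    }
}
```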
@@ -122,7 +125,7 @@ fn key released → TranscriptionManager.stopRecordingAndTranscribe()
 | Accessibility | Keyboard simulation (TextInserter) | Granted via System Settings |
 | Speech Recognition | Apple Speech engine (on-device) | `NSSpeechRecognitionUsageDescription` |

-Speech recognition permission is requested on-demand the first time the Apple Speech engine is used. The Groq engine does not require this permission.
+Speech recognition permission is requested on-demand the first time the Apple Speech engine is used. Cloud engines (Groq, OpenAI) do not require this permission.

 ## Project Structure

@@ -135,19 +138,21 @@ AudioType/
   Core/                       # Business logic & transcription engines
     AudioRecorder.swift       # AVAudioEngine capture, PCM→16kHz resampling, RMS level
     TranscriptionEngine.swift # TranscriptionEngine protocol, TranscriptionEngineType, EngineResolver
-    GroqEngine.swift          # Groq Whisper API client, WAV encoding, multipart upload
+    WAVEncoder.swift          # WhisperAPIEngine base class, WAVEncoder, WhisperAPIConfig, Data helpers
+    GroqEngine.swift          # GroqEngine subclass, GroqModel enum, TranscriptionLanguage
+    OpenAIEngine.swift        # OpenAIEngine subclass, OpenAIModel enum
     AppleSpeechEngine.swift   # Apple SFSpeechRecognizer on-device transcription
     HotKeyManager.swift       # CGEventTap for fn key hold detection
     TextInserter.swift        # CGEvent keyboard simulation to type into focused app
     TextPostProcessor.swift   # Post-transcription corrections (tech terms, punctuation)
   UI/                         # SwiftUI views
     RecordingOverlay.swift    # Floating waveform (recording) / thinking dots (processing)
     OnboardingView.swift      # First-launch permission setup (API key optional)
-    SettingsView.swift        # Engine picker, API key, model, language, permissions
+    SettingsView.swift        # Engine picker, API keys, models, language, permissions
     Theme.swift               # Brand color system (coral palette, adaptive dark/light)
   Utilities/
     Permissions.swift         # Microphone, Accessibility, Speech Recognition permission helpers
-    KeychainHelper.swift      # File-based secret storage (Application Support, 0600 perms)
+    KeychainHelper.swift      # macOS Keychain-based secret storage
 Resources/
   Assets.xcassets/            # Asset catalog (currently empty)
@@ -170,9 +175,9 @@ Resources/

 ### Types & Naming
 - **Protocols** for abstractions with multiple implementations: `TranscriptionEngine`
-- **Classes** for stateful objects with reference semantics: `TranscriptionManager`, `AudioRecorder`, `GroqEngine`, `AppleSpeechEngine`
-- **Enums** for namespaced constants and error types: `AudioTypeTheme`, `GroqEngineError`, `AppleSpeechError`, `TranscriptionEngineType`, `KeychainHelper`
-- **Structs** for SwiftUI views: `RecordingOverlay`, `SettingsView`
+- **Classes** for stateful objects with reference semantics: `TranscriptionManager`, `AudioRecorder`, `WhisperAPIEngine`, `GroqEngine`, `OpenAIEngine`, `AppleSpeechEngine`
+- **Enums** for namespaced constants and error types: `AudioTypeTheme`, `WhisperAPIError`, `AppleSpeechError`, `TranscriptionEngineType`, `KeychainHelper`
+- **Structs** for SwiftUI views and config: `RecordingOverlay`, `SettingsView`, `WhisperAPIConfig`
 - camelCase for properties/methods, PascalCase for types
 - Identifier names: min 1 char, max 50 chars; `x`, `y`, `i`, `j`, `k` are allowed
@@ -184,7 +189,8 @@ Resources/
 - Errors shown to user go through `TranscriptionState.error(String)`

 ### Patterns Used
-- **Protocol abstraction**: `TranscriptionEngine` with `GroqEngine` and `AppleSpeechEngine` implementations
+- **Protocol abstraction**: `TranscriptionEngine` with `WhisperAPIEngine` base class and `AppleSpeechEngine`
+- **Inheritance for shared logic**: `WhisperAPIEngine` base class handles WAV encoding, multipart HTTP, response parsing; `GroqEngine` and `OpenAIEngine` are thin subclasses supplying config
 - **Resolver pattern**: `EngineResolver.resolve()` picks the engine at runtime based on config
 - **Singleton**: `TranscriptionManager.shared`, `TextPostProcessor.shared`, `AudioLevelMonitor.shared`
 - **`@MainActor`** on `TranscriptionManager` — all state mutations on main thread
@@ -193,7 +199,16 @@ Resources/
 - **Closures** for callbacks: `HotKeyManager(callback:)`, `audioRecorder.onLevelUpdate`
 - **`os.log` Logger** with subsystem `"com.audiotype"` — use per-class categories

-### Adding a New Transcription Engine
+### Adding a New Cloud Transcription Provider
+1. Create a new subclass of `WhisperAPIEngine` in `AudioType/Core/`
+2. Override `config` (with `WhisperAPIConfig`) and `currentModel` — that's it for the engine
+3. Add a model enum if the provider has multiple models
+4. Add static convenience methods (`isConfigured`, `setApiKey`, `clearApiKey`)
+5. Add a case to `TranscriptionEngineType` and update `EngineResolver.resolve()`
+6. Update `EngineResolver.anyEngineAvailable` if the engine has standalone availability
+7. Add UI in `SettingsView.swift` (API key field, model picker)
+
+### Adding a Non-Whisper Engine
 1. Create a new class conforming to `TranscriptionEngine` in `AudioType/Core/`
 2. Implement `displayName`, `isAvailable`, and `transcribe(samples:)`
 3. Add a case to `TranscriptionEngineType` and update `EngineResolver.resolve()`
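Under the provider checklist above, a new subclass might look roughly like this. The `config` and `currentModel` override points come from the steps in the diff; the engine name, URL, and `WhisperAPIConfig` field names are hypothetical assumptions for illustration.

```swift
import Foundation

// Hypothetical provider following the checklist in AGENTS.md.
// WhisperAPIConfig's exact fields are assumptions based on the commit description.
final class ExampleEngine: WhisperAPIEngine {
    override var config: WhisperAPIConfig {
        WhisperAPIConfig(
            endpoint: URL(string: "https://api.example.com/v1/audio/transcriptions")!,
            keychainKey: "example-api-key"   // where KeychainHelper stores the secret
        )
    }
    // Single-model provider, so no model enum needed (step 3).
    override var currentModel: String { "example-whisper-large" }
}
```

Because the base class owns WAV encoding, the multipart request, and response parsing, this is the entire engine; the remaining steps only touch `TranscriptionEngineType`, `EngineResolver`, and `SettingsView`.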
@@ -214,7 +229,7 @@ All colors live in `AudioType/UI/Theme.swift` (`AudioTypeTheme` enum). Never use
 - **Error**: `exclamationmark.triangle.fill`, tinted `.systemRed`

 ### Security
-- API keys stored in `~/Library/Application Support/AudioType/.secrets` with `0600` permissions
+- API keys stored in macOS Keychain via `KeychainHelper` (Security framework)
 - Never commit `.env`, credentials, or API keys
 - Audio is recorded in-memory only — never written to disk
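Keychain storage of this kind typically goes through the Security framework's SecItem API. A minimal sketch (not the actual KeychainHelper implementation; the service name and account handling are assumptions):

```swift
import Foundation
import Security

// Minimal generic-password save/read, sketching what a KeychainHelper
// built on the Security framework typically does.
enum KeychainSketch {
    static func save(_ secret: String, account: String) -> Bool {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: "com.audiotype",   // assumed service name
            kSecAttrAccount as String: account,
            kSecValueData as String: Data(secret.utf8),
        ]
        SecItemDelete(query as CFDictionary)   // replace any existing item
        return SecItemAdd(query as CFDictionary, nil) == errSecSuccess
    }

    static func read(account: String) -> String? {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: "com.audiotype",
            kSecAttrAccount as String: account,
            kSecReturnData as String: true,
        ]
        var result: AnyObject?
        guard SecItemCopyMatching(query as CFDictionary, &result) == errSecSuccess,
              let data = result as? Data else { return nil }
        return String(data: data, encoding: .utf8)
    }
}
```

Unlike the previous file-based `.secrets` approach, Keychain items are encrypted at rest and access-controlled per app by the OS.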

AudioType/App/TranscriptionManager.swift

Lines changed: 3 additions & 3 deletions

@@ -60,7 +60,7 @@ class TranscriptionManager: ObservableObject {

         if !EngineResolver.anyEngineAvailable {
             logger.warning("No transcription engine available")
-            setState(.error("No engine available — add a Groq key or enable Apple Speech"))
+            setState(.error("No engine available — add a cloud API key or enable Apple Speech"))
         } else {
             logger.info("Transcription engine ready: \(engine.displayName)")
         }
@@ -93,7 +93,7 @@ class TranscriptionManager: ObservableObject {
             setState(.idle)
             logger.info("Engine config changed, active engine: \(engine.displayName)")
         } else {
-            setState(.error("No engine available — add a Groq key or enable Apple Speech"))
+            setState(.error("No engine available — add a cloud API key or enable Apple Speech"))
         }
     }

@@ -118,7 +118,7 @@ class TranscriptionManager: ObservableObject {
         }

         guard EngineResolver.anyEngineAvailable else {
-            setState(.error("No engine available — add a Groq key or enable Apple Speech"))
+            setState(.error("No engine available — add a cloud API key or enable Apple Speech"))
             return
         }

AudioType/Core/AppleSpeechEngine.swift

Lines changed: 1 addition & 1 deletion

@@ -30,7 +30,7 @@ enum AppleSpeechError: Error, LocalizedError {

 /// On-device speech-to-text using Apple's Speech framework (`SFSpeechRecognizer`).
 ///
 /// This engine requires no API key and works offline when on-device recognition is
-/// available (macOS 13+). It acts as the fallback when no Groq API key is configured.
+/// available (macOS 13+). It acts as the fallback when no cloud API key is configured.
 class AppleSpeechEngine: TranscriptionEngine {

     private let logger = Logger(subsystem: "com.audiotype", category: "AppleSpeechEngine")
