Android app for transcribing media files using OpenAI's Whisper API and GPT-4o models with AI-powered text cleanup. Built with Kotlin, Jetpack Compose, and Hilt for dependency injection.
```bash
# Build debug APK
./gradlew assembleDebug

# Build release APK
./gradlew assembleRelease

# Clean build (required after changing local.properties)
./gradlew clean assembleDebug
```

```bash
# Run all unit tests
./gradlew test

# Run a specific test class (--tests needs a variant task such as testDebugUnitTest)
./gradlew testDebugUnitTest --tests app.hdev.io.aitranscribe.ClipboardHelperTest

# Run instrumented tests (requires emulator/device)
./gradlew connectedAndroidTest
```

For testing builds with embedded OpenAI API keys, see TESTING.md: add `OPENAI_API_KEY=sk-...` to `local.properties` and run `./gradlew clean assembleDebug`.
Presentation Layer (`presentation/`)
- `MainActivity`: Main transcription UI with file selection, processing states, and result display
- `SettingsActivity`: API key management, model selection, language/prompt configuration
- `HistoryActivity`: Local transcription history with statistics and search
- Built with Jetpack Compose; state managed directly in Activities (ViewModels not yet implemented)
Data Layer (`data/`)
- `TranscriptionDbHelper`: `SQLiteOpenHelper` for local history storage
- `TranscriptionEntry`: Data model with transcript text, settings, and statistics (file sizes, duration, character count)
- Migration note: Room migration is planned but not yet implemented
API Layer (`api/`)
- `OpenAiApiService`: Retrofit service interface for the OpenAI API
- `RetrofitClient`: Creates the Retrofit instance with an OkHttp logging interceptor
- Supports three models: `whisper-1`, `gpt-4o-audio-preview-transcribe`, `gpt-4o-mini-audio-preview-transcribe`
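Model handling is currently duplicated in `MainActivity`. One option is to keep the three model IDs in a single enum. The sketch below is minimal and illustrative. `TranscriptionModel`, `fromApiName`, and the fallback to `whisper-1` for unknown values are hypothetical; they are not existing project code.

```kotlin
// Hypothetical single source of truth for the supported transcription models.
enum class TranscriptionModel(val apiName: String) {
    WHISPER_1("whisper-1"),
    GPT_4O("gpt-4o-audio-preview-transcribe"),
    GPT_4O_MINI("gpt-4o-mini-audio-preview-transcribe");

    companion object {
        // Resolve a stored settings value; unknown or missing values fall back to whisper-1.
        fun fromApiName(name: String?): TranscriptionModel =
            values().firstOrNull { it.apiName == name } ?: WHISPER_1
    }
}
```

With this in place, a `when (model)` over the enum lets the compiler flag any branch that is missed when a model is added.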
Utils (`utils/`)
- `FileProcessingManager`: Handles media file copying and conversion to M4A/AAC using Media3 `Transformer`
- `ClipboardHelper`: Formats transcription history entries for the clipboard, including statistics
- Uses Hilt `@Singleton` and `@Inject` for dependency injection
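The sketch below shows one way a history entry and its statistics could be flattened into clipboard text. `HistoryEntry`, `formatMb`, and `formatForClipboard` are hypothetical, as is the output layout; the real `ClipboardHelper` format may differ. The statistics field names follow the ones tracked in the database. Here `transcriptLength` is derived from the text rather than stored.

```kotlin
import java.util.Locale

// Hypothetical subset of TranscriptionEntry used for clipboard formatting.
data class HistoryEntry(
    val text: String,
    val model: String,
    val originalFileSizeBytes: Long,
    val uploadedFileSizeBytes: Long,
    val audioDurationSeconds: Int,
) {
    val transcriptLength: Int get() = text.length
}

// Render a byte count as megabytes (1 MB = 1024 * 1024 bytes) with one decimal.
fun formatMb(bytes: Long): String = "%.1f MB".format(Locale.US, bytes / (1024.0 * 1024.0))

// Transcript first, then a blank line, then one statistic per line.
fun formatForClipboard(entry: HistoryEntry): String = buildString {
    appendLine(entry.text)
    appendLine()
    appendLine("Model: ${entry.model}")
    appendLine("Duration: ${entry.audioDurationSeconds} s")
    appendLine("File size: ${formatMb(entry.originalFileSizeBytes)} -> ${formatMb(entry.uploadedFileSizeBytes)}")
    append("Characters: ${entry.transcriptLength}")
}
```

Keeping the formatter a pure function means it can be covered by plain JUnit 4 tests, in the same way as `ClipboardHelperTest`.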
DI Module (`di/AppModule`)
```kotlin
// Provider signatures in AppModule (bodies omitted)
@Provides @Singleton
fun provideWhisperApiService(@ApplicationContext context: Context): OpenAiApiService

@Provides @Singleton
fun provideTranscriptionDbHelper(@ApplicationContext context: Context): TranscriptionDbHelper
```

Security (`sharedPrefsUtils/SharedPrefsUtils`)
- API keys stored in `EncryptedSharedPreferences` with AES256_GCM encryption
- `BuildConfig` mechanism for embedding test API keys from `local.properties`
File Processing Pipeline
- User selects or shares a media file → URI received
- `FileProcessingManager.processAudioFile(uri)` copies and converts it to M4A/AAC format
- File size validated (≤24MB) before upload
- Returns `ProcessingResult` with file metadata (original/processed sizes, filename)
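The size check can be sketched as a small pure function. In this sketch `ProcessingResult` and `FileProcessingException` are simplified stand-ins for the project's classes, and their exact fields are assumed. It also assumes "24MB" means 24 × 1024 × 1024 bytes.

```kotlin
import java.io.File

// Simplified stand-ins for the project's types (fields assumed for illustration).
class FileProcessingException(message: String) : Exception(message)

data class ProcessingResult(
    val processedFile: File,
    val originalFileSizeBytes: Long,
    val processedFileSizeBytes: Long,
    val fileName: String,
)

// Assumption: the 24MB limit is interpreted in binary megabytes.
val maxUploadBytes: Long = 24L * 1024 * 1024

// Reject the processed file before upload if it is over the limit.
fun validateForUpload(result: ProcessingResult): ProcessingResult {
    if (result.processedFileSizeBytes > maxUploadBytes) {
        throw FileProcessingException(
            "Processed file is ${result.processedFileSizeBytes} bytes; limit is $maxUploadBytes bytes"
        )
    }
    return result
}
```

Running the check on the processed size, not the original size, means that a large video which compresses into a small audio file still passes.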
Transcription Flow
- Processed file uploaded via Retrofit multipart request
- Model-specific API endpoint called (Whisper or GPT-4o variants)
- Optional AI cleanup using GPT-4o chat completions with custom prompt
- Result stored in SQLite with comprehensive statistics
- UI displays transcript with copy/share/retry options
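The flow above can be outlined without Retrofit, SQLite, or Android types by passing each step in as a function. This is a hypothetical sketch of the ordering only. `runTranscriptionFlow` and `TranscriptionOutcome` are illustrative names, and the real code in `MainActivity` is structured differently.

```kotlin
// Hypothetical result of one pass through the flow.
data class TranscriptionOutcome(val rawText: String, val finalText: String, val cleanedUp: Boolean)

fun runTranscriptionFlow(
    transcribe: () -> String,               // multipart upload + model-specific endpoint
    cleanup: (String) -> String,            // GPT-4o chat completion with the cleanup prompt
    store: (TranscriptionOutcome) -> Unit,  // insert into the SQLite history
    autoFormat: Boolean,
): TranscriptionOutcome {
    val raw = transcribe()
    // Cleanup only runs when auto-format is enabled in Settings.
    val finalText = if (autoFormat) cleanup(raw) else raw
    val outcome = TranscriptionOutcome(raw, finalText, cleanedUp = autoFormat)
    store(outcome)
    return outcome
}
```

Because each step is injected, the ordering and the auto-format branch can be unit-tested with simple lambdas.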
Auto-Format Feature
- When enabled in Settings, automatically runs AI cleanup after transcription
- Uses configurable cleanup prompt to enhance readability while preserving content
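The cleanup call is a Chat Completions request (`POST /v1/chat/completions`). Its body is a `model` plus a list of `messages`, each with a `role` and `content`. The sketch below uses one common arrangement: the configurable cleanup prompt goes in the system message and the transcript in the user message. The data class names, `buildCleanupRequest`, and the `gpt-4o` default are assumptions for illustration.

```kotlin
// Minimal request shape; field names match the Chat Completions JSON body.
data class ChatMessage(val role: String, val content: String)
data class ChatCompletionRequest(val model: String, val messages: List<ChatMessage>)

// System message carries the user's cleanup instructions; user message carries the transcript.
fun buildCleanupRequest(cleanupPrompt: String, transcript: String, model: String = "gpt-4o") =
    ChatCompletionRequest(
        model = model,
        messages = listOf(
            ChatMessage(role = "system", content = cleanupPrompt),
            ChatMessage(role = "user", content = transcript),
        ),
    )
```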
- Always use Hilt: Inject `OpenAiApiService`, `TranscriptionDbHelper`, `FileProcessingManager`
- Don't create instances manually (e.g., `RetrofitClient.create()` in Activities)
- Use `@AndroidEntryPoint` on Activities that need injection
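- Network calls: Use Retrofit's `suspend` functions with `lifecycleScope.launch`
- File operations: Wrap in `withContext(Dispatchers.IO)`
- UI updates: Ensure main-thread dispatch after background operations (`lifecycleScope` launches on the main dispatcher, so code after `withContext(Dispatchers.IO)` returns is back on the main thread)
- Use `rememberCoroutineScope()` in Composables for async operations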
- Audio conversion: Media3 `Transformer` with AAC encoder (adaptive bitrate: 16-32 kbps)
- Output format: Always M4A container with AAC codec
- Temp files: Use `context.filesDir` for intermediate files; clean up on error
- File naming: Avoid hardcoded names; use unique identifiers to prevent conflicts
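The 16-32 kbps range is documented, but the adaptation rule is not. The sketch below assumes one plausible rule: choose the highest bitrate in the range whose estimated output still fits under the 24MB upload limit, ignoring container overhead. `chooseAacBitrate` and `uniqueOutputName` are hypothetical helpers, not `FileProcessingManager` code.

```kotlin
import java.util.UUID

// Hypothetical rule: spread the upload budget over the audio duration, clamped to 16-32 kbps.
fun chooseAacBitrate(
    durationSeconds: Double,
    maxBytes: Long = 24L * 1024 * 1024,
    minBps: Int = 16_000,
    maxBps: Int = 32_000,
): Int {
    if (durationSeconds <= 0.0) return maxBps
    // Bits available per second of audio (container overhead ignored).
    val budgetBps = (maxBytes * 8 / durationSeconds).toInt()
    return budgetBps.coerceIn(minBps, maxBps)
}

// Unique intermediate file name so repeated or concurrent conversions never collide.
fun uniqueOutputName(prefix: String = "processed"): String = "${prefix}_${UUID.randomUUID()}.m4a"
```

For very long recordings the floor of 16 kbps can still produce a file over the limit. The size check before upload remains the final guard.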
- All queries on a background thread (`SQLiteOpenHelper` calls are synchronous and block the caller)
- Schema migrations: Use `onUpgrade()` with version checks
- Statistics tracked: `originalFileSizeBytes`, `uploadedFileSizeBytes`, `transcriptLength`, `audioDurationSeconds`
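A common way to structure `onUpgrade()` version checks is a table of per-version steps, applied in order from `oldVersion` to `newVersion`. The table and column names below are hypothetical. The function only collects the SQL; inside `onUpgrade()`, each statement would be passed to `db.execSQL(...)`.

```kotlin
// Hypothetical migrations: key N upgrades the schema from version N to N + 1.
val schemaMigrations: Map<Int, List<String>> = mapOf(
    1 to listOf("ALTER TABLE transcriptions ADD COLUMN original_file_size_bytes INTEGER DEFAULT 0"),
    2 to listOf("ALTER TABLE transcriptions ADD COLUMN audio_duration_seconds INTEGER DEFAULT 0"),
)

// Statements to run, in order, to go from oldVersion to newVersion.
fun migrationSteps(oldVersion: Int, newVersion: Int): List<String> =
    (oldVersion until newVersion).flatMap { from ->
        schemaMigrations[from] ?: error("No migration defined from version $from")
    }
```

Failing loudly on a missing step catches a `DATABASE_VERSION` bump that was made without a matching migration.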
- Runtime storage: `SharedPrefsUtils.getApiKey(context)` / `saveApiKey(context, key)`
- Build-time embedding: `BuildConfig.DEFAULT_OPENAI_API_KEY` from `local.properties`
- Initialization: `AITranscriptionApp.onCreate()` transfers the BuildConfig key to encrypted storage if no key is set
- Never log or expose API keys in error messages
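The initialization rule can be shown without Android by putting storage behind a small interface. `ApiKeyStore` and `seedApiKeyIfMissing` are hypothetical. In the app, the two methods would delegate to `SharedPrefsUtils.getApiKey(context)` and `SharedPrefsUtils.saveApiKey(context, key)`.

```kotlin
// Hypothetical abstraction over SharedPrefsUtils for illustration and testing.
interface ApiKeyStore {
    fun getApiKey(): String?
    fun saveApiKey(key: String)
}

// Copy the build-time key into storage only if none is stored yet.
// Returns true if the key was copied; never overwrites a user-entered key.
fun seedApiKeyIfMissing(store: ApiKeyStore, buildConfigKey: String): Boolean {
    if (buildConfigKey.isBlank() || !store.getApiKey().isNullOrBlank()) return false
    store.saveApiKey(buildConfigKey)
    return true
}
```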
- Unit tests: `app/src/test/` with JUnit 4
- Existing tests: `ClipboardHelperTest`, `HistoryFormattingTest`
- No instrumented tests currently implemented
- Add tests when modifying `ClipboardHelper` or history formatting logic
- Never commit API keys to the repository
- `local.properties` is in `.gitignore`; don't remove it
- Disable sensitive logging in release builds (OkHttp interceptor)
- Use scoped storage APIs; avoid the legacy `MANAGE_EXTERNAL_STORAGE` permission
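For header logging, OkHttp's `HttpLoggingInterceptor` can mask a header with `redactHeader("Authorization")`. Error messages built by the app need their own guard. The hypothetical helper below masks anything that looks like an OpenAI key (`sk-` at a word boundary, followed by key characters) before the message is logged or shown.

```kotlin
// Hypothetical helper: mask "sk-..." tokens, keeping only the prefix.
// The word boundary avoids touching words such as "task-force".
val apiKeyPattern = Regex("""\bsk-[A-Za-z0-9_-]+""")

fun redactApiKeys(message: String): String = apiKeyPattern.replace(message, "sk-***")
```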
- ViewModels not implemented - state currently in Activities (planned refactor)
- Room not used - still using SQLiteOpenHelper (planned migration)
- Some duplicated transcription model handling code within MainActivity
- File size limit: 24MB after processing (a safety margin below OpenAI's 25 MB upload limit for audio files)
- Changing `local.properties`: Always run `./gradlew clean` to regenerate BuildConfig
- Modifying model handling: Update all three model branches (whisper-1, gpt-4o, gpt-4o-mini)
- Database schema changes: Increment `DATABASE_VERSION` and add a migration in `onUpgrade()`
- API changes: Update the Retrofit service interface and handle backward compatibility
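```kotlin
import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.launch

lifecycleScope.launch {
    try {
        val response = apiService.createTranscription(filePart, modelPart, ...)
        if (response.isSuccessful) {
            // Handle success: the parsed result is in response.body()
        } else {
            // Handle HTTP error via response.code(); never echo the API key
        }
    } catch (e: CancellationException) {
        throw e // Let coroutine cancellation propagate
    } catch (e: Exception) {
        // Handle network or parsing error
    }
}
```

```kotlin
// Field in an @AndroidEntryPoint Activity
@Inject lateinit var fileProcessingManager: FileProcessingManager

// Inside a function of that Activity
lifecycleScope.launch {
    try {
        val result = fileProcessingManager.processAudioFile(uri)
        // Use result.processedFile, result.originalFileSizeBytes, etc.
    } catch (e: FileProcessingException) {
        // Handle error
    }
}
```

```kotlin
val apiKey = SharedPrefsUtils.getApiKey(context)
val language = SharedPrefsUtils.getLanguage(context)
val autoFormat = SharedPrefsUtils.getAutoFormat(context)
```

- README.md: Feature overview and flowcharts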
- TESTING.md: API key embedding for test builds
- IMPLEMENTATION_SUMMARY.md: API key mechanism technical details