Skip to content
Daniel edited this page Dec 26, 2025 · 2 revisions

AI Universal Transcriber

Introduction

This is the homepage of a small app that I developed to transcribe all my received voice messages and my own recordings.

Since then, I always have the option to read voice messages as text, which is very helpful in certain situations.

So that others can benefit from this as well, I've created a small Android app. Please test it and give me feedback.

Note on Test Phase

In the current internal test phase, the AI costs are covered by the developer. Therefore, the OpenAI API key is already pre-configured so you can use the app immediately.

To control costs, a budget is set. If too many testers use the app simultaneously, the app's operation may be temporarily disrupted if the budget is exhausted. You can always use your own API key to be independent of this.

Access

This is an Android app, which means it's unfortunately only available on Android devices.

Write me for access to internal test, via email aiuniversaltranscriber@secure.mailbox.org

Instructions

User Interface Concept

To make the app especially intuitive and simple, it offers two different interaction methods that flexibly integrate into your daily routine.

  1. The Intuitive Way: Seamless Integration via "Share"

    This is the recommended method as it starts right where the information originates – for example, in your messenger. When you receive a voice message in WhatsApp, Telegram, or Signal, you don't have to laboriously leave the respective app to temporarily save the file.

    Instead, you use your smartphone's system-wide "Share" function: A long press on the message is enough to send it directly to the AI Universal Transcriber. The app then opens automatically, takes over the file, and immediately starts the conversion and transcription in the background. This workflow is particularly efficient as it maintains the context of your conversation and delivers results within seconds.

  2. The Classic Way: Manual Selection in the App

    For files already on your internal storage – such as your own voice memos, downloaded audio recordings, or video files – the app offers a direct selection function.

    Via the "Select File and Transcribe" button in the main view, you open your device's file manager. This method is somewhat more demanding as it requires you to know the storage location of your files (e.g., in the Downloads folder or deep within messenger media folders). Nevertheless, this mode offers full flexibility to subsequently transcribe older archived recordings or complex media formats and refine them with AI post-processing.

User Interface

There are two views in the app: The main screen and the settings.

Main Screen

This screen is the control center of the app where transcription takes place and results are displayed.

  • Header:

    • Title: "AI Universal Transcriber" – Shows the app name.
    • Subtitle: Briefly explains that the app uses OpenAI's Whisper and GPT-4o models and requires its own API key.
  • "Selected File" Area:

    • Filename: Shows the currently loaded media file (e.g., an .opus file from WhatsApp).
    • Delete Icon (X): Tapping the small "X" on the right edge of the box removes the currently selected file.
  • Transcription Window:

    • Text Field: The recognized text appears here. It's scrollable if the message is longer.
    • "Copy to Clipboard" Button (Share Icon): Immediately copies the entire transcribed text to your phone's clipboard.
    • "Reformat with AI" Button (Pencil Icon): Sends the existing text again to the AI to refine it according to your prompt settings (e.g., add punctuation).
  • Action Buttons:

    • Select File and Transcribe: The main button. Opens the file manager to manually select a file and start the process.
    • Repeat Icon (Circular Arrow): Restarts the process for the already selected file (useful if you've changed the model or language in settings in the meantime).
  • Bottom Navigation Bar:

    • History: Opens the archive of your previous transcriptions including statistics.
    • Settings: Takes you to configuration settings.
    • Help: Shows help texts or app version information.

Settings

In this area, you define how the AI should work.

  • OpenAI API Key Section:

    Note: For the test phase, a standard API key is provided, whose costs are covered by the developer. However, you can always use your own API key.

    • Input Field: Where you enter the secret API key from OpenAI.
    • Save Key: Securely saves the key (encrypted) on the device.
    • Test Key: Validates the key immediately via a test request to OpenAI.
    • Status Display: Shows the currently active key (partially masked for security reasons).
  • Used Transcription Model (Model Selection):

    • Whisper-1: Radio button to select the classic Whisper model.
    • GPT-4o Transcribe: Radio button for the high-precision audio model.
    • GPT-4o-mini Transcribe: Radio button for the efficient, cost-effective version.
  • AI Reformat Prompt:

    • Text Area: Here you define the "instructions" for the AI for post-processing (e.g., "Create paragraphs", "Keep the original language").
    • Reset to Default: Resets the instructions to the app's factory settings.
  • Auto-format Settings:

    • Auto format (Switch): If this switch is activated (blue), the "Reformat" step is automatically executed after each transcription without you having to tap extra.
  • Language Settings for Transcription:

    • Pinned Language: Input field for the ISO language code (e.g., de for German, en for English).
    • Save & Reset: For saving the preferred language or resetting to automatic detection.
  • Whisper-1 Prompt:

    • Input Field: Here you can give Whisper hints (e.g., "This is an interview") to increase accuracy.
  • GPT-4 Audio Prompt:

    • Text Area: Special, detailed instructions for the GPT-4o Audio model (e.g., how to handle unclear words).
    • Save & Reset: For saving or discarding prompt changes.

Technical Details

  • Native Media Processing: The app uses Android's native interfaces to process media files. This means no heavy external libraries are needed and the app is optimally tuned to the operating system.

  • Audio Extraction from Videos: You can use not only pure audio files but all common video formats. As long as the Android system "understands" the video, the app can automatically extract the audio track and prepare it for transcription.

  • Efficient Re-Encoding: Every file is internally re-encoded to M4A/AAC format before being sent to the AI. This guarantees maximum compatibility with OpenAI models while maintaining excellent voice quality.

  • Intelligent Size Management: OpenAI's interface has a capacity limit of 25 MB. Thanks to the app's efficient re-encoding process, however, this limit is almost never reached in practice – even very long voice messages or videos usually remain well below this limit after conversion.

  • Security through Encryption: Your sensitive OpenAI API key is not simply stored as text. The app uses EncryptedSharedPreferences to store the key encrypted on your device according to current security standards.

  • Local Data Storage: All transcriptions and statistics (such as original file size, upload size, and duration) are stored in a local database on your smartphone. There is no automatic cloud sync of your texts – your privacy remains protected.

  • Modern User Interface: The app was developed with Jetpack Compose. This ensures a smooth, reactive UI that seamlessly integrates into modern Android design.

  • AI Power: The app offers direct access to OpenAI's latest models: Whisper-1 for core transcription, as well as GPT-4o and GPT-4o-mini for audio analysis and optional text refinement.

Privacy & Data Security

Protecting your privacy is paramount with this app. Since the app works with your personal OpenAI API key, data processing occurs directly between your device and OpenAI.

1. Data Transmission and Transcription

When you transcribe an audio or video file, the audio track is sent to OpenAI's servers. According to OpenAI's current guidelines for the transcription interface (/v1/audio/transcriptions):

  • No Storage: Your audio data is processed exclusively for the moment of conversion.
  • No Abuse Monitoring: For this specific service, OpenAI does not perform permanent storage or monitoring.

2. AI Post-Processing (Reformat with AI)

The app offers the optional function to make transcribed texts more readable through AI (adding punctuation and paragraphs).

  • Activation: This function is optional and can be (de-)activated at any time in settings.
  • Storage Duration: If you use this function, the text is processed via the chat interface (/v1/chat/completions). OpenAI stores this data for a maximum of 30 days in internal logs exclusively for abuse monitoring. After this period expires, the data is automatically deleted.

3. No Access by the Developer

It's important to emphasize that the developer of this app has no access at any time to your audio files, your transcribed texts, or your API key.

  • Direct Connection: The app communicates encrypted directly with OpenAI.
  • Local Storage: Your API key is stored encrypted on your phone using EncryptedSharedPreferences. Your transcription history also remains exclusively local in a database on your device.

4. No Use for AI Training

Unlike the free web version of ChatGPT, data sent via the API (with your own key) is not by default used to train or improve OpenAI models. Your private conversations thus remain private and do not contribute to the development of general AI.

For further details, we refer to OpenAI's official documentation: Your data and the API.

Outlook

The following enhancements are planned for the future:

  • Confidential AI Computing: Evaluation of Confidential Computing solutions – the highest level of remote confidential AI processing, where data remains encrypted even during processing.

  • European AI Providers: Integration of AI services that are exclusively hosted in Europe and subject to stricter privacy regulations (e.g., GDPR).

  • On-Device Transcription: Implementation of a fully local transcription option directly on the device for maximum privacy without internet connection.

  • Microphone Input: Direct recording and transcription of audio via the built-in microphone.

  • Streaming Output: Real-time streaming of AI responses for faster results and better user experience.

  • Translation Function: Automatic translation of transcribed texts into other languages.

  • Overlay Feature: Quick access via a floating overlay window for even more efficient operation.

  • Further Improvements: Support for additional languages and dialects, as well as optional anonymization functions.

Contact

Developer: Daniel
Email: aiuniversaltranscriber@secure.mailbox.org
Source Code: github.com/dhcgn/AIAudioTranscription