AIPromptBridge

AIPromptBridge is a Windows desktop application that brings AI assistance to your fingertips. Use global hotkeys to edit text using AI, capture and analyze audio or screen content, and chat with models, all from a lightweight system tray app.

✨ Features

🎯 TextEditTool

Press Ctrl+Space anywhere to invoke AI on selected text:

  • Understand - Explain, Generate Summaries, or Keypoints
  • Edit - Proofread (✏️), Rewrite (📝), or make it Casual (😎)
  • Q&A - Use the second input box in the popup to ask any question about the text
  • Compare - Use the 🔀 Compare button to compare selected text with another text selection
  • Custom prompts - Define and group your own actions in the Prompt Editor

Works in any application: browsers, IDEs, Notepad, Word, everywhere.


📸 Screen Snip (SnipTool)

Press Ctrl+Alt+X to capture a region of your screen and analyze it with AI:

  • OCR - Extract Text or OCR to Markdown for clean formatting
  • Analysis - Describe, Summarize, or Explain Code
  • Data - Extract Data to tables, Transcribe handwriting, or Smart Cleanup notes
  • Compare - Compare Images to analyze differences between two screenshots
  • Response Modes - Show the result in a chat window, copy it to the clipboard, or type it directly into the active field

🎤 Audio Analyzer


Press Ctrl+Alt+A to record and analyze audio:

  • Record - Capture microphone input or system audio (loopback)
  • Transcribe - High-fidelity transcription with timestamps and speaker identification
  • Analyze - Summarize meetings, extract key points, or analyze tone
  • Controls - Visual level meter, compression settings (Opus/MP3), and preview
  • Integration - Send audio directly to chat context for follow-up questions
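
As a rough illustration of the compression settings, the sketch below builds an FFmpeg command that downmixes to mono, lowers the sample rate, and encodes to Opus or MP3. The function name `build_ffmpeg_command` and the default bitrate and sample rate are assumptions chosen for illustration, not AIPromptBridge's actual settings.

```python
def build_ffmpeg_command(src, dst, codec="opus", bitrate="32k",
                         sample_rate=16000, mono=True):
    """Return an FFmpeg argument list that re-encodes src into dst.

    Hypothetical helper: the defaults are illustrative, not the app's own.
    """
    # Map the user-facing codec name to the FFmpeg encoder name.
    encoders = {"opus": "libopus", "mp3": "libmp3lame"}
    if codec not in encoders:
        raise ValueError(f"unsupported codec: {codec}")

    cmd = ["ffmpeg", "-y", "-i", str(src)]  # -y overwrites dst if it exists
    if mono:
        cmd += ["-ac", "1"]                 # downmix to a single channel
    cmd += [
        "-ar", str(sample_rate),            # resample to the target rate
        "-c:a", encoders[codec],            # choose the audio encoder
        "-b:a", bitrate,                    # target audio bitrate
        str(dst),
    ]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is on the PATH.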

🔊 Text-to-Speech (TTS)


Convert text into expressive speech using Gemini TTS models:

  • 30 Voices - Choose from 30 prebuilt voices with distinct styles (Bright, Firm, Upbeat, etc.)
  • AI Director - Automatically generates style instructions for expressive, nuanced speech
  • Two Models - Flash (fast) and Pro (quality) TTS model options
  • Multi-Speaker - Support for up to 2 speakers with individual voice assignment
  • Playback - Built-in audio preview with play/pause and seek controls
  • Export - Save generated audio as WAV files
  • Entry Points - 🔊 button in popups, [T] terminal key, hotkey Ctrl+Alt+T, and system tray menu
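
For readers curious how a WAV export can work, here is a minimal sketch using Python's standard `wave` module. It assumes the TTS model returns raw 16-bit mono PCM at 24 kHz; treat that format, and the helper name `save_pcm_as_wav`, as assumptions to verify against the model you use.

```python
import wave


def save_pcm_as_wav(path, pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container.

    Assumed input format: 16-bit (2-byte) mono samples at 24 kHz.
    """
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(channels)      # 1 = mono
        wf.setsampwidth(sample_width)  # bytes per sample
        wf.setframerate(sample_rate)   # samples per second
        wf.writeframes(pcm)            # header sizes are patched on close
```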

💬 Chat Interface


Lightweight chat windows with:

  • Streaming responses (real-time typing)
  • Markdown rendering
  • Session history (browse and restore)
  • Multi-theme UI with 7 color schemes

🎨 Theme System

The app supports 7 distinct themes, each with Dark and Light variants. Customizable appearance with:

  • 7 themes: Catppuccin, Dracula, Nord, Gruvbox, OneDark, Minimal, High Contrast
  • Dark/Light modes: Each theme has both variants
  • System detection: Auto-switches based on Windows theme
  • Live preview: See theme changes instantly in Settings
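
System detection can be sketched as follows. Windows stores the app light/dark preference in the `AppsUseLightTheme` registry value; whether AIPromptBridge reads this exact value is an assumption, and both function names are hypothetical.

```python
def read_apps_use_light_theme():
    """Return 1 (light), 0 (dark), or None when the value is unavailable."""
    try:
        import winreg  # only exists on Windows
    except ImportError:
        return None
    key_path = r"Software\Microsoft\Windows\CurrentVersion\Themes\Personalize"
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "AppsUseLightTheme")
            return int(value)
    except OSError:
        # Key or value missing (for example on older Windows builds).
        return None


def pick_variant(flag, fallback="dark"):
    """Map the registry flag to a theme variant name."""
    if flag is None:
        return fallback
    return "light" if flag else "dark"
```

Calling `pick_variant(read_apps_use_light_theme())` yields `"light"` or `"dark"`, falling back to dark on systems where the setting cannot be read.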

🔄 Robust Backend

  • Multi-provider support - Google Gemini, OpenRouter, custom endpoints
  • Automatic key rotation - Switch API keys on rate limits (429) and auth or billing errors (401, 402, 403)
  • Smart retry logic - Handles errors gracefully with configurable delays
  • Empty response detection - Automatically retries with next key
  • Streaming support - Real-time responses
  • Batch Processing - Async processing for large workloads (Gemini Batch API)
  • Attachment Manager - Efficient external storage for session images, audio, and files

🧰 Tools System (Not accessible in No Console mode)

The File Processor tool enables bulk operations:

  • Batch Processing: Process folders of Images, Audio, Code, Text, or PDFs
  • Audio Optimization: Reduce file size (mono, sample rate) for efficient AI processing
  • Configurable: tools_config.json is created on demand
  • Smart Handling:
    • Large Files: Auto-switches to Gemini Files API or Chunking logic
    • Checkpoints: Resume interrupted jobs or retry failures
    • Interactive Mode: Pause (P), Stop (S), or Abort (Esc) during processing
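
One simple way to get resumable batch jobs is a JSON checkpoint that records finished and failed items. The sketch below shows the idea only; the file layout and the names `load_checkpoint` and `process_folder` are assumptions, not the File Processor's actual format.

```python
import json
from pathlib import Path


def load_checkpoint(path):
    """Load saved progress, or start fresh when no checkpoint exists."""
    p = Path(path)
    if not p.exists():
        return {"done": [], "failed": []}
    return json.loads(p.read_text(encoding="utf-8"))


def process_folder(files, handler, checkpoint_path):
    """Run handler(name) for each file, skipping files already done.

    Earlier failures are retried because only "done" items are skipped.
    """
    done = set(load_checkpoint(checkpoint_path)["done"])
    failed = set()
    for name in files:
        if name in done:
            continue
        try:
            handler(name)
            done.add(name)
        except Exception:
            failed.add(name)
        # Save after every item so an interrupted run loses little work.
        state = {"done": sorted(done), "failed": sorted(failed)}
        Path(checkpoint_path).write_text(json.dumps(state), encoding="utf-8")
    return {"done": sorted(done), "failed": sorted(failed)}
```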

The TTS Processor tool enables batch text-to-speech generation:

  • Text Splitting: Lines, paragraphs, sentences, or whole file modes
  • Voice Selection: 30 prebuilt Gemini voices with single or multi-speaker support
  • Style Instructions: Manual, default, no style, or AI Director (single/per-segment)
  • AI Director: Auto-generates expressive style instructions for nuanced speech
  • Output Modes: Individual WAV files per segment or merged into single file
  • Checkpoints: Full resume support with failure retry
  • Interactive Mode: Pause (P), Stop (S) during generation
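
The four text splitting modes could look roughly like this. The mode names and the naive sentence rule (split after `.`, `!`, or `?`) are assumptions for illustration; the TTS Processor may split differently.

```python
import re


def split_text(text, mode="lines"):
    """Split text into TTS segments; blank segments are dropped."""
    if mode == "whole":
        parts = [text]
    elif mode == "lines":
        parts = text.splitlines()
    elif mode == "paragraphs":
        # A paragraph break is a blank (or whitespace-only) line.
        parts = re.split(r"\n\s*\n", text)
    elif mode == "sentences":
        # Naive rule: break on whitespace that follows ., ! or ?
        parts = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [p.strip() for p in parts if p.strip()]
```

Each returned segment would then become one TTS request, with the results saved individually or merged into a single file.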

🚀 Quick Start

Download (Recommended)

  1. Download AIPromptBridge.zip from GitHub Releases
  2. Extract and run AIPromptBridge.exe (use AIPromptBridge-NoConsole.exe to hide console)
  3. On first launch, the Settings window opens automatically on the API Keys tab. Enter an API key (see Getting API Keys), optionally give it a name, and click Add
  4. Optionally configure the selected provider, endpoint URL, or models in the Provider tab and click Save
  5. The app starts minimized to system tray

From Source (Alternative)

git clone https://github.com/zaxx-q/AIPromptBridge.git
cd AIPromptBridge
pip install -r requirements.txt
python main.py

📋 Usage

System Tray

Right-click the tray icon for:

  • Toggle Console (or double-click the tray icon) - Toggle console visibility (not available in No Console mode)
  • Session Browser - View chat history
  • Direct Chat - Open text input popup (Ctrl+Space)
  • Screen Snip - Trigger screen capture (Ctrl+Alt+X)
  • Audio Analyzer - Open audio tool (Ctrl+Alt+A)
  • TTS - Open Text-to-Speech window (Ctrl+Alt+T)
  • Settings - Open GUI settings editor
  • Prompt Editor - Edit TextEditTool prompts
  • Edit config.ini - Open configuration file (only visible with --show-console arg)
  • Edit prompts.json - Open prompts file (only visible with --show-console arg)
  • Restart - Restart the application
  • Quit - Exit completely

TextEditTool

  1. Select text in any application
  2. Press Ctrl+Space
  3. Choose an action (Proofread, Rewrite, etc.)
  4. The result replaces the selected text or opens in a chat window

Without selection: Opens a quick input bar for direct questions.

SnipTool (Screen Snipping)

  1. Press Ctrl+Alt+X
  2. Click and drag to select a screen region
  3. Choose an action (Describe, Extract Text, etc.) or ask a question
  4. Results open in a chat window with the image attached. They can also be copied to the clipboard or typed directly into the active field.

Audio Tool

  1. Press Ctrl+Alt+A to open the Audio Analyzer
  2. Select input device (Microphone or System Audio)
  3. Click Record to capture audio
  4. Choose an action (Transcribe, Analyze, etc.)
  5. Results are streamed to a chat window or displayed in the result panel

API Endpoints

Note: These endpoints are largely deprecated.

  • ShareX Users: ShareX 19.0.1+ now has a native "Analyze image" feature.
  • Desktop Users: The built-in SnipTool (Ctrl+Alt+X) offers better integration.

Endpoints allow HTTP POST access (disabled by default). See ShareX Setup Guide if needed.

Console Commands

When the console is visible, press these keys:

| Key | Action |
| --- | --- |
| S | Open session browser (Sessions) |
| A | Open Audio Analyzer |
| T | Open TTSTool window (Text-to-Speech) |
| X | Open Tools menu |
| L | List recent saved sessions |
| I | Show system info (Status) |
| K | Toggle thinking mode |
| P | Switch AI provider |
| M | List available models (Use ?N for details, e.g., ?1) |
| R | Toggle streaming mode |
| G | Open Settings window |
| W | Open prompt editor |
| H | Show help |

⚙️ Configuration

AIPromptBridge features a comprehensive GUI for all configuration needs, making it easy to manage settings without touching configuration files.

🎛️ Settings Window


Access via System Tray > Settings. This window manages the core application configuration (config.ini):

  • API Keys: Manage keys for Google Gemini, OpenRouter, and Custom providers.
  • Providers: Select default models and configure endpoint URLs.
  • Tools: Configure hotkeys and behavior for TextEditTool, SnipTool, and AudioTool.
  • Theme: Switch between 7 themes and toggle Dark/Light modes.
  • System: Configure server host/port and startup options.

✏️ Prompt Editor


Access via System Tray > Prompt Editor. This window lets you customize how the AI responds (prompts.json):

  • Actions: Create, edit, and organize actions for Text, Snip, and Audio tools.
  • Modifiers: Customize the modifier bar buttons (e.g., "Shorter", "Professional").
  • Playground: Test your prompts in real-time with text, images, or audio before saving.
  • Hot-Reload: Changes apply immediately without restarting the app.

📂 Manual Configuration

For advanced users, configuration files are stored in the application root:

  • config.ini: Core settings and API keys.
  • prompts.json: AI system prompts and tool configurations.
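
If you want to inspect config.ini from a script, Python's `configparser` is one option. The sketch below looks up an option such as `thinking_enabled` without assuming which section it lives in, because the file's section layout is not documented here. The lenient parser flags and the helper name `find_option` are assumptions, and the real file may need different handling.

```python
import configparser


def find_option(path, option):
    """Return (section, value) pairs for every section defining option."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # tolerate bare lines such as one key per line
        strict=False,         # tolerate duplicate sections or options
        interpolation=None,   # treat % characters literally
    )
    parser.read(path, encoding="utf-8")
    return [
        (section, parser.get(section, option))
        for section in parser.sections()
        if parser.has_option(section, option)
    ]
```

For example, `find_option("config.ini", "thinking_enabled")` returns an empty list when the option is absent.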

💡 Tips

For Faster Responses

  • Use non-reasoning models (e.g., gemini-2.0-flash instead of gemini-2.5-pro)
  • Disable thinking mode: Press K in console or set thinking_enabled = false
  • Keep streaming enabled so responses feel faster

For Better Results

  • Enable thinking mode for complex tasks
  • Use specific prompts in TextEditTool
  • Add context when asking questions

Getting API Keys

API Key Management

  • Add multiple API keys (one per line) for automatic rotation
  • If one key hits rate limits, the next one is used automatically
  • The system tracks exhausted keys and skips them
  • Keys rotate on: 429 (rate limit), 401/402/403 (auth or billing errors), empty responses
  • Security: You can also provide keys via Environment Variables instead of config.ini:
    • GEMINI_API_KEY
    • OPENROUTER_API_KEY
    • CUSTOM_API_KEY
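
A sketch of how environment variables and config.ini keys might be combined is shown below. The precedence (environment variable first, then file keys) and the provider labels are assumptions; only the three variable names come from the list above.

```python
import os

# Variable names from the README; the provider labels are hypothetical.
ENV_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "custom": "CUSTOM_API_KEY",
}


def resolve_keys(provider, config_keys, environ=None):
    """Return the key list for a provider, environment key first."""
    environ = os.environ if environ is None else environ
    env_key = environ.get(ENV_VARS[provider], "").strip()
    keys = [env_key] if env_key else []
    for raw in config_keys:
        key = raw.strip()
        if key and key not in keys:  # drop blanks and duplicates
            keys.append(key)
    return keys
```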

🔧 Command Line Options

AIPromptBridge.exe --no-tray          # No tray icon
AIPromptBridge.exe --show-console     # Keep the console visible at startup and enable debug logs
AIPromptBridge.exe --no-wt            # Skip Windows Terminal detection and redirection (handled by launcher)

💡 Console View: For the best console experience (including full-color emoji support), Windows Terminal is highly recommended. AIPromptBridge attempts to relaunch itself in Windows Terminal automatically if it is detected.
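
If you are reading the source, the three command line flags could be declared with `argparse` roughly as follows. This is an illustrative sketch, not the project's actual parser; only the flag names are taken from this README, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser for the three documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="AIPromptBridge")
    parser.add_argument("--no-tray", action="store_true",
                        help="run without a tray icon")
    parser.add_argument("--show-console", action="store_true",
                        help="keep the console visible and enable debug logs")
    parser.add_argument("--no-wt", action="store_true",
                        help="skip Windows Terminal detection and redirection")
    return parser
```

`build_parser().parse_args(["--no-tray"])` produces a namespace where `no_tray` is true and the other two flags are false.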

📖 Documentation

πŸ“ Requirements

  • Windows 10/11 (uses Windows-specific APIs for tray, console, snipping, and audio capture mechanisms)
  • Windows Terminal (Highly recommended for better console view and colors)
  • Python 3.13+ (if running from source)
  • FFmpeg (Required for audio compression and conversion features)
  • API keys for at least one provider (Google Gemini recommended)

📄 License

MIT License

Attribution & Third-Party Licenses

This project uses Twemoji graphics, licensed under CC-BY 4.0.