Real-time bidirectional simultaneous interpretation system based on Volcengine, supporting Chinese-English translation, optimized for online meeting scenarios like Zoom. Supports Windows and macOS.
If this project helped you in an interview or at work, please leave positive feedback in Issues. Thank you for your support!
- Dual-Channel Independent Concurrent Execution
  - Channel 1: Microphone (Chinese) → Volcengine (S2S) → VB-CABLE (English) → Zoom → Other party hears English
  - Channel 2: Zoom (English) → System Audio → Volcengine (S2T) → Subtitle Window (Chinese) → You see Chinese subtitles
- Physical Isolation, No Echo
  - Headphone physical isolation eliminates speaker sound captured by microphone
  - Simplified architecture without complex conflict detection logic
- Smart Subtitle Window
  - Semi-transparent floating window with drag and resize support
  - Double-click to toggle font size (14pt ↔ 20pt)
  - ESC key for quick hide/show
  - Intelligent subtitle aggregation and deduplication, English/Chinese displayed on separate lines
  - Chinese text beautification (auto-spacing after punctuation)
  - History playback support (configurable retention count)
- Flexible Channel Control
  - Channel 1 can be disabled (subtitle-only mode: view translations without voice output)
  - Set zh_to_en.enabled: false in config.yaml
- High-Performance Audio Processing
  - 16kHz mono audio capture with low-latency transmission
  - Ogg Opus audio decoding (FFmpeg support)
  - Thread-safe audio queue management
  - Automatic audio device detection and fallback mechanism
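The subtitle aggregation, deduplication, and history retention listed above could be approached roughly as in the sketch below. It assumes the S2T stream delivers partial results that grow over time, which this README does not confirm. The class name SubtitleBuffer and its push method are hypothetical and are not taken from gui/subtitle_window.py.

```python
from collections import deque


class SubtitleBuffer:
    """Hypothetical helper: merges growing partial results and drops repeats."""

    def __init__(self, max_history=1000):
        # deque(maxlen=...) discards the oldest line once the limit is reached
        self.lines = deque(maxlen=max_history)

    def push(self, text):
        """Add a recognition result; return True if the display changed."""
        text = text.strip()
        if not text:
            return False
        if self.lines:
            last = self.lines[-1]
            if text == last or last.startswith(text):
                # Exact repeat or a stale, shorter partial: ignore it
                return False
            if text.startswith(last):
                # Newer partial extends the previous one: replace it in place
                self.lines[-1] = text
                return True
        self.lines.append(text)
        return True


buf = SubtitleBuffer(max_history=3)
for piece in ["你好", "你好，世界", "你好，世界", "今天开会"]:
    buf.push(piece)
print(list(buf.lines))
```

Replacing the last line when a longer partial arrives keeps the window from filling with near-identical rows, and the bounded deque plays the role of the configurable retention count (max_history in config.yaml).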
- Audio Capture Module (core/audio_capture.py)
  - Microphone audio capture (user voice)
  - Automatic device detection and fallback support
  - Real-time audio stream buffering
- Audio Output Module (core/audio_output.py)
  - VB-CABLE virtual audio device output
  - Ogg Opus audio format decoding
  - Dual output monitoring support (optional)
  - Audio playback queue management
- Volcengine Client (core/volcengine_client.py)
  - WebSocket persistent connection management
  - S2S (Speech-to-Speech) translation
  - S2T (Speech-to-Text) translation
  - Automatic reconnection and error retry mechanism
  - Protobuf protocol encapsulation
- Subtitle Window Module (gui/subtitle_window.py)
  - Tkinter floating window
  - Intelligent text formatting and beautification
  - Configurable style and position
  - Timestamp display (optional)
  - Thread-safe updates
- System Audio Capture (core/system_audio_capture.py)
  - Windows Stereo Mix support
  - macOS BlackHole virtual audio support
  - Multi-device fallback strategy
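Among the modules above, the Volcengine client's automatic reconnection and retry behaviour is not detailed in this README. A common pattern for it is exponential backoff, sketched below. Here connect stands in for whatever opens the WebSocket; the function name, delays, and attempt limit are illustrative assumptions rather than values from core/volcengine_client.py.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect is any zero-argument callable that raises ConnectionError on
    failure. All names and default values here are assumptions.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Give up and surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)


# Usage example with a fake connection that fails twice before succeeding
calls = {"n": 0}


def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network drop")
    return "connected"


waits = []
result = connect_with_retry(flaky_connect, sleep=waits.append)
print(result, waits)
```

Passing sleep as a parameter lets the example record the waits instead of actually pausing, which also makes the retry logic easy to test.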
[Channel 1 - You speak, they hear]
Microphone → AudioCapturer → VolcengineTranslator(s2s) → OggOpusPlayer → VB-CABLE Input → Zoom Microphone
[Channel 2 - They speak, you see]
Zoom Speaker → System Audio/Virtual Audio → SystemAudioCapturer → VolcengineTranslator(s2t) → SubtitleWindow
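The two data flows above run independently of each other. One way to picture that is two worker threads, each fed by its own thread-safe queue, as in the sketch below. The translate callables are placeholders for the S2S and S2T calls, and none of the names here come from main.py.

```python
import queue
import threading

STOP = object()  # Sentinel that tells a worker to exit


def run_channel(source, translate, sink):
    """Pull chunks from source, translate them, and hand results to sink."""
    while True:
        chunk = source.get()
        if chunk is STOP:
            break
        sink(translate(chunk))


# Each channel owns its queue, so neither one blocks the other
mic_queue, system_queue = queue.Queue(), queue.Queue()
spoken, subtitles = [], []

workers = [
    # Channel 1: microphone -> placeholder S2S -> virtual cable
    threading.Thread(target=run_channel,
                     args=(mic_queue, lambda c: f"en-audio({c})", spoken.append)),
    # Channel 2: system audio -> placeholder S2T -> subtitle window
    threading.Thread(target=run_channel,
                     args=(system_queue, lambda c: f"zh-text({c})", subtitles.append)),
]
for worker in workers:
    worker.start()

mic_queue.put("mic-chunk-1")
system_queue.put("zoom-chunk-1")
for q in (mic_queue, system_queue):
    q.put(STOP)
for worker in workers:
    worker.join()

print(spoken, subtitles)
```

Because the channels share no state, a stall in one (for example a slow network response on the S2S side) does not delay the other.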
- Windows 10/11 - Fully supported (VB-CABLE + Stereo Mix)
- macOS - Fully supported (requires BlackHole)
Through virtual audio devices, this system supports all meeting software that allows audio input device selection:
- Zoom, Microsoft Teams, Tencent Meeting, Feishu/Lark, DingTalk, Google Meet, Webex, Skype
Also supports IM software with voice call features:
- Discord, Telegram (Desktop), WhatsApp (Desktop), WeChat (PC), QQ, Slack
Configuration: In each software's audio settings, set the microphone to CABLE Output (VB-Audio Virtual Cable).
- Operating System: Windows 10/11 or macOS
- Python: 3.8+
- Dependencies: See requirements.txt
- VB-CABLE - Virtual audio device to pass translated English audio to Zoom
- FFmpeg - For Ogg Opus audio decoding
- BlackHole - macOS virtual audio device (alternative to VB-CABLE)
  - use_ffmpeg can be set to false on macOS
- Register Volcengine account: https://www.volcengine.com/
- Enable "Simultaneous Interpretation 2.0" service: https://console.volcengine.com/speech/service/10030
- Obtain app_key and access_key
pip install -r requirements.txt
cp config.yaml.example config.yaml
Edit config.yaml with your Volcengine credentials and audio device names.
List available audio devices:
python scripts/list_devices.py
Option A: Stereo Mix (Recommended for wired headphones)
- Enable Windows Stereo Mix
  - Right-click volume icon in taskbar → "Sound"
  - Switch to "Recording" tab
  - Right-click empty area → "Show Disabled Devices"
  - Find "Stereo Mix" → Right-click → "Enable"
- Zoom Audio Settings
  - Microphone: CABLE Output (VB-Audio Virtual Cable)
  - Speaker: Speakers (Realtek HD Audio) or default speaker
- Connect wired headphones to Realtek sound card port
Option B: VB-CABLE B + Monitoring (For Bluetooth speakers)
- Zoom Audio Settings
  - Microphone: CABLE Output (VB-Audio Virtual Cable)
  - Speaker: CABLE Input (VB-Audio Virtual Cable)
- Modify configuration file config.yaml:
    audio:
      system_audio:
        device: "CABLE Output"
- Set Windows Audio Monitoring
  - Right-click "CABLE Output" → Properties → "Listen" tab
  - Check "Listen to this device"
  - "Playback through this device" → Select your Bluetooth speaker
- Install BlackHole (16ch version recommended)
- Update device names in config.yaml:
    audio:
      microphone:
        device: "MacBook Air Microphone"
      system_audio:
        device: "BlackHole 16ch"
        fallback_device: "BlackHole 16ch"
      vbcable_output:
        device: "MacBook Air Speakers"
        use_ffmpeg: false
- Zoom Audio Settings
  - Microphone: BlackHole 16ch
  - Speaker: Default speaker
python main.py                 # Use default config.yaml
python main.py my_config.yaml  # Use custom config file
All settings are in config.yaml. See config.yaml.example for detailed comments.
audio:
  microphone:
    device: "Microphone"      # Your microphone device name
    sample_rate: 16000
    channels: 1
    chunk_size: 1600          # 100ms @ 16kHz
  system_audio:
    device: "Stereo Mix"      # Windows: Stereo Mix / macOS: BlackHole 16ch
    fallback_device: "Microsoft Sound Mapper - Input"
    sample_rate: 16000
    channels: 1
  vbcable_output:
    device: "CABLE Input"     # Windows: CABLE Input / macOS: Speaker name
    sample_rate: 24000
    use_ffmpeg: true          # Can be set to false on macOS
channels:
  zh_to_en:
    mode: "s2s"               # speech to speech
    source_language: "zh"
    target_language: "en"
    enabled: true             # Set to false to disable (subtitle-only mode)
  en_to_zh:
    mode: "s2t"               # speech to text
    source_language: "en"
    target_language: "zh"
    enabled: true
subtitle_window:
  enabled: true
  width: 600
  height: 800
  font_size: 14               # Double-click window to toggle size
  bg_color: "#000000"
  text_color: "#FFFFFF"
  opacity: 0.85
  position: "top_right"       # Options: top_center, bottom_center, top_left, top_right
  max_history: 1000
  show_timestamp: false
- Start program: python main.py
  - macOS convenience scripts: bash scripts/healthcheck.sh to check environment, bash scripts/run.sh to start
- Check devices: Confirm microphone, speakers, and virtual audio devices are correctly recognized
- Wear headphones: Important! Must use headphones to avoid echo
- Start Zoom: Set microphone to "CABLE Output" (Windows) or "BlackHole 16ch" (macOS)
- Start translation:
  - You speak Chinese → They hear English
  - They speak English → You see Chinese subtitles (top right corner)
- Drag: Left-click drag to move window
- Double-click: Toggle font size (14pt ↔ 20pt)
- ESC key: Hide/show window
- Close program: Ctrl+C
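The subtitle window's position setting (top_center, bottom_center, top_left, top_right) has to be turned into screen coordinates at some point. The sketch below shows one way this might be done, producing a geometry string in the WxH+X+Y form that Tkinter windows accept. The function name and the margin parameter are assumptions, not part of the project's configuration.

```python
def window_geometry(width, height, position, screen_w, screen_h, margin=20):
    """Return a Tk-style geometry string ("WxH+X+Y") for a named position."""
    left, right = margin, screen_w - width - margin
    top, bottom = margin, screen_h - height - margin
    center_x = (screen_w - width) // 2
    anchors = {
        "top_left": (left, top),
        "top_right": (right, top),
        "top_center": (center_x, top),
        "bottom_center": (center_x, bottom),
    }
    if position not in anchors:
        raise ValueError(f"unknown position: {position!r}")
    x, y = anchors[position]
    return f"{width}x{height}+{x}+{y}"


# 600x800 window in the top right corner of a 1920x1080 screen
print(window_geometry(600, 800, "top_right", 1920, 1080))  # 600x800+1300+20
```

In a Tkinter program the screen size would typically come from winfo_screenwidth() and winfo_screenheight(), and the returned string would be passed to the window's geometry() method.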
Problem: Program shows "Device XXX not found"
Solution:
- Run python scripts/list_devices.py to view all devices
- Update device name in config.yaml to match actual device name
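To illustrate how a configured name and a fallback_device might be resolved against the device list, here is a minimal sketch. It works on plain dicts with "name" and "max_input_channels" keys, the shape returned by sounddevice.query_devices(). The function name and the case-insensitive substring matching are assumptions; the project's actual lookup may differ.

```python
def find_input_device(devices, name, fallback=None):
    """Return the index of the first input device whose name contains `name`.

    If nothing matches `name`, the same search is repeated with `fallback`.
    """
    for wanted in (name, fallback):
        if not wanted:
            continue
        for index, dev in enumerate(devices):
            is_input = dev.get("max_input_channels", 0) > 0
            if is_input and wanted.lower() in dev["name"].lower():
                return index
    tried = name if not fallback else f"{name}, then {fallback}"
    raise LookupError(f"Device not found (tried: {tried})")


# Sample data standing in for a real device list
sample = [
    {"name": "Microsoft Sound Mapper - Input", "max_input_channels": 2},
    {"name": "CABLE Input (VB-Audio Virtual Cable)", "max_input_channels": 0},
    {"name": "Microphone (Realtek HD Audio)", "max_input_channels": 2},
]
print(find_input_device(sample, "Stereo Mix", "Microsoft Sound Mapper - Input"))  # 0
```

Filtering on max_input_channels matters because playback-only devices such as CABLE Input also appear in the list but cannot be recorded from.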
Problem: Cannot enable Stereo Mix, or no sound from Stereo Mix
Solution:
- Confirm sound card supports Stereo Mix (Realtek, etc.)
- Switch to "Option B: VB-CABLE B + Monitoring"
- Check if sound card driver is up to date
Problem: No sound after selecting virtual audio device in meeting software
Solution:
- Check if virtual audio device is correctly installed
- Restart computer and test again
- Test virtual audio device in system sound settings
Problem: Program starts but no subtitle window appears
Solution:
- Check that the configuration file sets subtitle_window.enabled: true
- Press ESC key to try showing window
- Check log file for error messages
Problem: Translation result has noticeable delay (>3 seconds)
Solution:
- Check network connection quality
- Confirm Volcengine service status
- Reduce audio quality or adjust chunk_size
- Check CPU usage
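When adjusting chunk_size, it helps to remember that the value is a number of samples, so each chunk has to fill before it can be sent. The arithmetic below shows how the default of 1600 samples at 16kHz corresponds to 100ms. The byte-size helper assumes 16-bit PCM samples, which this README does not state explicitly.

```python
def chunk_duration_ms(chunk_size, sample_rate):
    """Milliseconds of audio held in one chunk of `chunk_size` samples."""
    return chunk_size * 1000 / sample_rate


def chunk_bytes(chunk_size, channels=1, bytes_per_sample=2):
    """Size of one chunk in bytes; 2 bytes per sample assumes 16-bit PCM."""
    return chunk_size * channels * bytes_per_sample


for size in (800, 1600, 3200):
    print(size, chunk_duration_ms(size, 16000), chunk_bytes(size))
# 800 50.0 1600
# 1600 100.0 3200
# 3200 200.0 6400
```

Smaller chunks shorten the buffering delay per packet but send more packets per second, so the gain in latency is traded against network and CPU overhead.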
Problem: 401 error when connecting to Volcengine
Solution:
- app_key and access_key must be from the same application
- Confirm "Simultaneous Interpretation 2.0" service is enabled
- Check if keys have expired or been reset
Problem: Program crashes immediately after startup
Solution:
- Check Python version meets requirements (3.8+)
- Confirm all dependencies are installed: pip install -r requirements.txt
- View log file for error details
- Verify Volcengine keys are correct
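The first two checks in this list can be automated before starting the program. The sketch below is one possible preflight helper, not something shipped with the project. The module names passed in are examples only (sounddevice appears in the acknowledgements; yaml is a guess), so consult requirements.txt for the real list.

```python
import importlib.util
import sys


def preflight(required_modules, min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    for module in required_modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is None:
            problems.append(f"Missing dependency: {module}")
    return problems


# Module names are examples; check requirements.txt for the real list
for problem in preflight(["sounddevice", "yaml"]):
    print(problem)
```

Collecting every problem into a list, rather than stopping at the first one, means a single run reports everything that needs fixing.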
| Metric | Value |
|---|---|
| End-to-end latency | 1.5-3 seconds (depends on network quality) |
| Audio sample rate | 16kHz (input) / 24kHz (output) |
| Audio format | PCM (input) / Ogg Opus (output) |
| Memory usage | ~200MB |
| CPU usage | 5-15% (depends on translation frequency) |
realtime_translator/
├── main.py # Main entry point
├── config.yaml # Configuration (copied from example, gitignored)
├── config.yaml.example # Config template (Windows/macOS device examples)
├── requirements.txt # Dependencies
├── core/
│ ├── audio_capture.py # Microphone audio capture
│ ├── audio_output.py # Audio output (VB-CABLE)
│ ├── system_audio_capture.py # System audio capture
│ ├── volcengine_client.py # Volcengine client
│ └── conflict_resolver.py # Conflict resolver (v1 legacy)
├── gui/
│ └── subtitle_window.py # Subtitle window module
├── pb2/ # Protobuf generated files
├── scripts/
│ ├── list_devices.py # Audio device listing tool
│ ├── vbcable_translator.py # Single-channel translator (debug)
│ ├── run.sh # macOS/Linux startup script
│ └── healthcheck.sh # Health check script
└── README.md # Documentation
- Channel 2 intelligent subtitle aggregation and deduplication, English/Chinese on separate lines
- Chinese text beautification (auto-spacing after punctuation, smart line wrapping)
- Channel 1 can be disabled, supporting subtitle-only mode
- Subtitle display format optimization (line spacing, padding adjustments)
- macOS support (BlackHole virtual audio)
- Unified configuration template (Windows/macOS merged into one)
- Dual-channel independent concurrent execution
- Headphone physical isolation, no echo
- Simplified architecture, removed conflict detection
- Smart subtitle window (dual-buffer deduplication)
- Thread-safe UI updates
- Comprehensive error handling and retry mechanism
- Support for all mainstream meeting and IM software
- Basic microphone to VB-CABLE translation
- Volcengine S2S integration
- Audio conflict detection mechanism
- Voice Cloning - Customize output voice characteristics
- Noise Reduction - Background elimination and auto volume equalization
- Multi-language - Support for more language pairs
- Terminology Database - Industry-specific terminology customization
- GUI Control Panel - Graphical configuration and monitoring
- Multi-participant Support - Multi-channel audio separation and speaker identification
Note: Some features depend on Volcengine API support.
Author: Sue
Contact:
- GitHub: Im-Sue
- X (Twitter): @ssssy83717
- Telegram: @Sue_muyu
Copyright © 2024 Sue
This project uses a custom license:
- Allowed for personal learning, research, and non-commercial use
- Required to retain copyright notice and author information
- Commercial use prohibited without authorization
- Required to contact author for commercial licensing
For commercial licensing, please contact:
- X: @ssssy83717
- Telegram: @Sue_muyu
This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability.
- Volcengine - Simultaneous interpretation API service
- VB-CABLE - Windows virtual audio device
- BlackHole - macOS virtual audio device
- sounddevice - Python audio library
If you encounter issues or have suggestions, please contact:
- Issues: GitHub Issues
- X: @ssssy83717
- Telegram: @Sue_muyu
Quick Start: pip install -r requirements.txt && cp config.yaml.example config.yaml && python main.py
Important: Please use headphones to avoid audio echo!