LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
A simple, high-quality voice conversion tool focused on ease of use and performance.
Real-time AI voice agents running SoTA multimodal AI models on Arduino ESP32, supporting more than 15 minutes of uninterrupted conversation globally, for AI toys, AI companions, AI devices, and more.
A lightning-fast, cross-platform AI Assistant App built with React Native.
The world's first open-source real-time end-to-end spoken dialogue model with personalized voice cloning.
High-quality, streaming speech-to-speech interactive agent in a single file. A streaming, full-duplex spoken-interaction prototype agent implemented in just one file!
A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Speech-to-speech AI assistant with natural conversation flow, mid-speech interruption, vision capabilities and AI-initiated follow-ups. Features low-latency audio streaming, dynamic visual feedback, and works with local LLM/TTS services via OpenAI-compatible endpoints.
Your faithful, impartial partner for audio evaluation: honest benchmarks, so you know yourself and know your rivals.
A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2, and Kokoro-82M.
An on-CPU, real-time conversational system for two-way speech communication with AI models, using a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.
A modular Swift SDK for audio processing with MLX on Apple Silicon
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering, but not limited to, end-to-end speech interaction, end-to-end speech translation, and speech recognition.
AI-powered YouTube video dubbing pipeline. Automatically transcribes (Whisper), translates (Google), and generates neural dubbing (Edge-TTS) with smart audio-video synchronization and background music preservation.
FreeSWITCH module that streams audio to a WebSocket and receives the response.
Samantha OS1 is a conversational AI assistant powered by the Realtime API from OpenAI
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-ready architecture.
Cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities.