KernelOverseer/caLLMe

caLLMe

A tiny, voice-first assistant that turns your microphone into a real-time conversation with an LLM.
caLLMe listens, transcribes, generates a response, speaks it back, and lets you interrupt at any time – all in just a few hundred lines of Python.

Features

  • Silero Voice Activity Detection – starts recording when you begin speaking and stops when you fall silent.
  • Groq Whisper STT – high-quality speech-to-text.
  • Groq Llama 3 LLM – streams replies token-by-token.
  • Groq PlayAI TTS – natural, low-latency speech synthesis.
  • Async audio queue – synthesized replies play back while later ones are still being generated; start speaking again to interrupt playback.
  • Simple, hackable architecture – every component lives in src/ and follows small base interfaces (VAD, STT, TTS, Gen, Player).
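
To make the "small base interfaces" idea concrete, here is a minimal sketch of how three of the stages could be chained. The method names (transcribe, stream, synthesize), the toy implementations, and run_turn are hypothetical and chosen for illustration only; the actual interfaces in src/ may look different.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class STT(ABC):
    """Speech-to-text stage: raw audio in, transcript out."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str: ...


class Gen(ABC):
    """Text generation stage: yields the reply piece by piece."""

    @abstractmethod
    def stream(self, prompt: str) -> Iterator[str]: ...


class TTS(ABC):
    """Text-to-speech stage: text in, audio bytes out."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes: ...


# Toy stand-ins (hypothetical) so the sketch runs without audio hardware or API keys.
class EchoSTT(STT):
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperGen(Gen):
    def stream(self, prompt: str) -> Iterator[str]:
        for word in prompt.split():
            yield word.upper() + " "


class BytesTTS(TTS):
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


def run_turn(stt: STT, gen: Gen, tts: TTS, audio: bytes) -> bytes:
    """One conversational turn: transcribe, generate, then synthesize."""
    text = stt.transcribe(audio)
    reply = "".join(gen.stream(text)).strip()
    return tts.synthesize(reply)
```

Because run_turn depends only on the abstract classes, any stage can be replaced by a different backend that implements the same method.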

Quick Start

# 1. Grab the code
$ git clone https://github.com/KernelOverseer/caLLMe.git
$ cd caLLMe

# 2. Create & activate a virtual env (optional but recommended)
$ python -m venv .venv
$ source .venv/bin/activate

# 3. Install Python dependencies
$ pip install -r requirements.txt

# 4. Set your Groq API key (required for STT, TTS & LLM)
$ export GROQ_API_KEY="gsk_..."

# 5. Run the assistant 🎙️
$ python src/main.py
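
Since step 4 is required for all three Groq-backed stages, a fail-fast check at startup can save a confusing error later. The helper below is a hypothetical sketch, not something taken from src/main.py; only the GROQ_API_KEY variable name comes from the steps above.

```python
import os


def require_api_key(env=None, name="GROQ_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it before running src/main.py"
        )
    return key
```

Calling require_api_key() once before the pipeline starts would surface a missing key immediately instead of at the first STT, LLM, or TTS request.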

Customising

  • Change the system prompt & initial dialogue in src/main.py.
  • Swap out models by tweaking default parameters in:
    • src/gen/groq.py (LLM)
    • src/stt/groqWhisper.py (STT)
    • src/tts/groqPlayai.py (TTS)
  • Adjust VAD sensitivity in src/vad/silerovad.py (on_threshold, off_threshold, etc.).
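
As a rough intuition for what on_threshold and off_threshold might control, the sketch below applies two-threshold hysteresis to a list of per-frame speech probabilities. This is an assumption about how such parameters are commonly used, not the code in src/vad/silerovad.py; the function name and the default values are hypothetical.

```python
def segment_speech(probs, on_threshold=0.6, off_threshold=0.3):
    """Group frames into speech segments using two thresholds.

    A segment opens when a probability reaches on_threshold and closes
    when it falls below off_threshold. Returns (start, end) frame index
    pairs with `end` exclusive.
    """
    segments = []
    start = None  # index where the current segment began, if any
    for i, p in enumerate(probs):
        if start is None and p >= on_threshold:
            start = i
        elif start is not None and p < off_threshold:
            segments.append((start, i))
            start = None
    if start is not None:
        # Speech was still ongoing when the input ended.
        segments.append((start, len(probs)))
    return segments
```

With this scheme, raising on_threshold makes the detector slower to trigger on background noise, while lowering off_threshold keeps a segment open through brief dips in probability.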

Troubleshooting

  • PyAudio may require system packages (e.g. portaudio, alsa-utils). On Ubuntu:
    sudo apt install portaudio19-dev python3-pyaudio
  • If audio is choppy, lower max_audio_queue in Conversation or tweak model temperatures.
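
For readers wondering what a setting like max_audio_queue could govern, here is a minimal sketch of a bounded playback queue that can be flushed on interruption. It assumes the setting caps the number of synthesized chunks waiting to be played; the AudioQueue class and its methods are hypothetical and are not the Conversation implementation.

```python
import asyncio


class AudioQueue:
    """Bounded queue of audio chunks awaiting playback (illustrative only)."""

    def __init__(self, max_audio_queue=4):
        self._queue = asyncio.Queue(maxsize=max_audio_queue)

    async def put(self, chunk):
        # Waits while the queue is full, so generation cannot run far ahead.
        await self._queue.put(chunk)

    async def get(self):
        return await self._queue.get()

    def pending(self):
        return self._queue.qsize()

    def interrupt(self):
        """Drop everything still waiting; return how many chunks were dropped."""
        dropped = 0
        while True:
            try:
                self._queue.get_nowait()
            except asyncio.QueueEmpty:
                return dropped
            dropped += 1


async def demo():
    q = AudioQueue(max_audio_queue=2)
    await q.put(b"chunk-1")
    await q.put(b"chunk-2")
    first = await q.get()    # the player takes the first chunk
    dropped = q.interrupt()  # the user speaks again: flush the rest
    return first, dropped, q.pending()
```

In this sketch a smaller bound keeps fewer chunks buffered ahead of playback, so an interruption discards less audio that was already synthesized.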

Built with ❤️ & open-source software. Enjoy hacking!

About

Real-time voice conversation with LLMs using an asynchronous voice-to-text-to-voice pipeline.
