This project turns a Raspberry Pi 5 into a voice-only assistant (Jarvis-style), powered by local open-source models.
No OpenAI API or cloud dependency required.
## Features

- 🎙️ **Voice input with `arecord`**
  - Works with USB mics/speakerphones (tested on a Poly Sync 10).
  - Prime + retry logic ensures reliable capture.
- 🧠 **Speech-to-Text (STT) via whisper.cpp**
  - Converts captured audio into text locally.
- 🤖 **Local LLM with llama.cpp**
  - Runs models like `qwen2.5-7b-instruct` in server mode.
  - Reachable via REST API (`http://jarvispi.local:8080`).
- 🔊 **Text-to-Speech (TTS) via piper**
  - Natural-sounding voices, runs fully on-device.
  - Patched to handle sample-rate quirks.
  - Primes the audio device and prepends silence so the first word isn't clipped.
- ⚡ **Configurable via environment variables** (mic device, sample rate, TTS timings, etc.)
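The clipped-first-word fix boils down to writing a stretch of zero samples in front of the synthesized audio. A minimal sketch using Python's `wave` module (the `pad_wav` helper is hypothetical and only illustrates the `TTS_PAD_MS` idea; it is not the actual `ask_voice.py` code):

```python
import wave

def pad_wav(src_path: str, dst_path: str, pad_ms: int = 400) -> None:
    """Prepend pad_ms of silence so playback start-up doesn't clip speech.

    Hypothetical helper illustrating the TTS_PAD_MS fix; the 400 ms
    default mirrors the configuration table.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # Silence = zero-valued samples: rate * (ms / 1000) frames,
    # each frame sampwidth * nchannels bytes wide.
    pad_frames = int(params.framerate * pad_ms / 1000)
    silence = b"\x00" * (pad_frames * params.sampwidth * params.nchannels)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(silence + frames)
```

Combined with priming the device (`TTS_PRIME_SEC`), the ALSA output is already streaming before real audio arrives.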
## Pipeline

- Record audio → USB mic with `arecord`
- Transcribe → Whisper model (tiny/small recommended on the RPi5)
- LLM completion → POST transcript to `llama-server`
- TTS synthesis → Piper generates speech
- Playback → audio routed back to the speakerphone
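The audio legs of the loop map onto a few standard CLI invocations. A minimal sketch (the wrapper names are made up for illustration, paths and devices are the defaults from the configuration table, and the LLM completion step is omitted here):

```python
import subprocess

# Defaults from the configuration table (the real script reads them from env).
MIC_DEV, SPEAKER_DEV = "hw:2,0", "plughw:2,0"
WHISPER_BIN = "/srv/asr/whisper.cpp/build/bin/whisper-cli"
WHISPER_MODEL = "/srv/asr/whisper.cpp/models/ggml-small.en.bin"
PIPER_BIN = "/srv/tts/piper/build/piper"
PIPER_MODEL = "/srv/tts/voices/en_US-amy-medium.onnx"

def arecord_cmd(dev: str, rate: int, seconds: int, path: str) -> list:
    # 16-bit little-endian capture at the configured rate and duration
    return ["arecord", "-D", dev, "-f", "S16_LE",
            "-r", str(rate), "-d", str(seconds), path]

def transcribe(wav: str) -> str:
    # whisper-cli prints the transcript on stdout; -nt drops timestamps
    out = subprocess.run([WHISPER_BIN, "-m", WHISPER_MODEL, "-f", wav, "-nt"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def speak(text: str, wav: str = "reply.wav") -> None:
    # piper reads text on stdin and writes a WAV; aplay plays it back
    subprocess.run([PIPER_BIN, "--model", PIPER_MODEL, "--output_file", wav],
                   input=text, text=True, check=True)
    subprocess.run(["aplay", "-D", SPEAKER_DEV, wav], check=True)

if __name__ == "__main__":
    subprocess.run(arecord_cmd(MIC_DEV, 16000, 5, "utt.wav"), check=True)
    reply = transcribe("utt.wav")  # LLM completion step omitted in this sketch
    speak(reply)
```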
## Setup

1. **Clone repos & build:**
   - `llama.cpp` (local LLM)
   - `whisper.cpp` (STT)
   - `piper` (TTS)
2. **Run llama-server** (systemd unit provided), then check that it is up:
   ```
   curl http://jarvispi.local:8080/health
   ```
3. **Set up voices (Piper).** Place a voice model and its config side by side:
   ```
   # example
   voices/en_US-amy-medium.onnx
   voices/en_US-amy-medium.onnx.json
   ```
4. **Symlink espeak-ng data** (fixes the missing `phontab` error):
   ```
   sudo ln -s /usr/lib/aarch64-linux-gnu/espeak-ng-data /usr/share/espeak-ng-data
   ```
5. **Run the voice loop:**
   ```
   ./ask_voice.py
   ```
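Once llama-server is healthy, the LLM step is a plain HTTP POST. A minimal client sketch (the `ask` and `completion_body` helper names are made up; the request/response fields — `prompt` and `n_predict` in, `content` out — follow llama.cpp's built-in HTTP server):

```python
import json
import urllib.request

LLAMA_HOST = "http://127.0.0.1:8080"  # or http://jarvispi.local:8080 over the LAN

def completion_body(prompt: str, n_predict: int = 256) -> bytes:
    # llama-server's /completion endpoint accepts a JSON body with
    # "prompt" and "n_predict" fields.
    return json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()

def ask(prompt: str) -> str:
    """POST a transcript to llama-server and return the generated text."""
    req = urllib.request.Request(
        LLAMA_HOST + "/completion",
        data=completion_body(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```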
## Configuration

| Variable | Default | Purpose |
|---|---|---|
| `MIC_DEV` | `hw:2,0` | ALSA device for mic capture |
| `RATE` | `16000` | Capture sample rate |
| `SECONDS_TO_RECORD` | `5` | Recording duration |
| `WHISPER_BIN` | `/srv/asr/whisper.cpp/build/bin/whisper-cli` | Path to whisper binary |
| `WHISPER_MODEL` | `/srv/asr/whisper.cpp/models/ggml-small.en.bin` | STT model |
| `LLAMA_HOST` | `http://127.0.0.1:8080` | llama-server REST endpoint |
| `N_PREDICT` | `256` | Tokens to predict |
| `PIPER_BIN` | `/srv/tts/piper/build/piper` | Path to piper binary |
| `PIPER_MODEL` | `/srv/tts/voices/en_US-amy-medium.onnx` | TTS voice model |
| `PIPER_CONFIG` | `$PIPER_MODEL.json` | Voice config JSON |
| `SPEAKER_DEV` | `plughw:2,0` | ALSA device for playback |
| `TTS_PAD_MS` | `400` | Silence (ms) prepended to the WAV |
| `TTS_PRIME_SEC` | `0.30` | Duration of ALSA `/dev/zero` prime |
| `TTS_SETTLE_SEC` | `0.20` | Wait after prime before playback |
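All of these knobs are plain environment variables, so they can be overridden per run (e.g. `MIC_DEV=hw:1,0 ./ask_voice.py`). A sketch of how such lookups with defaults work in the stdlib (illustrative, not the actual `ask_voice.py` code):

```python
import os

# Defaults mirror the configuration table; any variable set in the
# environment takes precedence.
MIC_DEV = os.environ.get("MIC_DEV", "hw:2,0")
RATE = int(os.environ.get("RATE", "16000"))
SECONDS_TO_RECORD = int(os.environ.get("SECONDS_TO_RECORD", "5"))
TTS_PAD_MS = int(os.environ.get("TTS_PAD_MS", "400"))
PIPER_MODEL = os.environ.get("PIPER_MODEL",
                             "/srv/tts/voices/en_US-amy-medium.onnx")
# PIPER_CONFIG defaults to the model path plus ".json" ($PIPER_MODEL.json)
PIPER_CONFIG = os.environ.get("PIPER_CONFIG", PIPER_MODEL + ".json")
```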
## Status

✅ End-to-end local pipeline (record → Whisper → llama-server → Piper → speaker)
✅ No cloud dependencies (fully offline)
✅ Fix for clipped first words in audio playback
✅ Configurable timings for different hardware
✅ Web access to the LLM via `http://jarvispi.local:8080`
## Roadmap

- Add wake-word detection (e.g. with openWakeWord)
- Continuous VAD-based listening
- Auto-start `ask_voice.py` as a systemd service
- Explore lightweight voices & smaller STT models for faster runtime
## License

MIT — free to use, hack, and extend.