Narrator-Camera or NarrateCam is a real-time computer vision project that captures webcam footage, uses a large language model (LLM) to generate funny Deadpool-style (or any personality of choice) descriptions of the scene, enhances them with a personality-aware context engine, and finally speaks them out loud using text-to-speech.
- 📸 Captures live video feed from webcam
- 🧠 Describes frames using an image LLM (LLaVA)
- 🗣️ Adds contextual, personality-driven humor (Deadpool-style)
- 🔊 Speaks the description using text-to-speech (TTS)
.
├── main.py # Core script to capture webcam, send frames, display, and speak
├── llm_api.py # Handles interaction with image and text LLMs
└── text_to_speech.py # Converts generated text to speech using pyttsx3
- Python 3.8+
- OpenCV
- pyttsx3
- OpenAI-compatible LLM server (like LM Studio)
- Local LLaVA (
llava-v1.5-7b) and Dolphin (dolphin3.0-llama3.1-8b) models running viahttp://localhost:1234/v1
pip install opencv-python pyttsx3 openai-
Run your LLMs: Launch your LLaVA and Dolphin models using LM Studio or another OpenAI-compatible server on
localhost:1234. -
Start the app: Run the main script:
python main.py
-
See it in action:
- A webcam window opens
- Every 5 seconds, the current frame is described in Deadpool-style humor
- The description is read aloud using TTS
- Press
qto quit
main.pycaptures frames from the webcam.- Every 5 seconds, the current frame is base64-encoded and sent to the image model (
llava-v1.5-7b) for visual description. - That description is passed into a contextual personality engine (
dolphin3.0-llama3.1-8b) that keeps a brief context history. - The final personalized message is spoken aloud using
pyttsx3.
🎥 Scene: "A guy sitting in front of a laptop with a look of existential dread. Classic."
🧠 Contextual Deadpool-style message: "Oh, the face of someone trying to debug on a Friday night. Bravo!"
🔊 Spoken via TTS.
- Make sure LM Studio or the API server is running with the correct models.
- If audio playback fails, check that your system supports
pyttsx3(Windows and macOS generally work out of the box).
MIT License