Narrator-Camera: Real-Time Personality-Powered Webcam Narrator 🕶️🎤

Narrator-Camera (NarrateCam) is a real-time computer vision project: it captures webcam footage, uses a vision-capable large language model (LLM) to describe the scene, enhances the description with a personality-aware context engine (Deadpool-style by default, or any personality of your choice), and finally speaks it aloud using text-to-speech.

🔧 Features

  • 📸 Captures live video feed from webcam
  • 🧠 Describes frames using an image LLM (LLaVA)
  • 🗣️ Adds contextual, personality-driven humor (Deadpool-style)
  • 🔊 Speaks the description using text-to-speech (TTS)

🗂️ Project Structure

.
├── main.py               # Core script to capture webcam, send frames, display, and speak
├── llm_api.py            # Handles interaction with image and text LLMs
└── text_to_speech.py     # Converts generated text to speech using pyttsx3

🛠️ Requirements

  • Python 3.8+
  • OpenCV
  • pyttsx3
  • OpenAI-compatible LLM server (e.g., LM Studio)
  • Local LLaVA (llava-v1.5-7b) and Dolphin (dolphin3.0-llama3.1-8b) models served at http://localhost:1234/v1

Install dependencies

pip install opencv-python pyttsx3 openai
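
For reference, here is a minimal sketch of how llm_api.py might point the standard openai client at the local server; the api_key value is a placeholder (LM Studio ignores it), and the constant names are illustrative:

from openai import OpenAI

# LM Studio exposes an OpenAI-compatible API; the key is a dummy placeholder.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

IMAGE_MODEL = "llava-v1.5-7b"           # visual description of frames
TEXT_MODEL = "dolphin3.0-llama3.1-8b"   # personality / context pass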

🚀 Getting Started

  1. Run your LLMs: Launch the LLaVA and Dolphin models using LM Studio or another OpenAI-compatible server on localhost:1234.

  2. Start the app: Run the main script:

     python main.py

  3. See it in action (a sketch of this loop follows below):

    • A webcam window opens
    • Every 5 seconds, the current frame is described with Deadpool-style humor
    • The description is read aloud via TTS
    • Press q to quit
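
For orientation, here is a hedged sketch of what that loop could look like; narrate is a placeholder stub rather than the project's actual code, and the real main.py may structure things differently:

import time
import cv2

INTERVAL_S = 5  # narrate every 5 seconds

def narrate(frame):
    # Placeholder: the real project sends the frame to llm_api.py
    # and speaks the result via text_to_speech.py.
    print("(description would be generated and spoken here)")

def main():
    cap = cv2.VideoCapture(0)  # default webcam
    last = 0.0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imshow("NarrateCam", frame)
            if time.time() - last >= INTERVAL_S:
                last = time.time()
                narrate(frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()

if __name__ == "__main__":
    main()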

🧠 How It Works

  1. main.py captures frames from the webcam.
  2. Every 5 seconds, the current frame is base64-encoded and sent to the image model (llava-v1.5-7b) for a visual description (sketched below).
  3. That description is passed to a contextual personality engine (dolphin3.0-llama3.1-8b) that keeps a brief context history (sketched after the example output).
  4. The final personalized message is spoken aloud using pyttsx3.
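
A hedged sketch of step 2, assuming the OpenAI-style vision message format that LM Studio accepts; encode_frame and describe_frame are illustrative names, not necessarily the functions in llm_api.py:

import base64
import cv2

def encode_frame(frame) -> str:
    """JPEG-compress an OpenCV frame and return it base64-encoded."""
    ok, buf = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return base64.b64encode(buf.tobytes()).decode("utf-8")

def describe_frame(client, frame) -> str:
    """Ask the image model for a short description of the frame."""
    b64 = encode_frame(frame)
    resp = client.chat.completions.create(
        model="llava-v1.5-7b",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this scene in one sentence."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

The data-URI form delivers the image inline to the local server, so no temporary files are needed.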

🗣 Example Output

🎥 Scene: "A guy sitting in front of a laptop with a look of existential dread. Classic."
🧠 Contextual Deadpool-style message: "Oh, the face of someone trying to debug on a Friday night. Bravo!"
🔊 Spoken via TTS.
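
To produce an exchange like the one above, steps 3–4 could look like this hedged sketch; the system prompt, deque-based history, and speak helper are illustrative, not the project's actual code:

from collections import deque

import pyttsx3

history = deque(maxlen=5)  # brief rolling context of recent scenes

def personalize(client, description: str) -> str:
    """Rewrite a plain scene description in the chosen personality's voice."""
    context = " | ".join(history) or "none"
    resp = client.chat.completions.create(
        model="dolphin3.0-llama3.1-8b",
        messages=[
            {"role": "system",
             "content": "You are Deadpool narrating a webcam feed. "
                        "Be witty and brief. Recent scenes: " + context},
            {"role": "user", "content": description},
        ],
    )
    history.append(description)
    return resp.choices[0].message.content

def speak(text: str) -> None:
    """Read text aloud with pyttsx3 (blocks until speech finishes)."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

Keeping only a short deque of recent scenes keeps the prompt small while still letting the narrator refer back to what it just saw.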

📌 Notes

  • Make sure LM Studio or the API server is running with the correct models.
  • If audio playback fails, check that your system supports pyttsx3 (Windows and macOS generally work out of the box; a quick smoke test follows below).
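
Assuming pyttsx3 is installed, this one-off check confirms the TTS engine initializes and speaks (on Linux, pyttsx3 typically relies on the espeak system package):

import pyttsx3

# If this speaks, text-to-speech works on your system.
engine = pyttsx3.init()
engine.say("Narrator camera online.")
engine.runAndWait()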

📄 License

MIT License
