Demo video: MURF-AI-DEMO.mp4
This project is a submission for the MURF AI Coding Challenge. It is a Chrome extension that acts as a personal AI agent: it understands the content of the user's current tab and answers voice queries with spoken responses, powered by the Murf TTS API.
- Voice-Activated Queries: Activate the assistant and ask questions using only your voice.
- Visual Understanding: The assistant takes a screenshot of your current tab to understand the context of your query.
- Real-time AI Responses: Leverages a multimodal Large Language Model (LLM) to generate intelligent answers.
- Natural Voice Output: Uses the Murf AI TTS API to convert the AI's text response into high-quality, natural-sounding speech.
- Seamless Integration: Runs as a lightweight and intuitive Chrome extension on any webpage.
The workflow is designed to be seamless and intuitive for the user.
- Activation: The user clicks the extension icon to launch the full-screen assistant overlay.
- Voice Input: The user clicks the record button and speaks their query (e.g., "Summarize this article for me.").
- Data Capture: The extension's content script captures the transcribed text and takes a screenshot of the visible webpage.
- Backend Processing: The text and screenshot are sent to a cloud-hosted Django backend.
- LLM Analysis: The backend forwards the data to a multimodal LLM (via OpenRouter) for analysis.
- Voice Generation: The LLM's text response is sent to the Murf AI TTS API, which generates a high-quality audio file.
- Audio Response: The URL of the audio file is sent back to the extension, which plays it for the user.
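The backend half of this workflow (steps 4-6) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the model name, voice ID, and function names are assumptions, while the payload shapes follow OpenRouter's chat-completions format for image inputs and Murf's speech-generation REST API.

```python
import base64
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MURF_URL = "https://api.murf.ai/v1/speech/generate"  # per Murf's REST docs

def build_llm_payload(query: str, screenshot_png: bytes, model: str) -> dict:
    """Multimodal chat payload: the transcribed query plus the screenshot
    encoded as a base64 data URL (OpenRouter's image_url content format)."""
    data_url = "data:image/png;base64," + base64.b64encode(screenshot_png).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": query},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

def answer_with_voice(query, screenshot_png, openrouter_key, murf_key,
                      model="google/gemini-flash-1.5",   # assumed model
                      voice_id="en-US-natalie"):         # assumed Murf voice
    """Full pipeline: LLM analysis, then Murf TTS; returns the audio URL."""
    llm = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {openrouter_key}"},
        json=build_llm_payload(query, screenshot_png, model),
        timeout=60,
    )
    llm.raise_for_status()
    answer = llm.json()["choices"][0]["message"]["content"]

    tts = requests.post(
        MURF_URL,
        headers={"api-key": murf_key},
        json={"text": answer, "voiceId": voice_id},
        timeout=60,
    )
    tts.raise_for_status()
    # Murf returns a URL to the generated audio, which the extension plays.
    return tts.json()["audioFile"]
```

The extension only ever receives the final audio URL, so the API keys stay on the server and are never exposed to the webpage.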
- Frontend: HTML5, CSS3, JavaScript (Chrome Extension APIs)
- Backend: Python, Django, Django REST Framework
- Text-to-Speech: Murf AI TTS API
- AI Model: Multimodal LLM via OpenRouter
- Deployment: Gunicorn, Render (Cloud Platform)
- Navigate to the `backend` directory: `cd backend`
- Install dependencies: `pip install -r requirements.txt`
- Create a `.env` file and add your `MURF_API_KEY` and `OPENROUTER_API_KEY`.
- Run the server: `python manage.py runserver`
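The `.env` file only needs the two keys named above (placeholder values shown):

```
MURF_API_KEY=your-murf-api-key
OPENROUTER_API_KEY=your-openrouter-api-key
```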
- Open Chrome and navigate to `chrome://extensions`.
- Enable "Developer mode".
- Click "Load unpacked" and select the `extension` folder.
- Pin the extension to your toolbar for easy access.
The goal is to give users an instant, vision-based voice assistant on any webpage. Future improvements could include:
- Streaming audio responses for faster feedback.
- Deeper page context by analyzing the page's HTML content.
- A more conversational, multi-turn dialogue system.