Transforms web browsing by capturing on-screen visuals and voice commands, then using multimodal reasoning to turn page context into instant, natural-sounding spoken answers.


Murf AI Vision-Based Voice Assistant

Demo video: MURF-AI-DEMO.mp4

This project is a submission for the MURF AI Coding Challenge. It's a Chrome extension that acts as a personal AI agent, capable of understanding the content of a user's screen and responding to voice queries with voice answers, powered by the Murf TTS API.

Key Features

  • Voice-Activated Queries: Activate the assistant and ask questions using only your voice.
  • Visual Understanding: The assistant takes a screenshot of your current tab to understand the context of your query.
  • Real-time AI Responses: Leverages a multimodal Large Language Model (LLM) to generate intelligent answers.
  • Natural Voice Output: Uses the Murf AI TTS API to convert the AI's text response into high-quality, natural-sounding speech.
  • Seamless Integration: Runs as a lightweight and intuitive Chrome extension on any webpage.

How It Works

The workflow is designed to be seamless and intuitive for the user.

  1. Activation: The user clicks the extension icon to launch the full-screen assistant overlay.
  2. Voice Input: The user clicks the record button and speaks their query (e.g., "Summarize this article for me.").
  3. Data Capture: The extension's content script captures the transcribed text and takes a screenshot of the visible webpage.
  4. Backend Processing: The text and screenshot are sent to a cloud-hosted Django backend (steps 4-6 are sketched in the code after this list).
  5. LLM Analysis: The backend forwards the data to a multimodal LLM (via OpenRouter) for analysis.
  6. Voice Generation: The LLM's text response is sent to the Murf AI TTS API, which generates a high-quality audio file.
  7. Audio Response: The URL of the audio file is sent back to the extension, which plays it for the user.
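
The heavy lifting happens in steps 4-6 on the backend. The sketch below is a minimal illustration of that path, assuming a hypothetical endpoint and helper names; the model ID and voice ID are placeholders, and the exact Murf and OpenRouter request/response fields should be verified against their official API docs.

```python
# Minimal sketch of steps 4-6; names marked "illustrative" are not from this repo.
import os
import requests
from rest_framework.decorators import api_view
from rest_framework.response import Response


def ask_llm(query: str, screenshot_data_url: str) -> str:
    """Step 5: send the query and screenshot to a multimodal LLM via OpenRouter."""
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "openai/gpt-4o-mini",  # illustrative; any multimodal model ID works
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": query},
                    {"type": "image_url", "image_url": {"url": screenshot_data_url}},
                ],
            }],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def synthesize_speech(text: str) -> str:
    """Step 6: turn the LLM's text answer into audio with the Murf TTS API."""
    resp = requests.post(
        "https://api.murf.ai/v1/speech/generate",  # check Murf's docs for exact fields
        headers={"api-key": os.environ["MURF_API_KEY"]},
        json={"text": text, "voiceId": "en-US-natalie"},  # voice ID is illustrative
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["audioFile"]  # URL of the generated audio clip


@api_view(["POST"])
def assist(request):  # hypothetical endpoint name
    """Step 4: receive the transcribed text and screenshot from the extension."""
    answer = ask_llm(request.data["query"], request.data["screenshot"])
    return Response({"audio_url": answer and synthesize_speech(answer)})  # step 7 payload
```

The JSON returned by the view carries the audio URL back to the extension, which plays it for the user (step 7).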

Tech Stack

  • Frontend: HTML5, CSS3, JavaScript (Chrome Extension APIs)
  • Backend: Python, Django, Django REST Framework
  • Text-to-Speech: Murf AI TTS API
  • AI Model: Multimodal LLM via OpenRouter
  • Deployment: Gunicorn, Render (Cloud Platform)

Setup and Installation

Backend

  1. Navigate to the backend directory: cd backend
  2. Install dependencies: pip install -r requirements.txt
  3. Create a .env file and add your MURF_API_KEY and OPENROUTER_API_KEY (an example follows these steps).
  4. Run the server: python manage.py runserver
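
The .env file from step 3 only needs the two API keys; the values below are placeholders:

```
MURF_API_KEY=your-murf-api-key
OPENROUTER_API_KEY=your-openrouter-api-key
```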

Frontend (Chrome Extension)

  1. Open Chrome and navigate to chrome://extensions.
  2. Enable "Developer mode".
  3. Click "Load unpacked" and select the extension folder.
  4. Pin the extension to your toolbar for easy access.

Future Scope

The end goal is an instant, vision-based voice assistant for any webpage a user visits. Future improvements could include:

  • Streaming audio responses for faster feedback.
  • Deeper page context by analyzing the page's HTML content.
  • A more conversational, multi-turn dialogue system.
