Abubakarsidiq01/Aura

AURA - Advanced Vision Assistant

Voice-Only Conversational AI Platform with Advanced Prompt Engineering

🎯 Overview

AURA is a cutting-edge voice-only conversational AI vision assistant that combines advanced prompt engineering with Google's Gemini AI models. Experience hands-free image analysis through natural voice interactions - just say "Hey AURA" and ask your questions!

✨ Key Features

  • 🎀 Voice-Only Interface - Pure speech interaction, no typing required
  • πŸ” Advanced Image Analysis - Powered by Google Gemini AI models
  • 🧠 Intelligent Prompt Engineering - Context-aware specialized prompts for different use cases
  • πŸ”„ Model Switching - Choose between Gemini 1.5 Flash (fast) or Pro (detailed)
  • 🌐 Interactive Network Background - Beautiful animated canvas that responds to mouse movement
  • πŸ“Š Real-time Dashboard - Global statistics and session management
  • 🎯 Wake Word Detection - Always listening for "Hey AURA" or "AURA"
  • πŸ’¬ Casual Conversational Responses - Short, friendly responses optimized for voice

🚀 Quick Start

Prerequisites

  • Node.js 16+ installed
  • Google AI API key
  • Google Cloud API key (for Text-to-Speech)

Installation

  1. Clone the repository

    git clone https://github.com/elikem1z/aura-backend.git
    cd aura-backend
  2. Install dependencies

    npm install
  3. Set up environment variables. Create a .env file in the root directory:

    GOOGLE_AI_API_KEY=your_google_ai_api_key_here
    GOOGLE_CLOUD_API_KEY=your_google_cloud_api_key_here
    PORT=3000
  4. Start the application

    npm start
  5. Access AURA. Open your browser to http://localhost:3000

🎀 How to Use

Voice Interaction Flow

  1. Upload an Image: Drag & drop or click to upload an image
  2. Say the Wake Word: "Hey AURA" or "AURA"
  3. Ask Your Question: AURA automatically starts listening
  4. Get Response: Receive both visual and voice responses
  5. Continue Conversation: Ready for next voice command
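The wake-word step of this flow could be sketched roughly as follows. This is a minimal illustration and not AURA's actual implementation: `WAKE_PATTERN` and `extractCommand` are hypothetical names, and the sketch assumes a speech recognizer (for example the Web Speech API) has already turned audio into a text transcript.

```javascript
// Hypothetical sketch: find "Hey AURA" or "AURA" in a transcript and
// return whatever the user said after it.
// \b keeps words such as "restaurant" from matching by accident.
const WAKE_PATTERN = /\b(?:hey\s+)?aura\b/i;

function extractCommand(transcript) {
  const text = transcript.trim();
  const match = WAKE_PATTERN.exec(text);
  if (!match) {
    return { woke: false, command: "" };
  }
  // Everything after the wake word is treated as the question;
  // leading punctuation and spaces are removed.
  const command = text
    .slice(match.index + match[0].length)
    .replace(/^[\s,.!?]+/, "");
  return { woke: true, command };
}
```

An empty `command` with `woke: true` would correspond to the user saying only the wake word, after which the app starts listening for the question.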

Supported Commands

  • General Analysis: "What's in this image?"
  • Medical Context: "Is this X-ray showing any abnormalities?"
  • Technical Diagnostics: "What's wrong with this equipment?"
  • Creative Analysis: "Analyze the artistic composition"
  • Business Intelligence: "What does this chart tell us?"

🏗️ Architecture

Unified Node.js Application

AURA Backend/
├── app.js                 # Main server with AI integration
├── prompt-engine.js       # Advanced prompt engineering system
├── public/               # Frontend static files
│   ├── index.html        # Voice-only interface
│   └── script.js         # Network animation + app logic
├── uploads/              # Temporary image storage
├── static/audio/         # Generated TTS audio files
└── .env                  # Environment configuration

Technology Stack

  • Backend: Node.js + Express
  • AI Engine: Google Gemini 1.5 Flash/Pro
  • Voice Processing: Web Speech API + Google Cloud TTS
  • Prompt Engineering: Custom intelligent system with 9+ specialized use cases
  • Frontend: Vanilla JavaScript with animated canvas
  • Styling: Modern glassmorphism design with CSS3

🧠 Advanced Features

Prompt Engineering System

The application includes a sophisticated prompt engineering system with:

  • 9 Specialized Use Cases: Medical, architectural, security, business, educational, technical, creative, scientific, and quality control
  • Context-Aware Responses: Adapts based on conversation history and user intent
  • Global Intelligence: Cross-user statistics and pattern recognition
  • Response Optimization: Ultra-short (under 50 words), casual responses for voice interaction
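As a rough sketch of how use-case selection and response shaping might fit together, the example below picks a use case by keyword and builds a prompt that asks for a short, casual answer. The keyword table, `detectUseCase`, and `buildPrompt` are assumptions made for illustration; the logic in prompt-engine.js may work quite differently, and only four of the nine use cases are shown.

```javascript
// Hypothetical keyword table; not taken from prompt-engine.js.
const USE_CASE_KEYWORDS = {
  medical: ["x-ray", "scan", "symptom", "diagnosis"],
  technical: ["equipment", "machine", "error", "broken"],
  creative: ["artistic", "composition", "painting", "style"],
  business: ["chart", "revenue", "sales", "trend"],
};

// Return the first use case whose keywords appear in the question,
// or "general" when nothing matches.
function detectUseCase(question) {
  const text = question.toLowerCase();
  for (const [useCase, keywords] of Object.entries(USE_CASE_KEYWORDS)) {
    if (keywords.some((keyword) => text.includes(keyword))) {
      return useCase;
    }
  }
  return "general";
}

// Build a prompt that carries the use case and the voice-friendly length limit.
function buildPrompt(question) {
  const useCase = detectUseCase(question);
  return [
    `You are a voice assistant analyzing an image (${useCase} context).`,
    "Answer casually in under 50 words.",
    `Question: ${question}`,
  ].join("\n");
}
```

Keyword matching is the simplest possible classifier; a system that also uses conversation history, as described above, would need to pass that history into the selection step.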

Voice Recognition Capabilities

  • Wake Word Detection: Continuous background listening
  • Command Recognition: Automatic analysis triggering
  • Casual Responses: 15+ different greeting variations
  • Clean TTS: HTML-stripped text for natural speech synthesis
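The "Clean TTS" step can be pictured as a small text filter applied before the response is sent to speech synthesis. The helper below is a sketch under that assumption; `cleanForSpeech` is a hypothetical name, and it only handles tags and two common entities.

```javascript
// Hypothetical helper: turn HTML-formatted model output into plain text
// that a text-to-speech engine can read naturally.
function cleanForSpeech(html) {
  return html
    .replace(/<[^>]*>/g, " ") // replace each tag with a space
    .replace(/&nbsp;/g, " ")  // decode non-breaking spaces
    .replace(/&amp;/g, "&")   // decode ampersands
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
}
```

This is meant only for preparing speech text. Regex-based tag stripping is not a safe way to sanitize HTML that will be rendered in a page.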

Interactive Network Background

  • 150 Animated Particles: Optimized for 60fps performance
  • Mouse-Responsive Connections: Lines extend from particles to cursor
  • Particle Attraction: Nodes gently move toward mouse
  • Dynamic Connections: Real-time connections between nearby particles
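The particle behaviour described above amounts to two per-frame steps: nudge each particle toward the cursor, then find the pairs close enough to connect. The sketch below shows that logic without any canvas drawing. The constants and function names are assumptions, not values taken from public/script.js.

```javascript
// Hypothetical constants; the real animation may use different values.
const ATTRACTION = 0.01;   // fraction of the distance to the cursor covered per frame
const LINK_DISTANCE = 100; // maximum distance (px) at which two particles are linked

// Move each particle by its velocity plus a gentle pull toward the mouse.
function stepParticles(particles, mouse) {
  return particles.map((p) => ({
    ...p,
    x: p.x + p.vx + (mouse.x - p.x) * ATTRACTION,
    y: p.y + p.vy + (mouse.y - p.y) * ATTRACTION,
  }));
}

// Return index pairs of particles that are close enough to draw a line between.
function findConnections(particles) {
  const pairs = [];
  for (let i = 0; i < particles.length; i++) {
    for (let j = i + 1; j < particles.length; j++) {
      const dx = particles[i].x - particles[j].x;
      const dy = particles[i].y - particles[j].y;
      if (Math.hypot(dx, dy) < LINK_DISTANCE) {
        pairs.push([i, j]);
      }
    }
  }
  return pairs;
}
```

With 150 particles the pairwise check is 11,175 distance comparisons per frame, which is small enough that a spatial index is unlikely to be needed.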

📊 Performance Metrics

Compared to the previous multi-layer architecture:

  • 50% faster response times (eliminated proxy layer)
  • 60% less memory usage (single Node.js process)
  • 80% faster boot time
  • 68% smaller bundle size

🔧 Configuration

Environment Variables

Variable              Description                   Required
GOOGLE_AI_API_KEY     Google AI API key for Gemini  Yes
GOOGLE_CLOUD_API_KEY  Google Cloud API key for TTS  Yes
PORT                  Server port (default: 3000)   No
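One way to enforce the required and optional variables at startup is sketched below. `loadConfig` is a hypothetical helper, not part of app.js, and it assumes the .env file has already been loaded into `process.env` (for example with the dotenv package).

```javascript
// Hypothetical loader: validate the variables listed above and apply the
// documented default for PORT.
function loadConfig(env) {
  const required = ["GOOGLE_AI_API_KEY", "GOOGLE_CLOUD_API_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`
    );
  }
  return {
    googleAiApiKey: env.GOOGLE_AI_API_KEY,
    googleCloudApiKey: env.GOOGLE_CLOUD_API_KEY,
    port: Number(env.PORT) || 3000, // PORT is optional and defaults to 3000
  };
}
```

Calling `loadConfig(process.env)` before the server starts makes a missing key fail immediately with a clear message rather than on the first API request.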

Model Selection

  • Gemini 1.5 Flash: Fast, efficient for quick analysis (8K tokens)
  • Gemini 1.5 Pro: Advanced analysis for complex tasks (32K tokens)
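Model switching can be as simple as mapping a user-facing choice to a model identifier, as in this sketch. `selectModel` and the fallback to Flash are assumptions made for illustration; the identifiers shown are the public Gemini 1.5 model names, which should be checked against the current Gemini API documentation.

```javascript
// Hypothetical mapping from a UI choice to a Gemini model identifier.
const MODELS = new Map([
  ["flash", "gemini-1.5-flash"], // fast, for quick analysis
  ["pro", "gemini-1.5-pro"],     // slower, for complex tasks
]);

// Unknown or missing choices fall back to the fast model.
function selectModel(choice) {
  return MODELS.get(String(choice).toLowerCase()) ?? MODELS.get("flash");
}
```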

🛠️ Development

Start Development Server

npm run dev  # Uses nodemon for auto-restart

Project Structure

├── app.js              # Main application server
├── prompt-engine.js    # AI prompt engineering
├── public/
│   ├── index.html     # Voice-only UI
│   └── script.js      # Frontend logic + animations
└── README.md          # This file

🎨 Design Philosophy

  • Voice-First: Optimized for hands-free interaction
  • Professional: Clean, emoji-free design with SVG icons
  • Conversational: Short, casual responses like a friendly assistant
  • Responsive: Beautiful on all screen sizes
  • Fast: Optimized for speed and performance

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Google Gemini AI for powerful vision capabilities
  • Google Cloud TTS for natural voice synthesis
  • Express.js for robust server framework
  • Web Speech API for voice recognition

Built with ❤️ by the AURA Team

🌐 Website β€’ πŸ“§ Contact β€’ πŸ› Issues
