FuseLLM is a modular AI assistant that automatically detects input types (text, audio, image, multimodal), routes tasks to appropriate Hugging Face pipelines, retrieves relevant context, fuses results, validates outputs, and returns well-formatted responses.
- Multi-Modal Input - Handles text, images, and audio inputs
- Smart Task Routing - Auto-detects input type and intent
- Contextual Understanding - Retrieves and incorporates relevant context
- Pipeline Fusion - Combines multiple model outputs intelligently
- Output Validation - Ensures responses are valid and coherent
- Ethics & Safety - Built-in content filtering
- Dual Interface - CLI and Gradio web UI with dark mode
- Extensible Architecture - Easy to add new models and capabilities
FuseLLM follows this workflow:
- Input Detection - Identifies input type (text/audio/image)
- Task Routing - Determines the appropriate processing pipeline
- Context Retrieval - Fetches relevant context when available
- Pipeline Execution - Runs the selected Hugging Face pipeline
- Result Fusion - Combines pipeline output with context
- Validation & Ethics - Ensures output quality and safety
- Formatting - Presents results in a user-friendly way
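The seven stages above can be sketched end to end. This is a hypothetical illustration only; the function names, routing table, and stubs are assumptions, not FuseLLM's actual API:

```python
# Hypothetical end-to-end sketch of the seven-stage workflow.
# All names and the routing logic are illustrative.

def run_workflow(data: object) -> str:
    # 1. Input Detection: infer modality from the Python type (toy heuristic)
    input_type = "text" if isinstance(data, str) else "binary"
    # 2. Task Routing: pick a pipeline task for the modality
    task = "text-generation" if input_type == "text" else "image-classification"
    # 3. Context Retrieval (stubbed out here)
    context = ""
    # 4. Pipeline Execution (stubbed; a real run would call a Hugging Face pipeline)
    result = "[{} output]".format(task)
    # 5. Result Fusion: merge pipeline output with retrieved context
    fused = (result + " " + context).strip()
    # 6. Validation & Ethics: reject empty output
    if not fused:
        return "Sorry, no valid response could be produced."
    # 7. Formatting: return the user-facing string
    return fused
```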
FuseLLM leverages the following Hugging Face models:
- Text Generation: `gpt2` (default) - for general text generation tasks
- Question Answering: `deepset/roberta-base-squad2` - for extracting answers from context
- Text Classification: `distilbert-base-uncased` - for intent classification
- Image Classification: `google/vit-base-patch16-224` - for identifying image content
- Image Captioning: `nlpconnect/vit-gpt2-image-captioning` - for generating descriptions of images
- Speech Recognition: `facebook/wav2vec2-base-960h` - for transcribing speech to text
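One plausible way to organize this task-to-model mapping is a simple registry dict. The registry layout and `model_for` helper are illustrative assumptions, not FuseLLM's actual code; only the model ids and the `transformers.pipeline` factory are from the list above:

```python
# Illustrative registry mapping pipeline task names to the default
# Hugging Face model ids listed above. The layout is an assumption.
DEFAULT_MODELS = {
    "text-generation": "gpt2",
    "question-answering": "deepset/roberta-base-squad2",
    "text-classification": "distilbert-base-uncased",
    "image-classification": "google/vit-base-patch16-224",
    "image-to-text": "nlpconnect/vit-gpt2-image-captioning",
    "automatic-speech-recognition": "facebook/wav2vec2-base-960h",
}

def model_for(task: str) -> str:
    """Return the default model id for a task, falling back to gpt2."""
    return DEFAULT_MODELS.get(task, "gpt2")

# Loading then uses the standard transformers factory, e.g.:
#   from transformers import pipeline
#   qa = pipeline("question-answering", model=model_for("question-answering"))
```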
Important Note on Response Quality:
- The current implementation uses relatively small, general-purpose models to ensure broad compatibility and fast response times.
- As a result, the quality of responses may vary and might not always meet production-grade expectations.
- Some responses may be generic, off-topic, or contain inaccuracies.
- This is an experimental implementation and should be used for demonstration and development purposes only.
- For production use, consider fine-tuning models on your specific domain or using larger, more capable models.
- Basic intent detection and input classification
- Multi-modal input handling (text, image, audio)
- Contextual retrieval from files
- Pipeline execution with Hugging Face models
- Basic response validation and ethics checking
- CLI and Gradio interfaces
- Dark mode UI
- Enhanced intent classification
- Improved context retrieval
- Better error handling and user feedback
- Confidence score integration
- Multilingual support
- Advanced intent classification model
- Text-to-image generation
- Object detection visualization
- Code generation and analysis
- Conversation memory
- Contextual chaining across turns
- REST API with FastAPI
- Cloud deployment (Hugging Face Spaces, Docker)
- Voice input/output
- Enhanced file previews
- Chat history sidebar
- Audio waveform visualization
We welcome contributions! Here's how you can help:
- Report Bugs - Open an issue with detailed reproduction steps
- Suggest Features - Share your ideas for new features
- Submit Pull Requests - Help us improve the codebase
- Improve Documentation - Help make the project more accessible
- Test and Report - Try out the latest features and report any issues
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Commit your changes: `git commit -m 'Add some feature'`
- Push to the branch: `git push origin feature/your-feature`
- Open a pull request
- Follow PEP 8 guidelines
- Use type hints for better code clarity
- Write docstrings for public functions and classes
- Keep commits atomic and well-described
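As a toy example of the expected style (PEP 8 layout, a type-hinted signature, and a docstring on a public function; the function itself is hypothetical, not part of the codebase):

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace in ``text`` into single spaces.

    Demonstrates the project's conventions: type hints on the
    signature and a docstring on every public function.
    """
    return " ".join(text.split())
```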
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for their amazing Transformers library
- The open-source community for their contributions and support
- All contributors who help make this project better
- Python 3.8+
- pip
- (Optional) CUDA for GPU acceleration
- Clone the repository:

  ```bash
  git clone https://github.com/ShashankMk031/FuseLLM.git
  cd FuseLLM
  ```

- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Run the assistant:

```bash
python -m src.main        # Launches Gradio web UI
python -m src.main --cli  # Run in CLI mode
```
For development:

```bash
pip install -e .[dev]
```
```bash
# Process text input
python -m src.main --text "Your input text here"

# Process an image
python -m src.main --image path/to/image.jpg

# Process an audio file
python -m src.main --audio path/to/audio.wav
```

Launch the Gradio web interface:

```bash
python -m src.main --web
```

Then open your browser to http://localhost:7860
```
FuseLLM/
├── src/
│   ├── core/                  # Core logic components
│   │   ├── orchestrator.py    # Central coordination
│   │   ├── task_router.py     # Input type and intent detection
│   │   ├── pipeline_manager.py # Hugging Face pipeline management
│   │   ├── fuser.py           # Output fusion
│   │   ├── retriever.py       # Context retrieval
│   │   ├── validator.py       # Output validation
│   │   └── ethics_checker.py  # Content filtering
│   │
│   └── utils/                 # Utility functions
│       ├── io_utils.py
│       ├── log_utils.py
│       └── text_utils.py
│
├── tests/                     # Unit tests
├── data/                      # Sample data
└── requirements.txt           # Dependencies
```
- Detects input type (text/image/audio)
- Classifies user intent
- Routes to appropriate processing pipeline
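A minimal sketch of how input-type detection might work by filename. This is illustrative only; the real router may inspect content rather than extensions:

```python
import mimetypes

# Hypothetical extension-based detector: guess a MIME type from the
# filename and map it onto FuseLLM's three modalities.
def detect_input_type(path_or_text: str) -> str:
    guessed, _ = mimetypes.guess_type(path_or_text)
    if guessed:
        if guessed.startswith("image/"):
            return "image"
        if guessed.startswith("audio/"):
            return "audio"
    # No recognizable media extension: treat the input as raw text
    return "text"
```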
- Manages Hugging Face pipelines
- Handles model loading and caching
- Processes inputs through appropriate models
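Model loading and caching can be sketched as lazy construction with a per-task cache. The manager class below is an illustration, not FuseLLM's actual implementation; in practice the factory would be `transformers.pipeline`:

```python
from typing import Callable, Dict

class PipelineManager:
    """Illustrative lazy loader: build each pipeline once, then reuse it."""

    def __init__(self, factory: Callable[[str], object]):
        self._factory = factory            # e.g. transformers.pipeline
        self._cache: Dict[str, object] = {}

    def get(self, task: str) -> object:
        """Load the pipeline for `task` on first use; cache afterwards."""
        if task not in self._cache:
            self._cache[task] = self._factory(task)
        return self._cache[task]
```

Caching matters here because constructing a Hugging Face pipeline loads model weights, which is far more expensive than a dict lookup.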
- Fetches relevant context
- Supports multiple file types
- Implements keyword and semantic search
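The keyword half of retrieval can be approximated by word overlap, as in this sketch (the semantic half would additionally need an embedding model; the scoring scheme here is an assumption):

```python
from typing import List

def keyword_score(query: str, document: str) -> float:
    """Fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0

def retrieve(query: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k documents ranked by keyword overlap with the query."""
    ranked = sorted(documents, key=lambda d: keyword_score(query, d), reverse=True)
    return ranked[:top_k]
```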
- Combines model outputs with context
- Handles different task types
- Ensures coherent final output
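Fusion might dispatch on task type and append retrieved context, as in this hedged sketch (the output-dict keys follow common Hugging Face pipeline conventions, but the fusion format itself is an assumption):

```python
# Illustrative fusion: pick the salient field for each task type,
# then append retrieved context when any was found.
def fuse(task: str, output: dict, context: str = "") -> str:
    if task == "question-answering":
        text = output.get("answer", "")
    elif task == "text-classification":
        text = "Predicted label: {}".format(output.get("label", "?"))
    else:
        text = str(output.get("generated_text", output))
    return "{} (context: {})".format(text, context) if context else text
```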
- Validates output quality
- Filters low-confidence results
- Ensures response coherence
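A minimal validation rule set might look like the following. The 0.5 confidence threshold is an assumed placeholder, not the project's actual cutoff:

```python
# Hypothetical validator: reject empty text and low-confidence results.
def is_valid(text: str, confidence: float, threshold: float = 0.5) -> bool:
    if not text or not text.strip():
        return False          # empty or whitespace-only output
    return confidence >= threshold  # filter low-confidence predictions
```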
- Filters harmful content
- Implements safety checks
- Customizable filters
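A toy blocklist filter illustrates the shape of such a check. The terms and the whole-word matching strategy are placeholders, not FuseLLM's actual filters:

```python
# Placeholder blocklist; "customizable filters" suggests users can
# supply their own set at call time.
BLOCKED_TERMS = {"violence", "weapon"}

def passes_safety_check(text: str, blocked=BLOCKED_TERMS) -> bool:
    """Return False if any blocked term appears as a word in the text."""
    return not (set(text.lower().split()) & set(blocked))
```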
Run the test suite:

```bash
pytest tests/
```

This project uses `black` for code formatting and `flake8` for linting. Run both before committing:

```bash
black .
flake8
```

To contribute, follow the fork, branch, commit, push, and pull-request workflow described above.
Contributions are welcome and open to all!
For questions or feedback, please open an issue or contact [Your Email].
Made with ❤️ by Shashank