VisionProjectPPP is a Python-based application I built to fulfill the requirements of my college course "Applied Programming Practicum". It combines computer vision, gesture recognition, and object detection, using Kivy for the graphical user interface, MediaPipe for gesture recognition, and YOLO for object detection. The project also includes a text-to-speech (TTS) system and an SQLite database for storing detected objects.
- Gesture Recognition: Detects hand gestures such as "Thumbs Up", "Thumbs Down", and "Closed Fist" using MediaPipe.
- Object Detection: Identifies objects in real-time using the YOLOv8 model.
- Zoom Functionality: Allows zooming into detected objects for better visualization.
- Text-to-Speech (TTS): Provides audio feedback for detected gestures and objects using Google Text-to-Speech (gTTS).
- Database Integration: Stores detected objects with metadata (e.g., confidence, bounding box, timestamp) in an SQLite database.
- Camera Selection: Supports multiple camera inputs with a dropdown for easy switching.
- Interactive GUI: Built with Kivy, featuring gesture feedback, camera feed, and database management.
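The gesture names above come from MediaPipe's hand landmarks; the actual classification rules live in `Camera.py`. Purely as an illustration, here is a minimal, self-contained sketch of how "Thumbs Up" / "Thumbs Down" / "Closed Fist" could be told apart from normalized landmark coordinates. The function, indices, and thresholds below are assumptions for the sketch, not the project's real logic:

```python
# Hypothetical sketch: classify a gesture from 21 MediaPipe-style hand
# landmarks, each an (x, y) pair with y growing downward (image coordinates).
# Indices follow MediaPipe's hand model: 0 = wrist, 2 = thumb MCP,
# 4 = thumb tip, 8/12/16/20 = fingertips, 6/10/14/18 = the matching PIP joints.

def classify_gesture(landmarks):
    # A finger counts as "curled" when its tip sits below (greater y) its PIP joint.
    fingers_curled = all(landmarks[tip][1] > landmarks[pip][1]
                         for tip, pip in [(8, 6), (12, 10), (16, 14), (20, 18)])
    if not fingers_curled:
        return "Unknown"
    thumb_tip_y = landmarks[4][1]
    thumb_mcp_y = landmarks[2][1]
    wrist_y = landmarks[0][1]
    if thumb_tip_y < thumb_mcp_y and thumb_tip_y < wrist_y:
        return "Thumbs Up"      # thumb points up, other fingers curled
    if thumb_tip_y > thumb_mcp_y and thumb_tip_y > wrist_y:
        return "Thumbs Down"    # thumb points down, other fingers curled
    return "Closed Fist"        # thumb tucked in with the rest
```

In the real application these landmarks would come from MediaPipe's hand-tracking output per frame; the sketch only shows the geometric idea.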
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/VisionProjectPPP.git
  cd VisionProjectPPP
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Ensure you have a camera connected to your system.
- Run the application:

  ```bash
  python main.py
  ```
- Use gestures to interact with the application:
  - Thumbs Up: Start object detection.
  - Thumbs Down: Pause detection and capture objects.
  - Closed Fist: Reset the system.
- Use the GUI buttons:
  - Camera: Switch between available cameras.
  - Database: View and manage detected objects.
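The gesture-to-action mapping above can be sketched as a small dispatch table. The class and handler names here are hypothetical placeholders, not the project's actual method names:

```python
# Illustrative sketch: route recognized gesture names to application actions.
# The handler names are hypothetical; the real bindings live in Camera.py.

class GestureController:
    def __init__(self):
        self.state = "idle"

    def start_detection(self):
        self.state = "detecting"

    def pause_and_capture(self):
        self.state = "paused"

    def reset(self):
        self.state = "idle"

    def on_gesture(self, name):
        handlers = {
            "Thumbs Up": self.start_detection,
            "Thumbs Down": self.pause_and_capture,
            "Closed Fist": self.reset,
        }
        handler = handlers.get(name)
        if handler is not None:
            handler()          # unrecognized gestures leave the state unchanged
        return self.state
```

A dictionary dispatch like this keeps gesture bindings in one place, so adding a new gesture only means adding one entry.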
- main.py: Entry point for the application.
- Camera.py: Core logic for the camera feed, gesture recognition, and object detection.
- TTS.py: Text-to-speech manager for audio feedback.
- database.py: SQLite database handler for storing detected objects.
- GUI.py: Kivy-based graphical user interface components.
- README.md: Project documentation.
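For reference, storing a detection with its metadata could look roughly like the following. The schema, table, and function names are illustrative assumptions; the project's actual handler lives in `database.py` and may differ:

```python
import sqlite3

# Illustrative sketch of an SQLite handler for detected objects.
# Column names here are assumptions, not the project's real schema.

def init_db(path="detections.db"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS detections (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            label TEXT NOT NULL,
            confidence REAL NOT NULL,
            x1 INTEGER, y1 INTEGER, x2 INTEGER, y2 INTEGER,
            detected_at TEXT DEFAULT CURRENT_TIMESTAMP
        )""")
    conn.commit()
    return conn

def save_detection(conn, label, confidence, box):
    """Insert one detection: class label, confidence, and bounding box."""
    x1, y1, x2, y2 = box
    conn.execute(
        "INSERT INTO detections (label, confidence, x1, y1, x2, y2) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (label, confidence, x1, y1, x2, y2),
    )
    conn.commit()
```

Using parameterized queries (the `?` placeholders) keeps label strings from being interpreted as SQL.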
- Python 3.8 or higher
- OpenCV
- MediaPipe
- Kivy
- YOLOv8 (via the `ultralytics` package)
- gTTS
- SQLite3
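If you need to recreate `requirements.txt`, a minimal version matching the list above might look like this (package names only; pin versions as needed — `sqlite3` ships with Python and needs no entry):

```text
opencv-python
mediapipe
kivy
ultralytics
gTTS
```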
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Commit your changes and push them to your fork.
- Submit a pull request with a detailed description of your changes.
This project is licensed under the MIT License. See the LICENSE file for details.
- MediaPipe for gesture recognition.
- YOLO for object detection.
- Kivy for the GUI framework.
- gTTS for text-to-speech functionality.
For questions or feedback, please contact [postmaster@sarifindustries.org](mailto:postmaster@sarifindustries.org).