A Flask web application for classifying images using OpenAI's CLIP (Contrastive Language-Image Pre-Training) model.
Credit: This project uses OpenAI's CLIP model, which was created by OpenAI to connect text and images. All credit for the original CLIP model and research goes to OpenAI.
This application uses the CLIP model to classify images against user-provided or default labels. It processes each image individually on the GPU and encodes label text in small batches, which lets it handle thousands of classification labels while keeping GPU memory usage in check.
- Upload and classify multiple images simultaneously
- Use custom labels or automatically generated ones from WordNet (default: 6000 labels)
- Intelligent GPU/CPU memory management
- Displays classification results with confidence percentages
- Groups images by top classification category
- Optimized for handling thousands of classification labels
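The confidence percentages above come from a softmax over the image-label similarity scores. A minimal sketch of that step (the scores below are hypothetical; in the app they come from CLIP's logits):

```python
import math

def confidences(similarity_scores):
    """Convert raw image-label similarity scores into percentages via softmax."""
    # Subtract the max score for numerical stability before exponentiating.
    m = max(similarity_scores)
    exps = [math.exp(s - m) for s in similarity_scores]
    total = sum(exps)
    return [100.0 * e / total for e in exps]

# Hypothetical CLIP similarity scores for three labels
scores = [25.1, 23.4, 19.0]
print([round(p, 1) for p in confidences(scores)])  # -> [84.4, 15.4, 0.2]
```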
- Memory Management: Processes each image individually on GPU, clearing memory between images
- Batch Processing: Handles text tokens in small batches to avoid CUDA memory issues
- Adaptive Computing: Automatically selects between GPU/CPU based on available resources
- WordNet Integration: Uses NLTK's WordNet to generate a default set of 6000 classification labels
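The text-batching step above can be sketched as follows; the batch size and the commented call into CLIP are illustrative assumptions, not the app's exact code:

```python
def chunk_labels(labels, batch_size=256):
    """Split a long label list into small batches so tokenizing and
    encoding them never allocates one huge CUDA tensor at once."""
    return [labels[i:i + batch_size] for i in range(0, len(labels), batch_size)]

# With 6000 WordNet labels and a batch size of 256 this yields 24 batches:
# for batch in chunk_labels(all_labels):
#     tokens = clip.tokenize(batch).to(device)        # hypothetical encode step
#     features.append(model.encode_text(tokens))
batches = chunk_labels([f"label_{i}" for i in range(6000)], batch_size=256)
print(len(batches), len(batches[0]), len(batches[-1]))  # -> 24 256 112
```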
- Python 3.8+
- PyTorch
- Torchvision
- Flask
- CLIP (Contrastive Language-Image Pre-Training)
- NLTK
- PIL (Python Imaging Library)
- Clone the repository:
```
git clone https://github.com/YanivHaliwa/clip-image-classifier.git
cd clip-image-classifier
```
- Install dependencies:
```
pip install -r requirements.txt
```
- Run the application:
```
python app.py
```
- Open your browser and go to `http://localhost:5000`
- Upload images and optionally add custom labels (comma-separated)
- View classification results
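Parsing of the comma-separated label field can be sketched like this (the function name is illustrative, not the app's actual helper):

```python
def parse_labels(raw):
    """Split a comma-separated label string, trimming whitespace
    and dropping empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]

print(parse_labels("cat, golden retriever, sports car"))
# -> ['cat', 'golden retriever', 'sports car']
```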
This application can be run in a Docker container with GPU support:
- Docker installed
- NVIDIA Container Toolkit (for GPU support)
If you haven't installed the NVIDIA Container Toolkit yet (required for GPU support):
```
# Add the NVIDIA Container Toolkit repository
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update apt and install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
```
Build the Docker image from the project directory:
```
docker build -t clip-image-classifier .
```
Run the container with GPU support:
```
docker run --gpus all -p 5000:5000 -v $(pwd)/static/uploads:/app/static/uploads clip-image-classifier
```
Alternatively, use docker-compose:
```
docker-compose up
```
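A minimal `docker-compose.yml` for the GPU setup might look like the following; the service name and the GPU reservation block are assumptions to adapt to your Compose version, not the project's shipped file:

```yaml
services:
  clip-image-classifier:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - ./static/uploads:/app/static/uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```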
If you encounter issues with GPU support:
- Check NVIDIA runtime availability:
```
docker info | grep -i runtime
```
- Try the older syntax if needed:
```
docker run --runtime=nvidia -p 5000:5000 -v $(pwd)/static/uploads:/app/static/uploads clip-image-classifier
```
- If you still have issues, check your NVIDIA driver installation:
```
nvidia-smi
```
This application features advanced memory management to handle large sets of classification labels:
- Cleans GPU memory between images with `torch.cuda.empty_cache()`
- Uses PyTorch allocator configuration (e.g. `PYTORCH_CUDA_ALLOC_CONF`) to reduce memory fragmentation
- Implements intelligent fallback to CPU when GPU memory is constrained
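The GPU/CPU fallback decision can be sketched as a small helper; the function name and threshold logic are illustrative, and in the app the free-memory figure would come from something like `torch.cuda.mem_get_info()`:

```python
def select_device(cuda_available, free_bytes, required_bytes):
    """Pick 'cuda' only when a GPU exists and has enough free memory;
    otherwise fall back to 'cpu'."""
    if cuda_available and free_bytes >= required_bytes:
        return "cuda"
    return "cpu"

# In the app this would be driven by torch, e.g.:
#   free, _total = torch.cuda.mem_get_info()
#   device = select_device(torch.cuda.is_available(), free, estimate)
print(select_device(True, 8 * 1024**3, 2 * 1024**3))   # plenty of free VRAM -> cuda
print(select_device(True, 1 * 1024**3, 2 * 1024**3))   # too little VRAM -> cpu
print(select_device(False, 0, 2 * 1024**3))            # no GPU at all -> cpu
```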
Main interface for uploading images and entering classification labels
Classification results showing confidence scores and image grouping
MIT License
Created by Yaniv Haliwa for security testing and educational purposes.