A powerful multimodal search plugin for Fess that enables semantic search across text, images, and other media formats using CLIP (Contrastive Language-Image Pre-training) embeddings and vector similarity search.
- Multimodal Search: Search across text and images using natural language queries
- CLIP Integration: Leverages OpenAI's CLIP model for generating high-quality embeddings
- Vector Similarity: Uses OpenSearch/Elasticsearch KNN capabilities for fast vector search
- Seamless Integration: Easy installation as a Fess plugin
- Scalable Architecture: Built for enterprise-scale search deployments
- Open Source: Apache 2.0 licensed with full source code availability
The plugin extends Fess with the following components:
- CasClient: Communicates with CLIP-as-a-Service for embedding generation
- MultiModalSearchHelper: Configures vector field mappings and query rewriting
- KNNQueryBuilder: Builds k-nearest neighbor queries for vector similarity search
- CasExtractor: Extracts and processes image content during crawling
- EmbeddingIngester: Handles vector embedding storage and indexing
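To make the role of the k-nearest-neighbor query more concrete, here is a minimal sketch of the kind of search body an OpenSearch k-NN query uses. The function name `build_knn_query` and its parameters are hypothetical and are not taken from the plugin source; the actual KNNQueryBuilder may assemble its query differently.

```python
from typing import Any, Dict, List, Optional


def build_knn_query(field: str, vector: List[float], k: int = 10,
                    min_score: Optional[float] = None) -> Dict[str, Any]:
    """Build an OpenSearch search body that runs a k-NN query on `field`.

    Hypothetical helper for illustration; not the plugin's own API.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not vector:
        raise ValueError("query vector must not be empty")
    body: Dict[str, Any] = {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }
    if min_score is not None:
        # A top-level min_score drops hits that score below the threshold.
        body["min_score"] = min_score
    return body


# Toy 3-dimensional vector; real CLIP embeddings are much longer (e.g. 512).
query = build_knn_query("content_vector", [0.1, 0.2, 0.3], k=5, min_score=0.5)
```

In a real deployment the query vector would come from the CLIP service rather than being written by hand.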
- Fess: Version 15.0 or higher
- Java: OpenJDK 11 or higher
- OpenSearch/Elasticsearch: With KNN plugin support
- Docker: For running the CLIP service
- GPU (optional): For faster embedding generation
Download the plugin JAR from Maven Central and install it via the Fess administration console.
Alternatively, add the dependency to your project:
<dependency>
<groupId>org.codelibs.fess</groupId>
<artifactId>fess-webapp-multimodal</artifactId>
<version>15.1.0</version>
</dependency>
Clone the repository and start the CLIP API server:
git clone https://github.com/codelibs/fess-webapp-multimodal.git
cd fess-webapp-multimodal/docker
docker compose up -d
The CLIP API will be available at http://localhost:51000.
Add the following system properties in the Fess administration console:
fess.multimodal.content.field=content_vector
fess.multimodal.content.dimension=512
fess.multimodal.content.method=hnsw
fess.multimodal.content.engine=lucene
fess.multimodal.content.space_type=cosinesimil
fess.multimodal.min_score=0.5
- Navigate to Scheduler → Execute Config Reloader
- Navigate to Maintenance → Execute Re-indexing
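As a rough illustration of what the properties above correspond to on the index side, the sketch below turns them into an OpenSearch `knn_vector` field mapping. This is an assumption about the general shape of such a mapping, not a copy of what the plugin generates; the function name `build_vector_mapping` is hypothetical.

```python
from typing import Any, Dict

# The property values listed above, as a plain dictionary.
PROPS = {
    "fess.multimodal.content.field": "content_vector",
    "fess.multimodal.content.dimension": "512",
    "fess.multimodal.content.method": "hnsw",
    "fess.multimodal.content.engine": "lucene",
    "fess.multimodal.content.space_type": "cosinesimil",
}


def build_vector_mapping(props: Dict[str, str]) -> Dict[str, Any]:
    """Return an index body with k-NN enabled and one knn_vector field.

    Hypothetical helper for illustration only.
    """
    prefix = "fess.multimodal.content."
    field = props[prefix + "field"]
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": int(props[prefix + "dimension"]),
                    "method": {
                        "name": props[prefix + "method"],
                        "engine": props[prefix + "engine"],
                        "space_type": props[prefix + "space_type"],
                    },
                }
            }
        },
    }


index_body = build_vector_mapping(PROPS)
```

The dimension has to match the output size of the CLIP model in use, which is why changing the model generally means re-indexing.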
Configure and start crawling directories containing images and documents. The plugin will automatically:
- Extract text and image content
- Generate CLIP embeddings
- Store vectors in the search index
Search for images using natural language descriptions:
"red sports car on highway"
"sunset over mountains"
"person playing guitar"
Find related content across different media types:
"beach vacation" → Returns both text documents and beach images
"cooking recipe" → Returns recipe text and food images
| Property | Description | Default | Example |
|---|---|---|---|
| fess.multimodal.content.field | Vector field name | content_vector | image_vector |
| fess.multimodal.content.dimension | Vector dimensions | 512 | 768 |
| fess.multimodal.content.method | KNN algorithm | hnsw | ivf |
| fess.multimodal.content.engine | Search engine | lucene | nmslib |
| fess.multimodal.content.space_type | Distance metric | cosinesimil | l2 |
| fess.multimodal.min_score | Minimum similarity score | 0.5 | 0.7 |
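To show how the `cosinesimil` distance metric and the minimum score interact, here is a small self-contained sketch that ranks a few toy vectors against a query vector and drops anything below a threshold. The document names and 3-dimensional vectors are made up for illustration. The score a search engine reports may also be a rescaled form of raw cosine similarity, so the threshold semantics here are an assumption.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank(query: List[float], docs: Dict[str, List[float]],
         min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Score every document, keep those at or above min_score, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    kept = [item for item in scored if item[1] >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up vectors: text and image documents share one embedding space.
docs = {
    "beach_guide.txt": [0.9, 0.1, 0.0],
    "beach_photo.jpg": [0.8, 0.2, 0.1],
    "tax_form.pdf": [0.0, 0.1, 0.9],
}
results = rank([1.0, 0.0, 0.0], docs, min_score=0.5)
```

With these toy values the text file and the photo both pass the threshold, while the unrelated PDF is filtered out. Raising `min_score` trades recall for precision.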
The CLIP service can be customized by modifying docker/clip_config.yaml:
jtype: Flow
version: '1'
with:
  port: 51000
  protocol: http
  cors: true
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      metas:
        py_modules:
          - clip_server.executors.clip_torch
Run the test suite:
mvn clean test
For integration testing with sample data:
# Install test data using FiftyOne
pip install fiftyone
fiftyone zoo datasets load open-images-v7 --split validation --kwargs max_samples=1000 -d ./test-images
# Configure Fess to crawl the test-images directory
- Embedding Generation: ~50ms per image (with GPU), ~200ms (CPU only)
- Search Latency: <100ms for vector similarity queries
- Throughput: 1000+ documents/minute during indexing
- Index Size: ~2KB additional storage per document for vectors
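The figures above can be sanity-checked with some quick arithmetic. The sketch assumes 512-dimensional vectors stored as 32-bit floats and strictly sequential embedding requests; neither assumption is confirmed by the benchmark numbers themselves.

```python
DIMENSION = 512          # default vector dimension
BYTES_PER_FLOAT32 = 4    # assumption: vectors are stored as 32-bit floats

# Raw vector payload per document, excluding any index overhead.
vector_bytes = DIMENSION * BYTES_PER_FLOAT32  # 2048 bytes, i.e. 2 KB


def images_per_minute(ms_per_image: float) -> float:
    """Sequential throughput implied by a per-image embedding latency."""
    return 60_000 / ms_per_image


gpu_rate = images_per_minute(50)    # 1200 images per minute
cpu_rate = images_per_minute(200)   # 300 images per minute
```

Under these assumptions the ~2KB storage figure matches the raw vector size, and the 1000+ documents/minute figure is in line with the GPU latency. CPU-only embedding would fall short of it unless requests run in parallel.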
git clone https://github.com/codelibs/fess-webapp-multimodal.git
cd fess-webapp-multimodal
mvn clean package
src/main/java/org/codelibs/fess/multimodal/
├── client/ # CLIP service client
├── crawler/ # Content extraction
├── helper/ # Search configuration
├── index/ # Query builders
├── query/ # Query processing
├── rank/ # Result ranking
└── util/ # Utilities
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
CLIP Service Connection Failed
# Check if CLIP service is running
curl http://localhost:51000/health
# Check Docker logs
docker logs clip_server
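The same health check can be scripted, for example as a pre-flight step before starting a crawl. This is a minimal sketch using only the Python standard library; the function name `is_clip_service_up` is hypothetical, and the URL simply mirrors the curl command above.

```python
import http.client
import urllib.error
import urllib.request


def is_clip_service_up(url: str = "http://localhost:51000/health",
                       timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("CLIP service reachable:", is_clip_service_up())
```

A False result only says the endpoint did not answer successfully; the Docker logs are still the place to look for the cause.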
Vector Search Not Working
- Ensure KNN plugin is installed in OpenSearch/Elasticsearch
- Verify vector field mapping in index settings
- Check minimum score threshold configuration
Performance Issues
- Enable GPU support for CLIP service
- Increase JVM heap size for Fess
- Optimize KNN index parameters
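As one example of what tuning KNN index parameters can mean, OpenSearch's HNSW method accepts `m` and `ef_construction` in its `parameters` block. The sketch below builds such a method definition. The helper name `tuned_hnsw_method` and the values passed to it are illustrative assumptions, not recommendations, and whether the plugin exposes these parameters is not stated here.

```python
from typing import Any, Dict


def tuned_hnsw_method(m: int, ef_construction: int,
                      engine: str = "lucene",
                      space_type: str = "cosinesimil") -> Dict[str, Any]:
    """Build an HNSW method definition with explicit graph parameters.

    Hypothetical helper for illustration only.
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    if ef_construction < 1:
        raise ValueError("ef_construction must be at least 1")
    return {
        "name": "hnsw",
        "engine": engine,
        "space_type": space_type,
        # Larger values generally improve recall at the cost of
        # memory use and indexing time.
        "parameters": {"m": m, "ef_construction": ef_construction},
    }


method = tuned_hnsw_method(m=16, ef_construction=128)
```

Changing these values affects how the graph is built, so existing documents typically need to be re-indexed for the change to take effect.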
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- OpenAI CLIP for the foundational multimodal model
- Jina AI for the CLIP server implementation
- CodeLibs for the Fess search platform
- All contributors who have helped improve this project
- Issues: GitHub Issues
- Documentation: Fess Official Docs