Fess Semantic Search Plugin

A semantic search plugin for Fess, the open-source enterprise search server. It extends Fess's search capabilities with neural search, using OpenSearch's machine learning features and vector similarity search.

✨ Features

Neural Search Integration: Leverages OpenSearch ML Commons plugin for semantic vector search
Automatic Query Rewriting: Converts traditional text queries to neural queries when appropriate
Rank Fusion Processing: Combines traditional and semantic search results for improved relevance
Content Chunking: Processes long documents in chunks for better semantic matching
Configurable Models: Supports multiple pre-trained transformer models from HuggingFace
Seamless Integration: Works as a drop-in plugin for existing Fess installations
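
To make the content chunking idea concrete, here is a minimal sketch of splitting a long text into overlapping word windows so that each piece can be embedded separately. It is an illustration only: the function name `chunk_words` and the `max_words` and `overlap` parameters are hypothetical, and the plugin's actual chunking strategy may differ.

```python
def chunk_words(text, max_words=4, overlap=1):
    """Split text into windows of at most max_words words.

    Consecutive windows share `overlap` words so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    (Hypothetical helper, not part of the plugin.)
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text
        if start + max_words >= len(words):
            break
    return chunks


chunks = chunk_words("a b c d e f g h", max_words=4, overlap=1)
print(chunks)
```

Each chunk would then receive its own embedding vector, which lets a query match a relevant passage deep inside a long document.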

🚀 Quick Start

Prerequisites

Fess 15.0+ (Full-text Enterprise Search Server)
OpenSearch 2.x with ML Commons plugin enabled
Docker and Docker Compose (recommended for setup)

1. Clone and Setup Docker Environment

git clone https://github.com/codelibs/docker-fess.git
cd docker-fess/compose

2. Configure Plugin in Docker Compose

Add the following entry to the environment section of the Fess service in your compose.yaml:

environment:
  - "FESS_PLUGINS=fess-webapp-semantic-search:15.1.0"

3. Start Services

docker compose -f compose.yaml -f compose-opensearch2.yaml up -d

4. Initialize ML Models and Pipeline

Download and run the setup script:

curl -o setup.sh https://raw.githubusercontent.com/codelibs/fess-webapp-semantic-search/main/tools/setup.sh
chmod +x setup.sh
./setup.sh localhost:9200

The setup script will:

Display available pre-trained models
Register your selected model in OpenSearch
Create the neural search pipeline
Provide the configuration settings
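
For orientation, the request bodies behind steps like these might look as follows. This is a sketch based on the OpenSearch ML Commons and ingest pipeline REST APIs, not a copy of setup.sh. It assumes that the pipeline is an ingest pipeline with a `text_embedding` processor, that the source text field is named `content`, and that the model is registered under the name and version shown; check each of these against your own environment. The code only builds and prints JSON and sends nothing over the network.

```python
import json


def register_model_body(name, version, model_format="TORCH_SCRIPT"):
    """Body for POST /_plugins/_ml/models/_register (pretrained model)."""
    return {"name": name, "version": version, "model_format": model_format}


def ingest_pipeline_body(model_id, source_field, vector_field):
    """Body for PUT /_ingest/pipeline/<pipeline-name>."""
    return {
        "description": "Embed text for semantic search",
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    # Maps the text field to the vector field that stores its embedding
                    "field_map": {source_field: vector_field},
                }
            }
        ],
    }


# Assumed model name and version; verify against the OpenSearch pretrained model list
register_body = register_model_body(
    "huggingface/sentence-transformers/all-MiniLM-L6-v2", "1.0.1"
)
# "<your-model-id>" stands for the ID OpenSearch returns after registration
pipeline_body = ingest_pipeline_body("<your-model-id>", "content", "content_vector")

print(json.dumps(register_body, indent=2))
print(json.dumps(pipeline_body, indent=2))
```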

5. Configure Fess

In the Fess admin panel (Admin > General > System Properties), add the configuration values printed by the setup script:

fess.semantic_search.pipeline=neural_pipeline
fess.semantic_search.content.field=content_vector
fess.semantic_search.content.dimension=384
fess.semantic_search.content.method=hnsw
fess.semantic_search.content.engine=lucene
fess.semantic_search.content.model_id=<your-model-id>

6. Create Index and Start Crawling

Go to Admin > Maintenance and start reindexing
Create your crawling configuration
Start the crawler
Begin semantic searching!

📖 Available Models

The plugin supports various pre-trained transformer models:

Model	Dimension	Description
all-MiniLM-L6-v2	384	Fast and efficient, good for general use
all-mpnet-base-v2	768	Higher quality, slower performance
all-distilroberta-v1	768	RoBERTa-based, good performance
msmarco-distilbert-base-tas-b	768	Optimized for passage retrieval
multi-qa-MiniLM-L6-cos-v1	384	Specialized for question answering
paraphrase-multilingual-MiniLM-L12-v2	384	Multilingual support
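
The configured `fess.semantic_search.content.dimension` has to match the output dimension of the selected model. A small check like the one below can catch a mismatch before reindexing. It is a sketch: the helper names `parse_properties` and `dimension_matches` are hypothetical, and the dimensions are copied from the table above.

```python
# Dimensions as listed in the model table above
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "all-distilroberta-v1": 768,
    "msmarco-distilbert-base-tas-b": 768,
    "multi-qa-MiniLM-L6-cos-v1": 384,
    "paraphrase-multilingual-MiniLM-L12-v2": 384,
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def dimension_matches(props, model_name):
    """Return True if the configured dimension equals the model's dimension."""
    configured = int(props["fess.semantic_search.content.dimension"])
    return configured == MODEL_DIMENSIONS[model_name]


sample = """
fess.semantic_search.pipeline=neural_pipeline
fess.semantic_search.content.field=content_vector
fess.semantic_search.content.dimension=384
"""
props = parse_properties(sample)
print(dimension_matches(props, "all-MiniLM-L6-v2"))
print(dimension_matches(props, "all-mpnet-base-v2"))
```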

⚙️ Configuration Options

Core Settings

Property	Description	Default
`fess.semantic_search.pipeline`	Neural search pipeline name	-
`fess.semantic_search.content.model_id`	ML model ID in OpenSearch	-
`fess.semantic_search.content.field`	Vector field name	-
`fess.semantic_search.content.dimension`	Vector dimension size	-

Advanced Settings

Property	Description	Default
`fess.semantic_search.content.method`	Vector search method	`hnsw`
`fess.semantic_search.content.engine`	Vector search engine	`lucene`
`fess.semantic_search.content.space_type`	Distance calculation method	`cosinesimil`
`fess.semantic_search.min_score`	Minimum similarity score	-
`fess.semantic_search.min_content_length`	Minimum content length for processing	-
`fess.semantic_search.content.chunk_size`	Number of chunks to return	`1`
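
To illustrate what the `cosinesimil` space type and a minimum score threshold mean, the sketch below computes plain cosine similarity between small vectors and drops results below a cutoff. The function names and the toy two-dimensional vectors are assumptions for the example. OpenSearch derives its reported score from the similarity using its own formula, so a threshold chosen here does not translate directly into a value for `fess.semantic_search.min_score`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_min_score(scored, min_score):
    """Keep only (doc_id, score) pairs whose score reaches min_score."""
    return [(doc_id, score) for doc_id, score in scored if score >= min_score]


query = [1.0, 0.0]
documents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
kept = filter_by_min_score(scored, min_score=0.5)
print(kept)
```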

HNSW Parameters

Property	Description	Default
`fess.semantic_search.content.param.m`	HNSW M parameter	`16`
`fess.semantic_search.content.param.ef_construction`	HNSW ef_construction parameter	`100`
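
These properties correspond to the fields of an OpenSearch `knn_vector` mapping. The sketch below shows how the defaults listed in the tables could be assembled into such a field definition. The function `knn_vector_mapping` is hypothetical, and the mapping the plugin actually generates may contain additional settings.

```python
import json


def knn_vector_mapping(dimension, method="hnsw", engine="lucene",
                       space_type="cosinesimil", m=16, ef_construction=100):
    """Build an OpenSearch knn_vector field definition from plugin-style settings."""
    return {
        "type": "knn_vector",
        "dimension": dimension,
        "method": {
            "name": method,
            "engine": engine,
            "space_type": space_type,
            "parameters": {"m": m, "ef_construction": ef_construction},
        },
    }


# Field name and dimension taken from the Quick Start configuration
mapping = {"properties": {"content_vector": knn_vector_mapping(384)}}
print(json.dumps(mapping, indent=2))
```

In general, larger `m` and `ef_construction` values tend to improve recall at the cost of more memory and slower index builds.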

🏗️ Architecture

Core Components

SemanticSearchHelper: Central component managing neural search configuration and model interactions
NeuralQueryBuilder: Custom OpenSearch query builder for neural/vector search queries
SemanticPhraseQueryCommand: Converts phrase queries to neural queries when appropriate
SemanticTermQueryCommand: Handles term-based semantic search queries
SemanticSearcher: Extends Fess's DefaultSearcher for rank fusion processing
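
For reference, a neural query in the OpenSearch query DSL has roughly the shape built below. This is a sketch of the request body only, using the field name from the Quick Start configuration. The helper `neural_query` and the value of `k` are assumptions, and the query that NeuralQueryBuilder emits may carry further options.

```python
import json


def neural_query(field, query_text, model_id, k=10):
    """Build a search body with an OpenSearch neural query on a vector field."""
    return {
        "query": {
            "neural": {
                field: {
                    "query_text": query_text,  # text that the model embeds at query time
                    "model_id": model_id,      # ID of the deployed ML model
                    "k": k,                    # number of nearest neighbors to retrieve
                }
            }
        }
    }


body = neural_query("content_vector", "how to configure a crawler", "<your-model-id>")
print(json.dumps(body, indent=2))
```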

Integration Points

Query Processing: Integrates with Fess's QueryParser to rewrite queries for semantic search
Document Processing: Adds rewrite rules for OpenSearch mapping and settings to support vector fields
Rank Fusion: Registers as a searcher in Fess's rank fusion processor
DI Container: Uses LastaDi for dependency injection
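
One widely used way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The sketch below shows that technique in general terms. Whether Fess's rank fusion processor uses exactly this formula is an assumption here, and the function name, the constant `k=60`, and the document IDs are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of document IDs, each ordered best-first.

    Every document scores 1 / (k + rank) per list it appears in,
    and the per-list scores are summed.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc1", "doc2", "doc3"]
semantic_hits = ["doc3", "doc1", "doc4"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that rank well in both lists, such as doc1 and doc3 here, rise above documents found by only one searcher.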

🔧 Development

Building from Source

git clone https://github.com/codelibs/fess-webapp-semantic-search.git
cd fess-webapp-semantic-search
mvn clean package

Running Tests

mvn test

Code Quality

mvn clean compile javadoc:javadoc

📦 Installation Methods

Maven Repository

The plugin is available from Maven Central:

<dependency>
    <groupId>org.codelibs.fess</groupId>
    <artifactId>fess-webapp-semantic-search</artifactId>
    <version>15.1.0</version>
</dependency>

Manual Installation

Download the JAR from Maven Repository
Place it in your Fess webapp/WEB-INF/lib/ directory
Restart Fess

Plugin Management

See the Fess Plugin Guide for detailed installation instructions.

🤝 Contributing

We welcome contributions!

Development Setup

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Add tests for new functionality
Run the test suite (mvn test)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Style

This project uses:

Maven for build management
JUnit for testing
Checkstyle for code formatting
Javadoc for documentation

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

📊 Version Compatibility

Plugin Version	Fess Version	OpenSearch Version
15.0.x	15.0+	2.x
14.9.x	14.9+	2.x

🆘 Support

Documentation: Fess Documentation
Issues: GitHub Issues
Discussions: GitHub Discussions
Community: Fess Community

🙏 Acknowledgments

CodeLibs for developing and maintaining Fess
HuggingFace for providing pre-trained transformer models
OpenSearch team for ML Commons plugin
All contributors who have helped improve this plugin
