Fess Semantic Search Plugin

A powerful semantic search plugin for Fess, the open-source enterprise search server. This plugin extends Fess's search capabilities by integrating neural search using OpenSearch's machine learning features and vector similarity search.

✨ Features

  • Neural Search Integration: Leverages OpenSearch ML Commons plugin for semantic vector search
  • Automatic Query Rewriting: Converts traditional text queries to neural queries when appropriate
  • Rank Fusion Processing: Combines traditional and semantic search results for improved relevance
  • Content Chunking: Processes long documents in chunks for better semantic matching
  • Configurable Models: Supports multiple pre-trained transformer models from HuggingFace
  • Seamless Integration: Works as a drop-in plugin for existing Fess installations
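
The content chunking idea can be sketched in a few lines of Python. This is not the plugin's implementation (the plugin is written in Java, and its chunking strategy is not described here). The helper name `chunk_words` and the window and overlap sizes are assumptions chosen purely for illustration.

```python
from typing import List


def chunk_words(text: str, size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into overlapping word windows (hypothetical helper)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(250))
    for i, chunk in enumerate(chunk_words(doc)):
        print(i, len(chunk.split()))
```

Overlapping windows keep a sentence that straddles a boundary visible in both neighbouring chunks, so each chunk can be embedded and matched on its own.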

🚀 Quick Start

Prerequisites

  • Fess 15.0+ (Full-text Enterprise Search Server)
  • OpenSearch 2.x with ML Commons plugin enabled
  • Docker and Docker Compose (recommended for setup)

1. Clone and Setup Docker Environment

git clone https://github.com/codelibs/docker-fess.git
cd docker-fess/compose

2. Configure Plugin in Docker Compose

Add the following entry to the environment section of the Fess service in your compose.yaml:

environment:
  - "FESS_PLUGINS=fess-webapp-semantic-search:15.1.0"

3. Start Services

docker compose -f compose.yaml -f compose-opensearch2.yaml up -d

4. Initialize ML Models and Pipeline

Download and run the setup script:

curl -o setup.sh https://raw.githubusercontent.com/codelibs/fess-webapp-semantic-search/main/tools/setup.sh
chmod +x setup.sh
./setup.sh localhost:9200

The setup script will:

  • Display available pre-trained models
  • Register your selected model in OpenSearch
  • Create the neural search pipeline
  • Provide the configuration settings

5. Configure Fess

In the Fess Admin Panel (Admin > General > System Properties), add the configuration printed by the setup script:

fess.semantic_search.pipeline=neural_pipeline
fess.semantic_search.content.field=content_vector
fess.semantic_search.content.dimension=384
fess.semantic_search.content.method=hnsw
fess.semantic_search.content.engine=lucene
fess.semantic_search.content.model_id=<your-model-id>

6. Create Index and Start Crawling

  1. Go to Admin > Maintenance and start reindexing
  2. Create your crawling configuration
  3. Start the crawler
  4. Begin semantic searching!

📖 Available Models

The plugin supports various pre-trained transformer models:

| Model | Dimension | Description |
|-------|-----------|-------------|
| all-MiniLM-L6-v2 | 384 | Fast and efficient, good for general use |
| all-mpnet-base-v2 | 768 | Higher quality, slower performance |
| all-distilroberta-v1 | 768 | RoBERTa-based, good performance |
| msmarco-distilbert-base-tas-b | 768 | Optimized for passage retrieval |
| multi-qa-MiniLM-L6-cos-v1 | 384 | Specialized for question answering |
| paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual support |
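
The configured vector dimension has to match the embedding size of the chosen model, so switching models also means changing `fess.semantic_search.content.dimension`. The following Python sketch derives the system properties from the table above. The helper name `system_properties` is hypothetical, and the fixed values (`neural_pipeline`, `content_vector`, `hnsw`, `lucene`) are copied from the Quick Start example rather than taken from the setup script's actual output.

```python
# Model name -> embedding dimension, as listed in the table above.
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "all-distilroberta-v1": 768,
    "msmarco-distilbert-base-tas-b": 768,
    "multi-qa-MiniLM-L6-cos-v1": 384,
    "paraphrase-multilingual-MiniLM-L12-v2": 384,
}


def system_properties(model_name: str, model_id: str) -> str:
    """Render Fess system properties for a given model (hypothetical helper)."""
    dimension = MODEL_DIMENSIONS[model_name]  # KeyError for unknown models
    lines = [
        "fess.semantic_search.pipeline=neural_pipeline",
        "fess.semantic_search.content.field=content_vector",
        f"fess.semantic_search.content.dimension={dimension}",
        "fess.semantic_search.content.method=hnsw",
        "fess.semantic_search.content.engine=lucene",
        f"fess.semantic_search.content.model_id={model_id}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(system_properties("all-mpnet-base-v2", "<your-model-id>"))
```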

⚙️ Configuration Options

Core Settings

| Property | Description | Default |
|----------|-------------|---------|
| fess.semantic_search.pipeline | Neural search pipeline name | - |
| fess.semantic_search.content.model_id | ML model ID in OpenSearch | - |
| fess.semantic_search.content.field | Vector field name | - |
| fess.semantic_search.content.dimension | Vector dimension size | - |

Advanced Settings

| Property | Description | Default |
|----------|-------------|---------|
| fess.semantic_search.content.method | Vector search method | hnsw |
| fess.semantic_search.content.engine | Vector search engine | lucene |
| fess.semantic_search.content.space_type | Distance calculation method | cosinesimil |
| fess.semantic_search.min_score | Minimum similarity score | - |
| fess.semantic_search.min_content_length | Minimum content length for processing | - |
| fess.semantic_search.content.chunk_size | Number of chunks to return | 1 |

HNSW Parameters

| Property | Description | Default |
|----------|-------------|---------|
| fess.semantic_search.content.param.m | HNSW M parameter | 16 |
| fess.semantic_search.content.param.ef_construction | HNSW ef_construction parameter | 100 |
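
As a rough guide to how these settings relate to OpenSearch, the Python sketch below turns the properties into a `knn_vector` field mapping of the shape the OpenSearch k-NN plugin accepts. The mapping the plugin actually generates may differ. The function name `knn_vector_mapping` and the `DEFAULTS` dictionary are hypothetical, with default values taken from the tables above.

```python
from typing import Dict

PREFIX = "fess.semantic_search.content."

# Defaults as documented in the tables above.
DEFAULTS = {
    PREFIX + "method": "hnsw",
    PREFIX + "engine": "lucene",
    PREFIX + "space_type": "cosinesimil",
    PREFIX + "param.m": "16",
    PREFIX + "param.ef_construction": "100",
}


def knn_vector_mapping(props: Dict[str, str]) -> dict:
    """Build a knn_vector field mapping from Fess properties (illustrative)."""
    cfg = {**DEFAULTS, **props}  # explicit properties override defaults
    return {
        cfg[PREFIX + "field"]: {
            "type": "knn_vector",
            "dimension": int(cfg[PREFIX + "dimension"]),
            "method": {
                "name": cfg[PREFIX + "method"],
                "engine": cfg[PREFIX + "engine"],
                "space_type": cfg[PREFIX + "space_type"],
                "parameters": {
                    "m": int(cfg[PREFIX + "param.m"]),
                    "ef_construction": int(cfg[PREFIX + "param.ef_construction"]),
                },
            },
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(knn_vector_mapping({
        PREFIX + "field": "content_vector",
        PREFIX + "dimension": "384",
    }), indent=2))
```

In general, a larger `m` or `ef_construction` improves recall at the cost of memory and indexing time, so the defaults are a reasonable starting point.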

🏗️ Architecture

Core Components

  • SemanticSearchHelper: Central component managing neural search configuration and model interactions
  • NeuralQueryBuilder: Custom OpenSearch query builder for neural/vector search queries
  • SemanticPhraseQueryCommand: Converts phrase queries to neural queries when appropriate
  • SemanticTermQueryCommand: Handles term-based semantic search queries
  • SemanticSearcher: Extends Fess's DefaultSearcher for rank fusion processing
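
For orientation, the query that a component such as NeuralQueryBuilder ultimately has to produce is an OpenSearch `neural` query. The Python sketch below assembles one as a plain dictionary. It shows the OpenSearch query DSL shape, not the plugin's Java code. The function names and the default `k` are assumptions.

```python
def neural_query(field: str, query_text: str, model_id: str, k: int = 10) -> dict:
    """Build an OpenSearch neural query clause for one vector field."""
    return {
        "neural": {
            field: {
                "query_text": query_text,  # embedded by the model at search time
                "model_id": model_id,
                "k": k,                    # number of nearest neighbours to retrieve
            }
        }
    }


def search_body(field: str, query_text: str, model_id: str, size: int = 10) -> dict:
    """Wrap the neural clause in a search request body (illustrative)."""
    return {"size": size, "query": neural_query(field, query_text, model_id, k=size)}


if __name__ == "__main__":
    import json
    print(json.dumps(search_body("content_vector", "how do I reset my password",
                                 "<your-model-id>"), indent=2))
```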

Integration Points

  • Query Processing: Integrates with Fess's QueryParser to rewrite queries for semantic search
  • Document Processing: Adds rewrite rules for OpenSearch mapping and settings to support vector fields
  • Rank Fusion: Registers as a searcher in Fess's rank fusion processor
  • DI Container: Uses LastaDi for dependency injection
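
Rank fusion merges the result lists of several searchers into one ranking. One widely used technique is reciprocal rank fusion (RRF), sketched below in Python. Whether Fess's rank fusion processor uses exactly this formula or constant is an assumption. The function name and the conventional `k = 60` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs; each hit scores 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]   # e.g. from the default keyword searcher
    semantic_hits = ["c", "a", "d"]  # e.g. from the semantic searcher
    print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

Because RRF uses only rank positions, it does not need the keyword and vector scores to be on a comparable scale.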

🔧 Development

Building from Source

git clone https://github.com/codelibs/fess-webapp-semantic-search.git
cd fess-webapp-semantic-search
mvn clean package

Running Tests

mvn test

Code Quality

mvn clean compile javadoc:javadoc

📦 Installation Methods

Maven Repository

The plugin is available from Maven Central:

<dependency>
    <groupId>org.codelibs.fess</groupId>
    <artifactId>fess-webapp-semantic-search</artifactId>
    <version>15.1.0</version>
</dependency>

Manual Installation

  1. Download the JAR from Maven Repository
  2. Place it in your Fess webapp/WEB-INF/lib/ directory
  3. Restart Fess
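
For step 1, the download location can be derived from the Maven coordinates using the standard Maven repository layout. The Python sketch below builds that URL. The helper name is hypothetical, and the sketch does not verify that the artifact is actually published at the resulting address.

```python
def maven_central_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a JAR URL following the standard Maven repository layout."""
    group_path = group_id.replace(".", "/")  # dots in the groupId become path segments
    return (
        f"https://repo1.maven.org/maven2/{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.jar"
    )


if __name__ == "__main__":
    print(maven_central_jar_url("org.codelibs.fess", "fess-webapp-semantic-search", "15.1.0"))
```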

Plugin Management

See the Fess Plugin Guide for detailed installation instructions.

🀝 Contributing

We welcome contributions!

Development Setup

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite (mvn test)
  6. Commit your changes (git commit -m 'Add some amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

Code Style

This project uses:

  • Maven for build management
  • JUnit for testing
  • CheckStyle for code formatting
  • JavaDoc for documentation

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

📊 Version Compatibility

| Plugin Version | Fess Version | OpenSearch Version |
|----------------|--------------|--------------------|
| 15.0.x | 15.0+ | 2.x |
| 14.9.x | 14.9+ | 2.x |

🙏 Acknowledgments

  • CodeLibs for developing and maintaining Fess
  • HuggingFace for providing pre-trained transformer models
  • OpenSearch team for ML Commons plugin
  • All contributors who have helped improve this plugin
