1 | 1 | # anserini-solr-plugin |
2 | | -Solr Plugin that supports Anserini style query expansion and reranking against Solr indexes |
| 2 | + |
| 3 | +Solr Plugin that supports [Anserini](https://github.com/castorini/anserini) style query expansion and reranking against Solr indexes. |
| 4 | + |
| 5 | +### Description |
| 6 | + |
| 7 | +Supports the following similarity implementations for paragraph text: |
| 8 | + |
| 9 | +* **Query Likelihood (QL)** -- via built-in DirichletLM Similarity |
| 10 | +* **BM25** -- via built-in BM25 Similarity (default) |
| 11 | + |
| 12 | +Supports the following query rewriting functionality (Query A): |
| 13 | + |
| 14 | +* **Bag of Words (BoW)** -- constructs an OR query out of the individual terms. |
| 15 | +* **Sequential Dependency Model (SDM)** -- constructs a query out of the individual terms and bigrams (both ordered and unordered). |
| 16 | + |
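As a rough illustration of the two rewriting modes, the sketch below builds BoW and SDM-style clauses from a question string. It is not the plugin's code: the function names, the whitespace tokenization, the window size of 8, and the use of Lucene phrase (`"a b"`) and proximity (`"a b"~8`) syntax to stand in for ordered and unordered bigrams are all assumptions made for the example.

```python
def bow_query(question):
    """Join whitespace-separated terms into one OR query string (hypothetical helper)."""
    terms = question.lower().split()
    return " OR ".join(terms)


def sdm_clauses(question, window=8):
    """Return the three clause groups an SDM-style query combines.

    The window size and the phrase/proximity syntax are assumptions,
    not values taken from the plugin.
    """
    terms = question.lower().split()
    bigrams = list(zip(terms, terms[1:]))  # adjacent term pairs
    return {
        "terms": terms,
        "ordered": ['"%s %s"' % pair for pair in bigrams],
        "unordered": ['"%s %s"~%d' % (a, b, window) for a, b in bigrams],
    }


if __name__ == "__main__":
    print(bow_query("what are nails made of"))
    print(sdm_clauses("nails made of"))
```

In a real SDM query each of the three groups carries its own weight; the sketch only shows which clauses are generated.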
| 17 | +Supports the following query reranking functionality (Query B). A more complex query is constructed from the results returned by Query A and applied to the top ${rerankCutoff} results from Query A. |
| 18 | + |
| 19 | +* **Relevance Model 3 (RM3)** -- extracts one feature vector from the query and another from the top terms of the top documents returned by Query A, and interpolates the two to create a new reranking query. |
| 20 | +* **Axiomatic Reranker** -- scores candidate terms by their mutual information with the query terms, computed over the top ${rerankedCutoff} documents plus random documents from outside the top results. Uses the top K terms to create a new reranking query. |
| 21 | +* **Identity Reranker** -- a do-nothing reranker that passes the results from Query A through unchanged. Useful for debugging. |
| 22 | + |
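The interpolation step in RM3 can be sketched as a weighted sum of two term-weight vectors. This is a minimal illustration, not the plugin's implementation: the function name, the dictionary representation, and the default `original_weight` of 0.5 are assumptions, and both vectors are assumed to be normalized already.

```python
def interpolate(query_weights, feedback_weights, original_weight=0.5):
    """Blend query term weights with feedback term weights.

    query_weights and feedback_weights map term -> weight. The default
    original_weight is an assumption chosen for illustration only.
    """
    terms = set(query_weights) | set(feedback_weights)
    return {
        term: original_weight * query_weights.get(term, 0.0)
        + (1.0 - original_weight) * feedback_weights.get(term, 0.0)
        for term in terms
    }


if __name__ == "__main__":
    query = {"nails": 1.0}
    feedback = {"nails": 0.5, "steel": 0.5}
    print(interpolate(query, feedback))
```

Terms that appear only in the feedback documents enter the reranking query with a reduced weight, while the original query terms keep most of their influence.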
| 23 | +### Building |
| 24 | + |
| 25 | +Steps to build the JAR file from the code and deploy to Solr are as follows: |
| 26 | + |
| 27 | +```bash |
| 28 | +$ mvn clean package |
| 29 | +$ mkdir -p ${SOLR_HOME}/server/solr/lib |
| 30 | +$ cp target/anserini-solr-plugins-1.0-SNAPSHOT.jar ${SOLR_HOME}/server/solr/lib/ |
| 31 | +``` |
| 32 | + |
| 33 | +### Configuration |
| 34 | + |
| 35 | +The plugin expects the additional field types `text_bm` and `text_ql` to be defined in the index's managed schema at `${SOLR_HOME}/server/solr/${INDEX_NAME}/conf/managed-schema`. Their definitions can be found in [solr/schema-additions.xml](solr/schema-additions.xml). They are needed to support the QL and BM25 similarities described above. |
| 36 | + |
| 37 | +The plugin requires two fields, `para_text_bm` and `para_text_ql`, with field types `text_bm` and `text_ql` respectively, as defined in the previous step. There are no other specific field requirements. An example schema can be found in [solr/update-schema.sh](solr/update-schema.sh). |
| 38 | + |
| 39 | +Restart Solr after these steps so that its class loader picks up the JAR file deployed in the Building section. |
| 40 | + |
| 41 | +The plugin itself is registered in `${SOLR_HOME}/server/solr/${INDEX_NAME}/conf/solrconfig.xml`, as detailed in [solr/update-plugin.sh](solr/update-plugin.sh). |
| 42 | + |
| 43 | +### Running |
| 44 | + |
| 45 | +The plugin is invoked through HTTP GET requests. A typical URL looks like the following. |
| 46 | + |
| 47 | +``` |
| 48 | +http://localhost:8983/solr/my_index_name/anserini?q=what+are+nails+made+of |
| 49 | +``` |
| 50 | + |
| 51 | +The main parameters for tuning behavior are listed below. |
| 52 | + |
| 53 | +* `q` -- question, URL encoded. Mandatory parameter. |
| 54 | +* `sim` -- similarity. Valid values are `ql` (Query Likelihood) or `bm` (BM25), default is `bm`. |
| 55 | +* `qtype` -- Query Expansion type. Valid values are `bow` (Bag of Words) or `sdm` (Sequential Dependency Model), default is `bow`. |
| 56 | +* `rtype` -- Reranking type. Valid values are `ax` (Axiomatic), `rm3` (Relevance Model 3), and `id` (Identity), default is `rm3`. |
| 57 | +* `start` and `rows` -- for pagination, defaults are 0 and 10 respectively. |
| 58 | + |
| 59 | +Certain `qtype` and `rtype` values accept additional parameters, which are listed in [solr/update-plugin.sh](solr/update-plugin.sh) under the prefixes `sdm.`, `ax.`, and `rm3.`. |
| 60 | + |
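The request parameters above can be assembled programmatically. The sketch below builds a request URL with Python's standard `urllib.parse.urlencode`; the helper name `build_url` and its default host and index name are placeholders mirroring the example URL, not part of the plugin.

```python
from urllib.parse import urlencode


def build_url(question, index="my_index_name", host="localhost:8983", **params):
    """Build a GET URL for the /anserini handler (hypothetical helper).

    Extra keyword arguments (for example sim, qtype, rtype, start, rows)
    are appended as query parameters in the order given.
    """
    query = {"q": question}
    query.update(params)
    # urlencode escapes spaces as '+', matching the example URL above.
    return "http://%s/solr/%s/anserini?%s" % (host, index, urlencode(query))


if __name__ == "__main__":
    print(build_url("what are nails made of"))
    print(build_url("what are nails made of", sim="ql", qtype="sdm", rtype="id"))
```

The resulting string can be passed to any HTTP client, such as `curl` or `urllib.request.urlopen`, once Solr is running with the plugin installed.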
| 61 | +### Dependencies |
| 62 | + |
| 63 | +Currently the only dependency is Solr, since the relevant parts of Anserini have been copied into this codebase in the interest of time. The plan is to make Anserini a dependency and use its functionality directly. |
| 64 | + |
| 65 | +* Solr 8.1.1 |
| 66 | + |