@Mikep86 Mikep86 commented Mar 17, 2025

Adds a hybrid retriever for simple hybrid search across lexical & semantic text fields:

PUT wiki-index
{
  "mappings": { 
    "properties": {
      "title": {
        "type": "text"
      },
      "content": {
        "type": "text",
        "fields": {
          "semantic": {
            "type": "semantic_text"
          }
        }
      }
    }
  }
}

POST wiki-index/_search
{
  "retriever": {
    "hybrid": {
      "fields": ["content", "content.semantic"],
      "query": "foo"
    }
  }
}

Semantic reranking through the text_similarity_reranker retriever is built in and is enabled with the rerank options:

POST wiki-index/_search
{
  "retriever": {
    "hybrid": {
      "fields": ["content", "content.semantic"],
      "query": "foo",
      "rerank": true,
      "rerank_inference_id": "my-reranker-service",
      "rerank_field": "content"
    }
  }
}
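
For comparison, this is roughly the wrapping that the rerank options would save you from writing by hand. It is a minimal Python sketch, assuming that rerank_inference_id and rerank_field map one-to-one onto the text_similarity_reranker retriever's inference_id and field, and that query is reused as inference_text. The PR description does not confirm that mapping. The helper name wrap_with_reranker is hypothetical, and the inner retriever is simplified to a single lexical match.

```python
def wrap_with_reranker(inner_retriever, query, inference_id, field):
    """Wrap a retriever body in a text_similarity_reranker retriever.

    Assumption: the hybrid retriever's rerank options translate directly
    into these parameters. Hypothetical helper, not part of this PR.
    """
    return {
        "text_similarity_reranker": {
            "retriever": inner_retriever,
            "field": field,
            "inference_id": inference_id,
            "inference_text": query,
        }
    }


# Simplified inner retriever: lexical match only, for illustration.
inner = {"standard": {"query": {"match": {"content": "foo"}}}}
body = {
    "retriever": wrap_with_reranker(inner, "foo", "my-reranker-service", "content")
}
```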

You can use the caret notation to boost matches in certain fields:

POST wiki-index/_search
{
  "retriever": {
    "hybrid": {
      "fields": ["content", "content.semantic^3"],
      "query": "foo"
    }
  }
}
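
The caret syntax is the same field^boost form that multi_match accepts in its fields list. As a rough illustration of how such a spec can be split into a field name and a weight, here is a minimal Python sketch. The function name parse_field_boost and the default boost of 1.0 are assumptions for illustration, not the parsing code in this PR.

```python
def parse_field_boost(spec, default_boost=1.0):
    """Split "field^boost" into (field, boost).

    A spec without a caret gets default_boost. Hypothetical helper for
    illustration only.
    """
    name, sep, boost = spec.rpartition("^")
    if not sep:
        # No caret present: the whole spec is the field name.
        return spec, default_boost
    if not name:
        raise ValueError(f"missing field name in {spec!r}")
    # float() raises ValueError on an empty or non-numeric boost.
    return name, float(boost)


fields = [parse_field_boost(s) for s in ["content", "content.semantic^3"]]
```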

And you can use query_settings to customize the query run against certain fields:

POST wiki-index/_search
{
  "retriever": {
    "hybrid": {
      "fields": ["content", "content.semantic"],
      "query": "foo bar",
      "query_settings": {
        "content": {
          "type": "match",
          "operator": "and"
        }
      }
    }
  }
}

POST wiki-index/_search
{
  "retriever": {
    "hybrid": {
      "fields": ["content", "content.semantic"],
      "query": "foo bar",
      "query_settings": {
        "content": {
          "type": "match_phrase",
          "slop": 1
        }
      }
    }
  }
}
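
One plausible reading of query_settings is that each entry selects a Query DSL query type for that field and passes the remaining keys through as query options. The Python sketch below shows that translation under this assumption. The helper name build_field_query, the default type of match, and the restriction to match and match_phrase are all illustrative choices, not behavior confirmed by this PR.

```python
def build_field_query(field, query, settings=None):
    """Translate one query_settings entry into a Query DSL clause.

    Assumption: "type" names the query and every other key is a query
    option. Hypothetical helper for illustration only.
    """
    options = dict(settings or {})  # copy so the caller's dict is untouched
    query_type = options.pop("type", "match")
    if query_type not in ("match", "match_phrase"):
        raise ValueError(f"unsupported query type: {query_type!r}")
    return {query_type: {field: {"query": query, **options}}}


and_query = build_field_query(
    "content", "foo bar", {"type": "match", "operator": "and"}
)
phrase_query = build_field_query(
    "content", "foo bar", {"type": "match_phrase", "slop": 1}
)
```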

@Mikep86 Mikep86 changed the title from "Hybrid Retriever POC" to "Hybrid Retriever POC - DO NOT MERGE" Mar 17, 2025

@kderusso kderusso left a comment

This is a great POC! As it's a POC, I didn't review the code itself, but I left one suggestion to think about when we move to a production implementation.

One thing that would be really compelling as an example is to write a really complex query using the linear retriever, and then write the same (nicer, smaller) query using the hybrid retriever.
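
A rough sketch of what that comparison could look like: the Python helper below builds a linear retriever body from the same fields and query that the hybrid retriever takes. It assumes that semantic_text fields are searched with a semantic query, that other fields use a plain match query, and that scores are normalized with minmax. Whether the hybrid retriever actually expands to this shape is not stated in the PR, and the helper name hybrid_as_linear is hypothetical.

```python
def hybrid_as_linear(fields, query, semantic_fields):
    """Build a linear retriever body from (field, weight) pairs.

    Assumptions: semantic fields use the semantic query, others use
    match, and every sub-retriever is minmax-normalized.
    """
    retrievers = []
    for name, weight in fields:
        if name in semantic_fields:
            clause = {"semantic": {"field": name, "query": query}}
        else:
            clause = {"match": {name: query}}
        retrievers.append(
            {
                "retriever": {"standard": {"query": clause}},
                "weight": weight,
                "normalizer": "minmax",
            }
        )
    return {"linear": {"retrievers": retrievers}}


linear_body = {
    "retriever": hybrid_as_linear(
        [("content", 1.0), ("content.semantic", 3.0)],
        "foo",
        semantic_fields={"content.semantic"},
    )
}
```

Even with two fields the linear form needs one nested standard retriever per field, which is the verbosity the hybrid retriever's flat fields list is meant to remove.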

public static final String NAME = "hybrid";
public static final ParseField FIELDS_FIELD = new ParseField("fields");
public static final ParseField QUERY_FIELD = new ParseField("query");
public static final ParseField RERANK_FIELD = new ParseField("rerank");

I like that we incorporated rerank into this POC. I'd also like to see rule incorporated, as I think business rules will be a critical part of the hybrid retriever. In that vein, I wonder if there's something more generic we can do with the retrievers that we call, something like this, that could be easily extended as we add future retrievers or want more customization.

POST wiki-index/_search
{
  "retriever": {
    "hybrid": {
      "fields": ["content", "content.semantic"],
      "query": "foo",
      "rank_modifiers": [
        {
          "rule": {
            ...
          }
        },
        {
          "rerank": {
            "inference_id": "my-reranker-service",
            "field": "content"
          }
        }
      ]
    }
  }
}

@Mikep86 Mikep86 closed this Jul 1, 2025