kderusso commented Aug 7, 2025

Provides a prototype implementation of the EXTRACT_SNIPPETS function in ES|QL, for discussion on the best approaches to pursue.

Here is an example of how to call this function:

POST _query?format=txt
{
  "query": """
  FROM books METADATA _score
  | EVAL snippets = extract_snippets(synopsis, "hobbit takes a ring on a long journey", 3, 10)
  | KEEP title, snippets, _score
  | SORT _score DESC 
  | LIMIT 10
  """
}
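
For anyone who wants to try the request outside of the Kibana console, here is a minimal sketch that builds the same `POST _query?format=txt` call with the Python standard library. The host `http://localhost:9200`, the absence of authentication, and the helper name `build_query_request` are assumptions for illustration only; the request is built but not sent.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster. Adjust host and auth for a real deployment.
ES_URL = "http://localhost:9200"

ESQL = """
FROM books METADATA _score
| EVAL snippets = extract_snippets(synopsis, "hobbit takes a ring on a long journey", 3, 10)
| KEEP title, snippets, _score
| SORT _score DESC
| LIMIT 10
"""


def build_query_request(esql: str, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) the POST _query?format=txt request (hypothetical helper)."""
    body = json.dumps({"query": esql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_query?format=txt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(ESQL)
# To execute against a running cluster that has this branch deployed:
#   with urllib.request.urlopen(request) as resp:
#       print(resp.read().decode("utf-8"))
```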

You can break apart multiple values using MV_EXPAND:

POST _query?format=txt
{
  "query": """
  FROM books METADATA _score
  | EVAL snippets = extract_snippets(synopsis, "hobbit takes a ring on a long journey", 1, 10)
  | MV_EXPAND snippets
  | KEEP title, snippets, _score
  | SORT _score DESC 
  | LIMIT 10
  """
}
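
To make the effect of MV_EXPAND concrete, here is a small Python sketch of its row semantics: one output row per value of the multivalued column, with the other columns copied. The rows and the helper name `mv_expand` are hypothetical and only mimic the shape of the result; this is not the ES|QL implementation.

```python
from typing import Any


def mv_expand(rows: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Emit one output row per value of a multivalued field, copying the other columns."""
    expanded = []
    for row in rows:
        value = row.get(field)
        # Single values and nulls pass through unchanged as one row.
        values = value if isinstance(value, list) else [value]
        for item in values:
            expanded.append({**row, field: item})
    return expanded


# Hypothetical rows, shaped like the output of the EVAL step above.
rows = [
    {"title": "Book A", "snippets": ["first snippet", "second snippet"], "_score": 1.0},
    {"title": "Book B", "snippets": "only snippet", "_score": 0.5},
]

for row in mv_expand(rows, "snippets"):
    print(row["title"], "|", row["snippets"])
```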

EXTRACT_SNIPPETS works on text and semantic_text fields. However, semantic_text fields do not yet support a character length, so the whole chunk is returned for them.
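
As a rough illustration of how the two numeric arguments behave, here is a toy Python stand-in. It assumes the third argument is the number of snippets and the fourth is the snippet length in characters, which the note above implies but does not spell out. The scoring (counting query-term overlap in fixed-size windows) is an assumption made for the sketch; it is not the highlighter this POC uses.

```python
import re


def toy_extract_snippets(text: str, query: str, num_snippets: int, snippet_length: int) -> list[str]:
    """Toy stand-in: rank fixed-size character windows by query-term overlap."""
    terms = set(re.findall(r"\w+", query.lower()))
    # Cut the text into consecutive windows of at most snippet_length characters.
    windows = [text[i:i + snippet_length] for i in range(0, len(text), snippet_length)]
    scored = []
    for position, window in enumerate(windows):
        hits = sum(1 for word in re.findall(r"\w+", window.lower()) if word in terms)
        if hits:
            scored.append((-hits, position, window))
    # Highest overlap first; ties keep document order.
    scored.sort()
    return [window for _, _, window in scored[:num_snippets]]


# Hypothetical synopsis text, for illustration only.
synopsis = "A hobbit leaves home. He finds a ring. The journey is long."
snippets = toy_extract_snippets(synopsis, "hobbit takes a ring on a long journey", 2, 20)
print(snippets)
```

In this model, a field that ignores the length argument would simply return its whole stored chunk instead of a window.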

Some notes on this POC:

  • We've discussed whether this should be a function or a command. This POC implements it as a function, which we think is OK for now: the only reason to use a command is that we want to perform inference via an async call. As long as we only perform inference once (as we do for the semantic match query), a function should be OK.
  • This POC uses the data node to retrieve the field value for snippet extraction. This restriction could potentially be removed by using an in-memory Lucene index to perform the highlighting query if needed.

kderusso commented Aug 8, 2025

@carlosdelest @jimczi I plan to reach out to you next week about this PR.

Two main questions:

  1. Why is the QueryBuilderResolver not doing a full rewrite?
  2. Why does appending BytesRef values only ever return the first item? Does this need to be a List instead?

@kderusso kderusso force-pushed the kderusso/esql-extract-snippets branch 3 times, most recently from cb98553 to 2872969 Compare August 12, 2025 19:39
@kderusso kderusso force-pushed the kderusso/esql-extract-snippets branch from 2872969 to 5b9347c Compare August 12, 2025 19:40
@ioanatia ioanatia left a comment

When testing this, I noticed that we add some weird characters at the end of the snippet.
Example:

    [
      """Harry and Sally have known each other for years, and�����""",
      ...
    ],
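
One possible cause, not confirmed in this thread, is decoding bytes beyond the valid length of a reused buffer: any stale bytes that are not valid UTF-8 show up as U+FFFD replacement characters. The Python sketch below only demonstrates that general effect; the buffer contents and the five stale bytes are made up for illustration.

```python
snippet = "Harry and Sally have known each other for years, and".encode("utf-8")

# Assumption for illustration: a reused buffer longer than the snippet,
# with stale bytes (0xFF is never valid in UTF-8) following the real data.
buffer = bytearray(b"\xff" * (len(snippet) + 5))
buffer[:len(snippet)] = snippet
valid_length = len(snippet)

# Decoding the whole backing buffer picks up the stale bytes.
wrong = bytes(buffer).decode("utf-8", errors="replace")
# Decoding only the valid range yields the clean snippet.
right = bytes(buffer[:valid_length]).decode("utf-8", errors="replace")

print(repr(wrong))
print(repr(right))
```

If something like this is happening, checking that the offset and length of the bytes are respected when converting to a string would be a place to start.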

@kderusso kderusso changed the title [POC] Support extract_snippets function in ES|QL [WIP] Support extract_snippets function in ES|QL Aug 21, 2025
@ioanatia ioanatia left a comment

We need to add ExtractSnippetTests, like we have for the other functions. When we run these, we will also generate some docs files that need to be included in the PR.
We always include these generated files (some are for Kibana), even if the function is in snapshot.

**Example**

```{applies_to}
stack: preview 9.2.0
```
Member Author

This is a placeholder

github-actions bot commented Aug 21, 2025

🔍 Preview links for changed docs

4 participants