@carlosdelest
Member

@carlosdelest carlosdelest commented Jan 30, 2025

Another take on adding scoring for disjunctions of full text functions (FTFs).

This PR attempts a simpler approach than #121153.

Changes

  • A new score() method is added to ExpressionEvaluator, so an expression can be evaluated for its result via eval() and its corresponding score retrieved separately via score().
  • LuceneQueryExpressionEvaluator returns a BooleanBlock for matches and retains a DoubleBlock that is returned by the score() method.
  • BooleanLogic uses an ExpressionEvaluator that implements the score() method and sums the scores accordingly.
  • Not is implemented as well. Other mappings in EvalMapper, such as IsNotNulls and IsNulls, could be implemented in the same way.
  • FilterOperator invokes the score() method when needed, to adjust the overall score by the score() of the evaluated expression.
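
A minimal sketch of the shape of this change, for illustration only. It uses simplified, row-at-a-time stand-ins (a hypothetical Evaluator interface with match, predicate and or helpers), not the real ExpressionEvaluator, Block or BooleanLogic classes, and the 1.5 score is made up:

```java
import java.util.function.Predicate;

/** Illustrative only: simplified stand-ins, not the real ES|QL compute classes. */
final class ScoreMethodSketch {

    /** Hypothetical, row-at-a-time analogue of ExpressionEvaluator. */
    interface Evaluator {
        boolean eval(String row);

        /** Default: expressions that are not full text functions add no score. */
        default double score(String row) {
            return 0.0;
        }
    }

    /** Stand-in for a Lucene-backed match: reports a fixed score when the term occurs. */
    static Evaluator match(String term, double scoreWhenMatched) {
        return new Evaluator() {
            @Override
            public boolean eval(String row) {
                return row.contains(term);
            }

            @Override
            public double score(String row) {
                return eval(row) ? scoreWhenMatched : 0.0;
            }
        };
    }

    /** Plain predicate: inherits the default score() of 0. */
    static Evaluator predicate(Predicate<String> p) {
        return p::test;
    }

    /** OR: matches if either side matches, and sums the scores of both sides. */
    static Evaluator or(Evaluator left, Evaluator right) {
        return new Evaluator() {
            @Override
            public boolean eval(String row) {
                return left.eval(row) || right.eval(row);
            }

            @Override
            public double score(String row) {
                return left.score(row) + right.score(row);
            }
        };
    }

    public static void main(String[] args) {
        Evaluator e = or(match("Space", 1.5), predicate(s -> s.length() > 20));
        System.out.println(e.eval("Revelation Space") + " " + e.score("Revelation Space"));
    }
}
```

Only the match and or stand-ins override score(); the plain predicate keeps the default, which mirrors the idea that few evaluators need to care about scoring.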

Pros

  • It's a new, isolated method on the ExpressionEvaluator interface, with a default implementation that only needs to be overridden by LuceneQueryExpressionEvaluator and the logical operator evaluators.
  • It's simpler than changing the Block returned by LuceneQueryExpressionEvaluator to a DoubleBlock. Expression evaluation stays the same.
  • Changes to boolean logic look simpler than in the previous approach.

Cons

  • A new method on the ExpressionEvaluator interface that few classes care about. It looks unrelated to evaluating the expression and should be kept separate from it.
  • LuceneQueryExpressionEvaluator retains state: the score is calculated at the same time as the expression is evaluated, but is returned by score(). We could calculate it separately at the cost of running the query twice (without scores for eval(), with scores for score()).
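
To make the retained-state concern concrete, here is a rough sketch. The RetainedScoreSketch class is hypothetical and much simpler than the real LuceneQueryExpressionEvaluator, and the constant 1.0 score is a placeholder for a real relevance score:

```java
import java.util.Arrays;

/** Illustrative only: a simplified stand-in, not the real LuceneQueryExpressionEvaluator. */
final class RetainedScoreSketch {
    private final String term;
    // Retained state: scores computed by the most recent eval() call.
    private double[] lastScores = new double[0];

    RetainedScoreSketch(String term) {
        this.term = term;
    }

    /** Computes matches and, as a side effect, remembers a score per row. */
    boolean[] eval(String[] rows) {
        boolean[] matches = new boolean[rows.length];
        double[] scores = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {
            matches[i] = rows[i].contains(term);
            // Hypothetical constant score; a real query would supply a relevance score here.
            scores[i] = matches[i] ? 1.0 : 0.0;
        }
        lastScores = scores;
        return matches;
    }

    /** Only meaningful after eval() has run on the same rows. */
    double[] score() {
        return lastScores.clone();
    }

    public static void main(String[] args) {
        RetainedScoreSketch evaluator = new RetainedScoreSketch("Space");
        String[] rows = { "Revelation Space", "Snow Crash" };
        System.out.println(Arrays.toString(evaluator.eval(rows)));
        System.out.println(Arrays.toString(evaluator.score()));
    }
}
```

The coupling is visible in score(): it has nothing to return until eval() has been called, and its result always refers to the last set of rows evaluated.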

See the previous approach on #121153

@carlosdelest carlosdelest added the >enhancement, :Analytics/ES|QL, Team:Search Relevance, and v9.0.0 labels on Jan 30, 2025
@elasticsearchmachine
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

Member

@fang-xing-esql fang-xing-esql left a comment

Thank you @carlosdelest, this approach looks cleaner to me than the other one, so I decided to start trying some queries with it. I'm trying to understand how scores work with the logical operators in ES|QL, and these are the queries I'm starting with:

Q1

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(name, \"Space\") and length(name) > 0)" 
}
'
     author      | author.keyword  |      name      |  name.keyword  |  page_count   |      release_date      |      _score      
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585            |2000-03-15T00:00:00.000Z|1.5904955863952637

Q2

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(author, \"Neal\") and length(author) > 0)"
}
'
    author     |author.keyword |     name      | name.keyword  |  page_count   |      release_date      |      _score      
---------------+---------------+---------------+---------------+---------------+------------------------+------------------
Neal Stephenson|Neal Stephenson|Snow Crash     |Snow Crash     |470            |1992-06-01T00:00:00.000Z|1.5404449701309204

Q3

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(author, \"Alastair\") and length(author) > 0)"
}
'
     author      | author.keyword  |      name      |  name.keyword  |  page_count   |      release_date      |      _score      
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585            |2000-03-15T00:00:00.000Z|1.5404449701309204

Combine Q1 and Q2 - the scores from each leg of the OR are not preserved. I was expecting no change in the scores, but each score seems to increase by 1.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(name, \"Space\") and length(name) > 0) or (match(author, \"Neal\") and length(author) > 0)"
}
'
     author      | author.keyword  |      name      |  name.keyword  |  page_count   |      release_date      |      _score      
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Neal Stephenson  |Neal Stephenson  |Snow Crash      |Snow Crash      |470            |1992-06-01T00:00:00.000Z|2.5404449701309204
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585            |2000-03-15T00:00:00.000Z|2.5904955863952637

Combine Q1 and Q3 - OR and AND return different scores. Is that expected?

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(name, \"Space\") and length(name) > 0) or (match(author, \"Alastair\") and length(author) > 0)"
}
'
     author      | author.keyword  |      name      |  name.keyword  |  page_count   |      release_date      |     _score      
-----------------+-----------------+----------------+----------------+---------------+------------------------+-----------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585            |2000-03-15T00:00:00.000Z|4.130940556526184
* Connection #0 to host localhost left intact
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(name, \"Space\") and length(name) > 0) and (match(author, \"Alastair\") and length(author) > 0)"
}
'
     author      | author.keyword  |      name      |  name.keyword  |  page_count   |      release_date      |      _score      
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585            |2000-03-15T00:00:00.000Z|3.1309404373168945

If the length() conditions are removed, so that the matches are pushed down to Lucene, the combined scores look like this:

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(name, \"Space\")) or (match(author, \"Neal\"))"
}
'
     author      | author.keyword  |      name      |  name.keyword  |  page_count   |      release_date      |      _score      
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Neal Stephenson  |Neal Stephenson  |Snow Crash      |Snow Crash      |470            |1992-06-01T00:00:00.000Z|1.5404449701309204
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585            |2000-03-15T00:00:00.000Z|1.5904955863952637
* Connection #0 to host localhost left intact
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(name, \"Space\")) or (match(author, \"Alastair\"))"
}
'
     author      | author.keyword  |      name      |  name.keyword  |  page_count   |      release_date      |      _score      
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585            |2000-03-15T00:00:00.000Z|3.1309404373168945
* Connection #0 to host localhost left intact
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where (match(name, \"Space\")) and (match(author, \"Alastair\"))"
}
'
     author      | author.keyword  |      name      |  name.keyword  |  page_count   |      release_date      |      _score      
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585            |2000-03-15T00:00:00.000Z|3.1309404373168945

appendMatch();
}

protected void appendMatch() throws IOException {
Member

I wonder why appendMatch() throws IOException, but appendNoMatch() doesn't?

Member Author

Looks weird, right?

appendMatch() reads the score from the Scorable, which can result in an IOException. appendNoMatch() does not need to read the score.
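
A small sketch of why the two signatures differ, for illustration. The Scorable interface below is a local stand-in that mirrors the checked exception declared by Lucene's Scorable.score(); the list-based storage and the 0.0 score recorded for non-matches are assumptions made for the example:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a simplified stand-in for the collector under review. */
final class AppendMatchSketch {

    /** Local stand-in for Lucene's Scorable, whose score() declares IOException. */
    interface Scorable {
        float score() throws IOException;
    }

    private final List<Boolean> matches = new ArrayList<>();
    private final List<Double> scores = new ArrayList<>();
    private final Scorable scorable;

    AppendMatchSketch(Scorable scorable) {
        this.scorable = scorable;
    }

    /** Reads the score, so the checked exception has to be declared here. */
    void appendMatch() throws IOException {
        matches.add(true);
        scores.add((double) scorable.score());
    }

    /** Never touches the Scorable, so there is nothing to declare. */
    void appendNoMatch() {
        matches.add(false);
        scores.add(0.0); // assumption: non-matching rows get a score of 0
    }

    List<Boolean> matches() {
        return matches;
    }

    List<Double> scores() {
        return scores;
    }

    public static void main(String[] args) throws IOException {
        AppendMatchSketch sketch = new AppendMatchSketch(() -> 1.5f);
        sketch.appendMatch();
        sketch.appendNoMatch();
        System.out.println(sketch.matches() + " " + sketch.scores());
    }
}
```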

@fang-xing-esql
Member

We might need to define the expected behavior of AND, OR and NOT with scores, especially for more complicated queries. Below are some experiments with NOT.

Q1

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where not match(name, \"Space\")"
}
'
    author     |author.keyword |       name       |   name.keyword   |  page_count   |      release_date      |    _score     
---------------+---------------+------------------+------------------+---------------+------------------------+---------------
Neal Stephenson|Neal Stephenson|Snow Crash        |Snow Crash        |470            |1992-06-01T00:00:00.000Z|0.0            
George Orwell  |George Orwell  |1984              |1984              |328            |1985-06-01T00:00:00.000Z|0.0            
Ray Bradbury   |Ray Bradbury   |Fahrenheit 451    |Fahrenheit 451    |227            |1953-10-15T00:00:00.000Z|0.0            
Aldous Huxley  |Aldous Huxley  |Brave New World   |Brave New World   |268            |1932-06-01T00:00:00.000Z|0.0            
Margaret Atwood|Margaret Atwood|The Handmaids Tale|The Handmaids Tale|311            |1985-06-01T00:00:00.000Z|0.0            

Q2

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where not length(name) <= 10"
}
'
     author      | author.keyword  |       name       |   name.keyword   |  page_count   |      release_date      |    _score     
-----------------+-----------------+------------------+------------------+---------------+------------------------+---------------
Alastair Reynolds|Alastair Reynolds|Revelation Space  |Revelation Space  |585            |2000-03-15T00:00:00.000Z|0.0            
Ray Bradbury     |Ray Bradbury     |Fahrenheit 451    |Fahrenheit 451    |227            |1953-10-15T00:00:00.000Z|0.0            
Aldous Huxley    |Aldous Huxley    |Brave New World   |Brave New World   |268            |1932-06-01T00:00:00.000Z|0.0            
Margaret Atwood  |Margaret Atwood  |The Handmaids Tale|The Handmaids Tale|311            |1985-06-01T00:00:00.000Z|0.0            

Combine Q1 and Q2 with OR - there is one row with score = 1.0 that looks unexpected to me.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where not match(name, \"Space\") or not length(name) <= 10"
}
'
     author      | author.keyword  |       name       |   name.keyword   |  page_count   |      release_date      |    _score     
-----------------+-----------------+------------------+------------------+---------------+------------------------+---------------
Neal Stephenson  |Neal Stephenson  |Snow Crash        |Snow Crash        |470            |1992-06-01T00:00:00.000Z|1.0            
Alastair Reynolds|Alastair Reynolds|Revelation Space  |Revelation Space  |585            |2000-03-15T00:00:00.000Z|0.0            
George Orwell    |George Orwell    |1984              |1984              |328            |1985-06-01T00:00:00.000Z|0.0            
Ray Bradbury     |Ray Bradbury     |Fahrenheit 451    |Fahrenheit 451    |227            |1953-10-15T00:00:00.000Z|0.0            
Aldous Huxley    |Aldous Huxley    |Brave New World   |Brave New World   |268            |1932-06-01T00:00:00.000Z|0.0            
Margaret Atwood  |Margaret Atwood  |The Handmaids Tale|The Handmaids Tale|311            |1985-06-01T00:00:00.000Z|0.0            

Combine Q1 and Q2 with AND - all rows have score = -1.0.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from books metadata _score | where not match(name, \"Space\") and not length(name) <= 10"
}
'
    author     |author.keyword |       name       |   name.keyword   |  page_count   |      release_date      |    _score     
---------------+---------------+------------------+------------------+---------------+------------------------+---------------
Ray Bradbury   |Ray Bradbury   |Fahrenheit 451    |Fahrenheit 451    |227            |1953-10-15T00:00:00.000Z|-1.0           
Aldous Huxley  |Aldous Huxley  |Brave New World   |Brave New World   |268            |1932-06-01T00:00:00.000Z|-1.0           
Margaret Atwood|Margaret Atwood|The Handmaids Tale|The Handmaids Tale|311            |1985-06-01T00:00:00.000Z|-1.0           

@carlosdelest carlosdelest changed the title from "[PoC 2] ESQL - Add scoring for full text functions disjunctions using ExpressionEvaluator" to "ESQL - Add scoring for full text functions disjunctions using ExpressionEvaluator" on Feb 4, 2025
@carlosdelest
Member Author

@fang-xing-esql I'm afraid scoring was not entirely correct in the previous pushes - I was focusing on the scoring mechanism and not on its actual implementation for every use case. I'd say it is more stable now, and I've been able to simplify some code paths.

I plan to test this more extensively. In the meantime, feel free to play around with scores.

There is a big caveat with scores in disjunctions: they are not consistent when a query mixes clauses that can be pushed down with clauses that cannot. Currently, clauses pushed down to Lucene contribute to the score (adding +1 to the score for each condition that matches). Clauses not pushed down do not contribute to the score.

So, the following query:

where (match(title, \"Lord\") and length(title) > 0) 

will have the same scores as

where match(title, \"Lord\")

but the following one will increase the scores by 1.0 for results that satisfy the year > 2000 clause:

where match(title, \"Lord\") and year > 2000

I think making scoring consistent between clauses that are and are not pushed down to Lucene should be handled as a separate issue (I believe this has already been identified).
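
As a rough illustration of the inconsistency described above, here is a toy Java model. The helper names and the 2.0 relevance score are made up for the example; this is not the actual scoring code:

```java
/** Toy model of the behaviour described above; not the actual ES|QL implementation. */
final class PushdownScoreSketch {

    /** A full text function clause: contributes its relevance score when it matches. */
    static double fullTextClause(boolean matches, double relevanceScore) {
        return matches ? relevanceScore : 0.0;
    }

    /** A non-scoring clause pushed down to Lucene: adds 1.0 when it matches. */
    static double pushedDownClause(boolean matches) {
        return matches ? 1.0 : 0.0;
    }

    /** A clause evaluated outside Lucene: never contributes to the score. */
    static double notPushedDownClause(boolean matches) {
        return 0.0;
    }

    public static void main(String[] args) {
        double lord = 2.0; // hypothetical relevance score for match(title, "Lord")

        // where match(title, "Lord")
        double matchOnly = fullTextClause(true, lord);
        // where match(title, "Lord") and length(title) > 0   (length is not pushed down)
        double withLength = fullTextClause(true, lord) + notPushedDownClause(true);
        // where match(title, "Lord") and year > 2000         (year filter is pushed down)
        double withYear = fullTextClause(true, lord) + pushedDownClause(true);

        System.out.println(matchOnly + " " + withLength + " " + withYear);
    }
}
```

In this model the first two queries score the same, while the third is higher by exactly 1.0 for rows that satisfy the pushed-down filter.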

@carlosdelest
Member Author

Closing in favour of #121793

Labels

:Analytics/ES|QL, >enhancement, Team:Search Relevance, v9.0.0, v9.1.0

3 participants