-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL - Add scoring for full text functions disjunctions using ExpressionEvaluator #121322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL - Add scoring for full text functions disjunctions using ExpressionEvaluator #121322
Conversation
|
Hi @carlosdelest, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest, this approach looks cleaner to me compared to the other one, so I decided to start trying some queries with this one. I'm trying to understand how the scores work with the logical operator in ES|QL, and this is the query that I'm starting with:
Q1
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(name, \"Space\") and length(name) > 0)"
}
'
author | author.keyword | name | name.keyword | page_count | release_date | _score
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585 |2000-03-15T00:00:00.000Z|1.5904955863952637
Q2
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(author, \"Neal\") and length(author) > 0)"
}
'
author |author.keyword | name | name.keyword | page_count | release_date | _score
---------------+---------------+---------------+---------------+---------------+------------------------+------------------
Neal Stephenson|Neal Stephenson|Snow Crash |Snow Crash |470 |1992-06-01T00:00:00.000Z|1.5404449701309204
Q3
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(author, \"Alastair\") and length(author) > 0)"
}
'
author | author.keyword | name | name.keyword | page_count | release_date | _score
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585 |2000-03-15T00:00:00.000Z|1.5404449701309204
Combine Q1 and Q2 - the scores don't persist on each leg of the OR, I was expecting no change in the scores, seems like each score increases by 1.
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(name, \"Space\") and length(name) > 0) or (match(author, \"Neal\") and length(author) > 0)"
}
'
author | author.keyword | name | name.keyword | page_count | release_date | _score
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Neal Stephenson |Neal Stephenson |Snow Crash |Snow Crash |470 |1992-06-01T00:00:00.000Z|2.5404449701309204
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585 |2000-03-15T00:00:00.000Z|2.5904955863952637
Combine Q1 and Q3 - OR and AND return different scores, are they as expected?
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(name, \"Space\") and length(name) > 0) or (match(author, \"Alastair\") and length(author) > 0)"
}
'
author | author.keyword | name | name.keyword | page_count | release_date | _score
-----------------+-----------------+----------------+----------------+---------------+------------------------+-----------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585 |2000-03-15T00:00:00.000Z|4.130940556526184
* Connection #0 to host localhost left intact
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(name, \"Space\") and length(name) > 0) and (match(author, \"Alastair\") and length(author) > 0)"
}
'
author | author.keyword | name | name.keyword | page_count | release_date | _score
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585 |2000-03-15T00:00:00.000Z|3.1309404373168945
If the lengths are removed, and the matches are pushed down to Lucene, the combined scores are like below
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(name, \"Space\")) or (match(author, \"Neal\"))"
}
'
author | author.keyword | name | name.keyword | page_count | release_date | _score
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Neal Stephenson |Neal Stephenson |Snow Crash |Snow Crash |470 |1992-06-01T00:00:00.000Z|1.5404449701309204
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585 |2000-03-15T00:00:00.000Z|1.5904955863952637
* Connection #0 to host localhost left intact
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(name, \"Space\")) or (match(author, \"Alastair\"))"
}
'
author | author.keyword | name | name.keyword | page_count | release_date | _score
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585 |2000-03-15T00:00:00.000Z|3.1309404373168945
* Connection #0 to host localhost left intact
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from books metadata _score | where (match(name, \"Space\")) and (match(author, \"Alastair\"))"
}
'
author | author.keyword | name | name.keyword | page_count | release_date | _score
-----------------+-----------------+----------------+----------------+---------------+------------------------+------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space|Revelation Space|585 |2000-03-15T00:00:00.000Z|3.1309404373168945
| appendMatch(); | ||
| } | ||
|
|
||
| protected void appendMatch() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why appendMatch() throws IOException, but appendNoMatch() doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks weird, right?
appendMatch() reads the score from the Scorable, which can result in an IOException. appendNoMatch() does not need to read the scoring.
|
We might need to define the expected behavior of
Combine Combine |
…valuator-full-text-functions-disjunctions' into enhancement/esql-scoring-evaluator-full-text-functions-disjunctions
…ression is not true
This reverts commit 4f291ad.
|
@fang-xing-esql I'm afraid scoring was not totally fine on the previous pushes - I was focusing on the scoring mechanism and not in its actual implementation on every use case. I'd say it is more stable now, I've been able to simplify some code paths. I plan to test this more extensively, in the meantime feel free to play around with scores now. There is a big caveat with scores in disjunctions - they are not consistent when there are clauses that can and cannot be pushed down. Currently, clauses pushed down to Lucene contribute to the score (adding +1 to the score for each condition that matches). Clauses not pushed down do not contribute to the score. So, the following query: will have the same scores as but the following one will increase the scores in 1.0 for results that satisfy the I think the consistent scoring for clauses pushed or not pushed down to Lucene should be handled as a separate issue (I believe this has been identified already). |
|
Closing in favour of #121793 |
Another take for adding scoring in FTFs disjunctions.
This approach attempts a simpler approach than #121153.
Changes
score()method is added toExpressionEvaluator, so expressions can be evaluated for their result viaeval()and also get the corresponding score as a separatescore()method.LuceneQueryExpressionEvaluatorreturns aBooleanBlockfor matches, and retains aDoubleBlockthat will be returned in thescore()methodBooleanLogicuses anExpressionEvaluatorthat implements thescore()method, and sums the scores accordinglyNotis implemented as well - there are other mappings inEvalMapperlikeIsNotNulls,IsNullsthat could be implemented in the same wayFilterOperatorinvokes thescore()method if needed to change the overall score given thescore()of the evaluated expressionPros
ExpressionEvaluatorinterface, with a default implementation that only needs to be overriden byLuceneQueryExpressionEvaluatorand logical operators evaluators.LuceneQueryExpressionEvaluatorreturnedBlocktoDoubleBlock. Expression evaluation is the same.Cons
ExpressionEvaluatorinterface, that few classes care about. Looks unrelated to evaluating the expression and should be kept separate from it.LuceneQueryExpressionEvaluatorretains state - the score is calculated at the same time as the expression evaluation, but is returned as part of score(). We could calculate it separately at the cost of doing the query again (without scores for eval(), with scores for score()).See the previous approach on #121153