- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
Fix behavior for _index LIKE for ESQL #130019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a726c97    to
    7dce974      
    Compare
  
    | @nik9000 This is the draft PR | 
        
          
                ...r/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClustersIT.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      This reverts commit 7484410.
| I have addressed the 3 NOCOMMITS. Configuration.java - Tried to move the setting from the Configuration to  LucenePushDownPredicate FilterTests.java | 
| Hi @julian-elastic, I've updated the changelog YAML for you. | 
        
          
                server/src/main/java/org/elasticsearch/index/query/WildcardQueryBuilder.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                server/src/main/java/org/elasticsearch/index/query/WildcardQueryBuilder.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                ...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/capabilities/TranslationAware.java
          
            Show resolved
            Hide resolved
        
      | var wildcardQuery = QueryParser.escape(BytesRefs.toString(prefix.fold(FoldContext.small()))) + "*"; | ||
|  | ||
| return new WildcardQuery(source(), fieldName, wildcardQuery); | ||
| return new WildcardQuery(source(), fieldName, wildcardQuery, false, false); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be false, true? So we force a string match in case folks are running START_WITH against _index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for ENDS_WITH I suppose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change you requested to false, true. Then I tried adding UTs to cover the ENDS_WITH/BEGINS_WITH on index. It seems like ENDS_WITH/BEGINS_WITH is broken on an index, similar to LIKE before the change, as they all used WildcardQuery. It is fixed in some cases with my change, broken with others. If we want comprehensive fix, I can work on it, but it will delay the LIKE PR and it is getting pretty big already.
Do we want to stick with the current broken behavior and handle ENDS_WITH/BEGINS_WITH in a separate PR? I want to get this one checked in first if possible.
To make matters worse QueryParser.escape() in the line right above is the wrong escape sequence for our PR. If you call it on "remote-index" it will escape the dash and change the wildcardQuery to "remote\-index" which is wrong I think and will produce no data. QueryParser.escape() does regular expression escaping, not wildcard escaping. It seems the correct escaping is in WildcardFieldMapper.escapeWildcardSyntax(). But changing it here, might break more cases/cause different data if we CCS to an older server that uses the old WildcardQuery workflow with the new escaping. So it is not trivial to change the false to true. I think keeping it as false should preserve the old wrong behavior, so no regressions and buys us time to investigate properly and handle all the cases. I think we need to discuss...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about this and will fork it into an open issue.
        
          
                ...plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/ExpressionQueryBuilder.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | 🔍 Preview links for changed docs | 
537d56c    to
    8540550      
    Compare
  
    | var wildcardQuery = QueryParser.escape(BytesRefs.toString(prefix.fold(FoldContext.small()))) + "*"; | ||
|  | ||
| return new WildcardQuery(source(), fieldName, wildcardQuery); | ||
| return new WildcardQuery(source(), fieldName, wildcardQuery, false, false); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about this and will fork it into an open issue.
| Let's get this in. Once it's landed I can explain the backport process. It's even more  | 
12107af    to
    8540550      
    Compare
  
    | Merged with #130849 | 
Fixes
_index LIKE <pattern>to always have normal text matching semantics.Implement a generic ExpressionQuery and ExpressionQueryBuilder that can be serialized to the data node. Then the ExpressionQueryBuilder can build an Automaton using TranslationAware.asLuceneQuery() and execute it in Lucine.
Introduces a breaking change for LIKE on _index fields. The old like behavior is not correct and does not have normal like semantics from ESQL. Customers upgrading from old build to new build might see a regression, where the data changes due to the like filters on clustering produces different results, but the new results are correct.
Behavior for ESQL
New CCS to New => New behavior everywhere
Old CCS to New => Old behavior everywhere (the isForESQL flag is not passed in from old)
New CCS to Old => New behavior for new, old behavior for old (the isForESQL cannot be passed, old does not know about it).
Old CCS to Old => Old behavior everywhere
Closes #129511