- 
                Notifications
    
You must be signed in to change notification settings  - Fork 25.6k
 
Improve filter handling for ESQL CCS #126807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ef16bf4    to
    9f05ded      
    Compare
  
    aa80117    to
    7313daf      
    Compare
  
    26a7147    to
    7fea91d      
    Compare
  
    acb85a4    to
    a51bf2a      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass review. I didn't review the tests yet and I need to do a deeper round on the core logic, but these are my initial thoughts.
        
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlExecutionInfo.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlCCSUtils.java
          
            Show resolved
            Hide resolved
        
              
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlCCSUtils.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlCCSUtils.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlCCSUtils.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlCCSUtils.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlCCSUtils.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
          
            Show resolved
            Hide resolved
        
      … missing indices So we have to drop FILTERED status for now
| return originalTypes; | ||
| } | ||
| 
               | 
          ||
| public String toString() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required for the fix strictly speaking, but it improves observability of columns and makes it much easier to inspect them when debugging.
| 
           Pinging @elastic/es-analytical-engine (Team:Analytics)  | 
    
| 
           Pinging @elastic/es-search-foundations (Team:Search Foundations)  | 
    
| 
               | 
          ||
| private String getInexistentIndexErrorMessage() { | ||
| return "\"reason\" : \"Found 1 problem\\nline 1:1: Unknown index "; | ||
| return "Unknown index "; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any valid reason for this change?
Fyi, the line:column information there is relevant for the accuracy of the parsing error messaging system. Those two numbers are linked to the Node itself (which comes from the ANTLR parser) and is used by the Verifier to build a consistent error message, no matter the error message itself and which Node it comes from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is too strict as it doesn't allow changing any details of how the errors are reported, and that led to some failures, but this may be older versions of the patch and I'll re-check why exactly it is needed. In general, I think our tests relying on specific error messages makes some things hard to refactor or fix (i.e. we have some parts of code produce "unknown index" and some "no such index" and we have a lot of tests that rely on that, and that means we essentially are testing implementation details instead of functionality, down to exact wording of the error message). In this particular case, beyond checking that there's an unknown index, it also checks that this is the only problem and that it is reported in a very specific way - which means if we changed how exactly unknown indices are handled in a particular case (e.g., say, moved the check from runtime to planning time, as we may have to do), this test would likely break. It may not be necessary now for this patch - going to check that - but I think this is something we may want to consider.
| assertThat(clusterMetatata.getFailedShards(), equalTo(0)); | ||
| } | ||
| 
               | 
          ||
| protected void assertClusterMetadataNoShards(EsqlExecutionInfo.Cluster clusterMetatata, int shards, long took, String indexExpression) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can be simplified by calling the one above assertClusterMetadataSuccess and passing 0 as the shards parameter. Also, this method doesn't even use the int shards parameter and can be completely removed from the method signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can clean up those. I left them somewhat verbose to make the debugging easier but now I can condense them. I do want to keep the different methods because that makes it easier to see which case is happening from looking at the test code, but I certainly can make it less copy-pasty.
        
          
                ...rnalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterQueryWithFiltersIT.java
          
            Show resolved
            Hide resolved
        
      | } | ||
| 
               | 
          ||
| protected EsqlQueryResponse runQuery(String query, Boolean ccsMetadataInResponse, QueryBuilder filter) { | ||
| EsqlQueryRequest request = EsqlQueryRequest.syncEsqlQueryRequest(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this, also, be execute with async search as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I didn't even look into async side yet, had my hands full with the sync case. I'll look into that now.
| try (EsqlQueryResponse resp = runQuery("from logs-*,c*:logs-*", randomBoolean(), filter)) { | ||
| List<List<Object>> values = getValuesList(resp); | ||
| assertThat(values, hasSize(docsTest1)); | ||
| // FIXME: this is currently inconsistent with the non-wildcard case, since empty wildcard is not an error, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand the issue here and the expected behavior.
I will describe a test I performed in non-CCS scenario where, from my point of view, things look ok. Data and index names are the one we use in CSV unit and IT tests.
from airports*,emp* metadata _index | keep *name*, _index
This returns columns as  first_name   |   last_name   |                    name                    |        _index        and data from indices
airports_mp                        
employees_incompatible             
employees                          
airports_web                       
airports_not_indexed_nor_doc_values
airports_not_indexed               
airports_no_doc_values             
airports
- I am adding a filter to the request to filter out all the 
airports*indices: 
    "query":"from airports*,emp* metadata _index | keep *name*, _index",
    "filter": {
        "bool": {
            "filter": [
                {
                    "exists": {
                        "field": "emp_no"
                    }
                }
            ]
        }
    }
and I get back columns as  first_name   |   last_name   |        _index        and data from indices
employees_incompatible
employees 
If I understand the FIXME right, the issue is that the remote columns are not added to the response, and this is consistent with the existent non-CCS behavior. And those columns shouldn't be added to the response, since the filter eliminates completely those indices (and, thus, their columns).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple, reproduceable scenario is appreciated.
Would be worth having this scenario reproduced (and compared with) in both CCS and non-CCS scenarios.
Thank you 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that the columns are sometimes added and sometimes aren't, depending on obscure factors like presence of wildcards (even though the underlying index is the same), other indices, structure of the query, etc. Basically, if anything is not resolved on the first lookup, the second lookup will add the fields - but whether or not second lookup happens may be completely unrelated to the index in question (e.g. second lookup can happen because of something related to another index), and I think this would look completely random to the user. IMO there should be a consistent system of when the fields are there, and it should not depend on the implementation details of two lookups that it is now.
I think it doesn't  really depend on CCS, just easier to set up and reproduce in CCS scenarios. This patch is not intending to fix it, just make the more serious failures (like filtered queries resulting in unknown index errors) go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smalyshev I think it is important to see those very specific scenarios of inconsistency, since this is a new functionality that haven't been tried by our users much.
To give a bit more context, the way field names are resolved using field_caps has been a functionality that didn't change for at least 5 years in various projects (but using more or less the same code).
ES|QL is the first one where this field_caps filtering is happening and the possible inconsistency of the columns in the response is exactly a likely side-effect/bug/known issue of this change. On one side, ES|QL gains (in most cases) performance improvement by having the filter applied to field_caps, but there may be unknown inconsistencies we might not have foreseen, thus my previous asks about a specific use case without CCS where this might be an inconsistency. If we know of the specific use cases we could either improve the code or update our documentation with this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into whether I can reproduce the inconsistency without CCS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor requests/questions. LGTM
| && executionInfo.getClusterStates(EsqlExecutionInfo.Cluster.Status.RUNNING).findAny().isEmpty()) { | ||
| // for a CCS, if all clusters have been marked as SKIPPED, nothing to search so send a sentinel Exception | ||
| // to let the LogicalPlanActionListener decide how to proceed | ||
| LOGGER.debug("No more clusters to search, ending analysis stage"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this logging useful? If we see that in the logs, how does it help? Is it useful for debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is as it shows what is happening in the debug log - and yes, this is mainly for debugging.
        
          
                x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @smalyshev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
          💔 Backport failed
 You can use sqren/backport to manually backport by running   | 
    
          💚 All backports created successfully
 Questions ?Please refer to the Backport tool documentation  | 
    
* Test for CCS with filters * Partial fix for CCS/filters problems (cherry picked from commit 12451b6) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/ColumnInfoImpl.java
This is a partial fix for how ESQL works with CCS and filters. This contains:
Missing index handling for index expressions including existing/non-existing mix, and some LIMIT 0 scenarios, are still broken, not intending to fix it in this patch.
Implements tests for #118054