
Conversation

Contributor

@quux00 quux00 commented Sep 6, 2024

Enhance ES|QL responses to include information about took time (search latency), shards, and clusters against which the query was executed.

The goal of this PR is to begin to provide parity between the metadata displayed for cross-cluster searches in _search and ES|QL.

This PR adds the following features:

  • add overall took time to all ES|QL query responses; "all" here means async search, sync search, local-only, and cross-cluster searches, so it goes beyond just CCS.
  • add _clusters metadata to the final response for cross-cluster searches, for both async and sync search (see example below)
  • tracking/reporting counts of skipped shards from the can_match (SearchShards API) phase of ES|QL processing
  • marking clusters as skipped if they cannot be connected to (during the field-caps phase of processing)

Out of scope for this PR:

  • honoring the skip_unavailable cluster setting
  • showing _clusters metadata in the async response while the search is still running
  • showing any shard failure messages (since any shard search failure in ES|QL is automatically fatal and _clusters/details is not shown in 4xx/5xx error responses). Note that this also means the failed shard count is always 0 in the ES|QL _clusters section.

Things changed with respect to behavior in _search:

  • the timed_out field in _clusters/details/mycluster was removed in the ESQL response, since ESQL does not support timeouts. It could be added back later if/when ESQL supports timeouts.
  • the failures array in _clusters/details/mycluster/_shards was removed in the ESQL response, since any shard failure causes the whole query to fail.

Example output from ES|QL CCS:

```es
POST /_query
{
  "query": "from blogs,remote2:blo*,remote1:blogs|\nkeep authors.first_name,publish_date|\n limit 5"
}
```

```json
{
  "took": 49,
  "columns": [
    {
      "name": "authors.first_name",
      "type": "text"
    },
    {
      "name": "publish_date",
      "type": "date"
    }
  ],
  "values": [
    [
      "Tammy",
      "2009-11-04T04:08:07.000Z"
    ],
    [
      "Theresa",
      "2019-05-10T21:22:32.000Z"
    ],
    [
      "Jason",
      "2021-11-23T00:57:30.000Z"
    ],
    [
      "Craig",
      "2019-12-14T21:24:29.000Z"
    ],
    [
      "Alexandra",
      "2013-02-15T18:13:24.000Z"
    ]
  ],
  "_clusters": {
    "total": 3,
    "successful": 2,
    "running": 0,
    "skipped": 1,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "blogs",
        "took": 43,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote2": {
        "status": "skipped",  // remote2 was offline when this query was run
        "indices": "remote2:blo*",
        "took": 0,
        "_shards": {
          "total": 0,
          "successful": 0,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote1": {
        "status": "successful",
        "indices": "remote1:blogs",
        "took": 47,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```
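
To make the new fields concrete, here is a hedged sketch of how a client might read the new took and _clusters fields from a response like the one above. It assumes Jackson for JSON parsing, and the class and variable names are illustrative only, not part of this PR:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EsqlClustersExample {
    public static void main(String[] args) throws Exception {
        // Abbreviated response body; in practice this would be the full ES|QL JSON response.
        String body = """
            { "took": 49,
              "_clusters": { "total": 3, "successful": 2, "skipped": 1,
                "details": { "remote2": { "status": "skipped" } } } }""";
        JsonNode root = new ObjectMapper().readTree(body);
        System.out.println("overall took (ms): " + root.path("took").asLong());
        JsonNode clusters = root.path("_clusters");
        if (clusters.path("skipped").asInt() > 0) {
            // List the clusters that were skipped (e.g. unreachable remotes).
            clusters.path("details").fields().forEachRemaining(entry -> {
                if ("skipped".equals(entry.getValue().path("status").asText())) {
                    System.out.println("skipped cluster: " + entry.getKey());
                }
            });
        }
    }
}
```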

Fixes #112402 and #110935

@quux00 quux00 force-pushed the esql/ccs-execution-info2 branch 9 times, most recently from 7ca990c to c8f1841 Compare September 11, 2024 14:42
@quux00 quux00 force-pushed the esql/ccs-execution-info2 branch 5 times, most recently from 0195ac2 to 9ff909d Compare September 13, 2024 13:14
Contributor Author

quux00 commented Sep 13, 2024

Question for reviewers / Product Managers:

Do we want took time in millis or nanos?

The argument for using millis:

took time will be consistent with _search, which uses millis. I've gone with millis, since my goal was to keep the metadata as close to _search as possible unless we decide to change some parts.

This potentially makes it simpler for Kibana as well if they already have code that parses and displays took time.

The argument for using nanos:

ESQL Driver Profiles use took_nanos. So should we also use nanos for the overall search latency? Will using millis feel inconsistent with the profiles to end users?

And apparently ESQL already issues a took-nanos HTTP header, based on this issue: #110935. I tracked that down - the took-nanos for the header is added here: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlResponseListener.java.

Based on #110935, I wonder if the EsqlResponseListener implementation of tracking total took time is flawed. It doesn't work for async searches that are still running when the initial response is returned. Should we remove/change this implementation? Should it instead grab the overall took time out of the EsqlQueryResponse that I've now added? I also asked that question in the code in this commit.
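
For reference, the choice boils down to which unit the same monotonic measurement is reported in. A minimal, hypothetical sketch (not the actual EsqlResponseListener code) of the millisecond-based approach that matches _search:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual implementation: the same monotonic measurement
// can back either a millisecond "took" field (matching _search) or a nanosecond header,
// so the decision is purely about the reported unit.
final class TookTimeSketch {
    private final long startNanos = System.nanoTime(); // captured when the query is accepted

    long tookMillis() { // what the "took" field in this PR would report
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    long tookNanos() { // what a took-nanos HTTP header would carry
        return System.nanoTime() - startNanos;
    }
}
```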

Contributor Author

quux00 commented Sep 13, 2024

Question for reviewers / Product Managers:

Do we want to have the partial status in the ESQL metadata? Is that possible to have now? Will it ever be possible?

I've kept all the cluster statuses that we use on the _search side. These are the meanings of those statuses in _search:

successful: All shards searched returned successfully
running: Search is still running on that cluster
partial: At least one shard search failed or the search timed out on the server side (and thus may not have searched all index segments), so we are returning partial data.
skipped: No shards were searched because the cluster was offline and marked as skip_unavailable=true
failed: The cluster was offline and marked as skip_unavailable=false. This causes the whole search to fail (return 4xx/5xx)

Do we want to keep these same status meanings in ES|QL?

UPDATE: Based on reviewer feedback and a discussion with Product, we will keep the same meanings, but not document the partial state as a possible status in the ES|QL API docs, since it currently can never be set. We will keep it in the code in case we need that state in the future, if ES|QL ever starts returning partial data (either due to some shard searches failing or to ES|QL timeouts, as we have in _search).
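
For readers less familiar with the _search side, here is an illustrative sketch of the status set being carried over. The enum name and comments are mine, not the actual class in the codebase:

```java
// Illustrative only: the per-cluster statuses and their meanings as described above.
enum ClusterStatus {
    SUCCESSFUL, // all shards searched on that cluster returned successfully
    RUNNING,    // the search is still running on that cluster
    PARTIAL,    // some shards failed or the search timed out; partial data returned
                // (kept in the ES|QL code but undocumented, since it cannot currently be set)
    SKIPPED,    // the cluster was unreachable and marked skip_unavailable=true
    FAILED      // the cluster was unreachable and marked skip_unavailable=false (whole search fails)
}
```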

Contributor Author

quux00 commented Sep 13, 2024

Edge case question for reviewers / Product Managers:

What should the status of the cluster be in the following scenario?

The user makes a cross-cluster query where the index expression on a cluster matches no indices:

```es
POST /_query
{
  "query": "from blogs,remote2:no_such_index,remote1:blogs|\nkeep authors.first_name,publish_date|\n limit 5"
}
```

Currently, in the code in this PR, I mark this as "SUCCESSFUL" (see example below), but with total shards = 0, indicating that nothing was searched. Is that the behavior we want? Or should we mark it as "SKIPPED"?

**ES|QL response when a cluster is specified with an index expression matching no indices:**

```json
{
  "took": 25,
  "columns": [
    {
      "name": "authors.first_name",
      "type": "text"
    },
    {
      "name": "publish_date",
      "type": "date"
    }
  ],
  "values": [
    [
      "Theresa",
      "2020-09-14T23:13:55.000Z"
    ],
  ...
  ],
  "_clusters": {
    "total": 3,
    "successful": 2,
    "running": 0,
    "skipped": 1,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "blogs",
        "took": 24,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote2": {
        "status": "successful",
        "indices": "remote2:blogs",
        "took": 24,
        "_shards": {
          "total": 4,
          "successful": 4,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote1": {
        "status": "successful",
        "indices": "remote1:blogs",
        "took": 0,
        "_shards": {
          "total": 0,
          "successful": 0,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```

Comparison to behavior in _search

In _search, the handling of this differs depending on whether the user used a wildcard or not. With a wildcard that matches nothing, you get status=successful and _shards.total=0:

        "remote1": {
          "status": "successful",
          "indices": "x*",
          "took": 0,
          "timed_out": false,
          "_shards": {
            "total": 0,
            "successful": 0,
            "skipped": 0,
            "failed": 0
          }
        }

With no wildcard, you get status=skipped and a no_such_index failure entry:

        "remote1": {
          "status": "skipped",
          "indices": "my_index",
          "timed_out": false,
          "failures": [
            {
              "shard": -1,
              "index": null,
              "reason": {
                "type": "index_not_found_exception",
                "reason": "no such index [my_index]",
                "index_uuid": "_na_",
                "resource.type": "index_or_alias",
                "resource.id": "my_index",
                "index": "my_index"
              }
            }
          ]
        }

@quux00 quux00 added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Sep 13, 2024
@elasticsearchmachine
Collaborator

Hi @quux00, I've created a changelog YAML for you.

@quux00 quux00 marked this pull request as ready for review September 13, 2024 16:28
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@quux00 quux00 requested a review from a team as a code owner September 13, 2024 16:28
@quux00 quux00 requested a review from nik9000 September 13, 2024 16:28
Member

@dnhatn dnhatn left a comment


I reviewed the production changes, and they look good to me. Thanks, Michael!

for (SearchShardsGroup group : resp.getGroups()) {
    var shardId = group.shardId();
    if (group.skipped()) {
        totalShards++;
Member


nit: can you move totalShards++ outside and remove line 566?

Contributor Author


Fixed.

@Override
public void messageReceived(ClusterComputeRequest request, TransportChannel channel, Task task) {
    ChannelActionListener<ComputeResponse> listener = new ChannelActionListener<>(channel);
    ChannelActionListener<ComputeResponse> lnr = new ChannelActionListener<>(channel);
Member


I think the renaming is a leftover?

Contributor Author


Fixed. Thanks.

/**
* @return true if the "local" querying/coordinator cluster is being searched in a cross-cluster search
*/
private boolean coordinatingClusterIsSearchedInCCS() {
Member


I think we can remove the methods coordinatingClusterIsSearchedInCCS, runningOnRemoteCluster, isCCSListener, and shouldRecordTookTime, as we should be able to collect execution info similarly to how we handle profiles and warnings. These methods require careful consideration of when and how to pass the clusterAlias, which I think increases the risk of errors. That said, it's entirely up to you whether to keep them as they are or remove them.

Contributor Author


I'm not sure that's true. I find the ComputeListener very hard to reason about, so I could be wrong. The core problem with the profiles/warnings approach is that they accumulate with the intent of being shown only in the final response, but the EsqlExecutionInfo needs to be updated in real time as the search is running, so that when we add _clusters to the async-search response of still-running searches (in a later PR) it will have all the current data in the EsqlExecutionInfo, not some possibly inaccessible rollup inside the ComputeListener.

And I don't see how we can remove runningOnRemoteCluster() since we need to know in the refs Listener whether to populate the ComputeResponse with execution info metadata or not. The remote and local cluster refs listeners have to act differently.

I also don't see how we can remove isCCSListener since the acquireCompute handler needs to know whether it is a listener for a ComputeResponse with remote metadata or not, as again it needs to behave differently from a data-node handler listener. That's why I split them into separate methods in my earlier iteration of the PR, making the context or use case clear.

Finally, another reason context is needed is that the point at which the Status of a cluster gets set to SUCCESSFUL differs between remote and local clusters. For remote clusters, we mark it done when the remote ComputeResponse comes back. But for local, we have to wait until the coordinator is done (in case it has to run operators that aggregate/merge data from the remotes), so those get set in different places, and context/state is needed to know what code to execute in these methods that are shared across the 4 use cases for which ComputeListener is used.

Again, I could be wrong and there's a cleaner way to do this, but I was not able to figure it out. I'd need some detailed guidance on how we solve the above problems and do local rollup accumulations like we do for profiles and warnings.
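
To make the distinction concrete, here is a hedged sketch of the "update in real time" model described above, where per-cluster state is mutated as each cluster finishes rather than rolled up only when the final response is built. Class and method names are hypothetical, not the actual EsqlExecutionInfo code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of real-time per-cluster bookkeeping; not the actual implementation.
final class ExecutionInfoSketch {
    enum Status { RUNNING, SUCCESSFUL, SKIPPED }

    record Cluster(String alias, Status status, int totalShards, long tookMillis) {}

    private final Map<String, Cluster> clusters = new ConcurrentHashMap<>();

    void markRunning(String clusterAlias) {
        clusters.put(clusterAlias, new Cluster(clusterAlias, Status.RUNNING, 0, 0L));
    }

    // Called as soon as a remote ComputeResponse arrives (or when the coordinator finishes,
    // for the local cluster), so an in-flight async response can expose current state at any time.
    void markSuccessful(String clusterAlias, int totalShards, long tookMillis) {
        clusters.compute(clusterAlias,
            (alias, prev) -> new Cluster(alias, Status.SUCCESSFUL, totalShards, tookMillis));
    }

    Map<String, Cluster> snapshot() {
        return Map.copyOf(clusters);
    }
}
```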

Member


Let's discuss this later.

@quux00 quux00 merged commit ddba474 into elastic:main Sep 30, 2024
16 checks passed
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.x

quux00 added a commit to quux00/elasticsearch that referenced this pull request Sep 30, 2024
…es (elastic#112595)
elasticsearchmachine pushed a commit that referenced this pull request Sep 30, 2024
…es (#112595) (#113820)
stratoula added a commit to elastic/kibana that referenced this pull request Oct 4, 2024
## Summary

Now that this PR elastic/elasticsearch#112595
landed in our snapshot, we can have the query time in the inspector.

<img width="798" alt="image"
src="https://github.com/user-attachments/assets/1ba1c59e-f094-4a56-964d-d76bdc1db8b3">


<img width="1017" alt="image"
src="https://github.com/user-attachments/assets/48464d7c-60c0-4924-bfcb-85f82b7caa40">
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Oct 4, 2024
(cherry picked from commit 2621bb7)
matthewabbott pushed a commit to matthewabbott/elasticsearch that referenced this pull request Oct 4, 2024
…es (elastic#112595)
tiansivive pushed a commit to tiansivive/kibana that referenced this pull request Oct 7, 2024
quux00 added a commit that referenced this pull request Oct 18, 2024
…l exceptions (#115017)

The model for calculating per-cluster `took` times from remote clusters in #112595 was flawed. 
It attempted to use Java's System.nanoTime between the local and remote clusters,
which is not safe. This results in per-cluster took times that have arbitrary (invalid) values
including negative values which cause exceptions to be thrown by the `TimeValue` constructor.
(Note: the overall took time calculation was done correctly, so it was the remote per-cluster
took times that were flawed.)

In this PR, I've done a redesign to address this. A key decision of this re-design was whether
to always calculate took times only on the querying cluster (bypassing this whole problem) or
to continue to allow the remote clusters to calculate their own took times for the remote processing
and report that back to the querying cluster via the `ComputeResponse`.

I decided in favor of having remote clusters compute their own took times for the remote processing
and to additionally track "planning" time (encompassing field-caps and policy enrich remote calls), so
that total per-cluster took time is a combination of the two. In _search, remote cluster took times are
calculated entirely on the remote cluster, so network time is not included in the per-cluster took times.
This has been helpful in diagnosing issues on user environments because if you see an overall took time
that is significantly larger than the per cluster took times, that may indicate a network issue, which has
happened in diagnosing cross-cluster issues in _search.

I moved relative time tracking into `EsqlExecutionInfo`. 

The "planning time" marker is currently only used in cross-cluster searches, so it will conflict with
the INLINESTATS 2 phase model (where planning can be done twice). We will improve this design
to handle a 2 phase model in a later ticket, as part of the INLINESTATS work. I tested the 
current overall took time calculation model with local-only INLINESTATS queries and they
work correctly.

I also fixed another secondary bug in this PR. If the remote cluster is an older version that does
not return took time (and shard info) in the ComputeResponse, the per-cluster took time is then
calculated on the querying cluster as a fallback.

Finally, I fixed some minor inconsistencies about whether the `_shards` info is shown in the response.
The rule now is that `_shards` is always shown with 0 shards for SKIPPED clusters, with actual
counts for SUCCESSFUL clusters and for remotes running an older version that doesn't report
shard stats, the `_shards` field is left out of the XContent response.

Fixes #115022
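
A hedged illustration of the corrected measurement model described in this commit message (names are hypothetical, not the actual code): each cluster measures elapsed time against its own monotonic clock, and only durations, never raw System.nanoTime values, cross cluster boundaries:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the per-cluster took model: durations are measured locally
// on each cluster and combined on the querying cluster.
final class PerClusterTookSketch {

    // On the remote cluster: elapsed remote processing time, measured against the
    // remote's own monotonic clock and returned as a duration in the ComputeResponse.
    static long remoteProcessingMillis(long remoteStartNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - remoteStartNanos);
    }

    // On the querying cluster: planning time (field-caps and enrich policy resolution)
    // measured locally, plus the processing duration reported back by the remote.
    static long perClusterTookMillis(long planningMillis, long remoteProcessingMillis) {
        return planningMillis + remoteProcessingMillis;
    }
}
```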
quux00 added a commit to quux00/elasticsearch that referenced this pull request Oct 18, 2024
…l exceptions (elastic#115017)
elasticsearchmachine pushed a commit that referenced this pull request Oct 19, 2024
…es fatal exceptions (#115017) (#115124)

* ES|QL per-cluster took time is incorrectly calculated and causes fatal exceptions (#115017)
* Added fix for #115127 into this since can't get the build to pass
quux00 added a commit to quux00/elasticsearch that referenced this pull request Oct 19, 2024
…l exceptions (elastic#115017)
quux00 added a commit that referenced this pull request Oct 19, 2024
* ES|QL per-cluster took time is incorrectly calculated and causes fatal exceptions (#115017)

* Update execution info at end of planning before kicking off execution phase (#115127)

The revised took time model bug fix #115017
introduced a new bug that allows a race condition between updating the execution info with
"end of planning" timestamp and using that timestamp during execution.

This one line fix reverses the order to ensure the planning phase execution update occurs
before starting the ESQL query execution phase.
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
…l exceptions (elastic#115017)
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024
…l exceptions (elastic#115017)

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0 v9.0.0


Development

Successfully merging this pull request may close these issues.

Collect and display execution metadata for ES|QL cross cluster searches
