Skip to content

Conversation

@kderusso
Copy link
Member

@kderusso kderusso commented Oct 1, 2025

Adds the extended search usage implemented in elastic/elasticsearch#135306 to the specification

@github-actions
Copy link
Contributor

github-actions bot commented Oct 1, 2025

Following you can find the validation changes against the target branch for the APIs.

API Status Request Response
cluster.stats 🔴 27/27 0/27 → 26/27
indices.create 🔴 1339/1372 → 1342/1372 1372/1372
indices.put_mapping 🔴 138/145 → 139/145 148/148
indices.segments 🔴 → 🟢 6/6 → 5/5 5/6 → 5/5
indices.shard_stores 🔴 → 🟢 6/6 → 5/5 5/6 → 5/5
search 🔴 2533/2602 → 2540/2602 2602/2602
security.get_stats 🟠 → 🟢 Missing type → 2/2 Missing type → 2/2

You can validate these APIs yourself by using the make validate target.

@kderusso kderusso mentioned this pull request Oct 1, 2025
9 tasks
@kderusso kderusso added the skip-backport This pull request should not be backported label Oct 1, 2025
sections: Dictionary<Name, long>
retrievers: Dictionary<Name, long>
/* @availability stack since=9.2.0 */
extended: Dictionary<Name, ExtendedSearchUsage>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be {} - is there a better way to represent this?

Copy link
Member

@pquentin pquentin Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{} is handled fine by Dictionary.

The reason validation still fails for the non-empty case is because when extended is indeed present, retrievers is the dictionary key, and then ExtendedSearchUsage validation fails because it looks for retrievers, but only sees text_similarity_reranker. It's possible to fix validation like this:

Suggested change
extended: Dictionary<Name, ExtendedSearchUsage>
extended: ExtendedSearchUsage

It also passes validation, even with {} since retrievers is optional.

But is retrievers the only possible key here?

For reference, this is the only YAML test we currently have with non-empty extended:

// Test file: /test/platinum/voting_only_node/10_basic.yml
// Test name: cluster stats with voting only node stats
{
  "cluster_name": "elasticsearch-x86-64-rest-test",
  "cluster_uuid": "wqQ9NpkhQVGMMqtWjZBWoQ",
  "indices": {
    "search": {
      "extended": {
        "retrievers": {
          "text_similarity_reranker": {
            "chunk_rescorer": 8
          }
        }
      },
      "queries": { ... },
      "rescorers": {
        "query": 8,
        "script": 8
      },
      "retrievers": {
        "knn": 3,
        "random_reranker": 2,
        "rescorer": 5,
        "rrf": 3,
        "rule": 7,
        "standard": 40,
        "text_similarity_reranker": 16
      },
      "sections": { ... },
      "total": 2423
    },
  },
  "status": "green",
  "timestamp": 1759316467252
})

And here's an empty case:

// Test file: /test/free/cluster.stats/10_basic.yml
// Test name: snapshot stats reported in get cluster stats
{
  "cluster_name": "elasticsearch-x86-64-rest-test",
  "cluster_uuid": "wqQ9NpkhQVGMMqtWjZBWoQ",
  "indices": {
    "search": {
      "extended": {},
      "queries": { ... },
      "rescorers": {},
      "retrievers": {},
      "sections": { ... },
      "total": 608
    },
  },
  "status": "green",
  "timestamp": 1759313436888
})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those results look expected to me.

Unfortunately the retrievers.text_similarity_reranker.chunk_rescorer is the only field that we are sending in for 9.2. However, we anticipate that going forward, we'll definitely have other retrievers supported, with their own key:value pairs, and probably other calls for example for search or rescorers. Furthermore, when I implemented this @Mikep86 specifically asked for more flexibility so that more nesting could be returned in the future should we decide it was necessary. Therefore, we're kind of stuck with maps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maps are fine, but if the level of nesting is going to be variable, then we can't use retrievers?: Dictionary<Name, Dictionary<Name, long>>. We'd need something like retrievers?: Dictionary<Name, UserDefinedValue> (which is difficult to work with in typed clients like Java, Go, and .NET).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking of keeping it as a long until we need to change it - we don't know if/when/what might change in the future so long seemed easier. What's preferable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we are between a rock and a hard place. Either a possible breaking change down the road or hard to use data from typed clients.

@l-trotta @Anaethelion What do you think? This is the cluster stats API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if each key is going to have different levels of nesting, could it be better to just have fields and objects instead of Dictionary

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps here, the level of nesting for any individual retriever (such as text_similarity_reranker) should be constant. We wanted to create flexibility so that other retrievers could nest deeper if necessary.

Can we not specify this schema by the specific retriever type?

Copy link
Member

@pquentin pquentin Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be ideal! Something like:

export class SearchUsageStats {
    ...
    extended: ExtendedSearchUsage
}

export class ExtendedSearchUsage {
  retrievers?: ExtendedRetrieversSearchUsage
}

export class ExtendedRetrieversSearchUsage {
  text_similarity_reranker: Dictionary<Name, long>  // or even one subclass with `chunk_rescorer: long`
}

This allows you to add more fields in ExtendedSearchUsage as well as more retrievers than text_similarity_reranker with different nesting. We just need to be careful about the optionality of each field since changing that is a breaking change in Go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK then let's be as granular as possible to avoid breaking changes. Updated in 39e38a1

@pquentin pquentin merged commit c69bbef into main Oct 6, 2025
8 checks passed
@pquentin pquentin deleted the kderusso/extended-search-usage-spec branch October 6, 2025 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-backport This pull request should not be backported specification

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants