-
Couldn't load subscription status.
- Fork 116
Add extended search usage to spec #5389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Following you can find the validation changes against the target branch for the APIs.
You can validate these APIs yourself by using the |
| sections: Dictionary<Name, long> | ||
| retrievers: Dictionary<Name, long> | ||
| /* @availability stack since=9.2.0 */ | ||
| extended: Dictionary<Name, ExtendedSearchUsage> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be {} - is there a better way to represent this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{} is handled fine by Dictionary.
The reason validation still fails for the non-empty case is because when extended is indeed present, retrievers is the dictionary key, and then ExtendedSearchUsage validation fails because it looks for retrievers, but only sees text_similarity_reranker. It's possible to fix validation like this:
| extended: Dictionary<Name, ExtendedSearchUsage> | |
| extended: ExtendedSearchUsage |
It also passes validation, even with {} since retrievers is optional.
But is retrievers the only possible key here?
For reference, this is the only YAML test we currently have with non-empty extended:
// Test file: /test/platinum/voting_only_node/10_basic.yml
// Test name: cluster stats with voting only node stats
{
"cluster_name": "elasticsearch-x86-64-rest-test",
"cluster_uuid": "wqQ9NpkhQVGMMqtWjZBWoQ",
"indices": {
"search": {
"extended": {
"retrievers": {
"text_similarity_reranker": {
"chunk_rescorer": 8
}
}
},
"queries": { ... },
"rescorers": {
"query": 8,
"script": 8
},
"retrievers": {
"knn": 3,
"random_reranker": 2,
"rescorer": 5,
"rrf": 3,
"rule": 7,
"standard": 40,
"text_similarity_reranker": 16
},
"sections": { ... },
"total": 2423
},
},
"status": "green",
"timestamp": 1759316467252
})
And here's an empty case:
// Test file: /test/free/cluster.stats/10_basic.yml
// Test name: snapshot stats reported in get cluster stats
{
"cluster_name": "elasticsearch-x86-64-rest-test",
"cluster_uuid": "wqQ9NpkhQVGMMqtWjZBWoQ",
"indices": {
"search": {
"extended": {},
"queries": { ... },
"rescorers": {},
"retrievers": {},
"sections": { ... },
"total": 608
},
},
"status": "green",
"timestamp": 1759313436888
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those results look expected to me.
Unfortunately the retrievers.text_similarity_reranker.chunk_rescorer is the only field that we are sending in for 9.2. However, we anticipate that going forward, we'll definitely have other retrievers supported, with their own key:value pairs, and probably other calls for example for search or rescorers. Furthermore, when I implemented this @Mikep86 specifically asked for more flexibility so that more nesting could be returned in the future should we decide it was necessary. Therefore, we're kind of stuck with maps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maps are fine, but if the level of nesting is going to be variable, then we can't use retrievers?: Dictionary<Name, Dictionary<Name, long>>. We'd need something like retrievers?: Dictionary<Name, UserDefinedValue> (which is difficult to work with in typed clients like Java, Go, and .NET).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking of keeping it as a long until we need to change it - we don't know if/when/what might change in the future so long seemed easier. What's preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we are between a rock and a hard place. Either a possible breaking change down the road or hard to use data from typed clients.
@l-trotta @Anaethelion What do you think? This is the cluster stats API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if each key is going to have different levels of nesting, could it be better to just have fields and objects instead of Dictionary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it helps here, the level of nesting for any individual retriever (such as text_similarity_reranker) should be constant. We wanted to create flexibility so that other retrievers could nest deeper if necessary.
Can we not specify this schema by the specific retriever type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be ideal! Something like:
export class SearchUsageStats {
...
extended: ExtendedSearchUsage
}
export class ExtendedSearchUsage {
retrievers?: ExtendedRetrieversSearchUsage
}
export class ExtendedRetrieversSearchUsage {
text_similarity_reranker: Dictionary<Name, long> // or even one subclass with `chunk_rescorer: long`
}This allows you to add more fields in ExtendedSearchUsage as well as more retrievers than text_similarity_reranker with different nesting. We just need to be careful about the optionality of each field since changing that is a breaking change in Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK then let's be as granular as possible to avoid breaking changes. Updated in 39e38a1
Adds the extended search usage implemented in elastic/elasticsearch#135306 to the specification