-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Field caps transport changes to return for each original expression what it was resolved to #136632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| listener.onResponse(FieldCapabilitiesResponse.builder().withFailures(failures).build()); | ||
| } else { | ||
| // throw back the first exception | ||
| listener.onFailure(failures.get(0).getException()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is a bit tricky for CCS/CPS index resolution. As commented it literally re-throws back the first exception as is it was reported by remote.
For example for FieldCapsRequest(indices=[r1:index1,r2:index1,r3:index1]) (note index names are the same, only remotes are different), it could throw something like: org.elasticsearch.transport.RemoteTransportException with a cause org.elasticsearch.index.IndexNotFoundException: no such index [index1]. Unfortunately this lacks the information about remote originally caused the issue even though we know it from failures.get(0).getIndices() (although it might be caused my multiple indices).
This information is very important for ES|QL (and for other usecases) in order to properly report index resolution issues.
I believe there are 2 ways to resolve it:
- introduce a flag that causes FC to always resolve listener nonExceptionally with list of result and failures (basically first branch here). This way each caller might deduce the root cause of the issue on top of all exceptions and construct corresponding error message.
- introduce a flag that causes FC to wrap failures into new exception (maybe IndexResolutionException) retaining the index information and summarizing the errors. This way error reporting/formatting logic could be the same for all callers.
I believe this should be done separately. Let me open a draft/proposal with such change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@idegtiarenko created a pr to address this: #136680
server/src/internalClusterTest/java/org/elasticsearch/search/fieldcaps/FieldCapabilitiesIT.java
Outdated
Show resolved
Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/search/fieldcaps/FieldCapabilitiesIT.java
Outdated
Show resolved
Hide resolved
|
|
||
| private final String[] indices; | ||
| private final ResolvedIndexExpressions resolvedLocally; | ||
| private final transient Map<String, ResolvedIndexExpressions> resolvedRemotely; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why resolvedLocally are serialized but resolvedRemotely not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the "coordinating" node can populate resolvedRemotely, because it knows from which remote the response was from.
Also, since chaining is disabled each FieldCapsResponse can only have the locally resolved index expressions as of now.
So. If the coordinating node receives the following index expression (in ccs notation) local-index,r*:remote-index
The coordinating node will populate resolvedLocally and forward the "r*:remote-index" to all matching remotes/projects.
The remotes/projects will then "resolveLocally" and respond back to the coordinating node. So for the remotes/projects we will never have the resolvedRemotely section populated.
When the coordinator receives back the responses from the remotes will then populate its local resolvedRemotely based on the responses it gets back.
resolvedRemotely then can be used either by ES|QL code to get infos on which original index expression what resolved to what and where and/or by CPS error handling code to possibly throw errors.
Sounds reasonable?
LMK if you have any other further questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify - is the coordinator node always the same node that sent the original FieldCapabilitiesRequest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the coordinator node always the same node that sent the original FieldCapabilitiesRequest?
Yes, the coordinator node is the node receiving the rest FC request, that node will then forward the requests to other data nodes and/or to other clusters as needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this answer is correct, but it's worth triple checking: there are situation in CCS where requests are proxyed, hence read from a node and written directly to another one.
Yes, the coordinator node is the node receiving the rest FC request
Agreed, that's correct. There are also cases where the call is not originated from REST, and the transport field caps action is called directly via a client. With that said, the node that coordinates the field caps execution is one and only one, across multiple clusters/projects, correct? (This is slightly different in search with minimize roundtrips for instance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way: I find the distinction between resolvedLocally and resolvedRemotely hard to follow. Maybe it's me. Isn't the difference between the two the key in the data structure - original expression vs remote alias - ? Could you add a full example of a real life situation and how the two would be populated and how the info is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the node that coordinates the field caps execution is one and only one, across multiple clusters/projects, correct?
Yes, that is correct, no matter if an internal user (ES|QL) calls the transport action directly or if the request goes through the REST layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the distinction between resolvedLocally and resolvedRemotely makes the most sense in CPS. To further explain it immagine this: we have three projects, origin with logs-es, p1 with metrics-es and p2 empty.
The client asks for two flat expressions logs* and metrics*.
The resolvedLocally will have the following structure (populated before contacting the linked projects):
for logs*
original: logs*
localExpressions: [{logs-es, SUCESS}]
remoteExpressions: [p1:logs*, p2:logs*]
metrics*
original: metrics*
localExpressions: [{NOT_VISIBLE}]
remoteExpressions: [p1:metrics*, p2:metrics*]
Then we send the request to both p1 and p2 and we get back:
from p1:
original: logs*
localExpressions: [{NOT_VISIBLE}]
original: metrics*
localExpressions: [{metrics-es, SUCESS}]
and from p2:
original: logs*
localExpressions: [{NOT_VISIBLE}]
original: metrics*
localExpressions: [{NOT_VISIBLE}]
Now we need to check if we have to throw errors based on the flat world resolution. For "logs*" we found it locally, so no error need to be thrown (and at this point in time we don't really care to check if it was found in linked projects).
For "metrics*" we check locally, we see "NOT_VISIBLE", therefore we loop on all the "remoteExpressions" inside the resolvedLocally and, split project alias, check that project alias inside resolvedRemotely and check if the index expression is present there with "SUCCESS", this is only the case for p1 and not p2.
Hope this make sense, I know it's a bit convoluted.
That said can it be done with only one map? Yes, but would be more confusing IMO. Plus by having the distinction we also "forcibly" disable project chaining, since we never send through the wire resolvedRemotely but only resolvedLocally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a little bit of the above as a code comment. I know this is just the response class, but at least a comment as to why resolvedRemotely is not serialized and what the distinction be resolvedRemotely and resolvedLocally is - that will help future maintainers with debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilitiesResponse.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilitiesResponse.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/ResolvedIndexExpressions.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Outdated
Show resolved
Hide resolved
|
I believe this does not properly support exclusions in the new structure yet. I believe the same passes on 9.1 or on main. |
| } | ||
| } | ||
|
|
||
| public void testResolvedToMatchingEverywhere() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be difficult (impossible?) in the IT test framework, but is there a way to test that cross-cluster chaining does not occur here? (We'd need to link a remoteB to our remote-cluster, but not the origin cluster).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could maybe connect origin -> remote -> origin to reproduce that sceario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things:
- CPS chaining is impossible because we are not overriding
allowCrossProjectthus with this PR the endpoint will not be CPS compatible. - In this repo we are not able to test the project linking scenario outlined in the comment.
This is though a valid point. I have a change ready to 1) allowCrossProject and 2) test the scenario.
As a side note: With the resolvedLocally/resolvedRemotely data structure distinction and since we are only sending through the wire resolvedLocally for each remote/project we effectively prevent chaining (at least prevent it from surfacing it to the user, actual prevention needs to be done via changing the indicesOptions resolveCrossProject to false when we send the request to the linked projects)
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Outdated
Show resolved
Hide resolved
| index, | ||
| new ResolvedIndexExpression.LocalExpressions( | ||
| Set.of(), | ||
| ResolvedIndexExpression.LocalIndexResolutionResult.CONCRETE_RESOURCE_NOT_VISIBLE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(asking to learn) - I see that Nikolaj already added this LocalIndexResolutionResult enum, but what does CONCRETE_RESOURCE_NOT_VISIBLE mean? Not sure how that differs from CONCRETE_RESOURCE_UNAUTHORIZED.
And then for your code here, how do you know that based on it being a failure it is NOT_VISIBLE vs. UNAUTHORIZED?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had exactly the same question :) tuning in to read the answer on this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was introduced to distinguish between an auth error or an IndexNotFound error. This distinction is needed in order to return 403 or 404 to the user and we need to know it also from the linked projects.
CONCRETE_RESOURCE_NOT_VISIBLE means that a non-wildcard expression was resolved to nothing, either because it doesn't exists or because it is closed.
CONCRETE_RESOURCE_UNAUTHORIZED means that the expression could be resolved to a concrete index but the requesting user has no authorization to access it.
NB: I can expand on why CONCRETE_RESOURCE_NOT_VISIBLE applies only to non-wildcard if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think the answer may be that we have to trust the security decisions that were already made, but isn't unauthorized already giving away the info that the index does exist? That's no longer a concern then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, earlier we discussed with @n1v0lg that we want to "hide" if an index exists and it's un auth or does not exists/is closed. This is no longer the case, we are already sending to the users in CCS 403 if un auth or 404 if not found and we want to keep this distinction also in CPS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the answer may be that we have to trust the security decisions that were already made
Long story short yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, can we add code comments for CONCRETE_RESOURCE_NOT_VISIBLE in LocalIndexResolutionResult to clarify that. Even reading the javadoc on that enum that's still not obvious/clear. Doesn't have to be in the PR, just a general request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment in LocalIndexResolutionResult to further explain it as suggested!
idegtiarenko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be passing all the tests.
I am inclined to merge it into main to enable integration into ESQL while we are still working on remaining features (like properly support exclusions in the new data-structure).
quux00
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor requests.
| index, | ||
| new ResolvedIndexExpression.LocalExpressions( | ||
| Set.of(), | ||
| ResolvedIndexExpression.LocalIndexResolutionResult.CONCRETE_RESOURCE_NOT_VISIBLE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, can we add code comments for CONCRETE_RESOURCE_NOT_VISIBLE in LocalIndexResolutionResult to clarify that. Even reading the javadoc on that enum that's still not obvious/clear. Doesn't have to be in the PR, just a general request.
|
|
||
| private final String[] indices; | ||
| private final ResolvedIndexExpressions resolvedLocally; | ||
| private final transient Map<String, ResolvedIndexExpressions> resolvedRemotely; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a little bit of the above as a code comment. I know this is just the response class, but at least a comment as to why resolvedRemotely is not serialized and what the distinction be resolvedRemotely and resolvedLocally is - that will help future maintainers with debugging.
server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java
Outdated
Show resolved
Hide resolved
| // Locally resolved index expressions for the current project. | ||
| // This is sent over the wire from linked projects to the coordinating project of a request to | ||
| // inform the coordinating project of the local resolution state (for each remote). | ||
| private final ResolvedIndexExpressions resolvedLocally; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish there was a better way to name / explain this . It may be me, but when I read local here, I don't understand whether it's local to the linked projects or to the coordinating project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is tricky, because what's remote in one context is local in another context.
If you think it will be better I can specify that it's always local to the FieldCapsResponse creator. So if it's created in the coordinator it's local to the coordinator. if instead is created in one of the linked project is local to the linked project and remote wrt the coordinator. (thus saved in resolvedRemotely of the coord)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment updated, hopefully is clearer now
quux00
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
BASE=ec0efafc2fdc49adffe4f58806ea71d28b7d9f49 HEAD=171872a6ca36f8594f0e7fa7ad2981aa23a8d1a2 Branch=main
BASE=ec0efafc2fdc49adffe4f58806ea71d28b7d9f49 HEAD=171872a6ca36f8594f0e7fa7ad2981aa23a8d1a2 Branch=main
BASE=ec0efafc2fdc49adffe4f58806ea71d28b7d9f49 HEAD=171872a6ca36f8594f0e7fa7ad2981aa23a8d1a2 Branch=main
With this PR we are introducing a new Transport level parameter to include resolutions for each original expression. This parameter can't be set from the REST layer and that do not affect the REST API response.
The parameter name is
includeResolvedTo, that defaults to false (current behaviour) and if set to true will populate the fieldCaps responseresolvedLocallyand possiblyresolvedRemotely(the latter only in CCS/CPS context).resolvedLocallyis of typeResolvedIndexExpressionsthat contains a list ofResolvedIndexExpression. TheResolvedIndexExpressionlist has one entry for each original index expression passed to FC and each element of the list will contain the original (unchanged) expression passed to FC, what it was resolved to locally (concrete indices also with the result of the resolution: SUCCESS/UNAUTH/NOT_VISIBLE) and, if in CPS context, also a list of remote expressions that the original expression was resolved to (flatlogs*will have remote expressionsp1:logs*andp2:logs*given that p1 and p2 are linked project that are not projectRouting excluded).resolvedRemotelyis only populated in CCS/CPS context and contains a map form the remote/project alias to theResolvedIndexExpressionsfor that remote/project.Note:
resolvedLocallyandresolvedRemotely.get(projectAlias)are both of typeResolvedIndexExpressionsbut the latter can't contain remoteExpression (because we are not allowing CCS/CPS chaining) and theoriginalexpression will contain the expression that the remote project received, therefore if the coordinator want to query p1:logs*, the p1 response will contain original="logs" and thatResolvedIndexExpressionwill be stored into theresolvedRemotelyat key p1.Note also that
resolvedRemotelyis transient and never sent through the wire between the remotes/linked project for the reasons I listed above. If in the future we want to enable CPS changing we can remove this limitation.The decision to re-use
ResolvedIndexExpressionand not to create another ad-hoc type without the remote expression was done because we want to avoid the proliferation of nearly-identical records.Code written together with @alexey-ivanov-es 😄 🙏