Skip to content

Conversation

slobodanadamovic
Copy link
Contributor

@slobodanadamovic slobodanadamovic commented Sep 5, 2025

This refactoring is meant to be purely structural, with no functional changes.
It introduces two interfaces:

  • RemoteClusterAuthenticationService - extracted based on the existing CrossClusterAccessAuthenticationService
  • RemoteClusterTransportInterceptor - which aims to abstract and move all "cross-cluster access" logic from SecurityServerTransportInterceptor

This is prerequisite for making remote cluster security logic pluggable (which I plan to add in a followup PR).

Relates to ES-12801


Note: The PR can also be reviewed commit-by-commit. I've tried to make each commit as small as possible and self-sustained.
Note2: Test refactoring will be handled in a followup PR.

- define a new interface based on CrossClusterAccessAuthenticationService
- hide method that accepts ApiKeyCredentials as it's only used for testing
@slobodanadamovic slobodanadamovic self-assigned this Sep 5, 2025
@slobodanadamovic slobodanadamovic added >refactoring :Security/Security Security issues without another label Team:Security Meta label for security team and removed v9.2.0 labels Sep 5, 2025
@slobodanadamovic slobodanadamovic changed the title Refactor remote cluster security interceptor and authenticator Introduce remote cluster security interceptor and authenticator Sep 7, 2025
* Allows to provide remote cluster interception that's capable of intercepting remote connections
* both on the receiver and the sender side.
*/
public interface RemoteClusterTransportInterceptor {
Copy link
Contributor Author

@slobodanadamovic slobodanadamovic Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to any naming suggestions here. I went with using RemoteCluster prefix for both new interfaces as it felt consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good name

@slobodanadamovic slobodanadamovic added the test-full-bwc Trigger full BWC version matrix tests label Sep 7, 2025
@slobodanadamovic slobodanadamovic marked this pull request as ready for review September 7, 2025 20:54
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the detail of the interceptors yet. Will try to finish that today

* @param listener callback to receive {@code null} on successful authentication,
* or an exception on authentication failure
*/
void authenticateHeaders(Map<String, String> headers, ActionListener<Void> listener);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit confused about the contract of this method.

  • The name of the method implies that is does authentication.
  • The javadoc implies that it doesn't actually do authentication (or isn't required to?).
  • The implementation in CrossClusterAccessAuthenticationService.tryAuthenticate appears to require and authenticate the API Key, but not set the Authentication.

I'm not quite sure what to make of that.

Perhaps the issue is what "if headers contain valid remote cluster credentials" actually means. I would read that and assume it was just checking whether the headers exist, but I think it might mean it does more than that. So, I guess my question is what doesn't it do? Is it just that it doesn't create an Authentication object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the issue is what "if headers contain valid remote cluster credentials" actually means.

Yeah, this is confusing. It's vague because the "validation" may mean different for cross-project than for RCS 2.0.

The order of method calls is:

  1. read only headers on the remote cluster port and call authenticateHeaders
  2. consume the full request body and call authenticate

The authenticateHeaders method validates and verifies the cross cluster credentials based on the received headers. It's called early during request processing, and only for requests on the remote cluster port. We only read headers (body is not consumed yet) at the stage when this method is called. This means that required cross cluster headers must be present and credentials must be valid. Fro RCS 2.0, there is no authentication nor role building happening at this stage, hence the without the overhead of full authentication processing comment.

But thinking further, this may not hold true in all use cases. This method could chose to build authentication object and stash it in the thread context, but it's not required at this phase. It gets required when we call ServerTransportFilter to process inbound message.

Also, I leaned on the fact that all communication is going over RCS port and REMOTE_CLUSTER_PROFILE. This is explicit in SecurityNetty4Transport, but implicit when authenticateHeaders is called. I can try to make that contract clearer. And also make it explicit and change RemoteClusterTransportInterceptor#getProfileTransportFilters to provide a single filter for the REMOTE_CLUSTER_PROFILE only. Currently, it's pretty generic as I wanted to keep flexibility to inject different transport filters for other profiles. We may not need to be able to do it, but I couldn't foresee it at this stage. Let me know what you think.

Copy link
Contributor

@tvernum tvernum Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For RCS 2.0, there is no authentication

I think this is where the semantic mismatch occurs. For me, verifying the api key (which, unless I'm horribly confused, is done in authenticateHeaders) is authentication. That's the primary credential that determines whether this is an authenticated request.

We don't determine the identity of the end-user, their roles (intersection with the API key) or construct an Authentication object, but in my mental model, those are side-effects of the actual "authentication" which is checking the validity of the API key.

I don't want to bike shed this, or argue about precise terminology, but it seems we draw the box around "authentication" differently, and your mindset is "unless we've done all the things we typically do during authentication, then we haven't done authentication" and mine is "if we've verified credentials, that's authentication".

So calling it authenticateHeaders but saying it doesn't do "full authentication" is confusing to me, I don't know what that means.

Calling it authenticateConnection would be more meaningful to me, at least in the context of RCS2.0, but maybe that's not a terminology that fits with CPS (I'm not sure what you think we'll do in that method for CPS).

Copy link
Contributor Author

@slobodanadamovic slobodanadamovic Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think semantics are important and this should have read as For RCS 2.0, there is no authentication object nor role building happening at this stage.

I tried to make distinction between authenticateHeaders and authenticate methods, but failed to name correctly the other typical things we do after we execute authentication. Which could be summed up as "building authentication informations and stashing them in the thread context".

Both methods are called for each request. I don't have full picture on why this was implemented the way it is, but I'm assuming it has to do with how thread context is handled at these different points in time.

We may need to do more refactoring here, because we execute authentication two times for the same RCS request. This may have not been a big of an issue for RCS 2.0, but I think it will be for CPS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to do more refactoring here, because we execute authentication two times for the same RCS request. This may have not been a big of an issue for RCS 2.0, but I think it will be for CPS.

I think we can defer this conversation until we know what the CPS version looks like. That's going to require coming up with a clearer concept of why we have 2 methods and what we're supposed to do in each one, and trying to name the methods now is probably going to be a waste of effort.

TransportInterceptor.AsyncSender interceptSender(TransportInterceptor.AsyncSender sender);

/**
* This method returns {@code true} if the {@code connection} is targeting a remote cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* This method returns {@code true} if the {@code connection} is targeting a remote cluster.
* This method returns {@code true} if the outbound {@code connection} is targeting a remote cluster.

It should always be an outbound connection (if my mental model of this stuff is correct) but I think it's helpful for the javadoc to be explicit about that.

* Allows to provide remote cluster interception that's capable of intercepting remote connections
* both on the receiver and the sender side.
*/
public interface RemoteClusterTransportInterceptor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good name

/**
* Returns {@code true} if any of the remote cluster access headers are in the security context.
* This method is used to assert we don't have access headers already in the security context,
* before we even run remote cluster intercepts. Serves as a sanity check that we properly clear
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* before we even run remote cluster intercepts. Serves as a sanity check that we properly clear
* before we even run remote cluster intercepts. Serves as an integrity check that we properly clear

XPackLicenseState licenseState,
SecurityContext securityContext,
ThreadPool threadPool,
Settings settings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It feels like these arguments are in a random order. Or at least, it's a completely different order than they were in SecurityServerTransportInterceptor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are in different order. I did't pay attention on preserving the order.
I can see how it could have been helpful to review if they were. I can reorder them.

private SecurityContext securityContext;
private ClusterService clusterService;
private MockLicenseState mockLicenseState;
private DestructiveOperations destructiveOperations;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intentionally decide not to split up this test case? It feels like it should become 2 sets of tests, but maybe that's a followup PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was intentional to avoid too many changes. I will split them in a followup PR.

@slobodanadamovic slobodanadamovic removed the test-full-bwc Trigger full BWC version matrix tests label Sep 9, 2025
@slobodanadamovic slobodanadamovic merged commit 53ef68f into elastic:main Sep 9, 2025
41 checks passed
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025
…tic#134245)

This refactoring is meant to be purely structural, with no functional changes. 
It introduces two interfaces:
- `RemoteClusterAuthenticationService` - extracted based on the existing `CrossClusterAccessAuthenticationService`
- `RemoteClusterTransportInterceptor` - which aims to abstract and move all "cross-cluster access" logic from `SecurityServerTransportInterceptor` 

This is prerequisite for making remote cluster security logic pluggable (which I plan to add in a followup PR).

Relates to ES-12801
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Sep 9, 2025
…tic#134245)

This refactoring is meant to be purely structural, with no functional changes. 
It introduces two interfaces:
- `RemoteClusterAuthenticationService` - extracted based on the existing `CrossClusterAccessAuthenticationService`
- `RemoteClusterTransportInterceptor` - which aims to abstract and move all "cross-cluster access" logic from `SecurityServerTransportInterceptor` 

This is prerequisite for making remote cluster security logic pluggable (which I plan to add in a followup PR).

Relates to ES-12801
elasticsearchmachine pushed a commit that referenced this pull request Oct 6, 2025
Initially, the
`RemoteClusterTransportInterceptor#getProfileTransportFilters` required
remote-cluster security extensions to provide transport filters for all
transport profiles. The method was too generic and not specific to only
`_remote_cluster` transport profile. This meant that RCS extensions were
free to decide which filter they wanted to "override" with its custom
transport filter implementation. This turned out to be unnecessary,
because RCS extensions only ever need to provide a custom implementation
for the remote cluster profile. This refactoring removes the need to
provide the "default" `ServerTransportFilter` in order for security to
work. 

Followups to:  -
#134785 (comment)
-
#134785 (comment)
-
#134245 (comment)
---  - Converted `ServerTransportFilter` to interface with a default
implementation.  - Refactored `RemoteClusterTransportInterceptor` to
allow optionally providing only a custom remote cluster transport
filter. It's no longer required from RCS extensions to provide the
default `ServerTransportFilter` implementation. - Split transport
interceptor and filter tests:   - `ServerTransportFilterTests` became
abstract and got split into two tests:
`CrossClusterAccessServerTransportFilterTests` and
`DefaultServerTransportFilterTests`   - Cross-cluster access tests got
extracted from `SecurityServerTransportInterceptorTests` into its own
`CrossClusterAccessTransportInterceptorTests`class
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>refactoring :Security/Security Security issues without another label Team:Security Meta label for security team v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants