Skip to content

Conversation

jfreden
Copy link
Contributor

@jfreden jfreden commented Oct 9, 2025

This is a follow up to #134137, #134893, #135674 and #134604.

This PR pulls the DN out of the provided signature leaf certificate and uses it to match against the configured certificate_identity pattern on the cross cluster api key.

This PR does not deal with audit logging, that will be done in a followup PR.

assertCrossClusterSearchSuccessfulWithResult();

// Change the CA to the default trust store to make sure untrusted signature fails auth even if it's not required
updateClusterSettingsFulfillingCluster(Settings.builder().putNull("cluster.remote.signing.certificate_authorities").build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious, why fail if a certificate is invalid but not required? Is it to protect against misconfiguration?

// Always validate a signature if provided
if (signature != null && verifySignature(signature, crossClusterAccessHeaders, listener) == false) {
// TODO audit logging
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With signature verification happening before API key validation, if the certificate is invalid but the API key is also expired, the user sees 'invalid certificate' and never learns about the expired key. Is this the intended priority?

I do think this way makes sense; if trust can't be established, then there isn't much point in validating the API key itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order right now is:

  1. Validate that the API key is valid, this happens first because of authenticateHeaders
  2. Validate that the provided certificate identity is ok also happens in authenticateHeaders
  3. Validate signature

With the current implementation (1) and (2) would actually not generate an audit log from what I can see. Since (1) is the current implementation I think it's a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the intended priority?

My thinking is that validating the signature on every request, even failing ones is potentially expensive, so doing it in auth headers is probably not right. The point of auth headers is to short circuit the auth flow.

Doing the signature verification first in the auth flow decouples it from the rest of the api key service, if we don't do it there we require the api key credentials to contain all the signable payload + signature info and to pass it along to the api key service where we do validation. Not sure if that's a good enough reason, but I don't think it matters if we do it first or last, so keeping decoupling and doing it early is probably acceptable.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realizing now I responded to you on Slack but not here. This argument makes sense to me, I don't ultimately see a huge difference between checking it first or last, and architecturally it's easier to do here than trying to couple it into the rest of the auth flow.

assertCrossClusterSearchSuccessfulWithResult();

// Change the CA to the default trust store to make sure untrusted signature fails auth even if it's not required
updateClusterSettingsFulfillingCluster(Settings.builder().putNull("cluster.remote.signing.certificate_authorities").build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious, why fail if a certificate is invalid but not required? Is it to catch misconfigurations early?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if a signature is provided I think we should try to validate it and fail if it's not correct since it's unexpected. Better to be less lenient in this case I think.

Copy link
Contributor

@tvernum tvernum Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also gives customers a way to progressively switch over to signing. If they configure signing on the origin cluster, and it doesn't break, then they can rely on the fact that it's probably working correctly. Then they can turn on enforcement on the linked cluster.

And if that first step does break, then they can just turn off signing on the origin, without needing to reconfigure both sides.

@gmjehovich
Copy link
Contributor

Thanks for working on this! The approach looks good. Moving cert validation into the authentication flow should fix the double audit logging I was seeing. I just left some questions to help with my understanding

@jfreden jfreden force-pushed the validate_signature_identity branch from 79d328b to d1293d5 Compare October 10, 2025 11:46
@jfreden jfreden changed the title Validate signature identity Validate certificate identity from cross cluster creds Oct 10, 2025
@jfreden jfreden force-pushed the validate_signature_identity branch 3 times, most recently from 93eef5b to cc2e580 Compare October 14, 2025 14:40
}
}

private ApiKeyService createApiKeyService(Settings baseSettings, FeatureService customFeatureService) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was unused so I removed it.

Copy link
Contributor

@gmjehovich gmjehovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked some into the test failures from the CI. The serverless ones seem unrelated, but the failing test on the ci test is interesting.

It looks like that case in testInvalidHeaders never triggers the callback passed to the call to authenticationService.authenticate as [this line] (

final CrossClusterAccessSubjectInfo subjectInfo = crossClusterAccessHeaders.getCleanAndValidatedSubjectInfo();
) with the call to crossClusterAccessHeaders.getCleanAndValidatedSubjectInfo() should be triggering that exception I think

// Always validate a signature if provided
if (signature != null && verifySignature(signature, crossClusterAccessHeaders, listener) == false) {
// TODO audit logging
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realizing now I responded to you on Slack but not here. This argument makes sense to me, I don't ultimately see a huge difference between checking it first or last, and architecturally it's easier to do here than trying to couple it into the rest of the auth flow.

@jfreden jfreden force-pushed the validate_signature_identity branch from 87edbc9 to 2618ac9 Compare October 15, 2025 08:02
@jfreden jfreden force-pushed the validate_signature_identity branch from 8cce705 to 99fac10 Compare October 15, 2025 08:57
@jfreden jfreden added :Security/Security Security issues without another label >enhancement labels Oct 15, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @jfreden, I've created a changelog YAML for you.

@jfreden jfreden changed the title Validate certificate identity from cross cluster creds Validate Certificate Identity provided in Cross Cluster API Key Certificate Oct 15, 2025
@jfreden jfreden marked this pull request as ready for review October 15, 2025 09:05
@jfreden jfreden requested a review from tvernum October 15, 2025 09:05
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Oct 15, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@jfreden jfreden requested a review from a team October 15, 2025 14:27
@jfreden
Copy link
Contributor Author

jfreden commented Oct 15, 2025

Ended up adding a pattern cache. Still need to add some testing for it but the basics are there. Will continue tomorrow.

@jfreden jfreden force-pushed the validate_signature_identity branch from 8652f82 to d63b4ec Compare October 16, 2025 07:01
TimeValue.timeValueHours(48L),
Property.NodeScope
);
private static final int MAX_PATTERN_CACHE_SIZE = 1000;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 1000 concurrent cross cluster api keys being used at the same time is a large number (and unlikely). I initially set this to 100 but this give us some good margins.

);
public static final Setting<TimeValue> CERTIFICATE_IDENTITY_PATTERN_CACHE_TTL_SETTING = Setting.timeSetting(
"xpack.security.authc.api_key.certificate_identity_pattern_cache.ttl",
TimeValue.timeValueHours(48L),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a pattern live for a very long time (48 hours) makes sense here since there might be a long time between access for certain keys.

Copy link
Contributor

@gmjehovich gmjehovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! I'm approving this PR to merge. 👍

I left a separate, non-blocking comment/question about the memory safety risk concerning the pattern cache limit along with the fact that the certificate_identity field is user defined.

TimeValue.timeValueHours(48L),
Property.NodeScope
);
private static final int MAX_PATTERN_CACHE_SIZE = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a cache here makes sense, good addition!

I'm curious about the potential maximum size the cache can reach in terms of memory usage. This sets the max number of entries, but since we're permissive with how users define the certificate\_identity regex, I wonder how big the memory usage could get if people start using huge regexes. In the worst case, I wonder if this could be used maliciously to exhaust memory?

What are your thoughts here? Is this actually something we need to be concerned about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's unlikely that a user would be creating API keys maliciously, since creating a cross cluster api key requires the manage_security cluster privilege, which allows a user to do all security-related operations. So if they're malicious and hold the manage_security privilege we probably have bigger issues.

In terms of accidentally creating a lot of API keys with complex patterns and then using them all concurrently over 48 hours, I think could happen for a script that's out of control, but probably not intentionally. If that would happen and each pattern would be very complex, maybe 100KB? That's 100MB of the heap, which isn't insignificant on a small node, but also unlikely. We could lower the limit to 100 patterns since we're not expected to hit that either to be completely safe.

updateClusterSettingsFulfillingCluster(Settings.builder().putNull("cluster.remote.signing.certificate_authorities").build());
assertCrossClusterAuthFail("Failed to verify cross cluster api key signature certificate from [(");

// TODO add test for certificate identity regex matching (should 200)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have this test?
I can't see it.

Copy link
Contributor Author

@jfreden jfreden Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do randomFrom("CN=instance", "^CN=instance$", "(?i)^CN=instance$", "^CN=[A-Za-z0-9_]+$") when we set up the API key as part of setting up the query cluster.

authcContext = authenticationService.newContext(action, request, apiKeyCredentials);
var signature = crossClusterAccessHeaders.signature();

// Always validate a signature if provided
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a specific suggestion but this comment threw me a little because I was worried that we could accept a certificate identity (and verify it against the API key) but then not actually require that the request was signed.

That's not true because signature is an instance of X509CertificateSignature so everything is contained together, and it's not possible to have signature == null and somehow have a certificate identity.

But if there's a way to make that clearer in this code, it would be good.

One option would be to rename the signature var to be signingInfo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the variable name and added some more comments. Let me know what you think. Thanks!

@jfreden jfreden force-pushed the validate_signature_identity branch from f39c655 to ed4093a Compare October 17, 2025 11:29
@jfreden jfreden added the cloud-deploy Publish cloud docker image for Cloud-First-Testing label Oct 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cloud-deploy Publish cloud docker image for Cloud-First-Testing >enhancement :Security/Security Security issues without another label Team:Security Meta label for security team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants