-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Validate Certificate Identity provided in Cross Cluster API Key Certificate #136299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
assertCrossClusterSearchSuccessfulWithResult(); | ||
|
||
// Change the CA to the default trust store to make sure untrusted signature fails auth even if it's not required | ||
updateClusterSettingsFulfillingCluster(Settings.builder().putNull("cluster.remote.signing.certificate_authorities").build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious, why fail if a certificate is invalid but not required? Is it to protect against misconfiguration?
// Always validate a signature if provided | ||
if (signature != null && verifySignature(signature, crossClusterAccessHeaders, listener) == false) { | ||
// TODO audit logging | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With signature verification happening before API key validation, if the certificate is invalid but the API key is also expired, the user sees 'invalid certificate' and never learns about the expired key. Is this the intended priority?
I do think this way makes sense; if trust can't be established, then there isn't much point in validating the API key itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order right now is:
- Validate that the API key is valid, this happens first because of authenticateHeaders
- Validate that the provided certificate identity is ok also happens in authenticateHeaders
- Validate signature
With the current implementation (1) and (2) would actually not generate an audit log from what I can see. Since (1) is the current implementation I think it's a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the intended priority?
My thinking is that validating the signature on every request, even failing ones is potentially expensive, so doing it in auth headers is probably not right. The point of auth headers is to short circuit the auth flow.
Doing the signature verification first in the auth flow decouples it from the rest of the api key service, if we don't do it there we require the api key credentials to contain all the signable payload + signature info and to pass it along to the api key service where we do validation. Not sure if that's a good enough reason, but I don't think it matters if we do it first or last, so keeping decoupling and doing it early is probably acceptable.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realizing now I responded to you on Slack but not here. This argument makes sense to me, I don't ultimately see a huge difference between checking it first or last, and architecturally it's easier to do here than trying to couple it into the rest of the auth flow.
assertCrossClusterSearchSuccessfulWithResult(); | ||
|
||
// Change the CA to the default trust store to make sure untrusted signature fails auth even if it's not required | ||
updateClusterSettingsFulfillingCluster(Settings.builder().putNull("cluster.remote.signing.certificate_authorities").build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious, why fail if a certificate is invalid but not required? Is it to catch misconfigurations early?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if a signature is provided I think we should try to validate it and fail if it's not correct since it's unexpected. Better to be less lenient in this case I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also gives customers a way to progressively switch over to signing. If they configure signing on the origin cluster, and it doesn't break, then they can rely on the fact that it's probably working correctly. Then they can turn on enforcement on the linked cluster.
And if that first step does break, then they can just turn off signing on the origin, without needing to reconfigure both sides.
Thanks for working on this! The approach looks good. Moving cert validation into the authentication flow should fix the double audit logging I was seeing. I just left some questions to help with my understanding |
79d328b
to
d1293d5
Compare
93eef5b
to
cc2e580
Compare
} | ||
} | ||
|
||
private ApiKeyService createApiKeyService(Settings baseSettings, FeatureService customFeatureService) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was unused so I removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked some into the test failures from the CI. The serverless ones seem unrelated, but the failing test on the ci test is interesting.
It looks like that case in testInvalidHeaders
never triggers the callback passed to the call to authenticationService.authenticate
as [this line] (
Line 127 in 87edbc9
final CrossClusterAccessSubjectInfo subjectInfo = crossClusterAccessHeaders.getCleanAndValidatedSubjectInfo(); |
crossClusterAccessHeaders.getCleanAndValidatedSubjectInfo()
should be triggering that exception I think
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
.../org/elasticsearch/xpack/remotecluster/RemoteClusterSecurityCrossClusterApiKeySigningIT.java
Show resolved
Hide resolved
// Always validate a signature if provided | ||
if (signature != null && verifySignature(signature, crossClusterAccessHeaders, listener) == false) { | ||
// TODO audit logging | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realizing now I responded to you on Slack but not here. This argument makes sense to me, I don't ultimately see a huge difference between checking it first or last, and architecturally it's easier to do here than trying to couple it into the rest of the auth flow.
87edbc9
to
2618ac9
Compare
8cce705
to
99fac10
Compare
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
Hi @jfreden, I've created a changelog YAML for you. |
Pinging @elastic/es-security (Team:Security) |
Ended up adding a pattern cache. Still need to add some testing for it but the basics are there. Will continue tomorrow. |
8652f82
to
d63b4ec
Compare
TimeValue.timeValueHours(48L), | ||
Property.NodeScope | ||
); | ||
private static final int MAX_PATTERN_CACHE_SIZE = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 1000 concurrent cross cluster api keys being used at the same time is a large number (and unlikely). I initially set this to 100 but this give us some good margins.
); | ||
public static final Setting<TimeValue> CERTIFICATE_IDENTITY_PATTERN_CACHE_TTL_SETTING = Setting.timeSetting( | ||
"xpack.security.authc.api_key.certificate_identity_pattern_cache.ttl", | ||
TimeValue.timeValueHours(48L), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a pattern live for a very long time (48 hours) makes sense here since there might be a long time between access for certain keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! I'm approving this PR to merge. 👍
I left a separate, non-blocking comment/question about the memory safety risk concerning the pattern cache limit along with the fact that the certificate_identity
field is user defined.
TimeValue.timeValueHours(48L), | ||
Property.NodeScope | ||
); | ||
private static final int MAX_PATTERN_CACHE_SIZE = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a cache here makes sense, good addition!
I'm curious about the potential maximum size the cache can reach in terms of memory usage. This sets the max number of entries, but since we're permissive with how users define the certificate\_identity
regex, I wonder how big the memory usage could get if people start using huge regexes. In the worst case, I wonder if this could be used maliciously to exhaust memory?
What are your thoughts here? Is this actually something we need to be concerned about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's unlikely that a user would be creating API keys maliciously, since creating a cross cluster api key requires the manage_security
cluster privilege, which allows a user to do all security-related operations. So if they're malicious and hold the manage_security
privilege we probably have bigger issues.
In terms of accidentally creating a lot of API keys with complex patterns and then using them all concurrently over 48 hours, I think could happen for a script that's out of control, but probably not intentionally. If that would happen and each pattern would be very complex, maybe 100KB? That's 100MB of the heap, which isn't insignificant on a small node, but also unlikely. We could lower the limit to 100 patterns since we're not expected to hit that either to be completely safe.
updateClusterSettingsFulfillingCluster(Settings.builder().putNull("cluster.remote.signing.certificate_authorities").build()); | ||
assertCrossClusterAuthFail("Failed to verify cross cluster api key signature certificate from [("); | ||
|
||
// TODO add test for certificate identity regex matching (should 200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have this test?
I can't see it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do randomFrom("CN=instance", "^CN=instance$", "(?i)^CN=instance$", "^CN=[A-Za-z0-9_]+$")
when we set up the API key as part of setting up the query cluster.
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Show resolved
Hide resolved
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
authcContext = authenticationService.newContext(action, request, apiKeyCredentials); | ||
var signature = crossClusterAccessHeaders.signature(); | ||
|
||
// Always validate a signature if provided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a specific suggestion but this comment threw me a little because I was worried that we could accept a certificate identity (and verify it against the API key) but then not actually require that the request was signed.
That's not true because signature
is an instance of X509CertificateSignature
so everything is contained together, and it's not possible to have signature == null
and somehow have a certificate identity.
But if there's a way to make that clearer in this code, it would be good.
One option would be to rename the signature
var to be signingInfo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the variable name and added some more comments. Let me know what you think. Thanks!
…ecurity/authc/ApiKeyService.java Co-authored-by: Tim Vernum <[email protected]>
…ecurity/authc/ApiKeyService.java Co-authored-by: Tim Vernum <[email protected]>
f39c655
to
ed4093a
Compare
This is a follow up to #134137, #134893, #135674 and #134604.
This PR pulls the DN out of the provided signature leaf certificate and uses it to match against the configured
certificate_identity
pattern on the cross cluster api key.This PR does not deal with audit logging, that will be done in a followup PR.