Skip to content

Conversation

@n1v0lg
Copy link
Contributor

@n1v0lg n1v0lg commented Sep 27, 2024

Draft PR for ECK regression around moving operator-defined role mappings from the security index to cluster-state (ES-9628).

Some background first that will make the implementation choices below easier to follow:

  1. The only source of cluster-state role mappings today are operator-defined file-based settings (ECK and Serverless). In stateful, there is no supported way to enable and use cluster-state role mappings. This means that cluster-state role mappings are currently synonymous with operator-defined role mappings.
  2. We want to make this fix quickly; therefore it does not include any migrations or data clean up (e.g., removing outdated operator-defined role mappings from the security index). It also does not include force-writing file-based role mappings to cluster state.

The PR contains the following changes, bringing API behavior back as closely as possible to what it was before the regression:

  1. The native role mapping Get API now includes role mappings stored in cluster-state, in addition to native role mappings. If a role mapping with the same name is defined both in the security index and in cluster state, the cluster state mapping is returned and the native role mapping skipped.
  2. The same applies to the actual role mapping process: cluster state role mappings are now included in role mapping, and take precedence if there is a name clash with a native role mapping.
  3. The native role mapping Put and Delete APIs prevent writes to role mappings with names clashing with cluster-state role mappings.

Implementation notes:

  • This PR extends the NativeRoleMappingStore to include a ReservedRoleMapping class; this class is used to prevent writes with clashing names and plugs into reading role mappings, both to serve the API and role resolution.
  • It defaults cluster_state_role_mappings.enabled to true. Ideally, we would instead enable them for ECK. However, this would require coupling this fix with an ECK release; furthermore, in ECK, users can modify any and all settings. I've made the PR as much of a noop as possible for when cluster-state role mappings are not populated. Therefore, having them always on should be low impact on environments where they are not available.

@n1v0lg n1v0lg added >bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) labels Sep 27, 2024
@n1v0lg n1v0lg self-assigned this Sep 27, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @n1v0lg, I've created a changelog YAML for you.

new ReservedRoleMappings(clusterStateRoleMapper)
);
final ClusterStateRoleMapper clusterStateRoleMapper = new ClusterStateRoleMapper(settings, scriptService, clusterService);
final UserRoleMapper userRoleMapper = new CompositeRoleMapper(nativeRoleMappingStore, clusterStateRoleMapper);
Copy link
Contributor Author

@n1v0lg n1v0lg Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change (getting rid of CompositeRoleMapper) is a little nuanced. The reason I'm making it is:

The PR changes the NativeRoleMappingStore to fetch "reserved" (i.e., cluster-state-based) role mappings and use these instead of any native role mappings with the same name (see mergeWithReserved). This means that we read cluster-state role mappings in the NativeRoleMappingStore, and substitute mappings based on what we read. If we stuck with the CompositeRoleMapper here, we'd "double-read" cluster-state-based role mappings.

An alternative I originally considered is to filter out native role mappings that match reserved ones and then simply rely on CompositeRoleMapper here to fill in cluster-state ones. However, this would introduce a race condition:

  1. When we get role mappings from the native store, we read cluster-state and filter out.
  2. We returning the result.
  3. Concurrently, cluster state changes (mappings added or removed).
  4. We read cluster state again, including the new mappings.

However, this means we have a "stale" view obtained in step 2.

By going with current approach in this PR instead (remove CompositeRoleMapper), we avoid this race condition since we only read cluster state once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky! I think your change here looks good. I thought of an alternative explained below.

I think it would be clearer to keep the role mapping stores separate and then leave the merging of them to some external class. Priorities can change and the way we want to merge them could change too. With your change ClusterStateRoleMapper is injected into the NativeRoleMappingStore through the wrapping class ReservedRoleMappings and you're using ClusterStateRoleMapper "internals" (getMappings - not exactly an internal but I think it was only public for testing purposes). It does violate the UserRoleMapper interface a little.

I think adding a new MergingCompositeRoleMapper (or a better name) that handles both read and write (so both getRoleMappings and putRoleMapping) would be more general. For read it would merge the result of the provided UserRoleMappers where priority could be the order of arguments passed (so ClusterStateRoleMapper would be passed first to have priority). For writes it could check the ClusterStateRoleMapper for reserved names before writing (according to the configuration) to the native store.

Copy link
Contributor Author

@n1v0lg n1v0lg Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've considered this (see a stub of it here) however it comes out a much bigger change, and I'm not sure it's worth it because the abstraction itself is sort of neither here nor there. The problems are:

The new class is neither Merging, nor a role mapper. Instead it's a class that merges on reads, but prevents reserved writes. So it's more of a ReservedRoleMappingWrappedStore. And it's not just a role mapper (that only handles resolving roles) but a role mapping store. The latter requires a whole new interface or abstract class, and plugging that interface in everywhere NativeRoleMappingStore is currently used. Do-able, but NativeRoleMappingStore is used in several actions and tests.

I'm just not sure that it's worth it to introduce all this generic structure to handle something that's unlikely to generalize. The change is already quite nuanced so I worry that making the delta bigger makes it harder to review and reason about. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see what you mean. The interfaces are note exactly built for that and the change is big.

I'm just not sure that it's worth it to introduce all this generic structure to handle something that's unlikely to generalize. WDYT?

I think the coupling this introduces is a little concerning, but I don't think this is the right time to be making a huge refactor, so I think leaving it is fine.

} else {
names = new HashSet<>(Arrays.asList(request.getNames()));
}
// TODO make sure result is deterministic
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean here is ordered the same each time: the respond now contains the output of ClusterStateRoleMapper#getMappings which is a set and therefore not well-ordered.

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good (only a quick review so far)

* of the {@link NativeRoleMappingStore} and this mapper.</li>
* </ul>
*/
// TODO we need to register this setting as an escape hatch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should expose this as setting if it is always true and will break things if we set it false. I think we rely (and should double check) the no-op,no-error behavior when the settings.json is not present. We should update the comments to ensure that it is know this is the ECK and serverless path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should expose this as setting if it is always true and will break things if we set it false.

I'm thinking of stateful, non-ECK deployments. The cluster-state role mapper still has some overhead in that it checks cluster state. If there is an issue we aren't thinking off, I'd really like there to be an escape hatch for stateful, non-ECK customers that by definition don't use cluster-state role mappings. I don't feel very strongly, but I think having a filtered, undocumented setting is a nice-to-have.

this.scriptService = scriptService;
this.lastLoadCacheEnabled = LAST_LOAD_CACHE_ENABLED_SETTING.get(settings);
this.enabled = settings.getAsBoolean(NATIVE_ROLE_MAPPINGS_ENABLED, true);
this.reservedRoleMappings = reservedRoleMappings;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this since it simplifies things and nicely makes this change a no-op for serverless since IIUC this is completely disabled in serverless (and likley always will be)

public void getRoleMappings(Set<String> names, ActionListener<List<ExpressionRoleMapping>> listener) {
innerGetRoleMappings(
names,
listener.delegateFailureAndWrap((l, mappings) -> l.onResponse(reservedRoleMappings.mergeWithReserved(mappings)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am getting abit lost to where ClusterStateRoleMapper#resolveRoles for reservedRoleMappings is called so that is garunteed to have the mappings already?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClusterStateRoleMapper::resolveRoles is just wrapping ClusterStateRoleMapper::getMappings that looks at the current cluster state and resolves the roles from the current cluster state. ClusterStateRoleMapper::resolveRoles was previously called from the CompositeRoleMapper but with this change getMappings is instead called directly (so resolveRoles is skipped) as far as I can see.

return clusterStateRoleMapper.getMappings().stream().anyMatch(roleMapping -> roleMapping.getName().equals(roleMappingName));
}

// TODO find a cleaner way
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach would also be cleaner here.

@n1v0lg
Copy link
Contributor Author

n1v0lg commented Oct 1, 2024

Closing in favor of: #113900

@n1v0lg n1v0lg closed this Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) test-update-serverless v8.15.3 v8.16.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants