Contributor

Deydra71 commented Apr 16, 2025

Jira: OSPRH-14737

This PR introduces a new ApplicationCredential (AC) controller in the keystone-operator. It watches ApplicationCredential custom resources and performs these actions:

  1. Creates Keystone ApplicationCredentials for each CR (authenticating as that user due to Keystone’s default policy)
  2. Stores the AC’s ID and Secret in a k8s secret
  3. Implements rotation logic based on expirationDays and gracePeriodDays:
    • Reconciles at least once a day, rotating any AC that’s within or past its grace window
    • If an AC is already in the grace period at the next reconcile, it rotates immediately
    • The old ApplicationCredential in Keystone is not revoked on rotation (it naturally expires)

Additionally:

  • The controller waits for a KeystoneAPI resource to be Ready before proceeding with AC operations

Notes:

  • The CRD and RBAC rules for the ApplicationCredential resource are not installed automatically yet. They must be applied manually until openstack-operator integration is complete.

To apply the RBAC permissions, run oc edit clusterrole keystone-operator-manager-role and add:

- apiGroups:
  - keystone.openstack.org
  resources:
  - applicationcredentials
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch

- apiGroups:
  - keystone.openstack.org
  resources:
  - applicationcredentials/finalizers
  verbs:
  - patch
  - update

- apiGroups:
  - keystone.openstack.org
  resources:
  - applicationcredentials/status
  verbs:
  - get
  - patch
  - update

Example AC CR for the barbican service user:

apiVersion: keystone.openstack.org/v1beta1
kind: ApplicationCredential
metadata:
  name: ac-barbican
  namespace: openstack
spec:
  expirationDays: 365
  gracePeriodDays: 182
  passwordSelector: BarbicanPassword
  roles:
  - service
  secret: osp-secret
  userName: barbican

Contributor

openshift-ci bot commented Apr 16, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Apr 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Deydra71
Once this PR has been reviewed and has the lgtm label, please assign dprince for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

) (string, error) {

// The name of the Secret containing the service passwords
const ospSecretName = "osp-secret"
Contributor

We've discussed this - but basically we provide the ability for the operators to specify the secret that contains the admin user password. This is osp-secret by default - but it need not be. See https://github.com/openstack-k8s-operators/barbican-operator/blob/main/api/v1beta1/common_types.go#L44 for instance.

Agreed, we should not hardcode any value like this. It could be part of the deployment YAML spec.

return "", fmt.Errorf("failed to get Secret/%s: %w", ospSecretName, err)
}

key := capitalizeFirst(userName) + "Password"
Contributor

This may be a default, but it's not what is specified. In barbican, for instance, we have a PasswordSelectors field (https://github.com/openstack-k8s-operators/barbican-operator/blob/main/api/v1beta1/common_types.go#L49) which identifies the correct key. But this needn't be the case.

Ultimately, I think you are going to need to have the AC specification include the name of the user, the name of the user secret and the relevant field. You can set these appropriately in openstack-operator.

Comment on lines 279 to 276
// Always assign these roles:
Roles: []applicationcredentials.Role{
	{Name: "admin"},
	{Name: "service"},
},
Contributor

This makes this controller less generic and therefore potentially less useful. Perhaps this is another parameter - like the access rules that should be passed in as part of the AC spec.

The same can also be said for the Unrestricted field.

Contributor

I see there is support for all of Roles, Unrestricted and AccessRules in the gophercloud call - https://pkg.go.dev/github.com/gophercloud/gophercloud/openstack/identity/v3/applicationcredentials#CreateOpts

}

// Otherwise check again in 24 hours
return defaultRequeue
Contributor

Shouldn't this be return rotateAt?

Contributor Author

rotateAt is a timestamp; we need to return a duration.

Contributor

OK that makes sense. I (mis)understood this function when I first read it.

So if I understand this correctly now, we should be returning a recheck duration of 24 hours, unless we are already in the grace period - in which case we would be returning a 0 to immediately recheck.

Contributor Author

Yes, that's exactly right

// default requeue is 24h as minimal grace period is 1 day
defaultRequeue := 24 * time.Hour
if ac.Status.ExpiresAt == nil || ac.Status.ExpiresAt.IsZero() {
return defaultRequeue
Contributor

I'm curious - why would we want to requeue if the application credential does not expire?

Contributor Author

I was thinking of this as a last-resort fallback that wakes the controller at least once a day, in case some error leaves the status of the CR not updated.

Contributor

OK

return "fallback" // placeholder for generating failure
}
s := hex.EncodeToString(b)
return s[:n]
Contributor

It's unlikely, but should we check for collisions?

Contributor Author

We could check whether an AC with the same suffix already exists, but unless we are creating millions of ACs in a short period, the chance is basically zero. Alternatively, we can increase n.

Contributor

should this function be a lib-common utility in the long term?
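
As a point of reference for the suffix discussion, here is a small standalone sketch of a random hex suffix generator. The name randomSuffix is hypothetical, and unlike the quoted snippet it returns an error instead of a "fallback" placeholder; that is a design choice of this sketch, not something the PR does.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomSuffix returns n lowercase hex characters drawn from crypto/rand.
// It is a hypothetical helper for illustration only.
func randomSuffix(n int) (string, error) {
	if n <= 0 {
		return "", fmt.Errorf("suffix length must be positive, got %d", n)
	}
	// Each byte encodes to two hex characters, so round up.
	b := make([]byte, (n+1)/2)
	if _, err := rand.Read(b); err != nil {
		return "", fmt.Errorf("reading random bytes: %w", err)
	}
	return hex.EncodeToString(b)[:n], nil
}
```

A suffix of n hex characters has 16^n possible values, so increasing n, as suggested in the reply, shrinks the collision probability quickly.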


logger := r.GetLogger(ctx)

// Only if user explicitly does "oc delete" do we revoke the AC
Contributor

It's actually quite useful, I think, to have the delete revoke the application credential. This gives us a nice way to do a revocation.

The problem is, of course, that there will be some time between when the app cred is revoked and the new one is issued.

I wonder if there is a way to trigger a rotation without doing a delete. Could we implement a reconcileUpdate to do this instead? Then, the procedure when trying to revoke the cert would be to -

  1. patch the AC so that we are within the grace period. This triggers the creation of a new AC.
  2. Wait for the new AC to be propagated.
  3. Revoke the old AC by deleting it.

Contributor Author

I'm just concerned about step 2 (wait for the new AC to be propagated), because the AC controller would need feedback that the service deployment successfully rolled out with the new credential, which would mean adding logic to watch the deployment status in the AC controller. I'd prefer to leave revocation as a manual step for now.

Contributor

stuggi left a comment

I have not finished my initial review, and probably won't have time to do so today. I will continue tomorrow, and just wanted to add what I have so far.

Comment on lines 144 to 152
// Decide if we need to create (or rotate)
needsRotation := false
if instance.Status.ACID == "" {
	needsRotation = true
	logger.Info("AC does not exist, creating")
} else if r.shouldRotateNow(instance) {
	needsRotation = true
	logger.Info("AC is within grace period, rotating")
}
Contributor

Let's split this out into a func.

// Check if KeystoneAPI is ready
keystoneAPI, err := keystonev1.GetKeystoneAPI(ctx, helperObj, instance.Namespace, nil)
if err != nil {
logger.Info("KeystoneAPI not found, requeue", "error", err)
Contributor

should set KeystoneAPIReadyCondition

return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
}
if !keystoneAPI.IsReady() {
logger.Info("KeystoneAPI not ready, requeue")

// Ensure we have an initial ReadyCondition
condList := condition.CreateList(
condition.UnknownCondition(condition.ReadyCondition, condition.InitReason, condition.ReadyInitMessage),
Contributor

the ReadyCondition gets auto added in Init.

should init KeystoneAPIReadyCondition and set it when checking for the keystoneapi below

			condition.UnknownCondition(keystonev1.KeystoneAPIReadyCondition, condition.InitReason, keystonev1.KeystoneAPIReadyInitMessage),

I think we could just use the same condition init as in https://github.com/openstack-k8s-operators/keystone-operator/blob/main/controllers/keystoneendpoint_controller.go#L111-L127 and just init it with what is used in this controller?

}

// Check if KeystoneAPI is ready
keystoneAPI, err := keystonev1.GetKeystoneAPI(ctx, helperObj, instance.Namespace, nil)

logger.Info("KeystoneAPI not ready, requeue")
return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
}

Contributor

Also, I'd move all the above tasks into the main reconcile func, as these are also the steps done for handling deletion. Then you won't need logic to check for the keystoneapi in the cleanup method, like in https://github.com/openstack-k8s-operators/keystone-operator/blob/main/controllers/keystoneendpoint_controller.go#L225 where the admin client is checked beforehand.

logger := r.GetLogger(ctx)
adminOS, ctrlResult, err := keystonev1.GetAdminServiceClient(ctx, helperObj, keystoneAPI)
if err != nil {
return "", err

if err != nil {
logger.Error(err, "Failed to find user ID")
instance.Status.Conditions.Set(condition.FalseCondition(
condition.ReadyCondition,
Contributor

This does not seem to be the correct condition. The ReadyCondition is only set in the defer function, based on the sub-condition status during the reconcile.

}
savedConditions := instance.Status.Conditions.DeepCopy()

// Defer patch logic (skips if we are deleting)

keystoneAPI, err := keystonev1.GetKeystoneAPI(ctx, helperObj, instance.Namespace, nil)
if err == nil && keystoneAPI.IsReady() {
userID, userErr := r.findUserIDAsAdmin(ctx, helperObj, keystoneAPI, instance.Spec.UserName)
if userErr == nil {
Contributor

I think we should try to reduce the number of nested ifs.

What if there is a userErr that is not "user not found"? Is it OK to continue and remove the finalizer? Shouldn't we return the actual error if it is not "user not found"?

Can't we just do:

userID, userErr := r.findUserIDAsAdmin(ctx, helperObj, keystoneAPI, instance.Spec.UserName)
if userErr != nil {
	return ctrl.Result{}, userErr
}
userOS, userRes, userErr2 := keystonev1.GetUserServiceClient(ctx, helperObj, keystoneAPI, instance.Spec.UserName)
...

return "", fmt.Errorf("failed to get Secret/%s: %w", ospSecretName, err)
}

key := capitalizeFirst(userName) + "Password"

Why do you need to capitalize the username's first letter? Also, why are you hardcoding "Password"?

Contributor Author

This was just to pick the user's password out of osp-secret. Now passwordSelector will be passed to the AC CR, so that will be used for the lookup instead.

I don't get it. Can you elaborate?

Contributor Author

The previous code assumed this password scheme in osp-secret:

BarbicanPassword: <secret_data>
CinderPassword: <secret_data>
GlancePassword: <secret_data>
…

So to pick the right field we had to:

  1. Take the userName ("barbican")
  2. Capitalize the first letter → "Barbican"
  3. Append "Password" → "BarbicanPassword"

and then look up that key in the secret. Of course, that would force everyone to use that convention. So now openstack-operator extracts these values and passes them into the AC CR as well:

spec:
  secret: # the Secret name (by default it's osp-secret)
  passwordSelector:  # how we extract this key, e.g. BarbicanPassword
  userName: # e.g. barbican, user can customize this in control plane spec
  …
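
To make the difference concrete, here is a small sketch of both lookups against a plain map standing in for the Secret's data. legacyPasswordKey and passwordFromSecret are hypothetical names used only for this illustration; the real controller reads a Kubernetes Secret object.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyPasswordKey reproduces the old naming convention described above:
// capitalize the first letter of the user name and append "Password".
// It assumes an ASCII user name.
func legacyPasswordKey(userName string) string {
	if userName == "" {
		return "Password"
	}
	return strings.ToUpper(userName[:1]) + userName[1:] + "Password"
}

// passwordFromSecret looks up the key named by the selector instead of
// deriving it, so any key naming scheme works.
func passwordFromSecret(data map[string][]byte, selector string) (string, error) {
	value, ok := data[selector]
	if !ok {
		return "", fmt.Errorf("key %q not found in secret", selector)
	}
	return string(value), nil
}
```

With the selector approach, a deployment that stores the barbican password under a different key only needs a different passwordSelector value; no code change is involved.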

Deydra71 force-pushed the underlined-ac-support branch from 38cb512 to 15fcdb9 May 14, 2025 11:50

Contributor Author

The latest update makes the controller take into consideration the custom password secret, custom service user name, and password selectors. Continuing to address other reviews.

Deydra71 force-pushed the underlined-ac-support branch from 15fcdb9 to 2d8a489 May 16, 2025 13:50

Contributor Author

Deydra71 commented May 16, 2025

The latest update includes corrections based on some of the reviews (not all; I will continue with the rest), and also adds support for the Unrestricted, Roles and AccessRules fields.

Currently the AccessRules field is not passed correctly to the AC CR by openstack-operator, so it is missing from the AC CR, and for services that specify this field in the openstackcontrolplane spec, ACs are not generated. We are continuing the discussion on Slack about what should go in the AccessRules field in the first place.

Automatic revocation is also disabled for now for testing; it will be enabled again depending on what we agree.

Deydra71 force-pushed the underlined-ac-support branch from 2d8a489 to 4962009 May 19, 2025 12:35

scope := &gophercloud.AuthScope{
	ProjectName: "service",
	DomainName:  "Default",
}
Contributor

I'm very wary about hard-coding stuff here. I think that:

  1. there may be cases where either the service project or DomainName may be different
  2. we might be interested in obtaining an app cred for a non-service user or for a different project. For example, I know that there is work ongoing upstream to do manila share encryption where the user would use app creds to retrieve a barbican secret.

In any case, I think it makes sense to make this function more generic - maybe call it GetUserClient() and pass in the domain and project name. You could then add these parameters to the app cred spec, and default to "service" and "Default". This will future-proof the interface a bit.
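
A rough sketch of the defaulting idea in this comment, kept independent of gophercloud so it stands alone. The userScope type and scopeWithDefaults function are hypothetical; the PR does not define them, and the eventual spec field names are an open question in the thread.

```go
package main

// userScope is a hypothetical stand-in for the project/domain pair that
// would be passed to a generic GetUserClient-style helper.
type userScope struct {
	ProjectName string
	DomainName  string
}

// scopeWithDefaults fills in the values that are currently hard-coded
// ("service" and "Default") when the caller leaves a field empty.
func scopeWithDefaults(project, domain string) userScope {
	if project == "" {
		project = "service"
	}
	if domain == "" {
		domain = "Default"
	}
	return userScope{ProjectName: project, DomainName: domain}
}
```

Existing callers that pass empty strings would keep today's behaviour, while a future spec could supply its own project or domain.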

Contributor Author

Deydra71 May 28, 2025

I understand, but right now we don't have any spec.ServiceProject or spec.ServiceDomain. That's why this serves only as a means to get a scoped token for our service.

Since FR3 is explicitly about wiring up service-account auth, I suggest we leave this helper as is for now, and add a // TODO pointing at the future work to extend the CRD with project/domain fields and refactor into a generic GetUserClient(…, project, domain…)

The domain name is actually hard coded for admin as well - https://github.com/openstack-k8s-operators/keystone-operator/blob/main/api/v1beta1/keystoneapi.go#L157

Contributor Author

Deydra71 May 28, 2025

there may be cases where either the service project or DomainName may be different

Is this still true if we take into account only service users?

AuthURL: authURL,
Username: userName,
Password: password,
TenantName: "service",
Contributor

ditto comment as above.

@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/keystone-operator for 567,96d9dbbc697740ff8e557b2a881fe43c92920a5e

Contributor Author

Deydra71 commented Oct 2, 2025

Note: We have to add the kuttl test in a separate PR, after the eventual bump in openstack-operator, because until then the KeystoneApplicationCredential CRD would be missing from the environment.

if instance.Status.SecretName != "" {
key := types.NamespacedName{Namespace: instance.Namespace, Name: instance.Status.SecretName}
secret := &corev1.Secret{}
if err := r.Get(ctx, key, secret); err == nil && controllerutil.ContainsFinalizer(secret, acSecretFinalizer) {
Contributor

In theory we do not need to call controllerutil.ContainsFinalizer as the same check implicitly exists in controllerutil.RemoveFinalizer, so you don't need this in the condition

Contributor Author

Good point, didn't realize that the "contains" check is redundant.

if err := r.Get(ctx, key, secret); err == nil && controllerutil.ContainsFinalizer(secret, acSecretFinalizer) {
base := secret.DeepCopy()
controllerutil.RemoveFinalizer(secret, acSecretFinalizer)
_ = r.Patch(ctx, secret, client.MergeFrom(base))
Contributor

should we check a potential error returned from Patch?

}

// Remove finalizer from the AC CR
if controllerutil.ContainsFinalizer(instance, finalizer) {
Contributor

same here, you can attempt removing the finalizer via controllerutil.RemoveFinalizer as it removes it only if it is present.


// createACWithName creates a new AC in Keystone
func (r *ApplicationCredentialReconciler) createACWithName(
logger logr.Logger,
Contributor

should we get logger like:

logger := r.GetLogger(ctx)

and avoid passing it to the function? we can instead pass ctx context.Context.

if err := helperObj.GetClient().Get(ctx, key, secret); err != nil {
return err
}
if !controllerutil.ContainsFinalizer(secret, acSecretFinalizer) {
Contributor

same here, you can use if controllerutil.AddFinalizer(secret ....) { directly (see [1])

[1] https://github.com/openstack-k8s-operators/infra-operator/blob/main/apis/topology/v1beta1/topology_types.go#L217

return defaultRequeue
}

// needsRotation returns (shouldRotate, logMessage)
Contributor

maybe here we can explain better the returned parameters.
In addition I'm not sure you need both bool and string.
The string seems something that feeds the logger, while the boolean is what you need to process for real as a returned value, right? In any case, I'm not asking to change this function, I was just trying to see it in the picture of the whole flow.

Contributor Author

I added a better description of the function. And you are right, the boolean is the return value that drives the logic, and the string serves for logging.

}

// getUserIDFromToken extracts the user ID from the authenticated token
func (r *ApplicationCredentialReconciler) getUserIDFromToken(_ logr.Logger, identClient *gophercloud.ServiceClient, username string) (string, error) {
Contributor

do you need logger here?

// Fetch AC data directly from the Secret
acData, err := keystonev1.GetApplicationCredentialFromSecret(
ctx, client, namespace, "barbican")
Contributor

should this be, instead of "barbican", secretName that we got on L190 (and therefore ac-barbican-secret)?

@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/keystone-operator for 567,c053c03a7f1eb49c24488ee351553e480a0d82d9

Deydra71 force-pushed the underlined-ac-support branch from bcad4d2 to 94de270 November 12, 2025 13:33

Contributor

openshift-ci bot commented Nov 12, 2025

@Deydra71: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/keystone-operator-build-deploy-kuttl | 94de270 | link | true | /test keystone-operator-build-deploy-kuttl
