diff --git a/docs/hugo/content/contributing/_index.md b/docs/hugo/content/contributing/_index.md index 4c381e56a39..324aa176947 100644 --- a/docs/hugo/content/contributing/_index.md +++ b/docs/hugo/content/contributing/_index.md @@ -17,6 +17,7 @@ description: "How to contribute new resources to Azure Service Operator v2" * [Developer Setup]( {{< relref "developer-setup" >}} ). * [Adding a new code-generator resource]( {{< relref "add-a-new-code-generated-resource" >}} ). * [Example walkthrough of adding a new resource version]( {{< relref "upgrade-resource-version" >}}). +* [Extension Points]( {{< relref "extension-points" >}} ) - Customize resource behavior with extension interfaces. * [Generator code overview]( {{< relref "generator-overview" >}} ). * [Running a development version of ASO]( {{< relref "running-a-development-version" >}} ). * [Testing]( {{< relref "testing" >}} ). diff --git a/docs/hugo/content/contributing/extension-points/_index.md b/docs/hugo/content/contributing/extension-points/_index.md new file mode 100644 index 00000000000..bce3eaec25a --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/_index.md @@ -0,0 +1,75 @@ +--- +title: Extension Points +linktitle: Extension Points +weight: 100 +menu: + main: + parent: Contributing +description: "How to extend Azure Service Operator v2 resources with custom behavior" +--- + +Azure Service Operator v2 provides several extension points that allow customization of resource behavior. These extension points enable contributors to compensate for variation in the behavior of resources by adding custom logic at various stages of the resource lifecycle. + +## Overview + +Extension points are Go interfaces defined in `v2/pkg/genruntime/extensions/` that resources can implement to customize their behavior. When a resource implements an extension interface, the controller will invoke the custom logic at the appropriate time during reconciliation. 
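The detect-and-invoke mechanism can be illustrated with a minimal, self-contained sketch. The `payloadModifier` interface and `vaultExtension` type here are invented stand-ins for the real `genruntime` types, but the shape — a compile-time assertion plus a runtime type check — mirrors how the controller discovers extensions:

```go
package main

import "fmt"

// payloadModifier is a toy stand-in for an extension interface such as
// extensions.ARMResourceModifier.
type payloadModifier interface {
	ModifyPayload(payload string) string
}

// vaultExtension is a hypothetical customization type.
type vaultExtension struct{}

// Compile-time assertion, mirroring the `var _ extensions.X = &Y{}` pattern
// used in ASO customizations.
var _ payloadModifier = &vaultExtension{}

func (e *vaultExtension) ModifyPayload(payload string) string {
	return payload + `,"createMode":"default"`
}

// reconcile shows the controller-side type check: if the extension implements
// the interface, its custom logic runs; otherwise the default payload is used.
func reconcile(ext any, payload string) string {
	if m, ok := ext.(payloadModifier); ok {
		return m.ModifyPayload(payload)
	}
	return payload
}

func main() {
	fmt.Println(reconcile(&vaultExtension{}, `{"name":"kv"}`))
	fmt.Println(reconcile(struct{}{}, `{"name":"kv"}`))
}
```

Because the check is structural, a resource opts in simply by implementing the method set; no registration step is required.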
+
+### Extension Implementation Pattern
+
+Extensions are typically implemented in resource-specific files under `v2/api/<group>/customizations/<resource>_extensions.go`. The general pattern is:
+
+1. Declare that your extension type implements the interface:
+   ```go
+   var _ extensions.ARMResourceModifier = &MyResourceExtension{}
+   ```
+
+2. Implement the required method(s) of the interface.
+
+3. The controller automatically detects and uses the extension through a type check at the appropriate time.
+
+### Available Extension Points
+
+The following extension points are available for customizing resource behavior:
+
+| Extension Point | Purpose | When Invoked |
+|-----------------|---------|--------------|
+| [ARMResourceModifier]({{< relref "arm-resource-modifier" >}}) | Modify the ARM payload before sending to Azure | Just before PUT/PATCH to ARM |
+| [Deleter]({{< relref "deleter" >}}) | Customize resource deletion behavior | When resource is being deleted |
+| [ErrorClassifier]({{< relref "error-classifier" >}}) | Classify ARM errors as retryable or fatal | When ARM returns an error |
+| [Importer]({{< relref "importer" >}}) | Customize resource import behavior | During `asoctl import` operations |
+| [KubernetesSecretExporter]({{< relref "kubernetes-secret-exporter" >}}) | Export secrets to Kubernetes | After successful reconciliation |
+| [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}) | Perform post-reconciliation validation | After ARM reconciliation succeeds |
+| [PreReconciliationChecker]({{< relref "pre-reconciliation-checker" >}}) | Validate before reconciling | Before sending requests to ARM |
+| [PreReconciliationOwnerChecker]({{< relref "pre-reconciliation-owner-checker" >}}) | Validate owner state before reconciling | Before any ARM operations (including GET) |
+| [SuccessfulCreationHandler]({{< relref "successful-creation-handler" >}}) | Handle successful resource creation | After initial resource creation |
+
+## When to Use Extensions
+
+Extensions should be used when: + +- The generated code doesn't handle a specific Azure resource quirk +- Additional validation or logic is needed before/after ARM operations +- Custom error handling is required for specific scenarios +- Resources need special handling during creation, deletion, or import +- Secrets or configuration need custom export logic + +Extensions should **not** be used for: + +- Changes that could be made to the generator itself +- Logic that applies to all resources (consider modifying the controller instead) +- Working around bugs in the generator (fix the generator instead) + +## Development Guidelines + +1. **Keep extensions minimal**: Only add logic that cannot be handled by the generator +2. **Document thoroughly**: Explain why the extension is needed +3. **Type assert hub versions**: Include hub type assertions to catch breaking changes +4. **Handle errors gracefully**: Return appropriate error types and messages +5. **Test thoroughly**: Add unit tests for extension logic +6. **Call next**: Most extensions use a chain pattern - remember to call the `next` function + +## Related Resources + +- [Adding a new code-generated resource]({{< relref "../add-a-new-code-generated-resource" >}}) +- [Generator overview]({{< relref "../generator-overview" >}}) +- [Testing]({{< relref "../testing" >}}) diff --git a/docs/hugo/content/contributing/extension-points/arm-resource-modifier.md b/docs/hugo/content/contributing/extension-points/arm-resource-modifier.md new file mode 100644 index 00000000000..3c198570c26 --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/arm-resource-modifier.md @@ -0,0 +1,243 @@ +--- +title: ARMResourceModifier +linktitle: ARMResourceModifier +weight: 10 +--- + +## Description + +`ARMResourceModifier` allows resources to modify the payload that will be sent to Azure Resource Manager (ARM) immediately before it is transmitted. 
This extension point is invoked after the standard resource conversion and validation, but before the HTTP request is made to ARM. + +The interface is invoked during PUT and PATCH operations to ARM, giving the resource an opportunity to make last-minute adjustments to the ARM payload based on runtime conditions, Azure state, or complex business logic that cannot be expressed in the generated code. + +## Interface Definition + +```go +type ARMResourceModifier interface { + ModifyARMResource( + ctx context.Context, + armClient *genericarmclient.GenericClient, + armObj genruntime.ARMResource, + obj genruntime.ARMMetaObject, + kubeClient kubeclient.Client, + resolver *resolver.Resolver, + log logr.Logger, + ) (genruntime.ARMResource, error) +} +``` + +**Parameters:** +- `ctx`: The current operation context +- `armClient`: Client for making additional ARM API calls if needed +- `armObj`: The ARM resource representation about to be sent to Azure +- `obj`: The Kubernetes resource being reconciled +- `kubeClient`: Client for accessing the Kubernetes cluster +- `resolver`: Helper for resolving resource references +- `log`: Logger for the current operation + +**Returns:** +- Modified `genruntime.ARMResource` that will be sent to ARM +- Error if modification fails (will block the ARM request) + +## Motivation + +The `ARMResourceModifier` extension exists to handle cases where: + +1. **Azure resource state affects the payload**: Some Azure resources require different payloads based on their current state in Azure (e.g., creation vs. update) + +2. **Complex conditional logic**: Business logic that depends on multiple factors and cannot be expressed declaratively in the resource schema + +3. **Azure-specific quirks**: Handling special cases or undocumented Azure behavior that varies by resource type + +4. **Dynamic payload construction**: Building parts of the payload at runtime based on information retrieved from Azure or Kubernetes + +5. 
**Soft-delete scenarios**: Resources with soft-delete capabilities (like Key Vault) may need special handling to recover or purge existing resources + +## When to Use + +Implement `ARMResourceModifier` when: + +- ✅ The resource has different creation modes or requires conditional field population +- ✅ You need to query Azure or Kubernetes state to determine the correct payload +- ✅ The resource has soft-delete and requires recovery or purge logic +- ✅ Child resources need to be included in parent payloads (e.g., VNET subnets) +- ✅ Field values must be computed at reconciliation time based on external state + +Do **not** use `ARMResourceModifier` when: + +- ❌ The logic could be handled by field defaults or validation +- ❌ The change should apply to all resources (modify the generator instead) +- ❌ Simple field transformations (use conversion functions) +- ❌ The issue is a generator bug (fix the generator) + +## Example: Key Vault CreateMode Handling + +The Key Vault resource uses `ARMResourceModifier` to handle different creation modes based on whether a soft-deleted vault exists: + +```go +var _ extensions.ARMResourceModifier = &VaultExtension{} + +func (ex *VaultExtension) ModifyARMResource( + ctx context.Context, + armClient *genericarmclient.GenericClient, + armObj genruntime.ARMResource, + obj genruntime.ARMMetaObject, + kubeClient kubeclient.Client, + resolver *resolver.Resolver, + log logr.Logger, +) (genruntime.ARMResource, error) { + // Type assert to the specific resource type + kv, ok := obj.(*keyvault.Vault) + if !ok { + return nil, eris.Errorf( + "Cannot run VaultExtension.ModifyARMResource() with unexpected resource type %T", + obj) + } + + // Type assert hub version to catch breaking changes + var _ conversion.Hub = kv + + // Exit early if no special handling needed + if kv.Spec.Properties == nil || kv.Spec.Properties.CreateMode == nil { + return armObj, nil + } + + // Get resource context + id, err := ex.getOwner(ctx, kv, resolver) + if err != nil { 
+ return nil, eris.Wrap(err, "failed to get and parse resource ID from KeyVault owner") + } + + // Create Azure SDK client to check for soft-deleted vaults + vc, err := armkeyvault.NewVaultsClient(id.SubscriptionID, armClient.Creds(), armClient.ClientOptions()) + if err != nil { + return nil, eris.Wrap(err, "failed to create new VaultsClient") + } + + // Determine the correct create mode based on Azure state + createMode := *kv.Spec.Properties.CreateMode + if createMode == CreateMode_CreateOrRecover { + // Check if soft-deleted vault exists and adjust createMode accordingly + createMode, err = ex.handleCreateOrRecover(ctx, kv, vc, id, log) + if err != nil { + return nil, eris.Wrapf(err, "error checking for existence of soft-deleted KeyVault") + } + } + + if createMode == CreateMode_PurgeThenCreate { + // Purge the soft-deleted vault before creating + err = ex.handlePurgeThenCreate(ctx, kv, vc, log) + if err != nil { + return nil, eris.Wrapf(err, "error purging soft-deleted KeyVault") + } + createMode = CreateMode_Default + } + + // Modify the ARM payload with the determined createMode + spec := armObj.Spec() + err = reflecthelpers.SetProperty(spec, "Properties.CreateMode", &createMode) + if err != nil { + return nil, eris.Wrapf(err, "error setting CreateMode to %s", createMode) + } + + return armObj, nil +} +``` + +**Key aspects of this example:** + +1. **Type assertions**: Both for the resource type and hub version +2. **Early exit**: Returns original payload if no modification needed +3. **External queries**: Checks Azure for soft-deleted vault state +4. **Conditional logic**: Different behavior based on createMode and vault state +5. **Payload modification**: Uses reflection to safely modify the ARM payload +6. **Error handling**: Returns detailed errors that prevent ARM submission + +## Common Patterns + +### Pattern 1: Querying Current Azure State + +```go +func (ex *ResourceExtension) ModifyARMResource(...) 
(genruntime.ARMResource, error) {
+	// Get the resource ID
+	resourceID, hasResourceID := genruntime.GetResourceID(obj)
+	if !hasResourceID {
+		// Not yet claimed, return unmodified
+		return armObj, nil
+	}
+
+	// Query current state from Azure
+	raw := make(map[string]any)
+	_, err := armClient.GetByID(ctx, resourceID, apiVersion, &raw)
+	if err != nil {
+		// Handle NotFound appropriately
+		var responseError *azcore.ResponseError
+		if eris.As(err, &responseError) && responseError.StatusCode == http.StatusNotFound {
+			return armObj, nil
+		}
+		return nil, err
+	}
+
+	// Use Azure state to modify payload
+	// ...
+
+	return armObj, nil
+}
+```
+
+### Pattern 2: Including Child Resources
+
+```go
+func (ex *ResourceExtension) ModifyARMResource(...) (genruntime.ARMResource, error) {
+	// Get existing child resources from Azure
+	children, err := getRawChildCollection(raw, "subnets")
+	if err != nil {
+		return nil, eris.Wrapf(err, "failed to get child resources")
+	}
+
+	// Merge with desired state in payload
+	err = setChildCollection(armObj.Spec(), children, "Subnets")
+	if err != nil {
+		return nil, eris.Wrapf(err, "failed to set child resources")
+	}
+
+	return armObj, nil
+}
+```
+
+### Pattern 3: Conditional Field Population
+
+```go
+func (ex *ResourceExtension) ModifyARMResource(...) (genruntime.ARMResource, error) {
+	// Determine value based on runtime conditions
+	value, err := ex.computeValue(ctx, obj, armClient)
+	if err != nil {
+		return nil, err
+	}
+
+	// Set the computed value on the ARM resource
+	spec := armObj.Spec()
+	err = reflecthelpers.SetProperty(spec, "Properties.FieldName", value)
+	if err != nil {
+		return nil, err
+	}
+
+	return armObj, nil
+}
+```
+
+## Testing
+
+When testing `ARMResourceModifier` extensions:
+
+1. **Test all code paths**: Cover each conditional branch
+2. **Mock ARM responses**: Use envtest with recorded responses
+3. **Verify payload changes**: Assert that the returned armObj has expected modifications
+4. 
**Test error handling**: Ensure errors are properly classified and reported +5. **Test edge cases**: No ResourceID, resource not found, nil fields, etc. + +## Related Extension Points + +- [PreReconciliationChecker]({{< relref "pre-reconciliation-checker" >}}): For validation before ARM operations +- [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}): For validation after ARM operations +- [ErrorClassifier]({{< relref "error-classifier" >}}): For handling errors from modified requests diff --git a/docs/hugo/content/contributing/extension-points/deleter.md b/docs/hugo/content/contributing/extension-points/deleter.md new file mode 100644 index 00000000000..6c50e513ed5 --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/deleter.md @@ -0,0 +1,345 @@ +--- +title: Deleter +linktitle: Deleter +weight: 30 +--- + +## Description + +`Deleter` allows resources to customize how the reconciler deletes them from Azure. This extension is invoked when a resource has a deletion timestamp in Kubernetes (indicating the user wants to delete it) and gives the resource control over the deletion process. + +The interface is called after Kubernetes marks the resource for deletion but before the standard ARM DELETE operation. This allows resources to perform cleanup, handle special deletion scenarios, or coordinate multiple deletion operations. 
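Stripped to its essentials, this wrap-the-default flow is a chain: the extension runs its own step, then delegates to the standard deletion. The sketch below uses simplified signatures (the real `DeleteFunc` carries a context, logger, resolver, ARM client, and resource, and returns `ctrl.Result`); the step names are invented for illustration:

```go
package main

import "fmt"

// deleteFunc is a simplified stand-in for extensions.DeleteFunc; it returns
// the list of steps performed so the ordering is visible.
type deleteFunc func(name string) []string

// withPreDeleteCleanup mimics a Deleter extension: it performs a custom
// pre-deletion step, then delegates to the default deletion via next.
func withPreDeleteCleanup(next deleteFunc) deleteFunc {
	return func(name string) []string {
		steps := []string{"cancel dependent work for " + name} // custom step first
		return append(steps, next(name)...)                    // then the default DELETE
	}
}

func main() {
	// armDelete stands in for the controller's default ARM DELETE call.
	armDelete := func(name string) []string {
		return []string{"ARM DELETE " + name}
	}

	del := withPreDeleteCleanup(armDelete)
	fmt.Println(del("my-alias"))
}
```

The ordering is the important property: the custom step always runs before the default deletion, and skipping the call to `next` skips the ARM DELETE entirely.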
+ +## Interface Definition + +```go +type Deleter interface { + Delete( + ctx context.Context, + log logr.Logger, + resolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + obj genruntime.ARMMetaObject, + next DeleteFunc, + ) (ctrl.Result, error) +} + +type DeleteFunc = func( + ctx context.Context, + log logr.Logger, + resolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + obj genruntime.ARMMetaObject, +) (ctrl.Result, error) +``` + +**Parameters:** +- `ctx`: The current operation context +- `log`: Logger for the current operation +- `resolver`: Helper for resolving resource references +- `armClient`: Client for making ARM API calls +- `obj`: The Kubernetes resource being deleted +- `next`: The default deletion implementation to call + +**Returns:** +- `ctrl.Result`: Reconciliation result (e.g., requeue timing) +- `error`: Error if deletion fails (will prevent finalizer removal) + +## Motivation + +The `Deleter` extension exists to handle cases where: + +1. **Pre-deletion operations**: Resources that need to perform cleanup before being deleted from Azure (e.g., canceling subscriptions, disabling features) + +2. **Multi-step deletion**: Resources requiring multiple API calls in a specific order to delete properly + +3. **Dependent resource cleanup**: Resources that need to ensure dependent resources are handled before deletion + +4. **Soft-delete handling**: Resources with soft-delete capabilities that may need special deletion modes + +5. **Conditional deletion**: Resources that should skip Azure deletion under certain circumstances (e.g., externally managed resources) + +6. 
**Coordinated deletion**: Resources that need to coordinate with other Azure services during deletion + +## When to Use + +Implement `Deleter` when: + +- ✅ Pre-deletion operations must be performed (e.g., canceling, disabling) +- ✅ Multiple Azure API calls are needed for complete deletion +- ✅ Deletion order matters across related resources +- ✅ Custom error handling is needed during deletion +- ✅ Soft-delete or purge operations require special logic +- ✅ The resource should be preserved in Azure in some scenarios + +Do **not** use `Deleter` when: + +- ❌ The standard DELETE operation works correctly +- ❌ You only need to clean up Kubernetes resources (use finalizers) +- ❌ The logic should apply to all resources (modify the controller) +- ❌ You're working around an Azure API bug (fix/report the bug) + +## Example: Subscription Alias Deletion + +The Subscription Alias resource uses `Deleter` to cancel the subscription before deleting the alias: + +```go +var _ extensions.Deleter = &AliasExtension{} + +func (extension *AliasExtension) Delete( + ctx context.Context, + log logr.Logger, + resolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + obj genruntime.ARMMetaObject, + next extensions.DeleteFunc, +) (ctrl.Result, error) { + // Type assert to the specific resource type + typedObj, ok := obj.(*storage.Alias) + if !ok { + return ctrl.Result{}, eris.Errorf( + "cannot run on unknown resource type %T, expected *subscription.Alias", + obj) + } + + // Type assert hub version to catch breaking changes + var _ conversion.Hub = typedObj + + // Get the subscription ID from the alias status + subscriptionID, ok := getSubscriptionID(typedObj) + if !ok { + // SubscriptionID isn't populated, skip cancellation and proceed with deletion + log.V(Status).Info("No subscription ID found, proceeding with alias deletion") + return next(ctx, log, resolver, armClient, obj) + } + + // Create Azure SDK client for subscription operations + subscriptionClient, err := 
armsubscription.NewClient( + armClient.Creds(), + armClient.ClientOptions()) + if err != nil { + return ctrl.Result{}, eris.Wrapf(err, "failed to create subscription client") + } + + // Cancel the subscription before deleting the alias + log.V(Status).Info("Canceling subscription", "subscriptionId", subscriptionID) + _, err = subscriptionClient.Cancel(ctx, subscriptionID, nil) + if err != nil { + return ctrl.Result{}, eris.Wrapf(err, "failed to cancel subscription %q", subscriptionID) + } + + log.V(Status).Info("Subscription canceled, proceeding with alias deletion") + + // Now perform the standard deletion of the alias + return next(ctx, log, resolver, armClient, obj) +} +``` + +**Key aspects of this example:** + +1. **Type assertions**: For both resource type and hub version +2. **Conditional logic**: Checks if subscription ID is available +3. **Pre-deletion operation**: Cancels subscription before deleting alias +4. **Error handling**: Returns errors that prevent finalizer removal +5. **Chain pattern**: Calls `next()` to perform standard deletion +6. 
**Logging**: Clear logging of each step for debugging + +## Common Patterns + +### Pattern 1: Simple Pre-deletion Operation + +```go +func (ex *ResourceExtension) Delete( + ctx context.Context, + log logr.Logger, + resolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + obj genruntime.ARMMetaObject, + next extensions.DeleteFunc, +) (ctrl.Result, error) { + resource := obj.(*myservice.MyResource) + + // Perform cleanup operation + log.V(Status).Info("Performing pre-deletion cleanup") + if err := ex.performCleanup(ctx, resource, armClient); err != nil { + return ctrl.Result{}, eris.Wrap(err, "cleanup failed") + } + + // Proceed with standard deletion + return next(ctx, log, resolver, armClient, obj) +} +``` + +### Pattern 2: Conditional Deletion + +```go +func (ex *ResourceExtension) Delete( + ctx context.Context, + log logr.Logger, + resolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + obj genruntime.ARMMetaObject, + next extensions.DeleteFunc, +) (ctrl.Result, error) { + resource := obj.(*myservice.MyResource) + + // Check if resource should be preserved in Azure + if ex.shouldPreserve(resource) { + log.V(Status).Info("Skipping Azure deletion, resource marked for preservation") + // Return success without calling next() - finalizer will be removed + return ctrl.Result{}, nil + } + + // Proceed with normal deletion + return next(ctx, log, resolver, armClient, obj) +} +``` + +### Pattern 3: Multi-step Deletion with Retry + +```go +func (ex *ResourceExtension) Delete( + ctx context.Context, + log logr.Logger, + resolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + obj genruntime.ARMMetaObject, + next extensions.DeleteFunc, +) (ctrl.Result, error) { + resource := obj.(*myservice.MyResource) + + // Step 1: Disable the resource + if !ex.isDisabled(resource) { + log.V(Status).Info("Disabling resource before deletion") + if err := ex.disableResource(ctx, resource, armClient); err != nil { + return 
ctrl.Result{}, eris.Wrap(err, "failed to disable resource") + } + // Requeue to wait for disable to complete + return ctrl.Result{RequeueAfter: 30 * time.Second}, nil + } + + // Step 2: Wait for dependent resources to be cleaned up + if ex.hasDependents(ctx, resource) { + log.V(Status).Info("Waiting for dependent resources to be deleted") + return ctrl.Result{RequeueAfter: 10 * time.Second}, nil + } + + // Step 3: Proceed with deletion + log.V(Status).Info("All prerequisites met, proceeding with deletion") + return next(ctx, log, resolver, armClient, obj) +} +``` + +### Pattern 4: Soft Delete with Purge Option + +```go +func (ex *ResourceExtension) Delete( + ctx context.Context, + log logr.Logger, + resolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + obj genruntime.ARMMetaObject, + next extensions.DeleteFunc, +) (ctrl.Result, error) { + resource := obj.(*myservice.MyResource) + + // Perform standard deletion (moves to soft-deleted state) + result, err := next(ctx, log, resolver, armClient, obj) + if err != nil { + return result, err + } + + // If purge is requested, purge the soft-deleted resource + if resource.Spec.DeleteMode != nil && *resource.Spec.DeleteMode == "Purge" { + log.V(Status).Info("Purging soft-deleted resource") + if err := ex.purgeResource(ctx, resource, armClient); err != nil { + return ctrl.Result{}, eris.Wrap(err, "failed to purge resource") + } + } + + return ctrl.Result{}, nil +} +``` + +## Deletion Lifecycle + +Understanding the deletion process: + +1. **User deletes resource**: `kubectl delete` sets deletion timestamp +2. **Finalizer blocks deletion**: ASO finalizer prevents immediate removal from Kubernetes +3. **Deleter invoked**: Custom `Delete()` method is called +4. **Pre-deletion logic**: Extension performs custom operations +5. **Standard deletion**: `next()` sends DELETE to ARM +6. **ARM deletion completes**: Azure resource is removed +7. 
**Finalizer removed**: Kubernetes removes the resource + +If any step fails, the process pauses and will retry on the next reconciliation. + +## Error Handling + +Proper error handling in deleters is critical: + +```go +// Transient error - will retry +return ctrl.Result{}, eris.Wrap(err, "temporary failure") + +// Permanent error with condition +return ctrl.Result{}, conditions.NewReadyConditionImpactingError( + err, + conditions.ConditionSeverityError, + conditions.ReasonFailed) + +// Requeue for later retry +return ctrl.Result{RequeueAfter: 1 * time.Minute}, nil + +// Success +return ctrl.Result{}, nil +``` + +## Testing + +When testing `Deleter` extensions: + +1. **Test successful deletion**: Verify the happy path works +2. **Test pre-deletion operations**: Ensure cleanup logic executes +3. **Test error scenarios**: Verify error handling prevents finalizer removal +4. **Test idempotency**: Multiple calls should be safe +5. **Test conditional paths**: Cover all branching logic +6. **Test requeue behavior**: Verify multi-step deletions requeue correctly + +Example test structure: + +```go +func TestMyResourceExtension_Delete(t *testing.T) { + t.Run("successful deletion with cleanup", func(t *testing.T) { + // Test full deletion flow + }) + + t.Run("cleanup fails blocks deletion", func(t *testing.T) { + // Test error handling + }) + + t.Run("conditional preservation", func(t *testing.T) { + // Test skip deletion logic + }) + + t.Run("multi-step deletion", func(t *testing.T) { + // Test requeue behavior + }) +} +``` + +## Important Notes + +- **Always call `next()` unless**: You have a very specific reason to skip Azure deletion +- **Handle missing IDs gracefully**: Resource might not have been created in Azure yet +- **Return appropriate Results**: Use `RequeueAfter` for async operations +- **Log clearly**: Deletion issues are hard to debug, good logging helps +- **Be idempotent**: Deletion might be called multiple times +- **Don't leak resources**: Ensure Azure 
resources are eventually deleted + +## Related Extension Points + +- [PreReconciliationChecker]({{< relref "pre-reconciliation-checker" >}}): Validate before operations +- [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}): Validate after operations +- [SuccessfulCreationHandler]({{< relref "successful-creation-handler" >}}): Handle successful creation diff --git a/docs/hugo/content/contributing/extension-points/error-classifier.md b/docs/hugo/content/contributing/extension-points/error-classifier.md new file mode 100644 index 00000000000..b21a368437c --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/error-classifier.md @@ -0,0 +1,384 @@ +--- +title: ErrorClassifier +linktitle: ErrorClassifier +weight: 40 +--- + +## Description + +`ErrorClassifier` allows resources to customize how the reconciler classifies and responds to errors returned by Azure Resource Manager (ARM). This extension is invoked whenever an ARM API call returns an error, giving the resource a chance to determine whether the error is retryable, fatal, or requires special handling. + +The interface is called in the error handling path of all ARM operations (GET, PUT, PATCH, DELETE). Proper error classification is critical for determining reconciliation behavior and user feedback. 
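Before looking at the interface, it helps to see the kind of decision a classifier makes. The sketch below maps HTTP status codes to a retryable/fatal outcome; it is an illustrative heuristic with invented type names, not ASO's actual classification rules:

```go
package main

import (
	"fmt"
	"net/http"
)

// classification is a toy stand-in for the classifications in genruntime/core.
type classification string

const (
	retryable classification = "Retryable"
	fatal     classification = "Fatal"
)

// classifyStatus sketches a default-style decision: throttling and server-side
// failures are worth retrying, while most 4xx responses mean the request
// itself must change before it can succeed.
func classifyStatus(status int) classification {
	switch {
	case status == http.StatusTooManyRequests: // 429: throttled, retry later
		return retryable
	case status >= 500: // 5xx: transient service failure
		return retryable
	default: // e.g. 400/403/404: fatal without user intervention
		return fatal
	}
}

func main() {
	fmt.Println(classifyStatus(http.StatusServiceUnavailable))
	fmt.Println(classifyStatus(http.StatusBadRequest))
}
```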
+ +## Interface Definition + +```go +type ErrorClassifier interface { + ClassifyError( + cloudError *genericarmclient.CloudError, + apiVersion string, + log logr.Logger, + next ErrorClassifierFunc, + ) (core.CloudErrorDetails, error) +} + +type ErrorClassifierFunc func( + cloudError *genericarmclient.CloudError, +) (core.CloudErrorDetails, error) +``` + +**Parameters:** +- `cloudError`: The error returned from ARM, including HTTP status, code, and message +- `apiVersion`: The ARM API version used for the request +- `log`: Logger for the current operation +- `next`: The default error classification function + +**Returns:** +- `core.CloudErrorDetails`: Structured error information including classification +- `error`: Error if classification itself fails + +The `CloudErrorDetails` structure includes: +- `Classification`: Fatal, Retryable, or other classifications +- `Code`: Error code from Azure +- `Message`: Human-readable error message +- `Retry`: Whether the operation should be retried + +## Motivation + +The `ErrorClassifier` extension exists to handle cases where: + +1. **Resource-specific errors**: Some Azure resources return unique error codes that require special handling + +2. **Retryable conditions**: Certain errors that appear permanent are actually transient for specific resources + +3. **Better user feedback**: Providing more context or clearer messages for resource-specific errors + +4. **API version differences**: Error behavior may vary across API versions for the same resource + +5. **Conditional retry logic**: Some errors are retryable under certain conditions but fatal under others + +The default error classifier handles common patterns, but some resources have unique error behaviors that need custom classification. 
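The delegate-then-override shape that custom classifiers take can be modeled in miniature. Everything below is a simplified stand-in — the real extension receives a `*genericarmclient.CloudError` and returns `core.CloudErrorDetails`, and the error code is hypothetical — but the structure matches: always consult `next` first, then adjust only the cases the resource knows about:

```go
package main

import "fmt"

// details is a trimmed stand-in for core.CloudErrorDetails.
type details struct {
	Classification string
	Code           string
}

type classifierFunc func(code string) details

// defaultClassify stands in for the built-in classifier passed as next.
func defaultClassify(code string) details {
	return details{Classification: "Fatal", Code: code}
}

// retryableOverride mimics a resource's ErrorClassifier: it delegates to
// next, then overrides only the codes it knows are transient.
func retryableOverride(next classifierFunc, transient map[string]bool) classifierFunc {
	return func(code string) details {
		d := next(code)
		if transient[code] {
			d.Classification = "Retryable"
		}
		return d
	}
}

func main() {
	classify := retryableOverride(defaultClassify, map[string]bool{
		"ZoneNotReady": true, // hypothetical transient code
	})
	fmt.Println(classify("ZoneNotReady").Classification)
	fmt.Println(classify("BadConfig").Classification)
}
```

Delegating first keeps the extension's behavior consistent with the default classifier for every error it does not explicitly recognize.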
+ +## When to Use + +Implement `ErrorClassifier` when: + +- ✅ A resource returns specific error codes that need special handling +- ✅ Transient errors are being classified as fatal (or vice versa) +- ✅ Error messages need resource-specific clarification +- ✅ Certain errors should trigger different retry behavior +- ✅ API version affects error classification +- ✅ User feedback for errors needs improvement + +Do **not** use `ErrorClassifier` when: + +- ❌ The default classification handles the error correctly +- ❌ The error handling should apply to all resources (modify default classifier) +- ❌ The issue is with ARM itself (report to Azure) +- ❌ You want to suppress valid errors (fix the root cause instead) + +## Example: DNS Zone Record Error Classification + +DNS Zone records can encounter specific transient errors during provisioning that should be retried: + +```go +var _ extensions.ErrorClassifier = &DnsZonesARecordExtension{} + +func (extension *DnsZonesARecordExtension) ClassifyError( + cloudError *genericarmclient.CloudError, + apiVersion string, + log logr.Logger, + next extensions.ErrorClassifierFunc, +) (core.CloudErrorDetails, error) { + // First, call the default classifier + details, err := next(cloudError) + if err != nil { + return core.CloudErrorDetails{}, err + } + + // Check for DNS-specific retryable errors + if isRetryableDNSZoneRecordError(cloudError) { + log.V(Status).Info( + "Classifying DNS zone record error as retryable", + "Code", cloudError.Code(), + "Message", cloudError.Message()) + + // Override the classification to make it retryable + details.Classification = core.ErrorRetryable + } + + return details, nil +} + +// Helper function to identify retryable DNS errors +func isRetryableDNSZoneRecordError(cloudError *genericarmclient.CloudError) bool { + // DNS zones may temporarily fail during propagation + code := cloudError.Code() + return code == "DnsRecordInGracePeriod" || + code == "DnsZoneNotReady" || + code == "NameServerNotReady" +} 
+``` + +**Key aspects of this example:** + +1. **Delegation to default**: Calls `next()` first to get standard classification +2. **Selective override**: Only modifies classification for specific errors +3. **Logging**: Records the classification decision for debugging +4. **Helper function**: Encapsulates the error detection logic +5. **Error propagation**: Returns errors from `next()` without modification + +## Common Patterns + +### Pattern 1: Marking Specific Errors as Retryable + +```go +func (ex *ResourceExtension) ClassifyError( + cloudError *genericarmclient.CloudError, + apiVersion string, + log logr.Logger, + next extensions.ErrorClassifierFunc, +) (core.CloudErrorDetails, error) { + // Get default classification + details, err := next(cloudError) + if err != nil { + return core.CloudErrorDetails{}, err + } + + // Check for resource-specific transient errors + if ex.isTransientError(cloudError) { + details.Classification = core.ErrorRetryable + details.Retry = true + log.V(Status).Info("Marking error as retryable", "code", cloudError.Code()) + } + + return details, nil +} +``` + +### Pattern 2: API Version-Specific Classification + +```go +func (ex *ResourceExtension) ClassifyError( + cloudError *genericarmclient.CloudError, + apiVersion string, + log logr.Logger, + next extensions.ErrorClassifierFunc, +) (core.CloudErrorDetails, error) { + details, err := next(cloudError) + if err != nil { + return core.CloudErrorDetails{}, err + } + + // Certain errors behave differently in older API versions + if apiVersion == "2021-01-01" && cloudError.Code() == "ResourceQuotaExceeded" { + // In this API version, quota errors are temporary during provisioning + details.Classification = core.ErrorRetryable + details.Message = "Resource quota temporarily exceeded during provisioning, will retry" + } + + return details, nil +} +``` + +### Pattern 3: Enhanced Error Messages + +```go +func (ex *ResourceExtension) ClassifyError( + cloudError *genericarmclient.CloudError, + 
apiVersion string, + log logr.Logger, + next extensions.ErrorClassifierFunc, +) (core.CloudErrorDetails, error) { + details, err := next(cloudError) + if err != nil { + return core.CloudErrorDetails{}, err + } + + // Provide more context for common user errors + if cloudError.Code() == "InvalidParameterValue" { + if strings.Contains(cloudError.Message(), "sku") { + details.Message = fmt.Sprintf( + "Invalid SKU specified: %s. "+ + "Available SKUs for this resource: %s. "+ + "See documentation: https://docs.microsoft.com/...", + extractSKU(cloudError.Message()), + strings.Join(ex.getValidSKUs(), ", ")) + } + } + + return details, nil +} +``` + +### Pattern 4: Conditional Fatal Classification + +```go +func (ex *ResourceExtension) ClassifyError( + cloudError *genericarmclient.CloudError, + apiVersion string, + log logr.Logger, + next extensions.ErrorClassifierFunc, +) (core.CloudErrorDetails, error) { + details, err := next(cloudError) + if err != nil { + return core.CloudErrorDetails{}, err + } + + // Some configuration errors are unrecoverable + if cloudError.Code() == "InvalidConfiguration" { + details.Classification = core.ErrorFatal + details.Retry = false + details.Message = fmt.Sprintf( + "Configuration error: %s. This requires manual intervention.", + cloudError.Message()) + } + + return details, nil +} +``` + +### Pattern 5: Multiple Error Conditions + +```go +func (ex *ResourceExtension) ClassifyError( + cloudError *genericarmclient.CloudError, + apiVersion string, + log logr.Logger, + next extensions.ErrorClassifierFunc, +) (core.CloudErrorDetails, error) { + details, err := next(cloudError) + if err != nil { + return core.CloudErrorDetails{}, err + } + + code := cloudError.Code() + + switch { + case isTransientNetworkError(code): + details.Classification = core.ErrorRetryable + details.Retry = true + case isQuotaError(code): + details.Classification = core.ErrorFatal + details.Message = "Quota exceeded. Please request a quota increase." 
+ case isAuthorizationError(code): + details.Classification = core.ErrorFatal + details.Message = "Insufficient permissions. Check service principal roles." + default: + // Keep default classification + } + + return details, nil +} +``` + +## Error Classifications + +The `core` package defines several error classifications: + +- **`ErrorRetryable`**: The operation should be retried after a delay +- **`ErrorFatal`**: The operation cannot succeed without user intervention +- **`ErrorUnknown`**: Classification uncertain, treated conservatively + +The controller uses these classifications to determine: +- Whether to requeue the reconciliation +- What condition to set on the resource +- How long to wait before retrying +- Whether to emit events + +## Testing + +When testing `ErrorClassifier` extensions: + +1. **Test default pass-through**: Verify normal errors aren't affected +2. **Test specific error codes**: Cover all custom classification logic +3. **Test API version variations**: If version-specific, test all versions +4. **Test error message enhancement**: Verify improved user messages +5. 
**Test classification changes**: Assert correct classification results + +Example test structure: + +```go +func TestMyResourceExtension_ClassifyError(t *testing.T) { + t.Run("default classification unchanged", func(t *testing.T) { + // Test that unrecognized errors pass through + }) + + t.Run("transient error marked retryable", func(t *testing.T) { + // Test custom retryable classification + }) + + t.Run("quota error marked fatal", func(t *testing.T) { + // Test fatal classification + }) + + t.Run("enhanced error message", func(t *testing.T) { + // Test message improvements + }) + + t.Run("api version specific handling", func(t *testing.T) { + // Test version-specific logic + }) +} +``` + +Example test implementation: + +```go +func TestDnsZonesARecordExtension_ClassifyError(t *testing.T) { + extension := &DnsZonesARecordExtension{} + + // Create a mock default classifier + defaultClassifier := func(err *genericarmclient.CloudError) (core.CloudErrorDetails, error) { + return core.CloudErrorDetails{ + Classification: core.ErrorFatal, + Code: err.Code(), + Message: err.Message(), + }, nil + } + + t.Run("DnsRecordInGracePeriod is retryable", func(t *testing.T) { + cloudError := &genericarmclient.CloudError{ + Code: "DnsRecordInGracePeriod", + Message: "DNS record is in grace period", + } + + details, err := extension.ClassifyError( + cloudError, + "2023-01-01", + logr.Discard(), + defaultClassifier) + + assert.NoError(t, err) + assert.Equal(t, core.ErrorRetryable, details.Classification) + }) +} +``` + +## Important Notes + +- **Always call `next()` first**: This ensures default classification logic runs +- **Be conservative**: When in doubt, prefer retryable over fatal +- **Document error codes**: Comment which Azure error codes you're handling +- **Log decisions**: Help debugging by logging classification changes +- **Consider idempotency**: Retryable errors will cause repeated operations +- **Test thoroughly**: Incorrect classification can cause user frustration + 
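The classifications described above drive concrete controller decisions (requeue, condition severity, retry delay). The sketch below is a self-contained illustration of that mapping; the `Classification` and `reconcileAction` types are simplified stand-ins invented for this example, not ASO's actual `core` package:

```go
package main

import (
	"fmt"
	"time"
)

// Classification is a simplified stand-in for the real core error classification type.
type Classification string

const (
	Retryable Classification = "Retryable"
	Fatal     Classification = "Fatal"
	Unknown   Classification = "Unknown"
)

// reconcileAction captures the decisions a controller derives from a classification.
type reconcileAction struct {
	Requeue    bool          // whether to requeue the reconciliation
	RetryAfter time.Duration // how long to wait before retrying
	Condition  string        // severity of the condition set on the resource
}

// actionFor sketches how a classification could drive requeue behavior:
// retryable (and, conservatively, unknown) errors requeue after a delay,
// while fatal errors stop retrying and surface an error condition.
func actionFor(c Classification) reconcileAction {
	switch c {
	case Retryable, Unknown:
		return reconcileAction{Requeue: true, RetryAfter: 30 * time.Second, Condition: "Warning"}
	default:
		return reconcileAction{Requeue: false, Condition: "Error"}
	}
}

func main() {
	for _, c := range []Classification{Retryable, Fatal, Unknown} {
		a := actionFor(c)
		fmt.Printf("%s: requeue=%v after=%s condition=%s\n", c, a.Requeue, a.RetryAfter, a.Condition)
	}
}
```

The key design point mirrored here is that `Unknown` is treated like `Retryable` — when classification is uncertain, retrying is the conservative choice, as noted above.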
+## Common Azure Error Codes + +Some error codes you might encounter: + +- `ResourceNotFound`: Resource doesn't exist (usually retryable for dependencies) +- `ResourceGroupNotFound`: Parent resource group missing +- `InvalidParameterValue`: Configuration error (usually fatal) +- `QuotaExceeded`: Subscription limits reached (fatal) +- `InternalServerError`: Azure service issue (retryable) +- `TooManyRequests`: Rate limiting (retryable with backoff) +- `Conflict`: Concurrent operation conflict (retryable) + +## Related Extension Points + +- [ARMResourceModifier]({{< relref "arm-resource-modifier" >}}): Modify requests before they can error +- [PreReconciliationChecker]({{< relref "pre-reconciliation-checker" >}}): Prevent operations that would error +- [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}): Validate after operations +- [Deleter]({{< relref "deleter" >}}): Custom deletion with error handling diff --git a/docs/hugo/content/contributing/extension-points/importer.md b/docs/hugo/content/contributing/extension-points/importer.md new file mode 100644 index 00000000000..f2c8530c60a --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/importer.md @@ -0,0 +1,396 @@ +--- +title: Importer +linktitle: Importer +weight: 50 +--- + +## Description + +`Importer` allows resources to customize their behavior during the import process when using `asoctl import`. This extension is invoked when importing existing Azure resources into Kubernetes as ASO resources, giving resources the ability to skip import, modify the imported resource, or perform additional validation. + +The interface is called during the import workflow, after retrieving the resource from Azure but before writing it to Kubernetes. This allows resources to filter out unwanted resources or adjust the imported representation. 
+ +## Interface Definition + +```go +type Importer interface { + Import( + ctx context.Context, + rsrc genruntime.ImportableResource, + owner *genruntime.ResourceReference, + next ImporterFunc, + ) (ImportResult, error) +} + +type ImporterFunc func( + ctx context.Context, + resource genruntime.ImportableResource, + owner *genruntime.ResourceReference, +) (ImportResult, error) + +type ImportResult struct { + because string +} + +// Helper functions for creating results +func ImportSucceeded() ImportResult +func ImportSkipped(because string) ImportResult +``` + +**Parameters:** +- `ctx`: The current operation context +- `rsrc`: The resource being imported +- `owner`: Optional owner reference for the resource +- `next`: The default import implementation + +**Returns:** +- `ImportResult`: Indicates success or skip with optional reason +- `error`: Error if import fails + +The `ImportResult` can indicate: +- **Success**: Resource should be imported (`ImportSucceeded()`) +- **Skipped**: Resource should not be imported (`ImportSkipped(reason)`) + +## Motivation + +The `Importer` extension exists to handle cases where: + +1. **System-managed resources**: Resources that are created/managed by Azure and shouldn't be imported + +2. **Default values**: Resources that only have default settings and don't need to be managed + +3. **Read-only resources**: Resources that can't be modified after creation + +4. **Filtering criteria**: Resources that don't meet certain criteria for management + +5. **Import validation**: Resources that need validation before allowing import + +6. **Resource transformation**: Adjusting the imported resource to fit Kubernetes conventions + +Many Azure services automatically create child resources or configurations. These often shouldn't be managed by the operator as they're managed by Azure. 
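The skip-or-succeed chaining that makes this filtering possible can be sketched in isolation. The snippet below uses simplified stand-ins for `ImportResult`, `ImportSucceeded`, and `ImportSkipped` (the `resource`, `importerFunc`, and `importWithFilter` names are invented for illustration — this is not the real genruntime API):

```go
package main

import "fmt"

// ImportResult is a simplified stand-in for extensions.ImportResult.
type ImportResult struct{ because string }

func ImportSucceeded() ImportResult             { return ImportResult{} }
func ImportSkipped(because string) ImportResult { return ImportResult{because: because} }

// Skipped reports whether the import was skipped, and why.
func (r ImportResult) Skipped() (string, bool) { return r.because, r.because != "" }

// resource is a toy resource carrying only the field our filter inspects.
type resource struct{ Source string }

// importerFunc mirrors the shape of the default import implementation.
type importerFunc func(r resource) (ImportResult, error)

// importWithFilter mirrors the extension pattern: run the default import
// first, then decide from the resource's properties whether to skip it.
func importWithFilter(r resource, next importerFunc) (ImportResult, error) {
	result, err := next(r)
	if err != nil {
		return ImportResult{}, err
	}
	if r.Source == "system-default" {
		return ImportSkipped("system-defaults don't need to be imported"), nil
	}
	return result, nil
}

func main() {
	next := func(resource) (ImportResult, error) { return ImportSucceeded(), nil }
	for _, r := range []resource{{Source: "user-override"}, {Source: "system-default"}} {
		result, _ := importWithFilter(r, next)
		if reason, skipped := result.Skipped(); skipped {
			fmt.Printf("%s: skipped (%s)\n", r.Source, reason)
		} else {
			fmt.Printf("%s: imported\n", r.Source)
		}
	}
}
```

Note that errors from `next` abort the import, while a skip result lets the overall import continue — the same distinction the real extension relies on.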
+ +## When to Use + +Implement `Importer` when: + +- ✅ System-managed resources should be excluded from import +- ✅ Default/empty configurations don't need management +- ✅ Read-only resources can't be managed via ASO +- ✅ Import should be conditional based on resource properties +- ✅ Imported resources need validation or transformation +- ✅ Certain resource states should prevent import + +Do **not** use `Importer` when: + +- ❌ All resources of the type should be imported +- ❌ The filtering logic applies to all resources (add to asoctl) +- ❌ You want to modify how resources are retrieved from Azure (that's not import) +- ❌ You're trying to fix generator issues (fix the generator) + +## Example: MySQL Configuration Import Filtering + +The MySQL FlexibleServersConfiguration resource uses `Importer` to skip system-managed configurations: + +```go +var _ extensions.Importer = &FlexibleServersConfigurationExtension{} + +func (extension *FlexibleServersConfigurationExtension) Import( + ctx context.Context, + rsrc genruntime.ImportableResource, + owner *genruntime.ResourceReference, + next extensions.ImporterFunc, +) (extensions.ImportResult, error) { + // Call the default import logic first + result, err := next(ctx, rsrc, owner) + if err != nil { + return extensions.ImportResult{}, err + } + + // Type assert to the specific resource type + config, ok := rsrc.(*api.FlexibleServersConfiguration) + if !ok { + // If it's not our type, just return the default result + return result, nil + } + + // Skip system-managed default configurations + if config.Spec.Source != nil && + *config.Spec.Source == "system-default" { + return extensions.ImportSkipped("system-defaults don't need to be imported"), nil + } + + // Skip read-only configurations that can't be modified + if config.Status.IsReadOnly != nil && + *config.Status.IsReadOnly == api.ConfigurationProperties_IsReadOnly_STATUS_True { + return extensions.ImportSkipped("readonly configuration can't be set"), nil + } + + // Skip 
configurations that match default values + if config.Status.DefaultValue != nil && + config.Status.Value != nil && + *config.Status.DefaultValue == *config.Status.Value { + return extensions.ImportSkipped("default value is the same as the current value"), nil + } + + // Import this configuration + return result, nil +} +``` + +**Key aspects of this example:** + +1. **Chain pattern**: Calls `next()` first to get the imported resource +2. **Type checking**: Safely handles being called on wrong resource type +3. **Multiple filters**: Several conditions that trigger skip +4. **Clear reasons**: Each skip includes explanation for user feedback +5. **Default behavior**: Returns original result when no filters apply + +## Common Patterns + +### Pattern 1: Skip System-Managed Resources + +```go +func (ex *ResourceExtension) Import( + ctx context.Context, + rsrc genruntime.ImportableResource, + owner *genruntime.ResourceReference, + next extensions.ImporterFunc, +) (extensions.ImportResult, error) { + result, err := next(ctx, rsrc, owner) + if err != nil { + return extensions.ImportResult{}, err + } + + resource := rsrc.(*myservice.MyResource) + + // Skip resources managed by the system + if resource.Spec.ManagedBy != nil && *resource.Spec.ManagedBy == "System" { + return extensions.ImportSkipped("system-managed resources shouldn't be imported"), nil + } + + return result, nil +} +``` + +### Pattern 2: Validate Before Import + +```go +func (ex *ResourceExtension) Import( + ctx context.Context, + rsrc genruntime.ImportableResource, + owner *genruntime.ResourceReference, + next extensions.ImporterFunc, +) (extensions.ImportResult, error) { + resource := rsrc.(*myservice.MyResource) + + // Validate resource before importing + if err := ex.validateForImport(resource); err != nil { + return extensions.ImportResult{}, eris.Wrapf(err, + "resource %s failed import validation", resource.Name) + } + + // Proceed with import + return next(ctx, rsrc, owner) +} +``` + +### Pattern 3: 
Skip Based on Property Values + +```go +func (ex *ResourceExtension) Import( + ctx context.Context, + rsrc genruntime.ImportableResource, + owner *genruntime.ResourceReference, + next extensions.ImporterFunc, +) (extensions.ImportResult, error) { + result, err := next(ctx, rsrc, owner) + if err != nil { + return extensions.ImportResult{}, err + } + + resource := rsrc.(*myservice.MyResource) + + // Skip resources in certain states + if resource.Status.ProvisioningState != nil { + state := *resource.Status.ProvisioningState + if state == "Deleting" || state == "Failed" { + return extensions.ImportSkipped( + fmt.Sprintf("resource in %s state shouldn't be imported", state)), nil + } + } + + return result, nil +} +``` + +### Pattern 4: Transform Before Import + +```go +func (ex *ResourceExtension) Import( + ctx context.Context, + rsrc genruntime.ImportableResource, + owner *genruntime.ResourceReference, + next extensions.ImporterFunc, +) (extensions.ImportResult, error) { + resource := rsrc.(*myservice.MyResource) + + // Modify the resource before importing + // For example, clear fields that shouldn't be managed + if resource.Spec.AutoGeneratedField != nil { + resource.Spec.AutoGeneratedField = nil + } + + // Proceed with import of modified resource + return next(ctx, rsrc, owner) +} +``` + +### Pattern 5: Conditional Import Based on Tags + +```go +func (ex *ResourceExtension) Import( + ctx context.Context, + rsrc genruntime.ImportableResource, + owner *genruntime.ResourceReference, + next extensions.ImporterFunc, +) (extensions.ImportResult, error) { + result, err := next(ctx, rsrc, owner) + if err != nil { + return extensions.ImportResult{}, err + } + + resource := rsrc.(*myservice.MyResource) + + // Skip resources with specific tags + if resource.Spec.Tags != nil { + if managed, ok := resource.Spec.Tags["managedBy"]; ok && managed == "external" { + return extensions.ImportSkipped("resource tagged as externally managed"), nil + } + } + + return result, nil +} +``` + 
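When a resource type needs several of these filters at once, expressing them as an ordered list of predicates keeps the `Import` method short and makes the skip reasons easy to audit. A minimal self-contained sketch (the `config`, `skipRule`, and `firstSkipReason` names are toy stand-ins, not the real API):

```go
package main

import "fmt"

// config is a toy stand-in for an imported configuration resource.
type config struct {
	Source   string
	ReadOnly bool
	Value    string
	Default  string
}

// skipRule pairs a predicate with the reason reported when it matches.
type skipRule struct {
	match  func(c config) bool
	reason string
}

// firstSkipReason returns the reason from the first matching rule, if any;
// an Import method can turn this into ImportSkipped(reason).
func firstSkipReason(c config, rules []skipRule) (string, bool) {
	for _, r := range rules {
		if r.match(c) {
			return r.reason, true
		}
	}
	return "", false
}

func main() {
	rules := []skipRule{
		{func(c config) bool { return c.Source == "system-default" }, "system-defaults don't need to be imported"},
		{func(c config) bool { return c.ReadOnly }, "readonly configuration can't be set"},
		{func(c config) bool { return c.Value == c.Default }, "default value is the same as the current value"},
	}
	for _, c := range []config{
		{Source: "user-override", Value: "ON", Default: "OFF"},
		{Source: "system-default"},
	} {
		if reason, skip := firstSkipReason(c, rules); skip {
			fmt.Println("skipped:", reason)
		} else {
			fmt.Println("imported")
		}
	}
}
```

Rule order matters: the first match wins, so put the cheapest or most specific checks first.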
+## Import Workflow + +Understanding the import process: + +1. **Resource discovery**: `asoctl import` queries Azure for resources +2. **Resource retrieval**: Full resource details fetched from Azure +3. **Conversion**: Azure representation converted to ASO format +4. **Importer invoked**: Custom `Import()` method called +5. **Skip or continue**: Based on result, resource is written to YAML or skipped +6. **User notification**: Skipped resources reported to user with reasons + +## Skip vs. Error + +It's important to distinguish between skipping and errors: + +- **Skip (`ImportSkipped`)**: Resource should not be imported, but import continues + - Used for: System resources, defaults, read-only items + - Effect: Resource excluded from output, reason logged + - User sees: "Skipped N resources" with reasons + +- **Error (`return error`)**: Import process failed + - Used for: Validation failures, unexpected states, bugs + - Effect: Import may abort or continue based on error handling + - User sees: Error message, stack trace + +## Testing + +When testing `Importer` extensions: + +1. **Test successful import**: Verify normal resources import correctly +2. **Test skip conditions**: Cover all skip scenarios with reasons +3. **Test error cases**: Verify proper error handling +4. **Test transformation**: If modifying resources, verify changes +5. 
**Test type safety**: Ensure handling of unexpected types + +Example test structure: + +```go +func TestFlexibleServersConfigurationExtension_Import(t *testing.T) { + t.Run("imports user-defined configuration", func(t *testing.T) { + // Test normal import succeeds + }) + + t.Run("skips system-default configuration", func(t *testing.T) { + // Test system-default is skipped + }) + + t.Run("skips readonly configuration", func(t *testing.T) { + // Test readonly is skipped + }) + + t.Run("skips default value configuration", func(t *testing.T) { + // Test default values are skipped + }) +} +``` + +Example test implementation: + +```go +func TestFlexibleServersConfigurationExtension_Import(t *testing.T) { + extension := &FlexibleServersConfigurationExtension{} + + // Create a mock next function that returns success + next := func(ctx context.Context, rsrc genruntime.ImportableResource, owner *genruntime.ResourceReference) (extensions.ImportResult, error) { + return extensions.ImportSucceeded(), nil + } + + t.Run("skips system-default", func(t *testing.T) { + source := "system-default" + config := &api.FlexibleServersConfiguration{ + Spec: api.FlexibleServersConfiguration_Spec{ + Source: &source, + }, + } + + result, err := extension.Import(context.Background(), config, nil, next) + + assert.NoError(t, err) + reason, skipped := result.Skipped() + assert.True(t, skipped) + assert.Contains(t, reason, "system-default") + }) +} +``` + +## User Experience + +When resources are skipped during import, users see output like: + +```bash +$ asoctl import azure-resource myserver --output yaml +Imported 12 resources +Skipped 5 resources: + - myserver-config-1: system-defaults don't need to be imported + - myserver-config-2: readonly configuration can't be set + - myserver-config-3: default value is the same as the current value + ... +``` + +Clear, informative skip reasons help users understand why certain resources weren't imported. 
+ +## Important Notes + +- **Call `next()` first**: Allows default import logic to run +- **Provide clear reasons**: Skip reasons are shown to users +- **Be consistent**: Similar resources should skip for similar reasons +- **Don't skip too broadly**: Only skip resources that truly shouldn't be managed +- **Document skip logic**: Comment why specific conditions trigger skips +- **Test with real imports**: Verify skips work in practice with `asoctl import` + +## Related Extension Points + +- [PreReconciliationChecker]({{< relref "pre-reconciliation-checker" >}}): Validate before reconciliation +- [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}): Validate after reconciliation +- [ARMResourceModifier]({{< relref "arm-resource-modifier" >}}): Modify before sending to Azure + +## Related Tools + +- **asoctl import**: The CLI tool that uses the Importer extension +- **Resource specifications**: Define what makes a resource importable +- **Generator**: Creates the base ImportableResource implementation diff --git a/docs/hugo/content/contributing/extension-points/kubernetes-secret-exporter.md b/docs/hugo/content/contributing/extension-points/kubernetes-secret-exporter.md new file mode 100644 index 00000000000..bcf53021b8f --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/kubernetes-secret-exporter.md @@ -0,0 +1,466 @@ +--- +title: KubernetesSecretExporter +linktitle: KubernetesSecretExporter +weight: 60 +--- + +## Description + +`KubernetesSecretExporter` allows resources to export secrets from Azure into Kubernetes Secret objects. This extension is invoked after a resource has been successfully created or updated in Azure, giving the resource the ability to retrieve sensitive data (like connection strings, keys, passwords) and make them available in Kubernetes. + +The interface is called during the reconciliation process, after ARM operations succeed but before the Ready condition is marked successful. 
This ensures secrets are available before dependent resources can use them. + +## Interface Definition + +```go +type KubernetesSecretExporter interface { + ExportKubernetesSecrets( + ctx context.Context, + obj MetaObject, + additionalSecrets set.Set[string], + armClient *genericarmclient.GenericClient, + log logr.Logger, + ) (*KubernetesSecretExportResult, error) +} + +type KubernetesSecretExportResult struct { + // Objs is the set of Secret objects to create/update in Kubernetes + Objs []client.Object + + // RawSecrets contains raw secret values for use in secret expressions + RawSecrets map[string]string +} +``` + +**Parameters:** +- `ctx`: The current operation context +- `obj`: The Kubernetes resource that owns the secrets +- `additionalSecrets`: Set of secret names to retrieve (for secret expressions) +- `armClient`: Client for making ARM API calls to retrieve secrets +- `log`: Logger for the current operation + +**Returns:** +- `KubernetesSecretExportResult`: Contains Secret objects to create and raw secret values +- `error`: Error if secret export fails (will set condition on resource) + +## Motivation + +The `KubernetesSecretExporter` extension exists to handle cases where: + +1. **Azure-generated secrets**: Resources that generate secrets in Azure (keys, connection strings, passwords) that need to be accessible in Kubernetes + +2. **Status-based secrets**: Secret values that are part of the resource's status and should be exported to Secrets + +3. **API-retrieved secrets**: Secrets that require separate API calls to retrieve (not in the resource response) + +4. **Derived secrets**: Secret values that need to be computed or combined from Azure data + +5. **User-controlled export**: Users can specify which secrets they want exported via `operatorSpec.secrets` + +Many Azure resources generate credentials, keys, or connection strings that applications need. 
Rather than requiring users to manually retrieve these from Azure, ASO can export them directly to Kubernetes Secrets. + +## When to Use + +Implement `KubernetesSecretExporter` when: + +- ✅ The resource generates secrets in Azure (keys, passwords, tokens) +- ✅ Secret values are available in the resource status +- ✅ Additional ARM API calls are needed to retrieve secrets +- ✅ Users need programmatic access to resource credentials +- ✅ Secrets need to be made available to other Kubernetes resources + +Do **not** use `KubernetesSecretExporter` when: + +- ❌ The resource doesn't have any secrets to export +- ❌ Secrets are already available through standard Kubernetes mechanisms +- ❌ You're exposing non-sensitive configuration (use ConfigMaps instead) +- ❌ The generator can handle it automatically (enhance the generator) + +## Example: User Assigned Identity Secret Export + +The UserAssignedIdentity resource exports identity IDs as secrets: + +```go +var _ genruntime.KubernetesSecretExporter = &UserAssignedIdentityExtension{} + +func (ext *UserAssignedIdentityExtension) ExportKubernetesSecrets( + ctx context.Context, + obj genruntime.MetaObject, + _ set.Set[string], + armClient *genericarmclient.GenericClient, + log logr.Logger, +) (*genruntime.KubernetesSecretExportResult, error) { + // Type assert to the specific resource type + typedObj, ok := obj.(*v20230131s.UserAssignedIdentity) + if !ok { + return nil, fmt.Errorf( + "cannot run on unknown resource type %T, expected *v20230131s.UserAssignedIdentity", + obj) + } + + // Type assert hub version to catch breaking changes + var _ conversion.Hub = typedObj + + // Check if user requested any secrets + hasSecrets := secretsSpecified(typedObj) + if !hasSecrets { + log.V(Debug).Info("No secrets retrieval to perform as operatorSpec.Secrets is empty") + return nil, nil + } + + // Create a collector to gather secret values + collector := secrets.NewCollector(typedObj.Namespace) + + // Add each requested secret from status + if 
typedObj.Spec.OperatorSpec != nil && typedObj.Spec.OperatorSpec.Secrets != nil { + collector.AddValue( + typedObj.Spec.OperatorSpec.Secrets.ClientId, + to.Value(typedObj.Status.ClientId)) + collector.AddValue( + typedObj.Spec.OperatorSpec.Secrets.PrincipalId, + to.Value(typedObj.Status.PrincipalId)) + collector.AddValue( + typedObj.Spec.OperatorSpec.Secrets.TenantId, + to.Value(typedObj.Status.TenantId)) + } + + // Convert collected values to Secret objects + result, err := collector.Values() + if err != nil { + return nil, err + } + + return &genruntime.KubernetesSecretExportResult{ + Objs: secrets.SliceToClientObjectSlice(result), + }, nil +} + +// Helper function to check if secrets were requested +func secretsSpecified(obj *v20230131s.UserAssignedIdentity) bool { + if obj.Spec.OperatorSpec == nil || obj.Spec.OperatorSpec.Secrets == nil { + return false + } + + specSecrets := obj.Spec.OperatorSpec.Secrets + return specSecrets.ClientId != nil || + specSecrets.PrincipalId != nil || + specSecrets.TenantId != nil +} +``` + +**Key aspects of this example:** + +1. **Early exit**: Returns nil if no secrets were requested +2. **Type safety**: Type assertions for resource and hub version +3. **Collector pattern**: Uses secrets.Collector to build Secret objects +4. **Status values**: Retrieves values from resource status +5. **User control**: Only exports secrets the user specified in operatorSpec +6. 
**Namespace scoping**: Secrets created in the same namespace as the resource
+
+## Common Patterns
+
+### Pattern 1: Export Status-Based Secrets
+
+```go
+func (ex *ResourceExtension) ExportKubernetesSecrets(
+    ctx context.Context,
+    obj genruntime.MetaObject,
+    additionalSecrets set.Set[string],
+    armClient *genericarmclient.GenericClient,
+    log logr.Logger,
+) (*genruntime.KubernetesSecretExportResult, error) {
+    resource := obj.(*myservice.MyResource)
+
+    // Check if secrets were requested
+    if resource.Spec.OperatorSpec == nil ||
+        resource.Spec.OperatorSpec.Secrets == nil {
+        return nil, nil
+    }
+
+    // Create collector
+    collector := secrets.NewCollector(resource.Namespace)
+
+    // Add secrets from status; a distinct local name avoids
+    // shadowing the secrets helper package used below
+    specSecrets := resource.Spec.OperatorSpec.Secrets
+    collector.AddValue(specSecrets.ConnectionString, resource.Status.ConnectionString)
+    collector.AddValue(specSecrets.PrimaryKey, resource.Status.PrimaryKey)
+    collector.AddValue(specSecrets.SecondaryKey, resource.Status.SecondaryKey)
+
+    // Build Secret objects
+    result, err := collector.Values()
+    if err != nil {
+        return nil, err
+    }
+
+    return &genruntime.KubernetesSecretExportResult{
+        Objs: secrets.SliceToClientObjectSlice(result),
+    }, nil
+}
+```
+
+### Pattern 2: Retrieve Secrets via ARM API
+
+Some secrets require additional API calls:
+
+```go
+func (ex *ResourceExtension) ExportKubernetesSecrets(
+    ctx context.Context,
+    obj genruntime.MetaObject,
+    additionalSecrets set.Set[string],
+    armClient *genericarmclient.GenericClient,
+    log logr.Logger,
+) (*genruntime.KubernetesSecretExportResult, error) {
+    resource := obj.(*myservice.MyResource)
+
+    if resource.Spec.OperatorSpec == nil ||
+        resource.Spec.OperatorSpec.Secrets == nil {
+        return nil, nil
+    }
+
+    // Get the resource ID for API calls
+    resourceID, hasID := genruntime.GetResourceID(resource)
+    if !hasID {
+        return nil, fmt.Errorf("resource doesn't have an ID yet")
+    }
+
+    // Make ARM API call to retrieve keys
+    var keysResponse MyResourceKeysResponse
+
apiVersion := "2023-01-01"
+    keysURL := fmt.Sprintf("%s/listKeys", resourceID)
+
+    _, err := armClient.PostByIDWithResponse(
+        ctx,
+        keysURL,
+        apiVersion,
+        nil, // request body
+        &keysResponse)
+    if err != nil {
+        return nil, fmt.Errorf("failed to retrieve keys: %w", err)
+    }
+
+    // Collect secrets; a distinct local name avoids shadowing the secrets helper package
+    collector := secrets.NewCollector(resource.Namespace)
+    specSecrets := resource.Spec.OperatorSpec.Secrets
+    collector.AddValue(specSecrets.Key1, keysResponse.Key1)
+    collector.AddValue(specSecrets.Key2, keysResponse.Key2)
+
+    result, err := collector.Values()
+    if err != nil {
+        return nil, err
+    }
+
+    return &genruntime.KubernetesSecretExportResult{
+        Objs: secrets.SliceToClientObjectSlice(result),
+    }, nil
+}
+```
+
+### Pattern 3: Combine Multiple Sources
+
+```go
+func (ex *ResourceExtension) ExportKubernetesSecrets(
+    ctx context.Context,
+    obj genruntime.MetaObject,
+    additionalSecrets set.Set[string],
+    armClient *genericarmclient.GenericClient,
+    log logr.Logger,
+) (*genruntime.KubernetesSecretExportResult, error) {
+    resource := obj.(*myservice.MyResource)
+
+    if resource.Spec.OperatorSpec == nil ||
+        resource.Spec.OperatorSpec.Secrets == nil {
+        return nil, nil
+    }
+
+    collector := secrets.NewCollector(resource.Namespace)
+    // A distinct local name avoids shadowing the secrets helper package
+    specSecrets := resource.Spec.OperatorSpec.Secrets
+
+    // From status
+    collector.AddValue(specSecrets.Endpoint, resource.Status.Endpoint)
+
+    // From ARM API
+    keys, err := ex.retrieveKeys(ctx, resource, armClient)
+    if err != nil {
+        return nil, err
+    }
+    collector.AddValue(specSecrets.PrimaryKey, keys.Primary)
+
+    // Computed value
+    connectionString := fmt.Sprintf(
+        "Endpoint=%s;SharedAccessKey=%s",
+        resource.Status.Endpoint,
+        keys.Primary)
+    collector.AddValue(specSecrets.ConnectionString, connectionString)
+
+    result, err := collector.Values()
+    if err != nil {
+        return nil, err
+    }
+
+    return &genruntime.KubernetesSecretExportResult{
+        Objs: secrets.SliceToClientObjectSlice(result),
+    }, nil
+}
+```
+
+### Pattern 4: Using RawSecrets for Secret 
Expressions + +For advanced scenarios with secret expressions: + +```go +func (ex *ResourceExtension) ExportKubernetesSecrets( + ctx context.Context, + obj genruntime.MetaObject, + additionalSecrets set.Set[string], + armClient *genericarmclient.GenericClient, + log logr.Logger, +) (*genruntime.KubernetesSecretExportResult, error) { + resource := obj.(*myservice.MyResource) + + // Regular secret export + collector := secrets.NewCollector(resource.Namespace) + if resource.Spec.OperatorSpec != nil && + resource.Spec.OperatorSpec.Secrets != nil { + collector.AddValue( + resource.Spec.OperatorSpec.Secrets.PrimaryKey, + resource.Status.PrimaryKey) + } + + secretObjs, err := collector.Values() + if err != nil { + return nil, err + } + + // Raw secrets for expressions (if requested) + rawSecrets := make(map[string]string) + if additionalSecrets.Contains("adminCredentials") { + creds, err := ex.getAdminCredentials(ctx, resource, armClient) + if err != nil { + return nil, err + } + rawSecrets["adminCredentials"] = creds + } + + return &genruntime.KubernetesSecretExportResult{ + Objs: secrets.SliceToClientObjectSlice(secretObjs), + RawSecrets: rawSecrets, + }, nil +} +``` + +## User Specification of Secrets + +Users control which secrets to export through the resource spec: + +```yaml +apiVersion: managedidentity.azure.com/v1api20230131 +kind: UserAssignedIdentity +metadata: + name: my-identity + namespace: default +spec: + location: westus2 + owner: + name: my-rg + operatorSpec: + secrets: + clientId: + name: identity-secret + key: clientId + principalId: + name: identity-secret + key: principalId + tenantId: + name: identity-secret + key: tenantId +``` + +This creates a Secret named `identity-secret` with three keys. + +## Secret Lifecycle + +Understanding the secret export process: + +1. **Resource reconciled**: ARM operations complete successfully +2. **Extension invoked**: `ExportKubernetesSecrets()` called +3. **Secrets retrieved**: Extension gathers secret values +4. 
**Secrets created**: Controller creates/updates Secret objects in Kubernetes +5. **Ready condition**: Resource marked as Ready +6. **Updates**: Secrets updated on each reconciliation if values change + +## Error Handling + +Proper error handling ensures users know when secret export fails: + +```go +// Retrieval failed +return nil, fmt.Errorf("failed to retrieve keys from Azure: %w", err) + +// Invalid configuration +return nil, fmt.Errorf("operatorSpec.secrets.connectionString requires a destination") + +// Partial failure - log warning and continue +log.V(Warning).Info("Failed to retrieve optional secret", "error", err) +// Continue with other secrets +``` + +## Testing + +When testing `KubernetesSecretExporter` extensions: + +1. **Test no secrets requested**: Verify nil return when nothing specified +2. **Test secret creation**: Verify Secret objects are correctly created +3. **Test multiple secrets**: Verify all requested secrets are exported +4. **Test API failures**: Verify error handling for ARM call failures +5. **Test secret updates**: Verify secrets update when values change +6. **Test secret references**: Verify correct names and keys + +Example test structure: + +```go +func TestUserAssignedIdentityExtension_ExportKubernetesSecrets(t *testing.T) { + t.Run("no secrets when not specified", func(t *testing.T) { + // Test nil result when operatorSpec.secrets is nil + }) + + t.Run("exports requested secrets", func(t *testing.T) { + // Test secret creation with values + }) + + t.Run("handles missing status values", func(t *testing.T) { + // Test behavior when status fields are nil + }) +} +``` + +## Security Considerations + +When implementing secret export: + +1. **Only export when requested**: Don't create secrets unless user specified them +2. **Use appropriate permissions**: Ensure proper RBAC for secret creation +3. **Validate destinations**: Ensure secret references are valid +4. **Handle nil values**: Don't export nil/empty secrets +5. 
**Log carefully**: Don't log secret values +6. **Follow naming conventions**: Use predictable, documented secret formats + +## Important Notes + +- **Return nil for no secrets**: If no secrets requested, return `nil, nil` +- **Secrets updated on reconciliation**: Values refresh with each reconciliation +- **Namespace scoping**: Secrets created in same namespace as resource +- **Secret format**: Follow Kubernetes Secret conventions (base64 encoding handled automatically) +- **Error impact**: Export failures prevent Ready condition +- **User control**: Only export what users explicitly request + +## Related Extension Points + +- [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}): Runs at similar time +- [SuccessfulCreationHandler]({{< relref "successful-creation-handler" >}}): Also runs after creation +- [ARMResourceModifier]({{< relref "arm-resource-modifier" >}}): Runs before ARM operations + +## Related Interfaces + +- **KubernetesConfigExporter**: Similar interface for exporting ConfigMaps instead of Secrets +- **Secrets package**: Helper utilities for building Secret objects +- **SecretDestination**: Spec type for specifying where secrets should be exported diff --git a/docs/hugo/content/contributing/extension-points/post-reconciliation-checker.md b/docs/hugo/content/contributing/extension-points/post-reconciliation-checker.md new file mode 100644 index 00000000000..40e60092f30 --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/post-reconciliation-checker.md @@ -0,0 +1,430 @@ +--- +title: PostReconciliationChecker +linktitle: PostReconciliationChecker +weight: 70 +--- + +## Description + +`PostReconciliationChecker` allows resources to perform validation or checks after ARM reconciliation completes successfully. This extension is invoked after Azure operations succeed but before the Ready condition is marked successful, giving resources the ability to defer the Ready status until additional conditions are met. 
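Conceptually, the controller discovers the extension through a type assertion and lets its verdict gate the Ready condition. The following self-contained sketch illustrates that flow; the types here are hypothetical stand-ins for the real `genruntime` interfaces, not the actual controller code:

```go
package main

import "fmt"

// Stand-ins for the real genruntime types, simplified for illustration.
type MetaObject interface{ GetName() string }

type PostReconcileCheckResult struct {
	Ready  bool
	Reason string
}

type PostReconciliationChecker interface {
	PostReconcileCheck(obj MetaObject) (PostReconcileCheckResult, error)
}

// runPostReconcileCheck mimics how the controller gates the Ready condition:
// if the resource's extension implements the interface, its verdict decides
// readiness; otherwise ARM success alone marks the resource Ready.
func runPostReconcileCheck(ext any, obj MetaObject) (PostReconcileCheckResult, error) {
	if checker, ok := ext.(PostReconciliationChecker); ok {
		return checker.PostReconcileCheck(obj)
	}
	return PostReconcileCheckResult{Ready: true}, nil
}

// A toy resource and extension that defer readiness until approval.
type widget struct {
	name     string
	approved bool
}

func (w widget) GetName() string { return w.name }

type widgetExtension struct{}

func (widgetExtension) PostReconcileCheck(obj MetaObject) (PostReconcileCheckResult, error) {
	if !obj.(widget).approved {
		return PostReconcileCheckResult{Reason: "waiting for approval"}, nil
	}
	return PostReconcileCheckResult{Ready: true}, nil
}

func main() {
	result, _ := runPostReconcileCheck(widgetExtension{}, widget{name: "w1"})
	fmt.Println(result.Ready, result.Reason) // false waiting for approval
}
```

In ASO itself the same dispatch happens inside the reconciler, with the real types and the full parameter list shown in the interface definition.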
+ +The interface is called at the end of the reconciliation process, after the resource has been successfully created or updated in Azure. It provides a final opportunity to verify that the resource is truly ready for use. + +## Interface Definition + +```go +type PostReconciliationChecker interface { + PostReconcileCheck( + ctx context.Context, + obj MetaObject, + owner MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next PostReconcileCheckFunc, + ) (PostReconcileCheckResult, error) +} + +type PostReconcileCheckFunc func( + ctx context.Context, + obj MetaObject, + owner MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, +) (PostReconcileCheckResult, error) + +// Helper functions for creating results +func PostReconcileCheckResultSuccess() PostReconcileCheckResult +func PostReconcileCheckResultFailure(reason string) PostReconcileCheckResult +``` + +**Parameters:** +- `ctx`: The current operation context +- `obj`: The resource that was reconciled (with fresh status) +- `owner`: The parent resource (can be nil for root resources) +- `resourceResolver`: Helper for resolving resource references +- `armClient`: Client for making additional ARM API calls if needed +- `log`: Logger for the current operation +- `next`: The default check implementation (usually returns success) + +**Returns:** +- `PostReconcileCheckResult`: Indicates success or failure with optional reason +- `error`: Error if the check itself fails (not the same as check failure) + +## Motivation + +The `PostReconciliationChecker` extension exists to handle cases where: + +1. **Async operations**: Azure resource is created but still initializing or provisioning + +2. **Manual approval required**: Resources that require external approval before being considered ready + +3. **Dependent state**: Resource readiness depends on the state of other Azure resources + +4. 
**Complex readiness criteria**: Determining if a resource is "ready" requires more than just ARM success + +5. **Validation**: Additional checks needed to ensure resource is in expected state + +6. **Gradual rollout**: Resource created but waiting for deployment to complete + +The default behavior marks resources as Ready immediately after successful ARM operations. Some resources need to wait for additional conditions before being truly ready. + +## When to Use + +Implement `PostReconciliationChecker` when: + +- ✅ Resource continues initializing after ARM operations complete +- ✅ Manual approval or external processes must complete first +- ✅ Dependent resources need to reach certain states +- ✅ Complex validation is needed to determine readiness +- ✅ Resource provisioning happens asynchronously in Azure +- ✅ Users should wait before using the resource + +Do **not** use `PostReconciliationChecker` when: + +- ❌ ARM success means the resource is ready +- ❌ The check should happen before reconciliation (use PreReconciliationChecker) +- ❌ You're validating the spec (do that in webhooks) +- ❌ You're modifying the resource (use other extensions) + +## Example: Private Endpoint Approval Check + +The PrivateEndpoint resource uses `PostReconciliationChecker` to wait for manual connection approval: + +```go +var _ extensions.PostReconciliationChecker = &PrivateEndpointExtension{} + +func (extension *PrivateEndpointExtension) PostReconcileCheck( + _ context.Context, + obj genruntime.MetaObject, + _ genruntime.MetaObject, + _ *resolver.Resolver, + _ *genericarmclient.GenericClient, + _ logr.Logger, + _ extensions.PostReconcileCheckFunc, +) (extensions.PostReconcileCheckResult, error) { + // Type assert to the specific resource type + endpoint, ok := obj.(*network.PrivateEndpoint) + if !ok { + return extensions.PostReconcileCheckResult{}, + eris.Errorf("cannot run on unknown resource type %T, expected *network.PrivateEndpoint", obj) + } + + // Type assert hub version to catch 
breaking changes + var _ conversion.Hub = endpoint + + // Check if any manual connections are pending approval + var reqApprovals []string + if connections := endpoint.Status.ManualPrivateLinkServiceConnections; connections != nil { + for _, connection := range connections { + if *connection.PrivateLinkServiceConnectionState.Status != "Approved" { + reqApprovals = append(reqApprovals, *connection.Id) + } + } + } + + // If approvals are needed, fail the check with a helpful message + if len(reqApprovals) > 0 { + return extensions.PostReconcileCheckResultFailure( + fmt.Sprintf( + "Private connection(s) '%q' to the PrivateEndpoint requires approval", + reqApprovals)), nil + } + + // All connections approved, resource is ready + return extensions.PostReconcileCheckResultSuccess(), nil +} +``` + +**Key aspects of this example:** + +1. **Type assertions**: For both resource type and hub version +2. **Status inspection**: Examines resource status to determine readiness +3. **Clear failure messages**: Provides actionable information to users +4. **No ARM calls**: Uses existing status data +5. **Conditional result**: Returns success or failure based on state +6. 
**No error return**: Check itself succeeded, but resource not ready yet + +## Common Patterns + +### Pattern 1: Check Resource State + +```go +func (ex *ResourceExtension) PostReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PostReconcileCheckFunc, +) (extensions.PostReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Check if resource is in expected state + if resource.Status.ProvisioningState == nil { + return extensions.PostReconcileCheckResultFailure( + "provisioning state not yet available"), nil + } + + state := *resource.Status.ProvisioningState + if state != "Succeeded" { + return extensions.PostReconcileCheckResultFailure( + fmt.Sprintf("resource still provisioning: %s", state)), nil + } + + return extensions.PostReconcileCheckResultSuccess(), nil +} +``` + +### Pattern 2: Query Azure for Additional State + +```go +func (ex *ResourceExtension) PostReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PostReconcileCheckFunc, +) (extensions.PostReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Get resource ID for additional queries + resourceID, hasID := genruntime.GetResourceID(resource) + if !hasID { + return extensions.PostReconcileCheckResultFailure( + "resource ID not available"), nil + } + + // Query Azure for deployment status + deploymentReady, err := ex.checkDeploymentStatus(ctx, resourceID, armClient) + if err != nil { + // Check failed (not the same as check returning failure) + return extensions.PostReconcileCheckResult{}, err + } + + if !deploymentReady { + return extensions.PostReconcileCheckResultFailure( + "deployment still in progress"), nil + } + + 
return extensions.PostReconcileCheckResultSuccess(), nil +} +``` + +### Pattern 3: Check Dependent Resources + +```go +func (ex *ResourceExtension) PostReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PostReconcileCheckFunc, +) (extensions.PostReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Check if dependent resources are ready + if resource.Spec.DependencyReference != nil { + ready, err := ex.isDependencyReady(ctx, resource.Spec.DependencyReference, resourceResolver) + if err != nil { + return extensions.PostReconcileCheckResult{}, err + } + + if !ready { + return extensions.PostReconcileCheckResultFailure( + "waiting for dependent resource to be ready"), nil + } + } + + return extensions.PostReconcileCheckResultSuccess(), nil +} +``` + +### Pattern 4: Timeout After Extended Wait + +```go +func (ex *ResourceExtension) PostReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PostReconcileCheckFunc, +) (extensions.PostReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Check how long we've been waiting + if resource.Status.CreationTime != nil { + waitTime := time.Since(resource.Status.CreationTime.Time) + if waitTime > 30*time.Minute { + // Give up waiting, but don't fail permanently + log.V(Warning).Info( + "Resource taking longer than expected to be ready", + "waitTime", waitTime) + // Could return success here to stop blocking, or keep waiting + } + } + + // Perform the actual readiness check + if !ex.isReady(resource) { + return extensions.PostReconcileCheckResultFailure( + "resource initialization in progress"), nil + } + + return 
extensions.PostReconcileCheckResultSuccess(), nil +} +``` + +### Pattern 5: Call Next for Chain + +```go +func (ex *ResourceExtension) PostReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PostReconcileCheckFunc, +) (extensions.PostReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Perform custom check first + if !ex.customReadinessCheck(resource) { + return extensions.PostReconcileCheckResultFailure( + "custom readiness check failed"), nil + } + + // Call next checker in the chain (if any) + // The default implementation always returns success + return next(ctx, obj, owner, resourceResolver, armClient, log) +} +``` + +## Check Results + +The extension returns one of two results: + +### Success +```go +return extensions.PostReconcileCheckResultSuccess(), nil +``` +- Resource is ready +- Ready condition will be marked True +- Reconciliation completes successfully + +### Failure +```go +return extensions.PostReconcileCheckResultFailure("reason for not ready"), nil +``` +- Resource is not yet ready +- Warning condition set with the provided reason +- Reconciliation will retry later +- Resource requeued for another check + +### Error +```go +return extensions.PostReconcileCheckResult{}, fmt.Errorf("check failed: %w", err) +``` +- The check itself failed (couldn't determine readiness) +- Error condition set on resource +- Reconciliation will retry later + +## Failure vs. 
Error + +It's critical to distinguish between check failure and check error: + +- **Failure**: "The resource is not ready yet" (expected, will retry) + - Example: "waiting for approval", "deployment in progress" + - Returns `PostReconcileCheckResultFailure(reason), nil` + +- **Error**: "I couldn't determine if the resource is ready" (unexpected) + - Example: "failed to query Azure", "invalid state" + - Returns `PostReconcileCheckResult{}, error` + +## Reconciliation Impact + +When a post-reconciliation check fails: + +1. **Condition set**: Warning condition added to resource status +2. **Reconciliation requeued**: Controller will try again later +3. **No Ready condition**: Resource not marked as Ready +4. **User visibility**: Users see the warning condition with reason + +This continues until the check succeeds or the resource is deleted. + +## Testing + +When testing `PostReconciliationChecker` extensions: + +1. **Test success case**: Verify check passes when resource is ready +2. **Test failure cases**: Cover all scenarios that should defer readiness +3. **Test error handling**: Verify proper error returns +4. **Test with real status**: Use realistic status values +5. 
**Test retry behavior**: Verify requeue happens correctly + +Example test structure: + +```go +func TestPrivateEndpointExtension_PostReconcileCheck(t *testing.T) { + t.Run("success when all connections approved", func(t *testing.T) { + // Test success path + }) + + t.Run("failure when approval pending", func(t *testing.T) { + // Test failure with pending approval + }) + + t.Run("failure with multiple pending approvals", func(t *testing.T) { + // Test multiple pending approvals + }) + + t.Run("success with no manual connections", func(t *testing.T) { + // Test when no manual connections exist + }) +} +``` + +## Important Notes + +- **Call `next()` if appropriate**: Allows for check chaining (rarely needed) +- **Don't modify the resource**: This is for validation only +- **Be patient**: Checks may run many times before succeeding +- **Provide clear reasons**: Failure messages shown to users +- **Log appropriately**: Help debugging without noise +- **Handle nil values**: Status fields may not be populated yet +- **Consider performance**: This runs on every reconciliation + +## Common Readiness Scenarios + +Here are typical reasons to use post-reconciliation checks: + +1. **Async provisioning**: Azure resource created but still initializing +2. **Manual approval**: External human approval required +3. **Deployment rollout**: Waiting for deployment to all regions/instances +4. **Certificate generation**: Waiting for certs to be issued +5. **DNS propagation**: Waiting for DNS changes to propagate +6. 
**Dependent services**: Waiting for related services to initialize + +## Related Extension Points + +- [PreReconciliationChecker]({{< relref "pre-reconciliation-checker" >}}): Check before reconciliation +- [KubernetesSecretExporter]({{< relref "kubernetes-secret-exporter" >}}): Runs at similar time +- [SuccessfulCreationHandler]({{< relref "successful-creation-handler" >}}): Runs after initial creation +- [ARMResourceModifier]({{< relref "arm-resource-modifier" >}}): Modify before reconciliation + +## Best Practices + +1. **Keep checks fast**: Runs frequently, avoid expensive operations +2. **Be idempotent**: Check may run multiple times +3. **Use status fields**: Prefer examining status over ARM calls +4. **Provide context**: Clear failure messages help users +5. **Consider timeouts**: Don't wait forever for readiness +6. **Log decisions**: Help debugging by logging why checks fail/succeed diff --git a/docs/hugo/content/contributing/extension-points/pre-reconciliation-checker.md b/docs/hugo/content/contributing/extension-points/pre-reconciliation-checker.md new file mode 100644 index 00000000000..4f8397ac2cf --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/pre-reconciliation-checker.md @@ -0,0 +1,455 @@ +--- +title: PreReconciliationChecker +linktitle: PreReconciliationChecker +weight: 75 +--- + +## Description + +`PreReconciliationChecker` allows resources to perform validation or checks before ARM reconciliation begins. This extension is invoked before sending any requests to Azure, giving resources the ability to block reconciliation until certain prerequisites are met. + +The interface is called early in the reconciliation process, after basic claim/validation but before any ARM GET/PUT/PATCH operations. It provides an opportunity to ensure that conditions necessary for successful reconciliation are satisfied. 
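To make the proceed/block semantics concrete before diving into the interface, here is a self-contained sketch of the common "wait for the owner" gate. The types are simplified stand-ins; the real helpers live in `v2/pkg/genruntime/extensions` and take the full parameter list shown in the interface definition below:

```go
package main

import "fmt"

// Simplified stand-ins for the real extensions types.
type PreReconcileCheckResult struct {
	Blocked bool
	Reason  string
}

func ProceedWithReconcile() PreReconcileCheckResult { return PreReconcileCheckResult{} }

func BlockReconcile(reason string) PreReconcileCheckResult {
	return PreReconcileCheckResult{Blocked: true, Reason: reason}
}

// A toy owner whose Ready flag stands in for the Ready condition.
type ownerState struct {
	Name  string
	Ready bool
}

// preReconcileCheck sketches the common "wait for owner" gate: block until
// the parent is Ready so no ARM calls are wasted on a doomed reconcile.
func preReconcileCheck(parent *ownerState) PreReconcileCheckResult {
	if parent == nil {
		return BlockReconcile("owner not found")
	}
	if !parent.Ready {
		return BlockReconcile(fmt.Sprintf("waiting for owner %s to be ready", parent.Name))
	}
	return ProceedWithReconcile()
}

func main() {
	fmt.Println(preReconcileCheck(&ownerState{Name: "my-rg"}).Reason) // waiting for owner my-rg to be ready
	fmt.Println(preReconcileCheck(&ownerState{Name: "my-rg", Ready: true}).Blocked) // false
}
```

A blocked result simply requeues the resource; nothing is sent to Azure until the check proceeds.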
+ +## Interface Definition + +```go +type PreReconciliationChecker interface { + PreReconcileCheck( + ctx context.Context, + obj MetaObject, + owner MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next PreReconcileCheckFunc, + ) (PreReconcileCheckResult, error) +} + +type PreReconcileCheckFunc func( + ctx context.Context, + obj MetaObject, + owner MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, +) (PreReconcileCheckResult, error) + +// Helper functions for creating results +func ProceedWithReconcile() PreReconcileCheckResult +func BlockReconcile(reason string) PreReconcileCheckResult +``` + +**Parameters:** +- `ctx`: The current operation context +- `obj`: The resource about to be reconciled (with fresh status) +- `owner`: The parent resource (can be nil for root resources) +- `resourceResolver`: Helper for resolving resource references +- `armClient`: Client for making ARM API calls if needed +- `log`: Logger for the current operation +- `next`: The default check implementation (usually proceeds) + +**Returns:** +- `PreReconcileCheckResult`: Indicates proceed or block with optional reason +- `error`: Error if the check itself fails + +## Motivation + +The `PreReconciliationChecker` extension exists to handle cases where: + +1. **Prerequisite validation**: Ensuring required conditions are met before attempting ARM operations + +2. **Owner readiness**: Verifying that parent resources are in a state suitable for child creation + +3. **Quota checking**: Validating that operation won't exceed limits + +4. **Preventing futile operations**: Blocking reconciliation attempts that cannot possibly succeed + +5. **External dependencies**: Waiting for external systems or resources to be ready + +6. 
**Resource state validation**: Ensuring the resource is in an appropriate state for reconciliation + +The default behavior attempts reconciliation immediately. Some resources need to verify prerequisites first to avoid unnecessary ARM calls, rate limiting, or error conditions. + +## When to Use + +Implement `PreReconciliationChecker` when: + +- ✅ Parent/owner resources must reach certain states first +- ✅ External dependencies must be satisfied before reconciling +- ✅ Spec validation requires complex logic beyond webhooks +- ✅ Reconciliation would fail due to known prerequisites not being met +- ✅ Rate limiting or quota concerns require gating +- ✅ Resource configuration requires specific ordering + +Do **not** use `PreReconciliationChecker` when: + +- ❌ Simple validation can be done in admission webhooks +- ❌ The check should happen after reconciliation (use PostReconciliationChecker) +- ❌ You're trying to modify the resource (use other extensions) +- ❌ The default behavior works correctly + +## Example: Waiting for Owner Ready State + +A common pattern is waiting for the parent resource to be ready: + +```go +var _ extensions.PreReconciliationChecker = &MyResourceExtension{} + +func (ex *MyResourceExtension) PreReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + // Type assert to specific resource type + resource, ok := obj.(*myservice.MyResource) + if !ok { + return extensions.PreReconcileCheckResult{}, + eris.Errorf("cannot run on unknown resource type %T", obj) + } + + // Type assert hub version to catch breaking changes + var _ conversion.Hub = resource + + // Check if owner is provided and ready + if owner != nil { + ready := conditions.IsReady(owner) + if !ready { + ownerName := owner.GetName() + return 
extensions.BlockReconcile( + fmt.Sprintf("waiting for owner %s to be ready", ownerName)), nil + } + } + + // Owner is ready (or not required), proceed with reconciliation + return extensions.ProceedWithReconcile(), nil +} +``` + +**Key aspects of this example:** + +1. **Type assertions**: For both resource type and hub version +2. **Owner check**: Validates owner state before proceeding +3. **Clear blocking messages**: Provides reason for blocking +4. **No error**: Check succeeded, but reconciliation should wait +5. **Proceeds when ready**: Returns proceed result when checks pass + +## Common Patterns + +### Pattern 1: Check Owner Ready Condition + +```go +func (ex *ResourceExtension) PreReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Ensure owner is ready before creating child + if owner == nil { + return extensions.BlockReconcile("owner not found"), nil + } + + if !conditions.IsReady(owner) { + return extensions.BlockReconcile( + fmt.Sprintf("owner %s/%s not ready", + owner.GetNamespace(), owner.GetName())), nil + } + + // Owner ready, call next checker in chain + return next(ctx, obj, owner, resourceResolver, armClient, log) +} +``` + +### Pattern 2: Validate Required References + +```go +func (ex *ResourceExtension) PreReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Check if required references are ready + if resource.Spec.NetworkReference != nil { + resolved, err := 
resourceResolver.ResolveResourceReference( + ctx, resource.Spec.NetworkReference) + if err != nil { + return extensions.PreReconcileCheckResult{}, err + } + + if !resolved.Found { + return extensions.BlockReconcile( + "required network reference not found"), nil + } + + if !conditions.IsReady(resolved.Resource) { + return extensions.BlockReconcile( + "required network not ready"), nil + } + } + + return extensions.ProceedWithReconcile(), nil +} +``` + +### Pattern 3: Validate Configuration Prerequisites + +```go +func (ex *ResourceExtension) PreReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Complex validation that can't be done in webhook + if resource.Spec.AdvancedConfig != nil { + if err := ex.validateAdvancedConfig(resource.Spec.AdvancedConfig); err != nil { + // Configuration is invalid, block permanently + return extensions.PreReconcileCheckResult{}, conditions.NewReadyConditionImpactingError( + err, + conditions.ConditionSeverityError, + conditions.ReasonFailed) + } + } + + return extensions.ProceedWithReconcile(), nil +} +``` + +### Pattern 4: Check Azure Resource State + +```go +func (ex *ResourceExtension) PreReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Get resource ID if it exists + resourceID, hasID := genruntime.GetResourceID(resource) + if !hasID { + // Not claimed yet, proceed + return extensions.ProceedWithReconcile(), nil + } + + // Query current state from Azure + var 
azureState MyResourceState + apiVersion := "2023-01-01" + _, err := armClient.GetByID(ctx, resourceID, apiVersion, &azureState) + if err != nil { + // Handle error appropriately + return extensions.PreReconcileCheckResult{}, err + } + + // Check if resource is in a state that allows updates + if azureState.Status == "Locked" || azureState.Status == "Deleting" { + return extensions.BlockReconcile( + fmt.Sprintf("resource in %s state, cannot reconcile", azureState.Status)), nil + } + + return extensions.ProceedWithReconcile(), nil +} +``` + +### Pattern 5: Rate Limiting or Throttling + +```go +func (ex *ResourceExtension) PreReconcileCheck( + ctx context.Context, + obj genruntime.MetaObject, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + resource := obj.(*myservice.MyResource) + + // Check if we're being rate limited + if ex.shouldThrottle(resource) { + return extensions.BlockReconcile( + "rate limit reached, waiting before retry"), nil + } + + // Check quota + quotaOK, err := ex.checkQuota(ctx, resource, armClient) + if err != nil { + return extensions.PreReconcileCheckResult{}, err + } + + if !quotaOK { + return extensions.BlockReconcile("quota exceeded"), nil + } + + return extensions.ProceedWithReconcile(), nil +} +``` + +## Check Results + +The extension returns one of two results: + +### Proceed +```go +return extensions.ProceedWithReconcile(), nil +``` +- Prerequisites are satisfied +- Reconciliation will continue normally +- ARM operations will be attempted + +### Block +```go +return extensions.BlockReconcile("reason for blocking"), nil +``` +- Prerequisites not met +- Reconciliation skipped for now +- Resource will be requeued to try again later +- Condition set with the blocking reason + +### Error +```go +return extensions.PreReconcileCheckResult{}, fmt.Errorf("check failed: %w", 
err) +``` +- The check itself failed +- Error condition set on resource +- Reconciliation blocked until error resolved + +## Block vs. Error + +Understanding the difference is important: + +- **Block**: "Not ready to reconcile yet, try again later" (transient) + - Example: "waiting for owner", "resource locked" + - Returns `BlockReconcile(reason), nil` + - Resource requeued automatically + +- **Error**: "Something went wrong during the check" (needs attention) + - Example: "invalid configuration", "failed to query Azure" + - Returns `PreReconcileCheckResult{}, error` + - May be permanent issue requiring user action + +## Reconciliation Impact + +When a pre-reconciliation check blocks: + +1. **No ARM operations**: No requests sent to Azure +2. **Condition set**: Condition added explaining why blocked +3. **Reconciliation requeued**: Will try again automatically +4. **Resource state preserved**: No changes made to resource +5. **Efficient waiting**: Avoids unnecessary ARM calls + +This continues until the check passes or the resource is deleted. + +## Testing + +When testing `PreReconciliationChecker` extensions: + +1. **Test proceed case**: Verify check passes when prerequisites met +2. **Test block cases**: Cover all blocking scenarios +3. **Test error handling**: Verify proper error handling +4. **Test with nil owner**: Handle cases with no owner +5. 
**Test check chain**: Verify calling next() works correctly + +Example test structure: + +```go +func TestMyResourceExtension_PreReconcileCheck(t *testing.T) { + t.Run("proceeds when owner ready", func(t *testing.T) { + // Test success case + }) + + t.Run("blocks when owner not ready", func(t *testing.T) { + // Test blocking when owner not ready + }) + + t.Run("blocks when owner missing", func(t *testing.T) { + // Test blocking when owner is nil + }) + + t.Run("proceeds when no owner required", func(t *testing.T) { + // Test for resources without owners + }) +} +``` + +## Performance Considerations + +Pre-reconciliation checks run on **every** reconciliation attempt, so: + +- **Keep them fast**: Avoid expensive operations when possible +- **Cache results**: If appropriate, cache validation results +- **Minimize ARM calls**: Use status/cache over live queries +- **Fail fast**: Return quickly when blocking +- **Be selective**: Only check what's necessary + +## Common Use Cases + +1. **Owner/Parent Readiness**: Most common - wait for parent to be ready +2. **Reference Resolution**: Ensure referenced resources exist and are ready +3. **Ordering Dependencies**: Ensure resources created in correct order +4. **State Validation**: Verify resource in appropriate state for updates +5. **Quota/Limit Checks**: Prevent operations that would exceed limits +6. 
**External System Dependencies**: Wait for external systems to be ready + +## Important Notes + +- **Call `next()` when checks pass**: Allows for check chaining +- **Don't modify the resource**: This is for validation only +- **Provide clear reasons**: Blocking messages shown to users +- **Be idempotent**: Checks may run many times +- **Handle nil owner**: Owner can be nil for root resources +- **Use conditions package**: For setting appropriate conditions +- **Log decisions**: Help debugging by explaining why checks block + +## Relationship to Other Validation + +Pre-reconciliation checks complement other validation mechanisms: + +- **Admission Webhooks**: Validate at create/update time (prefer for simple validation) +- **PreReconciliationChecker**: Validate at reconciliation time (for complex/runtime checks) +- **PostReconciliationChecker**: Validate after reconciliation (for readiness checks) + +Use the right tool for the right job: +- Simple field validation → Admission webhook +- Runtime dependency checks → PreReconciliationChecker +- Async readiness validation → PostReconciliationChecker + +## Related Extension Points + +- [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}): Check after reconciliation +- [ARMResourceModifier]({{< relref "arm-resource-modifier" >}}): Runs after pre-reconciliation check +- [ErrorClassifier]({{< relref "error-classifier" >}}): Handles errors that could have been prevented + +## Best Practices + +1. **Validate early**: Catch issues before ARM operations +2. **Clear messaging**: Users need to understand why reconciliation is blocked +3. **Appropriate blocking**: Only block when reconciliation would fail +4. **Consider performance**: These run frequently +5. **Use conditions**: Set appropriate conditions when blocking +6. **Handle edge cases**: Nil owners, missing references, etc. +7. 
**Test thoroughly**: Cover all blocking scenarios diff --git a/docs/hugo/content/contributing/extension-points/pre-reconciliation-owner-checker.md b/docs/hugo/content/contributing/extension-points/pre-reconciliation-owner-checker.md new file mode 100644 index 00000000000..78e549c345e --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/pre-reconciliation-owner-checker.md @@ -0,0 +1,493 @@ +--- +title: PreReconciliationOwnerChecker +linktitle: PreReconciliationOwnerChecker +weight: 76 +--- + +## Description + +`PreReconciliationOwnerChecker` is a specialized variant of `PreReconciliationChecker` that validates only the owner's state before reconciliation, without accessing the resource itself. This extension is invoked before any ARM operations, including GET requests on the resource being reconciled. + +The key difference from `PreReconciliationChecker` is that this extension **avoids** performing GET operations on the resource itself, only checking the parent/owner resource. This is critical for resources where the owner's state can completely block access to the child resource. 
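The following self-contained sketch shows why the decision can be made from the owner alone, before any GET on the child. The types and state names are illustrative stand-ins (not real Azure states), modelled loosely on a cluster-style parent that can deny all access to its children:

```go
package main

import "fmt"

// Simplified stand-ins; the real types live in v2/pkg/genruntime/extensions.
type PreReconcileCheckResult struct {
	Blocked bool
	Reason  string
}

func ProceedWithReconcile() PreReconcileCheckResult { return PreReconcileCheckResult{} }

func BlockReconcile(reason string) PreReconcileCheckResult {
	return PreReconcileCheckResult{Blocked: true, Reason: reason}
}

// A toy cluster-style owner whose state can deny all access to children.
type cluster struct {
	Name  string
	State string // e.g. "Running", "Stopped", "Updating" (illustrative values)
}

// ownerCheck decides from the owner alone; no GET is ever issued against the
// child resource, which is the whole point of this extension.
func ownerCheck(c *cluster) PreReconcileCheckResult {
	if c == nil {
		// Root resources or ARM-ID owners: nothing to check, proceed.
		return ProceedWithReconcile()
	}
	switch c.State {
	case "Stopped", "Updating":
		return BlockReconcile(fmt.Sprintf(
			"cluster %s is %s; child resources are inaccessible", c.Name, c.State))
	default:
		return ProceedWithReconcile()
	}
}

func main() {
	fmt.Println(ownerCheck(&cluster{Name: "kc", State: "Stopped"}).Blocked) // true
	fmt.Println(ownerCheck(&cluster{Name: "kc", State: "Running"}).Blocked) // false
}
```

Because the child is never queried, a powered-off or updating parent costs no ARM quota at all; the resource is simply requeued until the owner becomes reachable.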
+ +## Interface Definition + +```go +type PreReconciliationOwnerChecker interface { + PreReconcileOwnerCheck( + ctx context.Context, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next PreReconcileOwnerCheckFunc, + ) (PreReconcileCheckResult, error) +} + +type PreReconcileOwnerCheckFunc func( + ctx context.Context, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, +) (PreReconcileCheckResult, error) + +// Helper functions for creating results +func ProceedWithReconcile() PreReconcileCheckResult +func BlockReconcile(reason string) PreReconcileCheckResult +``` + +**Parameters:** +- `ctx`: The current operation context +- `owner`: The parent resource (can be nil for root resources or ARM ID references) +- `resourceResolver`: Helper for resolving resource references +- `armClient`: Client for making ARM API calls if needed +- `log`: Logger for the current operation +- `next`: The default check implementation (usually proceeds) + +**Returns:** +- `PreReconcileCheckResult`: Indicates proceed or block with optional reason +- `error`: Error if the check itself fails + +**Note:** Unlike `PreReconciliationChecker`, this interface does **not** receive the resource being reconciled (`obj`). It only has access to the owner. + +## Motivation + +The `PreReconciliationOwnerChecker` extension exists to handle a specific class of Azure resources where: + +1. **Owner state blocks all access**: The parent resource's state can prevent any operations on child resources, including GET + +2. **Cannot determine child state**: You cannot query the child resource to check its state when the owner is in certain states + +3. **Avoid wasted API calls**: Attempting to GET or PUT a child when the owner blocks access wastes API quota and generates errors + +4. 
**Owner-dependent access**: Some Azure services completely lock down child resources when the parent is in maintenance, updating, or powered-off states + +The most notable example is **Azure Data Explorer (Kusto)**, where you cannot even GET a database when the cluster is powered off or updating. Without this extension, the controller would repeatedly attempt to access the database, failing each time and consuming request quota. + +## When to Use + +Implement `PreReconciliationOwnerChecker` when: + +- ✅ The owner's state can block **all** access to the child resource, including GET operations +- ✅ You need to avoid GET requests on the resource itself +- ✅ The parent resource has states that completely prevent child access (powered off, updating, locked) +- ✅ Attempting operations when the owner is in certain states always fails +- ✅ You want to avoid wasting API quota on operations that will definitely fail + +Do **not** use `PreReconciliationOwnerChecker` when: + +- ❌ You can safely GET the resource regardless of owner state (use `PreReconciliationChecker` instead) +- ❌ You need access to the resource's current state to make the decision +- ❌ The owner's state doesn't affect access to the child +- ❌ The check can be done after retrieving the resource + +## PreReconciliationOwnerChecker vs PreReconciliationChecker + +Understanding the difference is critical: + +| Aspect | PreReconciliationOwnerChecker | PreReconciliationChecker | +|--------|------------------------------|--------------------------| +| **When invoked** | Before any ARM calls, including GET | After GET, before PUT/PATCH | +| **Resource access** | No - only owner checked | Yes - resource status available | +| **Use case** | Owner blocks all access | Validation based on current state | +| **GET avoidance** | Yes - no GET performed | No - GET already performed | +| **Parameters** | Only `owner` | Both `obj` and `owner` | + +**Rule of thumb:** If you can GET the resource to check its state, use 
`PreReconciliationChecker`. If even GET fails when the owner is in certain states, use `PreReconciliationOwnerChecker`. + +## Example: Kusto Database Owner State Check + +The Kusto Database resource uses `PreReconciliationOwnerChecker` because databases cannot be accessed when the cluster is in certain states: + +```go +var _ extensions.PreReconciliationOwnerChecker = &DatabaseExtension{} + +// Ensure we're dealing with the hub version of Cluster +var _ conversion.Hub = &kusto.Cluster{} + +func (ext *DatabaseExtension) PreReconcileOwnerCheck( + ctx context.Context, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileOwnerCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + // Check to see if the owning cluster is in a state that will block us + // Owner can be nil if referenced by ARM ID + if owner != nil { + if cluster, ok := owner.(*kusto.Cluster); ok { + // If our owning cluster is in a state that will reject any operation, + // we should skip reconciliation entirely. + // + // When we reconcile the cluster, it enters an `Updating` state. + // During this time, we can't even try to reconcile the database - + // the operation will fail with a `Conflict` error. + // + // Checking the cluster state allows us to "play nice" and not use up + // request quota attempting operations we know will fail. 
+ state := cluster.Status.ProvisioningState + if state != nil && clusterProvisioningStateBlocksReconciliation(state) { + return extensions.BlockReconcile( + fmt.Sprintf("Owning cluster is in provisioning state %q", *state)), + nil + } + } + } + + // Owner is in acceptable state, proceed + return next(ctx, owner, resourceResolver, armClient, log) +} + +// Helper function to determine if cluster state blocks database access +func clusterProvisioningStateBlocksReconciliation(state *string) bool { + if state == nil { + return false + } + + // These states prevent all database operations + blocked := []string{ + "Creating", + "Updating", + "Deleting", + "Stopped", + "Stopping", + } + + for _, blockedState := range blocked { + if *state == blockedState { + return true + } + } + + return false +} +``` + +**Key aspects of this example:** + +1. **Owner type assertion**: Checks if owner is a Kusto Cluster +2. **Nil handling**: Gracefully handles nil owner (ARM ID references) +3. **State checking**: Examines cluster's provisioning state +4. **Clear blocking messages**: Provides specific reason for blocking +5. **Helper function**: Encapsulates state-checking logic +6. 
**Calls next**: Proceeds when cluster is in acceptable state + +## Common Patterns + +### Pattern 1: Block on Owner State + +```go +func (ex *ResourceExtension) PreReconcileOwnerCheck( + ctx context.Context, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileOwnerCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + // Handle nil owner + if owner == nil { + // No owner to check, proceed + return next(ctx, owner, resourceResolver, armClient, log) + } + + // Type assert to specific owner type + parent, ok := owner.(*parentservice.ParentResource) + if !ok { + // Not the expected owner type, proceed + return next(ctx, owner, resourceResolver, armClient, log) + } + + // Check owner state + if parent.Status.State != nil && *parent.Status.State == "PoweredOff" { + return extensions.BlockReconcile("parent is powered off"), nil + } + + if parent.Status.State != nil && *parent.Status.State == "Updating" { + return extensions.BlockReconcile("parent is updating"), nil + } + + // Owner in acceptable state + return next(ctx, owner, resourceResolver, armClient, log) +} +``` + +### Pattern 2: Multiple Blocking States + +```go +func (ex *ResourceExtension) PreReconcileOwnerCheck( + ctx context.Context, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileOwnerCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + if owner == nil { + return next(ctx, owner, resourceResolver, armClient, log) + } + + parent, ok := owner.(*parentservice.ParentResource) + if !ok { + return next(ctx, owner, resourceResolver, armClient, log) + } + + // Check multiple conditions that block access + if parent.Status.State != nil { + state := *parent.Status.State + + blockedStates := map[string]string{ + "Stopped": "parent is stopped and children cannot be accessed", + 
"Stopping": "parent is stopping, wait for it to complete", + "Deleting": "parent is being deleted", + "Updating": "parent is updating, operations will conflict", + "Migrating": "parent is migrating, children unavailable", + } + + if reason, isBlocked := blockedStates[state]; isBlocked { + return extensions.BlockReconcile(reason), nil + } + } + + return next(ctx, owner, resourceResolver, armClient, log) +} +``` + +### Pattern 3: Conditional Blocking Based on Multiple Factors + +```go +func (ex *ResourceExtension) PreReconcileOwnerCheck( + ctx context.Context, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileOwnerCheckFunc, +) (extensions.PreReconcileCheckResult, error) { + if owner == nil { + return next(ctx, owner, resourceResolver, armClient, log) + } + + parent, ok := owner.(*parentservice.ParentResource) + if !ok { + return next(ctx, owner, resourceResolver, armClient, log) + } + + // Check both state and SKU + // Some SKUs allow operations even during certain states + if parent.Status.State != nil && *parent.Status.State == "Updating" { + // Premium SKU allows limited operations during updates; + // guard against a nil SKU name before dereferencing + if parent.Spec.SKU != nil && parent.Spec.SKU.Name != nil && *parent.Spec.SKU.Name != "Premium" { + return extensions.BlockReconcile( + "parent is updating and SKU does not support concurrent operations"), nil + } + } + + // Check for maintenance mode + if parent.Status.InMaintenanceMode != nil && *parent.Status.InMaintenanceMode { + return extensions.BlockReconcile("parent is in maintenance mode"), nil + } + + return next(ctx, owner, resourceResolver, armClient, log) +} +``` + +### Pattern 4: Logging for Debugging + +```go +func (ex *ResourceExtension) PreReconcileOwnerCheck( + ctx context.Context, + owner genruntime.MetaObject, + resourceResolver *resolver.Resolver, + armClient *genericarmclient.GenericClient, + log logr.Logger, + next extensions.PreReconcileOwnerCheckFunc, +) 
(extensions.PreReconcileCheckResult, error) { + if owner == nil { + log.V(Debug).Info("No owner to check, proceeding with reconciliation") + return next(ctx, owner, resourceResolver, armClient, log) + } + + parent, ok := owner.(*parentservice.ParentResource) + if !ok { + log.V(Debug).Info("Owner is not expected type, proceeding") + return next(ctx, owner, resourceResolver, armClient, log) + } + + state := parent.Status.State + log.V(Debug).Info("Checking parent state", + "parentName", parent.Name, + "state", state) + + if state != nil && *state == "Updating" { + log.V(Info).Info("Blocking reconciliation due to parent state", + "parentName", parent.Name, + "state", *state) + return extensions.BlockReconcile( + fmt.Sprintf("parent %s is updating", parent.Name)), nil + } + + log.V(Debug).Info("Parent state allows reconciliation", "state", state) + return next(ctx, owner, resourceResolver, armClient, log) +} +``` + +## Check Results + +The extension returns one of two results: + +### Proceed +```go +return extensions.ProceedWithReconcile(), nil +``` +- Owner state permits reconciliation +- Reconciliation will continue (GET will be attempted) +- Normal reconciliation flow proceeds + +### Block +```go +return extensions.BlockReconcile("parent is updating"), nil +``` +- Owner state blocks reconciliation +- No GET or other operations attempted on the resource +- Resource requeued to try again later +- Condition set with the blocking reason + +### Error +```go +return extensions.PreReconcileCheckResult{}, fmt.Errorf("check failed: %w", err) +``` +- The check itself failed +- Error condition set on resource +- Reconciliation blocked until error resolved + +## Reconciliation Flow + +Understanding where this fits in the reconciliation process: + +1. **Resource needs reconciliation**: Controller picks up resource +2. **Owner resolution**: Owner resource identified and fetched +3. **PreReconciliationOwnerChecker invoked**: Extension checks owner state **← YOU ARE HERE** +4. 
**Decision point**: + - If **blocked**: Stop here, requeue, set condition + - If **proceed**: Continue to next step +5. **GET resource**: Fetch resource from Azure (only if not blocked) +6. **PreReconciliationChecker**: Check resource state (if implemented) +7. **ARM operations**: PUT/PATCH/DELETE as needed + +## Testing + +When testing `PreReconciliationOwnerChecker` extensions: + +1. **Test with nil owner**: Verify handling when owner is nil +2. **Test with wrong owner type**: Verify graceful handling +3. **Test blocking states**: Cover all states that should block +4. **Test proceed states**: Verify reconciliation proceeds when appropriate +5. **Test error handling**: Verify proper error returns + +Example test structure: + +```go +func TestDatabaseExtension_PreReconcileOwnerCheck(t *testing.T) { + t.Run("proceeds when owner is nil", func(t *testing.T) { + // Test nil owner handling + }) + + t.Run("proceeds when cluster is running", func(t *testing.T) { + // Test with cluster in Running state + }) + + t.Run("blocks when cluster is updating", func(t *testing.T) { + // Test blocking on Updating state + }) + + t.Run("blocks when cluster is stopped", func(t *testing.T) { + // Test blocking on Stopped state + }) + + t.Run("proceeds with non-cluster owner", func(t *testing.T) { + // Test when owner is different type + }) +} +``` + +## Performance Considerations + +This extension is more efficient than `PreReconciliationChecker` when: + +- Owner checks are sufficient to make the decision +- Avoiding GET saves API quota +- Owner state changes less frequently than resource state +- Multiple child resources share the same owner (check once, benefit many times) + +However, it's less flexible because: + +- You cannot examine the resource's current state +- You must make decisions based only on owner information +- You cannot access resource-specific fields or status + +## Important Notes + +- **No resource access**: You do **not** receive the resource being reconciled +- 
**Owner only**: Make decisions based solely on owner state +- **Earlier in flow**: Runs before GET, unlike PreReconciliationChecker +- **Nil owner**: Always handle the nil owner case gracefully +- **Type assertions**: Verify owner is expected type before accessing fields +- **Call next()**: Call the next checker in the chain when checks pass +- **Clear reasons**: Provide helpful blocking messages for users +- **Logging**: Log decisions to help debugging + +## Real-World Scenarios + +### Scenario 1: Kusto (Azure Data Explorer) + +**Problem:** Kusto databases cannot be accessed at all when the cluster is stopped or updating. + +**Solution:** Check cluster state before attempting any database operations. + +```go +// Block on: Creating, Updating, Deleting, Stopped, Stopping +// Allow on: Running, Succeeded +``` + +### Scenario 2: Virtual Machine Scale Set Instances + +**Problem:** Cannot manage individual VMSS instances when the scale set is updating or deleting. + +**Solution:** Check VMSS state before attempting instance operations. + +```go +// Block on: Updating, Deallocating, Deleting +// Allow on: Running, Succeeded +``` + +### Scenario 3: Container Service Agent Pools + +**Problem:** Agent pools cannot be modified when the cluster is upgrading. + +**Solution:** Check AKS cluster upgrade state before pool operations. + +```go +// Block on: Upgrading, Updating +// Allow on: Succeeded, Running +``` + +## Related Extension Points + +- [PreReconciliationChecker]({{< relref "pre-reconciliation-checker" >}}): Check resource state (use this unless you need owner-only checks) +- [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}): Validate after reconciliation +- [ARMResourceModifier]({{< relref "arm-resource-modifier" >}}): Modify payloads (runs after checks pass) + +## Best Practices + +1. **Prefer PreReconciliationChecker**: Use this only when necessary +2. **Handle nil owner**: Always check for nil before dereferencing +3. 
**Type assert safely**: Verify owner type before accessing fields +4. **Document blocking states**: Comment which states block and why +5. **Use helper functions**: Encapsulate state-checking logic +6. **Log decisions**: Help debugging by logging why checks block +7. **Clear messages**: Users need to understand why reconciliation is blocked +8. **Call next()**: Enable check chaining +9. **Test thoroughly**: Cover all owner states and edge cases diff --git a/docs/hugo/content/contributing/extension-points/successful-creation-handler.md b/docs/hugo/content/contributing/extension-points/successful-creation-handler.md new file mode 100644 index 00000000000..a26089c4572 --- /dev/null +++ b/docs/hugo/content/contributing/extension-points/successful-creation-handler.md @@ -0,0 +1,404 @@ +--- +title: SuccessfulCreationHandler +linktitle: SuccessfulCreationHandler +weight: 80 +--- + +## Description + +`SuccessfulCreationHandler` allows resources to perform custom logic immediately after they are successfully created in Azure for the first time. This extension is invoked once after the initial ARM PUT operation succeeds, giving resources the opportunity to capture information, set fields, or perform initialization that depends on the Azure-assigned resource ID. + +The interface is called after the resource exists in Azure and has been assigned an ID, but before subsequent reconciliations. It runs exactly once per resource, during its initial creation. + +## Interface Definition + +```go +type SuccessfulCreationHandler interface { + Success(obj genruntime.ARMMetaObject) error +} + +type SuccessFunc = func(obj genruntime.ARMMetaObject) error +``` + +**Parameters:** +- `obj`: The resource that was just created, with populated status including the Azure resource ID + +**Returns:** +- `error`: Error if the success handling fails (will prevent the Ready condition) + +## Motivation + +The `SuccessfulCreationHandler` extension exists to handle cases where: + +1. 
**Derived IDs**: Some resources need to compute or override their resource ID based on Azure's response + +2. **Child resource IDs**: Parent resources may need to set special IDs for child resources to reference + +3. **Post-creation initialization**: One-time setup that can only happen after the resource exists in Azure + +4. **Status field initialization**: Setting status fields that depend on the Azure resource ID + +5. **Special ARM ID handling**: Resources with non-standard ARM ID structures that need custom handling + +Most resources receive their ARM ID through the standard process. Some resources have special requirements that need custom handling when the resource is first created. + +## When to Use + +Implement `SuccessfulCreationHandler` when: + +- ✅ Resource ID needs custom computation after creation +- ✅ Child resources need special ID references set on the parent +- ✅ One-time initialization required after Azure creation +- ✅ Resource has non-standard ARM ID structure +- ✅ Status fields need to be set based on the created resource + +Do **not** use `SuccessfulCreationHandler` when: + +- ❌ The standard resource ID handling works correctly +- ❌ The logic should run on every reconciliation (use other extensions) +- ❌ You're modifying the spec (that should be done elsewhere) +- ❌ The initialization doesn't depend on the resource existing in Azure + +## Example: Subscription Alias ID Override + +The Subscription Alias resource uses `SuccessfulCreationHandler` to override the child resource ID to point to the subscription itself rather than the alias: + +```go +var _ extensions.SuccessfulCreationHandler = &AliasExtension{} + +func (extension *AliasExtension) Success(obj genruntime.ARMMetaObject) error { + // Type assert to the specific resource type + typedObj, ok := obj.(*storage.Alias) + if !ok { + return eris.Errorf( + "cannot run on unknown resource type %T, expected *subscription.Alias", + obj) + } + + // Type assert hub version to catch breaking 
changes + var _ conversion.Hub = typedObj + + // Get the subscription ID from the alias status + subscriptionID, ok := getSubscriptionID(typedObj) + if !ok { + // SubscriptionID isn't populated. That's a problem + return eris.Errorf("SubscriptionID field not populated") + } + + // Override the child resource ID to point to the subscription + // This allows child resources to reference the subscription directly + genruntime.SetChildResourceIDOverride( + typedObj, + genericarmclient.MakeSubscriptionID(subscriptionID)) + + return nil +} + +func getSubscriptionID(typedObj *storage.Alias) (string, bool) { + if typedObj.Status.Properties == nil || + typedObj.Status.Properties.SubscriptionId == nil || + *typedObj.Status.Properties.SubscriptionId == "" { + return "", false + } + + return *typedObj.Status.Properties.SubscriptionId, true +} +``` + +**Key aspects of this example:** + +1. **Type assertions**: For both resource type and hub version +2. **Status access**: Retrieves information from the populated status +3. **ID override**: Sets a custom child resource ID reference +4. **Validation**: Checks that required status fields are present +5. **Error handling**: Returns error if required data missing + +## Common Patterns + +### Pattern 1: Set Computed Status Fields + +```go +func (ex *ResourceExtension) Success(obj genruntime.ARMMetaObject) error { + resource := obj.(*myservice.MyResource) + + // Type assert hub version + var _ conversion.Hub = resource + + // Get the Azure resource ID + resourceID, hasID := genruntime.GetResourceID(resource) + if !hasID { + return fmt.Errorf("resource ID not set after creation") + } + + // Compute derived values from the resource ID + // For example, extract subscription ID, resource group, etc. 
+ parsed, err := arm.ParseResourceID(resourceID) + if err != nil { + return fmt.Errorf("failed to parse resource ID: %w", err) + } + + // Set status fields that depend on the ID + resource.Status.SubscriptionID = &parsed.SubscriptionID + resource.Status.ResourceGroup = &parsed.ResourceGroupName + + return nil +} +``` + +### Pattern 2: Initialize One-Time Configuration + +```go +func (ex *ResourceExtension) Success(obj genruntime.ARMMetaObject) error { + resource := obj.(*myservice.MyResource) + var _ conversion.Hub = resource + + // Perform one-time initialization that depends on the resource existing + if err := ex.initializePostCreation(resource); err != nil { + return fmt.Errorf("post-creation initialization failed: %w", err) + } + + return nil +} +``` + +### Pattern 3: Set Child Resource Reference + +```go +func (ex *ResourceExtension) Success(obj genruntime.ARMMetaObject) error { + resource := obj.(*myservice.MyResource) + var _ conversion.Hub = resource + + // Extract a specific value from status to use as child resource ID + if resource.Status.SpecialID == nil { + return fmt.Errorf("special ID not populated in status") + } + + // Set this as the reference ID for child resources + specialID := *resource.Status.SpecialID + genruntime.SetChildResourceIDOverride( + resource, + fmt.Sprintf("/subscriptions/.../providers/.../resources/%s", specialID)) + + return nil +} +``` + +### Pattern 4: Validate Creation Result + +```go +func (ex *ResourceExtension) Success(obj genruntime.ARMMetaObject) error { + resource := obj.(*myservice.MyResource) + var _ conversion.Hub = resource + + // Verify that the created resource has expected properties + if resource.Status.Endpoint == nil { + return fmt.Errorf("endpoint not populated after creation") + } + + if resource.Status.State == nil || *resource.Status.State != "Active" { + return fmt.Errorf("resource not in expected state after creation") + } + + return nil +} +``` + +### Pattern 5: Record Creation Metadata + +```go +func 
(ex *ResourceExtension) Success(obj genruntime.ARMMetaObject) error { + resource := obj.(*myservice.MyResource) + var _ conversion.Hub = resource + + // Record when the resource was created + now := metav1.Now() + resource.Status.CreatedAt = &now + + // Record the original configuration + resource.Status.InitialSKU = resource.Spec.SKU + + return nil +} +``` + +## Invocation Timing + +Understanding when this extension runs: + +1. **Resource created in Kubernetes**: User applies YAML +2. **Resource claimed**: Gets Azure resource ID assigned +3. **Initial ARM PUT**: First creation request sent to Azure +4. **ARM returns success**: Resource now exists in Azure +5. **Status updated**: Resource status populated from Azure response +6. **SuccessfulCreationHandler called**: Extension runs **once** +7. **Ready condition set**: Resource marked as Ready (if no errors) + +Subsequent reconciliations do **not** trigger this extension again. + +## Success vs. Update + +It's important to note: +- **Success handler**: Runs once after initial creation +- **Updates**: Do not trigger the success handler +- **Recreation**: If a resource is deleted and recreated, the handler runs again + +## Error Handling + +Errors from the success handler are significant: + +```go +// Success handling failed +return fmt.Errorf("failed to initialize: %w", err) +``` + +If the success handler returns an error: +1. The error is recorded in conditions +2. The Ready condition is not set +3. Reconciliation will retry +4. Handler may be called multiple times until it succeeds + +Make sure your success handler is **idempotent** - safe to call multiple times. + +## Testing + +When testing `SuccessfulCreationHandler` extensions: + +1. **Test successful handling**: Verify handler succeeds with good data +2. **Test with missing data**: Verify error handling for missing status fields +3. **Test ID override**: Verify child resource IDs set correctly +4. **Test idempotency**: Verify multiple calls are safe +5. 
**Test status modifications**: Verify status fields set correctly + +Example test structure: + +```go +func TestAliasExtension_Success(t *testing.T) { + t.Run("sets child resource ID override", func(t *testing.T) { + // Test successful ID override + }) + + t.Run("returns error when subscription ID missing", func(t *testing.T) { + // Test error when required status field missing + }) + + t.Run("idempotent when called multiple times", func(t *testing.T) { + // Test multiple calls produce same result + }) +} +``` + +Example test implementation: + +```go +func TestAliasExtension_Success(t *testing.T) { + extension := &AliasExtension{} + + t.Run("sets child resource ID", func(t *testing.T) { + subscriptionID := "12345678-1234-1234-1234-123456789012" + alias := &storage.Alias{ + Status: storage.Alias_STATUS{ + Properties: &storage.SubscriptionAliasResponseProperties_STATUS{ + SubscriptionId: &subscriptionID, + }, + }, + } + + err := extension.Success(alias) + + assert.NoError(t, err) + override := genruntime.GetChildResourceIDOverride(alias) + assert.NotNil(t, override) + assert.Contains(t, *override, subscriptionID) + }) + + t.Run("errors when subscription ID missing", func(t *testing.T) { + alias := &storage.Alias{ + Status: storage.Alias_STATUS{ + Properties: &storage.SubscriptionAliasResponseProperties_STATUS{}, + }, + } + + err := extension.Success(alias) + + assert.Error(t, err) + assert.Contains(t, err.Error(), "SubscriptionID") + }) +} +``` + +## Common Scenarios + +Here are typical reasons to use success handlers: + +1. **ID Override**: Setting custom child resource IDs (subscriptions, managed resources) +2. **Derived Status**: Computing status fields from the resource ID or response +3. **One-Time Setup**: Configuration that only makes sense after creation +4. **State Initialization**: Setting initial state based on Azure response +5. 
**Reference Setup**: Establishing references to related resources + +## Important Notes + +- **Runs once per resource**: After initial creation, not on updates +- **Status is populated**: Resource status contains Azure response data +- **Resource has ID**: Azure resource ID is available via `GetResourceID` +- **Be idempotent**: Handler may be called multiple times if it errors +- **Don't modify spec**: This is for status and metadata only +- **Return errors carefully**: Errors prevent Ready condition +- **Type assert hub**: Catch breaking changes with hub version assertion + +## Idempotency Requirement + +Since the handler might be called multiple times (if errors occur), ensure your handler is idempotent: + +```go +// Good - idempotent +func (ex *ResourceExtension) Success(obj genruntime.ARMMetaObject) error { + resource := obj.(*myservice.MyResource) + + // Set a value (safe to do multiple times) + computed := computeValue(resource) + resource.Status.ComputedField = &computed + + return nil +} + +// Bad - not idempotent +func (ex *ResourceExtension) Success(obj genruntime.ARMMetaObject) error { + resource := obj.(*myservice.MyResource) + + // Append to a list (would grow on each call) + resource.Status.History = append(resource.Status.History, "Created") + + return nil +} +``` + +## Alternative Approaches + +Before implementing a success handler, consider: + +1. **Status fields**: Can the generator populate this from Azure? +2. **Conversion functions**: Can this be handled in conversion? +3. **Controller logic**: Should this be in the generic controller? +4. **ARMResourceModifier**: Would modifying the request work better? 
+ +Use `SuccessfulCreationHandler` when the logic truly requires: +- The resource to exist in Azure first +- One-time execution after creation +- Custom ID handling that can't be generated + +## Related Extension Points + +- [ARMResourceModifier]({{< relref "arm-resource-modifier" >}}): Modifies the creation request +- [PostReconciliationChecker]({{< relref "post-reconciliation-checker" >}}): Runs after every reconciliation +- [KubernetesSecretExporter]({{< relref "kubernetes-secret-exporter" >}}): Also runs after successful creation + +## Best Practices + +1. **Keep it simple**: Only do what's necessary +2. **Validate status**: Check that required status fields are populated +3. **Be idempotent**: Handle multiple calls gracefully +4. **Clear errors**: Return descriptive error messages +5. **Don't call ARM**: This is for processing what Azure returned, not making new requests +6. **Type assert hub**: Catch breaking changes early +7. **Document why**: Comment why the success handler is needed diff --git a/docs/hugo/content/design/ADR-2023-05-Installing-Only-Selected-CRDs.md b/docs/hugo/content/design/ADR-2023-05-Installing-Only-Selected-CRDs.md index 9626cffe539..6bd91221957 100644 --- a/docs/hugo/content/design/ADR-2023-05-Installing-Only-Selected-CRDs.md +++ b/docs/hugo/content/design/ADR-2023-05-Installing-Only-Selected-CRDs.md @@ -140,8 +140,8 @@ Open questions: to enable ASO as an Extension or Addon, as there everything must be done through Helm. For users of Helm itself we likely can just tell them to run a `kubectl` command afterwards to create the `ConfigMap`. That does make the Helm install experience worse though. -* We may be able to create the `ConfigMap` as empty given Helms use of - [3 way strategic merge](https://helm.sh/docs/faq/changes_since_helm2/#improved-upgrade-strategy-3-way-strategic-merge-patches) +* We may be able to create the `ConfigMap` as empty given Helms use of _3 way strategic merge_ + and then patch it afterwards. 
* Is it OK that _providers_ would be patchable, but you would need to include all the resources when patching something like `microsoft.storage: "StorageAccount,StorageAccountsBlobService"`? We could just disallow individual resource selection diff --git a/v2/pkg/genruntime/extensions/arm_resource_modifier.go b/v2/pkg/genruntime/extensions/arm_resource_modifier.go index 8aed535e075..9fa53798311 100644 --- a/v2/pkg/genruntime/extensions/arm_resource_modifier.go +++ b/v2/pkg/genruntime/extensions/arm_resource_modifier.go @@ -20,9 +20,19 @@ import ( ) // ARMResourceModifier provides a hook allowing resources to modify the payload that will be sent to ARM just before it is sent. +// This extension is invoked during PUT and PATCH operations to ARM, after standard conversion but before the HTTP request. +// Use cases include: handling soft-delete scenarios (e.g., Key Vault), including child resources in parent payloads (e.g., VNET subnets), +// and conditional field population based on Azure state. type ARMResourceModifier interface { // ModifyARMResource takes a genruntime.ARMResource and returns an updated genruntime.ARMResource. The updated resource // is then serialized and sent to ARM in the body of a PUT request. + // ctx is the current operation context. + // armClient allows making additional ARM API calls if needed to determine modifications. + // armObj is the ARM resource representation about to be sent. + // obj is the Kubernetes resource being reconciled. + // kubeClient allows access to the Kubernetes cluster. + // resolver helps resolve resource references. + // log is a logger for the current operation. 
ModifyARMResource( ctx context.Context, armClient *genericarmclient.GenericClient, diff --git a/v2/pkg/genruntime/extensions/claimer.go b/v2/pkg/genruntime/extensions/claimer.go index 51587926d43..ccb5646578f 100644 --- a/v2/pkg/genruntime/extensions/claimer.go +++ b/v2/pkg/genruntime/extensions/claimer.go @@ -15,9 +15,18 @@ import ( "github.com/Azure/azure-service-operator/v2/pkg/genruntime" ) -// Claimer can be implemented to customize how the reconciler claims a resource +// Claimer can be implemented to customize how the reconciler claims a resource. +// Claiming establishes the link between a Kubernetes resource and its Azure resource by setting the ARM ID. +// Most resources use the default claiming logic. Implement this extension when: +// - The resource's ARM ID doesn't follow standard construction patterns +// - Custom validation is required before claiming +// - Additional operations must be performed during the claim process type Claimer interface { - // Claim claims the resource + // Claim claims the resource by establishing its Azure Resource Manager ID. + // ctx is the current operation context. + // log is a logger for the current operation. + // obj is the Kubernetes resource being claimed. + // next is the default claim implementation - call this to use standard claiming logic. Claim(ctx context.Context, log logr.Logger, obj genruntime.ARMOwnedMetaObject, next ClaimFunc) error } diff --git a/v2/pkg/genruntime/extensions/deleter.go b/v2/pkg/genruntime/extensions/deleter.go index ebbbe914786..e5c3e976a33 100644 --- a/v2/pkg/genruntime/extensions/deleter.go +++ b/v2/pkg/genruntime/extensions/deleter.go @@ -18,9 +18,22 @@ import ( "github.com/Azure/azure-service-operator/v2/pkg/genruntime" ) -// Deleter can be implemented to customize how the reconciler deletes resources +// Deleter can be implemented to customize how the reconciler deletes resources from Azure. 
+// This extension is invoked when a resource has a deletion timestamp, before the standard ARM DELETE operation. +// Implement this extension when: +// - Pre-deletion operations are required (e.g., canceling subscriptions, disabling features) +// - Multiple API calls are needed for complete deletion +// - Custom error handling is needed during deletion +// - Resources should be preserved in Azure under certain conditions type Deleter interface { - // Delete deletes the resource + // Delete deletes the resource from Azure, with the ability to perform custom logic before, during, or after deletion. + // ctx is the current operation context. + // log is a logger for the current operation. + // resolver helps resolve resource references. + // armClient allows making ARM API calls. + // obj is the Kubernetes resource being deleted. + // next is the default deletion implementation - call this to perform standard ARM DELETE. + // Returns a reconciliation result (e.g., requeue timing) and an error if deletion fails. Delete( ctx context.Context, log logr.Logger, diff --git a/v2/pkg/genruntime/extensions/error_classifier.go b/v2/pkg/genruntime/extensions/error_classifier.go index 58151f81712..e3cf3e8d63b 100644 --- a/v2/pkg/genruntime/extensions/error_classifier.go +++ b/v2/pkg/genruntime/extensions/error_classifier.go @@ -15,13 +15,21 @@ import ( "github.com/Azure/azure-service-operator/v2/pkg/genruntime/core" ) -// ErrorClassifier can be implemented to customize how the reconciler reacts to specific errors +// ErrorClassifier can be implemented to customize how the reconciler reacts to specific errors returned by Azure. +// This extension is invoked whenever an ARM API call returns an error, allowing resources to classify errors +// as retryable or fatal, and to provide better error messages to users. 
+// Implement this extension when: +// - Resource-specific error codes need special handling +// - Certain errors should be retried that would normally be considered fatal (or vice versa) +// - Error messages need resource-specific clarification +// - Error behavior varies by API version type ErrorClassifier interface { - // ClassifyError evaluates the provided error, returning including whether it is fatal or can be retried. + // ClassifyError evaluates the provided error, returning details including whether it is fatal or can be retried. // cloudError is the error returned from ARM. // apiVersion is the ARM API version used for the request. - // log is a logger than can be used for telemetry. - // next is the next implementation to call. + // log is a logger that can be used for telemetry. + // next is the default classification implementation to call. + // Returns CloudErrorDetails with classification and an error if classification itself fails. ClassifyError( cloudError *genericarmclient.CloudError, apiVersion string, diff --git a/v2/pkg/genruntime/extensions/importer.go b/v2/pkg/genruntime/extensions/importer.go index a5da51a1660..231395d91d3 100644 --- a/v2/pkg/genruntime/extensions/importer.go +++ b/v2/pkg/genruntime/extensions/importer.go @@ -12,12 +12,20 @@ import ( ) // Importer is an optional interface that can be implemented by resource extensions to customize the import process. +// This extension is invoked during 'asoctl import' operations, after retrieving the resource from Azure but before +// writing it to Kubernetes. It allows resources to skip import for system-managed resources, read-only configurations, +// or resources that only have default values. 
+// Implement this extension when: +// - System-managed or auto-created resources should be excluded from import +// - Default or empty configurations don't need management +// - Resources need validation before allowing import type Importer interface { - // Import allows interception of the import process. + // Import allows interception of the import process to skip or modify resources being imported. // ctx is the current asynchronous context - // resource is the resource being imported. + // rsrc is the resource being imported. // owner is an optional owner for the resource. // next is a function to call to do the actual import. + // Returns an ImportResult indicating success or skip with a reason, and an error if import fails. Import( ctx context.Context, rsrc genruntime.ImportableResource, diff --git a/v2/pkg/genruntime/extensions/postreconciliation_checker.go b/v2/pkg/genruntime/extensions/postreconciliation_checker.go index bdfafed48ab..215292ea6e0 100644 --- a/v2/pkg/genruntime/extensions/postreconciliation_checker.go +++ b/v2/pkg/genruntime/extensions/postreconciliation_checker.go @@ -20,18 +20,24 @@ import ( ) // PostReconciliationChecker is implemented by resources that want to do extra status checks after -// a full ARM reconcile. +// a full ARM reconcile. This extension is invoked after Azure operations succeed but before the Ready +// condition is marked successful, allowing resources to defer readiness until additional conditions are met. +// Implement this extension when: +// - The Azure resource continues initializing after ARM operations complete +// - Manual approval or external processes must complete before the resource is ready +// - Complex validation is needed to determine true readiness type PostReconciliationChecker interface { // PostReconcileCheck does a post-reconcile check to see if the resource is in a state to set 'Ready' condition. // ARM resources should implement this if they need to defer the Ready condition until later. 
// Returns PostReconcileCheckResultSuccess if the reconciliation is successful. // Returns PostReconcileCheckResultFailure and a human-readable reason if the reconciliation should put a condition on resource. // ctx is the current operation context. - // obj is the resource about to be reconciled. The resource's State will be freshly updated. + // obj is the resource that was reconciled. The resource's status will be freshly updated. // owner is the parent resource of obj. This can be nil in some cases like `ResourceGroups` and `Alias`. - // kubeClient allows access to the cluster for any required queries. + // resourceResolver helps resolve resource references. // armClient allows access to ARM for any required queries. // log is the logger for the current operation. + // next is the default check implementation (usually returns success). PostReconcileCheck( ctx context.Context, obj genruntime.MetaObject, diff --git a/v2/pkg/genruntime/extensions/prereconciliation_checker.go b/v2/pkg/genruntime/extensions/prereconciliation_checker.go index e23ee8773f8..a566dc4e396 100644 --- a/v2/pkg/genruntime/extensions/prereconciliation_checker.go +++ b/v2/pkg/genruntime/extensions/prereconciliation_checker.go @@ -18,17 +18,22 @@ import ( ) // PreReconciliationChecker is implemented by resources that want to do extra checks before proceeding with -// a full ARM reconcile. +// a full ARM reconcile. This extension is invoked before sending any requests to Azure, giving resources +// the ability to block reconciliation until prerequisites are met. +// Implement this extension when: +// - Parent/owner resources must reach certain states before child creation +// - External dependencies must be satisfied before reconciling +// - Reconciliation would fail due to known prerequisites not being met type PreReconciliationChecker interface { // PreReconcileCheck does a pre-reconcile check to see if the resource is in a state that can be reconciled. 
// ARM resources should implement this to avoid reconciliation attempts that cannot possibly succeed. // Returns ProceedWithReconcile if the reconciliation should go ahead. // Returns BlockReconcile and a human-readable reason if the reconciliation should be skipped. // ctx is the current operation context. - // obj is the resource about to be reconciled. The resource's State will be freshly updated. - // owner is the owner of the resource about to be reconciled. The owner's State will be freshly updated. May be nil + // obj is the resource about to be reconciled. The resource's status will be freshly updated. + // owner is the owner of the resource about to be reconciled. The owner's status will be freshly updated. May be nil // if the resource has no owner, or if it has been referenced via ARMID directly. - // kubeClient allows access to the cluster for any required queries. + // resourceResolver helps resolve resource references. // armClient allows access to ARM for any required queries. // log is the logger for the current operation. // next is the next (nested) implementation to call. diff --git a/v2/pkg/genruntime/extensions/prereconciliation_owner_checker.go b/v2/pkg/genruntime/extensions/prereconciliation_owner_checker.go index 45f7588de8b..0d747d7651f 100644 --- a/v2/pkg/genruntime/extensions/prereconciliation_owner_checker.go +++ b/v2/pkg/genruntime/extensions/prereconciliation_owner_checker.go @@ -17,8 +17,13 @@ import ( "github.com/Azure/azure-service-operator/v2/pkg/genruntime" ) -// PreReconciliationChecker is implemented by resources that want to do extra checks before proceeding with -// a full ARM reconcile. +// PreReconciliationOwnerChecker is implemented by resources that want to check their owner's state before +// proceeding with a full ARM reconcile. This is a specialized variant of PreReconciliationChecker that only +// checks the owner, avoiding any GET operations on the resource itself. 
+// Implement this extension when: +// - The owner's state can block all access to the resource, including GET operations +// - You need to avoid attempting operations that will fail due to owner state +// - The resource cannot be accessed when its owner is in certain states (e.g., powered off, updating) type PreReconciliationOwnerChecker interface { // PreReconcileOwnerCheck does a pre-reconcile check to see if the owner of a resource is in a state that permits // the resource to be reconciled. For a limited number of resources, the state of their owner can block all access @@ -29,9 +34,9 @@ type PreReconciliationOwnerChecker interface { // Returns ProceedWithReconcile if the reconciliation should go ahead. // Returns BlockReconcile and a human-readable reason if the reconciliation should be skipped. // ctx is the current operation context. - // owner is the owner of the resource about to be reconciled. The owner's State will be freshly updated. May be nil + // owner is the owner of the resource about to be reconciled. The owner's status will be freshly updated. May be nil // if the resource has no owner, or if it has been referenced via ARMID directly. - // kubeClient allows access to the cluster for any required queries. + // resourceResolver helps resolve resource references. // armClient allows access to ARM for any required queries. // log is the logger for the current operation. // next is the next (nested) implementation to call. 
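The checker interfaces above all share the same shape: the controller detects the extension through a type assertion and hands it a `next` delegate holding the default behavior. A minimal sketch of that dispatch, using simplified stand-in types (the real interfaces take more parameters — ctx, owner, resolver, armClient, log — and live in genruntime; the names `hypotheticalExtension` and `preReconcileCheck` here are illustrative only):

```go
package main

import "fmt"

// Simplified stand-ins for the genruntime types; not the real ASO API.
type MetaObject interface {
	GetName() string
}

type PreReconcileCheckResult struct {
	blocked bool
	reason  string
}

func ProceedWithReconcile() PreReconcileCheckResult {
	return PreReconcileCheckResult{}
}

func BlockReconcile(reason string) PreReconcileCheckResult {
	return PreReconcileCheckResult{blocked: true, reason: reason}
}

// PreReconcileCheckFunc mirrors the `next` delegate described above.
type PreReconcileCheckFunc func(obj MetaObject) (PreReconcileCheckResult, error)

type PreReconciliationChecker interface {
	PreReconcileCheck(obj MetaObject, next PreReconcileCheckFunc) (PreReconcileCheckResult, error)
}

// hypotheticalExtension blocks reconciliation until a prerequisite is met,
// then falls through to the default behavior via next.
type hypotheticalExtension struct {
	prerequisiteMet bool
}

func (e *hypotheticalExtension) PreReconcileCheck(
	obj MetaObject,
	next PreReconcileCheckFunc,
) (PreReconcileCheckResult, error) {
	if !e.prerequisiteMet {
		return BlockReconcile(fmt.Sprintf("%s: waiting for prerequisite", obj.GetName())), nil
	}
	// Prerequisite met: delegate to the default check.
	return next(obj)
}

// preReconcileCheck shows how the controller dispatches: a type assertion
// detects the extension; resources without it get the default check directly.
func preReconcileCheck(extension any, obj MetaObject) (PreReconcileCheckResult, error) {
	defaultCheck := func(MetaObject) (PreReconcileCheckResult, error) {
		return ProceedWithReconcile(), nil
	}
	if checker, ok := extension.(PreReconciliationChecker); ok {
		return checker.PreReconcileCheck(obj, defaultCheck)
	}
	return defaultCheck(obj)
}

type fakeObject struct{ name string }

func (f fakeObject) GetName() string { return f.name }

func main() {
	obj := fakeObject{name: "my-server"}

	blocked, _ := preReconcileCheck(&hypotheticalExtension{prerequisiteMet: false}, obj)
	fmt.Printf("blocked=%v reason=%q\n", blocked.blocked, blocked.reason)

	allowed, _ := preReconcileCheck(&hypotheticalExtension{prerequisiteMet: true}, obj)
	fmt.Printf("blocked=%v\n", allowed.blocked)
}
```

Because the `var _ extensions.PreReconciliationChecker = &MyResourceExtension{}` declaration in a real customization file fails to compile if the interface changes, the type-assertion dispatch stays safe across generator updates.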
diff --git a/v2/pkg/genruntime/extensions/successful_resource_modifier.go b/v2/pkg/genruntime/extensions/successful_resource_modifier.go index 4e338cb72ae..297138adb76 100644 --- a/v2/pkg/genruntime/extensions/successful_resource_modifier.go +++ b/v2/pkg/genruntime/extensions/successful_resource_modifier.go @@ -14,9 +14,17 @@ import ( "github.com/Azure/azure-service-operator/v2/pkg/genruntime" ) -// SuccessfulCreationHandler can be implemented to customize the resource upon successful creation +// SuccessfulCreationHandler can be implemented to customize the resource upon successful creation in Azure. +// This extension is invoked once after the initial ARM PUT operation succeeds, giving resources the opportunity +// to perform one-time initialization that depends on the Azure-assigned resource ID. +// Implement this extension when: +// - Resource ID needs custom computation or override after creation +// - Child resources need special ID references set on the parent +// - One-time initialization is required after the resource exists in Azure type SuccessfulCreationHandler interface { - // Success modifies the resource based on a successful creation + // Success performs custom logic after the resource is successfully created in Azure for the first time. + // obj is the resource that was just created, with populated status including the Azure resource ID. + // Returns an error if the success handling fails, which will prevent the Ready condition from being set. 
Success(obj genruntime.ARMMetaObject) error } diff --git a/v2/pkg/genruntime/kubernetes_secret_exporter.go b/v2/pkg/genruntime/kubernetes_secret_exporter.go index f0d4103102a..5fff9d9f9d7 100644 --- a/v2/pkg/genruntime/kubernetes_secret_exporter.go +++ b/v2/pkg/genruntime/kubernetes_secret_exporter.go @@ -29,12 +29,24 @@ type KubernetesSecretExportResult struct { RawSecrets map[string]string } -// KubernetesSecretExporter defines a resource which can create retrieve secrets from Azure and export them to -// Kubernetes secrets. +// KubernetesSecretExporter defines a resource which can retrieve secrets from Azure and export them to +// Kubernetes secrets. This extension is invoked after a resource has been successfully created or updated +// in Azure, giving resources the ability to make sensitive data available in Kubernetes. +// Implement this extension when: +// - The resource generates secrets in Azure (keys, connection strings, passwords) +// - Secret values are available in the resource status +// - Additional ARM API calls are needed to retrieve secrets type KubernetesSecretExporter interface { - // ExportKubernetesSecrets provides a list of Kubernetes resource for the operator to create once the resource which - // implements this interface is successfully provisioned. This method is invoked once a resource has been - // successfully created in Azure, but before the Ready condition has been marked successful. + // ExportKubernetesSecrets retrieves secrets from Azure and returns Kubernetes Secret objects to be created. + // This method is invoked once a resource has been successfully created or updated in Azure, + // but before the Ready condition has been marked successful. + // ctx is the current operation context. + // obj is the resource that owns the secrets. + // additionalSecrets is a set of secret names to retrieve (for secret expressions). 
This exists to avoid + making multiple calls to the secrets API - instead we collect every secret name we need and retrieve them in a single pass. + // armClient allows making ARM API calls to retrieve secrets. + // log is a logger for the current operation. + // Returns a KubernetesSecretExportResult containing Secret objects to create and raw secret values. ExportKubernetesSecrets( ctx context.Context, obj MetaObject,
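Several of the extension points above (Claimer, Deleter, ErrorClassifier, and the checkers) document a `next` parameter holding the default implementation. The chaining idea can be sketched with ErrorClassifier, again using simplified stand-in types rather than the real `genericarmclient.CloudError`/`core.CloudErrorDetails`; the "ServerBusy" code is a hypothetical example, not a documented ARM error:

```go
package main

import "fmt"

// Simplified stand-ins for the real types; field names are illustrative.
type CloudError struct {
	Code    string
	Message string
}

type CloudErrorDetails struct {
	Retryable bool
	Message   string
}

type ErrorClassifierFunc func(*CloudError) (CloudErrorDetails, error)

// defaultClassifier stands in for the default classification logic,
// which here pessimistically treats every error as fatal.
func defaultClassifier(cloudError *CloudError) (CloudErrorDetails, error) {
	return CloudErrorDetails{Retryable: false, Message: cloudError.Message}, nil
}

// classifyError sketches a resource-specific ClassifyError implementation:
// a hypothetical "ServerBusy" code becomes retryable, and everything else
// is delegated to the default implementation via next.
func classifyError(cloudError *CloudError, next ErrorClassifierFunc) (CloudErrorDetails, error) {
	if cloudError.Code == "ServerBusy" {
		return CloudErrorDetails{
			Retryable: true,
			Message:   "server is busy; reconciliation will be retried",
		}, nil
	}
	return next(cloudError)
}

func main() {
	busy, _ := classifyError(&CloudError{Code: "ServerBusy"}, defaultClassifier)
	fmt.Println(busy.Retryable)

	fatal, _ := classifyError(&CloudError{Code: "InvalidParameter", Message: "bad request"}, defaultClassifier)
	fmt.Println(fatal.Retryable, fatal.Message)
}
```

Delegating to `next` rather than reimplementing the default keeps extensions minimal: they override only the cases they care about and inherit everything else.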