- Instance Automation Controller
The Instance Automation Controller (instautoctrl package) handles all automation tasks related to Instances, covering four main areas: instance inactivity management, instance expiration handling, instance termination processes, and instance submission workflows.
The package includes four different controllers:
- Instance Inactive Controller
- Instance Expiration Controller
- Instance Termination Controller
- Instance Submission Controller
The Instance Inactive Controller monitors instances and automates actions based on their inactivity status and lifespan.
The controller determines whether an Instance can be declared inactive and, if so, starts sending notifications to its Tenant asking them to access the Instance. If the Tenant does not access it, the Instance is paused (if persistent) or deleted (if not persistent) after a period of time defined in the Template resource.
The template introduces a new field called InactivityTimeout, which specifies the period of time during which the Tenant must not access the Instance for it to be considered inactive and eligible for deletion.
This field is always available in the Template resource and, if omitted, it is set to never by default, meaning that Instances of that Template will be ignored by the controller.
The controller begins by retrieving all the active Instances. For each instance, it determines whether it should be monitored or not.
If the instance should be monitored, the controller adds several annotations to it.
These include the crownlabs.polito.it/number-alerts-sent annotation, which tracks the number of notifications sent to inform the tenant that the instance has been idle for some time and will soon be stopped or deleted.
This number ranges from zero up to a maximum limit defined by inactiveTerminationMaxNumberOfAlerts, a custom parameter set via the Helm chart.
This value can be overridden by the crownlabs.polito.it/custom-number-alerts annotation in the associated Template resource. Another annotation is crownlabs.polito.it/last-activity, which records the last time the user accessed the instance, either via the frontend through the Ingress or via SSH (information available through the SSH bastion tracker).
When a new email notification is triggered, the controller adds the crownlabs.polito.it/last-notification-timestamp to the instance, recording the timestamp of the last sent notification (if the feature is enabled). This annotation is used to determine whether the required interval has elapsed since the previous notification, thereby allowing a new alert to be sent if necessary.
The Helm Chart introduces the inactiveTerminationNotificationInterval parameter, which defines the minimum time interval between two consecutive email notifications.
If the interval has passed, a new email is sent; otherwise, the notification is skipped.
Next, the controller checks whether the instance is inactive by comparing its last activity timestamp with the InactivityTimeout value specified in the Template.
If the instance is found to be inactive (meaning the remaining time is zero or less), a series of notification emails are sent to the instance owner.
Once the number of alerts reaches a configurable threshold, either defined via inactiveTerminationMaxNumberOfAlerts in the Chart or via crownlabs.polito.it/custom-number-alerts annotation, CrownLabs will take action by either stopping the instance if it is persistent, or deleting it if it is non-persistent.
On the other hand, if the instance is still active (the remaining time is greater than zero), the controller evaluates the remaining time and reschedules the inactivity check when it expires (a one-minute margin is added to the timer to be sure the timer is actually expired).
Finally, if the instance has been paused and the user restarts it, the crownlabs.polito.it/number-alerts-sent annotation is reset and the crownlabs.polito.it/last-activity annotation is updated.
The controller then evaluates the new remaining time, and the entire monitoring process begins again.
This mechanism relies on the crownlabs.polito.it/last-running annotation to detect if the instance has been restarted after being paused.
The controller focuses on one point: understanding whether the Instance is being used (and should not be deleted) or is not being used (and should be deleted). An Instance can be accessed through the CrownLabs Frontend or via SSH. The controller uses Prometheus to perform this check:
- It uses Nginx metrics to verify the last access to the Frontend
- It uses a custom metric (called bastion_ssh_connections) to monitor the SSH accesses. Read here for more info on how SSH connections are monitored.
Note: a single query on Prometheus cannot return more than 11000 data points. In order to cover all the scenarios, a new parameter queryStep has been defined in the Helm Chart to modify the query resolution (query step), based on the InactivityTimeout selected.
After this check, the crownlabs.polito.it/last-activity is updated with the most recent timestamp.
If the time elapsed since the last access exceeds the maximum threshold (defined by the inactivityTimeout field in the Template resource), the Instance is declared inactive and, if enabled, email notifications are sent at regular intervals (the inactiveTerminationNotificationInterval parameter in the Helm chart).
Once the maximum number of notifications has been sent, the Instance is stopped if it is persistent, or deleted otherwise.
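The decision taken once an Instance is inactive can be sketched as below. The helper names (maxAlerts, nextAction) are hypothetical, and falling back to the chart default when the Template annotation is missing or invalid is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// Template annotation named in the text above.
const customAlertsAnnotation = "crownlabs.polito.it/custom-number-alerts"

// maxAlerts returns the Template override when present and valid,
// otherwise the chart-wide default (assumed fallback behavior).
func maxAlerts(templateAnnotations map[string]string, chartDefault int) int {
	if raw, ok := templateAnnotations[customAlertsAnnotation]; ok {
		if n, err := strconv.Atoi(raw); err == nil && n >= 0 {
			return n
		}
	}
	return chartDefault
}

// nextAction picks what to do with an instance already declared inactive.
func nextAction(alertsSent, limit int, persistent bool) string {
	switch {
	case alertsSent < limit:
		return "notify" // keep alerting the tenant
	case persistent:
		return "stop" // persistent instances are paused
	default:
		return "delete" // non-persistent instances are removed
	}
}

func main() {
	limit := maxAlerts(map[string]string{customAlertsAnnotation: "2"}, 3)
	fmt.Println(limit)                       // 2
	fmt.Println(nextAction(1, limit, true))  // notify
	fmt.Println(nextAction(2, limit, true))  // stop
	fmt.Println(nextAction(2, limit, false)) // delete
}
```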
The InstanceInactiveTerminationReconciler is set to watch and react to events related to the following resources in an efficient way:
- Instances: if an Instance has been stopped and the user restarts it, the reconciler on that Instance must be triggered again to restart the monitoring process. There is a predicate filter (instanceTriggered) to let the reconciler reschedule the Instance.
- Templates: if the inactivityTimeout is set or modified in a template, the associated instances must be reconciled to recalculate their remaining time.
- Namespaces: if a Namespace is set to be monitored (annotation crownlabs.polito.it/instance-inactivity-ignore != true), all the Instances of that Namespace must be reconciled to evaluate their remaining time. There is a predicate filter (called inactivityIgnoreNamespace) to let the reconciler reschedule the Instances if a new Namespace has to be checked.
- crownlabs.polito.it/instance-inactivity-ignore: Namespace label used to ignore the inactivity termination for all the Instances of the entire Namespace. Default value (if omitted) is false.
- crownlabs.polito.it/number-alerts-sent: Instance annotation that stores the number of email notifications sent to the Tenant.
- crownlabs.polito.it/last-notification-timestamp: Instance annotation that stores the timestamp of the last email notification sent to the Tenant.
- crownlabs.polito.it/last-running: Instance annotation that stores the previous value of the Running field of the Instance. It is used to check whether the Instance has been restarted after being paused.
- crownlabs.polito.it/custom-number-alerts: Template annotation that overrides the default InstanceMaxNumberOfAlerts in the InstanceInactiveTerminationReconciler for a specific template.
The Instance Expiration Controller verifies whether the instance has exceeded its maximum lifespan, as defined by the DeleteAfter field in the associated Template resource. If exceeded, the instance and its related resources are deleted.
The controller retrieves all the active Instances and fetches the related Template resource. Based on the DeleteAfter field of the Template, the maximum lifespan of each Instance is determined.
When omitted, this value is set to never, meaning the Instance is not scheduled for termination. However, it can be configured with a time interval representing durations in minutes, hours, or days.
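A minimal sketch of how such a value could be turned into a duration. The exact syntax is an assumption here (the word never, or a number followed by m, h or d), and parseDeleteAfter is a hypothetical helper, not the package's own parser.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseDeleteAfter converts a DeleteAfter value into a duration.
// Assumed syntax: "never" (or empty), or digits followed by m, h or d.
func parseDeleteAfter(value string) (lifespan time.Duration, never bool, err error) {
	if value == "" || value == "never" {
		return 0, true, nil // not scheduled for termination
	}
	if len(value) < 2 {
		return 0, false, fmt.Errorf("invalid DeleteAfter value %q", value)
	}
	// 16 bits keeps even the largest value in days within time.Duration range.
	n, convErr := strconv.ParseUint(value[:len(value)-1], 10, 16)
	if convErr != nil {
		return 0, false, fmt.Errorf("invalid DeleteAfter value %q: %w", value, convErr)
	}
	var unit time.Duration
	switch value[len(value)-1] {
	case 'm':
		unit = time.Minute
	case 'h':
		unit = time.Hour
	case 'd':
		unit = 24 * time.Hour
	default:
		return 0, false, fmt.Errorf("invalid DeleteAfter unit in %q", value)
	}
	return time.Duration(n) * unit, false, nil
}

func main() {
	fmt.Println(parseDeleteAfter("7d"))    // 168h0m0s false <nil>
	fmt.Println(parseDeleteAfter("never")) // 0s true <nil>
}
```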
Once the instance lifespan expires, the controller sends a warning email to the tenant informing them that their Instance will be deleted soon.
The controller adds a new crownlabs.polito.it/expiring-warning-notification-timestamp annotation to store the timestamp of the warning notification. This annotation is used to determine whether the required interval has elapsed since the warning notification, thereby allowing the deletion of the Instance if necessary.
After the notificationInterval time has passed since the warning, the controller proceeds to delete the Instance and sends a second email to the tenant confirming that the Instance has been deleted.
The InstanceExpirationReconciler is set to watch and react to events related to the following resources in an efficient way:
- Instances: if an Instance has been stopped and the user restarts it, the reconciler on that Instance must be triggered again to restart the monitoring process. There is a predicate filter (instanceTriggered) to let the reconciler reschedule the Instance.
- Templates: if the deleteAfter value is set or modified in a template, the associated instances must be reconciled to recalculate their remaining time. There is a predicate filter (deleteAfterChanged) to let the reconciler reschedule the Instance to update the new remaining time.
- Namespaces: if a Namespace is set to be monitored (ExpirationIgnoreNamespace != true), all the Instances of that Namespace must be reconciled to evaluate their remaining time. There is a predicate filter (called expirationIgnoreNamespace) to let the reconciler reschedule the Instances if a new Namespace has to be checked.
- crownlabs.polito.it/expiration-ignore: Namespace label used to ignore the expiration for all the Instances of the entire Namespace. Default value (if omitted) is false.
- crownlabs.polito.it/expiring-warning-notification-timestamp: Instance annotation that stores the timestamp of the warning notification sent to the Tenant. If it is present, it means that the warning notification has already been sent, therefore the Instance is ready to be deleted after the notification interval.
The Instance Termination Controller focuses on instance termination in exam scenarios. It first verifies whether the instance's public endpoint is still responding by performing an HTTP check. If the endpoint is found to be unreachable, the controller initiates the termination process for that instance.
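An HTTP reachability check of this kind might look like the sketch below. The function endpointResponding is hypothetical, and treating only 2xx responses as "responding" is an assumption; the real controller may use different status rules. The local test server stands in for an instance's public endpoint.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// endpointResponding reports whether url answers with a 2xx status within
// timeout. Treating only 2xx as "responding" is an assumption of this sketch.
func endpointResponding(url string, timeout time.Duration) bool {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return false // DNS failure, refused connection, timeout, ...
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300
}

func main() {
	// Local stand-in for the instance's public endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	url := srv.URL

	fmt.Println(endpointResponding(url, 2*time.Second)) // endpoint is up
	srv.Close()
	fmt.Println(endpointResponding(url, 2*time.Second)) // endpoint is gone: termination would start
}
```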
The Instance Submission Controller automates exam submission workflows by creating a ZIP archive of the instance's persistent volume, which contains the VM disk. Once the archive is created, it is uploaded to a configured submission endpoint. This process is used during exams to collect student submissions in a reproducible and traceable way, ensuring consistency and accountability.
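The archiving step can be illustrated with the standard library alone. This sketch only covers building the ZIP from a directory; the upload to the submission endpoint is not shown, and zipDir and demo are hypothetical names, with a temporary directory standing in for the mounted persistent volume.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// zipDir writes every regular file under root into a ZIP stream on w,
// using paths relative to root as entry names.
func zipDir(root string, w io.Writer) error {
	zw := zip.NewWriter(w)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks and special files
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		entry, err := zw.Create(filepath.ToSlash(rel))
		if err != nil {
			return err
		}
		src, err := os.Open(path)
		if err != nil {
			return err
		}
		defer src.Close()
		_, err = io.Copy(entry, src)
		return err
	})
	if err != nil {
		return err
	}
	return zw.Close() // writes the central directory
}

// demo archives a temporary directory and returns the archived entry names.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "submission")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "disk.img"), []byte("fake disk"), 0o600); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := zipDir(dir, &buf); err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	names, err := demo()
	fmt.Println(names, err) // [disk.img] <nil>
}
```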
The Instance Automation Controller is deployed together with the Instance Operator as a secondary deployment. The Helm chart for the Instance Operator has been updated to include the deployment of the new Instance Automation Controller.
Main controller parameters:
- mailTemplateDir: path to the directory containing the crownmail templates.
- mailConfigDir: path to the directory containing the crownmail configuration files.
Main automation parameters:
- enableInstanceSubmission: flag to enable the Instance Submission controller.
- enableInstanceTermination: flag to enable the Instance Termination controller.
- enableInstanceInactiveTermination: flag to enable the Instance Inactive Termination controller.
- enableInstanceExpiration: flag to enable the Instance Expiration controller.
- inactiveTerminationMaxNumberOfAlerts: maximum number of email notifications to send to the Tenant before deleting/pausing the Instance.
- enableInactivityNotifications: flag to enable the notification for the Instance Inactive Termination controller.
- enableExpirationNotifications: flag to enable the notification for the Instance Expiration controller.
- inactiveTerminationNotificationInterval: time interval between two consecutive email notifications.
- expirationNotificationInterval: time interval between the warning notification and the Instance deletion.
- marginTime: margin time (in minutes) added when calculating the remaining time for inactivity and expiration checks.
Main monitoring parameters:
- prometheusURL: URL of the Prometheus service in the cluster.
- queryNginxAvailable: query to verify if the external ingress is available and is correctly collecting metrics.
- queryBastionSSHAvailable: query to verify if the custom SSH bastion tracker is available and is correctly collecting metrics.
- queryWebSSHAvailable: query to verify if the WebSSH (SSH through a new browser terminal) metrics are available.
- queryNginxData: query to retrieve info about an Instance access through frontend.
- queryBastionSSHData: query to retrieve info about an Instance access through SSH.
- queryWebSSHData: query to retrieve info about an Instance access through WebSSH.
- queryStep: step to use in the Prometheus query to retrieve data.