[JENKINS-76317]: Add support to define maintenance windows for Clouds#320
[JENKINS-76317]: Add support to define maintenance windows for Clouds#320prathamalwayscomeslast wants to merge 25 commits intojenkinsci:mainfrom
Conversation
…pport clouds. - Added `MaintenanceTarget.TargetType.CLOUD` + `toKey()` serialization. - Made `MaintenanceAction` more generic to support `Cloud` - Extending bulk addition of `MaintenanceWindow`s to `Cloud` - UI coverage: Replicated existing agent specific UI for clouds Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
|
I'm open to any suggestions about change in the approach. Unit tests will be added in a day or two. My first idea was to make |
|
Some considerations: I would prefer when the cloud maintenance windows on the MaintenanceLink page are in a separate table. Also there is an add button that will add new maintenance windows where you can use label expressions. That will not work for clouds in that way. While clouds can react on labels I think right now a cloud should be taken offline completely. |
| <l:layout title="Edit Maintenances" norefresh="true"> | ||
| <j:if test="${it.isAgent()}"> | ||
| <st:include it="${it.getAgentComputer()}" page="sidepanel.jelly"/> | ||
| </j:if> |
There was a problem hiding this comment.
clouds also have a sidepanel so that should be included.
There was a problem hiding this comment.
Okay I have included sidepanel to cloud too. There is an issue here though. When I go to maitenance window on cloud's side panel, it takes me to cloud/cloud-name/maintenanceWindow and then the sidepanel is broken. Now if I go to configure from sidepanel, it directs me to cloud/cloud-name/maintenanceWindow/configure which is the wrong API. Even the status button is now cloud/cloud-name/maintenanceWindow. This only happens with the cloud's sidepanel, the agent's one is working fine. I'm having a hard time finding where this url discrepancy is coming from
There was a problem hiding this comment.
That is a bug in the sidepanel.jelly at https://github.com/jenkinsci/jenkins/blob/474f2aa295b1dcfe7578806bcf9ee0d0c610055b/core/src/main/resources/hudson/slaves/Cloud/sidepanel.jelly#L30 and L31. Here it uses a relative url, while it should use an absolute url. I opened jenkinsci/jenkins#26002 for this
And in the agent sidepanel is also a bug when it comes to deleting the agent. There as well it uses a relative url.
src/main/resources/com/sap/prd/jenkins/plugins/agent_maintenance/MaintenanceLink/index.jelly
Outdated
Show resolved
Hide resolved
- Implemented `CloudMaintenanceQueueBlocker` instead of `ProvisioningListener` as it was found to be less reliable. - Separate sections for cloud and agent maintenance windows in `MaintenanceLink` Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
For now, I hide the agent-specific fields for the cloud context. To not persist them, an optimal solution would be to make an abstract base class for MaintenanceWindow but then that would require tons of refactoring everywhere. MaintenanceLink table separation is done. I also ran into a bunch of problems implementing the ProvisioningListener logic. The code of that does look correct to me, and I tried tweaking it to make it work in anyway, but the builds still ran on the docker cloud even with active maintenance window. So finally, I changed the approach and blocked the queue itself during active maintenance window - |
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Changes: - Implemented separate form submission paths: JSON for agents, URL params for clouds - Updated doAdd() to handle both agent (JSON) and cloud (URL parameters) submissions - Added cloud maintenance modal with multi select dropdown for cloud selection Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
src/main/resources/com/sap/prd/jenkins/plugins/agent_maintenance/agent-maintenance.js
Show resolved
Hide resolved
|
The queue approach doesn't work well. |
|
With the mock-agent plugin that can also create mock clouds, the approach with the CloudProvisioningListener works fine |
Can you try this:
I have done this and what I see is that the docker cloud still provisions an agent for the freestyle job to run. You can also verify it from the console output. Why does that happen? I have an active maintenance window and |
I was thinking that we should iterate through all the nodes having the label, whichever cloud provisioned node has maintenance window on their respective cloud, it should skip to the next node with that label. But that doesn't seem possible as there's no way to extract |
- Followed the same logic for maintenance windows; splitting into agent and cloud specific jelly files - Reverted back to `CloudMaintenanceProvisioningListener` Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
- Complete refactoring of `MaintenanceActionTest` to include clouds - New: `MaintenanceTargetTest` - Few more improvements in the core Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
|
Hey @mawinter69. The cloud maintenance feature is 95% complete, with most of the unit tests concerning it. The only things left are resolving some tests, doc changes and checkstyle improvements. So with all that aside, is there something I might have missed? Something I should've done differently? I'm open to any suggestions or changing my approach |
|
Also this issue was great to work on, taught me a lot about Jenkins internals and how the plugins work, as a beginner. This was my second opensource contribution to any project. If there's anywhere you'd want help, some big update like this, I'd love to help! |
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
… new documentation for Clouds. Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
|
Heyy there @mawinter69. Almost everything is done now. Unit tests and documentation too. There's some issues in |
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
mawinter69
left a comment
There was a problem hiding this comment.
I think we that the actions should be separated. There are a lot of if necessary that make the code hard to understand.
e.g. the jelly of MaintenanceLink will then more or less just be adding the stuff for clouds
When a maintenance window for a cloud is over it should automatically be removed like it is done for agents.
The displayname for the MaintenanceLink is still Agent Maintenances. That should include now also clouds. Maybe that can be made dynamic so in case there are no clouds defined it says Agent Maintenances and with clouds it says Agent and Cloud Maintenances
|
|
||
| <td> | ||
| <f:checkbox class="am__checkbox" name="${mid}"/> | ||
| <j:if test="${action.isAgent()}"> |
There was a problem hiding this comment.
This distinction is not necessary. The action is guaranteed to be an agent
| <j:if test="${action.isAgent()}"> | ||
| <l:icon src="${action.getAgentComputer().iconClassName}" class="icon-sm am__table-icon"/> | ||
| </j:if> | ||
| <j:if test="${action.isCloud()}"> |
There was a problem hiding this comment.
remove this if as it is always false
|
|
||
| <td> | ||
| <nobr> | ||
| <j:if test="${action.isAgent()}"> |
| ${action.getAgentComputer().displayName} | ||
| </a> | ||
| </j:if> | ||
| <j:if test="${action.isCloud()}"> |
| </td> | ||
| <td>${m.userid}</td> | ||
|
|
||
| <j:if test="${action.isAgent()}"> |
| <td>${m.reason}</td> | ||
| <td>${m.userid}</td> | ||
|
|
||
| <p:hasAnyPermission it="${app}" permissions="Jenkins.ADMINISTER"> |
There was a problem hiding this comment.
| <p:hasAnyPermission it="${app}" permissions="Jenkins.ADMINISTER"> | |
| <l:hasPermission it="${app}" permissions="${app.ADMINISTER}"> |
| data-message="${%deleteMaintenanceOf} ${action.target.toKey()}" | ||
| data-message-success="${%Maintenance window was successfully deleted}"> | ||
| <l:icon src="symbol-trash-outline plugin-ionicons-api" class="icon-sm icon-red am__table-icon" | ||
| tooltip="Delete this maintenance window for ${h.escape(action.target.toKey())}"/> |
| </td> | ||
| <td class="delete"> | ||
| <div class="am__link-delete" | ||
| data-message="${%deleteMaintenanceOf} ${action.target.toKey()}" |
There was a problem hiding this comment.
| data-message="${%deleteMaintenanceOf} ${action.target.toKey()}" | |
| data-message="${%deleteMaintenanceOf} ${c.displayName}" |
| <button id="add-button-clouds" type="button" | ||
| class="jenkins-button jenkins-button--primary">${%Add}</button> | ||
| <button id="delete-selected-clouds" type="button" | ||
| class="jenkins-button je nkins-button--primary ${cloudMwCount==0?'jenkins-hidden':''} delete-selected-button" |
There was a problem hiding this comment.
typo
| class="jenkins-button je nkins-button--primary ${cloudMwCount==0?'jenkins-hidden':''} delete-selected-button" | |
| class="jenkins-button jenkins-button--primary ${cloudMwCount==0?'jenkins-hidden':''} delete-selected-button" |
| </td> | ||
| <td class="delete"> | ||
| <div class="am__link-delete" | ||
| data-message="${%deleteMaintenanceOf} ${action.target.toKey()}" |
There was a problem hiding this comment.
The text is wrong, the dialog then contains agent instead of cloud
- Separated agent and cloud actions for `MaintenanceLink` - Removed redundant if statements - Improved docs Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
|
Okay i fixed everything, removed unnecessary if statements, made displayName more dynamic for agents and clouds, maintenance window clean up in clouds just like agents, separated agent and cloud actions. However, I restored Also, I added clean up of cloud maintenance windows when they finish. It happens when the cloud is provisioning a node. Does it need to be a periodic check like we do in |
mawinter69
left a comment
There was a problem hiding this comment.
We have one principal problem. There is no guarantee that the cloud name is unique. While the UI for creating a new cloud initially prevents from creating duplicate clouds, some clouds still overwrite that. e.g. JClouds plugin in the actual config page allows to define a profile and passes this to the super class as the cloud name. In the UI it checks if the name already exists and properly shows a validation error. But you can still save it with this name. In the cloud overview you then see 2 clouds with that name but the second cloud has a different url http://localhost:8080/jenkins/manage/cloud/cloudByIndex/12/. As currently you connect the maintenance windows via the cloud name it would be impossible to determine if a maintenance window is meant for the first or second cloud.
I found following clouds where it is possible to create duplicate names:
- https://plugins.jenkins.io/oracle-cloud-infrastructure-compute/
- https://plugins.jenkins.io/openstack-cloud/
- https://plugins.jenkins.io/jclouds-jenkins/
- https://plugins.jenkins.io/macstadium-orka/
- https://plugins.jenkins.io/hetzner-cloud/
- https://plugins.jenkins.io/google-compute-engine/
And probably more.
Jenkins core solves that by creating a url with an index, but the problem we have here is that we shouldn't rely on the index as it might change between restarts (e.g. some uses config as code and added a new cloud with same name).
I think for clouds we need to persist the action as part of the cloud itself, Cloud is actionable. So in the factory, first check if the cloud already has an action, if not then create it and add it to the cloud via Cloud.addAction. This will persist the action together with the cloud in the config.xml of Jenkins.
I think that requires a separate action for clouds where all the maintenance windows are stored directly in the action. That also means an own MaintenanceWindow implementation I think
mawinter69
left a comment
There was a problem hiding this comment.
I haven't analyzed yet but deleting a maintenance window from the MaintenanceLink page by clicking on the trash can is not working. It shows the wrong text in the dialog and then fails with an exception
2026-01-24 17:19:10.466+0000 [id=171] WARNING h.i.i.InstallUncaughtExceptionHandler#handleException: Caught unhandled exception with ID 38a26465-5d98-4b56-8a8c-25b2fc30134d
java.lang.IllegalArgumentException: Invalid maintenance target key: null
at PluginClassLoader for agent-maintenance//com.sap.prd.jenkins.plugins.agent_maintenance.MaintenanceTarget.fromKey(MaintenanceTarget.java:42)
at PluginClassLoader for agent-maintenance//com.sap.prd.jenkins.plugins.agent_maintenance.MaintenanceLink.hasPermission(MaintenanceLink.java:291)
at PluginClassLoader for agent-maintenance//com.sap.prd.jenkins.plugins.agent_maintenance.MaintenanceLink.deleteMaintenance(MaintenanceLink.java:225)
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)
at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:484)
at org.kohsuke.stapler.Function$InstanceFunction.invoke(Function.java:497)
at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:218)
at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:140)
| <j:if test="${!action.hasCloudDeletePermission()}"> | ||
| <td/> | ||
| <td/> | ||
| </j:if>> |
| </div> | ||
| </td> | ||
| </p:hasAnyPermission> | ||
| </tr> |
There was a problem hiding this comment.
missing p:hasNoPermission to insert 2 <td/>
| Jenkins.MANAGE.setEnabled(true); | ||
| Jenkins.SYSTEM_READ.setEnabled(true); | ||
| Computer.EXTENDED_READ.setEnabled(true); | ||
| Computer.CONFIGURE.setEnabled(true); |
There was a problem hiding this comment.
These 2 permissions are enabled by default. No need to set them
| // System read and computer configure on agent, but not on agentRestricted | ||
| matrixAuth.add(Jenkins.READ, configure); | ||
| matrixAuth.add(Jenkins.SYSTEM_READ, configure); | ||
| matrixAuth.add(Computer.CONFIGURE, configure); |
There was a problem hiding this comment.
remove that added line. This is the cause for the test failures. The permissions are added via a node property in BasePermissionChecks just for one agent. But here you grant that globally.
| // System read and computer configure on agent, but not on agentRestricted | ||
| matrixAuth.add(Jenkins.READ, disconnect); | ||
| matrixAuth.add(Jenkins.SYSTEM_READ, disconnect); | ||
| matrixAuth.add(Computer.DISCONNECT, disconnect); |
There was a problem hiding this comment.
remove that added line. This is the cause for the test failures. The permissions are added via a node property in BasePermissionChecks just for one agent. But here you grant that globally.
| || computer.hasPermission(Computer.EXTENDED_READ)) | ||
| && computer.getNode() != null; | ||
| } | ||
| return Jenkins.get().hasPermission(Jenkins.ADMINISTER); |
There was a problem hiding this comment.
There is the permission Jenkins.SYSTEM_READ This permission is not enabled by default. It allows a read only view to the global configuration. Clouds are visible with this permission so we also need to support that. So in all places replace ADMINISTER with SYSTEM_READ except when it comes to showing delete buttons or in methods that actually delete something. I.e. with SYSTEM_READ I should be able to see cloud maintenance windows but I should not be able to delete or modify them
|
I get duplicate cloud name concern. Persisting actions of each cloud sounds okay but how about this - We generate unique id and assign it to each cloud in the config file. Then, that id will be used to deal with maintenance windows. How I'd want these configurations to persist: Currently we store maintenance windows as I believe this would be an optimal solution as it would avoid branching from the already working unified architecture that supports agents and clouds both. |
|
@mawinter69 Is Jenkins.get().save() an expensive call? |
- New: `CloudUuidAction` - a minimal action class just for containing the duplicate cloud's UUID. - `CloudUuidStore` - helper for dealing with duplicate clouds. - New `targetKey` format for duplicate clouds - CLOUD:cloud-name:cloud-uuid. While non duplicate clouds' format remains as is. - Saving maintenance-config.xml UUID directory if the cloud is found to have duplicates. - Fixed deletion bugs in `MaintenanceLink`. Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
|
This seemed easier at first but was a pain to work on. At first I thought UUID is a solid enough distinction for same named clouds. So maybe store key value pairs in some .properties file? But that was not enough to know which exact cloud the uuid belongs to, leaving room for ambiguity. So the best stable distinction would be sha-256 hash of the cloud's configuration along with the cloud's name. Nothing can beat this, right? But configurations keep changing all the time, so that means we'd have to keep track of that and keep our hash in sync with the current config. I knew I was over complicating things, turns out the solution was much simpler! Saving the action with duplicate cloud's UUID under clouds in the global config file. That was it. Some edge cases may not be covered in this change, and the related unit tests with this will be added after some time if that's fine. Permission improvements too (I am thinking of a centralized PermissionHelper class migrating all permission logic there) |
|
Cloud name duplicacy shouldn't be allowed. Even the documentation in |
|
I already opened an issue in core that the handling of clouds is broken. There is an open PR jenkinsci/jenkins#26233 that addresses the problem by introducing a UUID for each cloud. The main problem with the duplicates is that the handling in the UI used an index, but that index is not stable and that leads to weird problems. I think my idea earlier to directly save the maintenance windows as part of the action will not work. Looking at the code of the Cloud class it will create a new instance upon saving the configuration and is then not taking over any added actions from the old to the new instance. So the there is no need to call |
- Fixed deletion bugs in `MaintenanceLink`. Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
…f all `MaintenanceTarget`s Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Merging the permission manager changes to the working branch and also reverting the UUID logic.
|
Sounds good. Permission checking was a lil haphazard so I added a centralized |
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
…w: `CloudUuidAction` - a minimal action class just for containing the duplicate cloud's UUID. - `CloudUuidStore` - helper for dealing with duplicate clouds. - New `targetKey` format for duplicate clouds - CLOUD:cloud-name:cloud-uuid. While non duplicate clouds' format remains as is. - Saving maintenance-config.xml UUID directory if the cloud is found to have duplicates. - Fixed deletion bugs in `MaintenanceLink`." This reverts commit 1b8ba9d
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>
Signed-off-by: Pratham Vaghela <prathamcomeslast@gmail.com>

Comprehensive solution for JENKINS-76317 i.e. adding cloud maintenance windows.
fixes #337
Implementation details
MaintenanceTargeta general model class that represents aTargetType(AGENT, CLOUD)TargetType+targetNameserialization to identify targetsMaintenanceLinkandMaintenanceActionmore generic to allow extension forCloudMaintenanceLinkTesting done
mvn clean installand then ran withmvn hpi:run(skipped tests for now)Submitter checklist