Description
We are encountering issues with the execution of the Netcentric AC Tool in our Kubernetes environment (AEMaaCS). Executions do not complete successfully because pods are routinely stopped and restarted, which interrupts the AC Tool’s process.
After an investigation with Adobe Support, it was identified that the AC Tool saves a "hash" property with every execution—regardless of whether it succeeds. This prevents an automatic re-run when a pod is restarted, making it difficult to achieve a full execution, which typically takes around 2.5 hours for our setup. As a temporary workaround, we have been manually re-triggering executions via the AC Tool UI in smaller batches. However, this is not a scalable solution.
Issue Summary:
- The AC Tool execution is interrupted due to pod restarts in Kubernetes.
- A "hash" property is stored even when execution fails, preventing automatic retries.
- The execution time for our use case (~2.5 hours) increases the likelihood of interruptions, making completion difficult.
- If the execution is interrupted, no execution logs are created under /var/statistics/achistory.
- High repository query load is also impacting performance, particularly through repeated calls to UserProvider.getAuthorizable.
Request for Assistance:
We would like your assistance in finding a long-term solution to make the AC Tool execution more resilient in a Kubernetes environment. Based on our investigation with Adobe, potential improvements could include:
Enhancing the Execution Model:
- Implementing an asynchronous execution model with checkpointing.
- Allowing execution of one YAML file at a time and only storing the hash after successful completion.
- Ensuring that only the current task needs to be redone if a pod restarts mid-process.
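To illustrate the execution-model suggestions above, here is a minimal sketch of per-file checkpointing in which a file's hash is stored only after that file has been applied successfully. All names (`CheckpointedRun`, `FileInstaller`, the `appliedHashes` map) are hypothetical and not part of the AC Tool; the map stands in for whatever persistent location would survive a pod restart, and `String.hashCode()` stands in for a real content digest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: apply config files one at a time, checkpointing each on success. */
class CheckpointedRun {

    /** Stand-in for "apply one YAML file"; this is not an AC Tool interface. */
    interface FileInstaller {
        void install(String path, String content) throws Exception;
    }

    /** Assumed to be backed by persistent storage that survives a pod restart. */
    private final Map<String, String> appliedHashes;

    CheckpointedRun(Map<String, String> appliedHashes) {
        this.appliedHashes = appliedHashes;
    }

    /** Placeholder hash; a real implementation would likely use a cryptographic digest. */
    static String hashOf(String content) {
        return Integer.toHexString(content.hashCode());
    }

    /**
     * Applies every file whose stored hash is missing or different.
     * Returns the paths that were actually applied in this run.
     */
    List<String> run(Map<String, String> files, FileInstaller installer) throws Exception {
        List<String> applied = new ArrayList<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            String hash = hashOf(file.getValue());
            if (hash.equals(appliedHashes.get(file.getKey()))) {
                continue; // completed in an earlier run, nothing to redo
            }
            // If this call throws (or the pod dies here), no hash is stored for this file.
            installer.install(file.getKey(), file.getValue());
            // Checkpoint: record the hash only after the file was applied successfully.
            appliedHashes.put(file.getKey(), hash);
            applied.add(file.getKey());
        }
        return applied;
    }
}
```

With this shape, a run that is interrupted while processing the second of three files leaves only the first file's hash stored, so a re-run skips the first file and resumes with the second.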
Optimizing Repository Query Load:
- Reducing excessive repository queries, particularly redundant calls to UserProvider.getAuthorizable. Analysis of thread dumps during an AC Tool execution revealed that the tool is triggering a large number of repository queries, especially through repeated calls to UserProvider.getAuthorizable. Although some caching exists, the cache is not updated for newly created authorizables, leading to an excessive read load.
- Improving caching mechanisms to avoid repeated queries for newly created authorizables.
- Modifying the tool’s logic to handle duplicate object creation via exception handling, similar to Jackrabbit Oak’s UserManager#createGroup. The tool seems to use additional get calls to avoid duplicate object creation, rather than handling exceptions that might arise if an object already exists (as Jackrabbit Oak’s UserManager#createGroup method does). Adjusting this approach to catch exceptions could reduce the number of repository queries and improve overall performance.
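The query-load suggestions above can be sketched as follows: create first, fall back to a lookup only when creation reports a conflict, and cache the result in both cases so that newly created authorizables are never queried again. This is an illustration under stated assumptions, not the AC Tool's code. `FakeUserManager` and `ExistsException` are local stand-ins for the Jackrabbit `UserManager` and its `AuthorizableExistsException`, so the example runs without a repository; authorizables are represented by their IDs only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: create-then-catch with a cache that also covers new groups. */
class GroupCreationSketch {

    /** Local stand-in for Jackrabbit's AuthorizableExistsException. */
    static class ExistsException extends Exception {
        ExistsException(String id) {
            super("Authorizable already exists: " + id);
        }
    }

    /** In-memory stand-in for the user manager; counts lookups to make the load visible. */
    static class FakeUserManager {
        final Set<String> ids = new HashSet<>();
        int lookups = 0;

        String getAuthorizable(String id) {
            lookups++;
            return ids.contains(id) ? id : null;
        }

        String createGroup(String id) throws ExistsException {
            if (!ids.add(id)) {
                throw new ExistsException(id);
            }
            return id;
        }
    }

    private final FakeUserManager userManager;
    private final Map<String, String> cache = new HashMap<>();

    GroupCreationSketch(FakeUserManager userManager) {
        this.userManager = userManager;
    }

    /** Returns the group for the given ID, creating it if needed, without a preliminary lookup. */
    String ensureGroup(String id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached; // served from cache, no repository access
        }
        String group;
        try {
            group = userManager.createGroup(id);
        } catch (ExistsException e) {
            // Only the conflict case pays for a lookup.
            group = userManager.getAuthorizable(id);
        }
        cache.put(id, group); // newly created groups are cached as well
        return group;
    }
}
```

In this sketch, ensuring one new group and one pre-existing group twice each costs a single lookup in total. A real implementation would also have to handle the case where the existing authorizable is a user rather than a group, which this sketch ignores.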
Next Steps:
We would appreciate your input on the feasibility of these recommendations and any additional steps we can take to improve AC Tool execution stability.
Please let us know how we can collaborate on this effort and if further details are needed.
Thank you for your support.
Best Regards,
Arshan Beig