feat(automated-analysis): implement batch processing of archives #859
5348d49 to 0f93ec1
This PR/issue depends on:
e09072a to 657bf31
/build_test
Workflow started at 4/1/2025, 11:52:48 AM.
Oh, that's going to fail because it needs cryostatio/cryostat-core#529.
CI build and push: At least one test failed ❌
10d04e8 to bbfb245
bbfb245 to 594d159
/build_test
Workflow started at 4/11/2025, 9:19:58 AM.
No GraphQL schema changes detected.
No OpenAPI schema changes detected.
CI build and push: All tests pass ✅
Welcome to Cryostat! 👋
Before contributing, make sure you have:
main branch
[chore, ci, docs, feat, fix, test]
To recreate commits with GPG signature
git fetch upstream && git rebase --force --gpg-sign upstream/main
Fixes #842
Related to #41
Related to #66
Related to #71
Related to cryostatio/cryostat-core#528
Depends on cryostatio/cryostat-core#529
See also #856 which should improve the performance (lower overhead) of this feature
Description of the change:
1. Implements soft-deletion of Target records. When a Target is deleted, the record is not physically removed from the database; a deleted flag is simply set instead. This uses a Hibernate feature, so other HQL queries etc. will automatically ignore these soft-deleted entities. Using native queries we can query the table and ignore the deleted flag, or even query for targets that have explicitly been deleted. This allows the data model to retain all of the information we had about what the target was, even after it goes offline or is otherwise lost from discovery. This has been a sore point in the API and data model for a long time, but it's particularly relevant here because we may be getting eligible (correctly labelled, or whatever) archived recordings pushed to us from Agent instances that are in the process of shutting down. If we hard delete Targets instead of soft deleting them, then we always have a race condition between batch processing grabbing the Target record to preserve what information we have about the origin of the recording, and the Target deregistering itself or otherwise disappearing. By preserving the Target record and only soft deleting it when the Target disappears we can resolve the race condition, so that it doesn't matter when the batch process starts or finishes - we can always retrieve information about what the source Target is/was.
2. Listens for archived recordings to be created which have the autoanalyze=true metadata label. When such an archived recording is created, Cryostat will submit that recording for automated analysis immediately. When the report is created another notification is emitted, which the new AnalysisReportAggregator listens for and uses to store the final report in its cache. The cache has an on-write expiry policy and will also invalidate entries when the associated Target is lost (see #860 - once we have soft deletion then maybe this should change?)
3. The AnalysisReportAggregator collects one report "document" per Target after batch processing and exposes the aggregated reports as Prometheus-style metrics (see below). The new GET /api/v4/metrics/reports and GET /api/v4/metrics/reports/{jvmId} endpoints here can be used to query this.
4. The Reports class adds a new endpoint pair: GET /api/v4/targets/{targetId}/reports and POST /api/v4/targets/{targetId}/reports?clean=true. The GET simply returns the latest cached result from the AnalysisReportAggregator, but in plain JSON format instead of the Prometheus-style scraping format. The POST can be used to trigger report generation on the specified Target with a single request - Cryostat processes these requests by creating a new JFR Snapshot recording with the autoanalyze=true label and immediately archiving it, which will trigger the report generation as described above. If the POST has the clean=true query parameter (which is the default) then the snapshot will be deleted once it has been copied to archives. This allows the AnalysisReportAggregator cache to be updated immediately using whatever JFR data is currently available in the target, without the client needing to specifically create a new recording with a particular label etc. - but it does require that some prior recording(s) exist so that there is JFR data available to be snapshotted.
5. Refactors the LongRunningRequestGenerator so that the EventBus messages sent to it can have a direct response back to the sender, as well as the general broadcast notification. The AnalysisReportAggregator uses this facility to request the automated analysis report and directly receive the report document back, without needing to listen for an out-of-band broadcast notification.
6. Adds a GraphQL subquery to the Target objects that allows retrieval of the current Report from the aggregator, if any, with server-side aggregation for the count of evaluated rules and the maximum result score. This is used to power the Topology view decorator added in cryostatio/cryostat-web#1589.
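As a rough illustration of the Prometheus-style output mentioned in item 3, the sketch below renders one analysis result as a single text-format sample. It is not taken from this PR: the class name, the label names (jvmId, realm, namespace) and the result name HighGc are all assumptions chosen for the example, and the real endpoint may lay out its labels differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not Cryostat's actual implementation.
class ReportMetricsSketch {

    // Renders one analysis result as a Prometheus text-format sample:
    //   metric_name{label="value",...} score
    static String formatSample(String resultName, Map<String, String> labels, double score) {
        String labelText = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(","));
        return resultName + "{" + labelText + "} " + score;
    }

    // Label values must escape backslash, double quote and newline.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the labels in insertion order.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("jvmId", "abcd1234");       // assumed label name for the JVM hash ID
        labels.put("realm", "KubernetesApi");  // assumed owner-chain node
        labels.put("namespace", "my-apps");    // assumed owner-chain node
        // prints: HighGc{jvmId="abcd1234",realm="KubernetesApi",namespace="my-apps"} 25.0
        System.out.println(formatSample("HighGc", labels, 25.0));
    }
}
```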
Analysis result names are used as metric keys. The discovery node owner chain (excluding Universe, which is a common root for all nodes/targets) is also embedded in the metrics labels, along with the unique JVM hash ID. The analysis result score is the metric value.
Current issues:
Soft deletion is a significant change to the underlying data model, which should really have been done back in 3.0. Various behaviours will need to be adjusted - for example, connectUrls are no longer strictly unique (see #71, "[Epic] Redefine Target data model to allow multiple Connection URLs") since a soft-deleted target may have a connectUrl that a user wants to reuse, either for the same target instance or one at the same resolved network location. Maybe this change needs to be split out and worked on separately.
Motivation for the change:
See #842
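The race condition that motivates soft deletion (item 1 of the description) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class, field and method names are hypothetical, a HashMap stands in for the database table, and the real change relies on a Hibernate feature rather than hand-written filtering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for the Target table; illustrative only.
class SoftDeleteSketch {

    static final class TargetRow {
        final String jvmId;
        final String connectUrl;
        boolean deleted; // soft-deletion flag

        TargetRow(String jvmId, String connectUrl) {
            this.jvmId = jvmId;
            this.connectUrl = connectUrl;
        }
    }

    private final Map<String, TargetRow> rows = new HashMap<>();

    void register(String jvmId, String connectUrl) {
        rows.put(jvmId, new TargetRow(jvmId, connectUrl));
    }

    // Soft delete: flag the row instead of removing it.
    void deregister(String jvmId) {
        TargetRow row = rows.get(jvmId);
        if (row != null) {
            row.deleted = true;
        }
    }

    // Mirrors ordinary queries, which skip soft-deleted rows.
    Optional<TargetRow> findActive(String jvmId) {
        return Optional.ofNullable(rows.get(jvmId)).filter(r -> !r.deleted);
    }

    // Mirrors a native query that ignores the deleted flag.
    Optional<TargetRow> findIncludingDeleted(String jvmId) {
        return Optional.ofNullable(rows.get(jvmId));
    }

    public static void main(String[] args) {
        SoftDeleteSketch table = new SoftDeleteSketch();
        table.register("abcd1234", "service:jmx:rmi:///jndi/rmi://app:9091/jmxrmi");
        // The target shuts down before batch processing gets to its recording.
        table.deregister("abcd1234");
        // prints: false
        System.out.println(table.findActive("abcd1234").isPresent());
        // prints: service:jmx:rmi:///jndi/rmi://app:9091/jmxrmi
        System.out.println(table.findIncludingDeleted("abcd1234")
                .map(r -> r.connectUrl)
                .orElse("unknown"));
    }
}
```

With a hard delete, the second lookup would also come back empty, which is exactly the window the batch processor could fall into.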
How to manually test:
./mvnw clean install it
cd src/main/webui and check out cryostat-web#1589 (feat(autoanalyze): add controls to enable archived recording autoanalysis).
./mvnw -o clean package
./smoketest.bash -O -t quarkus-cryostat-agent
Test the following features:
autoanalyze=true label. Archiving this recording should cause a report generation notification to appear (in the notification drawer) after a short time.
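For manual testing outside the web UI, the GET/POST endpoint pair from item 4 of the description could be exercised with a small client. The sketch below only builds the two requests with the JDK's java.net.http API and does not send them; the base URL, the numeric target ID and the class name are placeholders, and authentication is omitted entirely.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative request builders; base URL and target ID are placeholders.
class ReportRequestSketch {

    // POST /api/v4/targets/{targetId}/reports?clean=...
    // Asks Cryostat to snapshot, archive and analyze the target's current JFR data.
    static HttpRequest triggerReport(String baseUrl, long targetId, boolean clean) {
        URI uri = URI.create(baseUrl + "/api/v4/targets/" + targetId + "/reports?clean=" + clean);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // GET /api/v4/targets/{targetId}/reports
    // Fetches the latest cached report as plain JSON.
    static HttpRequest latestReport(String baseUrl, long targetId) {
        URI uri = URI.create(baseUrl + "/api/v4/targets/" + targetId + "/reports");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        String baseUrl = "http://localhost:8080"; // assumed local smoketest address
        HttpRequest post = triggerReport(baseUrl, 1L, true);
        HttpRequest get = latestReport(baseUrl, 1L);
        // prints: POST http://localhost:8080/api/v4/targets/1/reports?clean=true
        System.out.println(post.method() + " " + post.uri());
        // prints: GET http://localhost:8080/api/v4/targets/1/reports
        System.out.println(get.method() + " " + get.uri());
    }
}
```

To actually send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). As noted in the description, the POST only produces a report if some prior recording exists on the target.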