Skip to content

Conversation

@andrewazores
Copy link
Member

@andrewazores andrewazores commented Mar 24, 2025

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PR's, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits using a GPG signature

To recreate commits with GPG signature git fetch upstream && git rebase --force --gpg-sign upstream/main


Fixes #842
Related to #41
Related to #66
Related to #71
Related to cryostatio/cryostat-core#528
Depends on cryostatio/cryostat-core#529
See also #856 which should improve the performance (lower overhead) of this feature

Description of the change:

1. Implements soft-deletion of Target records. When deleted, rather than the actual record being physically deleted from the database, a deleted flag is simply set. This is using a Hibernate feature, so other HQL queries etc. will automatically ignore these soft-deleted entities. Using native queries we can query the table and ignore the deleted flag, or even query for targets that explicitly have been deleted. This allows the data model to retain all of the information we had about what the target was, even after goes offline or is otherwise lost from discovery. This has been a sore point in the API and data model for a long time, but it's particularly relevant here because we may be getting eligible (correctly labelled, or whatever) archived recordings pushed to us from Agent instances that are in the process of shutting down. If we hard delete Targets instead of soft deleting them, then we always have a race condition between batch processing grabbing the Target record to preserve what information we have about the origin of the recording, and the Target deregistering itself or otherwise disappearing. By preserving the Target record and only soft deleting it when the Target disappears we can resolve the race condition, so that it doesn't matter when the batch process starts or finishes - we can always retrieve information about what the source Target is/was.
2. Listens for archive recordings to be created which have the autoanalyze=true metadata label. When such an archived recording is created, Cryostat will submit that recording for automated analysis immediately. When the report is created another notification is emitted, which the new AnalysisReportAggregator listens for and uses to store the final report in its cache. The cache has an on-write expiry policy and will also invalidate entries when the associated Target is lost (see #860 - once we have soft deletion then maybe this should change?)
3. The AnalysisReportAggregator collects one report "document" per Target after batch processing and exposes the aggregated reports as Prometheus-style metrics (see below). The new GET /api/v4/metrics/reports and GET /api/v4/metrics/reports/{jvmId} endpoints here can be used to query this.
4. The Reports class adds a new GET /api/v4/targets/{targetId}/reports POST /api/v4/targets/{targetId}/reports?clean=true endpoint pair. The GET simply returns the latest cached result from the AnalysisReportAggregator, but in plain JSON format instead of the Prometheus-style scraping format. The POST can be used to trigger report generation on the specified Target with a single request - Cryostat processes these requests by creating a new JFR Snapshot recording with the autoanalyze=true label and immediately archiving it, which will trigger the report generation as described above. If the POST has clean=true query parameter (which is the default) then the snapshot will be deleted once it has been copied to archives. This allows the AnalysisReportAggregator cache to be updated immediately using whatever JFR data is currently available in the target, without the client needing to specifically create a new recording with a particular label etc. - but it does require that some prior recording(s) exists so that there is JFR data available to be snapshotted.
5. Refactors the LongRunningRequestGenerator so that the EventBus messages sent to it can have a direct response back to the sender, as well as the general broadcast notification. The AnalysisReportAggregator uses this facility to request the automated analysis report and directly receive the report document back, without needing to listen for an out-of-band broadcast notification.
6. Adds a GraphQL subquery to the Target objects that allows retrieval of the current Report from the aggregator, if any, with server-side aggregation for the count of evaluated rules and the maximum result score. This is used to power the Topology view decorator added in cryostatio/cryostat-web#1589 .

Exceptions{Realm="JDP", JVM="service:jmx:rmi:///jndi/rmi://mypc/jmxrmi", jvmId="m8Q4OzEcy-WcyhItaL_W-65qTDye5TmB4byCYBaFnOQ="}=0.009996005416307545
StackdepthSetting{Realm="JDP", JVM="service:jmx:rmi:///jndi/rmi://mypc/jmxrmi", jvmId="m8Q4OzEcy-WcyhItaL_W-65qTDye5TmB4byCYBaFnOQ="}=25.0
CompareCpu{Realm="JDP", JVM="service:jmx:rmi:///jndi/rmi://mypc/jmxrmi", jvmId="m8Q4OzEcy-WcyhItaL_W-65qTDye5TmB4byCYBaFnOQ="}=0.25587051761284224
HeapDump{Realm="JDP", JVM="service:jmx:rmi:///jndi/rmi://mypc/jmxrmi", jvmId="m8Q4OzEcy-WcyhItaL_W-65qTDye5TmB4byCYBaFnOQ="}=0.0

Analysis result names are used as metric keys. The discovery node owner chain (excluding Universe which is a common root for all nodes/targets) is also embedded in the metrics labels, along with the unique JVM hash ID. The analysis result score is the metric value.

Current issues:

  1. Soft deletion is a significant change to the underlying data model, which should really have been done back in 3.0. Various behaviours will need to be adjusted - for example, connectUrls are no longer strictly unique (see [Epic] Redefine Target data model to allow multiple Connection URLs #71) since a soft-deleted target may have a connectUrl that a user wants to reuse, either for the same target instance or one at the same resolved network location. Maybe this change needs to be split out and worked on separately.

Motivation for the change:

See #842

How to manually test:

  1. Check out chore(exception): public visibility cryostat-core#529 , ./mvnw clean install it
  2. Check out this PR, cd src/main/webui and check out feat(autoanalyze): add controls to enable archived recording autoanalysis cryostat-web#1589. ./mvnw -o clean package
  3. ./smoketest.bash -O -t quarkus-cryostat-agent

Test the following features:

  1. Dashboard card for Automated Analysis, which should look and feel the same as the drawer that appears on Active Recordings > Analyze
  2. The new Automated Analysis component should have a "Create recording" text link/button when looking at a target with no source recordings. Clicking this button should lead to the recording creation form with some prefilled default values. Clicking the Create button should result in the recording being created and with the autoanalyze=true label. Archiving this recording should cause a report generation notification to appear (in the notification drawer) after a short time.
  3. After a target has an available report, either the dashboard card or the Active Recordings > Analyze drawer should display the results. The results should be filterable by name or category, and by score, and should also be viewable in a list format.
  4. After a target has an available report, the Topology view should display a new status badge reflecting the report score status. The Topology details panel that appears after selecting a target node should also have a new Report "owned resource". This will show 0 if no report is available or 1 if there is one available. Clicking on this resource should lead back to the Active Recordings table view with the report drawer pulled out.
  5. The preset Automated Rule can be enabled and should cause recordings to appear on all discovered applications. After some time these should get archived, and reports should be automatically processed, and all of the above indicators should reflect these report data.

@github-actions
Copy link

This PR/issue depends on:

@andrewazores andrewazores force-pushed the batch-report-processing branch from e09072a to 657bf31 Compare April 1, 2025 14:06
@andrewazores
Copy link
Member Author

/build_test

@github-actions
Copy link

github-actions bot commented Apr 1, 2025

Workflow started at 4/1/2025, 11:52:48 AM. View Actions Run.

@andrewazores
Copy link
Member Author

Oh, that's going to fail because it needs cryostatio/cryostat-core#529 .

@github-actions
Copy link

github-actions bot commented Apr 1, 2025

CI build and push: At least one test failed ❌
https://github.com/cryostatio/cryostat/actions/runs/14200232385

@andrewazores andrewazores force-pushed the batch-report-processing branch from 10d04e8 to bbfb245 Compare April 8, 2025 13:55
@andrewazores andrewazores force-pushed the batch-report-processing branch from bbfb245 to 594d159 Compare April 9, 2025 18:17
@andrewazores andrewazores marked this pull request as ready for review April 9, 2025 18:18
@andrewazores andrewazores requested a review from a team as a code owner April 9, 2025 18:18
@andrewazores andrewazores requested a review from a team April 9, 2025 18:18
@andrewazores
Copy link
Member Author

/build_test

@github-actions
Copy link

Workflow started at 4/11/2025, 9:19:58 AM. View Actions Run.

@github-actions
Copy link

No GraphQL schema changes detected.

@github-actions
Copy link

No OpenAPI schema changes detected.

@github-actions
Copy link

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/14404093048

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Request] Automatic Automated Analysis

2 participants