feat(s3): file metadata storage modes #927

Merged
andrewazores merged 39 commits into cryostatio:main from andrewazores:s3-object-tagging
Jun 19, 2025

@andrewazores
Member

@andrewazores andrewazores commented May 21, 2025

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PRs, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits using a GPG signature

To recreate commits with a GPG signature, run: git fetch upstream && git rebase --force --gpg-sign upstream/main


Fixes: #924
Related to #269
See also cryostatio/cryostat-helm#247
See also cryostatio/cryostat-operator#959

Description of the change:

  1. Refactors some logic out of ArchivedRecordings (API endpoints) into RecordingHelper (general utility) to ensure consistent behaviours
  2. Adds new configuration properties: one to control the overall "metadata storage mode", which accepts the values tagging | metadata | bucket, and one for the "metadata bucket name". There are also sub-configurations for the "archived recordings storage mode" and the "event templates storage mode". These accept the same values and default to the overall metadata storage mode, so the two resource types can be configured differently if needed.
  3. If the mode is tagging, Cryostat behaves the same as before this PR - metadata is stored in the S3 object Tags.
  4. If the mode is metadata then metadata is stored in the S3 object metadata.
  5. If the mode is bucket then the metadata bucket name must also be specified, and then Cryostat will not use object Tags but will instead use a separate storage bucket and create JSON files containing Metadata objects. Within the bucket the subdirectories and filenames mirror the actual archives so that the same file name "key" can be used to reference either the actual recording or its metadata depending on which bucket is queried.
  6. Removes the JMC Agent Probe Template metadata handling, which used Tagging before. This stored four attributes of the template document: the original file name, the class prefix, the "allowToString" property, and the "allowConverter" property. Of these four only the original file name is really useful for identifying/locating the document, and the allowToString/allowConverter do not seem useful at all. The class prefix may have been useful for some UI presentation that was never implemented. The implementation before this PR contained some bugs that broke the behaviour of the endpoint and its notifications, and the metadata handling also resulted in the UI displaying an incomplete rendition of the template's XML content. With this stripped-down version in this PR the functionality is restored and the UI displays the original full XML content again (which is a decision that should be revisited, but this restores what the behaviour was originally intended to be), and the Tagging API usage is removed so the issues below are sidestepped.
  7. Refactors the StorageBuckets implementation so that declarative configuration startup activities do not proceed until the corresponding storage bucket (if any) has been successfully created. This prevents a race where Cryostat might try to, for example, upload a custom event template into storage before there is a bucket ready to receive that upload.
  8. Adds some BufferedInputStream wrappers around S3 SDK storage.getObject() responses and around declarative configuration file reads. The S3 SDK response ones are particularly important, because the S3 SDK HTTP client will reuse HTTP connections to reduce overall resource consumption, but connections cannot be released until the inner InputStream has been consumed/closed. Since there are internal operations that operate over these streams, such as by parsing them as XML documents, it's best to try to read the content out of the stream as quickly as possible and close it, so that the HTTP connection can be reused for another S3 SDK request.
  9. Fixes up some dependency injection smells like improper use of DefaultBean and Named annotations
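
Item 8 can be illustrated with a small, self-contained sketch. It is not the code from this PR: the class and method names are hypothetical, a ByteArrayInputStream subclass stands in for an S3 SDK getObject() response, and it assumes the content is small enough (for example an event template XML document) to hold in memory. The idea shown is to read the response fully through a BufferedInputStream and close it before doing any slower work such as XML parsing.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class DrainStreamSketch {

    /** Hypothetical stand-in for an SDK response stream; records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the whole stream into memory, then closes it (and the wrapped stream). */
    static byte[] drain(InputStream source) throws IOException {
        try (InputStream in = new BufferedInputStream(source)) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream response =
                new TrackingStream("<template/>".getBytes(StandardCharsets.UTF_8));
        byte[] content = drain(response);
        // The response stream is closed here, so a real HTTP connection could be reused
        // while the in-memory copy is parsed at leisure.
        System.out.println("closed before parsing: " + response.closed);
        System.out.println(new String(content, StandardCharsets.UTF_8));
    }
}
```

Draining into a byte array only suits small documents. For large objects such as recordings, the stream would more likely be consumed incrementally and closed as soon as the consumer finishes.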

Motivation for the change:

Circumvents the following potential problems with some S3 providers:

  1. Object Tagging may not be supported whatsoever, and may be silently ignored or may produce error responses
  2. Object Tagging usually entails a maximum size/length for keys and values, as well as a maximum count of tags that can be applied to objects
  3. Object Metadata entails a maximum size/length for keys and values and a maximum metadata payload size, and is immutable (can only be set when the data object is created)
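
To make these limits concrete, here is a hedged sketch of a pre-flight check that decides whether a set of labels would fit in object Tags or in object metadata. The constants are the limits documented for AWS S3 (10 tags per object, 128-character keys, 256-character values, 2 KB of user-defined metadata measured in UTF-8 bytes); other S3-compatible providers may use different limits or not support tagging at all. The class and method names are hypothetical and do not come from Cryostat.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelLimitsSketch {
    // Assumption: limits as documented for AWS S3; other providers may differ.
    static final int MAX_TAGS = 10;
    static final int MAX_TAG_KEY_CHARS = 128;
    static final int MAX_TAG_VALUE_CHARS = 256;
    static final int MAX_USER_METADATA_BYTES = 2 * 1024;

    /** Counts Unicode code points rather than UTF-16 code units. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** True if every label fits within the tag count and key/value length limits. */
    static boolean fitsInTags(Map<String, String> labels) {
        if (labels.size() > MAX_TAGS) {
            return false;
        }
        for (Map.Entry<String, String> e : labels.entrySet()) {
            if (codePoints(e.getKey()) > MAX_TAG_KEY_CHARS
                    || codePoints(e.getValue()) > MAX_TAG_VALUE_CHARS) {
                return false;
            }
        }
        return true;
    }

    /** True if the summed UTF-8 size of all keys and values fits the metadata budget. */
    static boolean fitsInObjectMetadata(Map<String, String> labels) {
        long total = 0;
        for (Map.Entry<String, String> e : labels.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total <= MAX_USER_METADATA_BYTES;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        // A long description exceeds the tag value limit but still fits in metadata.
        labels.put("description", "x".repeat(300));
        System.out.println("fits in tags: " + fitsInTags(labels));
        System.out.println("fits in metadata: " + fitsInObjectMetadata(labels));
    }
}
```

A check like this cannot detect a provider that silently ignores tagging requests, which is one reason the bucket mode avoids both mechanisms entirely.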

How to manually test:

  1. Check out and build PR
  2. Run CRYOSTAT_STORAGE_MODE=tagging ./smoketest.bash -O -s minio -t quarkus-cryostat-agent to run a smoketest using the standard tagging metadata storage behaviour.
  3. Create and archive recordings, do target analysis, etc. and ensure everything works.
  4. Tear down smoketest.
  5. Run CRYOSTAT_STORAGE_MODE=metadata ./smoketest.bash -O -s minio -t quarkus-cryostat-agent to run a smoketest using metadata storage behaviour.
  6. Repeat step 3 testing. Everything should behave exactly the same.
  7. Repeat again, with CRYOSTAT_STORAGE_MODE=bucket to run a smoketest using bucketed metadata behaviour.

Resource Types vs Storage Modes:

| Resource type | Tagging | Metadata | Bucketed |
| --- | --- | --- | --- |
| Archived Reports | N/A, no metadata stored | N/A | N/A |
| JMC Agent Probe Templates | N/A, no metadata stored (previously stored some unused attributes, now removed in this PR) | N/A | N/A |
| Archived Recordings | Full original featureset, if the S3 provider supports it. Caveat for maximum label key/value sizes and label count. | Immutable archived recording labels, limitations on key/value size and overall size | Full original featureset, but more complex S3 management for the end user. No limitations on key/value size or label count. |
| Custom Event Templates | Full original featureset, key/value size caveat may apply in particular to Description field | Full original featureset, Description field size limitation | Full original featureset, no limitations, more complex end user management |
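
Since archived recordings and custom event templates can each use a different mode, the following sketch shows one way the per-resource setting could fall back to the overall mode, and how bucket mode could reuse the recording's key in a separate metadata bucket. All class, method, bucket, and key names here are hypothetical illustrations and are not taken from the Cryostat code base.

```java
import java.util.Locale;
import java.util.Optional;

public class StorageModeSketch {

    enum StorageMode {
        TAGGING,
        METADATA,
        BUCKET;

        /** Parses a configured value such as "tagging", ignoring case and whitespace. */
        static StorageMode parse(String raw) {
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        }
    }

    /** Hypothetical pair identifying an object by bucket and key. */
    record ObjectRef(String bucket, String key) {}

    /** Uses the per-resource mode if one is set, otherwise the overall mode. */
    static StorageMode resolve(Optional<String> resourceMode, String overallMode) {
        return resourceMode
                .map(StorageMode::parse)
                .orElseGet(() -> StorageMode.parse(overallMode));
    }

    /** In bucket mode the metadata lives under the same key, in the metadata bucket. */
    static ObjectRef metadataRef(
            StorageMode mode, Optional<String> metadataBucket, String recordingKey) {
        if (mode != StorageMode.BUCKET) {
            throw new IllegalStateException("separate metadata objects only exist in bucket mode");
        }
        String bucket =
                metadataBucket
                        .filter(b -> !b.isBlank())
                        .orElseThrow(
                                () ->
                                        new IllegalArgumentException(
                                                "bucket mode requires a metadata bucket name"));
        return new ObjectRef(bucket, recordingKey);
    }

    public static void main(String[] args) {
        // Overall mode is tagging, but archived recordings are overridden to bucket mode.
        StorageMode recordings = resolve(Optional.of("bucket"), "tagging");
        StorageMode templates = resolve(Optional.empty(), "tagging");
        System.out.println(recordings + " " + templates);
        System.out.println(
                metadataRef(recordings, Optional.of("metadata"), "some-jvm/recording.jfr"));
    }
}
```

Failing fast when bucket mode is selected without a bucket name mirrors the requirement stated in the description that the metadata bucket name must also be specified in that mode.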

@andrewazores andrewazores added the feat (New feature or request) and safe-to-test labels May 21, 2025
@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 5/21/2025, 4:53:24 PM. View Actions Run.

@github-actions

No GraphQL schema changes detected.

@github-actions

No OpenAPI schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/15172360175

@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 5/22/2025, 12:10:49 PM. View Actions Run.

@github-actions

No OpenAPI schema changes detected.

@github-actions

No GraphQL schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/15191560148

@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 5/22/2025, 3:38:38 PM. View Actions Run.

@github-actions

No GraphQL schema changes detected.

@github-actions

No OpenAPI schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/15195348015

@andrewazores andrewazores changed the title from "feat(archives): store archived recording metadata as separate files" to "feat(archives): archived recording metadata storage modes" May 23, 2025
@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 5/23/2025, 11:11:26 AM. View Actions Run.

@github-actions

No OpenAPI schema changes detected.

@github-actions

No GraphQL schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/15213520722

@andrewazores andrewazores changed the title from "feat(archives): archived recording metadata storage modes" to "feat(s3): file metadata storage modes" May 23, 2025
@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 5/28/2025, 10:52:05 AM. View Actions Run.

@github-actions

No OpenAPI schema changes detected.

@github-actions

No GraphQL schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/15303375177

@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 6/12/2025, 12:02:14 PM. View Actions Run.

@github-actions

No OpenAPI schema changes detected.

@github-actions

No GraphQL schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/15615507826

Contributor

@Josh-Matsuoka Josh-Matsuoka left a comment

Tested the following via smoketest with all 3 storage modes (CRYOSTAT_STORAGE_MODE=tagging/metadata/bucket) using the cryostat-quarkus-agent test application and cryostat itself.

  • Creating recordings
  • Archiving recordings
  • Stopping/deleting active recordings (both manually started and ones that started automatically)
  • Deleting archived recordings
  • Generating reports for recordings
  • Uploading/using custom event templates

Everything looks good and works as expected.

@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 6/18/2025, 4:56:44 PM. View Actions Run.

@github-actions

No OpenAPI schema changes detected.

@github-actions

No GraphQL schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/15743456171

@andrewazores andrewazores merged commit 0c516a1 into cryostatio:main Jun 19, 2025
9 checks passed
@andrewazores andrewazores deleted the s3-object-tagging branch June 19, 2025 12:08
Development

Successfully merging this pull request may close these issues.

[Bug] Cryostat requires object storage to implement PUT object tagging