feat(storage): add configuration for external object storage provider #247

Merged
andrewazores merged 18 commits into cryostatio:main from andrewazores:external-s3
Sep 24, 2025
Conversation

@andrewazores
Member

@andrewazores andrewazores commented May 15, 2025

See #246
See cryostatio/cryostat#927

Allows the user to configure an alternate S3 object storage provider, rather than requiring the use of cryostat-storage.

Manual Testing with unmanaged cryostat-storage

  1. Set up an unmanaged S3-compatible object storage instance. In this case we will actually use cryostat-storage again, but deployed independently. This assumes an OpenShift cluster is available and oc is logged in.
Deploy cryostat-storage
$ oc new-project objectstorage
$ # set credentials to secure the S3 endpoint
$ oc create secret generic s3cred \
    --from-literal=STORAGE_ACCESS_KEY_ID=cryostat \
    --from-literal=STORAGE_ACCESS_KEY=verySecretKey1
$ oc create -f seaweed.yaml

seaweed.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  generation: 1
  labels:
    app.kubernetes.io/component: seaweed
    app.kubernetes.io/instance: seaweed
    app.kubernetes.io/name: seaweed
    app.kubernetes.io/part-of: seaweed
  name: seaweed
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: seaweed
      app.kubernetes.io/instance: seaweed
      app.kubernetes.io/name: seaweed
      app.kubernetes.io/part-of: seaweed
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/component: seaweed
        app.kubernetes.io/instance: seaweed
        app.kubernetes.io/name: seaweed
        app.kubernetes.io/part-of: seaweed
    spec:
      containers:
      - env:
        - name: CRYOSTAT_BUCKETS
          value: archivedrecordings,archivedreports,eventtemplates,probes
        - name: CRYOSTAT_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: STORAGE_ACCESS_KEY_ID
              name: s3cred
              optional: false
        - name: CRYOSTAT_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: STORAGE_ACCESS_KEY
              name: s3cred
              optional: false
        - name: DATA_DIR
          value: /data
        - name: IP_BIND
          value: 0.0.0.0
        - name: REST_ENCRYPTION_ENABLE
          value: "1"
        image: quay.io/cryostat/cryostat-storage:latest
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          httpGet:
            path: /status
            port: 8333
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: seaweed
        ports:
        - containerPort: 8333
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - echo s3.bucket.list | weed shell | [[ "$(</dev/stdin)" == *"archivedrecordings"*
              ]]
          failureThreshold: 2
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          requests:
            cpu: 50m
            memory: 256Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        startupProbe:
          failureThreshold: 9
          httpGet:
            path: /status
            port: 8333
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: seaweed
          subPath: seaweed
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: seaweed
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: seaweed
    app.kubernetes.io/instance: seaweed
    app.kubernetes.io/name: seaweed
    app.kubernetes.io/part-of: seaweed
  name: seaweed
spec:
  clusterIP: 10.217.5.138
  clusterIPs:
  - 10.217.5.138
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 8333
    protocol: TCP
    targetPort: 8333
  selector:
    app.kubernetes.io/component: seaweed
    app.kubernetes.io/instance: seaweed
    app.kubernetes.io/name: seaweed
    app.kubernetes.io/part-of: seaweed
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
data:
  STORAGE_ACCESS_KEY: dmVyeVNlY3JldEtleTE=
  STORAGE_ACCESS_KEY_ID: Y3J5b3N0YXQ=
kind: Secret
metadata:
  name: s3cred
type: Opaque
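
As a quick sanity check (not part of the PR itself), the base64-encoded Secret data above should decode back to the same literal credentials passed to `oc create secret` earlier:

```shell
# Decode the Secret's data fields; they should match the credentials
# used when creating the s3cred Secret with `oc create secret`.
key_id="$(printf '%s' 'Y3J5b3N0YXQ=' | base64 -d)"
secret_key="$(printf '%s' 'dmVyeVNlY3JldEtleTE=' | base64 -d)"
echo "$key_id"      # cryostat
echo "$secret_key"  # verySecretKey1
```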
  2. Deploy Cryostat using this PR, pointing it at the existing unmanaged object storage instance in the other namespace.
$ oc new-project cryostat-helm
$ # create the same Secret again so that Cryostat can be configured to use these credentials for access
$ oc create secret generic s3cred \
    --from-literal=STORAGE_ACCESS_KEY_ID=cryostat \
    --from-literal=STORAGE_ACCESS_KEY=verySecretKey1
$ helm install \
    --set storage.storageSecretName=s3cred \
    --set storage.provider.url=http://seaweed.objectstorage:8333 \
    --set storage.provider.region=us-east-1 \
    --set core.route.enabled=true \
    cryostat ./charts/cryostat
  3. Visit the Route and use the Cryostat Web UI to start a recording, either on jfr-datasource, on a localhost:0 custom target, or on another deployed sample application. Then archive the recording and verify that all archiving functionality works as expected.

TODO

  1. test using a different S3-compatible storage implementation
  2. the S3 client has a lot of configuration parameters. Some are required or very basic and should be implemented directly as Helm values (such as the ones already done here), but others are much more specific and may not always be needed. Solving #203 ([Request] Add ability to add additional env variables) would probably be the right way to go about this. See #203 and #248 (feat(envs): add configurations to include extra envs for containers).

Manual Testing with external commercial object storage provider

step 0: sign up for a Backblaze B2 free-tier account. At the time of writing, this allows an account with no expiry date but a 10 GB data limit, which is enough for basic functionality testing. Create an account and a new Application Key. Create the buckets MYPREFIX-metadata, MYPREFIX-archivedrecordings, MYPREFIX-archivedreports, MYPREFIX-eventtemplates, and MYPREFIX-probes. You can select your own bucket settings, but I would suggest making them private and storing only the latest copy of each file. I also set archivedrecordings to be encrypted and the others unencrypted. Substitute MYPREFIX with some unique identifier of your own - the bucket names must be globally unique across all of B2, since they become subdomain names.
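
Because the bucket names become subdomain names, they must be DNS-compatible. As a rough pre-check (the regex below approximates the documented S3-style naming rules and is not part of this PR; B2 additionally requires at least 6 characters), a candidate name can be validated in shell:

```shell
# Approximate S3-style DNS-compatible bucket name check:
# lowercase letters, digits, and hyphens; starts and ends with a
# letter or digit; 3-63 characters total.
valid_bucket() {
  local re='^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$'
  [[ "$1" =~ $re ]]
}

valid_bucket "myprefix-archivedrecordings" && echo "ok"
valid_bucket "MYPREFIX-metadata" || echo "uppercase not allowed"
```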

$ oc new-project apps1 ; pushd cryostat-operator ; make sample_app ; popd
$ oc new-project cryostat-helm ; pushd cryostat-helm
$ oc create secret generic s3cred \
  --from-literal=STORAGE_ACCESS_KEY=abcd1234 \
  --from-literal=STORAGE_ACCESS_KEY_ID=abcd1234
$ helm install \
  --set storage.provider.url=https://s3.us-east-005.backblazeb2.com \
  --set storage.storageSecretName=s3cred \
  --set storage.provider.region=us-east-1 \
  --set storage.provider.usePathStyleAccess=false \
  --set storage.provider.metadata.storageMode=bucket \
  --set storage.buckets.names.archivedRecordings=MYPREFIX-archivedrecordings \
  --set storage.buckets.names.archivedReports=MYPREFIX-archivedreports \
  --set storage.buckets.names.eventTemplates=MYPREFIX-eventtemplates \
  --set storage.buckets.names.jmcAgentProbeTemplates=MYPREFIX-probes \
  --set storage.buckets.names.metadata=MYPREFIX-metadata \
  --set core.route.enabled=true \
  --set core.discovery.kubernetes.enabled=true \
  --set core.discovery.kubernetes.namespaces='{apps1}' \
  --set core.image.repository=quay.io/andrewazores/cryostat \
  --set core.image.tag=object-tagging-alt-4 \
  cryostat ./charts/cryostat
  1. Check the paths for pushd in the first two steps. These assume that you are in a parent directory that contains both repositories.
  2. Replace the abcd1234 placeholders in the s3cred creation with the key parts from your B2 account, using a new Application Key created for this purpose: B2's "key ID" maps to STORAGE_ACCESS_KEY_ID, and its "application key" is the access key (STORAGE_ACCESS_KEY).
  3. Check that the storage.provider.url matches the "endpoint" for your B2 buckets, adjust if needed. Ensure that the storage.provider.region matches.
  4. (optional) Disable storage.provider.usePathStyleAccess, as done above. B2 also supports DNS subdomain access, which should be more performant, but either setting should work.
  5. Use my object-tagging-alt-4 image, which contains cryostatio/cryostat#927 (feat(s3): file metadata storage modes). The storage.provider.metadata.storageMode=bucket setting relies on this, and configures Cryostat to use a separate storage bucket with JSON files for metadata, rather than object Tags (which are not supported in B2).
  6. (optional) Do helm upgrade --reuse-values --set storage.provider.metadata.storageMode=metadata cryostat ./charts/cryostat. This uses the "native" S3 object metadata facility, which has fewer size constraints than Tags but is immutable. For archived recordings this means that labels cannot be modified after the recording has been created, which is not so bad since label editing is not a major feature. Feel free to try it out and observe the nice error message that appears.
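
For repeatability, the long --set list in the install command above can be kept in a values file instead. This is just a sketch: the keys mirror the flags shown earlier, the file name b2-values.yaml is arbitrary, and MYPREFIX remains your own placeholder.

```yaml
# b2-values.yaml -- equivalent to the --set flags above (name is arbitrary)
storage:
  storageSecretName: s3cred
  provider:
    url: https://s3.us-east-005.backblazeb2.com
    region: us-east-1
    usePathStyleAccess: false
    metadata:
      storageMode: bucket
  buckets:
    names:
      archivedRecordings: MYPREFIX-archivedrecordings
      archivedReports: MYPREFIX-archivedreports
      eventTemplates: MYPREFIX-eventtemplates
      jmcAgentProbeTemplates: MYPREFIX-probes
      metadata: MYPREFIX-metadata
core:
  route:
    enabled: true
  discovery:
    kubernetes:
      enabled: true
      namespaces: [apps1]
  image:
    repository: quay.io/andrewazores/cryostat
    tag: object-tagging-alt-4
```

Then install with: helm install -f b2-values.yaml cryostat ./charts/cryostat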

@andrewazores andrewazores added the feat (New feature or request) and safe-to-test labels May 15, 2025
@ebaron
Member

ebaron commented May 15, 2025

Would using a config map or secret make sense? Maybe allow users to specify a local file with S3 environment variables and use envFrom to include those in the storage container?

@tthvo
Member

tthvo commented May 16, 2025

+1 Agreed! Allowing a user-provided ConfigMap/Secret sounds good to me! I guess it'd be like Andrew's suggestion. To extend that idea, I also think it would be a good idea to allow env vars defined via Helm values. How about the proposal below?

Reference: I took inspiration from the s3 ack chart here [0] and [1].

Proposed changes to the values.yaml

storage:
  config:
    ## @param storage.config.extra Extra configurations for Cryostat storage container
    extra:
      ## @param storage.config.extra.envVars Extra environment variables for the Cryostat storage container. You can define:
      # 1. static environment variable:
      #  - name: DEMO_GREETING
      #    value: "Hello from the environment"
      #
      # 2. secret environment variable:
      # - name: USERNAME
      #   valueFrom:
      #     secretKeyRef:
      #       name: mysecret
      #       key: username
      # See: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
      envVars: []
      ## @param storage.config.extra.envSources Sources of extra environment variables for the Cryostat storage container. You can define:
      # 1. from config map:
      #  - configMapRef:
      #      name: special-config
      #      optional: false
      # 2. from secret:
      #  - secretRef:
      #      name: special-config
      #      optional: false
      # See: https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#configure-all-key-value-pairs-in-a-configmap-as-container-environment-variables
      envSources: []

Proposed changes to deployment template

          env:
          {{- if .Values.storage.config.extra.envVars }}
            {{- toYaml .Values.storage.config.extra.envVars | nindent 10 }}
          {{- end }}
          envFrom:
          {{- if .Values.storage.config.extra.envSources }}
            {{- toYaml .Values.storage.config.extra.envSources | nindent 10 }}
          {{- end }}

Sample customized values.yaml file

storage:
  config:
    extra:
      envVars:
      - name: DEMO_GREETING
        value: "Hello from the environment"
      - name: USERNAME
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: username
      envSources:
      - secretRef:
          name: special-config
          optional: false
      - configMapRef:
          name: special-config
          optional: false

This allows both specifying env vars directly in the Helm values file and pulling them from existing ConfigMaps/Secrets in a "Kubernetes" way. The downside is that each entry has a strict format, but k8s will validate it anyway, not us.

@andrewazores
Member Author

Really good idea @tthvo, I like that a lot. Would you like to proceed with a PR for that? I can then rebase my work on top of that.

@tthvo
Member

tthvo commented May 16, 2025

Cool! @andrewazores I opened #248 to allow this for the storage container for now. If it looks good, I will circle back next week for the rest of the containers 😄

@andrewazores
Member Author

@cryostatio/reviewers ping

@andrewazores
Member Author

@reviewers ping

@Josh-Matsuoka
Contributor

Tested Scenario 1, works well.

For Scenario 2 I seem to be unable to run it: I followed the testing instructions on a fresh crc instance and ran into this:

2025-09-19 02:01:04,206 ERROR [io.fab.kub.cli.inf.imp.cac.Reflector] (vert.x-eventloop-thread-0) listSyncAndWatch failed for v1/namespaces/apps1/endpoints, will stop: java.util.concurrent.CompletionException: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.217.4.1:443/api/v1/namespaces/apps1/endpoints?resourceVersion=0. Message: endpoints is forbidden: User "system:serviceaccount:cryostat-helm:cryostat" cannot list resource "endpoints" in API group "" in the namespace "apps1". Received status: Status(apiVersion=v1, code=403, details=StatusDetails(causes=[], group=null, kind=endpoints, name=null, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=endpoints is forbidden: User "system:serviceaccount:cryostat-helm:cryostat" cannot list resource "endpoints" in API group "" in the namespace "apps1", metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Forbidden, ..

It's possible something is misconfigured somewhere?

Running crc-linux-2.54.0-amd64

@andrewazores
Member Author

Message: endpoints is forbidden: User "system:serviceaccount:cryostat-helm:cryostat" cannot list resource "endpoints"

This looks like a mismatch between the Helm Chart and Cryostat relating to the switch from Endpoints to EndpointSlices. This log message indicates that the Cryostat container is still trying to watch/list Endpoints objects, but the Helm version that deployed it is newer and so the RBAC is set up to grant Cryostat permissions for EndpointSlices, not Endpoints.

I'm guessing it's because of this part of the original testing instructions:

  --set core.image.repository=quay.io/andrewazores/cryostat \
  --set core.image.tag=object-tagging-alt-4 \

That image was prepared when I opened this PR back in May, which predates cryostatio/cryostat#740. Since that and cryostatio/cryostat#927 have been merged in the meantime, I think these Helm value overrides can just be left out (i.e. allow the chart to install the default cryostat/cryostat:latest image) and it should be good to go.

@Josh-Matsuoka
Contributor
Looks good to me.

@andrewazores andrewazores merged commit 9d8aa69 into cryostatio:main Sep 24, 2025
7 checks passed
@andrewazores andrewazores deleted the external-s3 branch September 24, 2025 16:34