Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
330 changes: 330 additions & 0 deletions enhancements/must-gather-operator-disable-upload.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,330 @@
---
title: must-gather-operator-disable-upload
authors:
- "@rausingh"
reviewers:
- "@TrilokGeer"
- "@Prashanth684"
approvers:
- "@TrilokGeer"
- "@Prashanth684"
api-approvers:
- "@TrilokGeer"
- "@Prashanth684"
creation-date: 2025-09-03
last-updated: 2025-09-03
tracking-link:
- https://issues.redhat.com/browse/MG-86
- https://issues.redhat.com/browse/MG-67
status: implementable
see-also:
---

# Must Gather Operator Enhancement: DisableUpload in MustGatherSpec

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Add an optional field `disableUpload` to `MustGather.spec` that allows users to control whether the must-gather bundle should be uploaded to the SFTP server. When set to `true`, the bundle will be collected but not uploaded, providing users with the flexibility to handle the collected data locally or through alternative means.

Supported values: `true`, `false`
Default: `false` (upload enabled by default for backward compatibility)

## Motivation

### User Stories

* As a cluster admin, I want to collect must-gather data without automatically uploading it to Red Hat's case management system so I can review the data locally first.
Copy link
Contributor

@Prashanth684 Prashanth684 Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disconnected case could also be a relevant use case where you don't have the ability to upload outside the airgapped env

* As a security-conscious administrator, I want to disable automatic upload to ensure sensitive cluster information doesn't leave my environment without explicit approval.
* As a developer or QA engineer, I want to collect must-gather data for debugging purposes without creating unnecessary uploads to production case management systems.
* As a support engineer, I want to provide customers with the option to collect data locally when they have concerns about uploading sensitive information.

### Goals

* Add `spec.disableUpload` with boolean validation and a safe default.
* Ensure the upload container respects the `disableUpload` flag and skips upload when disabled.
* Make `caseID` and `caseManagementAccountSecretRef` optional when upload is disabled.
* Maintain full backward compatibility.
* Provide clear examples and documentation for both use cases.

### Non-Goals

* Introduce alternative upload mechanisms (e.g., different protocols or destinations).
* Add compression or encryption features for local storage.

## Proposal

When `spec.disableUpload` is set to `true`, the must-gather process will:
1. Collect all the required cluster data as usual
2. Skip the upload phase entirely
3. Allow the job to complete successfully without requiring case management credentials
4. Make `caseID` and `caseManagementAccountSecretRef` optional fields

When `spec.disableUpload` is set to `false` (default), the operator continues to work exactly as before, maintaining full backward compatibility.

### Workflow Description

**cluster administrator** is a human user responsible for configuring must-gather collection.

1. The cluster administrator creates a MustGather custom resource with an optional `disableUpload` field.
2. If `disableUpload` is set to `true`:
- The must-gather operator creates a job that collects data but skips upload
- `caseID` and `caseManagementAccountSecretRef` become optional
- The upload container receives the `disable_upload=true` environment variable
3. If `disableUpload` is omitted or set to `false`:
- The operator behaves exactly as before (backward compatibility)
- `caseID` and `caseManagementAccountSecretRef` remain required
- The upload container performs the normal upload process

### API Extensions

#### Types: must-gather-operator/api/v1alpha1/mustgather_types.go

```go
// MustGatherSpec defines the desired state of MustGather
// +kubebuilder:validation:XValidation:rule="!(has(self.disableUpload) && self.disableUpload) ? (has(self.caseID) && self.caseID != '' && has(self.caseManagementAccountSecretRef) && self.caseManagementAccountSecretRef.name != '') : true",message="caseID and caseManagementAccountSecretRef are required when disableUpload is false or unset"
type MustGatherSpec struct {
// The is of the case this must gather will be uploaded to
// Required when disableUpload is false, optional when disableUpload is true
// +kubebuilder:validation:Optional
CaseID string `json:"caseID,omitempty"`

// the secret container a username and password field to be used to authenticate with red hat case management systems
// Required when disableUpload is false, optional when disableUpload is true
// +kubebuilder:validation:Optional
CaseManagementAccountSecretRef corev1.LocalObjectReference `json:"caseManagementAccountSecretRef,omitempty"`

// ...existing fields...

// A flag to control whether the must-gather bundle should be uploaded to SFTP server.
// If set to true, the bundle will be collected but not uploaded.
// +kubebuilder:default:=false
DisableUpload bool `json:"disableUpload,omitempty"`
Copy link
Member

@swghosh swghosh Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iamkirkbater suggested on the other EP, to use an API field like spec.uploadTarget. It sounds reasonable to add UnionDiscriminator type field that uploads only when set appropriately.

ref: https://github.com/openshift/api/blob/1517fca97fe327e076bcff90accb86d1f4e804ed/example/v1/types_stable.go#L69 of UnionDiscriminator

eg.



type MustGatherSpec struct {
    // ...

    // uploadTarget sets the target config for uploading the collected must-gather tar,
    // uploading is disabled if this field is unset
    // 
    // +optional
    UploadTarget *UploadTarget `json:"uploadTarget,omitempty"`

    // ...
}


// UploadType is a specific method for uploading to a target
// +kubebuilder:validation:Enum=SFTP
type UploadType string


// UploadTarget defines the configuration for uploading the must-gather tar ...
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'SFTP' ?  has(self.sftp) : !has(self.sftp)",message="sftp upload target config is required when upload type is SFTP, and forbidden otherwise"
// +union
type UploadTarget struct {
	// type defines the method used for uploading to a specific target
	// Available types are SFTP only.
	//
	// +unionDiscriminator
	// +required
	Type UploadType `json:"type"`

	// sftp defines the target details for uploading to a valid
        // SFTP server.
	//
	// +unionMember
	// +optional
	SFTP *SFTPUploadTargetConfig `json:"sftp,omitempty"`
}

// SFTPUploadTargetConfig defines the ...
type SFTPUploadTargetConfig struct {
	// caseID specifies the ...
	//
	// +kubebuilder:validation:MaxLength=128
	// +kubebuilder:validation:MinLength=1
	// +required
	CaseID string `json:"caseId"`
	
	// uploadTokenSecretRef ...
	// +required
	UploadTokenSecretRef corev1.LocalObjectReference `json:".."`

	// sftpHost specifies ...
	// +kubebuilder:default:="sftp.access.redhat.com"
	// +optional
	SFTPHost string `json:"sftpHost"`
}

also 👀 @shivprakashmuley as this also might interfere with #1831.

Copy link
Contributor

@Prashanth684 Prashanth684 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we updating the spec here or is it rolled into #1830 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be closing this EP. Changes will be rolled into #1830

cc @swghosh @shivprakashmuley

}
```

#### CRD OpenAPI Schema

```yaml
spec:
description: MustGatherSpec defines the desired state of MustGather
properties:
disableUpload:
default: false
description: |-
A flag to control whether the must-gather bundle should be uploaded to SFTP server.
If set to true, the bundle will be collected but not uploaded.
type: boolean
caseID:
description: |-
The is of the case this must gather will be uploaded to
Required when disableUpload is false, optional when disableUpload is true
type: string
caseManagementAccountSecretRef:
description: |-
the secret container a username and password field to be used to authenticate with red hat case management systems
Required when disableUpload is false, optional when disableUpload is true
properties:
name:
default: ""
description: |-
Name of the referent.
This field is effectively required, but due to backwards compatibility is
allowed to be empty. Instances of this type with an empty value here are
almost certainly wrong.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
type: string
type: object
x-kubernetes-map-type: atomic
type: object
x-kubernetes-validations:
- message: caseID and caseManagementAccountSecretRef are required when
disableUpload is false or unset
rule: '!(has(self.disableUpload) && self.disableUpload) ? (has(self.caseID)
&& self.caseID != '''' && has(self.caseManagementAccountSecretRef)
&& self.caseManagementAccountSecretRef.name != '''') : true'
```

### Topology Considerations

#### Hypershift / Hosted Control Planes

This change is relevant for hosted control planes where administrators may want to collect data locally before deciding whether to upload it to Red Hat support.

#### Standalone Clusters

This change is fully relevant for standalone clusters where must-gather operations are performed directly within the cluster.

#### Single-node Deployments or MicroShift

This enhancement is applicable to single-node deployments and MicroShift environments where resource constraints or security policies may require local data collection without automatic upload.

### Implementation Details/Notes/Constraints

* **Prerequisites**: Requires Kubernetes 1.25+ for CEL validation support (enabled by default)
* **API**: Add `DisableUpload` to `MustGatherSpec` with kubebuilder default marker and CEL validation rules.
* **Validation**: Use kubebuilder CEL validation (`+kubebuilder:validation:XValidation`) to enforce that `caseID` and `caseManagementAccountSecretRef` are mandatory when `disableUpload` is false or unset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah..ok..that makes sense..should have read this first :)

* **Template**: Update upload container to pass `disable_upload` environment variable from `spec.disableUpload`.
* **Secret Management**: Skip secret copying to operator namespace when upload is disabled.
* **Defaulting**: Respect defaulting when field is unset to maintain backward compatibility.
* **CRD Generation**: Regenerate deepcopy, OpenAPI, CRDs, and bundle files.

### Risks and Mitigations

* **Data Loss**: Users might forget to manually handle collected data when upload is disabled.
* *Mitigation*: Clear documentation and examples showing how to access collected data.
* **Support Impact**: Support cases may be harder to handle without automatic upload.
* *Mitigation*: Documentation should clearly explain when to use this feature and how to manually upload data later if needed.
* **Security**: Collected data remains on the cluster longer when upload is disabled.
* *Mitigation*: Job cleanup mechanisms remain in place; documentation should advise on data handling best practices.

### Drawbacks

* Adds another user-visible configuration option to the API.
* Requires users to understand the implications of disabling upload.

## Alternatives (Not Implemented)

* **Post-processing Flag**: Could have implemented a flag to delete uploaded data after upload, but the requirement is to prevent upload entirely.

## Open Questions

1. **Upload Container Strategy**: Should we omit the upload container entirely when `disableUpload: true`, or include it with early exit logic?
- **Option A**: Skip upload container creation when upload is disabled (cleaner, fewer resources)
- **Option B**: Always create upload container but exit early if upload is disabled (consistent job structure)
- **Decision**: Recommend Option B for consistency and easier debugging/monitoring
2. **Validation Fallback**: For clusters < Kubernetes 1.25 (without CEL support), should we implement controller-level validation as fallback?
- **Impact**: Ensures consistent behavior across all supported Kubernetes versions
- **Complexity**: Requires dual validation paths
- **Decision Needed**: Is this necessary given OpenShift's Kubernetes version requirements?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. probably not.


## Test Plan

### Unit Tests

* API defaulting validation (`disableUpload` defaults to false)
* CEL validation rules for conditional field requirements:
- Fields are optional when `disableUpload` is true
- Fields are required when `disableUpload` is false or unset
- Validation error messages are clear and helpful
* Job template generation includes `disable_upload` environment variable when set
* Upload container environment variable handling for disabled upload scenarios

### E2E Tests

* Happy path with `disableUpload` set to true - job completes without upload
* Default path (unset or `false`) - normal upload behavior
* CEL validation tests:
- Missing `caseID` or `caseManagementAccountSecretRef` when `disableUpload` is `false` or unset should fail at admission time
- Missing `caseID` or `caseManagementAccountSecretRef` or both when `disableUpload` is `true` should succeed
- Empty string values for required fields when upload is enabled should fail
* Secret management test - no secret copying when upload is disabled

## Graduation Criteria

### Dev Preview -> Tech Preview

* Field added with default validation and CEL validation rules
* Unit tests implemented for API validation and controller logic
* Basic documentation available
* Ability to utilize the enhancement end to end

### Tech Preview -> GA

* E2E tests implemented
* Customer validation in staging environments
* User-facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/)
* Performance impact assessment completed

### Removing a deprecated feature

- None

## Upgrade / Downgrade Strategy

* **Backward compatible**: Field is optional with a safe default (`false`)
* **Upgrade**: Older operators ignore the unknown field; no disruption to current flows
* **Downgrade**: If downgrading to an operator version that doesn't support `disableUpload`, the field is ignored and uploads continue as normal

## Version Skew Strategy

* Ensure operator CSV/CRD includes the new field before shipping the controller that uses it
* If controller is older than CRD, it safely ignores the field
* No coordination required between control plane and kubelet components

## Operational Aspects of API Extensions

This enhancement modifies the MustGather CRD by adding an optional field with boolean validation, a default value, and CEL validation rules for conditional field requirements.

**Impact on existing SLIs**:
* No impact on API throughput or availability
* Minimal impact on admission latency due to CEL validation (microseconds)
* No impact on cluster performance as this is a configuration field only used during must-gather operations

**Failure modes**:
* CEL validation failures for missing required fields when upload is enabled are caught at admission time with clear error messages
* Upload container failures are handled the same way regardless of upload status

## Support Procedures

### Detecting Issues

* **Symptoms**: Job completion without upload when upload was expected, or upload attempts when upload was disabled
* **Logs**: Check must-gather operator logs and must-gather job logs for upload-related messages

### Troubleshooting

* Verify `disableUpload` field value in the MustGather resource
* Validate that `caseID` and `caseManagementAccountSecretRef` are provided when upload is enabled

### Disabling the Feature

* Set `disableUpload: false` or omit the field from MustGather resources to use default behavior
* No cluster-wide disable mechanism needed as this is a per-resource configuration

## Examples

### Default Configuration (upload enabled)

```yaml
apiVersion: managed.openshift.io/v1alpha1
kind: MustGather
metadata:
name: example-mustgather-upload
spec:
caseID: "01234567"
caseManagementAccountSecretRef:
name: case-management-creds
serviceAccountRef:
name: must-gather-admin
# disableUpload omitted → defaults to false (upload enabled)
```

### Upload Disabled Configuration

```yaml
apiVersion: managed.openshift.io/v1alpha1
kind: MustGather
metadata:
name: example-mustgather-local
spec:
serviceAccountRef:
name: must-gather-admin
disableUpload: true
# caseID and caseManagementAccountSecretRef are optional when upload is disabled
```

## Implementation History

* v0: Draft proposal created based on MG-67 requirements
* v1: API added with boolean field, default, and CEL validation rules; unit tests implemented
* v2: E2E tests and examples; CRDs regenerated with CEL validation; documentation updated
* v3: Ready for tech preview after validation and testing