
Conversation

@jiangzho
Contributor

What changes were proposed in this pull request?

This PR adds support for generating the SparkApplication and SparkCluster CRD YAML with both the v1alpha1 and v1beta1 versions, with the storage version set to v1beta1.

Why are the changes needed?

With this patch, our Gradle task can generate YAML that matches the latest changes in the CRD POJOs, saving us from maintaining the YAML files by hand for each new version.

A similar pattern can be used when we promote to future (v1/v2 ...) versions: keep the previous version's POJOs in a separate package as a reference, and mark only the latest version as the storage version.
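For context on the "storage set to v1beta1" constraint above: Kubernetes requires that exactly one version in a CRD's versions list be marked `storage: true`; the others may still be served, but objects are persisted in the storage version. A minimal sketch of that invariant, using plain dicts to stand in for the parsed CRD YAML (the helper name is hypothetical, not part of this PR):

```python
def storage_version(versions):
    """Return the name of the single version marked storage: true; raise if
    the CRD would be rejected by the API server (zero or multiple)."""
    storage = [v["name"] for v in versions if v.get("storage")]
    if len(storage) != 1:
        raise ValueError(f"exactly one storage version required, got {storage}")
    return storage[0]

# Shape of the versions list this PR generates (plain dicts, not real YAML):
versions = [
    {"name": "v1beta1", "served": True, "storage": True},
    {"name": "v1alpha1", "served": True, "storage": False},
]
print(storage_version(versions))  # v1beta1
```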


Does this PR introduce any user-facing change?

No

How was this patch tested?

CIs. Also validated that the generated YAML includes both versions.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the API label Jun 13, 2025
Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you, @jiangzho .

Member

@dongjoon-hyun dongjoon-hyun left a comment


While reviewing the PR, I'm not sure it is worth keeping all of the legacy v1alpha1 POJOs. We are going to remove v1alpha1 eventually, because that is allowed for alpha versions.

Let me play with these and think more about it, because in principle it is more convenient, as you mentioned in the PR description.

With this patch, our Gradle task can generate YAML that matches the latest changes in the CRD POJOs, saving us from maintaining the YAML files by hand for each new version.

@dongjoon-hyun
Member

BTW, @jiangzho, SPARK-52251 is for crd-generator-cli, not this. Please create a new JIRA issue for this PR. Or correct me if I'm wrong.

@jiangzho jiangzho changed the title [SPARK-52251] Support generating both v1alpha1 and v1beta1 from CRD generator [SPARK-52468] Support generating both v1alpha1 and v1beta1 from CRD generator Jun 13, 2025
@jiangzho
Contributor Author

Thanks, Dongjoon, for the clarification. I created SPARK-52468 for this scope.

@jiangzho
Contributor Author

jiangzho commented Jun 13, 2025

I'm not sure it is worth keeping all of the legacy v1alpha1 POJOs. We are going to remove v1alpha1 eventually, because that is allowed for alpha versions.

+1 that we won't keep all legacy versions. Let me clarify: we keep the CRD version(s) supported by the current version of the operator. We'll keep the POJOs in line with what we offer in the chart and YAML, won't we? They can be removed together when we drop support for a legacy version.

@dongjoon-hyun
Member

Yes, I agree with you that it will be perfect when we keep them in sync.

Member

@dongjoon-hyun dongjoon-hyun left a comment


This PR seems to be incomplete.

Also validated that the generated YAML includes both versions.

Here is the result from my verification.

$ ./gradlew build -x test
$ ./gradlew buildDockerImage
$ ./gradlew spark-operator-api:relocateGeneratedCRD
$ git diff
diff --git a/build-tools/helm/spark-kubernetes-operator/crds/sparkapplications.spark.apache.org-v1.yaml b/build-tools/helm/spark-kubernetes-operator/crds/sparkapplications.spark.apache.org-v1.yaml
index c38270a..8a68b82 100644
--- a/build-tools/helm/spark-kubernetes-operator/crds/sparkapplications.spark.apache.org-v1.yaml
+++ b/build-tools/helm/spark-kubernetes-operator/crds/sparkapplications.spark.apache.org-v1.yaml
@@ -1,18 +1,4 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Generated by Fabric8 CRDGenerator, manual edits might get overwritten!
 apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
@@ -27,7 +13,7 @@ spec:
     singular: sparkapplication
   scope: Namespaced
   versions:
-    - name: v1alpha1
+    - name: v1beta1
       schema:
         openAPIV3Schema:
           properties:
@@ -8694,7 +8680,7 @@ spec:
               type: object
           type: object
       served: true
-      storage: false
+      storage: true
       subresources:
         status: {}
       additionalPrinterColumns:
@@ -8704,7 +8690,7 @@ spec:
         - jsonPath: .metadata.creationTimestamp
           name: Age
           type: date
-    - name: v1beta1
+    - name: v1alpha1
       schema:
         openAPIV3Schema:
           properties:
@@ -17371,13 +17357,6 @@ spec:
               type: object
           type: object
       served: true
-      storage: true
+      storage: false
       subresources:
         status: {}
-      additionalPrinterColumns:
-        - jsonPath: .status.currentState.currentStateSummary
-          name: Current State
-          type: string
-        - jsonPath: .metadata.creationTimestamp
-          name: Age
-          type: date
diff --git a/build-tools/helm/spark-kubernetes-operator/crds/sparkclusters.spark.apache.org-v1.yaml b/build-tools/helm/spark-kubernetes-operator/crds/sparkclusters.spark.apache.org-v1.yaml
index 36b8218..bd542a6 100644
--- a/build-tools/helm/spark-kubernetes-operator/crds/sparkclusters.spark.apache.org-v1.yaml
+++ b/build-tools/helm/spark-kubernetes-operator/crds/sparkclusters.spark.apache.org-v1.yaml
@@ -1,18 +1,4 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Generated by Fabric8 CRDGenerator, manual edits might get overwritten!
 apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
@@ -27,7 +13,7 @@ spec:
     singular: sparkcluster
   scope: Namespaced
   versions:
-    - name: v1alpha1
+    - name: v1beta1
       schema:
         openAPIV3Schema:
           properties:
@@ -7305,7 +7291,7 @@ spec:
               type: object
           type: object
       served: true
-      storage: false
+      storage: true
       subresources:
         status: {}
       additionalPrinterColumns:
@@ -7315,7 +7301,7 @@ spec:
         - jsonPath: .metadata.creationTimestamp
           name: Age
           type: date
-    - name: v1beta1
+    - name: v1alpha1
       schema:
         openAPIV3Schema:
           properties:
@@ -14593,13 +14579,6 @@ spec:
               type: object
           type: object
       served: true
-      storage: true
+      storage: false
       subresources:
         status: {}
-      additionalPrinterColumns:
-        - jsonPath: .status.currentState.currentStateSummary
-          name: Current State
-          type: string
-        - jsonPath: .metadata.creationTimestamp
-          name: Age
-          type: date

@dongjoon-hyun
Member

To @jiangzho, the AS-IS methodology is much simpler than what you are trying to do.
In short, we only generate the current version and additionally insert it into the existing YAML file. Deletion is also simpler: we clean up the YAML file simply by removing the old entries.
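The insert-and-promote workflow described above can be sketched at the data level. These helpers are hypothetical (not code from the repo), and plain dicts stand in for the CRD's versions list in YAML:

```python
def insert_version(existing, generated):
    """Append a freshly generated version entry to an existing CRD versions
    list and promote it to the sole storage version (hypothetical sketch)."""
    for v in existing:
        v["storage"] = False  # demote the previous storage version
    existing.append({**generated, "storage": True})
    return existing

def remove_version(existing, name):
    """Drop an old version entry once it is no longer supported."""
    return [v for v in existing if v["name"] != name]

versions = [{"name": "v1alpha1", "served": True, "storage": True}]
versions = insert_version(versions, {"name": "v1beta1", "served": True})
print([(v["name"], v["storage"]) for v in versions])
# [('v1alpha1', False), ('v1beta1', True)]
```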

As we can see in this PR, this approach is error-prone:

  1. We already made a mistake in the first commit.
  2. In addition, this PR requires us to maintain the corresponding generator code.
  3. To remove an old entry, we need to remove the source code and regenerate it.

Although I initially tested this PR with hope, it turned out this is not what we want in the community. Let's not support generating multiple versions. Sorry for objecting to your contribution.

@jiangzho
Contributor Author

Thanks for the feedback, Dongjoon!

I'm thinking that the POJOs may be needed not only for CRD generation (though this PR only targets that). Without them, our operator seems to list/watch only v1beta1, even though the chart installs both v1alpha1 and v1beta1. If there are any alpha resources alive, the operator won't reconcile or clean them up. For an alpha version that could be okay, but from my point of view we may need a strategy for upgrading versions.

With a POJO for the old version, it becomes possible to enable an additional lister/watcher (which could be protected by a config param) and reconcile those resources (as they can be converted to the latest version using `toLatestSparkApplication`), so users get transition time to migrate workloads to the next version once the operator is updated.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 17, 2025

Do you think we can make a test case for this?

If there's any alpha CRDs alive, the operator won't be reconciling or cleaning them up.

@jiangzho
Contributor Author

Do you think we can make a test case for this?

I think so. While this PR focuses only on the POJOs, it lays the foundation for the next steps. I'd be glad to work on a follow-up patch adding the additional lister/watcher, with test scenario(s) included.

@dongjoon-hyun
Member

Let's fix this first if there is a bug. Please provide a reproducible example of your claim before going further.

@jiangzho
Contributor Author

jiangzho commented Jun 18, 2025

My bad; let me correct my previous statement. The v1alpha1 resource would still be reconciled with the current design; its apiVersion would be overridden to v1beta1, though.

Functionally there's no harm. There might be a minor issue if a user has an external engine performing CRUD on resources, but even so, I shall come back with a more lightweight solution than adding a full set of POJOs. I'll reconsider this.

@jiangzho jiangzho closed this Jun 18, 2025
@dongjoon-hyun
Member

Thank you for confirming, @jiangzho .

