
Conversation

@jiangzho
Contributor

What changes were proposed in this pull request?

This PR adds support for generating the SparkApplication and SparkCluster CRD YAML with both the v1alpha1 and v1beta1 versions, with the storage version set to v1beta1.

Why are the changes needed?

With this patch, our Gradle task can generate YAML that matches the latest changes in the CRD POJOs, saving us from maintaining the YAML files by hand for each new version.

A similar pattern can be used when we promote to future (v1/v2 ...) versions: keep the previous version's POJOs in a separate package as a reference, and mark only the latest version as the storage version.
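For context on the "storage set to v1beta1" constraint above: Kubernetes requires that exactly one version in a CRD's versions list be marked `storage: true`; the others may still be served, but objects are persisted in the storage version. A minimal sketch of that invariant, using plain dicts to stand in for the parsed CRD YAML (the helper name is hypothetical, not part of this PR):

```python
def storage_version(versions):
    """Return the name of the single version marked storage: true; raise if
    the CRD would be rejected by the API server (zero or multiple)."""
    storage = [v["name"] for v in versions if v.get("storage")]
    if len(storage) != 1:
        raise ValueError(f"exactly one storage version required, got {storage}")
    return storage[0]

# Shape of the versions list this PR generates (plain dicts, not real YAML):
versions = [
    {"name": "v1beta1", "served": True, "storage": True},
    {"name": "v1alpha1", "served": True, "storage": False},
]
print(storage_version(versions))  # v1beta1
```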


Does this PR introduce any user-facing change?

No

How was this patch tested?

CIs. Also validated that the generated YAML includes both versions.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the API label Jun 13, 2025
Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you, @jiangzho .

Member

@dongjoon-hyun dongjoon-hyun left a comment


While reviewing the PR, I'm not sure it is worth keeping all of the legacy v1alpha1 POJOs. We are going to remove v1alpha1 eventually, because that is allowed for alpha versions.

Let me play with these and think more about it, because in principle it is more convenient, as you mentioned in the PR description.

With this patch, our Gradle task can generate YAML that matches the latest changes in the CRD POJOs, saving us from maintaining the YAML files by hand for each new version.

@dongjoon-hyun
Member

BTW, @jiangzho, SPARK-52251 is for crd-generator-cli, not this. Please create a new JIRA issue for this PR. Or correct me if I'm wrong.

@jiangzho jiangzho changed the title [SPARK-52251] Support generating both v1alpha1 and v1beta1 from CRD generator [SPARK-52468] Support generating both v1alpha1 and v1beta1 from CRD generator Jun 13, 2025
@jiangzho
Contributor Author

Thanks, Dongjoon, for the clarification. I created SPARK-52468 for this scope.

@jiangzho
Contributor Author

jiangzho commented Jun 13, 2025

I'm not sure it is worth keeping all of the legacy v1alpha1 POJOs. We are going to remove v1alpha1 eventually, because that is allowed for alpha versions.

+1 that we won't keep all legacy versions. Let me clarify: we keep the CRD version(s) supported by the current version of the operator. We'll keep the POJOs in line with what we offer in the chart and YAML, won't we? They can be removed together when we drop support for a legacy version.

@dongjoon-hyun
Member

Yes, I agree with you that it will be perfect when we keep them in sync.

Member

@dongjoon-hyun dongjoon-hyun left a comment


This PR seems to be incomplete.

Also validated that the generated YAML includes both versions.

Here is the result from my verification.

$ ./gradlew build -x test
$ ./gradlew buildDockerImage
$ ./gradlew spark-operator-api:relocateGeneratedCRD
$ git diff
diff --git a/build-tools/helm/spark-kubernetes-operator/crds/sparkapplications.spark.apache.org-v1.yaml b/build-tools/helm/spark-kubernetes-operator/crds/sparkapplications.spark.apache.org-v1.yaml
index c38270a..8a68b82 100644
--- a/build-tools/helm/spark-kubernetes-operator/crds/sparkapplications.spark.apache.org-v1.yaml
+++ b/build-tools/helm/spark-kubernetes-operator/crds/sparkapplications.spark.apache.org-v1.yaml
@@ -1,18 +1,4 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Generated by Fabric8 CRDGenerator, manual edits might get overwritten!
 apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
@@ -27,7 +13,7 @@ spec:
     singular: sparkapplication
   scope: Namespaced
   versions:
-    - name: v1alpha1
+    - name: v1beta1
       schema:
         openAPIV3Schema:
           properties:
@@ -8694,7 +8680,7 @@ spec:
               type: object
           type: object
       served: true
-      storage: false
+      storage: true
       subresources:
         status: {}
       additionalPrinterColumns:
@@ -8704,7 +8690,7 @@ spec:
         - jsonPath: .metadata.creationTimestamp
           name: Age
           type: date
-    - name: v1beta1
+    - name: v1alpha1
       schema:
         openAPIV3Schema:
           properties:
@@ -17371,13 +17357,6 @@ spec:
               type: object
           type: object
       served: true
-      storage: true
+      storage: false
       subresources:
         status: {}
-      additionalPrinterColumns:
-        - jsonPath: .status.currentState.currentStateSummary
-          name: Current State
-          type: string
-        - jsonPath: .metadata.creationTimestamp
-          name: Age
-          type: date
diff --git a/build-tools/helm/spark-kubernetes-operator/crds/sparkclusters.spark.apache.org-v1.yaml b/build-tools/helm/spark-kubernetes-operator/crds/sparkclusters.spark.apache.org-v1.yaml
index 36b8218..bd542a6 100644
--- a/build-tools/helm/spark-kubernetes-operator/crds/sparkclusters.spark.apache.org-v1.yaml
+++ b/build-tools/helm/spark-kubernetes-operator/crds/sparkclusters.spark.apache.org-v1.yaml
@@ -1,18 +1,4 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Generated by Fabric8 CRDGenerator, manual edits might get overwritten!
 apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
@@ -27,7 +13,7 @@ spec:
     singular: sparkcluster
   scope: Namespaced
   versions:
-    - name: v1alpha1
+    - name: v1beta1
       schema:
         openAPIV3Schema:
           properties:
@@ -7305,7 +7291,7 @@ spec:
               type: object
           type: object
       served: true
-      storage: false
+      storage: true
       subresources:
         status: {}
       additionalPrinterColumns:
@@ -7315,7 +7301,7 @@ spec:
         - jsonPath: .metadata.creationTimestamp
           name: Age
           type: date
-    - name: v1beta1
+    - name: v1alpha1
       schema:
         openAPIV3Schema:
           properties:
@@ -14593,13 +14579,6 @@ spec:
               type: object
           type: object
       served: true
-      storage: true
+      storage: false
       subresources:
         status: {}
-      additionalPrinterColumns:
-        - jsonPath: .status.currentState.currentStateSummary
-          name: Current State
-          type: string
-        - jsonPath: .metadata.creationTimestamp
-          name: Age
-          type: date

@dongjoon-hyun
Member

To @jiangzho, the AS-IS methodology is much simpler than what you are trying to do.
In short, we only generate the current version and additionally insert it into the existing YAML file. Deletion is also simpler: we clean up the YAML file simply by removing the old entries.
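The insert-and-promote workflow described above can be sketched at the data level. These helpers are hypothetical (not code from the repo), and plain dicts stand in for the CRD's versions list in YAML:

```python
def insert_version(existing, generated):
    """Append a freshly generated version entry to an existing CRD versions
    list and promote it to the sole storage version (hypothetical sketch)."""
    for v in existing:
        v["storage"] = False  # demote the previous storage version
    existing.append({**generated, "storage": True})
    return existing

def remove_version(existing, name):
    """Drop an old version entry once it is no longer supported."""
    return [v for v in existing if v["name"] != name]

versions = [{"name": "v1alpha1", "served": True, "storage": True}]
versions = insert_version(versions, {"name": "v1beta1", "served": True})
print([(v["name"], v["storage"]) for v in versions])
# [('v1alpha1', False), ('v1beta1', True)]
```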

As we can see in this PR, this approach is error-prone:

  1. We already made a mistake in the first commit.
  2. In addition, this PR requires us to maintain the corresponding generator code.
  3. To remove an old entry, we need to remove the source code and regenerate it.

Although I initially tested this PR with hope, it turned out this is not what we want in the community. Let's not support generating multiple versions. Sorry for objecting to your contribution.

@jiangzho
Contributor Author

Thanks for the feedback, Dongjoon!

I'm thinking that the POJOs may be needed not only for CRD generation (though this PR only targets that). Without them, our operator seems to list/watch only v1beta1, even though the chart installs both v1alpha1 and v1beta1. If there are any alpha resources alive, the operator won't reconcile or clean them up. For an alpha version that could be okay, but from my point of view we may need a strategy for upgrading versions.

With a POJO for the old version, it becomes possible to enable an additional lister/watcher (which could be protected by a config param) and reconcile those resources (as they can be converted to the latest version using `toLatestSparkApplication`), so users get transition time to migrate workloads to the next version once the operator is updated.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 17, 2025

Do you think we can make a test case for this?

If there's any alpha CRDs alive, the operator won't be reconciling or cleaning them up.

@jiangzho
Contributor Author

Do you think we can make a test case for this?

I think so. While this PR focuses only on the POJOs, it lays the foundation for the next steps. I'd be glad to work on a follow-up patch adding the additional lister/watcher, with test scenario(s) included.

@dongjoon-hyun
Member

Let's fix this first if there is a bug. Please provide a reproducible example of your claim before going further.

@jiangzho
Contributor Author

jiangzho commented Jun 18, 2025

My bad; let me correct my previous statement. The v1alpha1 resource would still be reconciled with the current design; its apiVersion would be overridden to v1beta1, though.

Functionally there's no harm. There might be a minor issue if a user has an external engine performing CRUD on resources, but even so, I shall come back with a more lightweight solution than adding a full set of POJOs. I'll reconsider this.

@jiangzho jiangzho closed this Jun 18, 2025
@dongjoon-hyun
Member

Thank you for confirming, @jiangzho .

