instances in the cluster. GCP generates some itself including `goog-dataproc-cluster-name`
which is the name of the cluster.

* `virtual_cluster_config` - (Optional) Allows you to configure a virtual Dataproc on GKE cluster.
  Structure [defined below](#nested_virtual_cluster_config).

* `cluster_config` - (Optional) Allows you to configure various aspects of the cluster.
  Structure [defined below](#nested_cluster_config).

  For more context see the [docs](https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters/patch#query-parameters)
- - -

<a name="nested_virtual_cluster_config"></a>The `virtual_cluster_config` block supports:

```hcl
virtual_cluster_config {
  auxiliary_services_config { ... }
  kubernetes_cluster_config { ... }
}
```

* `staging_bucket` - (Optional) The Cloud Storage staging bucket used to stage files,
  such as Hadoop jars, between client machines and the cluster.
  Note: If you don't explicitly specify a `staging_bucket`, GCP will auto-create/assign
  one for you. However, you are not guaranteed an auto-generated bucket that is solely
  dedicated to your cluster; it may be shared with other clusters in the same region/zone
  that also choose the auto-generation option. (See the sketch after this list for where
  this field sits within the resource.)

* `auxiliary_services_config` (Optional) Configuration of auxiliary services used by this cluster.
  Structure [defined below](#nested_auxiliary_services_config).

* `kubernetes_cluster_config` (Required) The configuration for running the Dataproc cluster on Kubernetes.
  Structure [defined below](#nested_kubernetes_cluster_config).
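
A minimal sketch of how these fields sit within the resource; the cluster name, region, and bucket below are hypothetical, and `kubernetes_cluster_config` is trimmed to its required pieces:

```hcl
resource "google_dataproc_cluster" "virtual_cluster" {
  name   = "my-virtual-cluster" # hypothetical name
  region = "us-central1"        # hypothetical region

  virtual_cluster_config {
    # Omit staging_bucket to let GCP auto-create/assign one.
    staging_bucket = "my-dataproc-staging-bucket" # hypothetical bucket

    kubernetes_cluster_config {
      kubernetes_namespace = "my-namespace" # hypothetical namespace

      kubernetes_software_config {
        component_version = {
          "SPARK" : "3.1-dataproc-7"
        }
      }

      gke_cluster_config {
        gke_cluster_target = google_container_cluster.primary.id
      }
    }
  }
}
```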
- - -

<a name="nested_auxiliary_services_config"></a>The `auxiliary_services_config` block supports:

```hcl
virtual_cluster_config {
  auxiliary_services_config {
    metastore_config {
      dataproc_metastore_service = google_dataproc_metastore_service.metastore_service.id
    }

    spark_history_server_config {
      dataproc_cluster = google_dataproc_cluster.dataproc_cluster.id
    }
  }
}
```

* `metastore_config` (Optional) The Hive Metastore configuration for this workload.

* `dataproc_metastore_service` (Required) Resource name of an existing Dataproc Metastore service
  (a sketch of such a service follows this list).

* `spark_history_server_config` (Optional) The Spark History Server configuration for the workload.

* `dataproc_cluster` (Optional) Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload.
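
The example above references an existing Dataproc Metastore service. A minimal sketch of such a service, with a hypothetical service ID, region, and Hive version:

```hcl
resource "google_dataproc_metastore_service" "metastore_service" {
  service_id = "my-metastore" # hypothetical service ID
  location   = "us-central1"  # hypothetical region
  tier       = "DEVELOPER"

  hive_metastore_config {
    version = "3.1.2" # illustrative Hive version
  }
}
```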
- - -

<a name="nested_kubernetes_cluster_config"></a>The `kubernetes_cluster_config` block supports:

```hcl
virtual_cluster_config {
  kubernetes_cluster_config {
    kubernetes_namespace = "foobar"

    kubernetes_software_config {
      component_version = {
        "SPARK" : "3.1-dataproc-7"
      }

      properties = {
        "spark:spark.eventLog.enabled" : "true"
      }
    }

    gke_cluster_config {
      gke_cluster_target = google_container_cluster.primary.id

      node_pool_target {
        node_pool = "dpgke"
        roles     = ["DEFAULT"]

        node_pool_config {
          autoscaling {
            min_node_count = 1
            max_node_count = 6
          }

          config {
            machine_type     = "n1-standard-4"
            preemptible      = true
            local_ssd_count  = 1
            min_cpu_platform = "Intel Sandy Bridge"
          }

          locations = ["us-central1-c"]
        }
      }
    }
  }
}
```
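
The `gke_cluster_target` above references a standard GKE cluster. A minimal sketch of such a cluster, assuming a hypothetical project and region (Dataproc on GKE expects Workload Identity to be enabled on the target cluster):

```hcl
resource "google_container_cluster" "primary" {
  name     = "primary"     # hypothetical cluster name
  location = "us-central1" # hypothetical region (regional cluster)

  initial_node_count = 1

  # Workload Identity is required for Dataproc on GKE virtual clusters.
  workload_identity_config {
    workload_pool = "my-project-id.svc.id.goog" # hypothetical project ID
  }
}
```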

* `kubernetes_namespace` (Optional) A namespace within the Kubernetes cluster to deploy into.
  If this namespace does not exist, it is created.
  If it exists, Dataproc verifies that another Dataproc VirtualCluster is not installed into it.
  If not specified, the name of the Dataproc Cluster is used.

* `kubernetes_software_config` (Required) The software configuration for this Dataproc cluster running on Kubernetes.

* `component_version` (Required) The components that should be installed in this Dataproc cluster. The key must be a string from the
  KubernetesComponent enumeration. The value is the version of the software to be installed. At least one entry must be specified.
* **NOTE**: `component_version[SPARK]` must be set, or the creation of the cluster will fail.

* `properties` (Optional) The properties to set on daemon config files. Property keys are specified in `prefix:property` format,
  for example `spark:spark.kubernetes.container.image`.

* `gke_cluster_config` (Required) The configuration for running the Dataproc cluster on GKE.

* `gke_cluster_target` (Optional) A target GKE cluster to deploy to. It must be in the same project and region as the Dataproc cluster
  (the GKE cluster can be zonal or regional).

* `node_pool_target` (Optional) GKE node pools where workloads will be scheduled. At least one node pool must be assigned the `DEFAULT`
  GkeNodePoolTarget.Role. If a GkeNodePoolTarget is not specified, Dataproc constructs a `DEFAULT` GkeNodePoolTarget.
  Each role can be given to only one GkeNodePoolTarget. All node pools must have the same location settings.
  (A sketch splitting roles across node pools follows this list.)

* `node_pool` (Required) The target GKE node pool.

* `roles` (Required) The roles associated with the GKE node pool.
  One of `"DEFAULT"`, `"CONTROLLER"`, `"SPARK_DRIVER"` or `"SPARK_EXECUTOR"`.

* `node_pool_config` (Input only) The configuration for the GKE node pool.
  If specified, Dataproc attempts to create a node pool with the specified shape.
  If one with the same name already exists, it is verified against all specified fields.
  If a field differs, the virtual cluster creation will fail.

* `autoscaling` (Optional) The autoscaler configuration for this node pool.
  The autoscaler is enabled only when a valid configuration is present.

* `min_node_count` (Optional) The minimum number of nodes in the node pool. Must be >= 0 and <= `max_node_count`.

* `max_node_count` (Optional) The maximum number of nodes in the node pool. Must be >= `min_node_count` and > 0.

* `config` (Optional) The node pool configuration.

* `machine_type` (Optional) The name of a Compute Engine machine type.

* `local_ssd_count` (Optional) The number of local SSD disks to attach to the node,
  which is limited by the maximum number of disks allowable per zone.

* `preemptible` (Optional) Whether the nodes are created as preemptible VM instances.
  Preemptible nodes cannot be used in a node pool with the `CONTROLLER` role or in the `DEFAULT` node pool if the
  `CONTROLLER` role is not assigned (the `DEFAULT` node pool will assume the `CONTROLLER` role).

* `min_cpu_platform` (Optional) Minimum CPU platform to be used by this instance.
  The instance may be scheduled on the specified or a newer CPU platform.
  Specify the friendly names of CPU platforms, such as "Intel Haswell" or "Intel Sandy Bridge".

* `spot` (Optional) Spot flag for enabling Spot VM, which is a rebrand of the existing preemptible flag.

* `locations` (Optional) The list of Compute Engine zones where node pool nodes associated
  with a Dataproc on GKE virtual cluster will be located.
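
For instance, roles can be split across dedicated pools as in the sketch below, where the pool names, machine types, and zone are illustrative; note the executor pool uses `spot` rather than `preemptible`:

```hcl
gke_cluster_config {
  gke_cluster_target = google_container_cluster.primary.id

  # One pool must carry the DEFAULT role; it will also assume the CONTROLLER
  # role here, since no pool is given that role explicitly.
  node_pool_target {
    node_pool = "dpgke-default" # hypothetical pool name
    roles     = ["DEFAULT"]

    node_pool_config {
      config {
        machine_type = "n1-standard-4"
      }
      locations = ["us-central1-c"] # all pools must share location settings
    }
  }

  # A separate autoscaled pool dedicated to Spark executors, using Spot VMs.
  node_pool_target {
    node_pool = "dpgke-executors" # hypothetical pool name
    roles     = ["SPARK_EXECUTOR"]

    node_pool_config {
      autoscaling {
        min_node_count = 0
        max_node_count = 10
      }
      config {
        machine_type = "n1-standard-8"
        spot         = true
      }
      locations = ["us-central1-c"]
    }
  }
}
```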
- - -

<a name="nested_cluster_config"></a>The `cluster_config` block supports:

```hcl