Commit ac59ba3

[Exporter] Correctly handle account-level identities when generating the code (#4650)
## Changes

When a Unity Catalog metastore is attached to a workspace, Identity Federation is enabled on it. With Identity Federation, users, service principals, and groups come from the account level via assignment to a workspace. But there is still an ability to create workspace-level groups via API, and the `databricks_group` resource uses it and always creates workspace-level groups. As a result, we shouldn't generate resources for account-level groups because they would be turned into workspace-level groups. Due to the limitations of APIs, we can't use `databricks_permission_assignment` on the workspace level to emulate the assignment. See further explanation in the doc.

## Tests

- [x] `make test` run locally
- [x] relevant change in `docs/` folder
- [ ] covered with integration tests in `internal/acceptance`
- [ ] using Go SDK
- [ ] using TF Plugin Framework
1 parent 42dd50c · commit ac59ba3

6 files changed: +141 −63 lines changed

NEXT_CHANGELOG.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -12,4 +12,6 @@
 
 ### Exporter
 
+* Correctly handle account-level identities when generating the code ([#4650](https://github.com/databricks/terraform-provider-databricks/pull/4650))
+
 ### Internal Changes
```

docs/guides/experimental-exporter.md

Lines changed: 49 additions & 26 deletions
````diff
@@ -3,55 +3,53 @@ page_title: "Experimental resource exporter"
 ---
 # Experimental resource exporter
 
--> **Note** This tooling is experimental and provided as is. It has an evolving interface, which may change or be removed in future provider versions.
+-> This tooling is experimental and provided as is. It has an evolving interface, which may change or be removed in future provider versions.
 
--> **Note** Use the same user who did the exporting to import the exported templates. Otherwise, it could cause changes in the ownership of the objects.
+-> Use the same user who did the exporting to import the exported templates. Otherwise, it could cause changes in the ownership of the objects.
 
 Generates `*.tf` files for Databricks resources together with `import.sh` that is used to import objects into the Terraform state. Available as part of provider binary. The only way to authenticate is through [environment variables](../index.md#Environment-variables). It's best used when you need to export Terraform configuration for an existing Databricks workspace quickly. After generating the configuration, we strongly recommend manually reviewing all created files.
 
 ## Example Usage
 
-After downloading the [latest released binary](https://github.com/databricks/terraform-provider-databricks/releases), unpack it and place it in the same folder. You may have already downloaded this binary - check the `.terraform` folder of any state directory where you've used the `databricks` provider. It could also be in your plugin cache `~/.terraform.d/plugins/registry.terraform.io/databricks/databricks/*/*/terraform-provider-databricks`.
+After downloading the [latest released binary](https://github.com/databricks/terraform-provider-databricks/releases), unpack it and place it in the same folder. You may have already downloaded this binary - check the `.terraform` folder of any state directory where you've used the `databricks` provider. It could also be in your plugin cache `~/.terraform.d/plugins/registry.terraform.io/databricks/databricks/*/*/terraform-provider-databricks`.
 
 Here's the tool in action:
 
 [![asciicast](https://asciinema.org/a/Rv8ZFJQpfrfp6ggWddjtyXaOy.svg)](https://asciinema.org/a/Rv8ZFJQpfrfp6ggWddjtyXaOy)
 
--> **Note**
-Please note that in the interactive mode, the selected services are passed as the `-listing` option, not as `-services` option (see below).
+-> Please note that in the interactive mode, the selected services are passed as the `-listing` option, not as `-services` option (see below).
 
 Exporter can also be used in a non-interactive mode that allows a more granular selection of services and dependencies. For example, the following command will list all resources related to `jobs` and `compute` services and import them with their dependencies from `groups,secrets,access,compute,users,jobs,storage` services.
 
 ```bash
 export DATABRICKS_HOST=...
 export DATABRICKS_TOKEN=...
 ./terraform-provider-databricks exporter -skip-interactive \
-    -services=groups,secrets,access,compute,users,jobs,storage \
-    -listing=jobs,compute \
-    -debug
+  -services=groups,secrets,access,compute,users,jobs,storage \
+  -listing=jobs,compute \
+  -debug
 ```
 
 The exporter is also supported on the account level for resources that could be defined on an account level. For example, we can export everything defined on the account level:
 
-```
+```sh
 export DATABRICKS_HOST=https://accounts.azuredatabricks.net
 export DATABRICKS_ACCOUNT_ID=...
 ./terraform-provider-databricks exporter -skip-interactive
 ```
 
 Or export only specific resources - users and groups:
 
-```
-DATABRICKS_HOST=https://accounts.azuredatabricks.net \
-DATABRICKS_ACCOUNT_ID=<uuid> \
-./terraform-provider-databricks exporter -directory output \
-   -listing groups,users -skip-interactive
+```sh
+DATABRICKS_HOST=https://accounts.azuredatabricks.net \
+DATABRICKS_ACCOUNT_ID=<uuid> \
+./terraform-provider-databricks exporter -directory output \
+  -listing groups,users -skip-interactive
 ```
 
-
 ## Argument Reference
 
-!> **Warning** This tooling was only extensively tested with administrator privileges.
+!> This tooling was only extensively tested with administrator privileges.
 
 All arguments are optional, and they tune what code is being generated.
````
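As an aside (not part of the diff above), the plugin-cache location mentioned in this hunk can be checked with a one-liner like the following sketch; the path layout is the standard Terraform plugin cache, and the version/platform directories will differ per machine:

```sh
# Look for an already-downloaded provider binary in the Terraform plugin cache
# (the globs cover the version and platform directories)
ls ~/.terraform.d/plugins/registry.terraform.io/databricks/databricks/*/*/terraform-provider-databricks*
```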
```diff
@@ -62,7 +60,7 @@ All arguments are optional, and they tune what code is being generated.
 * `-services` - Comma-separated list of services to import. By default, all services are imported.
 * `-match` - Match resource names during listing operation. This filter applies to all resources that are getting listed, so if you want to import all dependencies of just one cluster, specify `-match=autoscaling -listing=compute`. By default, it is empty, which matches everything.
 * `-matchRegex` - Match resource names against a given regex during listing operation. Applicable to all resources selected for listing.
-* `-excludeRegex` - Exclude resource names matching a given regex. Applied during the listing operation and has higher priority than `-match` and `-matchRegex`. Applicable to all resources selected for listing. Could be used to exclude things like `databricks_automl` notebooks, etc.
+* `-excludeRegex` - Exclude resource names matching a given regex. Applied during the listing operation and has higher priority than `-match` and `-matchRegex`. Applicable to all resources selected for listing. Could be used to exclude things like `databricks_automl` notebooks, etc.
 * `-filterDirectoriesDuringWorkspaceWalking` - if we should apply match logic to directory names when we're performing workspace tree walking. *Note: be careful with it as it will be applied to all entries, so if you want to filter only specific users, then you will need to specify condition for `/Users` as well, so regex will be `^(/Users|/Users/[a-c].*)$`*.
 * `-mounts` - List DBFS mount points, an extremely slow operation that would not trigger unless explicitly specified.
 * `-generateProviderDeclaration` - the flag that toggles the generation of `databricks.tf` file with the declaration of the Databricks Terraform provider that is necessary for Terraform versions since Terraform 0.13 (disabled by default).
```
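As an aside (not part of the diff above), the filtering flags in this list might be combined roughly as follows; the chosen service, regex values, and the bare use of the boolean flag are illustrative assumptions, not guidance from the commit:

```sh
# Sketch: export notebooks while excluding AutoML-generated ones and
# restricting workspace walking to a subset of user folders
./terraform-provider-databricks exporter -skip-interactive \
  -listing notebooks \
  -matchRegex '^(/Users|/Users/[a-c].*)$' \
  -excludeRegex 'databricks_automl' \
  -filterDirectoriesDuringWorkspaceWalking \
  -directory output
```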
```diff
@@ -82,7 +80,7 @@ All arguments are optional, and they tune what code is being generated.
 
 ### Use of `-listing` and `-services` for granular resources selection
 
-The `-listing` option is used to discover resources to export; if it's not specified, then all services are listed (if they have the `List` operation implemented). The `-services` restricts the export of resources only to those resources whose service type is in the list specified by this option.
+The `-listing` option is used to discover resources to export; if it's not specified, then all services are listed (if they have the `List` operation implemented). The `-services` restricts the export of resources only to those resources whose service type is in the list specified by this option.
 
 For example, suppose we have a job comprising two notebooks and one SQL dashboard, and its tasks have Python libraries on DBFS attached. If we just specify `-listing jobs`, then it will export the following resources:
 
```
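As an aside (not part of the diff above), the `-listing`/`-services` combination described in this hunk might look like the following sketch, reusing the job example from the guide; the output directory name is a placeholder:

```sh
# Discover everything reachable from jobs, but only generate code for
# notebook and DBFS-storage dependencies; other IDs stay hard-coded
./terraform-provider-databricks exporter -skip-interactive \
  -listing jobs \
  -services notebooks,storage \
  -directory output
```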
````diff
@@ -106,27 +104,52 @@ but if we also specify `-services notebooks,storage` then it will export only:
 
 The rest of the values, like SQL object IDs, etc. will be hard-coded and not portable between workspaces.
 
-You can also use predefined aliases (`all` and `uc`) to specify multiple services at once. For example, if `-listing` has value `all,-uc`, then we will discover all services except of Unity Catalog + vector search.
+You can also use predefined aliases (`all` and `uc`) to specify multiple services at once. For example, if `-listing` has value `all,-uc`, then we will discover all services except Unity Catalog and vector search.
 
 We can also exclude specific services. For example, we can specify `-services` as `-all,-uc-tables` and then we won't generate code for `databricks_sql_table`.
 
+### Migration between workspaces with identity federation enabled
+
+When a Unity Catalog metastore is attached to a workspace, Identity Federation is enabled on it. With Identity Federation, users, service principals, and groups come from the account level via assignment to a workspace. But there is still an ability to create workspace-level groups via API, and the `databricks_group` resource uses it and always creates workspace-level groups. As a result, we shouldn't generate resources for account-level groups, because they would be turned into workspace-level groups. Due to the limitations of APIs, we can't use `databricks_permission_assignment` on the workspace level to emulate the assignment.
+
+So migration of resources between two workspaces with Identity Federation enabled should be done in a few steps:
+
+1. On the account level, export `databricks_mws_permission_assignment` resources for your source workspace:
+
+```sh
+DATABRICKS_CONFIG_PROFILE=<cli-profile> DATABRICKS_ACCOUNT_ID=<account-id> ./terraform-provider-databricks exporter \
+  -matchRegex '^<source-workspace-id>$' -listing idfed -services idfed \
+  -directory output -skip-interactive -noformat
+```
+
+2. Replace the source workspace ID with the destination workspace ID in the generated `idfed.tf` file, e.g., with `sed`:
+
+```sh
+sed -ibak -e 's|workspace_id = <source-workspace-id>|workspace_id = <destination-workspace-id>|' idfed.tf
+```
+
+and do `terraform apply` on the account level to assign users, service principals, and groups to the destination workspace.
+
+3. Export resources from the source workspace using the exporter on the workspace level. It will automatically detect that Identity Federation is enabled and export account-level objects as data sources instead of resources.
+
+4. Apply the exported code against the destination workspace.
+
 ## Services
 
 Services are just logical groups of resources used for filtering and organization in files written in `-directory`. All resources are globally sorted by their resource name, which allows you to use generated files for compliance purposes. Nevertheless, managing the entire Databricks workspace with Terraform is the preferred way. Except for notebooks and possibly libraries, which may have their own CI/CD processes.
 
 Services could be specified in combination with predefined aliases (`all` - for all services and listings, `uc` - for all UC services, including the vector search). The service could be specified as the service name, or it could have `-` prepended to the service, to exclude it from the list (including `-uc` to exclude all UC-related services).
 
--> **Note**
-Please note that for services not marked with **listing**, we'll export resources only if they are referenced from other resources.
+-> Please note that for services not marked with **listing**, we'll export resources only if they are referenced from other resources.
 
 * `access` - **listing** [databricks_permissions](../resources/permissions.md), [databricks_instance_profile](../resources/instance_profile.md), [databricks_ip_access_list](../resources/ip_access_list.md), and [databricks_access_control_rule_set](../resources/access_control_rule_set.md). *Please note that for `databricks_permissions` we list only `authorization = "tokens"`, the permissions for other objects (notebooks, ...) will be emitted when corresponding objects are processed!*
 * `alerts` - **listing** [databricks_alert](../resources/alert.md).
 * `compute` - **listing** [databricks_cluster](../resources/cluster.md).
 * `dashboards` - **listing** [databricks_dashboard](../resources/dashboard.md).
 * `directories` - **listing** [databricks_directory](../resources/directory.md). *Please note that directories aren't listed when running in the incremental mode! Only directories with updated notebooks will be emitted.*
 * `dlt` - **listing** [databricks_pipeline](../resources/pipeline.md).
-* `groups` - **listing** [databricks_group](../data-sources/group.md) with [membership](../resources/group_member.md) and [data access](../resources/group_instance_profile.md).
-* `idfed` - **listing** [databricks_mws_permission_assignment](../resources/mws_permission_assignment.md). When listing allows to filter assignment only to specific workspace IDs as specified by `-match`, `-matchRegex` and `-excludeRegex` options. I.e., to export assignments only for two workspaces, use `-matchRegex '^1688808130562317|5493220389262917$'`.
+* `groups` - **listing** [databricks_group](../data-sources/group.md) with [membership](../resources/group_member.md) and [data access](../resources/group_instance_profile.md). If Identity Federation is enabled on the workspace (when a UC Metastore is attached), then account-level groups are exposed as data sources because they are defined on the account level, and only workspace-level groups are exposed as resources. See the note above on how to perform migration between workspaces with Identity Federation enabled.
+* `idfed` - **listing** [databricks_mws_permission_assignment](../resources/mws_permission_assignment.md). When listing, assignments can be filtered to specific workspace IDs via the `-match`, `-matchRegex`, and `-excludeRegex` options. I.e., to export assignments only for two workspaces, use `-matchRegex '^1688808130562317|5493220389262917$'`.
 * `jobs` - **listing** [databricks_job](../resources/job.md). Usually, there are more automated workflows than interactive clusters, so they get their own file in this tool's output. *Please note that workflows deployed and maintained via [Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html) aren't exported!*
 * `mlflow-webhooks` - **listing** [databricks_mlflow_webhook](../resources/mlflow_webhook.md).
 * `model-serving` - **listing** [databricks_model_serving](../resources/model_serving.md).
````
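As an aside (not part of the diff above), steps 3 and 4 of the migration procedure in this hunk could look roughly like the following sketch; the CLI profile names and the `ws-output` directory are hypothetical placeholders, not values from the commit:

```sh
# Step 3: export from the source workspace (workspace-level authentication);
# account-level users, service principals, and groups come out as data sources
DATABRICKS_CONFIG_PROFILE=<source-workspace-profile> \
  ./terraform-provider-databricks exporter -directory ws-output -skip-interactive

# Step 4: apply the generated configuration against the destination workspace
cd ws-output
DATABRICKS_CONFIG_PROFILE=<destination-workspace-profile> terraform init
DATABRICKS_CONFIG_PROFILE=<destination-workspace-profile> terraform apply
```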
```diff
@@ -136,7 +159,7 @@ Services could be specified in combination with predefined aliases (`all` - for
 * `policies` - **listing** [databricks_cluster_policy](../resources/cluster_policy).
 * `pools` - **listing** [instance pools](../resources/instance_pool.md).
 * `queries` - **listing** [databricks_query](../resources/query.md).
-* `repos` - **listing** [databricks_repo](../resources/repo.md) (both classical Repos in `/Repos` and Git Folders in artbitrary locations).
+* `repos` - **listing** [databricks_repo](../resources/repo.md) (both classical Repos in `/Repos` and Git Folders in arbitrary locations).
 * `secrets` - **listing** [databricks_secret_scope](../resources/secret_scope.md) along with [keys](../resources/secret.md) and [ACLs](../resources/secret_acl.md).
 * `settings` - **listing** [databricks_notification_destination](../resources/notification_destination.md).
 * `sql-dashboards` - **listing** Legacy [databricks_sql_dashboard](../resources/sql_dashboard.md) along with associated [databricks_sql_widget](../resources/sql_widget.md) and [databricks_sql_visualization](../resources/sql_visualization.md).
```
```diff
@@ -157,7 +180,7 @@ Services could be specified in combination with predefined aliases (`all` - for
 * `uc-system-schemas` - **listing** exports [databricks_system_schema](../resources/system_schema.md) resources for the UC metastore of the current workspace.
 * `uc-tables` - **listing** (*we can't list directly, only via dependencies to top-level object*) [databricks_sql_table](../resources/sql_table.md) resource.
 * `uc-volumes` - **listing** (*we can't list directly, only via dependencies to top-level object*) [databricks_volume](../resources/volume.md)
-* `users` - **listing** [databricks_user](../resources/user.md) and [databricks_service_principal](../resources/service_principal.md) are written to their own file, simply because of their amount. If you use SCIM provisioning, migrating workspaces is the only use case for importing `users` service.
+* `users` - **listing** [databricks_user](../resources/user.md) and [databricks_service_principal](../resources/service_principal.md) are written to their own file, simply because of their amount. If Identity Federation is enabled on the workspace (when a UC Metastore is attached), then users and service principals are exposed as data sources because they are defined on the account level. See the note above on how to perform migration between workspaces with Identity Federation enabled.
 * `vector-search` - **listing** exports [databricks_vector_search_endpoint](../resources/vector_search_endpoint.md) and [databricks_vector_search_index](../resources/vector_search_index.md)
 * `wsconf` - **listing** exports Workspace-level configuration: [databricks_workspace_conf](../resources/workspace_conf.md), [databricks_sql_global_config](../resources/sql_global_config.md) and [databricks_global_init_script](../resources/global_init_script.md).
 * `wsfiles` - **listing** [databricks_workspace_file](../resources/workspace_file.md).
```
```diff
@@ -173,7 +196,7 @@ To speed up export, Terraform Exporter performs many operations, such as listing
 * `EXPORTER_WS_LIST_PARALLELISM` (default: `5`) controls how many Goroutines are used to perform parallel listing of Databricks Workspace objects (notebooks, directories, workspace files, ...).
 * `EXPORTER_DIRECTORIES_CHANNEL_SIZE` (default: `300000`) controls the channel's capacity when listing workspace objects. Please ensure that this value is big enough (greater than the number of directories in the workspace; default value should be ok for most cases); otherwise, there is a chance of deadlock.
 * `EXPORTER_DEDICATED_RESOUSE_CHANNELS` - by default, only specific resources (`databricks_user`, `databricks_service_principal`, `databricks_group`) have dedicated channels - the rest are handled by the shared channel. This is done to prevent throttling by specific APIs. You can override this by providing a comma-separated list of resources as this environment variable.
-* `EXPORTER_PARALLELISM_NNN` - number of Goroutines used to process resources of a specific type (replace `NNN` with the exact resource name, for example, `EXPORTER_PARALLELISM_databricks_notebook=10` sets the number of Goroutines for `databricks_notebook` resource to `10`). There is a shared channel (with name `default`) for handling of resources for which there are no dedicated channels - use `EXPORTER_PARALLELISM_default` to increase its size (default size is `15`). Defaults for some resources are defined by the `goroutinesNumber` map in `exporter/context.go` or equal to `2` if there is no value. *Don't increase default values too much to avoid REST API throttling!*
+* `EXPORTER_PARALLELISM_NNN` - number of Goroutines used to process resources of a specific type (replace `NNN` with the exact resource name, for example, `EXPORTER_PARALLELISM_databricks_notebook=10` sets the number of Goroutines for `databricks_notebook` resource to `10`). There is a shared channel (with name `default`) for handling resources for which there are no dedicated channels - use `EXPORTER_PARALLELISM_default` to increase its size (default size is `15`). Defaults for some resources are defined by the `goroutinesNumber` map in `exporter/context.go` or equal to `2` if there is no value. *Don't increase default values too much to avoid REST API throttling!*
 * `EXPORTER_DEFAULT_HANDLER_CHANNEL_SIZE` is the size of the shared channel (default: `200000`). You may need to increase it if you have a huge workspace.
 
 ## Support Matrix
```
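As a final aside (not part of the diff above), these knobs are plain environment variables, so tuning them for a large workspace might look roughly like the sketch below; the numbers and the listed services are arbitrary illustrations, not recommendations from the commit:

```sh
# Hypothetical tuning for a big workspace; raise values with care to avoid
# REST API throttling, as the guide warns
export EXPORTER_WS_LIST_PARALLELISM=10
export EXPORTER_DIRECTORIES_CHANNEL_SIZE=500000
export EXPORTER_PARALLELISM_databricks_notebook=10
export EXPORTER_PARALLELISM_default=20
./terraform-provider-databricks exporter -skip-interactive -listing jobs,notebooks -directory output
```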
