[Exporter] Correctly handle account-level identities when generating the code (#4650)
## Changes
When a Unity Catalog metastore is attached to a workspace, Identity Federation is enabled on it. With Identity Federation, users, service principals, and groups come from the account level via assignment to a workspace. However, it is still possible to create workspace-level groups via the API, and the `databricks_group` resource uses it and always creates workspace-level groups. As a result, we shouldn't generate resources for account-level groups because they would be turned into workspace-level groups. Due to API limitations, we can't use `databricks_permission_assignment` on the workspace level to emulate the assignment.

See further explanation in the docs.

## Tests
- [x] `make test` run locally
- [x] relevant change in `docs/` folder
- [ ] covered with integration tests in `internal/acceptance`
- [ ] using Go SDK
- [ ] using TF Plugin Framework

-> This tooling is experimental and provided as is. It has an evolving interface, which may change or be removed in future provider versions.

-> Use the same user who did the exporting to import the exported templates. Otherwise, it could cause changes in the ownership of the objects.

Generates `*.tf` files for Databricks resources together with `import.sh`, which is used to import objects into the Terraform state. Available as part of the provider binary. The only way to authenticate is through [environment variables](../index.md#Environment-variables). It's best used when you need to quickly export the Terraform configuration of an existing Databricks workspace. After generating the configuration, we strongly recommend manually reviewing all created files.

## Example Usage
After downloading the [latest released binary](https://github.com/databricks/terraform-provider-databricks/releases), unpack it and place it in the same folder. You may have already downloaded this binary - check the `.terraform` folder of any state directory where you've used the `databricks` provider. It could also be in your plugin cache `~/.terraform.d/plugins/registry.terraform.io/databricks/databricks/*/*/terraform-provider-databricks`.
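
For instance, a minimal interactive run could look like the following sketch. The `exporter` sub-command name, the placeholder host and token, and the output directory are assumptions; authentication always comes from [environment variables](../index.md#Environment-variables).

```sh
# A sketch of an interactive run; host and token values are placeholders.
export DATABRICKS_HOST=https://my-workspace.cloud.databricks.com
export DATABRICKS_TOKEN=dapi...
./terraform-provider-databricks exporter -directory output
```
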
-> Please note that in the interactive mode, the selected services are passed as the `-listing` option, not as the `-services` option (see below).

Exporter can also be used in a non-interactive mode that allows a more granular selection of services and dependencies. For example, the following command will list all resources related to `jobs` and `compute` services and import them with their dependencies from `groups,secrets,access,compute,users,jobs,storage` services.
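
A hedged sketch of such an invocation is shown below; the `-skip-interactive` flag used to select the non-interactive mode and the output directory name are assumptions, while the `-listing` and `-services` values come from the description above.

```sh
# Sketch of a non-interactive export of jobs/compute and their dependencies.
export DATABRICKS_HOST=https://my-workspace.cloud.databricks.com
export DATABRICKS_TOKEN=dapi...
./terraform-provider-databricks exporter -skip-interactive \
  -listing jobs,compute \
  -services groups,secrets,access,compute,users,jobs,storage \
  -directory output
```
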
The exporter is also supported on the account level for resources that can be defined at the account level. For example, we can export everything defined at the account level:
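
A sketch of such a run, assuming account-level authentication through environment variables and the `-skip-interactive` flag for the non-interactive mode; the host, account ID, credentials, and directory below are placeholders.

```sh
# Sketch of an account-level export; all values below are placeholders.
export DATABRICKS_HOST=https://accounts.cloud.databricks.com
export DATABRICKS_ACCOUNT_ID=<account-id>
export DATABRICKS_CLIENT_ID=<service-principal-client-id>
export DATABRICKS_CLIENT_SECRET=<service-principal-secret>
./terraform-provider-databricks exporter -skip-interactive -directory account-output
```
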
!> This tooling was only extensively tested with administrator privileges.

All arguments are optional, and they tune what code is being generated; a combined example is sketched after the list of options below.

* `-services` - Comma-separated list of services to import. By default, all services are imported.
* `-match` - Match resource names during the listing operation. This filter applies to all resources that are getting listed, so if you want to import all dependencies of just one cluster, specify `-match=autoscaling -listing=compute`. By default, it is empty, which matches everything.
* `-matchRegex` - Match resource names against a given regex during the listing operation. Applicable to all resources selected for listing.
* `-excludeRegex` - Exclude resource names matching a given regex. Applied during the listing operation and has higher priority than `-match` and `-matchRegex`. Applicable to all resources selected for listing. Could be used to exclude things like `databricks_automl` notebooks, etc.
* `-filterDirectoriesDuringWorkspaceWalking` - whether to apply the match logic to directory names when walking the workspace tree. *Note: be careful with this option, as it is applied to all entries, so if you want to filter only specific users, you will also need to specify a condition for `/Users`, making the regex `^(/Users|/Users/[a-c].*)$`*.
* `-mounts` - List DBFS mount points, an extremely slow operation that would not trigger unless explicitly specified.
* `-generateProviderDeclaration` - the flag that toggles the generation of the `databricks.tf` file with the declaration of the Databricks Terraform provider that is necessary for Terraform versions since Terraform 0.13 (disabled by default).
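
As an illustration of how these options combine, here is a hedged sketch that lists only compute resources whose names contain `shared` and skips anything matching an AutoML-style pattern. All flag values are examples, and the `-skip-interactive` flag is an assumption.

```sh
# Illustrative combination of listing and filtering options.
./terraform-provider-databricks exporter -skip-interactive \
  -listing compute \
  -services compute,access \
  -match shared \
  -excludeRegex 'automl' \
  -directory output
```
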
### Use of `-listing` and `-services` for granular resources selection
The `-listing` option is used to discover resources to export; if it's not specified, then all services are listed (if they have the `List` operation implemented). The `-services` option restricts the export to only those resources whose service type is in the list specified by this option.

For example, suppose we have a job comprising two notebooks and one SQL dashboard, and its tasks have Python libraries on DBFS attached. If we just specify `-listing jobs`, then it will export the following resources:

The rest of the values, like SQL object IDs, etc. will be hard-coded and not portable between workspaces.
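
A hedged sketch of the two invocations described above; the `-skip-interactive` flag and the directory names are assumptions.

```sh
# Export the job together with all of its dependencies.
./terraform-provider-databricks exporter -skip-interactive \
  -listing jobs -directory full-export

# Export the same job, but generate code only for notebooks and DBFS files.
./terraform-provider-databricks exporter -skip-interactive \
  -listing jobs -services notebooks,storage -directory partial-export
```
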
You can also use predefined aliases (`all` and `uc`) to specify multiple services at once. For example, if `-listing` has the value `all,-uc`, then we will discover all services except the Unity Catalog services and vector search.

We can also exclude specific services. For example, we can specify `-services` as `all,-uc-tables`, and then we won't generate code for `databricks_sql_table`.

### Migration between workspaces with identity federation enabled
When a Unity Catalog metastore is attached to a workspace, Identity Federation is enabled on it. With Identity Federation, users, service principals, and groups come from the account level via assignment to a workspace. However, it is still possible to create workspace-level groups via the API, and the `databricks_group` resource uses it and always creates workspace-level groups. As a result, we shouldn't generate resources for account-level groups, because they would be turned into workspace-level groups. Due to API limitations, we can't use `databricks_permission_assignment` on the workspace level to emulate the assignment.

So migration of resources between two workspaces with Identity Federation enabled should be done in a few steps:
1. On the account level export `databricks_mws_permission_assignment` resources for your source workspace:
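
For example, a hedged sketch of this step; the account console host, the `exporter` sub-command, and the `-skip-interactive` flag are assumptions, and `<account-id>` / `<source-workspace-id>` are placeholders.

```sh
# Sketch: export permission assignments of the source workspace only.
export DATABRICKS_HOST=https://accounts.cloud.databricks.com
export DATABRICKS_ACCOUNT_ID=<account-id>
./terraform-provider-databricks exporter -skip-interactive \
  -listing idfed -services idfed \
  -matchRegex '^<source-workspace-id>$' \
  -directory idfed-export
```
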
2. Replace the source workspace ID with the destination workspace ID in the generated `idfed.tf` file, e.g., with `sed`:

```sh
sed -ibak -e 's|workspace_id = <source-workspace-id>|workspace_id = <destination-workspace-id>|' idfed.tf
```
and then run `terraform apply` on the account level to assign users, service principals, and groups to the destination workspace.

3. Export resources from the source workspace using the exporter on the workspace level. It will automatically detect that Identity Federation is enabled and export account-level objects as data sources instead of resources.
4. Apply the exported code against the destination workspace.
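
A hedged sketch of this final step, assuming the exported configuration is in a local directory and the destination workspace credentials are supplied via environment variables:

```sh
# Point authentication at the destination workspace and apply the exported code.
export DATABRICKS_HOST=https://destination-workspace.cloud.databricks.com
export DATABRICKS_TOKEN=dapi...
cd exported-workspace
terraform init
terraform apply
```
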
## Services
Services are just logical groups of resources used for filtering and organization in files written in `-directory`. All resources are globally sorted by their resource name, which allows you to use generated files for compliance purposes. Nevertheless, managing the entire Databricks workspace with Terraform is the preferred way, except for notebooks and possibly libraries, which may have their own CI/CD processes.

Services could be specified in combination with predefined aliases (`all` - for all services and listings, `uc` - for all UC services, including the vector search). The service could be specified as the service name, or it could have `-` prepended to the service, to exclude it from the list (including `-uc` to exclude all UC-related services).

-> Please note that for services not marked with **listing**, we'll export resources only if they are referenced from other resources.

* `access` - **listing** [databricks_permissions](../resources/permissions.md), [databricks_instance_profile](../resources/instance_profile.md), [databricks_ip_access_list](../resources/ip_access_list.md), and [databricks_access_control_rule_set](../resources/access_control_rule_set.md). *Please note that for `databricks_permissions` we list only `authorization = "tokens"`; the permissions for other objects (notebooks, ...) are emitted when the corresponding objects are processed!*
* `directories` - **listing** [databricks_directory](../resources/directory.md). *Please note that directories aren't listed when running in the incremental mode! Only directories with updated notebooks will be emitted.*
* `groups` - **listing** [databricks_group](../data-sources/group.md) with [membership](../resources/group_member.md) and [data access](../resources/group_instance_profile.md). If Identity Federation is enabled on the workspace (when a UC metastore is attached), then account-level groups are exposed as data sources because they are defined on the account level, and only workspace-level groups are exposed as resources. See the note above on how to perform migration between workspaces with Identity Federation enabled.
* `idfed` - **listing** [databricks_mws_permission_assignment](../resources/mws_permission_assignment.md). When listing, assignments can be filtered to specific workspace IDs with the `-match`, `-matchRegex`, and `-excludeRegex` options. For example, to export assignments only for two workspaces, use `-matchRegex '^1688808130562317|5493220389262917$'`.
* `jobs` - **listing** [databricks_job](../resources/job.md). Usually, there are more automated workflows than interactive clusters, so they get their own file in this tool's output. *Please note that workflows deployed and maintained via [Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html) aren't exported!*
* `repos` - **listing** [databricks_repo](../resources/repo.md) (both classical Repos in `/Repos` and Git Folders in arbitrary locations).
* `secrets` - **listing** [databricks_secret_scope](../resources/secret_scope.md) along with [keys](../resources/secret.md) and [ACLs](../resources/secret_acl.md).
* `sql-dashboards` - **listing** Legacy [databricks_sql_dashboard](../resources/sql_dashboard.md) along with associated [databricks_sql_widget](../resources/sql_widget.md) and [databricks_sql_visualization](../resources/sql_visualization.md).
* `uc-system-schemas` - **listing** exports [databricks_system_schema](../resources/system_schema.md) resources for the UC metastore of the current workspace.
* `uc-tables` - **listing** (*we can't list directly, only via dependencies to the top-level object*) [databricks_sql_table](../resources/sql_table.md) resource.
* `uc-volumes` - **listing** (*we can't list directly, only via dependencies to the top-level object*) [databricks_volume](../resources/volume.md).
* `users` - **listing** [databricks_user](../resources/user.md) and [databricks_service_principal](../resources/service_principal.md) are written to their own file, simply because of their amount. If Identity Federation is enabled on the workspace (when a UC metastore is attached), then users and service principals are exposed as data sources because they are defined on the account level. See the note above on how to perform migration between workspaces with Identity Federation enabled.
* `vector-search` - **listing** exports [databricks_vector_search_endpoint](../resources/vector_search_endpoint.md) and [databricks_vector_search_index](../resources/vector_search_index.md).
* `wsconf` - **listing** exports workspace-level configuration: [databricks_workspace_conf](../resources/workspace_conf.md), [databricks_sql_global_config](../resources/sql_global_config.md), and [databricks_global_init_script](../resources/global_init_script.md).

To speed up export, Terraform Exporter performs many operations, such as listing, in parallel. The parallelism can be tuned with the following environment variables:

* `EXPORTER_WS_LIST_PARALLELISM` (default: `5`) controls how many Goroutines are used to perform parallel listing of Databricks workspace objects (notebooks, directories, workspace files, ...).
* `EXPORTER_DIRECTORIES_CHANNEL_SIZE` (default: `300000`) controls the channel's capacity when listing workspace objects. Please ensure that this value is big enough (greater than the number of directories in the workspace; the default value should be OK for most cases); otherwise, there is a chance of deadlock.
* `EXPORTER_DEDICATED_RESOUSE_CHANNELS` - by default, only specific resources (`databricks_user`, `databricks_service_principal`, `databricks_group`) have dedicated channels - the rest are handled by the shared channel. This is done to prevent throttling by specific APIs. You can override this by providing a comma-separated list of resources as this environment variable.
* `EXPORTER_PARALLELISM_NNN` - number of Goroutines used to process resources of a specific type (replace `NNN` with the exact resource name, for example, `EXPORTER_PARALLELISM_databricks_notebook=10` sets the number of Goroutines for the `databricks_notebook` resource to `10`). There is a shared channel (with the name `default`) for handling resources for which there are no dedicated channels - use `EXPORTER_PARALLELISM_default` to increase its size (default size is `15`). Defaults for some resources are defined by the `goroutinesNumber` map in `exporter/context.go` or equal to `2` if there is no value. *Don't increase default values too much to avoid REST API throttling!*
* `EXPORTER_DEFAULT_HANDLER_CHANNEL_SIZE` is the size of the shared channel (default: `200000`). You may need to increase it if you have a huge workspace.
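
For example, a hedged sketch of tuning these knobs before a run; the numbers and flags below are purely illustrative, not recommendations.

```sh
# Illustrative tuning; these values are examples only.
export EXPORTER_WS_LIST_PARALLELISM=10
export EXPORTER_PARALLELISM_default=20
export EXPORTER_PARALLELISM_databricks_notebook=10
./terraform-provider-databricks exporter -skip-interactive -listing notebooks,directories -directory output
```
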