One of the advantages of this architecture is that users can connect Azure Databricks to any storage resource in their account. A key benefit is that both compute (Azure Databricks) and storage can be scaled independently of each other.
## How to create a regional disaster recovery topology
As you notice in the preceding architecture description, there are a number of components used for a Big Data pipeline with Azure Databricks: Azure Storage, Azure Database, and other data sources. Azure Databricks is the *compute* for the Big Data pipeline. It is *ephemeral* in nature, meaning that while your data is still available in Azure Storage, the *compute* (Azure Databricks cluster) can be terminated so that you don’t have to pay for compute when you don’t need it. The *compute* (Azure Databricks) and storage sources must be in the same region so that jobs don’t experience high latency.
To create your own regional disaster recovery topology, follow these requirements:

> [!NOTE]
> The Python scripts provided in this article are expected to work with Python 2.7 and later, but not with Python 3.x.
2. **Configure two profiles.**
Configure one for the primary workspace, and another one for the secondary workspace:
```bash
databricks configure --token --profile primary
databricks configure --token --profile secondary
```
The code blocks in this article switch between profiles in each subsequent step using the corresponding workspace command. Be sure that the names of the profiles you create are substituted into each code block.
```python
EXPORT_PROFILE = "primary"
IMPORT_PROFILE = "secondary"
```
You can manually switch at the command line if needed:
```bash
databricks workspace ls --profile primary
databricks workspace ls --profile secondary
```
3. **Migrate Azure Active Directory users**

Manually add to the secondary workspace the same Azure Active Directory users that exist in the primary workspace.
4. **Migrate the user folders and notebooks**

Use the following Python code to migrate the sandboxed user environments, which include the nested folder structure and notebooks for each user.
```python
from subprocess import call, check_output

EXPORT_PROFILE = "primary"
IMPORT_PROFILE = "secondary"

# Get a list of all users in the primary workspace.
user_list_out = check_output("databricks workspace ls /Users --profile " + EXPORT_PROFILE, shell=True)
user_list = user_list_out.splitlines()

# Export sandboxed environment (folders, notebooks) for each user and import into new workspace.
# Libraries are not included with these APIs / commands.
for user in user_list:
    print "Trying to migrate workspace for user " + user

    call("mkdir -p " + user, shell=True)
    export_exit_status = call("databricks workspace export_dir /Users/" + user + " ./" + user + " --profile " + EXPORT_PROFILE, shell=True)

    if export_exit_status == 0:
        print "Export Success"
        import_exit_status = call("databricks workspace import_dir ./" + user + " /Users/" + user + " --profile " + IMPORT_PROFILE, shell=True)
        if import_exit_status == 0:
            print "Import Success"
        else:
            print "Import Failure"
    else:
        print "Export Failure"

print "All done"
```
5. **Migrate the cluster configurations**

Once notebooks have been migrated, you can optionally migrate the cluster configurations to the new workspace. This step is almost fully automated with databricks-cli, unless you want to migrate selected cluster configurations rather than all of them.
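As a rough sketch of what such automation can look like with databricks-cli (the `clusters list`/`get`/`create` commands are from the CLI's clusters command group; the profile names are the ones defined in step 2, and the assumed listing format is noted in the comments):

```python
import json
from subprocess import call, check_output

EXPORT_PROFILE = "primary"
IMPORT_PROFILE = "secondary"

def get_cluster_cmd(cluster_id, profile):
    # Build the CLI call that returns one cluster's JSON definition.
    return ("databricks clusters get --cluster-id " + cluster_id +
            " --profile " + profile)

def migrate_clusters():
    # "databricks clusters list" prints one cluster per line; the first
    # column is assumed to be the cluster ID.
    listing = check_output("databricks clusters list --profile " +
                           EXPORT_PROFILE, shell=True).decode("utf-8")
    for line in listing.splitlines():
        if not line.strip():
            continue
        cluster_id = line.split()[0]
        spec = json.loads(check_output(get_cluster_cmd(cluster_id, EXPORT_PROFILE),
                                       shell=True).decode("utf-8"))
        # The definition returned by "get" includes runtime-only fields
        # (for example cluster_id) that "create" does not accept; strip them.
        spec.pop("cluster_id", None)
        # Save the definition, then recreate it in the secondary workspace.
        with open(cluster_id + ".json", "w") as f:
            json.dump(spec, f)
        call("databricks clusters create --json-file " + cluster_id +
             ".json --profile " + IMPORT_PROFILE, shell=True)
```

Calling `migrate_clusters()` requires databricks-cli to be installed and both profiles to be configured; filter the listing first if you only want selected clusters.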
6. **Migrate the jobs configuration**

If you migrated cluster configurations in the previous step, you can opt to migrate job configurations to the new workspace. This is a fully automated step with databricks-cli, unless you want to migrate selected job configurations rather than all of them.
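A sketch of that automation, assuming the databricks-cli jobs command group (`jobs list --output JSON`, `jobs get`, `jobs create`) and the profile names from step 2; note that `jobs get` wraps the definition in a `settings` field, and only that payload is valid input for `jobs create`:

```python
import json
from subprocess import call, check_output

EXPORT_PROFILE = "primary"
IMPORT_PROFILE = "secondary"

def job_settings(job_get_output):
    # "databricks jobs get" returns {"job_id": ..., "settings": {...}};
    # only the settings payload is passed to "jobs create".
    return json.loads(job_get_output)["settings"]

def migrate_jobs():
    jobs_out = check_output("databricks jobs list --output JSON --profile " +
                            EXPORT_PROFILE, shell=True).decode("utf-8")
    for job in json.loads(jobs_out).get("jobs", []):
        job_id = str(job["job_id"])
        raw = check_output("databricks jobs get --job-id " + job_id +
                           " --profile " + EXPORT_PROFILE, shell=True).decode("utf-8")
        # Save each job's settings, then recreate it in the secondary workspace.
        fname = job_id + ".json"
        with open(fname, "w") as f:
            json.dump(job_settings(raw), f)
        call("databricks jobs create --json-file " + fname +
             " --profile " + IMPORT_PROFILE, shell=True)
```

As with clusters, run `migrate_jobs()` only after both profiles are configured, and filter the job list if you want a selective migration.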
7. **Migrate libraries**

There's currently no straightforward way to migrate libraries from one workspace to another, so reinstall them in the new workspace manually. It's possible to automate this with a combination of the [DBFS CLI](https://github.com/databricks/databricks-cli#dbfs-cli-examples), to upload custom libraries to the workspace, and the [Libraries CLI](https://github.com/databricks/databricks-cli#libraries-cli).
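For example, a custom JAR can be uploaded and reinstalled roughly like this (the file name, DBFS path, and cluster ID below are placeholders, and the `--profile` value is the one defined in step 2):

```bash
# Upload the custom library into the new workspace's DBFS.
dbfs cp ./my-custom-lib.jar dbfs:/FileStore/jars/my-custom-lib.jar --profile secondary

# Install it on a target cluster in the new workspace.
databricks libraries install --cluster-id <secondary-cluster-id> \
  --jar dbfs:/FileStore/jars/my-custom-lib.jar --profile secondary
```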
8. **Migrate Azure blob storage and Azure Data Lake Store mounts**

Manually remount all [Azure Blob storage](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html) and [Azure Data Lake Store (Gen1)](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html) mount points by using a notebook-based solution. The storage resources were mounted in the primary workspace, and that has to be repeated in the secondary workspace. There is no external API for mounts.
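As an illustration, a Blob storage container can be remounted from a notebook in the secondary workspace along these lines (the account, container, mount point, and secret scope names are placeholders; `dbutils` is only available inside a Databricks notebook):

```python
# Run in a notebook in the secondary workspace; dbutils is provided by Databricks.
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {
    "fs.azure.account.key.<storage-account-name>.blob.core.windows.net":
      dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
```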
9. **Migrate cluster init scripts**

Any cluster initialization scripts can be migrated from the old workspace to the new one by using the [DBFS CLI](https://github.com/databricks/databricks-cli#dbfs-cli-examples). First, copy the needed scripts from `dbfs:/databricks/init/..` to your local desktop or virtual machine. Next, copy those scripts into the new workspace at the same path.
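The two copy steps can be sketched with the DBFS CLI as follows (profile names are the ones from step 2; the local folder name is arbitrary):

```bash
# Copy init scripts from the primary workspace's DBFS to the local machine.
dbfs cp -r dbfs:/databricks/init ./init-scripts-backup --profile primary

# Upload them to the same DBFS path in the secondary workspace.
dbfs cp -r ./init-scripts-backup dbfs:/databricks/init --profile secondary
```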
10. **Manually reconfigure and reapply access control.**

If your existing primary workspace is configured to use the Premium tier (SKU), it's likely that you're also using the [Access Control feature](https://docs.azuredatabricks.net/administration-guide/admin-settings/index.html#manage-access-control).