
Commit 20dc1ff

JasonWHowellSyntaxC4
authored and committed
Fixing numbered list formatting
1 parent 035392b commit 20dc1ff

File tree: 1 file changed, +20 -18 lines

articles/azure-databricks/howto-regional-disaster-recovery.md

Lines changed: 20 additions & 18 deletions
@@ -22,7 +22,7 @@ The Databricks control plane manages and monitors the Databricks workspace envir
![Databricks control plane architecture](media/howto-regional-disaster-recovery/databricks-control-plane.png)

One of the advantages of this architecture is that users can connect Azure Databricks to any storage resource in their account. A key benefit is that both compute (Azure Databricks) and storage can be scaled independently of each other.

## How to create a regional disaster recovery topology

As the preceding architecture description shows, a number of components make up a Big Data pipeline with Azure Databricks: Azure Storage, Azure Database, and other data sources. Azure Databricks is the *compute* for the Big Data pipeline. It is *ephemeral* in nature, meaning that while your data is still available in Azure Storage, the *compute* (the Azure Databricks cluster) can be terminated so that you don't have to pay for compute when you don't need it. The *compute* (Azure Databricks) and storage sources must be in the same region so that jobs don't experience high latency.
@@ -50,8 +50,8 @@ To create your own regional disaster recovery topology, follow these requirement
> [!NOTE]
> Any Python scripts provided in this article are expected to work with Python 2.7 and later, but not with Python 3.x.

-**2. Configure two profiles.**
+2. **Configure two profiles.**

Configure one profile for the primary workspace, and another for the secondary workspace:

```bash
# ...
```
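The configure commands for each profile fall between the diff hunks shown here, but their end result is two named profiles stored in `~/.databrickscfg`. A sketch of what that file plausibly looks like after setup (both hosts and tokens are placeholders, not values from this commit):

```ini
[primary]
host = https://<primary-workspace-region>.azuredatabricks.net
token = <personal-access-token-for-primary>

[secondary]
host = https://<secondary-workspace-region>.azuredatabricks.net
token = <personal-access-token-for-secondary>
```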

The code blocks in this article switch between profiles in each subsequent step using the corresponding workspace command. Be sure that the names of the profiles you create are substituted into each code block.

```python
EXPORT_PROFILE = "primary"
IMPORT_PROFILE = "secondary"
```

You can manually switch at the command line if needed:

```bash
databricks workspace ls --profile primary
databricks workspace ls --profile secondary
```

-**3. Migrate Azure Active Directory users**
+3. **Migrate Azure Active Directory users**

Manually add the same Azure Active Directory users to the secondary workspace that exist in the primary workspace.

-**4. Migrate the user folders and notebooks**
+4. **Migrate the user folders and notebooks**

Use the following Python code to migrate the sandboxed user environments, which include the nested folder structure and notebooks per user.

@@ -86,23 +88,23 @@ To create your own regional disaster recovery topology, follow these requirement
```python
from subprocess import call, check_output

EXPORT_PROFILE = "primary"
IMPORT_PROFILE = "secondary"

# Get a list of all users
user_list_out = check_output(["databricks", "workspace", "ls", "/Users", "--profile", EXPORT_PROFILE])
user_list = user_list_out.splitlines()

# Export sandboxed environment (folders, notebooks) for each user and import into new workspace.
# Libraries are not included with these APIs / commands.

for user in user_list:
    print "Trying to migrate workspace for user " + user

    call("mkdir -p " + user, shell=True)
    export_exit_status = call("databricks workspace export_dir /Users/" + user + " ./" + user + " --profile " + EXPORT_PROFILE, shell=True)

    if export_exit_status == 0:
        print "Export Success"
        import_exit_status = call("databricks workspace import_dir ./" + user + " /Users/" + user + " --profile " + IMPORT_PROFILE, shell=True)
        if import_exit_status == 0:
            print "Import Success"
        else:
            print "Import Failure"
    else:
        print "Export Failure"

print "All done"
```

-**5. Migrate the cluster configurations**
+5. **Migrate the cluster configurations**

Once notebooks have been migrated, you can optionally migrate the cluster configurations to the new workspace. This step is almost fully automated with databricks-cli, unless you want to migrate only selected cluster configurations rather than all of them.

@@ -169,7 +171,7 @@ To create your own regional disaster recovery topology, follow these requirement
```python
# ...
print "All done"
```

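The body of the cluster-migration script is elided between the diff hunks above; only its closing `print "All done"` is shown. One building block such a script needs is stripping the server-assigned fields out of each config returned by `databricks clusters list` before the config can be re-created in the secondary workspace. A minimal sketch of that cleanup step, where the exact field names to drop are an assumption for illustration, not taken from the original script:

```python
import json

# Fields assumed to be server-assigned; they must not be sent back when
# re-creating the cluster (hypothetical list, adapt to the actual API response).
READONLY_FIELDS = {"cluster_id", "state", "state_message",
                   "start_time", "terminated_time", "default_tags"}

def strip_readonly_fields(cluster_config):
    """Return a copy of a cluster config dict without server-assigned fields."""
    return {k: v for k, v in cluster_config.items() if k not in READONLY_FIELDS}

# Example: a trimmed config ready to be re-created in the secondary workspace.
source = {"cluster_id": "0123-456789-abc123", "cluster_name": "analytics",
          "spark_version": "4.0.x-scala2.11", "num_workers": 4,
          "state": "TERMINATED"}
print(json.dumps(strip_readonly_fields(source), sort_keys=True))
```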
-**6. Migrate the jobs configuration**
+6. **Migrate the jobs configuration**

If you migrated cluster configurations in the previous step, you can opt to migrate job configurations to the new workspace. This step is fully automated with databricks-cli, unless you want to migrate only selected job configurations rather than all jobs.

@@ -233,15 +235,15 @@ To create your own regional disaster recovery topology, follow these requirement
```python
# ...
print "All done"
```

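The jobs script is likewise elided here, apart from its closing line. The core transformation it has to perform is pulling each job's `settings` object out of the `databricks jobs list` output, since only the settings, not the server-assigned `job_id` or timestamps, can be posted to the new workspace. A hedged sketch of that step; the JSON shape shown is an assumption about the `jobs list` response, not text from this commit:

```python
import json

def extract_job_settings(jobs_list_json):
    """Given the JSON text returned by a jobs-list call, return the list of
    settings dicts that can be re-created in another workspace."""
    payload = json.loads(jobs_list_json)
    return [job["settings"] for job in payload.get("jobs", [])]

# Example response shape (abridged, for illustration only).
raw = json.dumps({"jobs": [
    {"job_id": 42, "created_time": 1532717170000,
     "settings": {"name": "nightly-etl", "max_retries": 1}}
]})
for settings in extract_job_settings(raw):
    print(json.dumps(settings))  # each line is a candidate payload for job creation
```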
-**7. Migrate libraries**
+7. **Migrate libraries**

There's currently no straightforward way to migrate libraries from one workspace to another, so this step is mostly manual: reinstall those libraries into the new workspace. It is possible to automate this by using a combination of the [DBFS CLI](https://github.com/databricks/databricks-cli#dbfs-cli-examples) to upload custom libraries to the workspace and the [Libraries CLI](https://github.com/databricks/databricks-cli#libraries-cli).

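The partial automation mentioned above amounts to pairing a DBFS upload with a library-install call per library. A sketch that only builds the argument lists, so the CLI invocations, cluster ID, and DBFS path convention are all assumptions to adapt rather than commands from this commit:

```python
def library_migration_cmds(local_jar, cluster_id, profile="secondary"):
    """Build the two CLI invocations needed to move one custom jar:
    upload it to DBFS, then attach it to a cluster in the new workspace."""
    dbfs_path = "dbfs:/FileStore/jars/" + local_jar.rsplit("/", 1)[-1]
    upload = ["dbfs", "cp", local_jar, dbfs_path, "--profile", profile]
    install = ["databricks", "libraries", "install",
               "--cluster-id", cluster_id, "--jar", dbfs_path,
               "--profile", profile]
    return upload, install

up, inst = library_migration_cmds("./libs/etl-utils.jar", "0123-456789-abc123")
print(" ".join(up))
print(" ".join(inst))
```

Feeding each list to `subprocess.call` (as the user-migration script above does) would execute the pair per library.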
-**8. Migrate Azure blob storage and Azure Data Lake Store mounts**
+8. **Migrate Azure blob storage and Azure Data Lake Store mounts**

Manually remount all [Azure Blob storage](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html) and [Azure Data Lake Store (Gen 1)](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html) mount points using a notebook-based solution. The storage resources were mounted in the primary workspace, and that has to be repeated in the secondary workspace. There is no external API for mounts.

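Since the remount has to be repeated by hand in a notebook, it helps to keep the mount sources in one place. A sketch of building the `wasbs://` URL that a Blob storage mount uses as its source; the account and container names are placeholders, and `dbutils` only exists inside a Databricks notebook, so that call is shown commented out:

```python
def wasbs_source(container, storage_account):
    """Build the wasbs:// URL used as the source of a Blob storage mount."""
    return "wasbs://%s@%s.blob.core.windows.net" % (container, storage_account)

source = wasbs_source("data", "mystorageacct")
print(source)

# Inside a notebook in the secondary workspace (not runnable locally):
# dbutils.fs.mount(
#     source=source,
#     mount_point="/mnt/data",
#     extra_configs={"fs.azure.account.key.mystorageacct.blob.core.windows.net":
#                    dbutils.secrets.get(scope="...", key="...")})
```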
-**9. Migrate cluster init scripts**
+9. **Migrate cluster init scripts**

Any cluster initialization scripts can be migrated from the old workspace to the new one by using the [DBFS CLI](https://github.com/databricks/databricks-cli#dbfs-cli-examples). First, copy the needed scripts from "dbfs:/databricks/init/.." to your local desktop or virtual machine. Next, copy those scripts into the new workspace at the same path.

@@ -253,7 +255,7 @@ To create your own regional disaster recovery topology, follow these requirement
```bash
# ...
dbfs cp -r old-ws-init-scripts dbfs:/databricks/init --profile secondary
```

-**1. Manually reconfigure and reapply access control.**
+10. **Manually reconfigure and reapply access control.**
If your existing primary workspace is configured to use the Premium tier (SKU), it's likely that you're also using the [Access Control feature](https://docs.azuredatabricks.net/administration-guide/admin-settings/index.html#manage-access-control).
