
Commit a0ad236

Merge branch 'master' of https://github.com/MicrosoftDocs/azure-docs-pr into rolyon-landing-storsimple

2 parents: bf8aa14 + 71cc3db

File tree

12 files changed: +218 −337 lines

.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
@@ -16880,6 +16880,11 @@
       "redirect_url": "/azure/storage/blobs/data-lake-storage-upgrade",
       "redirect_document_id": false
     },
+    {
+      "source_path": "articles/storage/blobs/data-lake-storage-upgrade.md",
+      "redirect_url": "/azure/storage/blobs/data-lake-storage-migrate-gen1-to-gen2",
+      "redirect_document_id": false
+    },
     {
       "source_path": "articles/storage/blobs/data-lake-storage-integrate-with-azure-services.md",
       "redirect_url": "/azure/storage/blobs/data-lake-storage-supported-azure-services",

articles/databox/data-box-disk-security.md

Lines changed: 4 additions & 5 deletions
@@ -29,7 +29,7 @@ The following diagram indicates the flow of data through the Azure Data Box Disk

 ## Security features

-Data Box Disk provides a secure solution for data protection by ensuring that only authorized entities can view, modify, or delete your data. The security features for this solution are for the disk and for the associated service ensuring the security of the data stored on them.
+Data Box Disk provides a secure solution for data protection by ensuring that only authorized entities can view, modify, or delete your data. The security features for this solution cover both the disk and the associated service, ensuring the security of the data stored on them.

 ### Data Box Disk protection

@@ -43,18 +43,17 @@ The Data Box Disk is protected by the following features:

 The data that flows in and out of Data Box Disk is protected by the following features:

-- BitLocker encryption of data at all times.
+- BitLocker encryption of data at all times.
 - Secure erasure of data from disk once data upload to Azure is complete. Data erasure is in accordance with NIST 800-88r1 standards.

 ### Data Box service protection

 The Data Box service is protected by the following features.

 - Access to the Data Box Disk service requires that your organization has an Azure subscription that includes Data Box Disk. Your subscription governs the features that you can access in the Azure portal.
-- Because the Data Box service is hosted in Azure, it is protected by the Azure security features. For more information about the security features provided by Microsoft Azure, go to the [Microsoft Azure Trust Center](https://www.microsoft.com/TrustCenter/Security/default.aspx).
+- Because the Data Box service is hosted in Azure, it is protected by the Azure security features. For more information about the security features provided by Microsoft Azure, go to the [Microsoft Azure Trust Center](https://www.microsoft.com/TrustCenter/Security/default.aspx).
 - The disk passkey that is used to unlock the disk is stored in the service.
-- The Data Box Disk service stores order details and status in the service. This information is deleted when the order is deleted.
-
+- The Data Box Disk service stores order details and status in the service. This information is deleted when the order is deleted.

 ## Managing personal data

Binary file changed (−3.48 KB)

articles/sql-database/sql-database-serverless.md

Lines changed: 5 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.topic: conceptual
 author: oslake
 ms.author: moslake
 ms.reviewer: sstein, carlrab
-ms.date: 12/03/2019
+ms.date: 3/11/2020
 ---
 # Azure SQL Database serverless

@@ -143,6 +143,10 @@ If a serverless database is paused, then the first login will resume the database.

 The latency to autoresume and autopause a serverless database is generally on the order of 1 minute to autoresume and 1-10 minutes to autopause.

+### Customer managed transparent data encryption (BYOK)
+
+If you use [customer managed transparent data encryption](transparent-data-encryption-byok-azure-sql.md) (BYOK) and the serverless database is auto-paused when key deletion or revocation occurs, the database remains in the auto-paused state. In this case, on the next resume attempt, the database remains paused until its status transitions to inaccessible, typically within approximately 10 minutes. Once the database becomes inaccessible, the recovery process is the same as for provisioned compute databases. If the serverless database is online when key deletion or revocation occurs, the database also becomes inaccessible within approximately 10 minutes, in the same way as with provisioned compute databases.
+
 ## Onboarding into serverless compute tier

 Creating a new database or moving an existing database into a serverless compute tier follows the same pattern as creating a new database in the provisioned compute tier and involves the following two steps.
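For context on the auto-pause and resume behavior that the new BYOK section describes, here is a minimal sketch of checking a serverless database's status, assuming the `azure-identity` and `azure-mgmt-sql` packages; the subscription and resource names are placeholders:

```python
# Minimal sketch: read a serverless database's current status
# (for example "Online", "Paused", or the inaccessible state described above).
# Requires azure-identity and azure-mgmt-sql; all names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = SqlManagementClient(DefaultAzureCredential(), subscription_id)

db = client.databases.get(
    resource_group_name="my-resource-group",  # placeholder
    server_name="my-sql-server",              # placeholder
    database_name="my-serverless-db",         # placeholder
)
print(db.status)
```

A management-plane status check like this doesn't log in to the database itself, so it shouldn't trigger an auto-resume the way the first login does.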

articles/storage/blobs/TOC.yml

Lines changed: 2 additions & 2 deletions
@@ -692,8 +692,8 @@
       href: ../common/storage-account-create.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json
     - name: Migrate
       items:
-        - name: Upgrade from Data Lake Storage Gen1
-          href: ../blobs/data-lake-storage-upgrade.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json
+        - name: Migrate from Data Lake Storage Gen1
+          href: ../blobs/data-lake-storage-migrate-gen1-to-gen2.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json
         - name: Migrate an on-premises HDFS store
           href: ../blobs/data-lake-storage-migrate-on-premises-HDFS-cluster.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json
     - name: Secure
articles/storage/blobs/data-lake-storage-migrate-gen1-to-gen2.md (new file)

Lines changed: 201 additions & 0 deletions

@@ -0,0 +1,201 @@
---
title: Migrate Azure Data Lake Storage from Gen1 to Gen2
description: Migrate Azure Data Lake Storage from Gen1 to Gen2.
author: normesta
ms.topic: conceptual
ms.author: normesta
ms.date: 03/11/2020
ms.service: storage
ms.reviewer: rukmani-msft
ms.subservice: data-lake-storage-gen2
---

# Migrate Azure Data Lake Storage from Gen1 to Gen2

You can migrate your data, workloads, and applications from Data Lake Storage Gen1 to Data Lake Storage Gen2.

Azure Data Lake Storage Gen2 is built on [Azure Blob storage](storage-blobs-introduction.md) and provides a set of capabilities dedicated to big data analytics. [Data Lake Storage Gen2](https://azure.microsoft.com/services/storage/data-lake-storage/) combines features from [Azure Data Lake Storage Gen1](https://docs.microsoft.com/azure/data-lake-store/index), such as file system semantics, directory and file level security, and scale, with the low-cost, tiered storage and high availability/disaster recovery capabilities of [Azure Blob storage](storage-blobs-introduction.md).
> [!NOTE]
> For easier reading, this article uses the term *Gen1* to refer to Azure Data Lake Storage Gen1, and the term *Gen2* to refer to Azure Data Lake Storage Gen2.

## Recommended approach

To migrate to Gen2, we recommend the following approach.

:heavy_check_mark: Step 1: Assess readiness

:heavy_check_mark: Step 2: Prepare to migrate

:heavy_check_mark: Step 3: Migrate data and application workloads

:heavy_check_mark: Step 4: Cutover from Gen1 to Gen2

> [!NOTE]
> Gen1 and Gen2 are different services. There is no in-place upgrade experience; migrating requires intentional effort.
### Step 1: Assess readiness

1. Learn about the [Data Lake Storage Gen2 offering](https://azure.microsoft.com/services/storage/data-lake-storage/): its benefits, costs, and general architecture.

2. [Compare the capabilities](#gen1-gen2-feature-comparison) of Gen1 with those of Gen2.

3. Review a list of [known issues](data-lake-storage-known-issues.md) to assess any gaps in functionality.

4. Gen2 supports Blob storage features such as [diagnostic logging](../common/storage-analytics-logging.md), [access tiers](storage-blob-storage-tiers.md), and [Blob storage lifecycle management policies](storage-lifecycle-management-concepts.md). If you're interested in using any of these features, review the [current level of support](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-multi-protocol-access?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#blob-storage-feature-support).

5. Review the current state of [Azure ecosystem support](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-multi-protocol-access?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#azure-ecosystem-support) to ensure that Gen2 supports any services that your solutions depend on.
### Step 2: Prepare to migrate

1. Identify the data sets that you'll migrate.

   Take this opportunity to clean up data sets that you no longer use. Unless you plan to migrate all of your data at one time, identify logical groups of data that you can migrate in phases.

2. Determine the impact that a migration will have on your business.

   For example, consider whether you can afford any downtime while the migration takes place. These considerations can help you to identify a suitable migration pattern, and to choose the most appropriate tools.

3. Create a migration plan.

   We recommend these [migration patterns](#migration-patterns). You can choose one of these patterns, combine them, or design a custom pattern of your own.
### Step 3: Migrate data, workloads, and applications

Migrate data, workloads, and applications by using the pattern that you prefer. We recommend that you validate scenarios incrementally.

1. [Create a storage account](data-lake-storage-quickstart-create-account.md) and enable the hierarchical namespace feature.

2. Migrate your data.

3. Configure [services in your workloads](data-lake-storage-integrate-with-azure-services.md) to point to your Gen2 endpoint.

4. Update applications to use Gen2 APIs. See guides for [.NET](data-lake-storage-directory-file-acl-dotnet.md), [Java](data-lake-storage-directory-file-acl-java.md), [Python](data-lake-storage-directory-file-acl-python.md), [JavaScript](data-lake-storage-directory-file-acl-javascript.md), and [REST](https://docs.microsoft.com/rest/api/storageservices/data-lake-storage-gen2).

5. Update scripts to use Data Lake Storage Gen2 [PowerShell cmdlets](data-lake-storage-directory-file-acl-powershell.md) and [Azure CLI commands](data-lake-storage-directory-file-acl-cli.md).

6. Search for URI references that contain the string `adl://` in code files, Databricks notebooks, Apache Hive HQL files, or any other files used as part of your workloads. Replace these references with the [Gen2 formatted URI](data-lake-storage-introduction-abfs-uri.md) of your new storage account. For example, the Gen1 URI `adl://mydatalakestore.azuredatalakestore.net/mydirectory/myfile` might become `abfss://mycontainer@mystorageaccount.dfs.core.windows.net/mydirectory/myfile`. (A rough search-and-replace sketch follows this list.)

7. Configure the security on your account to include [role-based access control (RBAC) roles](../common/storage-auth-aad-rbac-portal.md), [file and folder level security](data-lake-storage-access-control.md), and [Azure Storage firewalls and virtual networks](../common/storage-network-security.md).
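As referenced in step 6, here is a rough sketch of that search-and-replace, assuming plain-text workload files and the hypothetical container and account names from the example above; review every rewrite before committing it:

```python
# Rough sketch for step 6: rewrite Gen1 adl:// URIs to Gen2 abfss:// URIs.
# Container and account names are hypothetical; adjust the file globs to
# match your own code base, and review every change before applying it.
import re
from pathlib import Path

GEN1_URI = re.compile(
    r"adl://(?P<store>[^./\s]+)\.azuredatalakestore\.net(?P<path>/[^\s'\"]*)?"
)

def to_gen2(match: re.Match, container: str, account: str) -> str:
    path = match.group("path") or ""
    return f"abfss://{container}@{account}.dfs.core.windows.net{path}"

def rewrite_file(file_path: Path, container: str, account: str) -> None:
    text = file_path.read_text()
    new_text = GEN1_URI.sub(lambda m: to_gen2(m, container, account), text)
    if new_text != text:
        file_path.write_text(new_text)
        print(f"updated {file_path}")

# Scan HQL files as an example; extend to notebooks, .py, .scala, and so on.
for candidate in Path(".").rglob("*.hql"):
    rewrite_file(candidate, container="mycontainer", account="mystorageaccount")
```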
### Step 4: Cutover from Gen1 to Gen2

After you're confident that your applications and workloads are stable on Gen2, you can begin using Gen2 to satisfy your business scenarios. Turn off any remaining pipelines that are running on Gen1 and decommission your Gen1 account.

<a id="gen1-gen2-feature-comparison" />

## Gen1 vs Gen2 capabilities

This table compares the capabilities of Gen1 to those of Gen2.
|Area |Gen1 |Gen2 |
|---|---|---|
|Data organization|[Hierarchical namespace](data-lake-storage-namespace.md)<br>File and folder support|[Hierarchical namespace](data-lake-storage-namespace.md)<br>Container, file, and folder support |
|Geo-redundancy| [LRS](../common/storage-redundancy.md#locally-redundant-storage)| [LRS](../common/storage-redundancy.md#locally-redundant-storage), [ZRS](../common/storage-redundancy.md#zone-redundant-storage), [GRS](../common/storage-redundancy.md#geo-redundant-storage), [RA-GRS](../common/storage-redundancy.md#read-access-to-data-in-the-secondary-region) |
|Authentication|[AAD managed identity](../../active-directory/managed-identities-azure-resources/overview.md)<br>[Service principals](../../active-directory/develop/app-objects-and-service-principals.md)|[AAD managed identity](../../active-directory/managed-identities-azure-resources/overview.md)<br>[Service principals](../../active-directory/develop/app-objects-and-service-principals.md)<br>[Shared Access Key](https://docs.microsoft.com/rest/api/storageservices/authorize-with-shared-key)|
|Authorization|Management – [RBAC](../../role-based-access-control/overview.md)<br>Data – [ACLs](data-lake-storage-access-control.md)|Management – [RBAC](../../role-based-access-control/overview.md)<br>Data – [ACLs](data-lake-storage-access-control.md), [RBAC](../../role-based-access-control/overview.md) |
|Encryption – Data at rest|Server side – with [service managed](../common/storage-service-encryption.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#microsoft-managed-keys) or [customer managed](../common/storage-service-encryption.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#customer-managed-keys-with-azure-key-vault) keys|Server side – with [service managed](../common/storage-service-encryption.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#microsoft-managed-keys) or [customer managed](../common/storage-service-encryption.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#customer-managed-keys-with-azure-key-vault) keys|
|VNET support|[VNET integration](../../data-lake-store/data-lake-store-network-security.md)|[Service endpoints](../common/storage-network-security.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json), [Private endpoints (public preview)](../common/storage-private-endpoints.md)|
|Developer experience|[REST](../../data-lake-store/data-lake-store-data-operations-rest-api.md), [.NET](../../data-lake-store/data-lake-store-data-operations-net-sdk.md), [Java](../../data-lake-store/data-lake-store-get-started-java-sdk.md), [Python](../../data-lake-store/data-lake-store-data-operations-python.md), [PowerShell](../../data-lake-store/data-lake-store-get-started-powershell.md), [Azure CLI](../../data-lake-store/data-lake-store-get-started-cli-2.0.md)|[REST](https://docs.microsoft.com/rest/api/storageservices/data-lake-storage-gen2), [.NET](data-lake-storage-directory-file-acl-dotnet.md), [Java](data-lake-storage-directory-file-acl-java.md), [Python](data-lake-storage-directory-file-acl-python.md), [JavaScript](data-lake-storage-directory-file-acl-javascript.md), [PowerShell](data-lake-storage-directory-file-acl-powershell.md), [Azure CLI](data-lake-storage-directory-file-acl-cli.md) (in public preview)|
|Diagnostic logs|Classic logs<br>[Azure Monitor integrated](../../data-lake-store/data-lake-store-diagnostic-logs.md)|[Classic logs](../common/storage-analytics-logging.md) (in public preview)<br>Azure Monitor integration – timeline TBD|
|Ecosystem|[HDInsight (3.6)](../../data-lake-store/data-lake-store-hdinsight-hadoop-use-portal.md), [Azure Databricks (3.1 and above)](https://docs.databricks.com/data/data-sources/azure/azure-datalake.html), [SQL DW](https://docs.microsoft.com/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store), [ADF](../../data-factory/load-azure-data-lake-store.md)|[HDInsight (3.6, 4.0)](../../hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md), [Azure Databricks (5.1 and above)](https://docs.microsoft.com/azure/databricks/data/data-sources/azure/azure-datalake-gen2), [SQL DW](../../sql-database/sql-database-vnet-service-endpoint-rule-overview.md), [ADF](../../data-factory/load-azure-data-lake-storage-gen2.md)|
<a id="migration-patterns" />

## Gen1 to Gen2 patterns

Choose a migration pattern, and then modify that pattern as needed.

|Pattern|Details|
|---|---|
|**Lift and shift**|The simplest pattern. Ideal if your data pipelines can afford downtime.|
|**Incremental copy**|Similar to *lift and shift*, but with less downtime. Ideal for large amounts of data that take longer to copy.|
|**Dual pipeline**|Ideal for pipelines that can't afford any downtime.|
|**Bidirectional sync**|Similar to *dual pipeline*, but with a more phased approach that is suited for more complicated pipelines.|

Let's take a closer look at each pattern.
### Lift and shift pattern

This is the simplest pattern.

1. Stop all writes to Gen1.

2. Move data from Gen1 to Gen2. We recommend [Azure Data Factory](https://docs.microsoft.com/azure/data-factory/connector-azure-data-lake-storage). ACLs copy with the data.

3. Point ingest operations and workloads to Gen2.

4. Decommission Gen1.

![Lift and shift pattern](./media/data-lake-storage-migrate-gen1-to-gen2/lift-and-shift.png)

#### Considerations for using the lift and shift pattern

:heavy_check_mark: Cutover from Gen1 to Gen2 for all workloads at the same time.

:heavy_check_mark: Expect downtime during the migration and the cutover period.

:heavy_check_mark: Ideal for pipelines that can afford downtime and where all apps can be upgraded at one time.
### Incremental copy pattern

1. Start moving data from Gen1 to Gen2. We recommend [Azure Data Factory](https://docs.microsoft.com/azure/data-factory/connector-azure-data-lake-storage). ACLs copy with the data.

2. Incrementally copy new data from Gen1.

3. After all data is copied, stop all writes to Gen1, and point workloads to Gen2.

4. Decommission Gen1.

![Incremental copy pattern](./media/data-lake-storage-migrate-gen1-to-gen2/incremental-copy.png)

#### Considerations for using the incremental copy pattern

:heavy_check_mark: Cutover from Gen1 to Gen2 for all workloads at the same time.

:heavy_check_mark: Expect downtime during the cutover period only.

:heavy_check_mark: Ideal for pipelines where all apps are upgraded at one time, but the data copy requires more time.
### Dual pipeline pattern

1. Move data from Gen1 to Gen2. We recommend [Azure Data Factory](https://docs.microsoft.com/azure/data-factory/connector-azure-data-lake-storage). ACLs copy with the data.

2. Ingest new data to both Gen1 and Gen2.

3. Point workloads to Gen2.

4. Stop all writes to Gen1 and then decommission Gen1.

![Dual pipeline pattern](./media/data-lake-storage-migrate-gen1-to-gen2/dual-pipeline.png)

#### Considerations for using the dual pipeline pattern

:heavy_check_mark: Gen1 and Gen2 pipelines run side-by-side.

:heavy_check_mark: Supports zero downtime.

:heavy_check_mark: Ideal in situations where your workloads and applications can't afford any downtime, and you can ingest into both storage accounts.
### Bidirectional sync pattern

1. Set up bidirectional replication between Gen1 and Gen2. We recommend [WanDisco](https://docs.wandisco.com/bigdata/wdfusion/adls/). It offers a repair feature for existing data.

2. When all moves are complete, stop all writes to Gen1 and turn off bidirectional replication.

3. Decommission Gen1.

![Bidirectional sync pattern](./media/data-lake-storage-migrate-gen1-to-gen2/bidirectional-sync.png)

#### Considerations for using the bidirectional sync pattern

:heavy_check_mark: Ideal for complex scenarios that involve a large number of pipelines and dependencies where a phased approach might make more sense.

:heavy_check_mark: Migration effort is high, but it provides side-by-side support for Gen1 and Gen2.
## Next steps

- Learn about the various parts of setting up security for a storage account. See the [Azure Storage security guide](../common/storage-security-guide.md).
- Optimize the performance of your Data Lake Store. See [Optimize Azure Data Lake Storage Gen2 for performance](data-lake-storage-performance-tuning-guidance.md).
- Review the best practices for managing your Data Lake Store. See [Best practices for using Azure Data Lake Storage Gen2](data-lake-storage-best-practices.md).
