|
1 | 1 | ---
|
2 |
| -title: Data Lake Storage and WANdisco LiveData Platform for Azure (preview) |
3 |
| -description: Migrate on-premises Hadoop data to Azure Data Lake Storage Gen2 by using WANdisco LiveData Platform for Azure. |
| 2 | +title: Data Lake Storage and WANdisco LiveData Platform for Azure |
| 3 | +description: Learn how to migrate petabytes of on-premises Hadoop data to Azure Data Lake Storage Gen2 file systems without interrupting data operations or requiring downtime. |
4 | 4 | author: normesta
|
5 | 5 | ms.topic: how-to
|
6 | 6 | ms.author: normesta
|
7 | 7 | ms.reviewer: b-pauls
|
8 |
| -ms.date: 11/17/2020 |
| 8 | +ms.date: 10/26/2021 |
9 | 9 | ms.service: storage
|
10 | 10 | ms.custom: references_regions
|
11 | 11 | ms.subservice: data-lake-storage-gen2
|
12 | 12 | ---
|
13 | 13 |
|
14 |
| -# Meet demanding migration requirements with WANdisco LiveData Platform for Azure (preview) |
| 14 | +# Migrate on-premises Hadoop data to Azure Data Lake Storage Gen2 with WANdisco LiveData Platform for Azure |
15 | 15 |
|
16 |
| -Migrate on-premises Hadoop data to Azure Data Lake Storage Gen2 by using [WANdisco LiveData Platform for Azure](https://docs.wandisco.com/live-data-platform/docs/landing/). This platform eliminates the need for application downtime, remove the chance of data loss, and ensure data consistency even while operations continue on-premises. |
| 16 | +[WANdisco LiveData Platform for Azure](https://docs.wandisco.com/live-data-platform/docs/landing/) migrates petabytes of on-premises Hadoop data to Azure Data Lake Storage Gen2 file systems without interrupting data operations or requiring downtime. The platform's continuous checks prevent data loss and keep data consistent at both ends of the transfer, even while the data is being modified. |
17 | 17 |
|
18 |
| -> [!NOTE] |
19 |
| -> WANdisco LiveData Platform for Azure is in public preview. For regional availability, see [Supported regions](https://docs.wandisco.com/live-data-platform/docs/prereq#supported-regions). |
20 |
| -
|
21 |
| -The platform consists of two services: [LiveData Migrator for Azure](https://www.wandisco.com/products/livedata-migrator-for-azure) to migrate actively used data from on-premises environments to Azure storage, and [LiveData Plane for Azure](https://www.wandisco.com/products/livedata-plane-for-azure) which ensures that all modified data or ingest data are replicated consistently. |
| 18 | +The platform consists of two services. [LiveData Migrator for Azure](https://www.wandisco.com/products/livedata-migrator-for-azure) migrates actively used data from on-premises environments to Azure storage, and [LiveData Plane for Azure](https://www.wandisco.com/products/livedata-plane-for-azure) ensures that all modified or ingested data is replicated consistently. |
22 | 19 |
|
23 | 20 | > [!div class="mx-imgBorder"]
|
24 |
| ->  |
| 21 | +>  |
25 | 22 |
|
26 |
| -You can manage both services by using the Azure portal and the Azure CLI, and both follow the same metered, pay-as-you-go billing model as all other Azure services. LiveData Platform for Azure consumption will appear on the same monthly Azure bill and will provide a consistent and convenient way to track and monitor your usage. |
| 23 | +Manage both services by using the Azure portal and the Azure CLI. Each service follows the same metered, pay-as-you-go billing model as all other Azure services: data consumption in LiveData Platform for Azure will appear on the monthly Azure bill, which will provide usage metrics. |
27 | 24 |
|
28 | 25 | Unlike migrating data *offline* by [copying static information to Azure Data Box](./data-lake-storage-migrate-on-premises-hdfs-cluster.md), or by using Hadoop tools like [DistCp](https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html), you can maintain full operation of your business systems during *online* migration with WANdisco LiveData Platform for Azure. Keep your big data environments operating even while moving their data to Azure.
|
29 | 26 |
|
30 |
| -## Key features of WANdisco LiveData Platform for Azure |
| 27 | +## Key benefits of WANdisco LiveData Platform for Azure |
| 28 | + |
| 29 | +The wide-area-network-capable consensus engine in [WANdisco LiveData Platform for Azure](https://docs.wandisco.com/live-data-platform/docs/landing/) achieves data consistency and conducts real-time data replication at scale. See the following video for more information:<br><br> |
| 30 | + |
| 31 | +>[!VIDEO https://www.youtube.com/embed/KRrmcYPxEho] |
| 32 | +
|
| 33 | +Key benefits of the platform include the following: |
| 34 | + |
| 35 | +- **Data accuracy**: End-to-end validation of data prevents data loss and ensures transferred data is fit for use. |
31 | 36 |
|
32 |
| -[WANdisco LiveData Platform for Azure](https://docs.wandisco.com/live-data-platform/docs/landing/) uses a unique, wide-area network capable consensus engine to achieve data consistency, and to conduct data replication at scale while applications can continue to modify the data under replication. <br><br> |
| 37 | +- **Data consistency**: Keep data volumes automatically consistent between environments even while they undergo continuous change. |
33 | 38 |
|
34 |
| -> [!VIDEO https://www.youtube.com/embed/KRrmcYPxEho] |
| 39 | +- **Data efficiency**: Transfer large data volumes continuously with full control of bandwidth consumption. |
| 40 | + |
| 41 | +- **Downtime elimination**: Freely create, modify, read, and delete data with other applications during migration, without the need to disrupt business operations during data transference to Azure. Continue to operate applications, analytics infrastructure, ingest jobs, and other processing. |
| 42 | + |
| 43 | +- **Simple use**: Use the Platform's Azure integration to create, configure, schedule, and track the progress of automated migrations. Additionally, configure selective data replication, Hive metadata, data security, and confidentiality as needed. |
| 44 | + |
| 45 | +## Key features of WANdisco LiveData Platform for Azure |
35 | 46 |
|
36 | 47 | Key features of the platform include the following:
|
37 | 48 |
|
38 |
| -- **Data consistency**: Solve the challenges of migrating large data volumes between environments and keeping those data consistent across storage systems throughput migration, even while they are under continual change. Employ WANdisco's unique, wide-area network capable consensus engine directly in Azure to achieve data consistency and to migrate data with consistency guarantees throughout data changes. |
| 49 | +- **Metadata Migration**: In addition to HDFS data, migrate metadata (from Hive and other storages) with LiveData Migrator for Azure. |
| 50 | + |
| 51 | +- **Scheduled Transfer**: Use LiveData Migrator for Azure to control and automate when data transfer will initiate, eliminating the need to manually migrate changes to data. |
| 52 | + |
| 53 | +- **Kerberos**: LiveData Migrator for Azure supports Kerberized clusters. |
39 | 54 |
|
40 |
| -- **Maintain operations**: Because applications can continue to create, modify, read, and delete data during migration, there is no need to disrupt business operations or introduce an outage window just to migrate big data to Azure. Continue to operate applications, analytics infrastructure, ingest jobs, and other processing. |
| 55 | +- **Exclusion Templates**: Create rules in LiveData Migrator for Azure to prevent certain file sizes or file names (defined using glob patterns) from being migrated to your target storage. Create exclusion templates in the Azure portal or with the CLI, and apply them to any number of migrations. |
41 | 56 |
|
42 |
| -- **Validate outcomes**: End-to-end validation that your data can be used effectively once migrated to Azure requires that you run production application workloads against them. Only a LiveData Service provides this without introducing the risk of data divergence, by maintaining data consistency regardless of whether change occurs at the source or target of your migration. Test and validate application behavior without risk or change to your processes and systems. |
| 57 | +- **Path Mappings**: Define alternate target paths for specific target file systems, which automatically move transferred data to directories you specify. |
43 | 58 |
|
44 |
| -- **Reduce complexity**: Eliminate the need to create and manage scheduled jobs to copy data by migrating data through automation. Use the deep integration with Azure as a control plane to manage and monitor migration progress, including selective data replication, Hive metadata, data security and confidentiality. |
| 59 | +- **Bandwidth Management**: Configure the maximum amount of network bandwidth LiveData Migrator for Azure can use to prevent bandwidth overconsumption. |
45 | 60 |
|
46 |
| -- **Efficiency**: Maintain high throughput and performance, and scale to big data volumes easily. With control of bandwidth consumption, you can ensure that you can meet your migration goals without impacting other system operations. |
| 61 | +- **Exclusions**: Define template queries that prevent the migration of any files and directories that meet the criteria, allowing you to selectively migrate data from your source system. |
| 62 | + |
| 63 | +- **Metrics**: View details about data transfer in LiveData Migrator for Azure, such as files transferred over time, excluded paths, items that failed to transfer, and more. |
47 | 64 |
|
48 | 65 | ## Migrate big data faster without risk
|
49 | 66 |
|
50 |
| -The first service of WANdisco LiveData Platform for Azure is [LiveData Migrator for Azure](https://www.wandisco.com/products/livedata-migrator-for-azure); a solution for migrating actively used data from on-premises environments to Azure storage. LiveData Migrator for Azure is provisioned and managed entirely from the Azure portal or Azure CLI, and operates alongside your Hadoop cluster on-premises without any configuration change, application modifications, or service restarts to begin migrating data immediately. |
| 67 | +The first service included in WANdisco LiveData Platform for Azure is [LiveData Migrator for Azure](https://www.wandisco.com/products/livedata-migrator-for-azure), which migrates data from on-premises environments to Azure Storage. Once you've deployed LiveData Migrator to your on-premises Hadoop cluster, it will automatically create the best configuration for your file system. From there, supply the Kerberos details for the system. LiveData Migrator for Azure will then be ready to migrate data to Azure Storage. |
51 | 68 |
|
52 | 69 | > [!div class="mx-imgBorder"]
|
53 |
| ->  |
| 70 | +>  |
| 71 | +
|
| 72 | +Before you start with LiveData Migrator for Azure, review these [prerequisites](https://docs.wandisco.com/live-data-platform/docs/prereq/). |
| 73 | + |
| 74 | +To perform a migration: |
54 | 75 |
|
55 |
| -Big data migrations can be complex and challenging. Moving petabytes of information without disrupting business operations has been impossible to achieve with offline data copy technologies. [LiveData Migrator for Azure](https://www.wandisco.com/products/livedata-migrator-for-azure) offers simple deployment and can establish a LiveData Service with continuous data migration and replication while applications read, write, and modify the data being migrated. |
| 76 | +1. In the Azure CLI: |
56 | 77 |
|
57 |
| -Performing a migration is as simple as these three steps: |
| 78 | + - Register the WANdisco resource provider by running `az provider register --namespace Wandisco.Fusion --consent-to-permissions`. |
| 79 | + - Accept the metered billing terms of LiveData Platform by running `az vm image terms accept --offer ldma --plan metered-v1 --publisher Wandisco --subscription <subscriptionID>`. |
58 | 80 |
|
59 |
| -1. Provision the LiveData Migrator instance from the Azure portal to your on-premises Hadoop cluster. No cluster change or downtime is needed, and applications can continue to operate. |
| 81 | +2. Deploy a LiveData Migrator instance from the Azure portal to your on-premises Hadoop cluster. (You do not need to make changes to or restart the cluster.) |
60 | 82 |
|
61 | 83 | > [!div class="mx-imgBorder"]
|
62 | 84 | > 
|
63 | 85 |
|
64 |
| -2. Define the target Azure Data Lake Storage Gen2-enabled storage account. |
| 86 | + > [!NOTE] |
| 87 | + > WANdisco LiveData Migrator for Azure provides the option to create a Hadoop Test Cluster. |
| 88 | +
|
| 89 | +3. Configure Kerberos details, if applicable. |
| 90 | + |
| 91 | +4. Define the target Azure Data Lake Storage Gen2-enabled storage account. |
65 | 92 |
|
66 | 93 | > [!div class="mx-imgBorder"]
|
67 | 94 | > 
|
68 | 95 |
|
69 |
| -3. Define the location of the data that you want to migrate, for example: `/user/hive/warehouse`, and start the migration. |
| 96 | +5. Define the location of the data that you want to migrate, for example: `/user/hive/warehouse`. |
70 | 97 |
|
71 | 98 | > [!div class="mx-imgBorder"]
|
72 | 99 | > 
|
73 | 100 |
|
74 |
| -Monitor your migration progress through standard Azure tooling including the Azure CLI and Azure portal, and continue to use your on-premises environment throughout. Before you start, review these [prerequisites](https://docs.wandisco.com/live-data-platform/docs/prereq/). |
| 101 | +6. Start the migration. |
| 102 | + |
| 103 | +Monitor your migration progress through standard Azure tooling including the Azure CLI and Azure portal. |
75 | 104 |
|
76 |
| -## Replicate data under active change |
| 105 | +For more detailed instructions, see the [LiveData Migrator for Azure How-To video series](https://fast.wistia.com/embed/channel/qg51p8erky). |
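
The two Azure CLI commands in step 1 can be collected into a small script. The following is a minimal sketch, not an official WANdisco or Microsoft script: the `run` helper and the `DRY_RUN` and `SUBSCRIPTION_ID` variables are illustrative names introduced here, and only the two `az` commands come from the steps above. By default the script prints each command instead of executing it, so you can review what would run.

```bash
#!/usr/bin/env bash
# Sketch of step 1: the one-time Azure CLI setup, guarded by a dry-run switch.
set -euo pipefail

# Assumption: SUBSCRIPTION_ID and DRY_RUN are illustrative names, not product settings.
SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-<subscriptionID>}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command, 0 = execute it

# Print the command in dry-run mode; otherwise run it unchanged.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Register the WANdisco resource provider.
run az provider register --namespace Wandisco.Fusion --consent-to-permissions

# Accept the metered billing terms of LiveData Platform.
run az vm image terms accept --offer ldma --plan metered-v1 --publisher Wandisco --subscription "$SUBSCRIPTION_ID"
```

To execute the commands for real, sign in with `az login`, set `SUBSCRIPTION_ID` to your subscription ID, and set `DRY_RUN=0` before running the script.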
77 | 106 |
|
78 |
| -Large-scale migrations of on-premises data lakes to Azure need application testing and validation. Being able to do this without the risk of introducing data changes that will create multiple sources of truth that cannot be easily reconciled is critical to eliminating risk and minimizing the cost of moving to Azure. [LiveData Plane for Azure](https://www.wandisco.com/products/livedata-plane-for-azure) uses WANdisco's coordination engine technology to overcome these concerns. |
| 107 | +## Bidirectionally replicate data under active change with LiveData Plane for Azure |
| 108 | + |
| 109 | +The second service included in the LiveData Platform is [LiveData Plane for Azure](https://www.wandisco.com/products/livedata-plane-for-azure). LiveData Plane uses WANdisco's coordination engine to keep data consistent across many on-premises Hadoop clusters and Azure Storage by intelligently applying changes to data on all systems, removing the risk of data conflicts at different points of use. |
79 | 110 |
|
80 | 111 | > [!div class="mx-imgBorder"]
|
81 | 112 | > 
|
82 | 113 |
|
83 |
| -Keep your data consistent across on-premises Hadoop clusters and Azure storage with LiveData Plane for Azure after initial migration: |
| 114 | +After initial migration, keep your data consistent with LiveData Plane for Azure: |
84 | 115 |
|
85 |
| -1. Provision LiveData Plane for Azure on-premises and in Azure, starting from the Azure portal. No application changes are required. |
| 116 | +1. Deploy LiveData Plane for Azure on-premises and in Azure, starting from the Azure portal. No application changes are required. |
86 | 117 |
|
87 |
| -2. Configure replication rules that cover that data locations that you want to keep consistent, for example: `/user/contoso/sales/region/WA`. |
| 118 | +2. Configure replication rules that cover the data locations that you want to keep consistent, for example: `/user/contoso/sales/region/WA`. |
88 | 119 |
|
89 |
| -3. Run applications that access and modify data in either location as a Hadoop-compatible file system as you need. |
| 120 | +3. Run applications that access and modify data in either location as you need. |
90 | 121 |
|
91 |
| -LiveData Plane for Azure keeps your data consistent without imposing significant overhead on cluster operation or application performance. Modify or ingest data while all changes are replicated consistently. |
| 122 | +LiveData Plane for Azure consistently replicates data changes across all environments without significant impact on cluster operation or application performance. |
92 | 123 |
|
93 |
| -## Next steps |
| 124 | +## Test drive or trial |
94 | 125 |
|
95 |
| -- [LiveData Platform for Azure](https://docs.wandisco.com/live-data-platform/docs/landing/) for Azure is used like any other Azure resource, and is available in preview now. |
| 126 | +From [LiveData Platform for Azure's Marketplace page](https://azuremarketplace.microsoft.com/marketplace/apps/wandisco.ldma?tab=Overview), you have two options: |
96 | 127 |
|
97 |
| -- Understand the [prerequisites](https://docs.wandisco.com/live-data-platform/docs/prereq/), plan your migration, and complete a large-scale migration rapidly with LiveData Migrator for Azure. |
| 128 | +- The **Get It Now** button launches the service in your subscription. From there, you may use your own Hadoop cluster or WANdisco's Trial cluster. |
98 | 129 |
|
99 |
| -- Try out the LiveData Migrator without needing to have an on-premise Hadoop cluster by using the [HDFS Sandbox](https://docs.wandisco.com/live-data-platform/docs/create-sandbox-intro/). |
| 130 | +- Click **Test Drive** to test LiveData Migrator for Azure in an environment that is preconfigured and hosted for you. This enables you to try LiveData Migrator for Azure before adding it to your subscription, without any cost or risk to your data. |
100 | 131 |
|
101 |
| -## See also |
| 132 | +Watch the [Test Drive Demonstration Video](https://fast.wistia.net/embed/channel/qg51p8erky?wchannelid=qg51p8erky&wmediaid=ute6gsc60w) to see the test drive in action. |
102 | 133 |
|
103 |
| -- [LiveData Migrator for Azure on Azure Marketplace](https://azuremarketplace.microsoft.com/marketplace/apps/wandisco.ldm?tab=Overview) |
| 134 | +## Next steps |
104 | 135 |
|
105 |
| -- [LiveData Plane for Azure on Azure Marketplace](https://azuremarketplace.microsoft.com/marketplace/apps/wandisco.ldp?tab=Overview) |
| 136 | +- [Plan and create a migration in LiveData Migrator for Azure](https://azuremarketplace.microsoft.com/marketplace/apps/wandisco.ldma). |
106 | 137 |
|
107 |
| -- [LiveData Migrator for Azure plans and pricing](https://azuremarketplace.microsoft.com/marketplace/apps/wandisco.ldm?tab=PlansAndPrice) |
| 138 | +## See also |
108 | 139 |
|
109 |
| -- [LiveData Plane for Azure plans and pricing](https://azuremarketplace.microsoft.com/marketplace/apps/wandisco.ldp?tab=PlansAndPrice) |
| 140 | +- [LiveData Migrator for Azure on Azure Marketplace](https://azuremarketplace.microsoft.com/marketplace/apps/wandisco.ldma?tab=Overview) |
| 141 | + |
| 142 | +- [LiveData Migrator for Azure plans and pricing](https://azuremarketplace.microsoft.com/marketplace/apps/wandisco.ldma?tab=PlansAndPrice) |
110 | 143 |
|
111 | 144 | - [LiveData Platform for Azure Frequently Asked Questions](https://docs.wandisco.com/live-data-platform/docs/faq/)
|
112 | 145 |
|
113 | 146 | - [Known Issues with LiveData Platform for Azure](https://docs.wandisco.com/live-data-platform/docs/known-issues/)
|
114 |
| - |
115 |
| -- [Introduction to Azure Data Lake Storage Gen2](data-lake-storage-introduction.md) |
|