articles/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-data-migration.md
Lines changed: 2 additions & 3 deletions
@@ -1,7 +1,6 @@
 ---
 title: 'Data migration: On-premises Apache Hadoop to Azure HDInsight'
 description: Learn data migration best practices for migrating on-premises Hadoop clusters to Azure HDInsight.
-ms.reviewer: ashishth
 ms.service: hdinsight
 ms.topic: how-to
 ms.custom: hdinsightactive
@@ -19,13 +18,13 @@ There are two main options to migrate data from on-premises to Azure environment
 * Transfer data over network with TLS
 * Over internet - You can transfer data to Azure storage over a regular internet connection using any one of several tools such as: Azure Storage Explorer, AzCopy, Azure PowerShell, and Azure CLI. For more information, see [Moving data to and from Azure Storage](../../storage/common/storage-choose-data-transfer-solution.md).

-* Express Route - ExpressRoute is an Azure service that lets you create private connections between Microsoft datacenters and infrastructure that’s on your premises or in a colocation facility. ExpressRoute connections don't go over the public Internet, and offer higher security, reliability, and speeds with lower latencies than typical connections over the Internet. For more information, see [Create and modify an ExpressRoute circuit](../../expressroute/expressroute-howto-circuit-portal-resource-manager.md).
+* Express Route - ExpressRoute is an Azure service that lets you create private connections between Microsoft datacenters and infrastructure that's on your premises or in a colocation facility. ExpressRoute connections don't go over the public Internet, and offer higher security, reliability, and speeds with lower latencies than typical connections over the Internet. For more information, see [Create and modify an ExpressRoute circuit](../../expressroute/expressroute-howto-circuit-portal-resource-manager.md).

 * Data Box online data transfer - Data Box Edge and Data Box Gateway are online data transfer products that act as network storage gateways to manage data between your site and Azure. Data Box Edge, an on-premises network device, transfers data to and from Azure and uses artificial intelligence (AI)-enabled edge compute to process data. Data Box Gateway is a virtual appliance with storage gateway capabilities. For more information, see [Azure Data Box Documentation - Online Transfer](../../databox-online/index.yml).

 * Shipping data Offline

-Data Box offline data transfer - Data Box, Data Box Disk, and Data Box Heavy devices help you transfer large amounts of data to Azure when the network isn’t an option. These offline data transfer devices are shipped between your organization and the Azure datacenter. They use AES encryption to help protect your data in transit, and they undergo a thorough post-upload sanitization process to delete your data from the device. For more information on the Data Box offline transfer devices, see [Azure Data Box Documentation - Offline Transfer](../../databox/index.yml). For more information on migration of Hadoop clusters, see [Use Azure Data Box to migrate from an on-premises HDFS store to Azure Storage](../../storage/blobs/data-lake-storage-migrate-on-premises-hdfs-cluster.md).
+Data Box offline data transfer - Data Box, Data Box Disk, and Data Box Heavy devices help you transfer large amounts of data to Azure when the network isn't an option. These offline data transfer devices are shipped between your organization and the Azure datacenter. They use AES encryption to help protect your data in transit, and they undergo a thorough post-upload sanitization process to delete your data from the device. For more information on the Data Box offline transfer devices, see [Azure Data Box Documentation - Offline Transfer](../../databox/index.yml). For more information on migration of Hadoop clusters, see [Use Azure Data Box to migrate from an on-premises HDFS store to Azure Storage](../../storage/blobs/data-lake-storage-migrate-on-premises-hdfs-cluster.md).

 The following table has approximate data transfer duration based on the data volume and network bandwidth. Use a Data box if the data migration is expected to take more than three weeks.
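
Reviewer note, not part of the changed file: the three-week rule in the last context line can be sanity-checked with a rough estimate. The sketch below is illustrative only. The function names `transfer_days` and `prefer_data_box` are hypothetical, and it assumes decimal terabytes (10^12 bytes), megabits per second, and a link that sustains the stated fraction of its bandwidth. Real transfers are usually slower because of protocol overhead and contention.

```python
SECONDS_PER_DAY = 86_400
THREE_WEEKS_DAYS = 21  # threshold quoted in the article text


def transfer_days(data_tb: float, bandwidth_mbps: float, utilization: float = 1.0) -> float:
    """Approximate days needed to move data_tb terabytes over a bandwidth_mbps link.

    Assumptions (not from the article): 1 TB = 10**12 bytes, and the link
    sustains `utilization` (0 < utilization <= 1) of its nominal bandwidth.
    """
    if data_tb < 0 or bandwidth_mbps <= 0 or not (0 < utilization <= 1):
        raise ValueError("data_tb must be >= 0, bandwidth_mbps > 0, 0 < utilization <= 1")
    total_bits = data_tb * 10**12 * 8
    effective_bits_per_second = bandwidth_mbps * 10**6 * utilization
    return total_bits / effective_bits_per_second / SECONDS_PER_DAY


def prefer_data_box(data_tb: float, bandwidth_mbps: float, utilization: float = 1.0) -> bool:
    """True when the estimated network transfer exceeds three weeks."""
    return transfer_days(data_tb, bandwidth_mbps, utilization) > THREE_WEEKS_DAYS


if __name__ == "__main__":
    for tb, mbps in [(1, 100), (100, 100), (100, 1000)]:
        days = transfer_days(tb, mbps)
        print(f"{tb} TB over {mbps} Mbps: {days:.1f} days, Data Box: {prefer_data_box(tb, mbps)}")
```

Lowering `utilization` (for example to 0.5) gives a more conservative estimate for shared links.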
articles/hdinsight/hdinsight-upgrade-cluster.md
Lines changed: 1 addition & 2 deletions
@@ -2,7 +2,6 @@
 title: Migrate cluster to a newer version
 titleSuffix: Azure HDInsight
 description: Learn guidelines to migrate your Azure HDInsight cluster to a newer version.
-ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: how-to
 ms.custom: hdinsightactive
@@ -52,7 +51,7 @@ As mentioned above, Microsoft recommends that HDInsight clusters be regularly mi
 * The cluster version is [Retired](hdinsight-retired-versions.md) or in [Basic support](hdinsight-36-component-versioning.md) and you are having a cluster issue that would be resolved with a newer version.
 * The root cause of a cluster issue is determined to be related to an undersized VM. [View Microsoft's recommended node configuration](hdinsight-supported-node-configuration.md).
 * A customer opens a support case and the Microsoft engineering team determines the issue has already been fixed in a newer cluster version.
-* A default metastore database (Ambari, Hive, Oozie, Ranger) has reached it’s utilization limit. Microsoft will ask you to recreate the cluster using a [custom metastore](hdinsight-use-external-metadata-stores.md#custom-metastore) database.
+* A default metastore database (Ambari, Hive, Oozie, Ranger) has reached it's utilization limit. Microsoft will ask you to recreate the cluster using a [custom metastore](hdinsight-use-external-metadata-stores.md#custom-metastore) database.
 * The root cause of a cluster issue is due to an **Unsupported Operation**. Here are some of the common unsupported operations:
 * **Moving or Adding a service in Ambari**. When viewing information on the cluster services in Ambari, one of the actions available from the Service Actions menu is **Move [Service Name]**. Another action is **Add [Service Name]**. Both of these options are unsupported.
 * **Python package corruption**. HDInsight clusters depend on the built-in Python environments, Python 2.7 and Python 3.5. Directly installing custom packages in those default built-in environments may cause unexpected library version changes and break the cluster. Learn how to [safely install custom external Python packages](./spark/apache-spark-python-package-installation.md#safely-install-external-python-packages) for your Spark applications.
-> It's recommended to have sufficient gap between two schedules so that data cache is efficiently utilized i.e schedule scale up’s when there is peak usage and scale down’s when there is no usage.
+> It's recommended to have sufficient gap between two schedules so that data cache is efficiently utilized i.e schedule scale up's when there is peak usage and scale down's when there is no usage.