
Commit 337d1e6

Merge pull request #210489 from sreekzz/docs-editor/hdinsight-capacity-planning-1662615777
Modified Gen1 as Azure Data Lake Storage
2 parents 17d400c + 3172069 commit 337d1e6

1 file changed (+7 -8 lines)


articles/hdinsight/hdinsight-capacity-planning.md

Lines changed: 7 additions & 8 deletions
```diff
@@ -4,7 +4,7 @@ description: Identify key questions for capacity and performance planning of an
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 04/27/2022
+ms.date: 09/08/2022
 ---
 
 # Capacity planning for HDInsight clusters
```
```diff
@@ -29,18 +29,16 @@ HDInsight is available in many Azure regions. To find the closest region, see [P
 
 ### Location of default storage
 
-The default storage, either an Azure Storage account or Azure Data Lake Storage, must be in the same location as your cluster. Azure Storage is available at all locations. Data Lake Storage Gen1 is available in some regions - see the current [Data Lake Storage availability](https://azure.microsoft.com/global-infrastructure/services/?products=storage).
-
+The default storage, either an Azure Storage account or Azure Data Lake Storage, must be in the same location as your cluster. Azure Storage is available at all locations. Data Lake Storage is available in some regions - see the current [Data Lake Storage availability](https://azure.microsoft.com/global-infrastructure/services/?products=storage).
 ### Location of existing data
 
 If you want to use an existing storage account or Data Lake Storage as your cluster's default storage, then you must deploy your cluster at that same location.
 
 ### Storage size
 
-On a deployed cluster, you can attach additional Azure Storage accounts or access other Data Lake Storage. All your storage accounts must live in the same location as your cluster. A Data Lake Storage can be in a different location, though great distances may introduce some latency.
-
-Azure Storage has some [capacity limits](../azure-resource-manager/management/azure-subscription-service-limits.md#storage-limits), while Data Lake Storage Gen1 is almost unlimited.
+On a deployed cluster, you can attach another Azure Storage accounts or access other Data Lake Storage. All your storage accounts must live in the same location as your cluster. A Data Lake Storage can be in a different location, though great distances may introduce some latency.
 
+Azure Storage has some [capacity limits](../azure-resource-manager/management/azure-subscription-service-limits.md#storage-limits), while Data Lake Storage is almost unlimited.
 A cluster can access a combination of different storage accounts. Typical examples include:
 
 * When the amount of data is likely to exceed the storage capacity of a single blob storage
```
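The location rules in this hunk can be read as a simple pre-deployment check: default storage of either kind must be in the cluster's region, any additional Azure Storage account must be too, and a non-default Data Lake Storage account may sit in another region at some latency cost. The Python sketch below encodes that reading. `StorageAttachment`, its `kind` strings, and `location_problems` are hypothetical names for illustration only; they are not part of any Azure SDK or HDInsight API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageAttachment:
    """Hypothetical record of one storage account attached to a cluster."""
    name: str
    kind: str            # assumed values: "azure_storage" or "data_lake_storage"
    location: str        # Azure region name, for example "eastus"
    is_default: bool = False


def location_problems(cluster_location: str, attachments: List[StorageAttachment]) -> List[str]:
    """List violations of the co-location rules described in the article text."""
    problems = []
    for a in attachments:
        same_region = a.location.lower() == cluster_location.lower()
        if same_region:
            continue
        if a.is_default:
            # Default storage, of either kind, must be in the cluster's location.
            problems.append(f"default storage {a.name} is in {a.location}, cluster is in {cluster_location}")
        elif a.kind == "azure_storage":
            # Additional Azure Storage accounts must also be in the cluster's location.
            problems.append(f"storage account {a.name} is in {a.location}, cluster is in {cluster_location}")
        # A non-default Data Lake Storage account may be elsewhere (possibly with added latency).
    return problems


attachments = [
    StorageAttachment("primary", "azure_storage", "eastus", is_default=True),
    StorageAttachment("archive", "data_lake_storage", "westus2"),
    StorageAttachment("logs", "azure_storage", "westus2"),
]
print(location_problems("eastus", attachments))
# prints: ['storage account logs is in westus2, cluster is in eastus']
```

Region names are compared case-insensitively here as a convenience; a real check would more likely read locations from deployment templates than from hand-built records.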
```diff
@@ -67,9 +65,9 @@ For more information on how to choose the right VM family for your workload, see
 
 ## Choose the cluster scale
 
-A cluster's scale is determined by the quantity of its VM nodes. For all cluster types, there are node types that have a specific scale, and node types that support scale-out. For example, a cluster may require exactly three [Apache ZooKeeper](https://zookeeper.apache.org/) nodes or two Head nodes. Worker nodes that do data processing in a distributed fashion benefit from the additional worker nodes.
+A cluster's scale is determined by the quantity of its VM nodes. For all cluster types, there are node types that have a specific scale, and node types that support scale-out. For example, a cluster may require exactly three [Apache ZooKeeper](https://zookeeper.apache.org/) nodes or two Head nodes. Worker nodes that do data processing in a distributed fashion benefit from another worker nodes.
 
-Depending on your cluster type, increasing the number of worker nodes adds additional computational capacity (such as more cores). More nodes will increase the total memory required for the entire cluster to support in-memory storage of data being processed. As with the choice of VM size and type, selecting the right cluster scale is typically reached empirically. Use simulated workloads or canary queries.
+Depending on your cluster type, increasing the number of worker nodes adds more computational capacity (such as more cores). More nodes will increase the total memory required for the entire cluster to support in-memory storage of data being processed. As with the choice of VM size and type, selecting the right cluster scale is typically reached empirically. Use simulated workloads or canary queries.
 
 You can scale out your cluster to meet peak load demands. Then scale it back down when those extra nodes are no longer needed. The [Autoscale feature](hdinsight-autoscale-clusters.md) allows you to automatically scale your cluster based upon predetermined metrics and timings. For more information on scaling your clusters manually, see [Scale HDInsight clusters](hdinsight-scaling-best-practices.md).
 
```
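The scale paragraphs say worker count is the lever for compute and memory, that node types such as head and ZooKeeper nodes stay at a fixed count, and that the right scale is usually found empirically. As a rough starting point before that testing, a Python sketch like the one below estimates the smallest worker count whose combined memory covers an in-memory working set. `WorkerVm`, `worker_capacity`, `min_workers_for_memory`, the 25% `headroom` default, and the VM figures are all hypothetical assumptions, not HDInsight sizing guidance or real VM specifications.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WorkerVm:
    """Hypothetical description of one worker node's VM size."""
    cores: int
    memory_gb: float


def worker_capacity(vm: WorkerVm, worker_count: int) -> Tuple[int, float]:
    """Total cores and memory (GB) across the worker nodes only.

    Fixed-size node types (for example head or ZooKeeper nodes) are not counted.
    """
    return vm.cores * worker_count, vm.memory_gb * worker_count


def min_workers_for_memory(vm: WorkerVm, working_set_gb: float, headroom: float = 0.25) -> int:
    """Smallest worker count whose combined memory covers the working set plus headroom.

    `headroom` is an assumed safety margin, not a recommended value.
    """
    needed_gb = working_set_gb * (1 + headroom)
    return max(1, math.ceil(needed_gb / vm.memory_gb))


# Illustrative figures only; substitute the cores and memory of the VM size you choose.
vm = WorkerVm(cores=8, memory_gb=56.0)
workers = min_workers_for_memory(vm, working_set_gb=400)
print(workers, worker_capacity(vm, workers))  # prints: 9 (72, 504.0)
```

Treat the result as a first guess to refine with simulated workloads or canary queries, as the article recommends; it ignores per-node memory reserved for the operating system and services.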
```diff
@@ -92,3 +90,4 @@ For more information on managing subscription quotas, see [Requesting quota incr
 
 * [Set up clusters in HDInsight with Apache Hadoop, Spark, Kafka, and more](hdinsight-hadoop-provision-linux-clusters.md): Learn how to set up and configure clusters in HDInsight.
 * [Monitor cluster performance](hdinsight-key-scenarios-to-monitor.md): Learn about key scenarios to monitor for your HDInsight cluster that might affect your cluster's capacity.
+
```