Commit 8b60b74

Merge pull request #292397 from sreekzz/storage-tab-image-change
MSI for SQL DB
2 parents 8279076 + cde2706 commit 8b60b74

12 files changed: +141 -37 lines changed

articles/hdinsight/TOC.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -215,6 +215,8 @@ items:
215215
href: ./interactive-query/hive-migration-across-storage-accounts.md
216216
- name: Manage
217217
items:
218+
- name: Use Managed Identity for SQL Database authentication in Azure HDInsight
219+
href: .//use-managed-identity-for-sql-database-authentication-in-azure-hdinsight.md
218220
- name: Manage clusters using the Apache Ambari web UI
219221
href: ./hdinsight-hadoop-manage-ambari.md
220222
- name: Disable auto logout from Ambari Web UI

articles/hdinsight/hdinsight-custom-ambari-db.md

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,13 +4,13 @@ description: Learn how to create HDInsight clusters with your own custom Apache
44
ms.service: azure-hdinsight
55
ms.custom: hdinsightactive
66
ms.topic: how-to
7-
ms.date: 09/06/2024
7+
ms.date: 12/27/2024
88
---
99
# Set up HDInsight clusters with a custom Ambari DB
1010

1111
Apache Ambari simplifies the management and monitoring of an Apache Hadoop cluster. Ambari provides an easy to use web UI and REST API. Ambari is included on HDInsight clusters, and is used to monitor the cluster and make configuration changes.
1212

13-
In normal cluster creation, as described in other articles such as [Set up clusters in HDInsight](hdinsight-hadoop-provision-linux-clusters.md), Ambari is deployed in an [S0 Azure SQL Database](/azure/azure-sql/database/resource-limits-dtu-single-databases#standard-service-tier) that is managed by HDInsight and is not accessible to users.
13+
In normal cluster creation, as described in other articles such as [Set up clusters in HDInsight](hdinsight-hadoop-provision-linux-clusters.md), Ambari is deployed in an [S0 Azure SQL Database](/azure/azure-sql/database/resource-limits-dtu-single-databases#standard-service-tier) managed by HDInsight and isn't accessible to users.
1414

1515
The custom Ambari DB feature allows you to deploy a new cluster and setup Ambari in an external database that you manage. The deployment is done with an Azure Resource Manager template. This feature has the following benefits:
1616

@@ -25,11 +25,11 @@ The remainder of this article discusses the following points:
2525

2626
## Custom Ambari DB requirements
2727

28-
You can deploy a custom Ambari DB with all cluster types and versions. Multiple clusters cannot use the same Ambari DB.
28+
You can deploy a custom Ambari DB with all cluster types and versions. Multiple clusters can't use the same Ambari DB.
2929

3030
The custom Ambari DB has the following other requirements:
3131

32-
- The name of the database cannot contain hyphens or spaces
32+
- The name of the database can't contain hyphens or spaces
3333
- You must have an existing Azure SQL DB server and database.
3434
- The database that you provide for Ambari setup must be empty. There should be no tables in the default dbo schema.
3535
- The user used to connect to the database should have **SELECT, CREATE TABLE, INSERT, UPDATE, DELETE, ALTER ON SCHEMA and REFERENCES ON SCHEMA** permissions on the database.
@@ -50,7 +50,11 @@ When you host your Apache Ambari DB in an external database, remember the follow
5050

5151
- You're responsible for the extra costs of the Azure SQL DB that holds Ambari.
5252
- Back up your custom Ambari DB periodically. Azure SQL Database generates backups automatically, but the backup retention time-frame varies. For more information, see [Learn about automatic SQL Database backups](/azure/azure-sql/database/automated-backups-overview).
53-
- Don't change the custom Ambari DB password after the HDInsight cluster reaches the **Running** state. It is not supported.
53+
- Don't change the custom Ambari DB password after the HDInsight cluster reaches the **Running** state. It isn't supported.
54+
55+
> [!NOTE]
56+
> You can use Managed Identity to authenticate with SQL database for Ambari. For more information, see [Use Managed Identity for SQL Database authentication in Azure HDInsight](./use-managed-identity-for-sql-database-authentication-in-azure-hdinsight.md)
57+
5458

5559
## Deploy clusters with a custom Ambari DB
5660

@@ -69,10 +73,9 @@ az deployment group create --name HDInsightAmbariDBDeployment \
6973

7074

7175
> [!WARNING]
72-
> Please use the following recommended SQL DB and Headnode VM for your HDInsight cluster. Please don't use default Ambari DB (S0) for any production environment.
76+
> Use the following recommended SQL DB and Headnode VM for your HDInsight cluster. Don't use default Ambari DB (S0) for any production environment.
7377
>
7478
75-
7679
## Database and Headnode sizing
7780

7881
The following table provides guidelines on which Azure SQL DB tier to select based on the size of your HDInsight cluster.

articles/hdinsight/hdinsight-hadoop-provision-linux-clusters.md

Lines changed: 24 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ description: Set up Hadoop, Kafka, Spark, or HBase clusters for HDInsight from a
44
ms.service: azure-hdinsight
55
ms.topic: conceptual
66
ms.custom: hdinsightactive, devx-track-azurepowershell, linux-related-content
7-
ms.date: 04/11/2024
7+
ms.date: 01/08/2025
88
---
99

1010
# Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more
@@ -16,9 +16,9 @@ Learn how to set up and configure Apache Hadoop, Apache Spark, Apache Kafka, Int
1616
A Hadoop cluster consists of several virtual machines (nodes) that are used for distributed processing of tasks. Azure HDInsight handles implementation details of installation and configuration of individual nodes, so you only have to provide general configuration information.
1717

1818
> [!IMPORTANT]
19-
> HDInsight cluster billing starts once a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it is no longer in use. Learn how to [delete a cluster.](hdinsight-delete-cluster.md)
19+
> HDInsight cluster billing starts once a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it's no longer in use. Learn how to [delete a cluster.](hdinsight-delete-cluster.md)
2020
21-
If you're using multiple clusters together, you'll want to create a virtual network, and if you're using a Spark cluster you'll also want to use the Hive Warehouse Connector. For more information, see [Plan a virtual network for Azure HDInsight](./hdinsight-plan-virtual-network-deployment.md) and [Integrate Apache Spark and Apache Hive with the Hive Warehouse Connector](interactive-query/apache-hive-warehouse-connector.md).
21+
If you're using multiple clusters together, you want to create a virtual network, and if you're using a Spark cluster you also want to use the Hive Warehouse Connector. For more information, see [Plan a virtual network for Azure HDInsight](./hdinsight-plan-virtual-network-deployment.md) and [Integrate Apache Spark and Apache Hive with the Hive Warehouse Connector](interactive-query/apache-hive-warehouse-connector.md).
2222

2323
## Cluster setup methods
2424

@@ -64,7 +64,7 @@ You don't need to specify the cluster location explicitly: The cluster is in the
6464
Azure HDInsight currently provides the following cluster types, each with a set of components to provide certain functionalities.
6565

6666
> [!IMPORTANT]
67-
> HDInsight clusters are available in various types, each for a single workload or technology. There is no supported method to create a cluster that combines multiple types, such HBase on one cluster. If your solution requires technologies that are spread across multiple HDInsight cluster types, an [Azure virtual network](../virtual-network/index.yml) can connect the required cluster types.
67+
> HDInsight clusters are available in various types, each for a single workload or technology. There's no supported method to create a cluster that combines multiple types, such HBase on one cluster. If your solution requires technologies that are spread across multiple HDInsight cluster types, an [Azure virtual network](../virtual-network/index.yml) can connect the required cluster types.
6868
6969
| Cluster type | Functionality |
7070
| --- | --- |
@@ -82,7 +82,7 @@ Choose the version of HDInsight for this cluster. For more information, see [Sup
8282

8383
With HDInsight clusters, you can configure two user accounts during cluster creation:
8484

85-
* Cluster login username: The default username is *admin*. It uses the basic configuration on the Azure portal. Sometimes it's called "Cluster user," or "HTTP user."
85+
* Cluster login username: The default username is *admin*. It uses the basic configuration on the Azure portal. It's also called "Cluster user" or "HTTP user."
8686
* Secure Shell (SSH) username: Used to connect to the cluster through SSH. For more information, see [Use SSH with HDInsight](hdinsight-hadoop-linux-use-ssh-unix.md).
8787

8888
The HTTP username has the following restrictions:
@@ -113,14 +113,14 @@ HDInsight clusters can use the following storage options:
113113
For more information on storage options with HDInsight, see [Compare storage options for use with Azure HDInsight clusters](hdinsight-hadoop-compare-storage-options.md).
114114

115115
> [!WARNING]
116-
> Using an additional storage account in a different location from the HDInsight cluster is not supported.
116+
> Using an additional storage account in a different location from the HDInsight cluster isn't supported.
117117
118-
During configuration, for the default storage endpoint you specify a blob container of an Azure Storage account or Data Lake Storage. The default storage contains application and system logs. Optionally, you can specify additional linked Azure Storage accounts and Data Lake Storage accounts that the cluster can access. The HDInsight cluster and the dependent storage accounts must be in the same Azure location.
118+
During configuration, for the default storage endpoint you specify a blob container of an Azure Storage account or Data Lake Storage. The default storage contains application and system logs. Optionally, you can specify more linked Azure Storage accounts and Data Lake Storage accounts that the cluster can access. The HDInsight cluster and the dependent storage accounts must be in the same Azure location.
119119

120120
[!INCLUDE [secure-transfer-enabled-storage-account](includes/hdinsight-secure-transfer.md)]
121121

122122
> [!IMPORTANT]
123-
> Enabling secure storage transfer after creating a cluster can result in errors using your storage account and is not recommended. It is better to create a new cluster using a storage account with secure transfer already enabled.
123+
> Enabling secure storage transfer after creating a cluster can result in errors using your storage account and isn't recommended. It's better to create a new cluster using a storage account with secure transfer already enabled.
124124
125125
> [!Note]
126126
> Azure HDInsight does not automatically transfer, move or copy your data stored in Azure Storage from one region to another.
@@ -132,27 +132,33 @@ You can create optional Hive or Apache Oozie metastores. However, not all cluste
132132
For more information, see [Use external metadata stores in Azure HDInsight](./hdinsight-use-external-metadata-stores.md).
133133

134134
> [!IMPORTANT]
135-
> When you create a custom metastore, don't use dashes, hyphens, or spaces in the database name. This can cause the cluster creation process to fail.
135+
> When you create a custom metastore, don't use dashes, hyphens, or spaces in the database name. Such characters can cause the cluster creation process to fail.
136136
137137
#### SQL database for Hive
138138

139139
If you want to retain your Hive tables after you delete an HDInsight cluster, use a custom metastore. You can then attach the metastore to another HDInsight cluster.
140140

141141
An HDInsight metastore that is created for one HDInsight cluster version can't be shared across different HDInsight cluster versions. For a list of HDInsight versions, see [Supported HDInsight versions](hdinsight-component-versioning.md#supported-hdinsight-versions).
142142

143+
You can use Managed Identity to authenticate with SQL database for Hive. For more information, see [Use Managed Identity for SQL Database authentication in Azure HDInsight](./use-managed-identity-for-sql-database-authentication-in-azure-hdinsight.md)
144+
143145
> [!IMPORTANT]
144146
> The default metastore provides an Azure SQL Database with a **basic tier 5 DTU limit (not upgradeable)**! Suitable for basic testing purposes. For large or production workloads, we recommend migrating to an external metastore.
145147
146148
#### SQL database for Oozie
147149

148150
To increase performance when using Oozie, use a custom metastore. A metastore can also provide access to Oozie job data after you delete your cluster.
149151

152+
You can use Managed Identity to authenticate with SQL database for Oozie. For more information, see [Use Managed Identity for SQL Database authentication in Azure HDInsight](./use-managed-identity-for-sql-database-authentication-in-azure-hdinsight.md)
153+
150154
#### SQL database for Ambari
151155

152-
Ambari is used to monitor HDInsight clusters, make configuration changes, and store cluster management information as well as job history. The custom Ambari DB feature allows you to deploy a new cluster and setup Ambari in an external database that you manage. For more information, see [Custom Ambari DB](./hdinsight-custom-ambari-db.md).
156+
Ambari is used to monitor HDInsight clusters, make configuration changes, and store cluster management information and job history. The custom Ambari DB feature allows you to deploy a new cluster and setup Ambari in an external database that you manage. For more information, see [Custom Ambari DB](./hdinsight-custom-ambari-db.md).
157+
158+
You can use Managed Identity to authenticate with SQL database for Ambari. For more information, see [Use Managed Identity for SQL Database authentication in Azure HDInsight](./use-managed-identity-for-sql-database-authentication-in-azure-hdinsight.md)
153159

154160
> [!IMPORTANT]
155-
> You cannot reuse a custom Oozie metastore. To use a custom Oozie metastore, you must provide an empty Azure SQL Database when creating the HDInsight cluster.
161+
> You can't reuse a custom Oozie metastore. To use a custom Oozie metastore, you must provide an empty Azure SQL Database when creating the HDInsight cluster.
156162
157163
## Security + networking
158164

@@ -194,7 +200,7 @@ For more information, see [Managed identities in Azure HDInsight](./hdinsight-ma
194200

195201
:::image type="content" source="./media/hdinsight-hadoop-provision-linux-clusters/azure-portal-cluster-configuration-disk-attach.png" alt-text="HDInsight choose your node size.":::
196202

197-
You're billed for node usage for as long as the cluster exists. Billing starts when a cluster is created and stops when the cluster is deleted. Clusters can't be de-allocated or put on hold.
203+
You're billed for node usage for as long as the cluster exists. Billing starts when a cluster is created and stops when the cluster is deleted. Clusters can't be deallocated or put on hold.
198204

199205
### Node configuration
200206

@@ -208,7 +214,7 @@ Each cluster type has its own number of nodes, terminology for nodes, and defaul
208214

209215
For more information, see [Default node configuration and virtual machine sizes for clusters](hdinsight-supported-node-configuration.md) in "What are the Hadoop components and versions in HDInsight?"
210216

211-
The cost of HDInsight clusters is determined by the number of nodes and the virtual machines sizes for the nodes.
217+
The cost of HDInsight clusters is determined by the number of nodes and the virtual machine sizes for the nodes.
212218

213219
Different cluster types have different node types, numbers of nodes, and node sizes:
214220

@@ -245,9 +251,9 @@ For more information, see [Sizes for virtual machines](/azure/virtual-machines/s
245251
> The added disks are only configured for node manager local directories and **not for datanode directories**
246252
247253

248-
HDInsight cluster comes with pre-defined disk space based on SKU. If you run some large applications, can lead to insufficient disk space, with disk full error - `LinkId=221672#ERROR_NOT_ENOUGH_DISK_SPACE` and job failures.
254+
HDInsight cluster comes with predefined disk space based on SKU. If you run some large applications, can lead to insufficient disk space, with disk full error - `LinkId=221672#ERROR_NOT_ENOUGH_DISK_SPACE` and job failures.
249255

250-
More discs can be added to the cluster using the new feature **NodeManager**’s local directory. At the time of Hive and Spark cluster creation, the number of discs can be selected and added to the worker nodes. The selected disk, which will be of size 1TB each, would be part of **NodeManager**'s local directories.
256+
More discs can be added to the cluster using the new feature **NodeManager**’s local directory. At the time of Hive and Spark cluster creation, the number of discs can be selected and added to the worker nodes. The selected disk, which can be of size 1 TB each, would be part of **NodeManager**'s local directories.
251257

252258
1. From **Configuration + pricing** tab
253259
1. Select **Enable managed disk** option
@@ -260,18 +266,18 @@ You can verify the number of disks from **Review + create** tab, under **Cluster
260266

261267
HDInsight application is an application, that users can install on a Linux-based HDInsight cluster. You can use applications provided by Microsoft, third parties, or developed by you. For more information, see [Install third-party Apache Hadoop applications on Azure HDInsight](hdinsight-apps-install-applications.md).
262268

263-
Most of the HDInsight applications are installed on an empty edge node. An empty edge node is a Linux virtual machine with the same client tools installed and configured as in the head node. You can use the edge node for accessing the cluster, testing your client applications, and hosting your client applications. For more information, see [Use empty edge nodes in HDInsight](hdinsight-apps-use-edge-node.md).
269+
Most of the HDInsight applications are installed on an empty edge node. An empty edge node is a Linux virtual machine with the same client tools installed and configured as in the head node. You can use the edge node for accessing the cluster, testing your client applications, and hosting your client applications. For more information, see [Use empty edge nodes in HDInsight](hdinsight-apps-use-edge-node.md).
264270

265271
### Script actions
266272

267-
You can install additional components or customize cluster configuration by using scripts during creation. Such scripts are invoked via **Script Action**, which is a configuration option that can be used from the Azure portal, HDInsight Windows PowerShell cmdlets, or the HDInsight .NET SDK. For more information, see [Customize HDInsight cluster using Script Action](hdinsight-hadoop-customize-cluster-linux.md).
273+
You can install more components or customize cluster configuration by using scripts during creation. Such scripts are invoked via **Script Action**, which is a configuration option that can be used from the Azure portal, HDInsight Windows PowerShell cmdlets, or the HDInsight .NET SDK. For more information, see [Customize HDInsight cluster using Script Action](hdinsight-hadoop-customize-cluster-linux.md).
268274

269275
Some native Java components, like Apache Mahout and Cascading, can be run on the cluster as Java Archive (JAR) files. These JAR files can be distributed to Azure Storage and submitted to HDInsight clusters with Hadoop job submission mechanisms. For more information, see [Submit Apache Hadoop jobs programmatically](hadoop/submit-apache-hadoop-jobs-programmatically.md).
270276

271277
> [!NOTE]
272278
> If you have issues deploying JAR files to HDInsight clusters, or calling JAR files on HDInsight clusters, contact [Microsoft Support](https://azure.microsoft.com/support/options/).
273279
>
274-
> Cascading is not supported by HDInsight and is not eligible for Microsoft Support. For lists of supported components, see [What's new in the cluster versions provided by HDInsight](hdinsight-component-versioning.md).
280+
> Cascading isn't supported by HDInsight and isn't eligible for Microsoft Support. For lists of supported components, see [What's new in the cluster versions provided by HDInsight](hdinsight-component-versioning.md).
275281
276282
Sometimes, you want to configure the following configuration files during the creation process:
277283
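
Note on the database-name rule reworded in this commit: both touched articles state that a custom Ambari DB or metastore name must not contain hyphens or spaces. A pre-deployment check for that rule might look like the sketch below. The function name `is_valid_custom_db_name` is hypothetical and not part of any HDInsight tooling. Treating en/em dashes and all whitespace as disallowed is an assumption that reads the rule broadly, and the check does not cover any other Azure SQL naming rules.

```python
# Hypothetical helper: checks only the "no dashes, hyphens, or spaces" rule
# quoted in the docs. It is not an official HDInsight or Azure SQL validator.
FORBIDDEN_DASHES = "-\u2013\u2014"  # hyphen-minus, en dash, em dash (assumed set)


def is_valid_custom_db_name(name: str) -> bool:
    """Return True if name is non-empty and has no dashes or whitespace."""
    if not name:
        return False
    return not any(ch in FORBIDDEN_DASHES or ch.isspace() for ch in name)


if __name__ == "__main__":
    for candidate in ("ambaridb01", "ambari-db", "ambari db"):
        status = "ok" if is_valid_custom_db_name(candidate) else "rejected"
        print(f"{candidate!r}: {status}")
```

Running a check like this before submitting the Resource Manager template can surface a bad name early, rather than after cluster creation has already failed.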
