articles/hdinsight/interactive-query/apache-hadoop-connect-hive-power-bi-directquery.md (+3 −3)
@@ -13,12 +13,12 @@ This article describes how to connect Microsoft Power BI to Azure HDInsight Inte
:::image type="content" source="./media/apache-hadoop-connect-hive-power-bi-directquery/hdinsight-power-bi-visualization.png" alt-text="HDInsight Power BI the map report." border="true":::
-You can use the [Apache Hive ODBC driver](../hadoop/apache-hadoop-connect-hive-power-bi.md) to do import via the generic ODBC connector in Power BI Desktop. However it is not recommended for BI workloads given non-interactive nature of the Hive query engine. [HDInsight Interactive Query connector](./apache-hadoop-connect-hive-power-bi-directquery.md) and [HDInsight Apache Spark connector](/power-bi/spark-on-hdinsight-with-direct-connect) are better choices for their performance.
+You can use the [Apache Hive ODBC driver](../hadoop/apache-hadoop-connect-hive-power-bi.md) to do import via the generic ODBC connector in Power BI Desktop. However, it isn't recommended for BI workloads given the non-interactive nature of the Hive query engine. [HDInsight Interactive Query connector](./apache-hadoop-connect-hive-power-bi-directquery.md) and [HDInsight Apache Spark connector](/power-bi/spark-on-hdinsight-with-direct-connect) are better choices for their performance.
## Prerequisites
Before going through this article, you must have the following items:
-* **HDInsight cluster**. The cluster can be either an HDInsight cluster with Apache Hive or a newly released Interactive Query cluster. For creating clusters, see [Create cluster](../hadoop/apache-hadoop-linux-tutorial-get-started.md).
+* **HDInsight cluster**. The cluster can be either an HDInsight cluster with Apache Hive or a newly released Interactive Query cluster. For creating clusters, see [Create cluster](../hadoop/apache-hadoop-linux-tutorial-get-started.md).
* **[Microsoft Power BI Desktop](https://powerbi.microsoft.com/desktop/)**. You can download a copy from the [Microsoft Download Center](https://www.microsoft.com/download/details.aspx?id=45331).
## Load data from HDInsight
@@ -31,7 +31,7 @@ The `hivesampletable` Hive table comes with all HDInsight clusters.
:::image type="content" source="./media/apache-hadoop-connect-hive-power-bi-directquery/hdinsight-power-bi-open-odbc.png" alt-text="HDInsight Power BI Get Data More." border="true":::
-3. From the **Get Data** window, enter **hdinsight** in the search box.
+3. From the `Get Data` window, enter **hdinsight** in the search box.
4. From the search results, select **HDInsight Interactive Query**, and then select **Connect**. If you don't see **HDInsight Interactive Query**, you need to update your Power BI Desktop to the latest version.
articles/hdinsight/interactive-query/apache-interactive-query-get-started.md (+2 −2)
@@ -34,8 +34,8 @@ To execute Hive queries, you have the following options:
|Microsoft Power BI|See [Visualize Interactive Query Apache Hive data with Power BI in Azure HDInsight](./apache-hadoop-connect-hive-power-bi-directquery.md), and [Visualize big data with Power BI in Azure HDInsight](../hadoop/apache-hadoop-connect-hive-power-bi.md).|
|Visual Studio|See [Connect to Azure HDInsight and run Apache Hive queries using Data Lake Tools for Visual Studio](../hadoop/apache-hadoop-visual-studio-tools-get-started.md#run-interactive-apache-hive-queries).|
|Visual Studio Code|See [Use Visual Studio Code for Apache Hive, LLAP, or pySpark](../hdinsight-for-vscode.md).|
-|Apache Ambari Hive View|See [Use Apache Hive View with Apache Hadoop in Azure HDInsight](../hadoop/apache-hadoop-use-hive-ambari-view.md). Hive View is not available for HDInsight 4.0.|
-|Apache Beeline|See [Use Apache Hive with Apache Hadoop in HDInsight with Beeline](../hadoop/apache-hadoop-use-hive-beeline.md). You can use Beeline from either the head node or from an empty edge node. We recommend using Beeline from an empty edge node. For information about creating an HDInsight cluster by using an empty edge node, see [Use empty edge nodes in HDInsight](../hdinsight-apps-use-edge-node.md).|
+|Apache Ambari Hive View|See [Use Apache Hive View with Apache Hadoop in Azure HDInsight](../hadoop/apache-hadoop-use-hive-ambari-view.md). Hive View isn't available for HDInsight 4.0.|
+|Apache Beeline|See [Use Apache Hive with Apache Hadoop in HDInsight with Beeline](../hadoop/apache-hadoop-use-hive-beeline.md). You can use Beeline from either the head node or from an empty edge node. We recommend using Beeline from an empty edge node. For information about creating an HDInsight cluster by using an empty edge node, see [Use empty edge nodes in HDInsight](../hdinsight-apps-use-edge-node.md).|
|Hive ODBC|See [Connect Excel to Apache Hadoop with the Microsoft Hive ODBC driver](../hadoop/apache-hadoop-connect-excel-hive-odbc-driver.md).|
To find the Java Database Connectivity (JDBC) connection string:
articles/hdinsight/interactive-query/quickstart-bicep.md (+6 −6)
@@ -29,7 +29,7 @@ The Bicep file used in this quickstart is from [Azure Quickstart Templates](http
Two Azure resources are defined in the Bicep file:
* [Microsoft.Storage/storageAccounts](/azure/templates/microsoft.storage/storageaccounts): create an Azure Storage Account.
-* [Microsoft.HDInsight/cluster](/azure/templates/microsoft.hdinsight/clusters): create an HDInsight cluster.
+* [Microsoft.HDInsight/cluster](/azure/templates/microsoft.hdinsight/clusters): create an HDInsight cluster.
### Deploy the Bicep file
@@ -55,13 +55,13 @@ Two Azure resources are defined in the Bicep file:
You need to provide values for the parameters:
* Replace **\<cluster-name\>** with the name of the HDInsight cluster to create.
-* Replace **\<cluster-username\>** with the credentials used to submit jobs to the cluster and to log in to cluster dashboards.
-* Replace **\<ssh-username\>** with the credentials used to remotely access the cluster. The username can not be admin username.
+* Replace **\<cluster-username\>** with the credentials used to submit jobs to the cluster and to sign in to cluster dashboards.
+* Replace **\<ssh-username\>** with the credentials used to remotely access the cluster. The username can't be the admin username.
-You are prompted to enter the following password:
+You're prompted to enter the following passwords:
-* **clusterLoginPassword**, which must be at least 10 characters long and contain one digit, one uppercase letter, one lowercase letter, and one non-alphanumeric character except single-quote, double-quote, backslash, right-bracket, full-stop. It also must not contain three consecutive characters from the cluster username or SSH username.
-* **sshPassword**, which must be 6-72 characters long and must contain at least one digit, one uppercase letter, and one lowercase letter. It must not contain any three consecutive characters from the cluster login name.
+* **clusterLoginPassword**, which must be at least 10 characters long and contain one digit, one uppercase letter, one lowercase letter, and one nonalphanumeric character except single-quote, double-quote, backslash, right-bracket, full-stop. It also must not contain three consecutive characters from the cluster username or SSH username.
+* **sshPassword**, which must be 6-72 characters long and must contain at least one digit, one uppercase letter, and one lowercase letter. It must not contain any three consecutive characters from the cluster sign-in name.
> [!NOTE]
> When the deployment finishes, you should see a message indicating the deployment succeeded.
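The `clusterLoginPassword` rules in the hunk above can be checked locally before running the deployment. A minimal sketch in Python; the rule set is transcribed from the bullet as I read it (in particular, treating the listed specials as not satisfying the nonalphanumeric requirement), so it's an illustration, not the service's actual validator:

```python
import re

# Specials the doc excludes from satisfying the rule:
# single-quote, double-quote, backslash, right-bracket, full-stop.
FORBIDDEN_SPECIALS = set("'\"\\].")

def valid_cluster_login_password(password, cluster_user, ssh_user):
    """Sketch of the documented clusterLoginPassword rules."""
    if len(password) < 10:
        return False
    if not re.search(r"\d", password):          # at least one digit
        return False
    if not re.search(r"[A-Z]", password) or not re.search(r"[a-z]", password):
        return False                            # one uppercase and one lowercase
    # At least one nonalphanumeric character outside the forbidden set.
    if not any(not c.isalnum() and c not in FORBIDDEN_SPECIALS for c in password):
        return False
    # No run of three consecutive characters taken from either username.
    for name in (cluster_user, ssh_user):
        if any(name[i:i + 3] in password for i in range(len(name) - 2)):
            return False
    return True
```

For example, `valid_cluster_login_password("Abcdef1234!", "admin", "sshuser")` passes, while a password containing `dmi` (three consecutive characters of `admin`) fails.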
articles/hdinsight/spark/apache-spark-run-machine-learning-automl.md (+4 −4)
@@ -8,12 +8,12 @@ ms.date: 10/17/2024
# Run Azure Machine Learning workloads with automated machine learning on Apache Spark in HDInsight
-Azure Machine Learning simplifies and accelerates the building, training, and deployment of machine learning models. In automated machine learning (AutoML), you start with training data that has a defined target feature. Iterate through combinations of algorithms and feature selections automatically select the best model for your data based on the training scores. HDInsight allows customers to provision clusters with hundreds of nodes. AutoML running on Spark in an HDInsight cluster allows users to use compute capacity across these nodes to run training jobs in a scale-out fashion, and to run multiple training jobs in parallel. It allows users to run AutoML experiments while sharing the compute with their other big data workloads.
+Azure Machine Learning simplifies and accelerates the building, training, and deployment of machine learning models. In automated machine learning (AutoML), you start with training data that has a defined target feature. Iterating through combinations of algorithms and feature selections automatically selects the best model for your data based on the training scores. HDInsight allows customers to provision clusters with hundreds of nodes. AutoML running on Spark in an HDInsight cluster allows users to use compute capacity across these nodes to run training jobs in a scale-out fashion, and to run multiple training jobs in parallel. It allows users to run AutoML experiments while sharing the compute with their other big data workloads.
-## Install Azure Machine Learning on an HDInsight cluster
+## Install Azure Machine Learning on an HDInsight cluster
For general tutorials of automated machine learning, see [Tutorial: Use automated machine learning to build your regression model](/azure/machine-learning/tutorial-auto-train-models).
-All new HDInsight-Spark clusters come pre-installed with AzureML-AutoML SDK.
+All new HDInsight-Spark clusters come preinstalled with the AzureML-AutoML SDK.
> [!Note]
> Azure Machine Learning packages are installed into Python3 conda environment. The installed Jupyter Notebook should be run using the PySpark3 kernel.
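The AutoML loop described in the paragraph above — trying algorithm and feature-selection combinations and keeping the model with the best training score — reduces to a simple exhaustive search. A toy sketch in plain Python (not the Azure ML SDK; the candidate lists and the scoring function are made up purely for illustration):

```python
from itertools import product

# Hypothetical candidates; real AutoML draws these from its own
# algorithm and featurization search space.
ALGORITHMS = ["logistic_regression", "random_forest", "gradient_boosting"]
FEATURE_SETS = [("age",), ("age", "income"), ("age", "income", "tenure")]

def train_and_score(algorithm, features):
    """Stand-in for a real training run; returns a deterministic fake score."""
    return 0.5 + 0.1 * len(features) + (0.05 if algorithm == "gradient_boosting" else 0.0)

def automl_sweep():
    """Try every algorithm/feature combination, keep the best training score."""
    best = max(product(ALGORITHMS, FEATURE_SETS),
               key=lambda combo: train_and_score(*combo))
    return best, train_and_score(*best)
```

On a Spark cluster the inner `train_and_score` calls are what AutoML fans out across executors, which is why a cluster with many nodes can run the candidate trainings in parallel.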
@@ -22,7 +22,7 @@ You can use Zeppelin notebooks to use AutoML as well.
## Authentication for workspace
-Workspace creation and experiment submission require an authentication token. This token can be generated using an [Microsoft Entra application](../../active-directory/develop/app-objects-and-service-principals.md). An [Microsoft Entra user](/azure/developer/python/sdk/authentication-overview) can also be used to generate the required authentication token, if multi-factor authentication isn't enabled on the account.
+Workspace creation and experiment submission require an authentication token. This token can be generated using a [Microsoft Entra application](../../active-directory/develop/app-objects-and-service-principals.md). A [Microsoft Entra user](/azure/developer/python/sdk/authentication-overview) can also be used to generate the required authentication token, if multifactor authentication isn't enabled on the account.
The following code snippet creates an authentication token using a **Microsoft Entra application**.
articles/hdinsight/use-pig.md (+2 −2)
@@ -11,7 +11,7 @@ ms.date: 10/17/2024
Learn how to use [Apache Pig](https://pig.apache.org/) with HDInsight.
-Apache Pig is a platform for creating programs for Apache Hadoop by using a procedural language known as *Pig Latin*. Pig is an alternative to Java for creating *MapReduce* solutions, and it is included with Azure HDInsight. Use the following table to discover the various ways that Pig can be used with HDInsight:
+Apache Pig is a platform for creating programs for Apache Hadoop by using a procedural language known as *Pig Latin*. Pig is an alternative to Java for creating *MapReduce* solutions, and it's included with Azure HDInsight. Use the following table to discover the various ways that Pig can be used with HDInsight:
## <a id="why"></a>Why use Apache Pig
@@ -35,7 +35,7 @@ For more information about Pig Latin, see [Pig Latin Reference Manual 1](https:/
## <a id="data"></a>Example data
-HDInsight provides various example data sets, which are stored in the `/example/data` and `/HdiSamples` directories. These directories are in the default storage for your cluster. The Pig example in this document uses the *log4j* file from `/example/data/sample.log`.
+HDInsight provides various example data sets, which are stored in the `/example/data` and `/HdiSamples` directories. These directories are in the default storage for your cluster. The Pig example in this document uses the *Log4j* file from `/example/data/sample.log`.
Each log inside the file consists of a line of fields that contains a `[LOG LEVEL]` field to show the type and the severity, for example:
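Outside of Pig, the `[LOG LEVEL]` field described above is easy to pull out with a regular expression. A small Python sketch; the sample line is illustrative, shaped like the Log4j entries in `sample.log` rather than copied from it:

```python
import re

# Matches a bracketed Log4j level such as [INFO] or [ERROR] anywhere in a line.
LEVEL_RE = re.compile(r"\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\]")

def log_level(line):
    """Return the log level of a Log4j-style line, or None if absent."""
    m = LEVEL_RE.search(line)
    return m.group(1) if m else None
```

For example, `log_level("2012-02-03 18:35:34 SampleClass6 [ERROR] incorrect id")` returns `"ERROR"`, which is the same grouping key the Pig example extracts.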