---
title: Capture Event Hubs data to ADLS in parquet format
description: Shows you how to use the Stream Analytics no code editor to create a job that captures Event Hubs data into Azure Data Lake Storage Gen2 in the parquet format.
author: xujxu
ms.author: xujiang1
ms.service: stream-analytics
ms.topic: tutorial
ms.date: 08/02/2022
ms.custom: seodec18
---

# Tutorial: Capture Event Hubs data in parquet format and analyze with Azure Synapse Analytics

This tutorial shows you how to use the Stream Analytics no code editor to create a job that captures Event Hubs data into Azure Data Lake Storage Gen2 in the parquet format.
In this tutorial, you learn how to:
> [!div class="checklist"]
> * Deploy an event generator that sends sample events to an event hub
> * Create a Stream Analytics job using the no code editor
> * Review input data and schema
> * Configure Azure Data Lake Storage Gen2 to which event hub data will be captured

## Prerequisites

Before you start, make sure you've completed the following steps:
* If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/).
* [Deploy the TollApp event generator app to Azure](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-stream-analytics%2Fmaster%2FSamples%2FTollApp%2FVSProjects%2FTollAppDeployment%2Fazuredeploy.json). Set the 'interval' parameter to 1, and use a new resource group for this step.
* Create an [Azure Synapse Analytics workspace](../synapse-analytics/get-started-create-workspace.md) with a Data Lake Storage Gen2 account.
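
The TollApp template deploys a generator app that publishes simulated toll-booth events to the event hub. As a rough sketch of the kind of payload such a generator emits (the field names below are illustrative, not the exact TollApp schema):

```python
import json
import random
from datetime import datetime, timezone

def make_toll_event():
    """Build one illustrative toll-booth event (field names are hypothetical)."""
    return {
        "TollId": random.randint(1, 3),
        "EntryTime": datetime.now(timezone.utc).isoformat(),
        "LicensePlate": f"ABC {random.randint(1000, 9999)}",
        "State": random.choice(["NY", "NJ", "CT"]),
        "TollAmount": round(random.uniform(4.0, 7.0), 2),
    }

# Serialize one sample event the way it would be sent to the event hub.
event = make_toll_event()
print(json.dumps(event))
```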
## Use no code editor to create a Stream Analytics job
1. Locate the Resource Group in which the TollApp event generator was deployed.
2. Select the Azure Event Hubs **namespace**.
1. On the **Event Hubs Namespace** page, select **Event Hubs** under **Entities** on the left menu.
1. Select the `entrystream` instance.
:::image type="content" source="./media/stream-analytics-no-code/select-event-hub.png" alt-text="Screenshot showing the selection of the event hub." lightbox="./media/stream-analytics-no-code/select-event-hub.png":::
3. On the **Event Hubs instance** page, select **Process data** in the **Features** section on the left menu.
1. Select **Start** on the **Capture data to ADLS Gen2 in Parquet format** tile.
:::image type="content" source="./media/stream-analytics-no-code/parquet-capture-start.png" alt-text="Screenshot showing the selection of the **Capture data to ADLS Gen2 in Parquet format** tile." lightbox="./media/stream-analytics-no-code/parquet-capture-start.png":::
1. Name your job `parquetcapture` and select **Create**.
:::image type="content" source="./media/stream-analytics-no-code/new-stream-analytics-job.png" alt-text="Screenshot of the New Stream Analytics job page." lightbox="./media/stream-analytics-no-code/new-stream-analytics-job.png":::
1. On the **event hub** configuration page, confirm the following settings, and then select **Connect**.
* Select **Connect**
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png" alt-text="Screenshot showing the configuration settings for the Data Lake Storage." lightbox="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png":::
1. Select **Save** in the top ribbon to save your job, and then select **Start** to run your job. Once the job is started, select X in the right corner to close the **Stream Analytics job** page.
1. You'll then see a list of all Stream Analytics jobs created using the no code editor. Within two minutes, your job will go to a **Running** state. Select the **Refresh** button on the page to see the status changing from Created -> Starting -> Running.
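
You can also poll for the **Running** state programmatically instead of refreshing the page. A minimal sketch, assuming a `get_job_status` callable that stands in for whatever status lookup you use (for example, an Azure SDK or REST call):

```python
import time

def wait_until_running(get_job_status, timeout_s=120, poll_s=5):
    """Poll a status callable until it reports 'Running' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_job_status() == "Running":
            return True
        time.sleep(poll_s)
    return False

# Example with a stubbed status sequence: Created -> Starting -> Running.
states = iter(["Created", "Starting", "Running"])
print(wait_until_running(lambda: next(states), poll_s=0))  # → True
```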
## View output in your Azure Data Lake Storage Gen 2 account
1. Locate the Azure Data Lake Storage Gen2 account you used in the previous step.
2. Select the container you used in the previous step. You'll see parquet files created based on the *{date}/{time}* path pattern specified earlier.

    :::image type="content" source="./media/stream-analytics-no-code/capture-parquet-files.png" alt-text="Screenshot showing the captured parquet files in Azure Data Lake Storage Gen 2.":::
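
The *{date}/{time}* tokens in the path pattern expand to folders derived from the event time. A minimal sketch of that expansion, assuming a `yyyy-MM-dd` date format and an `HH` time format (check your job's capture settings for the actual formats):

```python
from datetime import datetime, timezone

def capture_folder(ts, date_fmt="%Y-%m-%d", time_fmt="%H"):
    """Expand a {date}/{time} path pattern for a given event timestamp."""
    return f"{ts.strftime(date_fmt)}/{ts.strftime(time_fmt)}"

ts = datetime(2022, 8, 2, 14, 30, tzinfo=timezone.utc)
print(capture_folder(ts))  # → 2022-08-02/14
```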
## Query captured data in Parquet format with Azure Synapse Analytics
### Query using Azure Synapse Spark
    ```py
    # df is the DataFrame loaded from the captured parquet files in the preceding step.
    df.count()
    df.printSchema()
    ```
5. For **Attach to** on the toolbar, select your Spark pool from the dropdown list.
1. Select **Run All** to see the results.
:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/spark-run-all.png" alt-text="Screenshot of spark run results in Azure Synapse Analytics." lightbox="./media/event-hubs-parquet-capture-tutorial/spark-run-all.png":::
### Query using Azure Synapse Serverless SQL
1. In the **Develop** hub, create a new **SQL script**.

    :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/develop-sql-script.png" alt-text="Screenshot showing the Develop page with new SQL script menu selected.":::
1. Paste the following script and **Run** it using the **Built-in** serverless SQL endpoint. Replace *container* and *adlsname* with the name of the container and ADLS Gen2 account used in the previous step.
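
Serverless SQL reads the captured files with `OPENROWSET`. The helper below renders an illustrative query of that shape — the URL form, wildcard depth, and the *adlsname*/*container* placeholders are assumptions to adapt, not the exact script from this step:

```python
def openrowset_query(adlsname, container):
    """Render an illustrative serverless SQL query over captured parquet files."""
    return (
        "SELECT TOP 100 *\n"
        "FROM OPENROWSET(\n"
        f"    BULK 'https://{adlsname}.dfs.core.windows.net/{container}/*/*/*.parquet',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )

# Substitute your own storage account and container names.
print(openrowset_query("adlsname", "container"))
```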