articles/stream-analytics/event-hubs-parquet-capture-tutorial.md
author: xujxu
ms.author: xujiang1
ms.service: azure-stream-analytics
ms.topic: tutorial
ms.date: 12/17/2024
---
# Tutorial: Capture Event Hubs data in parquet format and analyze with Azure Synapse Analytics
## Use no code editor to create a Stream Analytics job
1. Locate the resource group in which the TollApp event generator was deployed.
1. Select the Azure Event Hubs **namespace**. You might want to open it in a separate tab or a window.
1. On the **Event Hubs namespace** page, select **Event Hubs** under **Entities** on the left menu.
1. Select the `entrystream` instance.

    :::image type="content" source="./media/stream-analytics-no-code/select-event-hub.png" alt-text="Screenshot showing the selection of the event hub." lightbox="./media/stream-analytics-no-code/select-event-hub.png":::
1. Name your job `parquetcapture` and select **Create**.

    :::image type="content" source="./media/stream-analytics-no-code/new-stream-analytics-job.png" alt-text="Screenshot of the New Stream Analytics job page." lightbox="./media/stream-analytics-no-code/new-stream-analytics-job.png":::
1. On the **event hub** configuration page, follow these steps:

    1. For **Consumer group**, select **Use existing**.
    1. Confirm that the `$Default` consumer group is selected.
    1. Confirm that **Serialization** is set to **JSON**.
    1. Confirm that **Authentication method** is set to **Connection String**.
    1. Confirm that **Event hub shared access key name** is set to **RootManageSharedAccessKey**.
    1. Select **Connect** at the bottom of the window.

    :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/event-hub-configuration.png" alt-text="Screenshot of the configuration page for your event hub." lightbox="./media/event-hubs-parquet-capture-tutorial/event-hub-configuration.png":::
1. Within a few seconds, you'll see sample input data and the schema. You can choose to drop fields, rename fields, or change a data type.

    :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/data-preview.png" alt-text="Screenshot showing the fields and preview of data." lightbox="./media/event-hubs-parquet-capture-tutorial/data-preview.png":::
1. Select the **Azure Data Lake Storage Gen2** tile on your canvas, and configure it by specifying:

    * Subscription in which your Azure Data Lake Storage Gen2 account is located.
    * Storage account name, which should be the same ADLS Gen2 account used with your Azure Synapse Analytics workspace in the Prerequisites section.
    * Container inside which the Parquet files are created.
    * For **Delta table path**, specify a name for the table.
    * Date and time pattern as the default *yyyy-mm-dd* and *HH*.
    * Select **Connect**.

    :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png" alt-text="Screenshot showing the configuration settings for the Data Lake Storage." lightbox="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png":::
1. Select **Save** in the top ribbon to save your job, and then select **Start** to run your job. Once the job is started, select **X** in the right corner to close the **Stream Analytics job** page.
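The default *yyyy-mm-dd* date pattern and *HH* time pattern chosen above determine the folder each captured Parquet file lands in. As a minimal sketch, assuming those default patterns, the mapping from a capture timestamp to a folder path can be expressed as follows (the helper function name is illustrative, not part of the tutorial):

```python
from datetime import datetime, timezone

def capture_folder(ts: datetime) -> str:
    """Return the output folder for a capture timestamp, using the
    default date pattern (yyyy-mm-dd) and time pattern (HH)."""
    return ts.strftime("%Y-%m-%d/%H")

# Example: an event captured at 09:30 UTC on 17 Dec 2024 lands under:
print(capture_folder(datetime(2024, 12, 17, 9, 30, tzinfo=timezone.utc)))  # → 2024-12-17/09
```

Because the time pattern is only the hour, all events captured within the same hour accumulate in one folder, which keeps the number of folders manageable for downstream queries.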
## View output in your Azure Data Lake Storage Gen2 account
1. Locate the Azure Data Lake Storage Gen2 account you used in the previous step.
1. Select the container you used in the previous step. You'll see Parquet files created in the folder you specified earlier.

    :::image type="content" source="./media/stream-analytics-no-code/capture-parquet-files.png" alt-text="Screenshot showing the captured parquet files in Azure Data Lake Storage Gen 2." lightbox="./media/stream-analytics-no-code/capture-parquet-files.png":::
## Query captured data in Parquet format with Azure Synapse Analytics

### Query using Azure Synapse Spark

1. Locate your Azure Synapse Analytics workspace and open Synapse Studio.
1. [Create a serverless Apache Spark pool](../synapse-analytics/get-started-analyze-spark.md#create-a-serverless-apache-spark-pool) in your workspace if one doesn't already exist.
1. In Synapse Studio, go to the **Develop** hub and create a new **Notebook**.

    :::image type="content" source="./media/stream-analytics-no-code/synapse-studio-develop-notebook.png" alt-text="Screenshot showing the Synapse Studio.":::

1. Create a new code cell and paste the following code in that cell. Replace *container* and *adlsname* with the name of the container and ADLS Gen2 account used in the previous step.
1. For **Attach to** on the toolbar, select your Spark pool from the dropdown list.
1. Select **Run All** to see the results.

    :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/spark-run-all.png" alt-text="Screenshot of spark run results in Azure Synapse Analytics." lightbox="./media/event-hubs-parquet-capture-tutorial/spark-run-all.png":::
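The notebook cell referenced above isn't reproduced in this excerpt. As a hedged sketch only, such a cell might look like the following; `container` and `adlsname` are placeholder values you'd replace, and `spark` refers to the SparkSession that Synapse notebooks preconfigure:

```python
# Hypothetical placeholders: replace with the container and the
# ADLS Gen2 account name used in the previous steps.
container = "parquetcapture"
adlsname = "mydatalake"

# ABFSS URI of the container that holds the captured Parquet files.
abfss_path = f"abfss://{container}@{adlsname}.dfs.core.windows.net/"
print(abfss_path)  # → abfss://parquetcapture@mydatalake.dfs.core.windows.net/

# Inside the Synapse notebook, where a SparkSession named `spark` is
# preconfigured, the cell would then read and preview the captured data:
#   df = spark.read.parquet(abfss_path)
#   df.printSchema()
#   df.show(10)
```

The Spark calls are commented out here because they only run inside a Synapse notebook attached to a Spark pool; locally, the snippet just shows how the ABFSS URI is assembled.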