
Commit 1c80847

Author: Jill Grant
Merge pull request #292085 from spelluru/asafreshness1217
ASA Freshness, Acrolynx, Learn Linter
2 parents ad6e799 + ce62a1d commit 1c80847

16 files changed: +69 −101 lines

articles/stream-analytics/event-hubs-parquet-capture-tutorial.md

Lines changed: 21 additions & 16 deletions
@@ -5,7 +5,7 @@ author: xujxu
 ms.author: xujiang1
 ms.service: azure-stream-analytics
 ms.topic: tutorial
-ms.date: 08/03/2023
+ms.date: 12/17/2024
 ---
 
 # Tutorial: Capture Event Hubs data in parquet format and analyze with Azure Synapse Analytics
@@ -31,8 +31,8 @@ Before you start, make sure you've completed the following steps:
 
 ## Use no code editor to create a Stream Analytics job
 1. Locate the Resource Group in which the TollApp event generator was deployed.
-2. Select the Azure Event Hubs **namespace**.
-1. On the **Event Hubs Namespace** page, select **Event Hubs** under **Entities** on the left menu.
+2. Select the Azure Event Hubs **namespace**. You might want to open it in a separate tab or a window.
+1. On the **Event Hubs namespace** page, select **Event Hubs** under **Entities** on the left menu.
 1. Select the `entrystream` instance.
 
 :::image type="content" source="./media/stream-analytics-no-code/select-event-hub.png" alt-text="Screenshot showing the selection of the event hub." lightbox="./media/stream-analytics-no-code/select-event-hub.png":::
@@ -43,24 +43,27 @@ Before you start, make sure you've completed the following steps:
 1. Name your job `parquetcapture` and select **Create**.
 
 :::image type="content" source="./media/stream-analytics-no-code/new-stream-analytics-job.png" alt-text="Screenshot of the New Stream Analytics job page." lightbox="./media/stream-analytics-no-code/new-stream-analytics-job.png":::
-1. On the **event hub** configuration page, confirm the following settings, and then select **Connect**.
-   - *Consumer Group*: Default
-   - *Serialization type* of your input data: JSON
-   - *Authentication mode* that the job will use to connect to your event hub: Connection string.
+1. On the **event hub** configuration page, follow these steps:
+   1. For **Consumer group**, select **Use existing**.
+   1. Confirm that the `$Default` consumer group is selected.
+   1. Confirm that **Serialization** is set to **JSON**.
+   1. Confirm that **Authentication method** is set to **Connection String**.
+   1. Confirm that **Event hub shared access key name** is set to **RootManageSharedAccessKey**.
+   1. Select **Connect** at the bottom of the window.
 
 :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/event-hub-configuration.png" alt-text="Screenshot of the configuration page for your event hub." lightbox="./media/event-hubs-parquet-capture-tutorial/event-hub-configuration.png":::
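The **Connection String** authentication method pairs the namespace endpoint with a shared access key name such as **RootManageSharedAccessKey**. A minimal sketch of which pieces such a string carries; the namespace and key below are made-up placeholders, not values from this tutorial:

```python
# A made-up Event Hubs connection string: the overall shape is real,
# but the namespace and key are placeholders, not tutorial values.
conn = (
    "Endpoint=sb://mynamespace.servicebus.windows.net/;"
    "SharedAccessKeyName=RootManageSharedAccessKey;"
    "SharedAccessKey=abc123fakekey="
)

# Split on ';', then on the FIRST '=' only, because the key value
# itself can end in '=' padding characters.
parts = dict(p.split("=", 1) for p in conn.split(";"))
print(parts["SharedAccessKeyName"])  # RootManageSharedAccessKey
```

The no-code editor fills these values in for you when you select the shared access key name; the sketch only shows what the string encodes.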
-1. Within few seconds, you'll see sample input data and the schema. You can choose to drop fields, rename fields or change data type.
+1. Within a few seconds, you'll see sample input data and the schema. You can choose to drop fields, rename fields, or change the data type.
 
 :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/data-preview.png" alt-text="Screenshot showing the fields and preview of data." lightbox="./media/event-hubs-parquet-capture-tutorial/data-preview.png":::
 1. Select the **Azure Data Lake Storage Gen2** tile on your canvas and configure it by specifying:
    * Subscription where your Azure Data Lake Gen2 account is located
    * Storage account name, which should be the same ADLS Gen2 account used with your Azure Synapse Analytics workspace in the Prerequisites section
    * Container inside which the Parquet files will be created
-   * Path pattern set to *{date}/{time}*
+   * For **Delta table path**, specify a name for the table.
    * Date and time pattern as the default *yyyy-mm-dd* and *HH*
    * Select **Connect**
 
-:::image type="content" source="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png" alt-text="Screenshot showing the configuration settings for the Data Lake Storage." lightbox="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png":::
+   :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png" alt-text="Screenshot showing the configuration settings for the Data Lake Storage." lightbox="./media/event-hubs-parquet-capture-tutorial/data-lake-storage-settings.png":::
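The *{date}/{time}* path pattern, together with the default *yyyy-mm-dd* and *HH* formats, decides which folder each captured file lands in. A small illustration in plain Python (no Azure dependency) of how one event time resolves to an output folder:

```python
from datetime import datetime, timezone

def output_folder(event_time: datetime) -> str:
    """Resolve the {date}/{time} pattern using the default
    yyyy-mm-dd date format and HH time format."""
    return event_time.strftime("%Y-%m-%d") + "/" + event_time.strftime("%H")

# An event captured at 09:30 UTC on 17 Dec 2024 lands under:
print(output_folder(datetime(2024, 12, 17, 9, 30, tzinfo=timezone.utc)))  # 2024-12-17/09
```

Files captured in the same hour accumulate in the same folder, which is what the wildcard queries later in the tutorial rely on.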
 1. Select **Save** in the top ribbon to save your job, and then select **Start** to run your job. Once the job is started, select X in the right corner to close the **Stream Analytics job** page.
 
 :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/start-job.png" alt-text="Screenshot showing the Start Stream Analytics Job page." lightbox="./media/event-hubs-parquet-capture-tutorial/start-job.png":::
@@ -70,24 +73,26 @@ Before you start, make sure you've completed the following steps:
 
 ## View output in your Azure Data Lake Storage Gen 2 account
 1. Locate the Azure Data Lake Storage Gen2 account you used in the previous step.
-2. Select the container you had used in the previous step. You'll see parquet files created based on the *{date}/{time}* path pattern used in the previous step.
+2. Select the container you used in the previous step. You'll see Parquet files created in the folder you specified earlier.
 
-   :::image type="content" source="./media/stream-analytics-no-code/capture-parquet-files.png" alt-text="Screenshot showing the captured parquet files in Azure Data Lake Storage Gen 2.":::
+   :::image type="content" source="./media/stream-analytics-no-code/capture-parquet-files.png" alt-text="Screenshot showing the captured parquet files in Azure Data Lake Storage Gen 2." lightbox="./media/stream-analytics-no-code/capture-parquet-files.png":::
 
 ## Query captured data in Parquet format with Azure Synapse Analytics
 ### Query using Azure Synapse Spark
 1. Locate your Azure Synapse Analytics workspace and open Synapse Studio.
 2. [Create a serverless Apache Spark pool](../synapse-analytics/get-started-analyze-spark.md#create-a-serverless-apache-spark-pool) in your workspace if one doesn't already exist.
 3. In Synapse Studio, go to the **Develop** hub and create a new **Notebook**.
-4. Create a new code cell and paste the following code in that cell. Replace *container* and *adlsname* with the name of the container and ADLS Gen2 account used in the previous step.
+
+   :::image type="content" source="./media/stream-analytics-no-code/synapse-studio-develop-notebook.png" alt-text="Screenshot showing the Synapse Studio.":::
+1. Create a new code cell and paste the following code in that cell. Replace *container* and *adlsname* with the names of the container and the ADLS Gen2 account used in the previous step.
 ```py
 %%pyspark
-df = spark.read.load('abfss://container@adlsname.dfs.core.windows.net/*/*/*.parquet', format='parquet')
+df = spark.read.load('abfss://container@adlsname.dfs.core.windows.net/*/*.parquet', format='parquet')
 display(df.limit(10))
 df.count()
 df.printSchema()
 ```
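Each `*` in the load path matches one folder or file-name level, so the wildcard depth has to mirror the layout the capture job wrote: `*/*.parquet` reads files one folder level down, while the older *{date}/{time}* layout needed `*/*/*.parquet`. The same matching rule can be tried locally with Python's `glob` over a throwaway directory tree (the folder and file names below are invented for the demo):

```python
import glob
import os
import tempfile

# Build a throwaway tree mimicking a date/hour capture layout;
# the files are empty because only path matching is being shown.
root = tempfile.mkdtemp()
for rel in ("2024-12-17/09/part-0001.parquet",
            "2024-12-17/10/part-0002.parquet",
            "2024-12-17/09/not-parquet.json"):
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

# Two folder levels deep, parquet files only: matches 2 of the 3 files.
hits = sorted(glob.glob(os.path.join(root, "*", "*", "*.parquet")))
print(len(hits))  # 2
```

If the query returns no rows, a mismatch between the wildcard depth and the actual folder layout is a likely cause.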
-5. For **Attach to** on the toolbar, select your Spark pool from the dropdown list.
+1. For **Attach to** on the toolbar, select your Spark pool from the dropdown list.
 1. Select **Run All** to see the results
 
 :::image type="content" source="./media/event-hubs-parquet-capture-tutorial/spark-run-all.png" alt-text="Screenshot of spark run results in Azure Synapse Analytics." lightbox="./media/event-hubs-parquet-capture-tutorial/spark-run-all.png":::
@@ -102,7 +107,7 @@ Before you start, make sure you've completed the following steps:
     TOP 100 *
 FROM
     OPENROWSET(
-        BULK 'https://adlsname.dfs.core.windows.net/container/*/*/*.parquet',
+        BULK 'https://adlsname.dfs.core.windows.net/container/*/*.parquet',
         FORMAT='PARQUET'
     ) AS [result]
 ```
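The `BULK` URL here is the https form of the same storage location the Spark cell read through `abfss`; only where the container name sits differs between the two URI styles. A tiny sketch of the relationship, using the same *adlsname* and *container* placeholders as above:

```python
def abfss_uri(account: str, container: str, pattern: str) -> str:
    # abfss:// places the container before the storage account host
    return f"abfss://{container}@{account}.dfs.core.windows.net/{pattern}"

def https_uri(account: str, container: str, pattern: str) -> str:
    # The https form used by OPENROWSET puts the container in the path
    return f"https://{account}.dfs.core.windows.net/{container}/{pattern}"

print(abfss_uri("adlsname", "container", "*/*.parquet"))
print(https_uri("adlsname", "container", "*/*.parquet"))
```

Keeping the wildcard suffix identical in both forms ensures the Spark and serverless SQL queries read the same set of files.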
