
Commit add7dde

Jill Grant authored
Merge pull request #248380 from spelluru/asacaptureparquet0815
updated steps and screenshots, and more
2 parents 34e8e01 + a697a68

11 files changed: +40 −20 lines

articles/stream-analytics/capture-event-hub-data-parquet.md (40 additions, 20 deletions)

ms.author: xujiang1
ms.service: stream-analytics
ms.topic: how-to
ms.custom: mvc, event-tier1-build-2022
ms.date: 08/15/2023
---
1111
# Capture data from Event Hubs in Parquet format

This article explains how to use the no code editor to automatically capture streaming data in Event Hubs in an Azure Data Lake Storage Gen2 account in the Parquet format.

## Prerequisites

- An Azure Event Hubs namespace with an event hub and an Azure Data Lake Storage Gen2 account with a container to store the captured data. These resources must be publicly accessible and can't be behind a firewall or secured in an Azure virtual network.

    If you don't have an event hub, create one by following instructions from [Quickstart: Create an event hub](../event-hubs/event-hubs-create.md).

    If you don't have a Data Lake Storage Gen2 account, create one by following instructions from [Create a storage account](../storage/blobs/create-data-lake-storage-account.md).
- The data in your Event Hubs must be serialized in JSON, CSV, or Avro format. For testing purposes, select **Generate data (preview)** on the left menu, select **Stocks data** for dataset, and then select **Send**.

    :::image type="content" source="./media/capture-event-hub-data-parquet/stocks-data.png" alt-text="Screenshot showing the Generate data page to generate sample stocks data." lightbox="./media/capture-event-hub-data-parquet/stocks-data.png":::

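If you produce your own events instead of using the **Stocks data** sample, they must already be serialized in one of the supported formats before they reach the event hub. As a minimal, stdlib-only sketch of a JSON-serialized event body (the field names here are hypothetical, not the exact schema of the **Stocks data** sample):

```python
import json
from datetime import datetime, timezone

def make_stock_event(symbol: str, price: float) -> bytes:
    """Serialize a hypothetical stock tick to a UTF-8 JSON event body."""
    event = {
        "symbol": symbol,
        "price": price,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(event).encode("utf-8")

body = make_stock_event("MSFT", 331.5)
# The body round-trips as valid JSON, which the no code editor can parse.
print(json.loads(body)["symbol"])
```

A CSV or Avro payload would work equally well; what matters is that every event body is valid in the serialization format you pick in the editor.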
## Configure a job to capture data

Use the following steps to configure a Stream Analytics job to capture data in Azure Data Lake Storage Gen2.

1. In the Azure portal, navigate to your event hub.
1. On the left menu, select **Process Data** under **Features**. Then, select **Start** on the **Capture data to ADLS Gen2 in Parquet format** card.

    :::image type="content" source="./media/capture-event-hub-data-parquet/process-event-hub-data-cards.png" alt-text="Screenshot showing the Process Event Hubs data start cards." lightbox="./media/capture-event-hub-data-parquet/process-event-hub-data-cards.png":::
1. Enter a **name** for your Stream Analytics job, and then select **Create**.

    :::image type="content" source="./media/capture-event-hub-data-parquet/new-stream-analytics-job-name.png" alt-text="Screenshot showing the New Stream Analytics job window where you enter the job name.":::
1. Specify the **Serialization** type of your data in the Event Hubs and the **Authentication method** that the job uses to connect to Event Hubs. Then select **Connect**.

    :::image type="content" source="./media/capture-event-hub-data-parquet/event-hub-configuration.png" alt-text="Screenshot showing the Event Hubs connection configuration." lightbox="./media/capture-event-hub-data-parquet/event-hub-configuration.png":::
1. When the connection is established successfully, you see:
    - Fields that are present in the input data. You can choose **Add field**, or you can select the three-dot symbol next to a field to optionally remove or rename it.
    - A live sample of incoming data in the **Data preview** table under the diagram view. It refreshes periodically. You can select **Pause streaming preview** to view a static view of the sample input.

    :::image type="content" source="./media/capture-event-hub-data-parquet/edit-fields.png" alt-text="Screenshot showing sample data under Data Preview." lightbox="./media/capture-event-hub-data-parquet/edit-fields.png":::
1. Select the **Azure Data Lake Storage Gen2** tile to edit the configuration.
1. On the **Azure Data Lake Storage Gen2** configuration page, follow these steps:
    1. Select the subscription, storage account name, and container from the drop-down menus.
    1. Once the subscription is selected, the authentication method and storage account key should be automatically filled in.
    1. Select **Parquet** for the **Serialization** format.

        :::image type="content" source="./media/capture-event-hub-data-parquet/job-top-settings.png" alt-text="Screenshot showing the Data Lake Storage Gen2 configuration page." lightbox="./media/capture-event-hub-data-parquet/job-top-settings.png":::
    1. For streaming blobs, the directory path pattern is expected to be a dynamic value. The date must be a part of the file path for the blob, referenced as `{date}`. To learn about custom path patterns, see [Azure Stream Analytics custom blob output partitioning](stream-analytics-custom-path-patterns-blob-storage-output.md).

        :::image type="content" source="./media/capture-event-hub-data-parquet/blob-configuration.png" alt-text="First screenshot showing the Blob window where you edit a blob's connection configuration." lightbox="./media/capture-event-hub-data-parquet/blob-configuration.png":::
    1. Select **Connect**.
1. When the connection is established, you see fields that are present in the output data.
1. Select **Save** on the command bar to save your configuration.

    :::image type="content" source="./media/capture-event-hub-data-parquet/save-configuration.png" alt-text="Screenshot showing the Save button selected on the command bar.":::
1. Select **Start** on the command bar to start the streaming flow to capture data. Then, in the **Start Stream Analytics job** window:
    1. Choose the output start time.
    1. Select the pricing plan.
    1. Select the number of Streaming Units (SU) that the job runs with. SU represents the computing resources that are allocated to execute a Stream Analytics job. For more information, see [Streaming Units in Azure Stream Analytics](stream-analytics-streaming-unit-consumption.md).

    :::image type="content" source="./media/capture-event-hub-data-parquet/start-job.png" alt-text="Screenshot showing the Start Stream Analytics job window where you set the output start time, streaming units, and error handling." lightbox="./media/capture-event-hub-data-parquet/start-job.png":::
1. You should see the Stream Analytics job on the **Stream Analytics job** tab of the **Process data** page for your event hub.

    :::image type="content" source="./media/capture-event-hub-data-parquet/process-data-page-jobs.png" alt-text="Screenshot showing the Stream Analytics job on the Process data page." lightbox="./media/capture-event-hub-data-parquet/process-data-page-jobs.png":::

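The `{date}` token in the directory path pattern partitions captured files by date, so each day's Parquet files land under a separate prefix. As a rough, stdlib-only sketch of how such a pattern expands (assuming the default `yyyy/MM/dd` date format and an hour-based `{time}` token; the `stocks` prefix is a hypothetical example, not something the portal requires):

```python
from datetime import datetime, timezone

def expand_path_pattern(pattern: str, when: datetime) -> str:
    """Expand {date} and {time} tokens into a concrete blob path prefix.

    Assumes the default yyyy/MM/dd date format and an hour-based {time};
    the portal lets you choose other date/time formats.
    """
    return (pattern
            .replace("{date}", when.strftime("%Y/%m/%d"))
            .replace("{time}", when.strftime("%H")))

when = datetime(2023, 8, 15, 9, 30, tzinfo=timezone.utc)
print(expand_path_pattern("stocks/{date}/{time}", when))
# stocks/2023/08/15/09
```

Because the expansion is purely date-driven, downstream readers can list a single day's (or hour's) prefix instead of scanning the whole container.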
## Verify output

1. On the Event Hubs instance page for your event hub, select **Generate data**, select **Stocks data** for dataset, and then select **Send** to send some sample data to the event hub.
1. Verify that the Parquet files are generated in the Azure Data Lake Storage container.

    :::image type="content" source="./media/capture-event-hub-data-parquet/verify-captured-data.png" alt-text="Screenshot showing the generated Parquet files in the ADLS container." lightbox="./media/capture-event-hub-data-parquet/verify-captured-data.png":::
1. Select **Process data** on the left menu. Switch to the **Stream Analytics jobs** tab. Select **Open metrics** to monitor the job.

    :::image type="content" source="./media/capture-event-hub-data-parquet/open-metrics-link.png" alt-text="Screenshot showing Open Metrics link selected." lightbox="./media/capture-event-hub-data-parquet/open-metrics-link.png":::

    Here's an example screenshot of metrics showing input and output events.

    :::image type="content" source="./media/capture-event-hub-data-parquet/job-metrics.png" alt-text="Screenshot showing metrics of the Stream Analytics job." lightbox="./media/capture-event-hub-data-parquet/job-metrics.png":::

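Beyond eyeballing the container, you can sanity-check that a downloaded blob really is a Parquet file: per the Apache Parquet format, every valid file begins and ends with the 4-byte magic number `PAR1`. A minimal stdlib-only check (the `capture.parquet` path in the comment is a placeholder for a file you download from the container):

```python
def looks_like_parquet(data: bytes) -> bool:
    """Cheap sanity check: Parquet files start and end with the magic bytes b'PAR1'."""
    return len(data) >= 8 and data[:4] == b"PAR1" and data[-4:] == b"PAR1"

# For a real check, read a downloaded capture file, for example:
# with open("capture.parquet", "rb") as f:
#     print(looks_like_parquet(f.read()))
print(looks_like_parquet(b"PAR1" + b"\x00" * 16 + b"PAR1"))  # True
print(looks_like_parquet(b"not a parquet file"))             # False
```

This only checks the framing, not the contents; to inspect the captured rows themselves you'd open the file with a Parquet reader library.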
## Next steps

