
Commit d2a6f9d

Merge pull request #221811 from xujxu/add-delta-lake-output-no-code

add doc for delta lake capture in no-code

2 parents 26424f5 + 77e623d

15 files changed (+78 −1 lines)

articles/event-hubs/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -210,6 +210,8 @@
   - name: Process data
     items:
+      - name: Capture Event Hubs data in Delta Lake format
+        href: ../stream-analytics/capture-event-hub-data-delta-lake.md?toc=/azure/event-hubs/toc.json
       - name: Capture Event Hubs data in Parquet format
         href: ../stream-analytics/capture-event-hub-data-parquet.md?toc=/azure/event-hubs/toc.json
       - name: Materialize data to Azure Cosmos DB

articles/stream-analytics/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -235,6 +235,8 @@
     href: move-cluster.md
   - name: Build with no code editor
     items:
+      - name: Capture Event Hubs data in Delta Lake format
+        href: capture-event-hub-data-delta-lake.md
       - name: Capture Event Hubs data in Parquet format
         href: capture-event-hub-data-parquet.md
       - name: Materialize data to Azure Cosmos DB
articles/stream-analytics/capture-event-hub-data-delta-lake.md

Lines changed: 73 additions & 0 deletions

@@ -0,0 +1,73 @@
---
title: Capture data from Event Hubs into Azure Data Lake Storage Gen2 in Delta Lake format
description: Learn how to use the no code editor to automatically capture streaming data in Event Hubs in an Azure Data Lake Storage Gen2 account in Delta Lake format.
author: xujxu
ms.author: xujiang1
ms.service: stream-analytics
ms.topic: how-to
ms.custom: mvc, event-tier1-build-2022
ms.date: 12/18/2022
---
# Capture data from Event Hubs in Delta Lake format

This article explains how to use the no code editor to automatically capture streaming data in Event Hubs in an Azure Data Lake Storage Gen2 account in Delta Lake format.

## Prerequisites

- Your Azure Event Hubs and Azure Data Lake Storage Gen2 resources must be publicly accessible and can't be behind a firewall or secured in an Azure Virtual Network.
- The data in your Event Hubs must be serialized in JSON, CSV, or Avro format (a sketch for sending sample JSON events follows this list).
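If your event hub doesn't have events flowing yet, you can send a few sample JSON events so that the capture job has data to preview and write. The following is a minimal sketch using the `azure-eventhub` Python SDK; the connection string, event hub name, and payload fields are placeholder assumptions, not values from this article.

```python
# A minimal sketch for sending sample JSON events with the azure-eventhub SDK.
# The connection string, event hub name, and payload shape are placeholders.
import json

from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_NAMESPACE_CONNECTION_STRING>",
    eventhub_name="<EVENT_HUB_NAME>",
)

with producer:
    batch = producer.create_batch()
    for i in range(10):
        # Each event body is a JSON document, matching one of the
        # serialization formats the no code editor supports.
        event = {"deviceId": f"sensor-{i}", "temperature": 20 + i}
        batch.add(EventData(json.dumps(event)))
    producer.send_batch(batch)
```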
## Configure a job to capture data

Use the following steps to configure a Stream Analytics job to capture data in Azure Data Lake Storage Gen2.

1. In the Azure portal, navigate to your event hub.
1. Select **Features** > **Process Data**, and select **Start** on the **Capture data to ADLS Gen2 in Delta Lake format** card.

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/process-event-hub-data-cards.png" alt-text="Screenshot showing the Process Event Hubs data start cards." lightbox="./media/capture-event-hub-data-delta-lake/process-event-hub-data-cards.png" :::

    Alternatively, select **Features** > **Capture**, select the **Delta Lake** option under **Output event serialization format**, and then select **Start data capture configuration**.

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/create-job-through-capture-blade.png" alt-text="Screenshot showing the entry point of the capture data creation." lightbox="./media/capture-event-hub-data-delta-lake/create-job-through-capture-blade.png" :::

1. Enter a **name** to identify your Stream Analytics job, and then select **Create**.

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/new-stream-analytics-job-name.png" alt-text="Screenshot showing the New Stream Analytics job window where you enter the job name." lightbox="./media/capture-event-hub-data-delta-lake/new-stream-analytics-job-name.png" :::

1. Specify the **Serialization** type of your data in Event Hubs and the **Authentication method** that the job uses to connect to Event Hubs. Then select **Connect**.

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/event-hub-configuration.png" alt-text="Screenshot showing the Event Hubs connection configuration." lightbox="./media/capture-event-hub-data-delta-lake/event-hub-configuration.png" :::

1. When the connection is established successfully, you see:
    - Fields that are present in the input data. You can choose **Add field**, or you can select the three-dot symbol next to a field to optionally remove or rename it.
    - A live sample of incoming data in the **Data preview** table under the diagram view. It refreshes periodically. You can select **Pause streaming preview** to view a static view of the sample input.

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/edit-fields.png" alt-text="Screenshot showing sample data under Data Preview." lightbox="./media/capture-event-hub-data-delta-lake/edit-fields.png" :::

1. Select the **Azure Data Lake Storage Gen2** tile to edit the configuration.
1. On the **Azure Data Lake Storage Gen2** configuration page, follow these steps:
    1. Select the subscription, storage account name, and container from the drop-down menus.
    1. Once the subscription is selected, the authentication method and storage account key are automatically filled in.
    1. Use **Delta table path** to specify the location and name of the Delta Lake table stored in Azure Data Lake Storage Gen2. You can use one or more path segments to define the path to the Delta table and the Delta table name, for example, `folder1/delta-table-name`. To learn more, see [Write to Delta Lake table](./write-to-delta-lake.md).
    1. Select **Connect**.

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/blob-configuration.png" alt-text="First screenshot showing the Blob window where you edit a blob's connection configuration." lightbox="./media/capture-event-hub-data-delta-lake/blob-configuration.png" :::

1. When the connection is established, you see fields that are present in the output data.
1. Select **Save** on the command bar to save your configuration.
1. Select **Start** on the command bar to start the streaming flow to capture data. Then in the Start Stream Analytics job window:
    1. Choose the output start time.
    1. Select the number of Streaming Units (SU) that the job runs with. SU represents the computing resources that are allocated to execute a Stream Analytics job. For more information, see [Streaming Units in Azure Stream Analytics](stream-analytics-streaming-unit-consumption.md).

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/start-job.png" alt-text="Screenshot showing the Start Stream Analytics job window where you set the output start time, streaming units, and error handling." lightbox="./media/capture-event-hub-data-delta-lake/start-job.png" :::

1. After you select **Start**, the job starts running within two minutes, and the metrics open in the tab section below. You can also query these metrics programmatically, as sketched after these steps.

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/metrics-chart-in-tab-section.png" alt-text="Screenshot showing the metrics chart." lightbox="./media/capture-event-hub-data-delta-lake/metrics-chart-in-tab-section.png" :::

1. The new job appears on the **Stream Analytics jobs** tab.

    :::image type="content" source="./media/capture-event-hub-data-delta-lake/open-metrics-link.png" alt-text="Screenshot showing Open Metrics link selected." lightbox="./media/capture-event-hub-data-delta-lake/open-metrics-link.png" :::
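The job metrics shown in the portal can also be retrieved programmatically. The following is a sketch, not part of the no code editor flow: it assumes the `azure-monitor-query` and `azure-identity` Python packages, assumes `InputEvents` and `OutputEvents` are the metric names exposed by the Stream Analytics job, and uses placeholder resource IDs.

```python
# A sketch of querying Stream Analytics job metrics with azure-monitor-query.
# The resource path is a placeholder; InputEvents/OutputEvents are assumed
# metric names for a Stream Analytics job.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())

resource_uri = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.StreamAnalytics/streamingjobs/<job-name>"
)

response = client.query_resource(
    resource_uri,
    metric_names=["InputEvents", "OutputEvents"],
    timespan=timedelta(hours=1),
)

# Print each data point's timestamp and total for the requested metrics.
for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name, point.timestamp, point.total)
```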
## Verify output

Verify that the Parquet data files and the `_delta_log` transaction log that make up the Delta Lake table are generated in the Azure Data Lake Storage container. To inspect the table contents rather than the raw files, you can read the table back with a query engine that supports Delta Lake, as sketched below.

:::image type="content" source="./media/capture-event-hub-data-delta-lake/verify-captured-data.png" alt-text="Screenshot showing the generated Parquet files in the ADLS container." lightbox="./media/capture-event-hub-data-delta-lake/verify-captured-data.png" :::
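The following is a minimal sketch that reads the captured table with Spark and the Delta Lake connector. It assumes the `delta-spark` package is on the classpath, that storage authentication is already configured for the Spark session, and that the container, storage account, and Delta table path are placeholders you replace with your own values.

```python
# A sketch that reads the captured Delta table back with Spark + Delta Lake.
# Storage authentication configuration is omitted; the abfss path segments
# are placeholders for your container, storage account, and Delta table path.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("verify-captured-delta-table")
    # Standard Delta Lake session extensions; requires delta-spark.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

path = "abfss://<container>@<storage-account>.dfs.core.windows.net/<delta-table-path>"
df = spark.read.format("delta").load(path)
df.show(10)
```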
## Next steps

Now you know how to use the Stream Analytics no code editor to create a job that captures Event Hubs data to Azure Data Lake Storage Gen2 in Delta Lake format. Next, you can learn more about Azure Stream Analytics and how to monitor the job that you created.

* [Introduction to Azure Stream Analytics](stream-analytics-introduction.md)
* [Monitor Stream Analytics job with Azure portal](stream-analytics-monitoring.md)

articles/stream-analytics/capture-event-hub-data-parquet.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: Capture data from Azure Data Lake Storage Gen2 in Parquet format
+title: Capture data from Event Hubs into Azure Data Lake Storage Gen2 in Parquet format
 description: Learn how to use the no code editor to automatically capture streaming data in Event Hubs in an Azure Data Lake Storage Gen2 account in Parquet format.
 author: xujxu
 ms.author: xujiang1
6 binary image files added under articles/stream-analytics/media/capture-event-hub-data-delta-lake/ (37.7 KB, 75.4 KB, 95.2 KB, 73 KB, 70.8 KB, 14 KB)
