Commit 7bed598

Merge pull request #199638 from sudhanshuvishodia/suvishod_Integration_using_ADF

Ingestion Using ADF

2 parents f96a218 + 476c2d8

10 files changed: +126 -0 lines changed
articles/postgresql/TOC.yml

Lines changed: 2 additions & 0 deletions

@@ -721,6 +721,8 @@
      items:
      - name: Azure Stream Analytics (ASA)
        href: hyperscale/howto-ingest-azure-stream-analytics.md
+     - name: Azure Data Factory (ADF)
+       href: hyperscale/howto-ingest-azure-data-factory.md
      - name: Server group size
        items:
        - name: Pick initial size
articles/postgresql/hyperscale/howto-ingest-azure-data-factory.md

Lines changed: 124 additions & 0 deletions

@@ -0,0 +1,124 @@
---
title: Azure Data Factory
description: Step-by-step guide for using Azure Data Factory for ingestion on Hyperscale (Citus)
ms.author: suvishod
author: sudhanshuvishodia
ms.service: postgresql
ms.subservice: hyperscale-citus
ms.topic: how-to
ms.date: 06/27/2022
---

# How to ingest data using Azure Data Factory

[Azure Data Factory](../../data-factory/introduction.md) (ADF) is a cloud-based
ETL and data integration service. It allows you to create data-driven workflows
to move and transform data at scale.

Using Azure Data Factory, you can create and schedule data-driven workflows
(called pipelines) that ingest data from disparate data stores. Pipelines can
run on-premises, in Azure, or on other cloud providers for analytics and
reporting.

ADF has a data sink for Hyperscale (Citus). The data sink allows you to bring
your data (relational, NoSQL, data lake files) into Hyperscale (Citus) tables
for storage, processing, and reporting.

![Dataflow diagram for Azure Data Factory.](../media/howto-hyperscale-ingestion/azure-data-factory-architecture.png)
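
The sink writes into an existing table, so the target table should be created
in Hyperscale (Citus) before the pipeline runs. Here's a minimal sketch,
assuming a hypothetical `events` table distributed on `device_id` (the table
and column names are illustrative, not part of this article):

```sql
-- Hypothetical sink table for ADF ingestion; all names are illustrative.
CREATE TABLE events (
    device_id  bigint NOT NULL,
    event_time timestamptz NOT NULL DEFAULT now(),
    payload    jsonb
);

-- Shard the table across worker nodes on device_id with the standard Citus
-- UDF, so ingested rows are spread out for parallel storage and queries.
SELECT create_distributed_table('events', 'device_id');
```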

## ADF for real-time ingestion to Hyperscale (Citus)

Here are key reasons to choose Azure Data Factory for ingesting data into
Hyperscale (Citus):

* **Easy to use** - Offers a code-free visual environment for orchestrating and automating data movement.
* **Powerful** - Uses the full capacity of underlying network bandwidth, up to 5 GiB/s throughput.
* **Built-in connectors** - Integrates all your data sources, with more than 90 built-in connectors.
* **Cost effective** - A pay-as-you-go, fully managed serverless cloud service that scales on demand.

## Steps to use ADF with Hyperscale (Citus)

In this article, we'll create a data pipeline by using the Azure Data Factory
user interface (UI). The pipeline in this data factory copies data from Azure
Blob storage to a database in Hyperscale (Citus). For a list of data stores
supported as sources and sinks, see the [supported data
stores](../../data-factory/copy-activity-overview.md#supported-data-stores-and-formats)
table.

In Azure Data Factory, you can use the **Copy** activity to copy data among
data stores located on-premises and in the cloud to Hyperscale (Citus). If
you're new to Azure Data Factory, here's a quick guide on how to get started:

1. Once ADF is provisioned, go to your data factory. You'll see the Data
   Factory home page as shown in the following image:

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-home.png" alt-text="Landing page of Azure Data Factory." border="true":::

2. On the home page, select **Orchestrate**.

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-orchestrate.png" alt-text="Orchestrate page of Azure Data Factory." border="true":::

3. In the General panel under **Properties**, specify the desired pipeline name.

4. In the **Activities** toolbox, expand the **Move and Transform** category,
   and drag and drop the **Copy Data** activity to the pipeline designer
   surface. Specify the activity name.

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-pipeline-copy.png" alt-text="Pipeline in Azure Data Factory." border="true":::

5. Configure **Source**.

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-configure-source.png" alt-text="Configuring Source in Azure Data Factory." border="true":::

   1. Go to the Source tab. Select **+ New** to create a source dataset.
   2. In the **New Dataset** dialog box, select **Azure Blob Storage**, and then select **Continue**.
   3. Choose the format type of your data, and then select **Continue**.
   4. Under the **Linked service** text box, select **+ New**.
   5. Specify the linked service name and select your storage account from the **Storage account name** list. Test the connection.
   6. Next to **File path**, select **Browse** and select the desired file from Blob storage.
   7. Select **OK** to save the configuration.

6. Configure **Sink**.

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-configure-sink.png" alt-text="Configuring Sink in Azure Data Factory." border="true":::

   1. Go to the Sink tab. Select **+ New** to create a sink dataset.
   2. In the **New Dataset** dialog box, select **Azure Database for PostgreSQL**, and then select **Continue**.
   3. Under the **Linked service** text box, select **+ New**.
   4. Specify the linked service name and select your server group from the list of Hyperscale (Citus) server groups. Add connection details and test the connection.

      > [!NOTE]
      >
      > If your server group is not present in the dropdown, use the **Enter
      > manually** option to add server details.

   5. Select the table name where you want to ingest the data.
   6. Specify **Write method** as COPY command (the sketch after these steps shows a rough psql equivalent).
   7. Select **OK** to save the configuration.

7. From the toolbar above the canvas, select **Validate** to validate pipeline
   settings. Fix any errors, then revalidate until the pipeline passes
   validation.

8. Select **Debug** from the toolbar to execute the pipeline.

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-execute.png" alt-text="Debug and Execute in Azure Data Factory." border="true":::

9. Once the pipeline runs successfully, in the top toolbar, select **Publish
   all**. This action publishes the entities (datasets and pipelines) you
   created to Data Factory.
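
A note on the **Write method** chosen in step 6: the COPY command option loads
rows over PostgreSQL's COPY protocol, which is much faster than row-by-row
inserts. As a rough psql equivalent, reusing the hypothetical `events` table
from earlier (the file name and column list are illustrative):

```sql
-- Bulk-load a CSV through the COPY protocol; this approximates what the
-- "COPY command" write method does for each batch the pipeline copies.
\copy events (device_id, event_time, payload) FROM 'events.csv' WITH (FORMAT csv, HEADER true)

-- Quick sanity check after a debug run: confirm the rows landed.
SELECT count(*) FROM events;
```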

## Calling a stored procedure in ADF

In some scenarios, you might want to call a stored procedure or function, for
example to push aggregated data from a staging table to a summary table. ADF
doesn't currently offer a Stored Procedure activity for Azure Database for
PostgreSQL, but as a workaround you can use the Lookup activity with a query
to call a stored procedure, as shown below:

:::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-call-procedure.png" alt-text="Calling a procedure in Azure Data Factory." border="true":::
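
As a minimal sketch of that workaround, assume hypothetical `events` (staging)
and `events_summary` (rollup) tables; neither name comes from this article.
Define a function that performs the aggregation, then set the Lookup activity's
query to a `SELECT` that invokes it:

```sql
-- Hypothetical summary table, distributed on the same column as events so
-- the rollup can run colocated on the worker nodes.
CREATE TABLE IF NOT EXISTS events_summary (
    device_id   bigint,
    event_day   date,
    event_count bigint,
    PRIMARY KEY (device_id, event_day)
);
SELECT create_distributed_table('events_summary', 'device_id');

-- Aggregate staged rows into the summary table and report rows written.
CREATE OR REPLACE FUNCTION rollup_events() RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    rows_written bigint;
BEGIN
    INSERT INTO events_summary (device_id, event_day, event_count)
    SELECT device_id, event_time::date, count(*)
    FROM events
    GROUP BY device_id, event_time::date
    ON CONFLICT (device_id, event_day)
    DO UPDATE SET event_count = EXCLUDED.event_count;

    GET DIAGNOSTICS rows_written = ROW_COUNT;
    RETURN rows_written;
END;
$$;

-- Use this as the Lookup activity's query; Lookup expects a result row, and
-- the function returns one containing the affected row count.
SELECT rollup_events() AS rows_written;
```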

## Next steps

Learn how to create a [real-time
dashboard](tutorial-design-database-realtime.md) with Hyperscale (Citus).
8 binary image files added (the screenshots and diagram under ../media/howto-hyperscale-ingestion/), ranging from 29 KB to 116 KB.
