---
title: Azure Data Factory
description: Step-by-step guide for using Azure Data Factory for ingestion on Hyperscale (Citus)
ms.author: suvishod
author: sudhanshuvishodia
ms.service: postgresql
ms.subservice: hyperscale-citus
ms.topic: how-to
ms.date: 06/27/2022
---

# How to ingest data using Azure Data Factory

[Azure Data Factory](../../data-factory/introduction.md) (ADF) is a cloud-based
ETL and data integration service. It allows you to create data-driven workflows
to move and transform data at scale.

Using Azure Data Factory, you can create and schedule data-driven workflows
(called pipelines) that ingest data from disparate data stores. Pipelines can
run on-premises, in Azure, or on other cloud providers for analytics and
reporting.

ADF has a data sink for Hyperscale (Citus). The data sink allows you to bring
your data (relational, NoSQL, data lake files) into Hyperscale (Citus) tables
for storage, processing, and reporting.

## ADF for real-time ingestion to Hyperscale (Citus)

Here are key reasons to choose Azure Data Factory for ingesting data into
Hyperscale (Citus):

* **Easy-to-use** - Offers a code-free visual environment for orchestrating and automating data movement.
* **Powerful** - Uses the full capacity of underlying network bandwidth, up to 5 GiB/s throughput.
* **Built-in Connectors** - Integrates all your data sources, with more than 90 built-in connectors.
* **Cost Effective** - Supports a pay-as-you-go, fully managed serverless cloud service that scales on demand.

## Steps to use ADF with Hyperscale (Citus)

In this article, we'll create a data pipeline by using the Azure Data Factory
user interface (UI). The pipeline in this data factory copies data from Azure
Blob storage to a database in Hyperscale (Citus). For a list of data stores
supported as sources and sinks, see the [supported data
stores](../../data-factory/copy-activity-overview.md#supported-data-stores-and-formats)
table.
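
Before the pipeline runs, the destination table typically needs to exist in
Hyperscale (Citus). Here's a minimal sketch of creating and distributing a
target table, assuming a hypothetical `events` table sharded by `device_id`
(adjust the columns to match your source data):

```sql
-- Hypothetical target table for the ingested data; the columns should match
-- the schema of the files in Blob storage.
CREATE TABLE events (
    device_id  bigint,
    event_time timestamptz,
    payload    jsonb
);

-- Shard the table across the Hyperscale (Citus) worker nodes by device_id.
SELECT create_distributed_table('events', 'device_id');
```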

In Azure Data Factory, you can use the **Copy** activity to copy data from
data stores located on-premises and in the cloud to Hyperscale (Citus). If
you're new to Azure Data Factory, here's a quick guide on how to get started:

1. Once ADF is provisioned, go to your data factory. You'll see the Data
   Factory home page as shown in the following image:

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-home.png" alt-text="Landing page of Azure Data Factory." border="true":::

2. On the home page, select **Orchestrate**.

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-orchestrate.png" alt-text="Orchestrate page of Azure Data Factory." border="true":::

3. In the General panel under **Properties**, specify the desired pipeline name.

4. In the **Activities** toolbox, expand the **Move and Transform** category,
   and drag and drop the **Copy Data** activity to the pipeline designer
   surface. Specify the activity name.

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-pipeline-copy.png" alt-text="Pipeline in Azure Data Factory." border="true":::

5. Configure **Source**

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-configure-source.png" alt-text="Configuring the source in Azure Data Factory." border="true":::

   1. Go to the Source tab. Select **+ New** to create a source dataset.
   2. In the **New Dataset** dialog box, select **Azure Blob Storage**, and then select **Continue**.
   3. Choose the format type of your data, and then select **Continue**.
   4. Under the **Linked service** text box, select **+ New**.
   5. Specify the linked service name, select your storage account from the **Storage account name** list, and test the connection.
   6. Next to **File path**, select **Browse** and select the desired file from Blob storage.
   7. Select **OK** to save the configuration.

6. Configure **Sink**

   :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-configure-sink.png" alt-text="Configuring the sink in Azure Data Factory." border="true":::

   1. Go to the Sink tab. Select **+ New** to create a sink dataset.
   2. In the **New Dataset** dialog box, select **Azure Database for PostgreSQL**, and then select **Continue**.
   3. Under the **Linked service** text box, select **+ New**.
   4. Specify the linked service name and select your server group from the list of Hyperscale (Citus) server groups. Add the connection details and test the connection.

      > [!NOTE]
      >
      > If your server group isn't present in the dropdown, use the **Enter
      > manually** option to add the server details.

   5. Select the table name where you want to ingest the data.
   6. Specify the **Write method** as **COPY command**. (A sketch of the roughly equivalent PostgreSQL statement follows these steps.)
   7. Select **OK** to save the configuration.

| 98 | + |
| 99 | +7. From the toolbar above the canvas, select **Validate** to validate pipeline |
| 100 | + settings. Fix errors (if any), revalidate and ensure that the pipeline has |
| 101 | + been successfully validated. |
| 102 | + |
| 103 | +8. Select Debug from the toolbar execute the pipeline. |
| 104 | + |
| 105 | + :::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-execute.png" alt-text="Debug and Execute in of Azure Data Factory." border="true"::: |
| 106 | + |
| 107 | +9. Once the pipeline can run successfully, in the top toolbar, select **Publish |
| 108 | + all**. This action publishes entities (datasets, and pipelines) you created |
| 109 | + to Data Factory. |
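
The **COPY command** write method selected in step 6 loads rows through
PostgreSQL's bulk `COPY` path rather than issuing individual `INSERT`
statements, which is what makes it fast for large files. As a rough sketch,
here's a hand-run equivalent from `psql` plus a post-run sanity check; the
`events` table and `events.csv` file path are the hypothetical names from the
earlier example:

```sql
-- Roughly what the COPY write method does for a CSV source, run by hand
-- from psql; the table name and file path are hypothetical.
\copy events (device_id, event_time, payload) FROM 'events.csv' WITH (FORMAT csv, HEADER true)

-- After the pipeline run, verify that the expected number of rows arrived.
SELECT count(*) FROM events;
```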

## Calling a stored procedure in ADF

In some scenarios, you might want to call a stored procedure or function to
push aggregated data from a staging table to a summary table. ADF doesn't
currently offer a Stored Procedure activity for Azure Database for PostgreSQL,
but as a workaround you can use the Lookup activity with a query that calls
the stored procedure, as shown below:

:::image type="content" source="../media/howto-hyperscale-ingestion/azure-data-factory-call-procedure.png" alt-text="Calling a procedure in Azure Data Factory." border="true":::
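
For example, the summary rollup can live in a single function that the Lookup
activity invokes with a one-line query. The function and table names below
(`rollup_events`, `events_staging`, `events_summary`) are hypothetical; the
function returns the number of rows it wrote so the Lookup activity has a
result row to read.

```sql
-- Hypothetical rollup function: aggregate the staging rows into a summary
-- table, clear the staging table, and report how many summary rows were written.
CREATE OR REPLACE FUNCTION rollup_events()
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
    rows_written bigint;
BEGIN
    INSERT INTO events_summary (device_id, event_day, event_count)
    SELECT device_id, event_time::date, count(*)
    FROM events_staging
    GROUP BY device_id, event_time::date;

    GET DIAGNOSTICS rows_written = ROW_COUNT;

    TRUNCATE events_staging;
    RETURN rows_written;
END;
$$;

-- Query to paste into the Lookup activity:
SELECT rollup_events() AS rows_written;
```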

## Next steps

Learn how to create a [real-time
dashboard](tutorial-design-database-realtime.md) with Hyperscale (Citus).