Commit 5a6debf

feat: Add Azure Data Factory module (#397)
* Add ADF module
* Address code review comments
* Go lint fix
* Format fix
* Upgrade azurerm to 2.9
1 parent 4a15d8d commit 5a6debf

File tree

15 files changed: +805 −0 lines changed
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
# Data Factory

This Terraform-based `data-factory` module grants templates the ability to create a Data Factory instance along with its main components.

## _More on Data Factory_

Azure Data Factory is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

Additionally, you can publish your transformed data to data stores such as Azure SQL Data Warehouse for business intelligence (BI) applications to consume. Ultimately, through Azure Data Factory, raw data can be organized into meaningful data stores and data lakes for better business decisions.

For more information, please see the Microsoft Azure Data Factory [documentation](https://docs.microsoft.com/en-us/azure/data-factory/introduction).

## Characteristics

An instance of the `data-factory` module deploys the _**Data Factory**_ in order to provide templates with the following:

- Ability to provision a single Data Factory instance
- Ability to provision a configurable pipeline
- Ability to configure a trigger
- Ability to configure a SQL Server dataset
- Ability to configure a SQL Server linked service

## Out Of Scope

The following are not supported at this time:

- Creating multiple pipelines
- Datasets and linked services other than SQL Server

## Definition

Terraform resources used to define the `data-factory` module include the following:

- [azurerm_data_factory](https://www.terraform.io/docs/providers/azurerm/r/data_factory.html)
- [azurerm_data_factory_integration_runtime_managed](https://www.terraform.io/docs/providers/azurerm/r/data_factory_integration_runtime_managed.html)
- [azurerm_data_factory_pipeline](https://www.terraform.io/docs/providers/azurerm/r/data_factory_pipeline.html)
- [azurerm_data_factory_trigger_schedule](https://www.terraform.io/docs/providers/azurerm/r/data_factory_trigger_schedule.html)
- [azurerm_data_factory_dataset_sql_server](https://www.terraform.io/docs/providers/azurerm/r/data_factory_dataset_sql_server_table.html)
- [azurerm_data_factory_linked_service_sql_server](https://www.terraform.io/docs/providers/azurerm/r/data_factory_linked_service_sql_server.html)

## Usage

Data Factory usage example:

```hcl
module "data_factory" {
  source                                    = "../../modules/providers/azure/data-factory"
  data_factory_name                         = "adf"
  resource_group_name                       = "rg"
  data_factory_runtime_name                 = "adfrt"
  node_size                                 = "Standard_D2_v3"
  number_of_nodes                           = 1
  edition                                   = "Standard"
  max_parallel_executions_per_node          = 1
  vnet_integration = {
    vnet_id     = "/subscriptions/resourceGroups/providers/Microsoft.Network/virtualNetworks/testvnet"
    subnet_name = "default"
  }
  data_factory_pipeline_name                = "adfpipeline"
  data_factory_trigger_name                 = "adftrigger"
  data_factory_trigger_interval             = 1
  data_factory_trigger_frequency            = "Minute"
  data_factory_dataset_sql_name             = "adfsqldataset"
  data_factory_dataset_sql_table_name       = "adfsqldatasettable"
  data_factory_dataset_sql_folder           = ""
  data_factory_linked_sql_name              = "adfsqllinked"
  data_factory_linked_sql_connection_string = "Server=tcp:adfsql..."
}
```

## Outputs

The output values for this module are available in [output.tf](output.tf).

## Argument Reference

Supported arguments for this module are available in [variables.tf](variables.tf).
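The `vnet_id` passed to `vnet_integration` is an ARM resource ID string that the module simply forwards to the managed integration runtime. As a quick illustration of how such an ID decomposes, here is a hedged sketch; `resourceName` is a hypothetical helper (not part of this module), and the fully-formed ID below is an assumed example rather than one taken from this repo:

```go
package main

import (
	"fmt"
	"strings"
)

// resourceName returns the trailing name segment of an ARM resource ID,
// e.g. the virtual network name from a vnet_id. Hypothetical helper for
// illustration only.
func resourceName(armID string) string {
	parts := strings.Split(strings.Trim(armID, "/"), "/")
	return parts[len(parts)-1]
}

func main() {
	// Assumed, fully-formed vnet_id (subscription/resource group segments
	// are placeholders).
	id := "/subscriptions/0000/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/testvnet"
	fmt.Println(resourceName(id)) // testvnet
}
```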
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
resource "azurerm_data_factory_dataset_sql_server_table" "main" {
  name                = var.data_factory_dataset_sql_name
  resource_group_name = data.azurerm_resource_group.main.name
  data_factory_name   = azurerm_data_factory.main.name
  linked_service_name = azurerm_data_factory_linked_service_sql_server.main.name
  table_name          = var.data_factory_dataset_sql_table_name
}
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
resource "azurerm_data_factory_linked_service_sql_server" "main" {
  name                     = var.data_factory_linked_sql_name
  resource_group_name      = data.azurerm_resource_group.main.name
  data_factory_name        = azurerm_data_factory.main.name
  connection_string        = var.data_factory_linked_sql_connection_string
  integration_runtime_name = azurerm_data_factory_integration_runtime_managed.main.name
}
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
module "azure-provider" {
  source = "../provider"
}

data "azurerm_resource_group" "main" {
  name = var.resource_group_name
}

resource "azurerm_data_factory" "main" {
  # required
  name                = var.data_factory_name
  resource_group_name = data.azurerm_resource_group.main.name
  location            = data.azurerm_resource_group.main.location

  # This will be static as "SystemAssigned" is the only identity available now
  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_data_factory_integration_runtime_managed" "main" {
  name                             = var.data_factory_runtime_name
  data_factory_name                = azurerm_data_factory.main.name
  resource_group_name              = data.azurerm_resource_group.main.name
  location                         = data.azurerm_resource_group.main.location
  node_size                        = var.node_size
  number_of_nodes                  = var.number_of_nodes
  edition                          = var.edition
  max_parallel_executions_per_node = var.max_parallel_executions_per_node

  vnet_integration {
    vnet_id     = var.vnet_integration.vnet_id
    subnet_name = var.vnet_integration.subnet_name
  }
}

resource "azurerm_data_factory_pipeline" "main" {
  name                = var.data_factory_pipeline_name
  resource_group_name = data.azurerm_resource_group.main.name
  data_factory_name   = azurerm_data_factory.main.name
}

resource "azurerm_data_factory_trigger_schedule" "main" {
  name                = var.data_factory_trigger_name
  data_factory_name   = azurerm_data_factory.main.name
  resource_group_name = data.azurerm_resource_group.main.name
  pipeline_name       = azurerm_data_factory_pipeline.main.name

  interval  = var.data_factory_trigger_interval
  frequency = var.data_factory_trigger_frequency
}
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
output "resource_group_name" {
  description = "The resource group name of the Data Factory."
  value       = data.azurerm_resource_group.main.name
}

output "data_factory_name" {
  description = "The name of the Azure Data Factory created"
  value       = azurerm_data_factory.main.name
}

output "data_factory_id" {
  description = "The ID of the Azure Data Factory created"
  value       = azurerm_data_factory.main.id
}

output "identity_principal_id" {
  description = "The ID of the principal (client) in Azure Active Directory"
  value       = azurerm_data_factory.main.identity[0].principal_id
}

output "pipeline_name" {
  description = "The name of the pipeline created"
  value       = azurerm_data_factory_pipeline.main.name
}

output "trigger_interval" {
  description = "The trigger interval time for the pipeline created"
  value       = azurerm_data_factory_trigger_schedule.main.interval
}

output "sql_dataset_id" {
  description = "The ID of the SQL Server dataset created"
  value       = azurerm_data_factory_dataset_sql_server_table.main.id
}

output "sql_linked_service_id" {
  description = "The ID of the SQL Server linked service created"
  value       = azurerm_data_factory_linked_service_sql_server.main.id
}

output "adf_identity_principal_id" {
  description = "The ID of the principal (client) in Azure Active Directory"
  value       = azurerm_data_factory.main.identity[0].principal_id
}

output "adf_identity_tenant_id" {
  description = "The tenant ID for the service principal associated with the managed identity of this Data Factory."
  value       = azurerm_data_factory.main.identity[0].tenant_id
}
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
resource_group_name                       = ""
data_factory_name                         = ""
data_factory_runtime_name                 = ""
data_factory_pipeline_name                = ""
data_factory_dataset_sql_name             = ""
data_factory_dataset_sql_table_name       = ""
data_factory_linked_sql_name              = ""
data_factory_linked_sql_connection_string = ""
vnet_integration = {
  vnet_id     = ""
  subnet_name = ""
}
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
RESOURCE_GROUP_NAME="..."
STORAGE_ACCOUNT_NAME="..."
CONTAINER_NAME="..."
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
package integraton

import (
	"os"
	"testing"

	"github.com/microsoft/cobalt/infra/modules/providers/azure/data-factory/tests"
	"github.com/microsoft/terratest-abstraction/integration"
)

var subscription = os.Getenv("ARM_SUBSCRIPTION_ID")

func TestDataFactory(t *testing.T) {
	testFixture := integration.IntegrationTestFixture{
		GoTest:                t,
		TfOptions:             tests.DataFactoryTFOptions,
		ExpectedTfOutputCount: 8,
		TfOutputAssertions: []integration.TerraformOutputValidation{
			VerifyCreatedDataFactory(subscription,
				"resource_group_name",
				"data_factory_name",
			),
			VerifyCreatedPipeline(subscription,
				"resource_group_name",
				"data_factory_name",
				"pipeline_name",
			),
			VerifyCreatedDataset(subscription,
				"resource_group_name",
				"data_factory_name",
				"sql_dataset_id",
			),
			VerifyCreatedLinkedService(subscription,
				"resource_group_name",
				"data_factory_name",
				"sql_linked_service_id",
			),
		},
	}
	integration.RunIntegrationTests(&testFixture)
}
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
package integraton

import (
	"testing"

	"github.com/microsoft/cobalt/test-harness/terratest-extensions/modules/azure"
	"github.com/microsoft/terratest-abstraction/integration"
	"github.com/stretchr/testify/require"
)

// healthCheck - Asserts that the deployment was successful.
func healthCheck(t *testing.T, provisionState *string) {
	require.Equal(t, "Succeeded", *provisionState, "The deployment hasn't succeeded.")
}

// VerifyCreatedDataFactory - validate the created data factory
func VerifyCreatedDataFactory(subscriptionID, resourceGroupOutputName, dataFactoryOutputName string) func(goTest *testing.T, output integration.TerraformOutput) {
	return func(goTest *testing.T, output integration.TerraformOutput) {
		dataFactory := output[dataFactoryOutputName].(string)
		resourceGroup := output[resourceGroupOutputName].(string)

		dataFactoryNameFromAzure := azure.GetDataFactoryNameByResourceGroup(
			goTest,
			subscriptionID,
			resourceGroup)

		require.Equal(goTest, dataFactoryNameFromAzure, dataFactory, "The data factory does not exist")
	}
}

// VerifyCreatedPipeline - validate the pipeline name for the created data factory
func VerifyCreatedPipeline(subscriptionID, resourceGroupOutputName, dataFactoryOutputName, pipelineOutputName string) func(goTest *testing.T, output integration.TerraformOutput) {
	return func(goTest *testing.T, output integration.TerraformOutput) {
		pipelineNameFromOutput := output[pipelineOutputName].(string)

		dataFactory := output[dataFactoryOutputName].(string)
		resourceGroup := output[resourceGroupOutputName].(string)

		pipelineNameFromAzure := azure.GetPipeLineNameByDataFactory(
			goTest,
			subscriptionID,
			resourceGroup,
			dataFactory)

		require.Equal(goTest, pipelineNameFromAzure, pipelineNameFromOutput, "The pipeline does not exist in the data factory")
	}
}

// VerifyCreatedDataset - validate the SQL dataset for the created data factory
func VerifyCreatedDataset(subscriptionID, resourceGroupOutputName, dataFactoryOutputName, datasetOutputID string) func(goTest *testing.T, output integration.TerraformOutput) {
	return func(goTest *testing.T, output integration.TerraformOutput) {
		datasetIDFromOutput := output[datasetOutputID].(string)

		dataFactory := output[dataFactoryOutputName].(string)
		resourceGroup := output[resourceGroupOutputName].(string)

		datasetIDFromAzure := azure.ListDatasetIDByDataFactory(goTest,
			subscriptionID,
			resourceGroup,
			dataFactory)

		require.Contains(goTest, *datasetIDFromAzure, datasetIDFromOutput, "The dataset does not exist")
	}
}

// VerifyCreatedLinkedService - validate the SQL linked service for the created data factory
func VerifyCreatedLinkedService(subscriptionID, resourceGroupOutputName, dataFactoryOutputName, linkedServiceIDOutputName string) func(goTest *testing.T, output integration.TerraformOutput) {
	return func(goTest *testing.T, output integration.TerraformOutput) {
		linkedServiceIDFromOutput := output[linkedServiceIDOutputName].(string)

		dataFactory := output[dataFactoryOutputName].(string)
		resourceGroup := output[resourceGroupOutputName].(string)

		linkedServiceIDFromAzure := azure.ListLinkedServicesIDByDataFactory(goTest,
			subscriptionID,
			resourceGroup,
			dataFactory)

		require.Contains(goTest, *linkedServiceIDFromAzure, linkedServiceIDFromOutput, "The linked service does not exist")
	}
}
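The validation helpers above all follow the same closure pattern: capture the output names up front, then return a function that compares the Terraform outputs against values fetched from Azure. A stripped-down sketch of that pattern, with a stand-in output map instead of the Azure and terratest-abstraction dependencies (`TerraformOutput` and `verifyOutputEquals` here are simplified illustrations, not the real types):

```go
package main

import "fmt"

// TerraformOutput stands in for integration.TerraformOutput: a map from
// Terraform output names to their values.
type TerraformOutput map[string]interface{}

// verifyOutputEquals mirrors the VerifyCreated* helpers: it captures the
// output name and expected value, and returns a validation closure that
// runs against the outputs later.
func verifyOutputEquals(outputName, expected string) func(TerraformOutput) error {
	return func(output TerraformOutput) error {
		got, ok := output[outputName].(string)
		if !ok || got != expected {
			return fmt.Errorf("output %q = %v, want %q", outputName, output[outputName], expected)
		}
		return nil
	}
}

func main() {
	output := TerraformOutput{"data_factory_name": "adftest"}
	check := verifyOutputEquals("data_factory_name", "adftest")
	fmt.Println(check(output)) // <nil>
}
```

In the real helpers the "expected" side comes from a live Azure query rather than a fixed string, but the capture-then-validate shape is the same.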
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
resource_group_name = "adftest"
2+
data_factory_name = "adftest"
3+
data_factory_runtime_name = "adfrttest"
4+
data_factory_pipeline_name = "testpipeline"
5+
data_factory_trigger_name = "testtrigger"
6+
data_factory_dataset_sql_name = "testsql"
7+
data_factory_dataset_sql_table_name = "adfsqltableheba"
8+
data_factory_linked_sql_name = "testlinkedsql"
9+
data_factory_linked_sql_connection_string = "connectionstring"
10+
vnet_integration = {
11+
vnet_id = "/subscriptions/resourceGroups/providers/Microsoft.Network/virtualNetworks/testvnet"
12+
subnet_name = "default"
13+
}
