**README.md**
This Terraform module exports Azure cost-related data and forwards it to AWS S3. The supported data sets are described below:
- **Cost Data**: Daily parquet files containing standardized cost and usage details in FOCUS format; the daily schedule requires an end date, which defaults to 15 years from deployment but can be changed with the module variable `cost_export_daily_schedule_to_years`
- **Carbon Emissions Data**: Monthly JSON reports with carbon footprint metrics across Scope 1 and Scope 3 emissions
The module creates three distinct export pipelines, one for each data set:
1. **Daily Export**: Cost Management exports daily FOCUS-format cost data (Parquet files) to Azure Storage
2. **Event Trigger**: Blob creation events trigger the `CostExportProcessor` function via a storage queue
3. **Processing**: The function processes and transforms the data (removes sensitive columns, restructures paths)
4. **Upload**: Processed data is uploaded to S3 in a partitioned structure: `billing_period=YYYYMMDD/`; cost data for all billing accounts is written to the same folder, with each parquet object prefixed with the billing account name
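To make the upload layout concrete, a key-composition sketch (the `_partN.parquet` object-name suffix and the account name shown are illustrative assumptions, not the module's exact naming):

```python
def s3_key(billing_period: str, billing_account: str, part: int) -> str:
    """Compose a partitioned S3 key: all billing accounts share one folder
    per period, disambiguated by a billing-account prefix on the object name."""
    return f"billing_period={billing_period}/{billing_account}_part{part}.parquet"

# e.g. s3_key("20250101", "contoso-ea", 0)
#   -> "billing_period=20250101/contoso-ea_part0.parquet"
```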
#### Azure Advisor Recommendations Pipeline
1. **Daily Trigger**: The `AdvisorRecommendationsExporter` function runs daily at 2 AM (timer trigger)
- Batches the API calls per 100 subscriptions and merges the resulting datasets into one - refer to "subscription batching" below.
- Processing: Response data is formatted as JSON with dynamic date range validation (12-month rolling window)
- Upload: JSON data is uploaded to S3 in a partitioned structure: `billing_period=YYYYMMDD/`
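The per-100 batching and merge step can be sketched as follows (function names are illustrative, not the module's actual code):

```python
from typing import Callable, Iterator

def batch_subscriptions(subscription_ids: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield chunks of at most batch_size subscription IDs (the API limit is 100)."""
    for i in range(0, len(subscription_ids), batch_size):
        yield subscription_ids[i:i + batch_size]

def fetch_all(subscription_ids: list[str],
              call_api: Callable[[list[str]], list[dict]]) -> list[dict]:
    """Call the API once per batch and merge the per-batch datasets into one."""
    merged: list[dict] = []
    for batch in batch_subscriptions(subscription_ids):
        merged.extend(call_api(batch))
    return merged
```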
##### Carbon API Date Range Calculation
The Carbon Optimization API provides a rolling 12-month window of emissions data. The available date range is calculated dynamically based on Microsoft's data availability policy:
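As an illustrative sketch of that calculation (assuming, per the note in the backfill section below, that a month's data is published on the 19th of the following month, and that a rolling 12 months are retained):

```python
from datetime import date

def _shift_month(year: int, month: int, delta: int) -> tuple[int, int]:
    """Shift a (year, month) pair by delta months."""
    total = year * 12 + (month - 1) + delta
    return total // 12, total % 12 + 1

def carbon_available_range(today: date) -> tuple[date, date]:
    """Return the (earliest, latest) available months as first-of-month dates.

    Assumption: a month's data is published on the 19th of the following
    month, and a rolling 12-month window is retained.
    """
    months_back = 1 if today.day >= 19 else 2  # latest complete, published month
    end_y, end_m = _shift_month(today.year, today.month, -months_back)
    start_y, start_m = _shift_month(end_y, end_m, -11)  # 12-month window
    return date(start_y, start_m, 1), date(end_y, end_m, 1)
```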
The Carbon Optimization API has a maximum limit of 100 subscriptions per request.
- All data transfers secured with cross-cloud federation (no long-lived AWS credentials)
- Application Insights provides telemetry and monitoring for all pipelines
## Backfill
### FOCUS Cost Data

**Endpoint**: `POST /api/cost-export-backfill`

Can be called on demand with a mandatory parameter `start_date` in the format YYYY-MM-DD.
The cost export has two separate lock files: one for the schedule (which creates the backfill of Cost Management export tasks for each month) and one for the run (the execution of those exports, in batches of six, i.e. half a year). Lock objects are created only after the schedule has been created successfully, or once a full run across all tasks has completed successfully.
To run the full backfill of tasks, simply run this cost export backfill task repeatedly. If a task is already running, it will not interrupt the running task, but it will count as one of the batch of six. Each task takes around 15 minutes to run, and tasks run concurrently.
The schedule will be created for every month from the given backfill start date up until last month.
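The schedule enumeration can be sketched as follows (a minimal illustration, not the module's code; it assumes whole months from the start date through the month before the current one):

```python
from datetime import date

def schedule_months(start: date, today: date) -> list[str]:
    """List YYYY-MM months from `start` up to the month before `today`."""
    months = []
    y, m = start.year, start.month
    while (y, m) < (today.year, today.month):
        months.append(f"{y:04d}-{m:02d}")
        y, m = (y, m + 1) if m < 12 else (y + 1, 1)  # advance one month
    return months
```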
To remove the lock object, contact Appvia support.
**Query Parameters**:

- `start_date` - the backfill start date in the format YYYY-MM-DD (e.g. 2025-01-01); there is no default, a value must be given
- `force_overwrite=true` - Overwrite existing data files (default: false); also set `skip_existing=false`
- `skip_existing=false` - Process all months regardless of existing data (default: true)
**Examples**:

- `POST /api/cost-export-backfill?start_date=2025-01-01` - Skip months that already have data (idempotent)
- `POST /api/cost-export-backfill?start_date=2025-01-01&force_overwrite=true` - Overwrite all existing data
- `POST /api/cost-export-backfill?start_date=2025-01-01&skip_existing=false` - Process all months, even where a cost export already exists
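Any HTTP client can issue these calls; a small sketch that composes the request URL with the Python standard library (the function-app host is a deployment-specific placeholder):

```python
from urllib.parse import urlencode

# Placeholder host - the real function-app URL is deployment-specific.
BASE_URL = "https://<function-app>.azurewebsites.net"

def cost_backfill_url(start_date: str, **options: str) -> str:
    """Compose the cost-export backfill URL; POST it with any HTTP client."""
    query = urlencode({"start_date": start_date, **options})
    return f"{BASE_URL}/api/cost-export-backfill?{query}"

# e.g. POST cost_backfill_url("2025-01-01", skip_existing="false")
# to process every month from January 2025 onwards.
```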
### Carbon Emissions Data

**Endpoint**: `POST /api/carbon-backfill`

Can be called on demand with a mandatory parameter `start_date` in the format YYYY-MM-DD. It calls the same API as the monthly trigger, but for each month from the given start date.
Uses a "carbon export" lock object on the target S3 bucket as a semaphore; if the lock object exists, the carbon data backfill is skipped. The lock object is created only once a full carbon export backfill has completed successfully.
The Carbon Management API only provides up to 12 months of archive data; where the backfill start date precedes this 12-month window, an empty file is written. The backfill will run from the start date up until the month prior to the current carbon export (note the 19th-of-the-month availability - see above).
To remove the lock object, contact Appvia support.
**Query Parameters**:

- `start_date` - the backfill start date in the format YYYY-MM-DD (e.g. 2025-01-01); there is no default, a value must be given
- `force_overwrite=true` - Overwrite existing data files (default: false); also set `skip_existing=false`
- `skip_existing=false` - Process all months regardless of existing data (default: true)
- `write_empty_object` - If no data exists for a given month, writes an empty export (default: true)
**Examples**:

- `POST /api/carbon-backfill?start_date=2025-01-01` - Skip months that already have data (idempotent)
- `POST /api/carbon-backfill?start_date=2025-01-01&force_overwrite=true` - Overwrite all existing data
- `POST /api/carbon-backfill?start_date=2025-01-01&skip_existing=false` - Process all months, even where a carbon export already exists
### Recommendations

We don't provide a backfill for this dataset.
### Backfill timer

Runs every weekday at 6 AM GMT and automatically runs the backfill for cost exports and carbon exports; first costs, then carbon.
The Appvia analytics team can delete the associated lock file for each tenant to force re-running the backfill. Because the cost export backfill only runs batches of six, it will take multiple days to export a full backfill schedule.
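As a rough worked example of the pacing this implies (the month count is illustrative, and it assumes one batch of six tasks per weekday run):

```python
import math

months_to_backfill = 39  # e.g. January 2022 through March 2025 inclusive
batch_size = 6           # cost-export tasks executed per backfill invocation
weekday_runs = math.ceil(months_to_backfill / batch_size)  # -> 7 weekday runs
```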
The backfill start date (`backfill_start_date`) Terraform module variable must be explicitly set.
## Security Features

- **Private Networking**: All components use private endpoints and VNet integration
- **Zero Trust**: No public network access (except during deployment if `deploy_from_external_network=true`)
> If you don't have a suitable existing Virtual Network with two subnets (one of which has a delegation to Microsoft.App.environments),
> please refer to the example configuration [here](examples/existing-infrastructure), which provisions the prerequisite baseline infrastructure before consuming the module.
## Testing

This module includes comprehensive tests for the carbon export functionality, including dynamic date range calculations, idempotency features, and subscription batching logic.

### Running Tests Locally
The `terraform-docs` utility is used to generate this README.
| <a name="input_virtual_network_resource_group_name"></a> [virtual\_network\_resource\_group\_name](#input\_virtual\_network\_resource\_group\_name) | Name of the existing resource group where the virtual network is located | `string` | n/a | yes |
| <a name="input_aws_region"></a> [aws\_region](#input\_aws\_region) | AWS region for the S3 bucket | `string` | `"eu-west-2"` | no |
| <a name="input_aws_s3_bucket_name"></a> [aws\_s3\_bucket\_name](#input\_aws\_s3\_bucket\_name) | Name of the AWS S3 bucket to store cost data | `string` | `"uk-gov-gds-cost-inbound-azure"` | no |
| <a name="input_backfill_start_date"></a> [backfill\_start\_date](#input\_backfill\_start\_date) | The year and month to start backfill - in the format 'YYYY-MM-01'; defaults to 2022-01-01 | `string` | `"2022-01-01"` | no |
| <a name="input_cost_export_daily_schedule_to_years"></a> [cost\_export\_daily\_schedule\_to\_years](#input\_cost\_export\_daily\_schedule\_to\_years) | The number of years from initial deployment to set the end date of the daily schedule for cost export | `number` | `15` | no |
| <a name="input_current_principal_type"></a> [current\_principal\_type](#input\_current\_principal\_type) | Type of the current principal running Terraform. Set to 'ServicePrincipal' when running in CI/CD with a service principal, 'User' for interactive usage. | `string` | `"User"` | no |
| <a name="input_deploy_from_external_network"></a> [deploy\_from\_external\_network](#input\_deploy\_from\_external\_network) | If you don't have existing GitHub runners in the same virtual network, set this to true. This will enable 'public' access to the function app during deployment. This is added for convenience and is not recommended in production environments | `bool` | `false` | no |
| <a name="input_focus_dataset_version"></a> [focus\_dataset\_version](#input\_focus\_dataset\_version) | Version of the cost and usage details (FOCUS) dataset to use | `string` | `"1.0r2"` | no |
| <a name="input_location"></a> [location](#input\_location) | The Azure region where resources will be created | `string` | `"uksouth"` | no |
| <a name="input_logging_level"></a> [logging\_level](#input\_logging\_level) | Logging level for the app; can be DEBUG or INFO (default) | `string` | `"INFO"` | no |
**examples/basic/main.tf** (excerpt):

    # Setting to false or omitting this argument assumes that you have private GitHub runners
    # configured in the existing virtual network. It is not recommended to set this to true in production
    deploy_from_external_network = false

    backfill_start_date = "2022-01-01"
    logging_level       = "DEBUG" # INFO (default) or DEBUG

    # Uncomment the following line if running in CI/CD with a service principal