Commit 1f19ff4

Merge pull request #229147 from ShawnJackson/troubleshoot-manifest-ingestion

[AQ] edit pass: troubleshoot-manifest-ingestion

2 parents: 7685221 + c64655a

1 file changed: 121 additions, 91 deletions
---
title: Troubleshoot manifest ingestion in Microsoft Azure Data Manager for Energy Preview
description: Find out how to troubleshoot manifest ingestion by using Airflow task logs.
author: bharathim
ms.author: bselvaraj
ms.service: energy-data-services
ms.topic: troubleshooting-general
ms.date: 02/06/2023
---

# Troubleshoot manifest ingestion problems by using Airflow task logs

This article helps you troubleshoot workflow problems with manifest ingestion in Azure Data Manager for Energy Preview by using Airflow task logs.

## Manifest ingestion DAG workflow types

There are two types of directed acyclic graph (DAG) workflows for manifest ingestion: single manifest and batch upload.
### Single manifest

One single manifest file is used to trigger the manifest ingestion workflow.

|DagTaskName value |Description |
|---------|---------|
|`update_status_running_task` | Calls the workflow service and marks the status of the DAG as `running` in the database. |
|`check_payload_type` | Validates whether the type of ingestion is batch or single manifest.|
|`validate_manifest_schema_task` | Ensures that all the schema types mentioned in the manifest are present and there's referential schema integrity. All invalid values are evicted from the manifest. |
|`provide_manifest_intergrity_task` | Validates references inside the OSDU™ R3 manifest and removes invalid entities. This operator is responsible for parent/child validation. All orphan-like entities are logged and excluded from the validated manifest. Any external referenced records are searched. If none are found, the manifest entity is dropped. All surrogate key references are also resolved. |
|`process_single_manifest_file_task` | Performs ingestion of the final manifest entities obtained from the previous step. Data records are ingested via the storage service. |
|`update_status_finished_task` | Calls the workflow service and marks the status of the DAG as `finished` or `failed` in the database. |
### Batch upload

Multiple manifest files are part of the same workflow service request. The manifest section in the request payload is a list instead of a dictionary of items.

|DagTaskName value |Description |
|---------|---------|
|`update_status_running_task` | Calls the workflow service and marks the status of the DAG as `running` in the database. |
|`check_payload_type` | Validates whether the type of ingestion is batch or single manifest.|
|`batch_upload` | Divides the list of manifests into three batches to be processed in parallel. (No task logs are emitted.) |
|`process_manifest_task_(1 / 2 / 3)` | Divides the list of manifests into groups of three and processes them. All the steps performed in `validate_manifest_schema_task`, `provide_manifest_intergrity_task`, and `process_single_manifest_file_task` are condensed and performed sequentially in these tasks. |
|`update_status_finished_task` | Calls the workflow service and marks the status of the DAG as `finished` or `failed` in the database. |
Based on the payload type (single or batch), the `check_payload_type` task chooses the appropriate branch and skips the tasks in the other branch.
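
A minimal sketch of that branching decision, based only on the payload shape described earlier (a list-valued manifest section means batch; a dictionary means single manifest). The `manifest` key name is an assumption for illustration, and the real `check_payload_type` operator may inspect more than this:

```python
def choose_branch(request_payload):
    """Pick the DAG branch from the shape of the manifest section.

    Simplified sketch: per the article, a batch request carries the
    manifest section as a list, and a single-manifest request carries
    it as a dictionary of items.
    """
    manifest_section = request_payload.get("manifest")
    if isinstance(manifest_section, list):
        return "batch_upload"
    if isinstance(manifest_section, dict):
        return "process_single_manifest_file_task"
    raise ValueError("manifest section must be a list or a dictionary")
```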

## Prerequisites

You should have integrated Airflow task logs with Azure Monitor. See [Integrate Airflow logs with Azure Monitor](how-to-integrate-airflow-logs-with-azure-monitor.md).

The following columns are exposed in Airflow task logs for you to debug the problem:

|Parameter name |Description |
|---------|---------|
|`RunID` | Unique run ID of the triggered DAG run. |
|`CorrelationID` | Unique correlation ID of the DAG run (same as the run ID). |
|`DagName` | DAG workflow name. For instance, `Osdu_ingest` is the workflow name for manifest ingestion. |
|`DagTaskName` | Task name for the DAG workflow. For instance, `update_status_running_task` is a task name for manifest ingestion. |
|`Content` | Error log messages (errors or exceptions) that Airflow emits during the task execution.|
|`LogTimeStamp` | Time interval of DAG runs. |
|`LogLevel` | Log level. Values are `DEBUG`, `INFO`, `WARNING`, and `ERROR`. You can see most exception and error messages by filtering at the `ERROR` level. |
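
For example, you might combine these columns to surface most errors for one run with a query like the following (illustrative; adjust the filters to your workspace and run):

```kusto
AirflowTaskLogs
| where DagName == "Osdu_ingest"
| where RunID == '<run_id>'
| where LogLevel == "ERROR"
| project LogTimeStamp, DagTaskName, Content
| order by LogTimeStamp asc
```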

## Failed DAG run

The workflow run failed in `update_status_running_task` or `update_status_finished_task`, and the data records weren't ingested.

### Possible reasons

* The data partition ID is incorrect.
* A key name in the execution context of the request body is incorrect.
* The workflow service isn't running or is throwing 5xx errors.
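
Because the first two reasons are payload mistakes, a small pre-flight check before you trigger the workflow run can catch them early. This is a hypothetical sketch: the expected key names and the partition-ID rule are placeholders, so validate against your own instance's request contract:

```python
def preflight_check(execution_context, expected_keys, partition_id):
    """Collect payload problems before triggering the workflow run.

    Illustrative sketch: expected_keys and the partition-ID rule are
    placeholders, not the workflow service's actual validation.
    """
    problems = []
    if not partition_id or " " in partition_id:
        problems.append(f"suspicious data partition ID: {partition_id!r}")
    for key in expected_keys:
        if key not in execution_context:
            # A misspelled key name here produces a failed DAG run later.
            problems.append(f"missing execution-context key: {key!r}")
    return problems
```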

### Workflow status

The workflow status is marked as `failed`.

### Solution

Check the Airflow task logs for `update_status_running_task` or `update_status_finished_task`. Fix the payload by passing the correct data partition ID or key name.

Sample Kusto query:

```kusto
AirflowTaskLogs
| where DagName == "Osdu_ingest"
| where RunID == '<run_id>'
```

Sample trace output:

```md
[2023-02-05, 12:21:54 IST] {taskinstance.py:1703} ERROR - Task failed with exception
Traceback (most recent call last):
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://contoso.energy.azure.com/api/workflow/v1/workflow/Osdu_ingest/workflowRun/e9a815f2-84f5-4513-9825-4d37ab291264
```

## Failed schema validation

Records weren't ingested because schema validation failed.

### Possible reasons

* The schema service is throwing "Schema not found" errors.
* The manifest body doesn't conform to the schema type.
* The schema references are incorrect.
* The schema service is throwing 5xx errors.
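
You can catch the most basic conformance problem (a missing or malformed `kind` value) before submitting with a check like this simplified sketch. It is not the schema service's validation, which evaluates the full schema for each kind:

```python
def basic_entity_check(entity):
    """Return basic conformance problems for one manifest entity.

    Highly simplified stand-in for validate_manifest_schema_task: it only
    verifies that a kind value is present and has the four-segment
    authority:source:type:version shape that schema lookups use.
    """
    problems = []
    kind = entity.get("kind")
    if not kind:
        problems.append("missing kind")
    elif len(kind.split(":")) != 4:
        # Well-formed example: osdu:wks:work-product-component--WellLog:2.2.0
        problems.append(f"malformed kind: {kind!r}")
    return problems
```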

### Workflow status

The workflow status is marked as `finished`. You don't observe a failure in the workflow status because the invalid entities are skipped and the ingestion is continued.

### Solution

Check the Airflow task logs for `validate_manifest_schema_task` or `process_manifest_task`. Fix the payload by passing the correct data partition ID or key name.

Sample Kusto query:

```kusto
AirflowTaskLogs
| where DagName has "Osdu_ingest"
| order by ['time'] asc
```

Sample trace output:

```md
Error traces to look out for
[2023-02-05, 14:55:37 IST] {connectionpool.py:452} DEBUG - https://contoso.energy.azure.com:443 "GET /api/schema-service/v1/schema/osdu:wks:work-product-component--WellLog:2.2.0 HTTP/1.1" 404 None
'string-value'
```

## Failed reference checks

Records weren't ingested because reference checks failed.

### Possible reasons

* Referenced records weren't found.
* Parent records weren't found.
* The search service is throwing 5xx errors.

### Workflow status

The workflow status is marked as `finished`. You don't observe a failure in the workflow status because the invalid entities are skipped and the ingestion is continued.

### Solution

Check the Airflow task logs for `provide_manifest_integrity_task` or `process_manifest_task`.

Sample Kusto query:

```kusto
AirflowTaskLogs
| where DagName has "Osdu_ingest"
| where RunID has "<run_id>"
```

Because there are no error logs specifically for referential integrity tasks, check the debug log statements to see whether all external records were fetched via the search service.

For instance, the following sample trace output shows a record queried via the search service for referential integrity:

```md
[2023-02-05, 19:14:40 IST] {search_record_ids.py:75} DEBUG - Search query "contoso-dp1:work-product-component--WellLog:5ab388ae0e140838c297f0e6559" OR "contoso-dp1:work-product-component--WellLog:5ab388ae0e1b40838c297f0e6559" OR "contoso-dp1:work-product-component--WellLog:5ab388ae0e1b40838c297f0e6559758a"
```

The next output shows the records that were retrieved and were in the system. If some of the queried records aren't present in the response, the related manifest entities that reference them are dropped and aren't ingested.

```md
[2023-02-05, 19:14:40 IST] {search_record_ids.py:141} DEBUG - response ids: ['contoso-dp1:work-product-component--WellLog:5ab388ae0e1b40838c297f0e6559758a:1675590506723615', 'contoso-dp1:work-product-component--WellLog:5ab388ae0e1b40838c297f0e6559758a ']
```
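
To interpret those two debug lines, compare the queried IDs with the returned IDs; any queried ID that wasn't returned marks a manifest entity that gets dropped. A sketch of that comparison (illustrative; the real `search_record_ids.py` logic isn't shown here), which also strips the record version suffix and stray whitespace visible in the response above:

```python
def find_missing_references(queried_ids, response_ids):
    """Return queried record IDs that the search service didn't return.

    Illustrative only. Response IDs may carry a trailing version segment
    (":<version>") and stray whitespace, so both are normalized away
    before comparing.
    """
    def normalize(record_id):
        record_id = record_id.strip()
        # A record ID has the form partition:group-type--Name:identifier;
        # a fourth colon-separated segment, when present, is a version.
        parts = record_id.split(":")
        return ":".join(parts[:3]) if len(parts) > 3 else record_id

    found = {normalize(r) for r in response_ids}
    return [q for q in queried_ids if normalize(q) not in found]
```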

## Invalid legal tags or ACLs in the manifest

Records weren't ingested because the manifest contains invalid legal tags or access control lists (ACLs).

### Possible reasons

* ACLs are incorrect.
* Legal tags are incorrect.
* The storage service is throwing 5xx errors.
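
One common variant of an incorrect legal tag is a tag name that doesn't belong to the target data partition. A minimal pre-flight sketch of that check follows; it assumes the usual OSDU record shape (`legal.legaltags`) and the partition-prefix naming convention seen in the sample trace (`contoso-dp1-...`), neither of which replaces the storage service's real validation:

```python
def check_legal_tags(record, partition_id):
    """Flag legal tag names that don't start with the data partition ID.

    Simplified pre-flight check, not the storage service's validation:
    it only tests the naming convention that legal tags are prefixed
    with the partition ID (for example, "contoso-dp1-...").
    """
    tags = record.get("legal", {}).get("legaltags", [])
    return [t for t in tags if not t.startswith(f"{partition_id}-")]
```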

### Workflow status

The workflow status is marked as `finished`. You don't observe a failure in the workflow status.

### Solution

Check the Airflow task logs for `process_single_manifest_file_task` or `process_manifest_task`.

Sample Kusto query:

```kusto
AirflowTaskLogs
| where DagName has "Osdu_ingest"
| order by ['time'] asc
```

Sample trace output:

```md
"PUT /api/storage/v2/records HTTP/1.1" 400 None
[2023-02-05, 16:57:05 IST] {authorization.py:137} ERROR - {"code":400,"reason":"Invalid legal tags","message":"Invalid legal tags: contoso-dp1-R3FullManifest-Legal-Tag-Test779759112"}
```

The output indicates records that were retrieved. Manifest entity records that correspond to missing search records are dropped and not ingested.

```md
"PUT /api/storage/v2/records HTTP/1.1" 400 None
```

## Known issues

- Because there are no specific error logs for referential integrity tasks, you must manually search for the debug log statements to see whether all external records were retrieved via the search service.

## Next steps

Advance to the following tutorial and learn how to perform a manifest-based file ingestion:

> [!div class="nextstepaction"]
> [Tutorial: Sample steps to perform a manifest-based file ingestion](tutorial-manifest-ingestion.md)

## References

- [Manifest-based ingestion concepts](concepts-manifest-ingestion.md)
- [Ingestion DAGs](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/README.md#operators-description)
