
Commit 640c02d

Merge pull request #252749 from jonburchel/2023-09-25-merge-public-prs

2 parents cb79b8e + 355ca85, commit 640c02d

File tree: 3 files changed (+85, −74 lines)

articles/data-factory/copy-activity-data-consistency.md

Lines changed: 17 additions & 14 deletions
ms.author: yexu

[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]

When you move data from a source to a destination store, the copy activity provides an option for further data consistency verification, ensuring that the data is not only successfully copied but also verified to be consistent between the source and destination stores. When inconsistent files are found during the data movement, you can either abort the copy activity or continue to copy the rest by enabling the fault tolerance setting to skip inconsistent files. You can get the skipped file names by enabling the session log setting in the copy activity. Refer to [session log in copy activity](copy-activity-log.md) for more details.

## Supported data stores and scenarios

- Data consistency verification is supported by all connectors except FTP, SFTP, HTTP, Snowflake, Office 365, and Azure Databricks Delta Lake.
- Data consistency verification isn't supported in the staging copy scenario.
- When copying binary files, data consistency verification is only available when the 'PreserveHierarchy' behavior is set in the copy activity.
- When copying multiple binary files in a single copy activity with data consistency verification enabled, you can either abort the copy activity or continue to copy the rest by enabling the fault tolerance setting to skip inconsistent files.
- When copying a table in a single copy activity with data consistency verification enabled, the copy activity fails if the number of rows read from the source differs from the number of rows copied to the destination plus the number of incompatible rows that were skipped.
The following example provides a JSON definition to enable data consistency verification in Copy Activity (the sections between the diff hunks are reconstructed from the identical structure shown in the other two files of this commit):

```json
{
    "name": "CopyActivityDataConsistency",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {
                "type": "AzureDataLakeStoreReadSettings",
                "recursive": true
            }
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {
                "type": "AzureDataLakeStoreWriteSettings"
            }
        },
        "validateDataConsistency": true,
        "skipErrorFile": {
            "dataInconsistency": true
        },
        "logSettings": {
            "enableCopyActivityLog": true,
            "copyActivityLogSettings": {
                "logLevel": "Warning",
                "enableReliableLogging": false
            },
            "logLocationSettings": {
                "linkedServiceName": {
                    "referenceName": "ADLSGen2",
                    "type": "LinkedServiceReference"
                },
                "path": "sessionlog/"
            }
        }
    }
}
```

Property | Description | Allowed values | Required
-------- | ----------- | -------------- | --------
validateDataConsistency | If you set this property to true, when copying binary files, the copy activity checks the file size, lastModifiedDate, and MD5 checksum of each binary file copied from the source to the destination store to ensure data consistency between the two stores. When copying tabular data, the copy activity checks the total row count after the job completes, ensuring that the total number of rows read from the source equals the number of rows copied to the destination plus the number of incompatible rows that were skipped. Be aware that copy performance is affected by enabling this option. | True<br/>False (default) | No
dataInconsistency | One of the key-value pairs within the skipErrorFile property bag that determines whether to skip inconsistent files. <br/> - True: copy the rest by skipping inconsistent files.<br/> - False: abort the copy activity once an inconsistent file is found.<br/>Be aware that this property is only valid when you're copying binary files and validateDataConsistency is set to True. | True<br/>False (default) | No
logSettings | A group of properties that can be specified to enable the session log to record skipped files. | | No
linkedServiceName | The linked service of [Azure Blob Storage](connector-azure-blob-storage.md#linked-service-properties) or [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md#linked-service-properties) to store the session log files. | The name of an `AzureBlobStorage` or `AzureBlobFS` type linked service, which refers to the instance that you use to store the log files. | No
path | The path of the log files. | Specify the path where you want to store the log files. If you don't provide a path, the service creates a container for you. | No
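The tabular row-count check that validateDataConsistency performs can be expressed as a simple invariant. The following is an illustrative sketch, not the service's implementation:

```python
def rows_consistent(rows_read: int, rows_copied: int, rows_skipped: int) -> bool:
    # The check described for validateDataConsistency on tabular data:
    # rows read from the source must equal rows copied to the destination
    # plus the incompatible rows that were skipped.
    return rows_read == rows_copied + rows_skipped

# Consistent run: 1000 read, 998 copied, 2 incompatible rows skipped.
print(rows_consistent(1000, 998, 2))   # True
# Inconsistent run: one row is unaccounted for, so the copy activity fails.
print(rows_consistent(1000, 997, 2))   # False
```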

>[!NOTE]
>- When copying binary files from or to Azure Blob or Azure Data Lake Storage Gen2, the service does block-level MD5 checksum verification using the [Azure Blob API](/dotnet/api/microsoft.azure.storage.blob.blobrequestoptions?view=azure-dotnet-legacy&preserve-view=true) and [Azure Data Lake Storage Gen2 API](/rest/api/storageservices/datalakestoragegen2/path/update#request-headers). If ContentMD5 exists on files in Azure Blob or Azure Data Lake Storage Gen2 as data sources, the service also does file-level MD5 checksum verification after reading the files. After copying files to Azure Blob or Azure Data Lake Storage Gen2 as the data destination, the service writes ContentMD5, which downstream applications can use for further data consistency verification.
>- The service does file size verification when copying binary files between any storage stores.

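The MD5 verification described in the note can be illustrated locally. This sketch (an illustration, not the service's code) computes the base64-encoded MD5 digest that blob stores expose as the ContentMD5 property and compares a source payload with its copy:

```python
import base64
import hashlib

def content_md5(data: bytes) -> str:
    # Base64-encoded MD5 digest, the format used for the ContentMD5 property.
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

source = b"example payload"
copied = b"example payload"
# A consistent copy yields matching digests.
print(content_md5(source) == content_md5(copied))  # True
```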
## Monitoring

You can see the details of data consistency verification from the "dataConsistencyVerification" property in the copy activity output.

Value of **VerificationResult**:
- **Verified**: Your copied data has been verified to be consistent between the source and destination store.
- **NotVerified**: Your copied data hasn't been verified to be consistent because you haven't enabled validateDataConsistency in the copy activity.
- **Unsupported**: Your copied data hasn't been verified to be consistent because data consistency verification isn't supported for this particular copy pair.

Value of **InconsistentData**:
- **Found**: The copy activity found inconsistent data.
- **Skipped**: The copy activity found and skipped inconsistent data.
- **None**: The copy activity didn't find any inconsistent data, either because your data was verified to be consistent between the source and destination store or because you disabled validateDataConsistency in the copy activity.

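These two fields can be read programmatically from the activity run output. The fragment below is hypothetical sample data; the field names follow the values documented above, but the surrounding output structure is an assumption:

```python
# Hypothetical copy activity output fragment; "dataConsistencyVerification"
# and its fields follow the values documented above.
activity_output = {
    "dataConsistencyVerification": {
        "VerificationResult": "Verified",
        "InconsistentData": "None",
    }
}

result = activity_output["dataConsistencyVerification"]
if result["VerificationResult"] == "Verified":
    print("Copied data verified consistent between source and destination.")
if result["InconsistentData"] in ("Found", "Skipped"):
    print("Inconsistent data encountered; check the session log.")
```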
### Session log from copy activity

If you configure logging of inconsistent files, you can find the log file at this path: `https://[your-blob-account].blob.core.windows.net/[path-if-configured]/copyactivity-logs/[copy-activity-name]/[copy-activity-run-id]/[auto-generated-GUID].csv`. The log files are csv files.

The schema of a log file is as follows:

Column | Description
-------- | -----------
Timestamp | The timestamp when the service skips the inconsistent file.
Level | The log level of this item. It is at the 'Warning' level for items that show file skipping.
OperationName | The copy activity's operational behavior on each file. The value is 'FileSkip' when a file is skipped.
OperationItem | The name of the skipped file.
Message | More information to illustrate why the file was skipped.
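Given the schema above, collecting the skipped file names from a downloaded session log is a short exercise. The log content below is hypothetical sample data, not an actual service log:

```python
import csv
import io

# Hypothetical session log rows following the documented schema.
log_csv = (
    "Timestamp,Level,OperationName,OperationItem,Message\n"
    "2023-09-25T10:00:00Z,Warning,FileSkip,sample1.csv,"
    "File skipped because data consistency verification failed.\n"
)

skipped = [
    row["OperationItem"]
    for row in csv.DictReader(io.StringIO(log_csv))
    if row["OperationName"] == "FileSkip"
]
print(skipped)  # ['sample1.csv']
```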

From the log file, you can see that sample1.csv was skipped because it failed data consistency verification.

See the other Copy Activity articles:

- [Copy activity overview](copy-activity-overview.md)
- [Copy activity fault tolerance](copy-activity-fault-tolerance.md)

articles/data-factory/copy-activity-fault-tolerance.md

Lines changed: 37 additions & 33 deletions

When you copy binary files between storage stores, you can enable fault tolerance as follows:

```json
{
    "name": "CopyActivityFaultTolerance",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {
                "type": "AzureDataLakeStoreReadSettings",
                "recursive": true
            }
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {
                "type": "AzureDataLakeStoreWriteSettings"
            }
        },
        "skipErrorFile": {
            "fileMissing": true,
            "fileForbidden": true,
            "dataInconsistency": true,
            "invalidFileName": true
        },
        "validateDataConsistency": true,
        "logSettings": {
            "enableCopyActivityLog": true,
            "copyActivityLogSettings": {
                "logLevel": "Warning",
                "enableReliableLogging": false
            },
            "logLocationSettings": {
                "linkedServiceName": {
                    "referenceName": "ADLSGen2",
                    "type": "LinkedServiceReference"
                },
                "path": "sessionlog/"
            }
        }
    }
}
```
Property | Description | Allowed values | Required
-------- | ----------- | -------------- | --------
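The four skipErrorFile flags in the example above gate which classes of per-file errors are skipped rather than failing the whole copy. A rough sketch of that decision logic (hypothetical, not the service's implementation):

```python
# Hypothetical sketch: each skipErrorFile flag enables skipping one error class.
skip_error_file = {
    "fileMissing": True,
    "fileForbidden": True,
    "dataInconsistency": True,
    "invalidFileName": True,
}

def should_skip(error_kind: str) -> bool:
    # Skip the file only when its error class is enabled;
    # otherwise the copy activity fails.
    return skip_error_file.get(error_kind, False)

print(should_skip("fileMissing"))  # True
print(should_skip("otherError"))   # False
```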

articles/data-factory/copy-activity-log.md

Lines changed: 31 additions & 27 deletions
See below for details of the log output format.

The following example provides a JSON definition to enable session log in Copy Activity:

```json
{
    "name": "CopyActivityLog",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {
                "type": "AzureDataLakeStoreReadSettings",
                "recursive": true
            },
            "formatSettings": {
                "type": "BinaryReadSettings"
            }
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {
                "type": "AzureBlobFSWriteSettings"
            }
        },
        "skipErrorFile": {
            "fileForbidden": true,
            "dataInconsistency": true
        },
        "validateDataConsistency": true,
        "logSettings": {
            "enableCopyActivityLog": true,
            "copyActivityLogSettings": {
                "logLevel": "Warning",
                "enableReliableLogging": false
            },
            "logLocationSettings": {
                "linkedServiceName": {
                    "referenceName": "ADLSGen2",
                    "type": "LinkedServiceReference"
                },
                "path": "sessionlog/"
            }
        }
    }
}
```