|
| 1 | +--- |
| 2 | +title: Data consistency verification in copy activity |
| 3 | +description: 'Learn how to enable data consistency verification in the copy activity in Azure Data Factory.' |
| 4 | +services: data-factory |
| 5 | +documentationcenter: '' |
| 6 | +author: dearandyxu |
| 7 | +manager: |
| 8 | +ms.reviewer: |
| 9 | + |
| 10 | +ms.service: data-factory |
| 11 | +ms.workload: data-services |
| 12 | + |
| 13 | + |
| 14 | +ms.topic: conceptual |
| 15 | +ms.date: 3/27/2020 |
| 16 | +ms.author: yexu |
| 17 | + |
| 18 | +--- |
| 19 | +# Data consistency verification in copy activity (Preview) |
| 20 | + |
| 21 | +When you move data from a source to a destination store, the Azure Data Factory copy activity can verify data consistency, so that the data is not only copied successfully but also confirmed to be consistent between the source and destination stores. When inconsistent data is found during the data movement, you can either abort the copy activity or continue copying the rest by enabling the fault tolerance setting to skip the inconsistent data. To get the names of the skipped objects, enable the session log setting in the copy activity. |
| 22 | + |
| 23 | +## Supported data stores |
| 24 | + |
| 25 | +### Source data stores |
| 26 | + |
| 27 | +- [Azure Blob storage](connector-azure-blob-storage.md) |
| 28 | +- [Azure Data Lake Storage Gen1](connector-azure-data-lake-store.md) |
| 29 | +- [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md) |
| 30 | +- [Azure File Storage](connector-azure-file-storage.md) |
| 31 | +- [Amazon S3](connector-amazon-simple-storage-service.md) |
| 32 | +- [File System](connector-file-system.md) |
| 33 | +- [HDFS](connector-hdfs.md) |
| 34 | + |
| 35 | +### Destination data stores |
| 36 | + |
| 37 | +- [Azure Blob storage](connector-azure-blob-storage.md) |
| 38 | +- [Azure Data Lake Storage Gen1](connector-azure-data-lake-store.md) |
| 39 | +- [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md) |
| 40 | +- [Azure File Storage](connector-azure-file-storage.md) |
| 41 | +- [File System](connector-file-system.md) |
| 42 | + |
| 43 | + |
| 44 | +## Configuration |
| 45 | +The following example provides a JSON definition that enables data consistency verification in the copy activity: |
| 46 | + |
| 47 | +```json |
| 48 | +"typeProperties": { |
| 49 | +    "source": { |
| 50 | +        "type": "BinarySource", |
| 51 | +        "storeSettings": { |
| 52 | +            "type": "AzureDataLakeStoreReadSettings", |
| 53 | +            "recursive": true |
| 54 | +        } |
| 55 | +    }, |
| 56 | +    "sink": { |
| 57 | +        "type": "BinarySink", |
| 58 | +        "storeSettings": { |
| 59 | +            "type": "AzureDataLakeStoreWriteSettings" |
| 60 | +        } |
| 61 | +    }, |
| 62 | +    "validateDataConsistency": true, |
| 63 | +    "skipErrorFile": { |
| 64 | +        "dataInconsistency": true |
| 65 | +    }, |
| 66 | +    "logStorageSettings": { |
| 67 | +        "linkedServiceName": { |
| 68 | +            "referenceName": "ADLSGen2_storage", |
| 69 | +            "type": "LinkedServiceReference" |
| 70 | +        }, |
| 71 | +        "path": "/sessionlog/" |
| 72 | +    } |
| 73 | +} |
| 74 | +``` |
| 75 | + |
| 76 | +Property | Description | Allowed values | Required |
| 77 | +-------- | ----------- | -------------- | -------- |
| 78 | +validateDataConsistency | If you set this property to true, the copy activity checks the file size, lastModifiedDate, and MD5 checksum of each object copied from the source to the destination store to ensure data consistency between them. Be aware that enabling this option impacts copy performance. | True<br/>False (default) | No |
| 79 | +dataInconsistency | One of the key-value pairs in the skipErrorFile property that determines whether to skip inconsistent data.<br/> - True: copy the rest of the data and skip the inconsistent data.<br/> - False: abort the copy activity once inconsistent data is found.<br/>This property is valid only when validateDataConsistency is set to True. | True<br/>False (default) | No |
| 80 | +logStorageSettings | A group of properties that you can specify to enable the session log, which records the names of skipped objects. | | No |
| 81 | +linkedServiceName | The linked service of [Azure Blob Storage](connector-azure-blob-storage.md#linked-service-properties) or [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md#linked-service-properties) used to store the session log. | The name of an `AzureBlobStorage` or `AzureBlobFS` type linked service that refers to the instance you want to use to store the log file. | No |
| 82 | +path | The path of the log file. | Specify the path where you want to store the log file. If you do not provide a path, the service creates a container for you. | No |
| 83 | + |
| 84 | +>[!NOTE] |
| 85 | +>- Data consistency verification is currently supported only for binary copy between file-based stores with the PreserveHierarchy copy behavior. |
| 86 | +>- Data consistency verification is not supported in the staged copy scenario. |
| 87 | +>- When copying binary files to Azure Blob Storage or Azure Data Lake Storage Gen2, the copy activity verifies both the file size and the MD5 checksum to ensure data consistency between the source and destination stores. When copying binary files to other storage stores, the copy activity verifies only the file size. |
| 88 | + |
| 89 | + |
| 90 | +## Monitor data consistency verification |
| 91 | + |
| 92 | +### Activity output |
| 93 | +After the copy activity run completes, you can get the result of data consistency verification from the output of each copy activity run: |
| 94 | + |
| 95 | +```json |
| 96 | +"output": { |
| 97 | + "dataRead": 695, |
| 98 | + "dataWritten": 186, |
| 99 | + "filesRead": 3, |
| 100 | + "filesWritten": 1, |
| 101 | + "filesSkipped": 2, |
| 102 | + "throughput": 297, |
| 103 | + "logPath": "https://myblobstorage.blob.core.windows.net//myfolder/a84bf8d4-233f-4216-8cb5-45962831cd1b/", |
| 104 | + "dataConsistencyVerification": |
| 105 | + { |
| 106 | + "VerificationResult": "Verified", |
| 107 | + "InconsistentData": "Skipped" |
| 108 | + } |
| 109 | + } |
| 110 | + |
| 111 | +``` |
| 112 | + |
| 113 | +Possible values for **VerificationResult**: |
| 114 | +- **Verified**: Your copied data has been verified to be consistent between source and destination store. |
| 115 | +- **NotVerified**: Your copied data has not been verified to be consistent because you have not enabled the validateDataConsistency setting. |
| 116 | +- **Unsupported**: Your copied data has not been verified to be consistent because data consistency verification is not supported in this copy activity run. |
| 117 | + |
| 118 | +Possible values for **InconsistentData**: |
| 119 | +- **Found**: The ADF copy activity has found inconsistent data. |
| 120 | +- **Skipped**: The ADF copy activity has found and skipped inconsistent data. |
| 121 | +- **None**: The ADF copy activity has not found any inconsistent data, either because the validateDataConsistency setting is disabled or because your data has been verified to be consistent between the source and destination stores. |
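
As an illustration of how these two values might be consumed downstream, the following Python sketch inspects a copy activity output and decides whether the run can be treated as fully consistent. The helper name `check_consistency` is hypothetical, and treating a run with skipped files as "not OK" is an assumption of this sketch, not documented ADF behavior. The sample JSON reuses values from the output example above.

```python
import json

# Sample activity output, using the values from the example above.
SAMPLE_OUTPUT = """
{
    "dataRead": 695,
    "dataWritten": 186,
    "filesRead": 3,
    "filesWritten": 1,
    "filesSkipped": 2,
    "throughput": 297,
    "dataConsistencyVerification": {
        "VerificationResult": "Verified",
        "InconsistentData": "Skipped"
    }
}
"""


def check_consistency(output):
    """Hypothetical helper: return (ok, message) for one copy activity output."""
    verification = output.get("dataConsistencyVerification")
    if verification is None:
        return False, "no dataConsistencyVerification block in output"

    result = verification.get("VerificationResult")
    inconsistent = verification.get("InconsistentData")

    # NotVerified / Unsupported: nothing can be said about consistency.
    if result != "Verified":
        return False, f"data was not verified (VerificationResult={result})"

    # Verified and no inconsistent data found.
    if inconsistent == "None":
        return True, "all copied data verified as consistent"

    # Found or Skipped: assumption of this sketch is to flag the run for review.
    skipped = output.get("filesSkipped", 0)
    return False, f"InconsistentData={inconsistent}; filesSkipped={skipped}"


ok, message = check_consistency(json.loads(SAMPLE_OUTPUT))
print(ok, message)
```

A stricter or looser policy (for example, accepting skipped files when a session log is configured) is equally plausible; only the interpretation of the returned tuple would change.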
| 122 | + |
| 123 | +### Activity session log |
| 124 | + |
| 125 | +If you configure the copy activity to log inconsistent files, you can find the log file at this path: `https://[your-blob-account].blob.core.windows.net/[path-if-configured]/copyactivity-logs/[copy-activity-name]/[copy-activity-run-id]/[auto-generated-GUID].csv`. Log files are always written in CSV format. |
| 126 | + |
| 127 | +The schema of the log file is as follows: |
| 128 | + |
| 129 | +Column | Description |
| 130 | +-------- | ----------- |
| 131 | +Timestamp | The timestamp when ADF skips the inconsistent data |
| 132 | +Level | The log level of this item. It is 'Warning' when the item records a skipped file name. |
| 133 | +OperationName | The type of ADF copy activity operation performed on the data. It is 'FileSkip' when a particular file has been skipped. |
| 134 | +OperationItem | The name of the skipped file from the source data store |
| 135 | +Message | Additional information that describes the inconsistency that caused the file to be skipped |
| 136 | + |
| 137 | +An example of a log file is as follows: |
| 138 | +``` |
| 139 | +Timestamp, Level, OperationName, OperationItem, Message |
| 140 | +2020-02-26 06:22:56.3190846, Warning, FileSkip, "sample1.csv", "File is skipped after read 548000000 bytes: ErrorCode=DataConsistencySourceDataChanged,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Source file 'sample1.csv' is changed by other clients during the copy activity run.,Source=,'." |
| 141 | +``` |
| 142 | +From the sample log, you can see that sample1.csv was skipped because it is inconsistent between the source and destination stores. The message also gives the reason: sample1.csv was changed by other clients during the copy activity run. |
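
For readers who want to process the session log programmatically, here is a minimal Python sketch that lists the skipped files and their error codes. It assumes the log has already been downloaded as text and that it follows the layout of the sample above (comma-separated, optional space after each comma, double-quoted fields). The helper name `skipped_files` and the `ErrorCode=` extraction pattern are assumptions of this sketch, based only on the sample line.

```python
import csv
import io
import re

# Sample log content, copied from the example above.
SAMPLE_LOG = '''Timestamp, Level, OperationName, OperationItem, Message
2020-02-26 06:22:56.3190846, Warning, FileSkip, "sample1.csv", "File is skipped after read 548000000 bytes: ErrorCode=DataConsistencySourceDataChanged,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Source file 'sample1.csv' is changed by other clients during the copy activity run.,Source=,'."
'''


def skipped_files(log_text):
    """Hypothetical helper: return (file name, error code) for each FileSkip row."""
    # skipinitialspace=True tolerates the space after each comma in the sample.
    reader = csv.DictReader(io.StringIO(log_text), skipinitialspace=True)
    results = []
    for row in reader:
        if row["OperationName"] != "FileSkip":
            continue
        # Pull the error code out of the free-text message, if one is present.
        match = re.search(r"ErrorCode=(\w+)", row["Message"])
        results.append((row["OperationItem"], match.group(1) if match else None))
    return results


for name, code in skipped_files(SAMPLE_LOG):
    print(name, code)
```

Because the message column is free text, the error code extraction should be treated as best effort; the file name in `OperationItem` is the more reliable field.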
| 143 | + |
| 144 | + |
| 145 | + |
| 146 | +## Next steps |
| 147 | +See the other Copy Activity articles: |
| 148 | + |
| 149 | +- [Copy activity overview](copy-activity-overview.md) |
| 150 | +- [Copy activity fault tolerance](copy-activity-fault-tolerance.md) |
| 151 | + |
| 152 | + |