Description
Issue
Users of datacompy sometimes have sensitive columns in their data (such as account IDs or other join keys). The comparison report displays these columns as-is, which can leak this information if the report is not handled carefully. Users currently need to mask the sensitive information themselves, either before using datacompy or before sending the report.
Solution
Allow users to pass in a list of column names and mask those columns' values before outputting the comparison report, e.g.:
| ACCOUNT_ID | BALANCE |
| 123 | 100.00 |
| 456 | 200.00 |
| 789 | 50.00 |
Becomes:
| ACCOUNT_ID | BALANCE |
| ***** | 100.00 |
| ***** | 200.00 |
| ***** | 50.00 |
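The masking step above can be sketched roughly as follows. This is a minimal illustration on plain Python rows, not datacompy's API: the function name `mask_columns`, its parameters, and the `"*****"` default are all hypothetical, and a real implementation would presumably operate on the sample rows inside the report.

```python
from typing import Any


def mask_columns(
    rows: list[dict[str, Any]],
    sensitive: list[str],
    mask: str = "*****",
) -> list[dict[str, Any]]:
    """Return copies of the rows with every sensitive column replaced by a fixed mask.

    Hypothetical helper for illustration only; not part of datacompy.
    """
    hidden = set(sensitive)
    return [
        {col: (mask if col in hidden else val) for col, val in row.items()}
        for row in rows
    ]


# Example data taken from the table in this issue.
rows = [
    {"ACCOUNT_ID": 123, "BALANCE": "100.00"},
    {"ACCOUNT_ID": 456, "BALANCE": "200.00"},
    {"ACCOUNT_ID": 789, "BALANCE": "50.00"},
]

masked = mask_columns(rows, ["ACCOUNT_ID"])
for row in masked:
    print(f"| {row['ACCOUNT_ID']} | {row['BALANCE']} |")
```

The original rows are left untouched, so the comparison itself can still run on the real values and only the rendered output is masked.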
Alternatives
An alternative to masking is to hash the values with a secure hashing algorithm before performing the comparison. Values that match will hash to the same value, so the comparison still works on the hashed column.
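A rough sketch of the hashing alternative, using only the Python standard library. It assumes a keyed hash (HMAC-SHA256) rather than a bare hash, since short identifiers such as account IDs are easy to guess by brute force; the helper name `hash_value` and the example key are hypothetical and not part of datacompy.

```python
import hashlib
import hmac


def hash_value(value: object, key: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the value's string form.

    Hypothetical helper for illustration only; not part of datacompy.
    """
    return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: both sides of the comparison share the same secret key.
key = b"example-key-not-for-production"

left_ids = [123, 456, 789]
right_ids = [123, 456, 780]

left_hashed = [hash_value(v, key) for v in left_ids]
right_hashed = [hash_value(v, key) for v in right_ids]

# Equal inputs give equal digests, so matches and mismatches are preserved.
matches = [a == b for a, b in zip(left_hashed, right_hashed)]
print(matches)
```

Both inputs need the same type and formatting before hashing (for example, `123` and `123.0` produce different digests), so any normalization has to happen first.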