Description
The Economic Census County Business Patterns demonstration dataset has its privacy parameters described incorrectly. The registry entry describes it as zCDP with a business establishment as the privacy unit, but that is wrong. Both the Census Bureau and the registry entry note that this release uses per-record differential privacy, a DP variant defined here. The mechanism is described in more detail in the paper and in the linked webinar, but in brief:
- Thresholds were defined for each attribute of a business establishment. For example (these are not real numbers), the number of employees might have a threshold of 100, and payroll might have a threshold of $50,000.
- Each establishment was evaluated against the thresholds. If an establishment exceeded any threshold, it was split into two or more establishment records whose values were all at or below the thresholds. Following the example above, an establishment with 150 employees and $60,000 payroll would be split into one record with 100 employees and $50,000 payroll, and another with 50 employees and $10,000 payroll.
- zCDP was then applied to this new, split dataset with the specified budgets.
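The splitting step can be sketched in Python as follows. This is a minimal sketch, not the Census Bureau's implementation: the threshold values are the illustrative ones from the example, and the dividing rule (the number of pieces is the largest `ceil(value / threshold)` across attributes, each attribute is filled one threshold at a time, and later pieces get the remainder or zero) is an assumption that happens to reproduce the example.

```python
import math
from typing import Dict, List

# Hypothetical thresholds, matching the illustrative (not real) numbers above.
THRESHOLDS: Dict[str, float] = {"employees": 100, "payroll": 50_000}


def split_establishment(record: Dict[str, float],
                        thresholds: Dict[str, float]) -> List[Dict[str, float]]:
    """Split one establishment into records whose values are all <= threshold.

    Assumed rule: k = max over attributes of ceil(value / threshold); each
    attribute is filled threshold-by-threshold, remainder in the last nonzero
    piece, zeros after that. Totals per attribute are preserved.
    """
    k = max(1, max(math.ceil(record[a] / t) for a, t in thresholds.items()))
    pieces: List[Dict[str, float]] = [{} for _ in range(k)]
    for attr, t in thresholds.items():
        remaining = record[attr]
        for piece in pieces:
            share = min(remaining, t)  # never exceed the threshold
            piece[attr] = share
            remaining -= share
    return pieces


print(split_establishment({"employees": 150, "payroll": 60_000}, THRESHOLDS))
# [{'employees': 100, 'payroll': 50000}, {'employees': 50, 'payroll': 10000}]
```

Under this assumed rule, an establishment that exceeds different thresholds by different amounts (say 250 employees but only $60,000 payroll) becomes three records, with the payroll of the third set to zero.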
It strikes me that this could be described as its own DP flavor, or as zCDP with a privacy unit of a volume of contribution, similar to (but more complicated than) https://registry.opendp.org/deployments-registry/#historical-pageviews-wikimedia-foundation-2023
Also, I'm not sure whether the splitting thresholds were ever released. The Census site mentions a forthcoming paper, which I would have expected to include them, but I don't see any evidence that the paper was ever published, and from skimming the slides, the thresholds don't appear to have been announced in the webinar either. Without them, the raw budget numbers are pretty incomplete.
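To illustrate why the thresholds matter, here is a hedged sketch of one conservative reading of the released budgets, assuming each split record is treated as the zCDP privacy unit. By the standard group-privacy property of zCDP, a mechanism that is ρ-zCDP for one record is k²ρ-zCDP for a group of k records, so an establishment split into k pieces would be covered at k²ρ. The ρ value, thresholds, and piece-count rule below are hypothetical, and the per-record DP analysis in the paper may account for this differently.

```python
import math
from typing import Dict

# Hypothetical thresholds and budget; the real thresholds could not be found.
THRESHOLDS: Dict[str, float] = {"employees": 100, "payroll": 50_000}
RHO_PER_RECORD = 0.5  # hypothetical per-record zCDP budget


def pieces_needed(record: Dict[str, float], thresholds: Dict[str, float]) -> int:
    """Number of split records, assuming k = max over attributes of ceil(value / threshold)."""
    return max(1, max(math.ceil(record[a] / t) for a, t in thresholds.items()))


def group_rho(rho: float, k: int) -> float:
    """zCDP group privacy: rho-zCDP per record implies (k**2 * rho)-zCDP for k records."""
    return k * k * rho


small = {"employees": 150, "payroll": 60_000}
large = {"employees": 1_000, "payroll": 400_000}
for est in (small, large):
    k = pieces_needed(est, THRESHOLDS)
    print(k, group_rho(RHO_PER_RECORD, k))
# 2 2.0
# 10 50.0
```

Because k depends entirely on the thresholds, the published ρ alone does not say how well a large establishment is protected under this reading.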