Update sdc.md #1826
base: main
Deploying opensafely-docs with Cloudflare Pages

| | |
|---|---|
| Latest commit: | f1336db |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://f8b20cd0.opensafely-docs.pages.dev |
| Branch Preview URL: | https://andrewscolm-patch-3.opensafely-docs.pages.dev |
* Redacting counts <=7 in frequency tables. Row and column totals should be recalculated after you have redacted the cell values, to ensure that the redacted values cannot be inferred from the totals.
Rounding to the nearest 5 offers protection against this.
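A minimal sketch of the frequency-table rule above, assuming a table stored as a list of rows and redacted cells represented as `None`: cells with counts <= 7 are suppressed, and the row and column totals are then recomputed from the unredacted cells only. The function names are hypothetical and not part of any OpenSAFELY library.

```python
from typing import List, Optional, Tuple

# A redacted cell is represented as None (an assumption for this sketch).
Table = List[List[Optional[int]]]


def redact_small_counts(table: List[List[int]], threshold: int = 7) -> Table:
    """Replace every count <= threshold with None (suppressed)."""
    return [[None if count <= threshold else count for count in row] for row in table]


def recompute_totals(table: Table) -> Tuple[List[int], List[int]]:
    """Sum only the unredacted cells, so totals carry no trace of suppressed values."""
    row_totals = [sum(c for c in row if c is not None) for row in table]
    n_cols = len(table[0]) if table else 0
    col_totals = [
        sum(row[j] for row in table if row[j] is not None) for j in range(n_cols)
    ]
    return row_totals, col_totals


if __name__ == "__main__":
    raw = [[12, 5], [20, 30]]
    redacted = redact_small_counts(raw)
    print(redacted)                    # [[12, None], [20, 30]]
    print(recompute_totals(redacted))  # ([12, 50], [32, 30])
```

Had the totals been computed from the raw table instead, a first-row total of 17 next to the visible 12 would reveal the suppressed 5.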
@@ -1,7 +1,7 @@
The assessment of the risk of re-identification attached to a data item or statistical output, and the use of appropriate methods to reduce the disclosure risk, is known as **statistical disclosure control (SDC)**. In OpenSAFELY, researchers must apply SDC at the stage where their aggregated results are ready to be released from the results server (the Level 4 environment) for sharing with collaborators for feedback, or for publication as papers, reports, blogs, etc. Examples of SDC techniques to manage the disclosure risk include redacting (suppressing) low values, rounding values, or redesigning outputs so that sparse table cells, for example, are combined.
In general, good SDC is consistent with good statistics: many observations, no influential outliers, well-behaved distributions, etc., both prevent disclosure and increase confidence in the statistics. The one area to be wary of is where you can say something for certain about entire groups (‘all patients presenting with X also needed treatment for Y’). Be cautious about statements like this.

To understand what checks have to be made to outputs it is important to understand the **attribute types** that exist in data and how these could lead to **primary or secondary disclosure**. Importantly, OpenSAFELY requires that researchers redact any outputs based on counts <= 7 before they can be released.
To understand what checks have to be made to outputs it is important to understand the **attribute types** that exist in data and how these could lead to **primary or secondary disclosure**. Importantly, OpenSAFELY requires that researchers redact any outputs that can identify <=5 individuals. In order to achieve this for counts rounded to the nearest 5 counts of 7 or fewer must be redacted before rounding.
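A minimal sketch of the ordering described in the new wording: counts of 7 or fewer are suppressed first, and only the surviving counts are rounded to the nearest 5. `redact_then_round` and `round_to_nearest` are hypothetical helpers for illustration; the sketch assumes non-negative integer counts and treats a count of 0 like any other count <= 7.

```python
from typing import List, Optional


def round_to_nearest(count: int, base: int = 5) -> int:
    """Round a non-negative integer count to the nearest multiple of `base`.

    Exact halves (which only occur for even bases) round up.
    """
    return (count + base // 2) // base * base


def redact_then_round(
    counts: List[int], threshold: int = 7, base: int = 5
) -> List[Optional[int]]:
    """Suppress counts <= threshold (as None), then round the remaining counts."""
    return [None if c <= threshold else round_to_nearest(c, base) for c in counts]


if __name__ == "__main__":
    print(redact_then_round([0, 3, 7, 8, 12, 13, 101]))
    # [None, None, None, 10, 10, 15, 100]
```

Note the order: if these counts were rounded without the redaction step, true counts of 3 and 7 would both be published as 5.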
A comma would help:

> In order to achieve this for counts rounded to the nearest 5, counts of 7 or fewer must be redacted before rounding.

However, I don't think this is correct. We don't have to redact (i.e. completely remove a value) if the rounding precision doesn't lead to a rounding band with width <5. For example, if I round everything to the nearest 20, then we have the bands [-9, 9], [10, 29], [30, 49], ..., mapping to values 0, 20, 40, ..., which is allowed and doesn't require any redaction. Similarly for midpoint-5 and above.
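To make the band argument in this comment concrete, a minimal sketch that lists which non-negative counts round to each output value for a given rounding base. It only enumerates the bands; the helper names are hypothetical, and it does not encode any OpenSAFELY rule about acceptable band widths.

```python
from typing import Dict, Tuple


def round_to_nearest(count: int, base: int) -> int:
    """Round a non-negative integer count to the nearest multiple of `base`.

    Exact halves (which only occur for even bases) round up.
    """
    return (count + base // 2) // base * base


def rounding_bands(base: int, max_count: int) -> Dict[int, Tuple[int, int]]:
    """Map each rounded value to the (lowest, highest) true count that produces it.

    Only counts 0..max_count are enumerated, so the last band may be cut short.
    """
    bands: Dict[int, Tuple[int, int]] = {}
    for count in range(max_count + 1):
        rounded = round_to_nearest(count, base)
        low, high = bands.get(rounded, (count, count))
        bands[rounded] = (min(low, count), max(high, count))
    return bands


if __name__ == "__main__":
    print(rounding_bands(20, 49))  # {0: (0, 9), 20: (10, 29), 40: (30, 49)}
    print(rounding_bands(5, 12))   # {0: (0, 2), 5: (3, 7), 10: (8, 12)}
```

For a base of 20 this reproduces the bands in the example above, restricted to non-negative counts; for a base of 5 the bands are [0, 2], [3, 7] and [8, 12].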
Add clarification to rounding requirements and redaction of values <=7