| 1 | +# openadapt-privacy |
| 2 | + |
| 3 | +Privacy scrubbing for GUI automation data - PII/PHI detection and redaction. |
| 4 | + |
| 5 | +## Installation |
| 6 | + |
| 7 | +```bash |
| 8 | +pip install openadapt-privacy |
| 9 | +``` |
| 10 | + |
| 11 | +For Presidio-based scrubbing (recommended): |
| 12 | + |
| 13 | +```bash |
| 14 | +pip install openadapt-privacy[presidio] |
| 15 | +python -m spacy download en_core_web_trf |
| 16 | +``` |
| 17 | + |
| 18 | +## Quick Start |
| 19 | + |
| 20 | +### Text Scrubbing |
| 21 | + |
| 22 | +```python |
| 23 | +from openadapt_privacy.providers.presidio import PresidioScrubbingProvider |
| 24 | + |
| 25 | +scrubber = PresidioScrubbingProvider() |
| 26 | + |
| 27 | +text = "Contact John Smith at [email protected] or 555-123-4567" |
| 28 | +scrubbed = scrubber.scrub_text(text) |
| 29 | +``` |
| 30 | + |
| 31 | +**Input:** |
| 32 | +``` |
| 33 | +Contact John Smith at [email protected] or 555-123-4567 |
| 34 | +``` |
| 35 | + |
| 36 | +**Output:** |
| 37 | +``` |
| 38 | +Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER> |
| 39 | +``` |
| 40 | + |
| 41 | +### Example Inputs & Outputs |
| 42 | + |
| 43 | +| Input | Output | |
| 44 | +|-------|--------| |
| 45 | +| `My email is [email protected]` | `My email is <EMAIL_ADDRESS>` | |
| 46 | +| `SSN: 923-45-6789` | `SSN: <US_SSN>` | |
| 47 | +| `Card: 4532-1234-5678-9012` | `Card: <CREDIT_CARD>` | |
| 48 | +| `Call me at 555-123-4567` | `Call me at <PHONE_NUMBER>` | |
| 49 | +| `DOB: 01/15/1985` | `DOB: <DATE_TIME>` | |
| 50 | +| `Contact John Smith` | `Contact <PERSON>` | |
| 51 | + |
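The table above shows detected spans being replaced by `<ENTITY_TYPE>` placeholders. The sketch below imitates that behavior with a few regular expressions so the idea can be tried without installing Presidio. The patterns and the name `scrub_text_sketch` are assumptions made for illustration; they are not part of `openadapt-privacy`, and Presidio's real recognizers (NER for names, checksums, context words) are considerably more thorough.

```python
import re

# Hypothetical patterns, for illustration only. They are much cruder than
# the recognizers Presidio ships with.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def scrub_text_sketch(text: str) -> str:
    """Replace every pattern match with its <ENTITY_TYPE> placeholder."""
    for entity, pattern in PATTERNS.items():
        text = pattern.sub(f"<{entity}>", text)
    return text


print(scrub_text_sketch("Call me at 555-123-4567"))
print(scrub_text_sketch("SSN: 923-45-6789"))
```

A regex-only approach cannot find names such as `John Smith`, which is one reason the README recommends the Presidio provider.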
| 52 | +## Dict Scrubbing |
| 53 | + |
| 54 | +Scrub PII from nested dictionaries (e.g., GUI element trees): |
| 55 | + |
| 56 | +```python |
| 57 | +from openadapt_privacy import scrub_dict |
| 58 | +from openadapt_privacy.providers.presidio import PresidioScrubbingProvider |
| 59 | + |
| 60 | +scrubber = PresidioScrubbingProvider() |
| 61 | +action = { |
| 62 | + "text": "Email: [email protected]", |
| 63 | + "metadata": { |
| 64 | + "title": "User Profile - John Smith", |
| 65 | + "tooltip": "Click to contact [email protected]", |
| 66 | + }, |
| 67 | + "coordinates": {"x": 100, "y": 200}, |
| 68 | +} |
| 69 | +scrubbed = scrub_dict(action, scrubber) |
| 70 | +``` |
| 71 | + |
| 72 | +**Input:** |
| 73 | +```json |
| 74 | +{ |
| 75 | + "text": "Email: [email protected]", |
| 76 | + "metadata": { |
| 77 | + "title": "User Profile - John Smith", |
| 78 | + "tooltip": "Click to contact [email protected]" |
| 79 | + }, |
| 80 | + "coordinates": {"x": 100, "y": 200} |
| 81 | +} |
| 82 | +``` |
| 83 | + |
| 84 | +**Output:** |
| 85 | +```json |
| 86 | +{ |
| 87 | + "text": "Email: <EMAIL_ADDRESS>", |
| 88 | + "metadata": { |
| 89 | + "title": "User Profile - <PERSON>", |
| 90 | + "tooltip": "Click to contact <EMAIL_ADDRESS>" |
| 91 | + }, |
| 92 | + "coordinates": {"x": 100, "y": 200} |
| 93 | +} |
| 94 | +``` |
| 95 | + |
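To make the input/output pair above concrete, here is a minimal sketch of how a nested dictionary could be walked so that only string values under selected keys are scrubbed while other values (such as coordinates) pass through untouched. The function `scrub_dict_sketch` and the `SCRUB_KEYS` set are assumptions for illustration; the library's own `scrub_dict` may work differently.

```python
from typing import Any, Callable, Optional

# Assumed key list, loosely mirroring the SCRUB_KEYS_HTML setting.
SCRUB_KEYS = {"text", "value", "title", "tooltip"}


def scrub_dict_sketch(
    data: Any, scrub: Callable[[str], str], key: Optional[str] = None
) -> Any:
    """Return a scrubbed copy of `data`; the input is not modified."""
    if isinstance(data, dict):
        return {k: scrub_dict_sketch(v, scrub, k) for k, v in data.items()}
    if isinstance(data, list):
        # List items inherit the key of the list that contains them.
        return [scrub_dict_sketch(item, scrub, key) for item in data]
    if isinstance(data, str) and key in SCRUB_KEYS:
        return scrub(data)
    return data


# Stand-in scrubber: a real one would call a provider's scrub_text.
def fake_scrub(text: str) -> str:
    return text.replace("John Smith", "<PERSON>")


action = {
    "metadata": {"title": "User Profile - John Smith"},
    "coordinates": {"x": 100, "y": 200},
}
scrubbed = scrub_dict_sketch(action, fake_scrub)
print(scrubbed)
```

Returning a copy instead of mutating in place keeps the original recording available for comparison.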
| 96 | +## Recording Pipeline |
| 97 | + |
| 98 | +Process complete GUI automation recordings: |
| 99 | + |
| 100 | +```python |
| 101 | +from openadapt_privacy import DictRecordingLoader |
| 102 | +from openadapt_privacy.providers.presidio import PresidioScrubbingProvider |
| 103 | + |
| 104 | +scrubber = PresidioScrubbingProvider() |
| 105 | +loader = DictRecordingLoader() |
| 106 | + |
| 107 | +recording = loader.load_from_dict({ |
| 108 | + "task_description": "Send email to John Smith at [email protected]", |
| 109 | + "actions": [ |
| 110 | + {"id": 1, "action_type": "click", "text": "Compose", "timestamp": 1000}, |
| 111 | + {"id": 2, "action_type": "type", "text": "[email protected]", "timestamp": 2000}, |
| 112 | + {"id": 3, "action_type": "click", "text": "Send", "window_title": "Email to [email protected]", "timestamp": 3000}, |
| 113 | + ], |
| 114 | +}) |
| 115 | + |
| 116 | +scrubbed = recording.scrub(scrubber) |
| 117 | +``` |
| 118 | + |
| 119 | +**Input Recording:** |
| 120 | +``` |
| 121 | +task_description: "Send email to John Smith at [email protected]" |
| 122 | + |
| 123 | +actions: |
| 124 | + [1] click: "Compose" |
| 125 | + [2] type: "[email protected]" |
| 126 | + [3] click: "Send" (window: "Email to [email protected]") |
| 127 | +``` |
| 128 | + |
| 129 | +**Output Recording:** |
| 130 | +``` |
| 131 | +task_description: "Send email to <PERSON> at <EMAIL_ADDRESS>" |
| 132 | + |
| 133 | +actions: |
| 134 | + [1] click: "Compose" |
| 135 | + [2] type: "<EMAIL_ADDRESS>" |
| 136 | + [3] click: "Send" (window: "Email to <EMAIL_ADDRESS>") |
| 137 | +``` |
| 138 | + |
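The recording example above scrubs the task description and the text fields of every action in one call. A rough sketch of that flow, using stand-in dataclasses, might look like the following. `RecordingSketch` and `ActionSketch` are hypothetical names invented here; the library's `Recording` and `Action` classes likely carry more fields (screenshots, timestamps) and may be structured differently.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ActionSketch:
    id: int
    action_type: str
    text: str
    window_title: Optional[str] = None


@dataclass(frozen=True)
class RecordingSketch:
    task_description: str
    actions: List[ActionSketch] = field(default_factory=list)

    def scrub(self, scrub_text: Callable[[str], str]) -> "RecordingSketch":
        """Return a new recording with all free-text fields scrubbed."""
        scrubbed_actions = [
            replace(
                action,
                text=scrub_text(action.text),
                window_title=(
                    scrub_text(action.window_title)
                    if action.window_title
                    else None
                ),
            )
            for action in self.actions
        ]
        return replace(
            self,
            task_description=scrub_text(self.task_description),
            actions=scrubbed_actions,
        )


# Stand-in scrubber: a real one would call a provider's scrub_text.
def fake_scrub(text: str) -> str:
    return text.replace("John Smith", "<PERSON>")


recording = RecordingSketch(
    task_description="Send email to John Smith",
    actions=[
        ActionSketch(1, "click", "Compose"),
        ActionSketch(2, "click", "Send", window_title="Email to John Smith"),
    ],
)
scrubbed = recording.scrub(fake_scrub)
print(scrubbed.task_description)
```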
| 139 | +## Image Scrubbing |
| 140 | + |
| 141 | +Redact PII from screenshots using OCR + NER: |
| 142 | + |
| 143 | +```python |
| 144 | +from PIL import Image |
| 145 | +from openadapt_privacy.providers.presidio import PresidioScrubbingProvider |
| 146 | + |
| 147 | +scrubber = PresidioScrubbingProvider() |
| 148 | + |
| 149 | +image = Image.open("screenshot.png") |
| 150 | +scrubbed_image = scrubber.scrub_image(image) |
| 151 | +scrubbed_image.save("screenshot_scrubbed.png") |
| 152 | +``` |
| 153 | + |
| 154 | +**Input Screenshot:** |
| 155 | + |
| 156 | + |
| 157 | + |
| 158 | +**Output Screenshot:** |
| 159 | + |
| 160 | + |
| 161 | + |
| 162 | +The image redactor: |
| 163 | +1. Runs OCR to detect text regions |
| 164 | +2. Analyzes text for PII entities (email, phone, SSN, etc.) |
| 165 | +3. Fills detected PII regions with solid color (configurable, default: red) |
| 166 | + |
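Step 3 of the list above, filling detected regions with a solid color, can be sketched without any imaging library by treating the image as a grid of RGB tuples. The bounding boxes below are hard-coded stand-ins for what OCR and entity analysis (steps 1 and 2) would produce; `fill_regions` is a hypothetical helper, not the redactor the library uses.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B)
Box = Tuple[int, int, int, int]   # (left, top, width, height)

RED: Pixel = (255, 0, 0)


def fill_regions(
    pixels: Sequence[Sequence[Pixel]],
    boxes: Sequence[Box],
    color: Pixel = RED,
) -> List[List[Pixel]]:
    """Return a copy of `pixels` with every box painted in `color`."""
    out = [list(row) for row in pixels]
    height = len(out)
    width = len(out[0]) if out else 0
    for left, top, box_w, box_h in boxes:
        # Clamp each box to the image bounds before painting.
        for y in range(max(top, 0), min(top + box_h, height)):
            for x in range(max(left, 0), min(left + box_w, width)):
                out[y][x] = color
    return out


WHITE: Pixel = (255, 255, 255)
image = [[WHITE] * 4 for _ in range(3)]       # 4 px wide, 3 px tall
redacted = fill_regions(image, [(1, 1, 2, 1)])
for row in redacted:
    print(row)
```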
| 167 | +## Custom Data Loader |
| 168 | + |
| 169 | +Implement your own loader for custom storage formats: |
| 170 | + |
| 171 | +```python |
| 172 | +from openadapt_privacy import RecordingLoader, Recording |
| 173 | + |
| 174 | +class SQLiteRecordingLoader(RecordingLoader): |
| 175 | + def __init__(self, db_path: str): |
| 176 | + self.db_path = db_path |
| 177 | + |
| 178 | + def load(self, recording_id: str) -> Recording: |
| 179 | + # Load from SQLite database |
| 180 | + ... |
| 181 | + |
| 182 | + def save(self, recording: Recording, recording_id: str) -> None: |
| 183 | + # Save to SQLite database |
| 184 | + ... |
| 185 | + |
| 186 | +# Usage (import PresidioScrubbingProvider from openadapt_privacy.providers.presidio first) |
| 187 | +loader = SQLiteRecordingLoader("recordings.db") |
| 188 | +scrubber = PresidioScrubbingProvider() |
| 189 | + |
| 190 | +# Load, scrub, and save |
| 191 | +scrubbed = loader.load_and_scrub("recording_001", scrubber) |
| 192 | +loader.save(scrubbed, "recording_001_scrubbed") |
| 193 | +``` |
| 194 | + |
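The `load` and `save` bodies above are left as `...`. One way they could be filled in is to store each recording as a JSON blob keyed by its ID, as in the sketch below. It is written as a standalone class with plain dictionaries so that it runs without the library installed; the table name, schema, and class name `SQLiteJsonStore` are assumptions, and a real loader would subclass `RecordingLoader` and convert to and from `Recording` objects.

```python
import json
import sqlite3
from typing import Any, Dict


class SQLiteJsonStore:
    """Hypothetical storage backend: one JSON payload per recording ID."""

    def __init__(self, db_path: str):
        # A single connection is kept so that ":memory:" databases persist
        # for the lifetime of the object.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS recordings ("
            "id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def load(self, recording_id: str) -> Dict[str, Any]:
        row = self.conn.execute(
            "SELECT payload FROM recordings WHERE id = ?", (recording_id,)
        ).fetchone()
        if row is None:
            raise KeyError(recording_id)
        return json.loads(row[0])

    def save(self, recording: Dict[str, Any], recording_id: str) -> None:
        # The connection context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO recordings (id, payload) VALUES (?, ?)",
                (recording_id, json.dumps(recording)),
            )


store = SQLiteJsonStore(":memory:")
store.save({"task_description": "Open settings", "actions": []}, "recording_001")
loaded = store.load("recording_001")
print(loaded)
```

Parameterized queries (`?` placeholders) keep recording IDs from being interpreted as SQL.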
| 195 | +## Configuration |
| 196 | + |
| 197 | +```python |
| 198 | +from openadapt_privacy.config import PrivacyConfig |
| 199 | + |
| 200 | +custom_config = PrivacyConfig( |
| 201 | + SCRUB_CHAR="X", # Character for scrub_text_all |
| 202 | + SCRUB_FILL_COLOR=0x0000FF, # Red for image redaction (BGR, i.e. 0xBBGGRR) |
| 203 | + SCRUB_KEYS_HTML=[ # Keys to scrub in dicts |
| 204 | + "text", "value", "title", "tooltip", "custom_field" |
| 205 | + ], |
| 206 | + SCRUB_PRESIDIO_IGNORE_ENTITIES=[ # Entity types to skip |
| 207 | + "DATE_TIME", |
| 208 | + ], |
| 209 | +) |
| 210 | +``` |
| 211 | + |
| 212 | +## Supported Entity Types |
| 213 | + |
| 214 | +| Entity | Example Input | Example Output | |
| 215 | +|--------|---------------|----------------| |
| 216 | +| `PERSON` | `John Smith` | `<PERSON>` | |
| 217 | +| `EMAIL_ADDRESS` | `[email protected]` | `<EMAIL_ADDRESS>` | |
| 218 | +| `PHONE_NUMBER` | `555-123-4567` | `<PHONE_NUMBER>` | |
| 219 | +| `US_SSN` | `923-45-6789` | `<US_SSN>` | |
| 220 | +| `CREDIT_CARD` | `4532-1234-5678-9012` | `<CREDIT_CARD>` | |
| 221 | +| `US_BANK_NUMBER` | `635526789012` | `<US_BANK_NUMBER>` | |
| 222 | +| `US_DRIVER_LICENSE` | `A123-456-789-012` | `<US_DRIVER_LICENSE>` | |
| 223 | +| `DATE_TIME` | `01/15/1985` | `<DATE_TIME>` | |
| 224 | +| `LOCATION` | `Toronto, ON` | `<LOCATION>` | |
| 225 | + |
| 226 | +## Architecture |
| 227 | + |
| 228 | +``` |
| 229 | +openadapt_privacy/ |
| 230 | +├── base.py # ScrubbingProvider, TextScrubbingMixin |
| 231 | +├── config.py # PrivacyConfig dataclass |
| 232 | +├── loaders.py # Recording, Action, Screenshot, RecordingLoader |
| 233 | +├── providers/ |
| 234 | +│ ├── __init__.py # ScrubProvider registry |
| 235 | +│ └── presidio.py # PresidioScrubbingProvider |
| 236 | +└── pipelines/ |
| 237 | + └── dicts.py # scrub_dict, scrub_list_dicts |
| 238 | +``` |
| 239 | + |
| 240 | +## License |
| 241 | + |
| 242 | +MIT |