| 1 | +# Duplicate Records & Deduplication Process |
| 2 | + |
| 3 | +This document outlines the concepts, causes, and processes related to duplicate patient records in our system. It explains how duplicate records are identified, merged automatically, and merged manually. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## What Is a Duplicate Record? |
| 8 | + |
| 9 | +A **duplicate record** is one of two or more patient entries registered with the same BP passport. The entries share the passport even though their other data (e.g., name, facility, or other attributes) may differ. |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## What Might Cause Duplicate Records? |
| 14 | + |
| 15 | +Several scenarios can lead to duplicate records: |
| 16 | + |
| 17 | +- **Patients with the same BP passport are registered at different facilities.** |
| 18 | +- **Patients move from one block to another.** |
| 19 | +- **Patients are referred to a different facility but are not assigned to it in the app.** |
| 20 | +- **Patients can choose to go to two different facilities on the same day, without a sync happening between them.** |
| 21 | +- **The user has reinstalled the app and a full sync isn't complete; a patient on a recurring visit may get registered again.** |
| 22 | +- **The user has cleared the data on the phone without completing the sync, causing recurring patient records to be registered again.** |
| 23 | + |
| 24 | +--- |
| 25 | + |
| 26 | +## Identifying Duplicate Records |
| 27 | + |
| 28 | +We use an asynchronous script, **DuplicatePassportAnalytics**, which is run daily and reports duplicate patients across different regions. The report is sent to Prometheus with the following metrics: |
| 29 | + |
| 30 | +- **duplicate_passports_across_facilities** – Count of patients using the same BP passport across different facilities. |
| 31 | +- **duplicate_passports_in_same_facility** – Count of patients using the same BP passport in the same facility. |
| 32 | +- **duplicate_passports_across_districts** – Count of patients using the same BP passport linked to different districts. |
| 33 | +- **duplicate_passports_across_blocks** – Count of patients using the same BP passport linked to different blocks. |
| 34 | + |
| 35 | +This job is scheduled to run daily at **5 AM local time**. |
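As a rough illustration of what the first two metrics count, the sketch below groups in-memory patient rows by BP passport. The `PatientRow` struct, its fields, and the `duplicate_passport_counts` helper are hypothetical names used only for this example. The real `DuplicatePassportAnalytics` job works against the database and may count differently (for example, per passport rather than per patient).

```ruby
# Hypothetical, simplified patient row; not the real model.
PatientRow = Struct.new(:id, :passport, :facility, keyword_init: true)

# Counts patients whose BP passport is shared with at least one other
# patient, split by whether the sharing patients span several facilities.
def duplicate_passport_counts(patients)
  counts = { across_facilities: 0, in_same_facility: 0 }

  patients.group_by(&:passport).each_value do |group|
    next if group.size < 2 # passport used by one patient only: not a duplicate

    if group.map(&:facility).uniq.size > 1
      counts[:across_facilities] += group.size
    else
      counts[:in_same_facility] += group.size
    end
  end

  counts
end

rows = [
  PatientRow.new(id: 1, passport: "A", facility: "F1"),
  PatientRow.new(id: 2, passport: "A", facility: "F2"),
  PatientRow.new(id: 3, passport: "B", facility: "F1"),
  PatientRow.new(id: 4, passport: "B", facility: "F1"),
  PatientRow.new(id: 5, passport: "C", facility: "F3")
]

p duplicate_passport_counts(rows)
```

The same grouping approach would extend to districts and blocks by swapping the `facility` field for the relevant region attribute.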
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +## Automatic Duplicate Merging |
| 40 | + |
| 41 | +A scheduled job runs at **2 AM every day** to merge patients who have: |
| 42 | +- The same BP passport |
| 43 | +- The same full name |
| 44 | + |
| 45 | +These records are merged automatically by the backend. The process is handled by the deduplication merge logic found in our code base: |
| 46 | + |
| 47 | +- **Merging Logic**: See [PatientDeduplication::Deduplicator](https://github.com/simpledotorg/simple-server/blob/master/app/services/patient_deduplication/deduplicator.rb#L21). |
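The matching rule for automatic merging (same BP passport and same full name) can be sketched in plain Ruby as follows. This is a minimal illustration, not the backend's query: the `Candidate` struct and `auto_merge_groups` helper are hypothetical, and comparing names case-insensitively with surrounding whitespace trimmed is an assumption of this sketch.

```ruby
# Hypothetical, simplified patient entry; not the real model.
Candidate = Struct.new(:id, :passport, :full_name, keyword_init: true)

# Returns groups of two or more patients that share both a BP passport
# and a full name. Name normalization (strip + downcase) is an assumption.
def auto_merge_groups(patients)
  patients
    .group_by { |patient| [patient.passport, patient.full_name.strip.downcase] }
    .values
    .select { |group| group.size > 1 }
end

candidates = [
  Candidate.new(id: 1, passport: "A", full_name: "Asha Rao"),
  Candidate.new(id: 2, passport: "A", full_name: "asha rao "),
  Candidate.new(id: 3, passport: "A", full_name: "Asha R"),
  Candidate.new(id: 4, passport: "B", full_name: "Asha Rao")
]

auto_merge_groups(candidates).each do |group|
  puts "Would merge patient ids: #{group.map(&:id).join(', ')}"
end
```

In this example only patients 1 and 2 form a group. Patient 3 shares the passport but not the name, so it would be left for manual review.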
| 48 | + |
| 49 | +### Merging Steps: |
| 50 | + |
| 51 | +1. **Identify the Latest and Earliest Records:** |
| 52 | + - The patient record with the most recent `recorded_at` is considered the **latest**. |
| 53 | + - The one with the oldest `recorded_at` is the **earliest**. |
| 54 | + |
| 55 | +2. **Create a New Patient Record** with: |
| 56 | + - **Full Name, Gender, and Reminder Consent** from the latest patient. |
| 57 | + - **Recorded At, Registration Facility, Registration User, Device Created/Updated At** from the earliest patient. |
| 58 | + - **Assigned Facility and Status** from the latest patient. |
| 59 | + |
| 60 | +3. **Address Merging:** |
| 61 | + - A new address is created with the latest information. |
| 62 | + - Old address records are archived in the `deduplication_logs` table. |
| 63 | + |
| 64 | +4. **DOB and Age:** |
| 65 | + - If any patient record has a DOB, the DOB from the most recent record that has one is used. |
| 66 | + - Otherwise, the latest patient’s age and `age_updated_at` are set on the new record. |
| 67 | + |
| 68 | +5. **Prescription Drugs:** |
| 69 | + - PrescriptionDrug records from the latest patient are retained. |
| 70 | + - Other duplicate patients’ prescription_drugs are marked as deleted. |
| 71 | + - New records are created and logged in `deduplication_logs`. |
| 72 | + |
| 73 | +6. **Medical History:** |
| 74 | + - A new MedicalHistory record is created. |
| 75 | + - For attributes (e.g., prior_heart_attack_boolean, prior_stroke_boolean, chronic_kidney_disease_boolean, etc.), a precedence is applied: |
| 76 | + - `{"yes" => 0, true => 1, "no" => 2, false => 3, "unknown" => 4, nil => 5}` |
| 77 | + - Lower numerical values have higher precedence. |
| 78 | + - Other MedicalHistory records are tracked under `deduplication_logs`. |
| 79 | + |
| 80 | +7. **Phone Numbers:** |
| 81 | + - Distinct phone numbers across duplicate records are gathered. |
| 82 | + - New records are created for each and logged. |
| 83 | + |
| 84 | +8. **BP Passport Records:** |
| 85 | + - Distinct identifiers from `PatientBusinessIdentifier` are collected. |
| 86 | + - New records are created for each and logged. |
| 87 | + |
| 88 | +9. **Visits (Encounters/Observations):** |
| 89 | + - All existing Encounters for the duplicate patients are re-created for the new patient record. |
| 90 | + - Observations linked to these encounters are similarly re-created and logged. |
| 91 | + |
| 92 | +10. **Appointments:** |
| 93 | + - All existing appointments are re-created for the new patient. |
| 94 | + - Appointments with the status `scheduled` are updated to **cancelled**, except for the most recent one that was synced. |
| 95 | + |
| 96 | +11. **Teleconsultations:** |
| 97 | + - All existing teleconsultation records are re-created for the new patient and logged. |
| 98 | + |
| 99 | +12. **Cleanup:** |
| 100 | + - All duplicate patient records and their associated older data (addresses, appointments, etc.) are soft-deleted after merging. |
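Steps 1, 2, and 4 above can be sketched as a pure function over in-memory records. This is a simplified illustration under stated assumptions, not the code in `PatientDeduplication::Deduplicator`: the `DuplicatePatient` struct, its reduced field list, and the `merged_attributes` helper are hypothetical.

```ruby
# Hypothetical, reduced set of patient fields for illustration only.
DuplicatePatient = Struct.new(
  :full_name, :gender, :status, :assigned_facility,
  :registration_facility, :recorded_at,
  :date_of_birth, :age, :age_updated_at,
  keyword_init: true
)

# Builds the attributes of the new merged patient from a set of duplicates.
def merged_attributes(patients)
  sorted = patients.sort_by(&:recorded_at)
  earliest = sorted.first # oldest recorded_at
  latest = sorted.last    # most recent recorded_at

  attrs = {
    # Taken from the latest patient
    full_name: latest.full_name,
    gender: latest.gender,
    status: latest.status,
    assigned_facility: latest.assigned_facility,
    # Taken from the earliest patient
    recorded_at: earliest.recorded_at,
    registration_facility: earliest.registration_facility
  }

  # Prefer the DOB of the most recent record that has one;
  # otherwise fall back to the latest patient's age.
  latest_with_dob = sorted.reverse.find(&:date_of_birth)
  if latest_with_dob
    attrs[:date_of_birth] = latest_with_dob.date_of_birth
  else
    attrs[:age] = latest.age
    attrs[:age_updated_at] = latest.age_updated_at
  end

  attrs
end

older = DuplicatePatient.new(
  full_name: "Asha R", gender: "female", status: "active",
  assigned_facility: "F1", registration_facility: "F1",
  recorded_at: Time.utc(2020, 1, 1), date_of_birth: "1980-05-01"
)
newer = DuplicatePatient.new(
  full_name: "Asha Rao", gender: "female", status: "active",
  assigned_facility: "F2", registration_facility: "F2",
  recorded_at: Time.utc(2023, 1, 1), age: 43,
  age_updated_at: Time.utc(2023, 1, 1)
)

p merged_attributes([newer, older])
```

Here the merged record takes its name and assigned facility from `newer`, its registration details from `older`, and the DOB from `older` because it is the only record that has one.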
| 101 | + |
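The medical-history precedence in step 6 can be illustrated with a small helper that, for each attribute, keeps the value with the lowest precedence number. The hash below is copied from the step. The `merge_medical_histories` helper and the use of plain hashes in place of `MedicalHistory` records are assumptions of this sketch.

```ruby
# Precedence from step 6: a lower number wins.
PRECEDENCE = {
  "yes" => 0, true => 1, "no" => 2, false => 3, "unknown" => 4, nil => 5
}.freeze

# Merges medical histories attribute by attribute, keeping the
# highest-precedence value seen across all duplicate records.
# Values not listed in PRECEDENCE rank last.
def merge_medical_histories(histories)
  keys = histories.flat_map(&:keys).uniq

  keys.to_h do |key|
    values = histories.map { |history| history[key] }
    [key, values.min_by { |value| PRECEDENCE.fetch(value, PRECEDENCE.size) }]
  end
end

histories = [
  { prior_heart_attack: "no",      prior_stroke: "unknown" },
  { prior_heart_attack: "yes",     prior_stroke: nil },
  { prior_heart_attack: "unknown", prior_stroke: "no" }
]

p merge_medical_histories(histories)
```

With this ordering, a single "yes" on any duplicate wins over "no" or "unknown" on the others, so a recorded condition is not lost in the merge.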
| 102 | +--- |
| 103 | + |
| 104 | +## Manual Duplicate Merging |
| 105 | + |
| 106 | +Duplicate patient entries can also be managed manually via the dashboard: |
| 107 | + |
| 108 | +- **Dashboard View:** |
| 109 | + The "Merge Duplicate Patients" tab in the left navigation panel displays potential duplicates. |
| 110 | + <img width="277" alt="Screenshot 2025-04-15 at 10 39 25 AM" src="https://github.com/user-attachments/assets/79427b65-8442-4903-a181-f277e2229821" /> |
| 111 | + |
| 112 | +- **Access Control:** |
| 113 | + - **Organization Managers:** Search across all patients (no facility filter). |
| 114 | + - **Facility Managers:** Search is limited to patients within the facilities they manage. |
| 115 | + |
| 116 | + |
| 117 | +- **Limit:** |
| 118 | + The dashboard displays a hard-coded maximum of **250 duplicate records**. |
| 119 | + |
| 120 | +- **Process:** |
| 121 | + - Patients with the same BP passport but different full names are flagged as duplicates. |
| 122 | + - Users can select/deselect records to merge. |
| 123 | + - Alternatively, users can skip to the next set of records. |
| 124 | + |
| 125 | +<img width="1377" alt="Screenshot 2025-04-15 at 10 18 00 AM" src="https://github.com/user-attachments/assets/83e2b52e-7f4e-493b-9bf2-66ca0a727df8" /> |
| 126 | + |
| 127 | +- **Class Responsible:** |
| 128 | + |
| 129 | + The class `PatientDeduplication::Strategies` handles identifying potential duplicates. |
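The manual-merge rule (same BP passport, different full names), the facility scoping, and the 250 limit can be sketched as follows. This is an illustration only and does not reproduce `PatientDeduplication::Strategies`: the `Entry` struct and `manual_merge_candidates` helper are hypothetical, and applying the limit to groups of duplicates rather than to individual records is an assumption.

```ruby
MAX_DUPLICATES = 250 # dashboard limit described above

# Hypothetical, simplified patient entry; not the real model.
Entry = Struct.new(:id, :passport, :full_name, :facility, keyword_init: true)

# allowed_facilities: nil means no facility filter (organization managers);
# a list restricts the search to those facilities (facility managers).
def manual_merge_candidates(patients, allowed_facilities: nil, limit: MAX_DUPLICATES)
  visible =
    if allowed_facilities
      patients.select { |patient| allowed_facilities.include?(patient.facility) }
    else
      patients
    end

  visible
    .group_by(&:passport)
    .values
    .select { |group| group.map(&:full_name).uniq.size > 1 } # same passport, different names
    .first(limit)
end

entries = [
  Entry.new(id: 1, passport: "A", full_name: "Asha Rao", facility: "F1"),
  Entry.new(id: 2, passport: "A", full_name: "Asha R",   facility: "F2"),
  Entry.new(id: 3, passport: "B", full_name: "Ravi K",   facility: "F1"),
  Entry.new(id: 4, passport: "B", full_name: "Ravi K",   facility: "F1")
]

p manual_merge_candidates(entries).map { |group| group.map(&:id) }
p manual_merge_candidates(entries, allowed_facilities: ["F1"]).map { |group| group.map(&:id) }
```

In this sketch a facility-scoped search sees only the records in its own facilities, so a duplicate whose other entry is in a different facility does not appear for that user.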
| 130 | + |
| 131 | +--- |
| 132 | + |
| 133 | +## Error Handling |
| 134 | + |
| 135 | +If an error occurs during merging (for example, because of data issues), it is reported to Sentry so that it can be diagnosed and resolved promptly. |
| 136 | + |
| 137 | +--- |
| 138 | + |
| 139 | +## Conclusion |
| 140 | + |
| 141 | +This deduplication process ensures that patient records remain accurate and consistent by: |
| 142 | +- Identifying duplicates through asynchronous analytics. |
| 143 | +- Automatically merging duplicates with matching names. |
| 144 | +- Allowing manual merging from the dashboard for duplicates whose full names differ. |
| 145 | +- Recording merged data in `deduplication_logs` and reporting errors to Sentry for auditability and troubleshooting. |
| 146 | + |
| 147 | +For further details on the merge logic, refer to the [PatientDeduplication::Deduplicator](https://github.com/simpledotorg/simple-server/blob/master/app/services/patient_deduplication/deduplicator.rb#L21) class in our repository. |
| 148 | + |
| 149 | +--- |
| 150 | + |
| 151 | +*This README is part of our documentation on managing duplicate patient records and the deduplication process in our system.* |