Skip to content

Commit 29ebf21

Browse files
authored
Merge pull request #504 from NHSDigital/dtoss-11163-audit-functionality-adr
Add ADR-004 custom audit logging
2 parents 24085ea + 2303293 commit 29ebf21

File tree

1 file changed

+154
-0
lines changed

1 file changed

+154
-0
lines changed
Lines changed: 154 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,154 @@
1+
# ADR 004: Custom audit logging implementation
2+
3+
## Context
4+
5+
The manage breast screening service handles sensitive personal data that requires robust audit capabilities. We need the ability to track changes to data stored by the application, enabling us to quickly understand what happened in the event of an unauthorised change, data corruption, or bug in the application.
6+
7+
As a healthcare application, we have specific audit requirements:
8+
9+
- Track WHO made changes (user or system process)
10+
- Track WHAT changed (model, operation, and field-level snapshots)
11+
- Track WHEN changes occurred
12+
- Support both user-driven and automated system updates
13+
- Enable investigation of data integrity issues
14+
15+
We are conscious that this data can grow rapidly and need to consider future archiving or pruning.
16+
17+
We initially explored using existing Django audit libraries but found they did not fully meet our needs.
18+
19+
## Decision
20+
21+
We decided to implement a custom audit logging solution consisting of:
22+
23+
1. A single `AuditLog` model that captures audit events across all models
24+
2. An `Auditor` service class that explicitly logs audit events
25+
3. Explicit audit calls at the application/service layer rather than automatic tracking via model signals
26+
27+
### Key characteristics of our implementation:
28+
29+
- **Explicit auditing**: Developers must explicitly call `auditor.audit_create()`, `auditor.audit_update()`, or `auditor.audit_delete()` to create audit logs
30+
- **Single audit table**: All audit records are stored in one `AuditLog` table using Django's `ContentType` framework to identify the audited model
31+
- **JSON snapshots**: Field-level data is captured as JSON snapshots (excluding configured fields like passwords)
32+
- **Actor tracking**: Records either the authenticated user or a system update identifier
33+
- **Bulk operation support**: Provides `audit_bulk_create()`, `audit_bulk_update()`, and `audit_bulk_delete()` methods
34+
- **Integration at service layer**: Audit calls are made in form handlers and service methods, not at the ORM level
35+
36+
### Alternatives considered
37+
38+
We initially attempted to use `django-simple-history` (PR #109), which was ultimately rejected in favour of the custom implementation.
39+
40+
#### django-simple-history approach
41+
42+
django-simple-history works by:
43+
44+
- Creating a separate history table for each model (e.g. `HistoricalClinic`, `HistoricalPatient`)
45+
- Automatically capturing changes via Django's `post_save`, `pre_save`, and `post_delete` signals
46+
- Using middleware to capture the current user and attach it to history records
47+
- Providing a `HistoricalRecords` manager on each model
48+
49+
**Pros:**
50+
51+
- Well-maintained, established library (48 releases, tested against Django 5.2)
52+
- Automatic audit trail with minimal boilerplate
53+
- Built-in Django admin integration via `SimpleHistoryAdmin`
54+
- Can revert models to previous versions
55+
- Good documentation
56+
57+
**Cons:**
58+
59+
- **Opaque**: Signal-based auditing can miss certain operations (bulk creates, bulk updates, raw SQL, direct database updates)
60+
- **Implicit**: Developers might not realise when audit logging is or isn't happening
61+
- **Database complexity**: Creates a separate history table for each audited model, leading to many migration files and complex schema
62+
- **Limited control**: Difficult to audit operations that don't go through standard model save/delete paths
63+
- **Performance overhead**: Signals fire on every save, even when you might not need auditing
64+
- **Migration burden**: Each model change requires migrations for both the model and its history table
65+
66+
#### Other libraries evaluated
67+
68+
- **django-auditlog**: Similar to django-simple-history but less well maintained (11 releases vs 48, tested against Django 5.1 vs 5.2). Uses a single audit table but still relies on automatic signal-based tracking.
69+
- **django-reversion**: More focused on version control and rollback functionality rather than audit logging. Lighter documentation and not tested against latest Django (5.0 vs 5.2).
70+
- **pgAudit**: SQL-level solution that would require PostgreSQL-specific configuration and wouldn't integrate well with Django ORM.
71+
72+
### Rationale
73+
74+
We chose the custom implementation over django-simple-history and other libraries because:
75+
76+
1. **Reliability**: Explicit audit calls ensure we don't accidentally miss auditing important operations. With signal-based approaches, bulk operations, raw SQL, and certain ORM methods can bypass audit logging entirely.
77+
78+
2. **Transparency**: Developers can see exactly when and what is being audited. There's no "magic" happening behind the scenes that might fail silently.
79+
80+
3. **Simplicity**: A single audit table is easier to query and maintain than per-model history tables. This reduces database complexity and migration overhead.
81+
82+
4. **Flexibility**: We can audit operations that happen outside the normal Django ORM flow, such as:
83+
- Bulk imports from external systems
84+
- Data transformations that don't use standard save/delete methods
85+
- System-driven updates that need to be tracked with a system update ID
86+
87+
5. **Performance control**: We only create audit logs when explicitly needed, avoiding unnecessary overhead from signal handlers firing on every model save.
88+
89+
6. **Healthcare context**: For a healthcare application, we need confidence that our audit trail is complete. The explicit approach makes it easier to verify that all critical operations are audited.
90+
91+
The trade-off is that developers must remember to add audit calls, but we consider this acceptable because:
92+
93+
- It makes auditing a deliberate, reviewed decision
94+
- It's easier to catch missing audit calls in code review than to debug why signal-based auditing failed
95+
- We don't need automatic version tracking or rollback functionality - audit logs are for investigation only
96+
97+
## Consequences
98+
99+
### Positive consequences
100+
101+
- Audit logging is visible and explicit in the codebase
102+
- Single audit table is easier to query for investigations (e.g. "show me all changes by this user" or "show me all changes to this patient")
103+
- Bulk operations are properly supported
104+
- No risk of missing audit logs due to ORM quirks or bulk operations
105+
- Simpler database schema with fewer migration files
106+
- Can audit system updates that don't have an authenticated user
107+
- Flexible enough to handle edge cases (e.g. auditing before delete to capture the object ID)
108+
109+
### Negative consequences
110+
111+
- Developers must remember to add explicit audit calls when implementing new features
112+
- More boilerplate code compared to automatic solutions
113+
- No built-in rollback/revert functionality (though we don't need this)
114+
- Need to maintain custom audit code rather than relying on a third-party library
115+
116+
### Mitigation strategies
117+
118+
To address the risk of forgetting to add audit calls:
119+
120+
- Code review checklist should include verifying audit calls for data modifications
121+
- Integration tests should verify that audit logs are created for critical operations
122+
- The `Auditor` class validates that either a user or system update ID is provided (currently commented out until authentication is implemented)
123+
124+
### Future considerations
125+
126+
- Once authentication is fully implemented, uncomment the `AnonymousAuditError` check in the `Auditor` constructor
127+
- Consider audit log retention policies (how long to keep audit records)
128+
- May want to add helper methods to common model operations to reduce boilerplate
129+
- Could add linting rules to detect model saves without corresponding audit calls
130+
131+
## Compliance
132+
133+
Success will be measured by:
134+
135+
- All data modification operations in production have corresponding audit log entries
136+
- Audit logs can be successfully used to investigate data integrity issues
137+
- Audit trail is complete enough to satisfy information governance requirements
138+
139+
## Notes
140+
141+
Related code:
142+
143+
- `manage_breast_screening/core/models.py` - `AuditLog` model definition
144+
- `manage_breast_screening/core/services/auditor.py` - `Auditor` service class
145+
- `manage_breast_screening/config/settings.py` - `AUDIT_EXCLUDED_FIELDS` configuration
146+
147+
Related pull requests:
148+
149+
- PR #109: Initial attempt with django-simple-history (closed)
150+
- PR #120: Final implementation with custom audit logging (merged)
151+
152+
## Tags
153+
154+
`#security` `#data` `#maintainability` `#reliability`

0 commit comments

Comments
 (0)