Description
Please describe the feature
As a deploying organisation running SDA under Swedish regulations (MSBFS 2020:7),
I need the download service to emit structured audit events so I can route them
to my log management system and satisfy mandatory logging requirements.
Regulatory basis (MSBFS 2020:7 §16): Swedish government agencies MUST log
access to information assessed as needing enhanced protection. Sensitive research
data in SDA meets this threshold — every file download is a loggable access event.
Proposed approach
Emit audit events as structured JSON to stdout, tagged for log routing.
Why stdout over database:
- Easily routed to a dedicated audit index via fluentd/logstash
- DB storage would require truncating tables AND removing backups for GDPR
  compliance, which is operationally much harder than index lifecycle management
- Clean separation from operational logs and from the existing `file_event_log`
  table (which tracks file processing state for ingestion, not download audit)
This approach needs an ADR before implementation — see below.
Event types
| Event | When |
|---|---|
| `download.completed` | Successful file download (200/206) |
| `download.denied` | Access denied (auth failure, permission denied) |
| `download.failed` | Technical failure (storage error, etc.) |
Note: these are NOT the file_events DB enum values. A failed download is
not a file error — it's an access denial or transport failure. The audit log
needs its own event vocabulary.
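A minimal sketch of how the three event types could be derived from the response status. The names `AuditEvent` and `classify_download` are hypothetical, and the mapping of 401/403 to `download.denied` is an assumption; a real implementation would more likely classify by the cause of the failure than by status code alone.

```python
from enum import Enum


class AuditEvent(str, Enum):
    """Audit event vocabulary, kept separate from the file_events DB enum."""

    COMPLETED = "download.completed"
    DENIED = "download.denied"
    FAILED = "download.failed"


def classify_download(http_status: int) -> AuditEvent:
    """Map a final HTTP status to an audit event type."""
    if http_status in (200, 206):
        return AuditEvent.COMPLETED
    # Assumption: auth and permission failures surface as 401 or 403.
    if http_status in (401, 403):
        return AuditEvent.DENIED
    # Assumption: every other outcome is treated as a technical failure.
    return AuditEvent.FAILED
```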
Proposed required fields
| Field | Type | Description | Regulatory basis |
|---|---|---|---|
| `type` | string | Always `"audit"` (routing tag) | MSB guidance: collect in dedicated system |
| `event` | string | Event type (see above) | MSBFS §16: what happened |
| `timestamp` | string | ISO 8601 UTC | MSBFS §17: when, common time source |
| `user_id` | string | Authenticated user identifier | MSBFS §17: who acted |
| `file_id` | string | File being accessed | MSBFS §16.4: which protected information |
| `correlation_id` | string | Request ID for cross-log correlation | MSBFS §17.2: comparability between logs |
| `http_status` | int | Response status code | MSBFS §17.1: enable investigation |
Proposed recommended fields
| Field | Type | Description |
|---|---|---|
| `dataset_id` | string | Dataset context |
| `endpoint` | string | API path called |
| `bytes_transferred` | int | Bytes sent (0 on failure) |
| `range` | string | Range header if partial download |
| `auth_type` | string | Auth method used (e.g. "visa", "jwt", "opaque") |
| `error_reason` | string | Structured reason for denied/failed events |
MUST NOT log
- Bearer tokens, JWT contents, or API keys
- Raw public keys or private keys
- Full HTTP headers or cookie values
- Any credential material
Only log auth method type for traceability (e.g. "visa", not the visa token).
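One way to enforce this is an allow-list: only the fields named in this issue ever reach the audit output, so headers or tokens passed in by mistake are dropped. The sketch below is illustrative only; `filter_audit_fields` is a hypothetical helper, and filtering by field name does not catch a credential placed inside an allowed field's value.

```python
# Field names taken from the required and recommended tables in this issue.
REQUIRED_FIELDS = (
    "type", "event", "timestamp", "user_id",
    "file_id", "correlation_id", "http_status",
)
RECOMMENDED_FIELDS = (
    "dataset_id", "endpoint", "bytes_transferred",
    "range", "auth_type", "error_reason",
)
ALLOWED_FIELDS = frozenset(REQUIRED_FIELDS + RECOMMENDED_FIELDS)


def filter_audit_fields(candidate: dict) -> dict:
    """Keep only allow-listed keys; anything else is silently dropped."""
    return {k: v for k, v in candidate.items() if k in ALLOWED_FIELDS}
```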
Example output
{
"type": "audit",
"event": "download.completed",
"timestamp": "2026-02-18T14:30:00.000Z",
"user_id": "researcher@example.org",
"file_id": "urn:neic:001-002-003",
"dataset_id": "EGAD00001000001",
"correlation_id": "req-abc-123",
"endpoint": "/files/{fileId}",
"http_status": 200,
"bytes_transferred": 1048576
}
Semantics
- One audit event per request — a single download produces exactly one log line
- Log both success and failure (§16.1 requires logging attempted access)
- Log at request completion so http_status and bytes_transferred are accurate
- Audit logging must not block the HTTP response path
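The semantics above could be sketched as follows, using Python's standard `logging.handlers.QueueHandler` and `QueueListener` so that the request path only enqueues the event and a background thread writes it. This is an illustration under assumptions, not the service's implementation: `start_audit_logger`, `emit_audit`, `utc_timestamp`, and the logger name `sda.audit` are all hypothetical.

```python
import json
import logging
import logging.handlers
import queue
import sys
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Current time as ISO 8601 UTC with millisecond precision and a Z suffix."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def start_audit_logger(stream=sys.stdout):
    """Return (logger, listener); the listener thread does the actual writing."""
    log_queue = queue.Queue(-1)  # unbounded, so enqueueing never blocks
    writer = logging.StreamHandler(stream)
    writer.setFormatter(logging.Formatter("%(message)s"))  # raw JSON line only
    listener = logging.handlers.QueueListener(log_queue, writer)
    listener.start()

    logger = logging.getLogger("sda.audit")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit lines out of operational logs
    logger.handlers = [logging.handlers.QueueHandler(log_queue)]
    return logger, listener


def emit_audit(logger: logging.Logger, event: dict) -> None:
    """Call once per request, at completion, with the final status and byte count."""
    logger.info(json.dumps(event, separators=(",", ":")))
```

Calling `listener.stop()` at shutdown flushes any queued events before the process exits.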
Prerequisite: ADR
The stdout-vs-DB decision and the event schema should be documented as an
ADR before implementation. Key questions for the ADR:
- stdout vs DB vs hybrid — confirm the approach
- Event schema — which fields are required vs recommended
- Scope — download events only, or all SDA data-access events
Acceptance criteria
- ADR for audit logging architecture accepted
- Successful downloads (200/206) emit exactly one `download.completed` event
- Denied downloads emit exactly one `download.denied` event with `error_reason`
- Failed downloads emit exactly one `download.failed` event with `error_reason`
- All required fields present in every event
- Timestamps are UTC ISO 8601
- No credentials, tokens, or keys appear in audit output
- Events are valid JSON, one line per event
- Events tagged with `"type": "audit"` for log routing
- Tests for all of the above
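Several of these criteria can be checked mechanically on each emitted line. The following is a rough sketch of such a check, assuming the field names and event types listed in this issue; `validate_audit_line` is a hypothetical helper and it does not attempt to detect credentials inside field values.

```python
import json
from datetime import datetime, timedelta

REQUIRED = ("type", "event", "timestamp", "user_id",
            "file_id", "correlation_id", "http_status")
EVENTS = {"download.completed", "download.denied", "download.failed"}


def validate_audit_line(line: str) -> list:
    """Return a list of problems found in one audit log line (empty if none)."""
    if "\n" in line.rstrip("\n"):
        return ["event spans more than one line"]
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(event, dict):
        return ["not a JSON object"]

    problems = [f"missing {name}" for name in REQUIRED if name not in event]
    if event.get("type") != "audit":
        problems.append('type is not "audit"')
    if event.get("event") not in EVENTS:
        problems.append("unknown event type")
    if event.get("event") in ("download.denied", "download.failed") \
            and "error_reason" not in event:
        problems.append("missing error_reason")

    ts = event.get("timestamp")
    if isinstance(ts, str):
        # Older Python versions do not parse a trailing "Z", so normalise it.
        iso = ts[:-1] + "+00:00" if ts.endswith("Z") else ts
        try:
            if datetime.fromisoformat(iso).utcoffset() != timedelta(0):
                problems.append("timestamp is not UTC")
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    return problems
```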
What is NOT in scope (deployer responsibility)
- Log routing/forwarding (fluentd/logstash config)
- Log retention policy (index lifecycle management)
- Log protection (access control, append-only storage)
- Log analysis and alerting
- Time synchronisation (NTP)
- Operational documentation per MSBFS §17
Estimation of size
medium
Estimation of priority
high — mandatory regulatory requirement for production deployment