[download] Add structured audit logging for file download events #2245

@jhagberg

Description

Please describe the feature

As a deploying organisation running SDA under Swedish regulations (MSBFS 2020:7),
I need the download service to emit structured audit events so I can route them
to my log management system and satisfy mandatory logging requirements.

Regulatory basis (MSBFS 2020:7 §16): Swedish government agencies MUST log
access to information assessed as needing enhanced protection. Sensitive research
data in SDA meets this threshold — every file download is a loggable access event.

Proposed approach

Emit audit events as structured JSON to stdout, tagged for log routing.

Why stdout over database:

  • Easily routed to a dedicated audit index via fluentd/logstash
  • DB storage would mean that meeting GDPR erasure and retention obligations requires
    truncating tables AND purging backups, which is operationally much harder than
    index lifecycle management
  • Clean separation from operational logs and from the existing file_event_log
    table (which tracks file processing state for ingestion, not download audit)

This approach needs an architecture decision record (ADR) before implementation; see below.

Event types

| Event | When |
|---|---|
| download.completed | Successful file download (200/206) |
| download.denied | Access denied (auth failure, permission denied) |
| download.failed | Technical failure (storage error, etc.) |

Note: these are NOT the file_events DB enum values. A failed download is
not a file error — it's an access denial or transport failure. The audit log
needs its own event vocabulary.

Proposed required fields

| Field | Type | Description | Regulatory basis |
|---|---|---|---|
| type | string | Always "audit" (routing tag) | MSB guidance: collect in dedicated system |
| event | string | Event type (see above) | MSBFS §16: what happened |
| timestamp | string | ISO 8601 UTC | MSBFS §17: when, common time source |
| user_id | string | Authenticated user identifier | MSBFS §17: who acted |
| file_id | string | File being accessed | MSBFS §16.4: which protected information |
| correlation_id | string | Request ID for cross-log correlation | MSBFS §17.2: comparability between logs |
| http_status | int | Response status code | MSBFS §17.1: enable investigation |

Proposed recommended fields

| Field | Type | Description |
|---|---|---|
| dataset_id | string | Dataset context |
| endpoint | string | API path called |
| bytes_transferred | int | Bytes sent (0 on failure) |
| range | string | Range header value, for partial downloads |
| auth_type | string | Auth method used (e.g. "visa", "jwt", "opaque") |
| error_reason | string | Structured reason for denied/failed events |

MUST NOT log

  • Bearer tokens, JWT contents, or API keys
  • Raw public keys or private keys
  • Full HTTP headers or cookie values
  • Any credential material

Only log auth method type for traceability (e.g. "visa", not the visa token).

Example output (pretty-printed for readability; each event is emitted as a single JSON line):

{
  "type": "audit",
  "event": "download.completed",
  "timestamp": "2026-02-18T14:30:00.000Z",
  "user_id": "researcher@example.org",
  "file_id": "urn:neic:001-002-003",
  "dataset_id": "EGAD00001000001",
  "correlation_id": "req-abc-123",
  "endpoint": "/files/{fileId}",
  "http_status": 200,
  "bytes_transferred": 1048576
}

Semantics

  • One audit event per request — a single download produces exactly one log line
  • Log both success and failure (§16.1 requires logging attempted access)
  • Log at request completion so http_status and bytes_transferred are accurate
  • Audit logging must not block the HTTP response path

Prerequisite: ADR

The stdout-vs-DB decision and the event schema should be documented as an
ADR before implementation. Key questions for the ADR:

  1. stdout vs DB vs hybrid — confirm the approach
  2. Event schema — which fields are required vs recommended
  3. Scope — download events only, or all SDA data-access events

Acceptance criteria

  • ADR for audit logging architecture accepted
  • Successful downloads (200/206) emit exactly one download.completed event
  • Denied downloads emit exactly one download.denied event with error_reason
  • Failed downloads emit exactly one download.failed event with error_reason
  • All required fields present in every event
  • Timestamps are UTC ISO 8601
  • No credentials, tokens, or keys appear in audit output
  • Events are valid JSON, one line per event
  • Events tagged with "type": "audit" for log routing
  • Tests for all of the above

What is NOT in scope (deployer responsibility)

  • Log routing/forwarding (fluentd/logstash config)
  • Log retention policy (index lifecycle management)
  • Log protection (access control, append-only storage)
  • Log analysis and alerting
  • Time synchronisation (NTP)
  • Operational documentation per MSBFS §17

Estimation of size

medium

Estimation of priority

high — mandatory regulatory requirement for production deployment

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests