Merged
32 changes: 24 additions & 8 deletions CONTEXT.md
@@ -88,18 +88,19 @@ Think of this as the secrets layer. Encrypts and decrypts credential blobs trans

 ### `audit/` — Structured event recording

-Think of this as the append-only ledger. Records who did what and when.
+Think of this as the audit instrumentation layer. Defines what happened; `server/` decides where it goes.

 **Owns:**
-- `AuditEvent` model
-- `log()` / `alog()` — append to a structured JSON-lines log file
-- `setup()` / `clear()` — log file lifecycle (called by server at startup/shutdown)
+- `AuditEvent` domain model — mandatory fields: `identity`, `principal_id`, `provider`, `connection`; optional: `method`, `path`, `status`, `metadata`
+- `log()` / `alog()` — emit an `AuditEvent` as an OTel `LogRecord` via `get_logger_provider()`
+- Translation from `AuditEvent` → OTel `LogRecord`

 **Does not own:**
-- Business logic
-- Any storage beyond the append-only log file
+- Storage — no file I/O, no database
+- Provider lifecycle (`setup()` / `clear()` removed — owned by `server/`)
+- Knowledge of where events are routed

-**Imports nothing from this codebase.** Imported by: `auth/`, `server/`
+**Imports:** `opentelemetry-api` only (no SDK, no storage). **Imports nothing from this codebase.** Imported by: `server/`, `proxy/`

 ---

@@ -120,6 +121,9 @@ Think of this as the daemon process. Wires identity + auth + vault + audit toget
 - `server/app.py` — FastAPI application factory and lifespan
 - `server/routes/` — HTTP API surface
 - `server/schemas.py` — API response schemas
+- `server/audit_store.py` — `SQLiteLogExporter` (OTel `LogExporter` impl) + `AuditStore` query interface; `LoggerProvider` lifecycle (setup at startup, shutdown at teardown)
+- `server/routes/audit.py` — `GET /audit/events` (filtered, paginated admin read)
+- `POST /audit/events` — ingest endpoint for proxy-side external AuditEvents; server enriches `principal_id` from PoP JWT

 **All filesystem interaction for server-owned state lives here.** No other module writes to server-owned paths.

@@ -141,6 +145,7 @@ A mitmproxy-based HTTPS proxy. Intercepts outgoing agent requests and injects au
 - Credential loading (asks the server)
 - Route catalog construction (asks the server)
 - Provider definitions
+- Audit storage — ships External AuditEvents to server via `POST /audit/events` (fire-and-forget); does not call `audit.log()` directly

 **Imported by:** `cli/`

@@ -178,12 +183,15 @@ Click-based CLI and HTTP client. Everything here is a client to the server HTTP

 **PoP JWT**: Short-lived (60 s) Proof-of-Possession token signed with the Identity's Ed25519 private key. Bound to `htm`, `htu`, `body_sha256`. Sent as `Authorization: PoP <token>`.

-**Principal**: Non-cryptographic logical partition (human or team) that owns Vaults. Identified by an opaque **PrincipalId** (e.g., `principal_abc123def456`). Has no cryptographic key.
+**Principal**: Non-cryptographic logical partition (human or team) that owns Vaults. Identified by an opaque **PrincipalId** (e.g., `principal_abc123def456`). Has no cryptographic key. Carries exactly one **PrincipalRole**.
 _Avoid_: User, account, PrincipalHandle, profile

 **PrincipalId**: Opaque stable identifier for a Principal. Never the email or handle — those can change; the PrincipalId cannot.
 _Avoid_: principal_handle, principal_name, username

+**PrincipalRole**: Authorization tier for a Principal. Either `admin` or `user`. The first Principal created on a server is always `admin`; all subsequent Principals are `user`. Stored as a column on the Principal record — not in environment variables or a separate table.
+_Avoid_: permission level, access level, user type
+
 **Vault**: Named credential store owned by exactly one Principal. Identified by an opaque **VaultId** (e.g., `vault_a1b2c3d4e5f6`). All credential store keys are prefixed `vault:<vault_id>:...`.
 _Avoid_: credential store, token store, secret store, profile store

@@ -240,6 +248,14 @@ AuthService does not query registries, does not know about server filesystem pat

 Every `AuditEvent` carries `identity` (the agent Handle) and `principal_id` (the PrincipalId). Both are required — every auditable action has an acting agent and an owning principal.

+**External AuditEvent**: An event produced by the proxy layer — records an outbound HTTP call an agent made through the proxy to a third-party API (e.g., a call to `api.github.com`). Classified by provider and connection. Mandatory fields: identity, principal_id, provider, connection. Optional fields: HTTP method, path, response status.
+_Avoid_: proxy event, API event, outbound event
+
+**Internal AuditEvent**: An event produced by the server layer — records credential lifecycle operations (login, logout, token refresh, revocation) and auth flow steps.
+_Avoid_: server event, auth event, lifecycle event
+
+**Audit delivery**: External AuditEvents are shipped from the proxy to the server via `POST /audit/events` (fire-and-forget, best-effort). The proxy does not write to a local audit file. The server is the single source of truth for all audit events. `principal_id` is resolved server-side from the PoP JWT on the ingest request — the proxy does not need to supply it.
+
 ---

 ## Flagged Ambiguities
22 changes: 22 additions & 0 deletions docs/adr/0005-audit-otel-sqlite.md
@@ -0,0 +1,22 @@
# Audit events use OTel Logs API with a SQLite exporter owned by the server

The proxy runs on the client machine and the server runs remotely, so writing audit events to a local file produces two disjoint logs that an IT admin cannot view in one place. We need a single server-owned audit store that both the proxy and the server write into.

**Decision:** `audit/` is a pure leaf that imports only `opentelemetry-api`. It defines `AuditEvent`, translates it to an OTel `LogRecord`, and emits via the globally registered `LoggerProvider` — with no knowledge of where events go. `server/` owns a custom `SQLiteLogExporter` (implementing the OTel `LogExporter` interface), registers a `LoggerProvider` with a `BatchLogRecordProcessor` at daemon startup, and exposes `GET /audit/events` for admin queries. The proxy ships External AuditEvents to the server fire-and-forget via `POST /audit/events` rather than writing to a local file; the server enriches each inbound event with `principal_id` resolved from the PoP JWT.

**Considered alternatives:**

- *Flat JSON-lines file per process* — the current approach. Rejected because it produces two disjoint audit logs in the client/server topology, with no queryable interface for the admin view.
- *Proxy hosted on server* — rejected because it routes all agent traffic through the server machine, adding a network round-trip to every API call and making the server a traffic bottleneck.
- *Pure OTLP to an external collector* — rejected as the primary store because it requires operator-provisioned infrastructure. OTLP remains a valid future second exporter on the same `LoggerProvider` for teams that already run Grafana or Datadog.
- *SQLite owned by `audit/`* — rejected to preserve `audit/` as a dependency-free leaf. Storage decisions belong to `server/`, consistent with how all other server-owned state is managed.

**Not considered:** replacing loguru with OTel for operational logging. Loguru (68 call sites) serves developers debugging live systems — free-form, level-filtered, short-retention. Audit serves IT admins answering compliance questions — structured, required fields, long-retention, queryable. Routing loguru through the SQLite exporter would fill the admin view with operational noise. They are different things with different audiences and must stay separate.

**Consequences:**

- `audit/` gains `opentelemetry-api` as a dependency; `server/` gains `opentelemetry-sdk` and `aiosqlite` (or `sqlite3`).
- `audit.setup()` / `audit.clear()` are removed; `server/app.py` lifespan manages the `LoggerProvider`.
- The proxy's two existing direct `audit.log()` calls (`proxy_no_credentials`, `proxy_deny`) move to fire-and-forget HTTP posts to the server.
- A second OTLP exporter can be added to the `LoggerProvider` at any time without touching `audit/` or `proxy/`.
- All audit events — Internal (server) and External (proxy) — are queryable from a single `GET /audit/events` endpoint.
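To make the storage half of this decision concrete, here is a minimal sketch of a server-owned SQLite audit table with a filtered, paginated read, which is the kind of query `GET /audit/events` would need. It uses only the standard-library `sqlite3`. The table layout, the `AuditStoreSketch` class, and its method names are assumptions for illustration, and the OTel `LogExporter` wiring is deliberately left out.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Any


class AuditStoreSketch:
    """Hypothetical store; the PR's SQLiteLogExporter/AuditStore may differ."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.row_factory = sqlite3.Row
        self._db.execute(
            """CREATE TABLE IF NOT EXISTS audit_events (
                   event_id TEXT PRIMARY KEY,
                   timestamp_ns INTEGER NOT NULL,
                   event TEXT NOT NULL,
                   source TEXT NOT NULL,
                   provider TEXT,
                   principal_id TEXT,
                   attributes TEXT NOT NULL
               )"""
        )

    def append(self, record: dict[str, Any]) -> None:
        """Insert one event; an exporter would call this for each log record."""
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO audit_events VALUES (?, ?, ?, ?, ?, ?, ?)",
                (
                    record["event_id"],
                    record["timestamp_ns"],
                    record["event"],
                    record.get("source", "internal"),
                    record.get("provider"),
                    record.get("principal_id"),
                    json.dumps(record),  # full payload kept as JSON text
                ),
            )

    def query(
        self, *, provider: str | None = None, limit: int = 50, offset: int = 0
    ) -> list[dict[str, Any]]:
        """Return a newest-first page of events, optionally filtered by provider."""
        sql = "SELECT attributes FROM audit_events"
        params: list[Any] = []
        if provider is not None:
            sql += " WHERE provider = ?"
            params.append(provider)
        sql += " ORDER BY timestamp_ns DESC LIMIT ? OFFSET ?"
        params += [limit, offset]
        return [json.loads(row["attributes"]) for row in self._db.execute(sql, params)]
```

Keeping the filterable fields as real columns and the rest as a JSON blob is one possible trade-off: filters stay indexable while the schema does not need to change every time an event gains a metadata key.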
18 changes: 18 additions & 0 deletions docs/adr/0006-principal-roles-admin-user.md
@@ -0,0 +1,18 @@
# Principal roles: admin / user, first-created principal is admin

Principals need an authorization tier to separate deployment-level operations (audit log access, provider registration/deletion, cross-vault credential revocation) from per-principal operations (own connections, claim accept/reject). We store a `role` column (`admin` | `user`) directly on the `principals` table. The first Principal created on a server is assigned `admin`; all subsequent Principals receive `user`. A role is assigned at creation and cannot be changed afterwards (role mutation is deferred to a future milestone).

## Considered options

**Environment variable (`AUTHSOME_ADMIN_PRINCIPALS`)** — the prior approach. Rejected because it requires knowing the PrincipalId before the server starts, cannot be changed without a restart, and is invisible to the UI and route layer.

**Separate `principal_roles` table** — considered for future extensibility (multiple roles per principal). Rejected as premature: the role model is binary and a join adds complexity without benefit today.

**Default admin account created at server init** — would require deciding on credentials before any real user exists. Rejected in favour of first-user-becomes-admin, which is zero-config and correct for both local and hosted deployments.

## Consequences

- `AUTHSOME_ADMIN_PRINCIPALS` env var and `is_admin_principal()` are removed entirely.
- Admin enforcement at the route level uses a `get_admin_auth_service` FastAPI dependency (parallel to `get_protected_auth_service`) that raises `HTTP 403` for non-admin principals.
Review comment (Collaborator): Do we need a separate `get_admin_auth_service` FastAPI dependency? The guardrails are needed in only two places: one when mutating the provider client credentials, and the other when displaying the provider UI (which is not an auth service property). Maybe we can just check the principal role in these two places.

- Admin-only routes: `GET /audit/events`, `POST /providers`, `DELETE /providers/{provider}`, `POST /connections/{provider}/revoke`.
- Schema migration: `ALTER TABLE principals ADD COLUMN role TEXT NOT NULL DEFAULT 'user'`, followed by setting the earliest-created principal to `'admin'`.
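The schema migration and the first-principal-becomes-admin rule can be sketched with the standard-library `sqlite3`. The `principals` table, the `role` column, and the `ALTER TABLE` statement come from the ADR; storing `created_at` as ISO-8601 text, the function names `migrate_roles` and `create_principal`, and the tie-break on `principal_id` are assumptions for illustration.

```python
import sqlite3


def migrate_roles(db: sqlite3.Connection) -> None:
    """Add the role column, then promote the earliest-created principal."""
    db.execute("ALTER TABLE principals ADD COLUMN role TEXT NOT NULL DEFAULT 'user'")
    db.execute(
        """UPDATE principals SET role = 'admin'
           WHERE principal_id = (
               SELECT principal_id FROM principals
               ORDER BY created_at ASC, principal_id ASC  -- assumed tie-break
               LIMIT 1
           )"""
    )
    db.commit()


def create_principal(
    db: sqlite3.Connection, principal_id: str, email: str, created_at: str
) -> str:
    """Insert a principal; the first one on an empty table becomes admin."""
    is_first = db.execute("SELECT COUNT(*) FROM principals").fetchone()[0] == 0
    role = "admin" if is_first else "user"
    db.execute(
        "INSERT INTO principals (principal_id, email, created_at, role) VALUES (?, ?, ?, ?)",
        (principal_id, email, created_at, role),
    )
    db.commit()
    return role
```

On an empty table the `UPDATE` matches no rows, so the migration is harmless on a fresh install, and the creation path then assigns `admin` to whoever registers first.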
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -43,6 +43,8 @@ dependencies = [
     "base58>=2.1.1",
     "posthog>=3.0",
     "browser-cookie3>=0.19",
+    "opentelemetry-api>=1.42.1",
+    "opentelemetry-sdk>=1.42.1",
 ]

 [project.optional-dependencies]
111 changes: 50 additions & 61 deletions src/authsome/audit/__init__.py
@@ -1,85 +1,74 @@
-"""Structured server-side event logging helpers."""
+"""Structured audit event emission.
+
+The audit package is intentionally storage-free. It turns Authsome audit
+events into OpenTelemetry log records and leaves export decisions to the
+server composition root.
+"""

 from __future__ import annotations

-import json
-import threading
 import uuid
 from datetime import datetime
-from pathlib import Path
-from typing import Any
+from typing import Any, Literal

+from opentelemetry._logs import SeverityNumber, get_logger
 from pydantic import BaseModel, Field

 from authsome.utils import utc_now

+AuditSource = Literal["internal", "external"]
+

 class AuditEvent(BaseModel):
-    """Structured server-side event record."""
+    """Structured audit event record emitted through OpenTelemetry logs."""

     event_id: str = Field(default_factory=lambda: f"audit_{uuid.uuid4().hex}")
     timestamp: datetime = Field(default_factory=utc_now)
     event: str
+    source: AuditSource = "internal"
     provider: str | None = None
     connection: str | None = None
     identity: str | None = None
+    principal_id: str | None = None
     status: str | None = None
     metadata: dict[str, Any] = Field(default_factory=dict)


-_log_path: Path | None = None
-_lock = threading.Lock()
-
-
-def _build_event(event_type: str, **kwargs: Any) -> AuditEvent:
-    filtered_kwargs = {k: v for k, v in kwargs.items() if v is not None}
-    return AuditEvent(
-        event=event_type,
-        provider=filtered_kwargs.pop("provider", None),
-        connection=filtered_kwargs.pop("connection", None),
-        identity=filtered_kwargs.pop("identity", None),
-        status=filtered_kwargs.pop("status", None),
-        metadata=filtered_kwargs,
+def emit(event: AuditEvent) -> AuditEvent:
+    """Emit an audit event through the globally registered OTel logger."""
+    logger = get_logger("authsome.audit")
+    logger.emit(
+        timestamp=int(event.timestamp.timestamp() * 1_000_000_000),
+        severity_number=SeverityNumber.INFO,
+        severity_text="INFO",
+        body=event.event,
+        attributes=event.model_dump(mode="json"),
+        event_name=event.event,
     )
+    return event
+
+
+def emit_event(
+    event: str,
+    *,
+    source: AuditSource = "internal",
+    identity: str | None = None,
+    principal_id: str | None = None,
+    provider: str | None = None,
+    connection: str | None = None,
+    status: str | None = None,
+    **metadata: Any,
+) -> AuditEvent:
+    """Build and emit a structured audit event."""
+    return emit(
+        AuditEvent(
+            event=event,
+            source=source,
+            identity=identity,
+            principal_id=principal_id,
+            provider=provider,
+            connection=connection,
+            status=status,
+            metadata={key: value for key, value in metadata.items() if value is not None},
+        )
+    )
-
-
-def setup(path: Path) -> None:
-    """Configure the server-side structured log path."""
-    global _log_path
-    path.parent.mkdir(parents=True, exist_ok=True)
-    if not path.exists():
-        path.touch()
-    _log_path = path
-
-
-def clear() -> None:
-    """Clear configured server-side log state."""
-    global _log_path
-    _log_path = None
-
-
-def _serialize_event(event: AuditEvent) -> str:
-    payload = event.model_dump(mode="json")
-    metadata = payload.pop("metadata", {})
-    if isinstance(metadata, dict):
-        payload.update(metadata)
-    return json.dumps(payload, separators=(",", ":"))
-
-
-# TODO: Better to use an audit library: otel or something similar
-def log(event_type: str, **kwargs: Any) -> None:
-    """Append a structured server event to the configured log file."""
-    if _log_path is None:
-        return
-    line = _serialize_event(_build_event(event_type, **kwargs))
-    with _lock:
-        _log_path.parent.mkdir(parents=True, exist_ok=True)
-        with _log_path.open("a", encoding="utf-8") as handle:
-            handle.write(line)
-            handle.write("\n")
-
-
-# FIXME: Why is there a log and alog ?
-async def alog(event_type: str, **kwargs: Any) -> None:
-    """Async wrapper around structured server event logging."""
-    log(event_type, **kwargs)
6 changes: 6 additions & 0 deletions src/authsome/cli/client.py
@@ -248,6 +248,12 @@ async def get_provider(self, provider: str) -> dict[str, Any]:
     async def register_provider(self, definition_dict: dict[str, Any], force: bool = False) -> None:
         await self._post("/providers", {"definition": definition_dict, "force": force})

+    async def list_audit_events(self, *, limit: int = 50) -> dict[str, Any]:
+        return await self._get(f"/audit/events?limit={limit}")
+
+    async def record_audit_event(self, event: dict[str, Any]) -> None:
+        await self._post("/audit/events", {"event": event})
+
     async def register_identity(self, handle: str, did: str) -> dict[str, Any]:
         return await self._post("/identities/register", {"handle": handle, "did": did}, protected=False)

22 changes: 4 additions & 18 deletions src/authsome/cli/main.py
@@ -34,7 +34,7 @@
     auth_command,
     setup_logging,
 )
-from authsome.paths import get_client_log_path, get_server_log_path
+from authsome.paths import get_client_log_path
 from authsome.utils import connection_is_active, format_error_code, redact


@@ -645,23 +645,9 @@ async def log_cmd(ctx_obj: ContextObj, lines: int, raw: bool) -> None:
         ctx_obj.print_json({"log_file": str(log_path), "entries": raw_lines})
         return

-    audit_path = get_server_log_path(home)
-    try:
-        raw_lines = audit_path.read_text(encoding="utf-8", errors="replace").splitlines()[-lines:]
-    except FileNotFoundError:
-        raw_lines = []
-
-    parsed: list[dict] = []
-    for line in raw_lines:
-        line = line.strip()
-        if not line:
-            continue
-        try:
-            parsed.append(json.loads(line))
-        except Exception:
-            parsed.append({"raw": line})
-
-    ctx_obj.print_json({"log_file": str(audit_path), "entries": parsed})
+    actx = await ctx_obj.initialize()
+    events = await actx.runtime_client.list_audit_events(limit=lines)
+    ctx_obj.print_json(events)


 @cli.group(name="daemon")
8 changes: 8 additions & 0 deletions src/authsome/identity/principal.py
@@ -22,11 +22,19 @@ class ClaimStatus(StrEnum):
     REJECTED = "rejected"


+class PrincipalRole(StrEnum):
+    """Authorization tier assigned to a Principal."""
+
+    ADMIN = "admin"
+    USER = "user"
+
+
 class PrincipalRecord(BaseModel):
     """Principal account record."""

     principal_id: str
     email: str
+    role: PrincipalRole = PrincipalRole.USER
     password_hash: str | None = None
     created_at: datetime = Field(default_factory=utc_now)
     updated_at: datetime = Field(default_factory=utc_now)
5 changes: 5 additions & 0 deletions src/authsome/paths.py
@@ -33,3 +33,8 @@ def get_client_log_path(home: Path | None = None) -> Path:
 def get_server_log_path(home: Path | None = None) -> Path:
     """Return the default server log file path."""
     return get_server_home(home) / "logs" / "authsome.log"
+
+
+def get_server_audit_db_path(home: Path | None = None) -> Path:
+    """Return the server-owned audit event database path."""
+    return get_server_home(home) / "audit" / "events.sqlite3"
2 changes: 2 additions & 0 deletions src/authsome/proxy/runner.py
@@ -26,6 +26,8 @@ async def proxy_routes(self, scope: str = "connected") -> Any: ...

     async def list_providers_by_source(self) -> Any: ...

+    async def record_audit_event(self, event: dict[str, Any]) -> Any: ...
+

 class ProxyRunner:
     """Launch a subprocess behind the Authsome local auth proxy."""