http: graceful sticky drain + access-log session_id/session_action #11

Merged
rustyconover merged 1 commit into main from sticky-sessions-drain on May 18, 2026

@rustyconover
Collaborator

Summary

  • PR 3 of 3 in the sticky-sessions series. Stacked on #10 (http: sticky echo headers + Fly.io quickstart helpers); the base is `sticky-sessions-echo-headers`, so it will rebase cleanly onto `main` once #9 (http: opt-in sticky sessions: HTTP-only, header transport, DELETE endpoint) and #10 merge.
  • New operator-facing API `vgi_rpc.http.drain_handle(app) -> DrainHandle | None` for wiring graceful sticky-session shutdown into any WSGI launcher.
  • `serve_http(enable_sticky=True)` now auto-installs SIGTERM/SIGINT handlers that drain → wait `drain_grace_seconds` (default 30) → close all live sessions → exit. Double-signal skips grace and force-exits.
  • Access-log records carry `session_id` (24-char hex) + `session_action` (none/open/resume/close) on sticky-touching records; absent on non-sticky servers.
  • Canonical conformance gains `TestSticky::test_drain_rejects_new_opens` (capability-gated on the conformance server exposing `POST/DELETE /__test_drain__`).

Operator API

`drain_handle(app)`

```python
from vgi_rpc.http import drain_handle, make_wsgi_app

app = make_wsgi_app(server, enable_sticky=True)
handle = drain_handle(app)  # → DrainHandle | None
if handle is not None:
    handle.drain()     # subsequent ctx.open_session raises ServerDrainingError
    # ... wait for in-flight calls ...
    handle.shutdown()  # state.close() on every live session, registry cleared
```

Returns `None` for non-sticky apps so operator code can branch with `if (h := drain_handle(app)) is not None: ...`.
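
To make the drain → shutdown contract concrete, here is a minimal stand-in model of it. It does not import `vgi_rpc`: `ToyRegistry`, `ToyDrainHandle`, and the local `ServerDrainingError` are hypothetical names that only mirror the behaviour described above (new opens rejected while draining, `shutdown()` closing every live session and clearing the registry). Whether the real `is_draining` is a method or an attribute is not stated here, so the method form below is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class ServerDrainingError(RuntimeError):
    """Local stand-in for the typed error named in this PR (not the real class)."""


@dataclass
class ToyRegistry:
    """Hypothetical session registry mapping session id -> close callback."""

    draining: bool = False
    sessions: Dict[str, Callable[[], None]] = field(default_factory=dict)

    def open_session(self, session_id: str, close: Callable[[], None]) -> None:
        # While draining, new sessions are refused; existing ones are untouched.
        if self.draining:
            raise ServerDrainingError("server is draining; no new sessions")
        self.sessions[session_id] = close


@dataclass
class ToyDrainHandle:
    """Hypothetical handle exposing the drain / shutdown / is_draining trio."""

    registry: ToyRegistry

    def drain(self) -> None:
        self.registry.draining = True

    def is_draining(self) -> bool:
        return self.registry.draining

    def shutdown(self) -> None:
        # Close every live session, then clear the registry.
        for close in list(self.registry.sessions.values()):
            close()
        self.registry.sessions.clear()
```

The point of the two-step shape is that `drain()` is cheap and immediate, while `shutdown()` is the destructive step an operator delays until in-flight calls have finished.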

Pre-fork servers

```python

# gunicorn config (gunicorn.conf.py)
import time
from vgi_rpc.http import drain_handle

def worker_exit(server, worker):
    if (h := drain_handle(worker.app.callable)) is not None:
        h.drain()
        time.sleep(30)  # operator-tuned grace period
        h.shutdown()
```

`serve_http` built-in SIGTERM handling

```python
from vgi_rpc.http import serve_http

serve_http(
    server,
    enable_sticky=True,
    drain_grace_seconds=30.0,  # default
    # install_signal_handlers=False to opt out (rare)
)
```

First SIGTERM/SIGINT: flip drain, schedule shutdown after grace, log it. Second signal during grace: skip grace, exit immediately.
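
The two-signal sequence can be sketched independently of `serve_http`. The helper below is a hypothetical illustration, not the library's implementation: `TwoStageShutdown` and its `exit_fn` parameter are invented names, and it assumes the second signal still closes sessions before exiting, which the description above does not spell out.

```python
import threading
from typing import Callable, Optional


class TwoStageShutdown:
    """Hypothetical helper: first signal drains, second signal skips the grace period."""

    def __init__(
        self,
        drain: Callable[[], None],
        shutdown: Callable[[], None],
        exit_fn: Callable[[], None],
        grace_seconds: float = 30.0,
    ) -> None:
        self._drain = drain
        self._shutdown = shutdown
        self._exit = exit_fn
        self._grace = grace_seconds
        self._timer: Optional[threading.Timer] = None
        # Never released: a successful non-blocking acquire marks "already finished".
        self._once = threading.Lock()

    def _finish(self) -> None:
        # Run-once guard shared by the timer thread and the second-signal path.
        if not self._once.acquire(blocking=False):
            return
        self._shutdown()
        self._exit()

    def on_signal(self, signum: int, frame: object) -> None:
        if self._timer is None:
            # First signal: flip drain and schedule shutdown after the grace period.
            self._timer = threading.Timer(self._grace, self._finish)
            self._timer.daemon = True
            self._drain()
            self._timer.start()
        else:
            # Second signal: cancel the pending timer and finish immediately.
            self._timer.cancel()
            self._finish()
```

Wiring it up would look like `signal.signal(signal.SIGTERM, helper.on_signal)`; the exit callback is injected so the sequence can be exercised without terminating the process.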

Access-log fields

| Field | Type | When present |
|---|---|---|
| `session_id` | string (24-char hex) | When `session_action` ∈ `{open, resume, close}`, i.e. the request actually touched a session. Stable across the lifecycle for a given session id. |
| `session_action` | enum | `none` (sticky middleware ran but no session interaction) / `open` / `resume` / `close`. Absent on non-sticky servers. |

Both fields documented in `docs/access-log-spec.md` §4.7 and in the JSON schema. Known gap: middleware-short-circuit cases (token validation failed) currently don't produce access-log records — the typed error on the wire is the operator-facing signal.
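
A log consumer could check the presence rules above with something like the following sketch. It assumes each access-log record has already been parsed into a dict (for example from a JSON line) and that the hex digits may be either case; `check_session_fields` is a hypothetical helper, not part of `vgi_rpc`.

```python
import re
from typing import Any, Mapping

SESSION_ACTIONS = {"none", "open", "resume", "close"}
# Assumption: 24 hex characters, either case.
SESSION_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def check_session_fields(record: Mapping[str, Any]) -> bool:
    """Return True if the record's session fields follow the presence rules."""
    action = record.get("session_action")
    session_id = record.get("session_id")
    if action is None:
        # Non-sticky server: neither field should appear.
        return session_id is None
    if action not in SESSION_ACTIONS:
        return False
    if action == "none":
        # Sticky middleware ran, but no session was touched.
        return session_id is None
    # open / resume / close must carry a well-formed session id.
    return isinstance(session_id, str) and SESSION_ID_RE.fullmatch(session_id) is not None
```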

Test plan

  • `uv run ruff format .` / `ruff check` / `mypy` / `ty` → all clean
  • `uv run pytest --timeout=50` → 3324 passed, 0 failed in ~82s
  • All 11 canonical `TestSticky` tests pass (was 10 in PR2; +1 drain)
  • All 33 Python-only sticky tests pass (was 26; +7 for drain + access-log)
  • Stateless conformance + non-sticky wire path still byte-identical

What's NOT in this PR

  • Middleware-short-circuit access-log records. Token-validation failures (lost / expired / wrong-server) currently bypass the dispatch path where the access log lives. Documented as a follow-up; operators monitoring for misroutes can rely on the typed `SessionLostError` on the wire for now.
  • Cookie emission for AWS ALB / CloudFront. Header-only multiplexes cleanly; cookie emission can be added later if a real user asks.

🤖 Generated with Claude Code

@rustyconover rustyconover force-pushed the sticky-sessions-echo-headers branch from ec11d3f to 726173c Compare May 18, 2026 03:09
Operator-facing ``vgi_rpc.http.drain_handle(app)`` returns a
``DrainHandle(drain, shutdown, is_draining)`` for sticky-enabled apps;
``serve_http(enable_sticky=True)`` auto-installs SIGTERM/SIGINT handlers
that flip drain → wait ``drain_grace_seconds`` (default 30) → invoke
``state.close()`` on every live session → exit. Pre-fork servers
(gunicorn) wire equivalent shutdown hooks against ``drain_handle(app)``
in ``worker_exit``. Access log gains ``session_id`` (24-char hex) and
``session_action`` (none/open/resume/close) on sticky-touching records;
absent on non-sticky servers. Canonical conformance gains
``TestSticky::test_drain_rejects_new_opens``, capability-gated on the
conformance server exposing ``POST/DELETE /__test_drain__``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rustyconover rustyconover force-pushed the sticky-sessions-drain branch from e06b255 to c5af449 Compare May 18, 2026 03:12
@rustyconover rustyconover changed the base branch from sticky-sessions-echo-headers to main May 18, 2026 03:12
@rustyconover rustyconover merged commit 4bda30f into main May 18, 2026
1 check was pending
rustyconover added a commit that referenced this pull request May 18, 2026
API Reference entries for surfaces introduced in #9/#10/#11:

* http.md gains ``serve_http`` (was missing), ``DrainHandle``,
  ``drain_handle``, and the ``vgi_rpc.http.fly`` quickstart helpers.
* core.md "Errors" gains a "Typed marker errors" subsection covering
  ``MethodNotImplementedError`` (pre-existing miss), ``SessionLostError``,
  and ``ServerDrainingError`` — the three classes carrying
  ``error_kind`` for wire-side pattern matching.

mkdocs --strict builds clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rustyconover added a commit that referenced this pull request May 18, 2026
Feature release. Adds opt-in sticky sessions for the HTTP transport
(PRs #9, #10, #11). Non-sticky wire path byte-identical to 0.16.1;
existing callers see no behaviour change.

* ctx.open_session / ctx.close_session runtime API on CallContext
* with conn.with_session_token() as sess: client view
* VGI-Session header transport with AEAD-sealed token, principal-bound AAD
* DELETE /vgi/__session__ idempotent teardown endpoint
* sticky_echo_headers — server-issued headers the client replays
  (Fly.io quickstart helpers in vgi_rpc.http.fly)
* drain_handle(app) operator API + serve_http SIGTERM graceful drain
* Access-log session_id + session_action fields
* Typed errors SessionLostError + ServerDrainingError
* Canonical TestSticky cross-language conformance group (11 tests,
  capability-gated)
* Full spec at docs/sticky-sessions-spec.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>