| 1 | +# sleap-app E2E Encryption for Relay Transport — Design |
| 2 | + |
| 3 | +## Goal |
| 4 | + |
| 5 | +Add end-to-end encryption to sleap-app's `RelayTransport` so that all relay messages between sleap-app (desktop and web) and workers are encrypted, matching the dashboard's encryption from PR #77. With this in place, the signaling server cannot read message payloads. |
| 6 | + |
| 7 | +## Background |
| 8 | + |
| 9 | +- PR #77 added E2E encryption for the dashboard's relay path (ECDH P-256 + AES-256-GCM) |
| 10 | +- Worker-side encryption (`_handle_key_exchange`, `_decrypt_if_encrypted`, `RelayChannel` encryption) is already deployed |
| 11 | +- Signaling server already forwards `key_exchange_response` and `encrypted_relay` message types |
| 12 | +- sleap-app has a fully working `RelayTransport` class in `src/lib/transport.ts` with REST + SSE |
| 13 | +- sleap-app supports both desktop (Tauri) and web (app.sleap.ai) — both use the same TypeScript code |
| 14 | + |
| 15 | +## Architecture |
| 16 | + |
| 17 | +**Approach:** Independent TypeScript implementation of the same crypto operations used by the dashboard. Encryption integrated directly into the existing `RelayTransport` class — transparent to the rest of the app. |
| 18 | + |
| 19 | +**New file:** `src/lib/e2e.ts` (~100 lines): ECDH P-256 keypair generation, HKDF-SHA256 key derivation, AES-256-GCM encrypt/decrypt. Pure typed functions, no side effects. |
| 20 | + |
| 21 | +**Modified file:** `src/lib/transport.ts` — `RelayTransport` class gets key exchange in `open()`, encrypt in `send()`, decrypt in SSE handler. |
| 22 | + |
| 23 | +**No changes to:** `connectStore.ts`, `inferenceStore.ts`, `trainingStore.ts`, or any UI components. Encryption is invisible to the rest of the app. |
| 24 | + |
| 25 | +**Works on both platforms:** The Web Crypto API is built into all modern browsers and Tauri's WebView (WebKit/Chromium). Note that `crypto.subtle` is only exposed in secure contexts (HTTPS or localhost). |
| 26 | + |
| 27 | +**No new dependencies.** Zero npm packages added. |
| 28 | + |
| 29 | +**Crypto parameters (must match Python worker exactly):** |
| 30 | +- Curve: P-256 (secp256r1) |
| 31 | +- KDF: HKDF-SHA256, no salt, info = `"sleap-rtc-relay-e2e-v1"` |
| 32 | +- Cipher: AES-256-GCM, 12-byte nonce |
| 33 | +- Public key format: uncompressed point (65 bytes), URL-safe base64 no padding |
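To make the parameter list concrete, here is a minimal sketch of keypair generation and key derivation with the Web Crypto API. It assumes that "no salt" on the Python side corresponds to an empty HKDF salt (which RFC 5869 treats the same as an all-zero salt) and that the ECDH input to HKDF is the raw 32-byte shared secret; both should be confirmed against the worker code. The constant name `HKDF_INFO` is illustrative.

```typescript
// Sketch only: parameters are taken from the list above; verify against the worker.
const HKDF_INFO = new TextEncoder().encode("sleap-rtc-relay-e2e-v1");

async function generateKeypair(): Promise<{ privateKey: CryptoKey; publicKeyRaw: ArrayBuffer }> {
  const pair = await crypto.subtle.generateKey(
    { name: "ECDH", namedCurve: "P-256" },
    false, // private key is non-extractable; the public key can still be exported
    ["deriveBits"],
  );
  // A "raw" export of a P-256 public key is the 65-byte uncompressed point (0x04 || X || Y).
  const publicKeyRaw = await crypto.subtle.exportKey("raw", pair.publicKey);
  return { privateKey: pair.privateKey, publicKeyRaw };
}

async function deriveSharedKey(privateKey: CryptoKey, peerPublicKey: ArrayBuffer): Promise<CryptoKey> {
  const peerKey = await crypto.subtle.importKey(
    "raw",
    peerPublicKey,
    { name: "ECDH", namedCurve: "P-256" },
    false,
    [], // ECDH public keys carry no usages
  );
  // ECDH produces the 32-byte shared secret, which is then fed into HKDF.
  const secret = await crypto.subtle.deriveBits({ name: "ECDH", public: peerKey }, privateKey, 256);
  const hkdfKey = await crypto.subtle.importKey("raw", secret, "HKDF", false, ["deriveKey"]);
  return crypto.subtle.deriveKey(
    // Assumption: empty salt matches the worker's "no salt" setting.
    { name: "HKDF", hash: "SHA-256", salt: new Uint8Array(0), info: HKDF_INFO },
    hkdfKey,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}
```

Both sides should arrive at the same AES key when each combines its own private key with the other's public key, which is what the cross-language test vector is meant to confirm.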
| 34 | + |
| 35 | +## Key Exchange Flow |
| 36 | + |
| 37 | +Triggered automatically when `RelayTransport.open()` is called, that is, after the 10-second WebRTC connection timeout triggers the relay fallback. |
| 38 | + |
| 39 | +``` |
| 40 | +1. RelayTransport.open() called |
| 41 | +2. Opens SSE connection to worker:{peerId} channel |
| 42 | +3. Generates ephemeral P-256 keypair + sessionId (UUID) |
| 43 | +4. Sends key_exchange via POST /api/worker/message: |
| 44 | + { type: "key_exchange", session_id, public_key } |
| 45 | +5. Waits for key_exchange_response on SSE (5s timeout, 1 retry) |
| 46 | +6. Derives shared AES-256 key via ECDH + HKDF |
| 47 | +7. Sets _e2eReady = true |
| 48 | +8. All subsequent send/receive is encrypted |
| 49 | +``` |
| 50 | + |
| 51 | +**Failure handling:** 5-second timeout, retry once, then throw. `connectStore.ts` catches the error and shows "Could not establish secure connection with worker." |
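The timeout-and-retry behavior described above could be sketched as a small generic helper. The name `exchangeWithRetry` and its parameters are hypothetical; `attempt` stands in for "send `key_exchange` and wait for `key_exchange_response` on SSE".

```typescript
// Hypothetical helper: runs `attempt` with a per-attempt timeout and a bounded number of retries.
async function exchangeWithRetry<T>(
  attempt: () => Promise<T>,
  timeoutMs = 5000, // 5-second timeout from the design
  retries = 1, // retry once, then throw
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("key exchange timed out")), timeoutMs);
    });
    try {
      // Whichever settles first wins: the worker's response or the timeout.
      return await Promise.race([attempt(), timeout]);
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next attempt
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError; // surfaces to the caller, e.g. connectStore.ts
}
```

Whether each retry should reuse the same keypair and `session_id` or generate fresh ones is a design choice this sketch leaves to `attempt`.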
| 52 | + |
| 53 | +**Encryption state:** Private fields on the `RelayTransport` instance (`_sessionId`, `_sharedKey`, `_e2eReady`). Garbage collected when transport is destroyed. No persistence. |
| 54 | + |
| 55 | +**WebRTC path unaffected:** When WebRTC succeeds, `RelayTransport` is never created, key exchange never fires. DTLS handles encryption automatically on the P2P path. |
| 56 | + |
| 57 | +## Code Changes |
| 58 | + |
| 59 | +### New: `src/lib/e2e.ts` |
| 60 | + |
| 61 | +```typescript |
| 62 | +generateKeypair(): Promise<{ privateKey: CryptoKey; publicKeyRaw: ArrayBuffer }> |
| 63 | +deriveSharedKey(privateKey: CryptoKey, peerPublicKey: ArrayBuffer): Promise<CryptoKey> |
| 64 | +encrypt(key: CryptoKey, payload: object): Promise<{ nonce: string; ciphertext: string }> |
| 65 | +decrypt(key: CryptoKey, nonce: string, ciphertext: string): Promise<object | null> |
| 66 | +publicKeyToB64(raw: ArrayBuffer): string |
| 67 | +publicKeyFromB64(b64: string): ArrayBuffer |
| 68 | +``` |
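As an illustration of the remaining signatures, the sketch below implements the base64 helpers and AES-GCM encrypt/decrypt. The design only fixes URL-safe unpadded base64 for the public key; encoding `nonce` and `ciphertext` as standard base64 and serializing the payload as UTF-8 JSON are assumptions here and must match whatever the worker actually uses. The helpers `bytesToB64` and `b64ToBytes` are illustrative names.

```typescript
function bytesToB64(bytes: Uint8Array): string {
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}

function b64ToBytes(b64: string) {
  const bin = atob(b64);
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

// URL-safe base64 without padding, as specified for public keys.
function publicKeyToB64(raw: ArrayBuffer): string {
  return bytesToB64(new Uint8Array(raw)).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function publicKeyFromB64(b64: string): ArrayBuffer {
  const std = b64.replace(/-/g, "+").replace(/_/g, "/");
  const padded = std + "=".repeat((4 - (std.length % 4)) % 4);
  return b64ToBytes(padded).buffer as ArrayBuffer;
}

async function encrypt(key: CryptoKey, payload: object): Promise<{ nonce: string; ciphertext: string }> {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh 12-byte nonce per message
  const plaintext = new TextEncoder().encode(JSON.stringify(payload)); // assumption: JSON payloads
  // Web Crypto appends the 16-byte GCM tag to the ciphertext.
  const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
  return { nonce: bytesToB64(iv), ciphertext: bytesToB64(new Uint8Array(ct)) };
}

async function decrypt(key: CryptoKey, nonce: string, ciphertext: string): Promise<object | null> {
  try {
    const pt = await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: b64ToBytes(nonce) },
      key,
      b64ToBytes(ciphertext),
    );
    return JSON.parse(new TextDecoder().decode(pt));
  } catch {
    return null; // wrong key, tampered data, or malformed input
  }
}
```

Returning `null` instead of throwing keeps the SSE handler simple: an undecryptable message can be dropped without tearing down the transport.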
| 69 | + |
| 70 | +### Modified: `src/lib/transport.ts` — `RelayTransport` |
| 71 | + |
| 72 | +**Private fields added:** |
| 73 | +- `_sessionId: string | null` |
| 74 | +- `_sharedKey: CryptoKey | null` |
| 75 | +- `_e2eReady: boolean` |
| 76 | + |
| 77 | +**`open()` updated:** After opening SSE, calls `_initKeyExchange()`. Transport is not marked ready until key exchange completes. |
| 78 | + |
| 79 | +**`send()` updated:** When `_e2eReady`, all outbound messages are encrypted and routed through `POST /api/worker/message` instead of dedicated endpoints: |
| 80 | +- `FS_LIST_DIR` → encrypted `{type: "fs_list_req", path, req_id, offset}` |
| 81 | +- `JOB_SUBMIT` → encrypted `{type: "job_assigned", job_id, config}` with client-generated `job_id` |
| 82 | +- `JOB_CANCEL` → encrypted `{type: "job_cancel", job_id, mode: "cancel"}` |
| 83 | +- `JOB_STOP` → encrypted `{type: "job_cancel", job_id, mode: "stop"}` |
| 84 | +- `FS_GET_MOUNTS` → encrypted `{type: "fs_get_mounts"}` |
| 85 | +- `CONTROL_COMMAND` → encrypted `{type: "job_cancel", job_id, mode: "stop"}` |
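The mapping above can be expressed as a single pure function that runs before encryption. In this sketch only the worker-side shapes come from the list; the app-level field names (`reqId`, `jobId`, and so on) and the function name `toWorkerMessage` are hypothetical and may not match sleap-app's actual protocol types.

```typescript
// Hypothetical app-level message shapes, for illustration only.
type AppMessage =
  | { type: "FS_LIST_DIR"; path: string; reqId: string; offset: number }
  | { type: "FS_GET_MOUNTS" }
  | { type: "JOB_SUBMIT"; jobId: string; config: unknown }
  | { type: "JOB_CANCEL" | "JOB_STOP" | "CONTROL_COMMAND"; jobId: string };

// Translates an app protocol message into the plaintext worker message
// that would then be encrypted and sent via POST /api/worker/message.
function toWorkerMessage(msg: AppMessage): Record<string, unknown> {
  switch (msg.type) {
    case "FS_LIST_DIR":
      return { type: "fs_list_req", path: msg.path, req_id: msg.reqId, offset: msg.offset };
    case "FS_GET_MOUNTS":
      return { type: "fs_get_mounts" };
    case "JOB_SUBMIT":
      return { type: "job_assigned", job_id: msg.jobId, config: msg.config };
    case "JOB_CANCEL":
      return { type: "job_cancel", job_id: msg.jobId, mode: "cancel" };
    case "JOB_STOP":
    case "CONTROL_COMMAND": // listed with the same worker message as JOB_STOP
      return { type: "job_cancel", job_id: msg.jobId, mode: "stop" };
  }
}
```

Keeping the translation separate from encryption makes it straightforward to unit test without any crypto setup.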
| 86 | + |
| 87 | +**`_handleSSEEvent()` updated:** When receiving `encrypted_relay` events, checks `session_id`, decrypts, then processes the inner message through the existing switch statement. |
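One way to structure the receive path is a small unwrap step in front of the existing switch statement. This sketch assumes the envelope carries `session_id`, `nonce`, and `ciphertext` fields (the last two inferred from the return type of `encrypt()`), and that non-encrypted events such as `key_exchange_response` pass through unchanged. The names `unwrapRelayEvent` and `Decrypt` are hypothetical.

```typescript
// Decryption is injected so the routing logic can be tested without real keys.
type Decrypt = (nonce: string, ciphertext: string) => Promise<object | null>;

// Returns the message to feed into the existing switch, or null to drop the event.
async function unwrapRelayEvent(
  event: Record<string, unknown>,
  sessionId: string | null,
  decrypt: Decrypt,
): Promise<object | null> {
  // Assumption: plaintext control events (e.g. key_exchange_response) pass through.
  if (event.type !== "encrypted_relay") return event;
  // Drop envelopes addressed to a different (or not yet established) session.
  if (sessionId === null || event.session_id !== sessionId) return null;
  if (typeof event.nonce !== "string" || typeof event.ciphertext !== "string") return null;
  // decrypt() resolves to null when authentication fails.
  return decrypt(event.nonce, event.ciphertext);
}
```

In `RelayTransport`, `decrypt` would be bound to the session's shared key, for example `(n, c) => decrypt(this._sharedKey, n, c)`.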
| 88 | + |
| 89 | +**`_postJobSubmit()` updated:** When E2E is active, generates `job_id` client-side (`job_${crypto.randomUUID().slice(0,8)}`), sends encrypted via `_sendWorkerMessage`, and opens job SSE channel immediately. |
| 90 | + |
| 91 | +### No changes to other files |
| 92 | + |
| 93 | +`connectStore.ts`, `inferenceStore.ts`, `trainingStore.ts`, and all UI components are untouched. They call `transport.send()` with protocol messages and receive protocol messages back — encryption is invisible. |
| 94 | + |
| 95 | +## Message Flow Comparison |
| 96 | + |
| 97 | +**What the worker's RelayChannel sends (already filtered by PR #77):** |
| 98 | + |
| 99 | +| Message type | Sent during training? | Sent during inference? | |
| 100 | +|---|---|---| |
| 101 | +| `job_status` (accepted/rejected/complete/failed) | Yes | Yes | |
| 102 | +| `job_progress` (train_begin, epoch_end, train_end) | Yes | No | |
| 103 | +| `MODEL_TYPE::` switch | Yes | No | |
| 104 | +| `TRAIN_JOB_START` / `TRAINING_JOBS_DONE` | Yes | No | |
| 105 | +| `INFERENCE_BEGIN/COMPLETE/FAILED` | Yes | Yes | |
| 106 | +| `[stderr]` lines | No | Yes | |
| 107 | +| `CR::` tqdm progress | No | Yes | |
| 108 | +| Catch-all text lines | No | Yes | |
| 109 | + |
| 110 | +All of these will be encrypted as `encrypted_relay` envelopes with the `job_id` in plaintext for SSE channel routing. |
| 111 | + |
| 112 | +## Rollout |
| 113 | + |
| 114 | +**No phased rollout needed.** Worker and signaling server already support encryption (PR #77 + webRTC-connect `amick/relay-server-encryption`). Only the sleap-app frontend changes: |
| 115 | +- **Web:** GitHub Pages redeploy → immediate effect |
| 116 | +- **Desktop:** next Tauri app release includes encryption automatically |
| 117 | + |
| 118 | +**Backward compatibility:** Workers that predate PR #77 do not answer `key_exchange`, so the key exchange times out and the user sees the "Could not establish secure connection with worker." error. |
| 119 | + |
| 120 | +## Testing |
| 121 | + |
| 122 | +**Unit tests (`src/lib/e2e.test.ts`):** |
| 123 | +- Keypair generation, base64 round-trip, ECDH key derivation, encrypt/decrypt round-trip, wrong key failure |
| 124 | + |
| 125 | +**Integration tests:** |
| 126 | +- Mock fetch/SSE, verify key exchange flow, verify `send()` produces encrypted envelopes, verify SSE handler decrypts |
| 127 | + |
| 128 | +**Cross-language test vector:** |
| 129 | +- Hardcoded Python-encrypted message → TypeScript decrypts (verifies parameter alignment) |
| 130 | + |
| 131 | +**E2E manual testing:** |
| 132 | +- Connect via relay, browse filesystem, submit training/inference, verify worker logs show `[E2E]`, verify signaling server shows `encrypted_relay` |