Commit bd698ef

alicup29 and claude committed
Add design doc for sleap-app E2E relay encryption
Design for extending E2E encryption to sleap-app's RelayTransport. Independent TypeScript implementation using Web Crypto API, integrated transparently into the existing transport abstraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent cf5c5cd commit bd698ef

1 file changed

+132
-0
lines changed

# sleap-app E2E Encryption for Relay Transport: Design

## Goal

Add end-to-end encryption to sleap-app's `RelayTransport` so that all relay messages between sleap-app (desktop and web) and workers are encrypted, matching the dashboard's encryption from PR #77. With this in place, the signaling server still forwards relay messages but cannot read their payloads.

## Background

- PR #77 added E2E encryption for the dashboard's relay path (ECDH P-256 + AES-256-GCM).
- Worker-side encryption (`_handle_key_exchange`, `_decrypt_if_encrypted`, `RelayChannel` encryption) is already deployed.
- The signaling server already forwards the `key_exchange_response` and `encrypted_relay` message types.
- sleap-app has a fully working `RelayTransport` class in `src/lib/transport.ts` that uses REST + SSE.
- sleap-app supports both desktop (Tauri) and web (app.sleap.ai); both run the same TypeScript code.

## Architecture

**Approach:** Independent TypeScript implementation of the same crypto operations used by the dashboard. Encryption is integrated directly into the existing `RelayTransport` class and is transparent to the rest of the app.

**New file:** `src/lib/e2e.ts` (~100 lines): ECDH P-256 keypair generation, HKDF-SHA256 key derivation, and AES-256-GCM encrypt/decrypt. Pure typed functions, no side effects.

**Modified file:** `src/lib/transport.ts`: the `RelayTransport` class gets key exchange in `open()`, encryption in `send()`, and decryption in the SSE handler.

**No changes to:** `connectStore.ts`, `inferenceStore.ts`, `trainingStore.ts`, or any UI components. Encryption is invisible to them.

**Works on both platforms:** The Web Crypto API is built into all modern browsers and into Tauri's WebView (WebKit/Chromium). Note that `crypto.subtle` is only exposed in secure contexts (HTTPS or localhost).

**No new dependencies:** No npm packages are added.

**Crypto parameters (must match the Python worker exactly):**
- Curve: P-256 (secp256r1)
- KDF: HKDF-SHA256, no salt, info = `"sleap-rtc-relay-e2e-v1"`
- Cipher: AES-256-GCM, 12-byte nonce
- Public key format: uncompressed point (65 bytes), URL-safe base64 without padding

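The public key format above is the parameter most likely to drift between the Python and TypeScript sides, so a minimal sketch of the encoding may help. The function names match the planned `e2e.ts` signatures; the bodies are an assumption about how they could be written, not the actual implementation. The check on the leading `0x04` byte reflects the standard marker for an uncompressed elliptic-curve point.

```typescript
// Sketch only: one possible implementation of the base64 helpers planned for e2e.ts.

// Encode a raw public key as URL-safe base64 without padding.
function publicKeyToB64(raw: ArrayBuffer): string {
  const bytes = new Uint8Array(raw);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Decode URL-safe, unpadded base64 back into the raw 65-byte point.
function publicKeyFromB64(b64: string): ArrayBuffer {
  const standard = b64.replace(/-/g, "+").replace(/_/g, "/");
  const padded = standard + "=".repeat((4 - (standard.length % 4)) % 4);
  const binary = atob(padded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // An uncompressed P-256 point is 0x04 followed by the 32-byte X and Y coordinates.
  if (bytes.length !== 65 || bytes[0] !== 0x04) {
    throw new Error("expected a 65-byte uncompressed P-256 point");
  }
  return bytes.buffer;
}
```

A 65-byte key encodes to 87 characters once the single `=` padding character is stripped, which gives a cheap sanity check on incoming `public_key` values.
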
## Key Exchange Flow

The key exchange is triggered automatically when `RelayTransport.open()` is called, which happens when the app falls back to the relay after the 10-second WebRTC timeout.

```
1. RelayTransport.open() called
2. Opens SSE connection to worker:{peerId} channel
3. Generates ephemeral P-256 keypair + sessionId (UUID)
4. Sends key_exchange via POST /api/worker/message:
   { type: "key_exchange", session_id, public_key }
5. Waits for key_exchange_response on SSE (5s timeout, 1 retry)
6. Derives shared AES-256 key via ECDH + HKDF
7. Sets _e2eReady = true
8. All subsequent send/receive is encrypted
```

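Step 5 (5-second timeout, one retry, then fail) could be sketched as below. The helper names `withTimeout` and `keyExchangeWithRetry`, the `attempt` callback, and the response fields other than `type` are hypothetical. In the real transport, `attempt` would POST the `key_exchange` message and resolve when the matching `key_exchange_response` arrives on SSE. Whether a retry reuses the same keypair is not specified in this design.

```typescript
// Sketch only: timeout-and-retry wrapper around a single key exchange attempt.
// Field names beyond `type` are assumptions for illustration.
type KeyExchangeResponse = {
  type: "key_exchange_response";
  session_id: string;
  public_key: string;
};

// Reject if the promise has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run `attempt` up to 1 + retries times, then throw the user-facing error.
async function keyExchangeWithRetry(
  attempt: () => Promise<KeyExchangeResponse>,
  timeoutMs = 5000,
  retries = 1,
): Promise<KeyExchangeResponse> {
  for (let i = 0; i <= retries; i++) {
    try {
      return await withTimeout(attempt(), timeoutMs);
    } catch {
      // Fall through to the next attempt, if any remain.
    }
  }
  throw new Error("Could not establish secure connection with worker.");
}
```

With the defaults, a worker that never answers makes `open()` fail after roughly 10 seconds in total (two 5-second attempts).
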
**Failure handling:** Each attempt waits 5 seconds for the response. After one retry, `open()` throws. `connectStore.ts` catches the error and shows "Could not establish secure connection with worker."

**Encryption state:** Held in private fields on the `RelayTransport` instance (`_sessionId`, `_sharedKey`, `_e2eReady`). The state is garbage collected when the transport is destroyed and is never persisted.

**WebRTC path unaffected:** When WebRTC succeeds, `RelayTransport` is never created, so the key exchange never runs. DTLS already encrypts the P2P path.

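## Code Changes

### New: `src/lib/e2e.ts`

```typescript
// Planned exports of src/lib/e2e.ts (signatures only).
// Ephemeral ECDH P-256 keypair; the public key is exported in raw form.
export declare function generateKeypair(): Promise<{ privateKey: CryptoKey; publicKeyRaw: ArrayBuffer }>;
// ECDH shared secret run through HKDF-SHA256 to produce the AES-256-GCM key.
export declare function deriveSharedKey(privateKey: CryptoKey, peerPublicKey: ArrayBuffer): Promise<CryptoKey>;
// AES-256-GCM encryption and decryption of a message payload.
export declare function encrypt(key: CryptoKey, payload: object): Promise<{ nonce: string; ciphertext: string }>;
export declare function decrypt(key: CryptoKey, nonce: string, ciphertext: string): Promise<object | null>;
// URL-safe base64 (no padding) conversion for the raw public key.
export declare function publicKeyToB64(raw: ArrayBuffer): string;
export declare function publicKeyFromB64(b64: string): ArrayBuffer;
```
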
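A minimal sketch of how the four crypto functions could be built on the Web Crypto API, using the parameters from the Architecture section. Two points are assumptions rather than part of the design: the nonce and ciphertext are encoded as URL-safe base64 without padding (the design only specifies that encoding for the public key), and the payload is serialized as JSON. The empty HKDF salt corresponds to "no salt": RFC 5869 treats an absent salt as a string of zero bytes, and HMAC zero-pads an empty key to the same value.

```typescript
// Sketch only: not the actual e2e.ts. Requires a runtime with global Web Crypto.
const HKDF_INFO = new TextEncoder().encode("sleap-rtc-relay-e2e-v1");
const EC_PARAMS = { name: "ECDH", namedCurve: "P-256" };

// Assumed wire encoding for nonce and ciphertext: URL-safe base64, no padding.
function toB64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function fromB64(b64: string) {
  const standard = b64.replace(/-/g, "+").replace(/_/g, "/");
  const binary = atob(standard + "=".repeat((4 - (standard.length % 4)) % 4));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

async function generateKeypair() {
  // The private key is non-extractable; the public key of a pair is always exportable.
  const pair = await crypto.subtle.generateKey(EC_PARAMS, false, ["deriveBits"]);
  const publicKeyRaw = await crypto.subtle.exportKey("raw", pair.publicKey);
  return { privateKey: pair.privateKey, publicKeyRaw };
}

async function deriveSharedKey(privateKey: CryptoKey, peerPublicKey: ArrayBuffer): Promise<CryptoKey> {
  const peerKey = await crypto.subtle.importKey("raw", peerPublicKey, EC_PARAMS, false, []);
  // 256 bits of ECDH output, then HKDF-SHA256 to obtain the AES key.
  const secret = await crypto.subtle.deriveBits({ name: "ECDH", public: peerKey }, privateKey, 256);
  const hkdfKey = await crypto.subtle.importKey("raw", secret, "HKDF", false, ["deriveKey"]);
  return crypto.subtle.deriveKey(
    { name: "HKDF", hash: "SHA-256", salt: new Uint8Array(0), info: HKDF_INFO },
    hkdfKey,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

async function encrypt(key: CryptoKey, payload: object) {
  const nonce = crypto.getRandomValues(new Uint8Array(12)); // fresh nonce per message
  const plaintext = new TextEncoder().encode(JSON.stringify(payload));
  const sealed = await crypto.subtle.encrypt({ name: "AES-GCM", iv: nonce }, key, plaintext);
  return { nonce: toB64(nonce), ciphertext: toB64(new Uint8Array(sealed)) };
}

async function decrypt(key: CryptoKey, nonce: string, ciphertext: string): Promise<object | null> {
  try {
    const opened = await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: fromB64(nonce) },
      key,
      fromB64(ciphertext),
    );
    return JSON.parse(new TextDecoder().decode(opened));
  } catch {
    return null; // wrong key, tampered data, or malformed input
  }
}
```

Web Crypto's AES-GCM output is the ciphertext with the 16-byte authentication tag appended. Whether the Python worker lays out the tag the same way is exactly what the cross-language test vector in the Testing section should confirm.
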
70+
### Modified: `src/lib/transport.ts``RelayTransport`
71+
72+
**Private fields added:**
73+
- `_sessionId: string | null`
74+
- `_sharedKey: CryptoKey | null`
75+
- `_e2eReady: boolean`
76+
77+
**`open()` updated:** After opening SSE, calls `_initKeyExchange()`. Transport is not marked ready until key exchange completes.
78+
79+
**`send()` updated:** When `_e2eReady`, all outbound messages are encrypted and routed through `POST /api/worker/message` instead of dedicated endpoints:
80+
- `FS_LIST_DIR` → encrypted `{type: "fs_list_req", path, req_id, offset}`
81+
- `JOB_SUBMIT` → encrypted `{type: "job_assigned", job_id, config}` with client-generated `job_id`
82+
- `JOB_CANCEL` → encrypted `{type: "job_cancel", job_id, mode: "cancel"}`
83+
- `JOB_STOP` → encrypted `{type: "job_cancel", job_id, mode: "stop"}`
84+
- `FS_GET_MOUNTS` → encrypted `{type: "fs_get_mounts"}`
85+
- `CONTROL_COMMAND` → encrypted `{type: "job_cancel", job_id, mode: "stop"}`
86+
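The mapping above can be expressed as a single pure function that runs before encryption. This is a sketch: the inner worker messages follow the list, but the app-side `ProtocolMessage` shapes and the name `toWorkerMessage` are hypothetical, and the real protocol types in sleap-app may carry different fields.

```typescript
// Sketch only: hypothetical app-side message shapes; the worker-side shapes follow the list above.
type ProtocolMessage =
  | { type: "FS_LIST_DIR"; path: string; req_id: string; offset: number }
  | { type: "FS_GET_MOUNTS" }
  | { type: "JOB_SUBMIT"; job_id: string; config: unknown }
  | { type: "JOB_CANCEL"; job_id: string }
  | { type: "JOB_STOP"; job_id: string }
  | { type: "CONTROL_COMMAND"; job_id: string };

type WorkerMessage =
  | { type: "fs_list_req"; path: string; req_id: string; offset: number }
  | { type: "fs_get_mounts" }
  | { type: "job_assigned"; job_id: string; config: unknown }
  | { type: "job_cancel"; job_id: string; mode: "cancel" | "stop" };

// Translate an outbound protocol message into the plaintext that gets encrypted.
function toWorkerMessage(msg: ProtocolMessage): WorkerMessage {
  switch (msg.type) {
    case "FS_LIST_DIR":
      return { type: "fs_list_req", path: msg.path, req_id: msg.req_id, offset: msg.offset };
    case "FS_GET_MOUNTS":
      return { type: "fs_get_mounts" };
    case "JOB_SUBMIT":
      return { type: "job_assigned", job_id: msg.job_id, config: msg.config };
    case "JOB_CANCEL":
      return { type: "job_cancel", job_id: msg.job_id, mode: "cancel" };
    case "JOB_STOP":
    case "CONTROL_COMMAND":
      // Both map to a stop request in this design.
      return { type: "job_cancel", job_id: msg.job_id, mode: "stop" };
  }
}
```

Keeping the translation separate from the encryption step makes it testable without any keys, and the exhaustive `switch` lets the compiler flag a protocol message type that has no encrypted counterpart.
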
**`_handleSSEEvent()` updated:** On an `encrypted_relay` event, the handler checks `session_id`, decrypts the payload, and then passes the inner message through the existing switch statement.

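The receive path could look roughly like the sketch below. The envelope fields `nonce` and `ciphertext` are assumed to mirror the return value of `encrypt()`; the design itself only names `type`, `session_id`, and `job_id`. The names `EncryptedRelayEvent`, `isEncryptedRelay`, and `unwrapRelayEvent` are hypothetical, and dropping events that fail the checks (returning `null`) is one possible policy, not a stated requirement.

```typescript
// Sketch only: assumed envelope shape for encrypted_relay SSE events.
interface EncryptedRelayEvent {
  type: "encrypted_relay";
  session_id: string;
  job_id?: string; // left in plaintext for SSE channel routing
  nonce: string;
  ciphertext: string;
}

// Decryption is injected so the sketch stays independent of the key handling.
type DecryptFn = (nonce: string, ciphertext: string) => Promise<object | null>;

function isEncryptedRelay(event: unknown): event is EncryptedRelayEvent {
  if (typeof event !== "object" || event === null) return false;
  const e = event as Record<string, unknown>;
  return (
    e.type === "encrypted_relay" &&
    typeof e.session_id === "string" &&
    typeof e.nonce === "string" &&
    typeof e.ciphertext === "string"
  );
}

// Returns the inner message, or null if the event should not be processed.
async function unwrapRelayEvent(
  event: unknown,
  sessionId: string | null,
  decrypt: DecryptFn,
): Promise<object | null> {
  if (!isEncryptedRelay(event)) return null;
  // Ignore envelopes from another session, for example a stale tab or an earlier connection.
  if (sessionId === null || event.session_id !== sessionId) return null;
  return decrypt(event.nonce, event.ciphertext);
}
```

The returned inner message would then be handed to the existing switch statement unchanged, which is what keeps the stores and UI unaware of encryption.
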
**`_postJobSubmit()` updated:** When E2E is active, it generates the `job_id` client-side (`job_${crypto.randomUUID().slice(0,8)}`), sends the encrypted message via `_sendWorkerMessage`, and opens the job SSE channel immediately.

### No changes to other files

`connectStore.ts`, `inferenceStore.ts`, `trainingStore.ts`, and all UI components are untouched. They call `transport.send()` with protocol messages and receive protocol messages back, so encryption is invisible to them.

## Message Flow Comparison

**What the worker's `RelayChannel` sends (already filtered by PR #77):**

| Message type | Sent during training? | Sent during inference? |
|---|---|---|
| `job_status` (accepted/rejected/complete/failed) | Yes | Yes |
| `job_progress` (train_begin, epoch_end, train_end) | Yes | No |
| `MODEL_TYPE::` switch | Yes | No |
| `TRAIN_JOB_START` / `TRAINING_JOBS_DONE` | Yes | No |
| `INFERENCE_BEGIN/COMPLETE/FAILED` | Yes | Yes |
| `[stderr]` lines | No | Yes |
| `CR::` tqdm progress | No | Yes |
| Catch-all text lines | No | Yes |

All of these will be encrypted as `encrypted_relay` envelopes with the `job_id` in plaintext for SSE channel routing.

## Rollout

**No phased rollout needed.** The worker and signaling server already support encryption (PR #77 + webRTC-connect `amick/relay-server-encryption`). Only the sleap-app frontend changes:
- **Web:** GitHub Pages redeploy → immediate effect
- **Desktop:** the next Tauri app release includes encryption automatically

**Backward compatibility:** A worker without PR #77 never answers the key exchange, so the exchange times out and the user sees the error message.

## Testing

**Unit tests (`src/lib/e2e.test.ts`):**
- Keypair generation, base64 round-trip, ECDH key derivation, encrypt/decrypt round-trip, wrong-key failure

**Integration tests:**
- Mock fetch/SSE; verify the key exchange flow, that `send()` produces encrypted envelopes, and that the SSE handler decrypts them

**Cross-language test vector:**
- A hardcoded Python-encrypted message is decrypted in TypeScript (verifies parameter alignment)

**E2E manual testing:**
- Connect via relay, browse the filesystem, submit training/inference jobs, verify that worker logs show `[E2E]`, and verify that the signaling server shows `encrypted_relay`
