feat: store hashed tokens instead of plaintext #8217
rubenfiszel wants to merge 28 commits into main from store-hash
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deploying windmill with Cloudflare Pages

| | |
| --- | --- |
| Latest commit: | 978ee56 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://2240a199.windmill.pages.dev |
| Branch Preview URL: | https://store-hash.windmill.pages.dev |
```sql
CREATE UNIQUE INDEX token_hash_unique ON token (token_hash);

-- Index on prefix for deletion/listing
CREATE INDEX idx_token_prefix ON token (token_prefix);
```
**Non-unique prefix risk:** `token_prefix` is only 10 characters. When creating via `rd_string(32)`, the charset is alphanumeric (62 chars), so the probability of a collision among 10-char prefixes is non-trivial at scale. This is important because the `DELETE FROM token WHERE token_prefix = $2` in `delete_token` could delete multiple tokens with the same prefix.

The `delete_token` endpoint does scope deletion to `email = $1 AND token_prefix = $2`, which helps significantly, but there's still a theoretical risk of colliding prefixes for the same user. Consider whether this is acceptable or whether you'd want a `UNIQUE(email, token_prefix)` constraint. Not blocking, just something to think about at scale.
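A quick birthday-bound estimate makes the collision risk concrete. This is an illustrative sketch, not code from the PR: the function name is mine, and it assumes prefixes are uniform over the 62-character alphanumeric charset that `rd_string(32)` draws from.

```rust
// Birthday-bound estimate of 10-char prefix collisions.
// Assumption: prefixes uniform over charset^prefix_len possibilities.

/// P(at least one collision among n prefixes) ≈ 1 - exp(-n(n-1) / (2d)),
/// where d = charset^prefix_len is the number of possible prefixes.
fn prefix_collision_probability(n: u64, charset: u64, prefix_len: u32) -> f64 {
    let d = (charset as f64).powi(prefix_len as i32); // 62^10 ≈ 8.4e17
    1.0 - (-((n as f64) * (n as f64 - 1.0)) / (2.0 * d)).exp()
}

fn main() {
    // Globally, a million tokens collide on a 10-char prefix with p ≈ 6e-7;
    // for a single user with 1000 tokens the probability is around 6e-13.
    println!("{:.2e}", prefix_collision_probability(1_000_000, 62, 10));
    println!("{:.2e}", prefix_collision_probability(1_000, 62, 10));
}
```

On these numbers a `UNIQUE(email, token_prefix)` constraint would mostly guard against pathological inputs rather than a realistic organic collision.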
```sql
-- Make token NOT NULL again
ALTER TABLE token ALTER COLUMN token SET NOT NULL;
```
**Potential down-migration failure:** When rolling back step 2 (`token_hash_pk_swap.down.sql`), this restores `token NOT NULL`, but new tokens created after the forward migration will have `token = NULL` (since the code no longer populates it). The rollback will then fail with `ERROR: column "token" of relation "token" contains null values` if any new tokens were created after migration.

You'd need to backfill or drop the new null rows before restoring `NOT NULL`:
```sql
-- Delete tokens that were created after the forward migration (no plaintext token)
DELETE FROM token WHERE token IS NULL;
-- Then restore NOT NULL
ALTER TABLE token ALTER COLUMN token SET NOT NULL;
```

```sql
-- Old fixture:
INSERT INTO token(token, email, label, super_admin)
VALUES ('SECRET_TOKEN', 'test@windmill.dev', 'test token', true);

-- New fixture:
-- Tokens for all users (token_hash = sha256 hex, token_prefix = first 10 chars)
INSERT INTO token(token_hash, token_prefix, token, email, label, super_admin)
VALUES (encode(sha256('SECRET_TOKEN'::bytea), 'hex'), 'SECRET_TOK', 'SECRET_TOKEN', 'test@windmill.dev', 'test token', true);
```
**Note:** Test fixtures include `token_hash`, `token_prefix`, AND the plaintext `token` column. This is correct for the transitional period (the column still exists, just nullable), but be aware that if the `token` column is eventually dropped, these fixtures will need updating. Consider adding a comment here for future reference.
- Update all base.sql fixtures to include token_hash/token_prefix columns
- Keep plaintext token for webhook tokens (needed for URL reconstruction)
- Restore get_token_by_prefix to query DB for webhook tokens
- Fix down migration to delete NULL-token rows before restoring NOT NULL
- Update parser fixture standalone schema
- Update EE dedicated_worker_ee.rs to use token_hash/token_prefix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Write to token column alongside token_hash until MIN_VERSION_SUPPORTS_TOKEN_HASH (1.649.0) is reached. This ensures older workers can still authenticate during rolling upgrades. Remove the separate UPDATE in new_webhook_token since create_token_internal now writes plaintext directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check MIN_VERSION_SUPPORTS_TOKEN_HASH at runtime: write plaintext to token column while old workers exist, switch to NULL once all workers are >= 1.649.0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
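The runtime gate described in the commit above can be sketched as follows. This is a hedged illustration under stated assumptions: `should_write_plaintext`, `parse_version`, and the fleet-versions input are hypothetical names, not Windmill's actual API; only the `MIN_VERSION_SUPPORTS_TOKEN_HASH` threshold and the write-plaintext-while-old-workers-exist behavior come from the commit.

```rust
// Hypothetical sketch of the rolling-upgrade gate: keep writing the
// plaintext `token` column while any worker is older than the version
// that understands `token_hash`.

const MIN_VERSION_SUPPORTS_TOKEN_HASH: (u64, u64, u64) = (1, 649, 0);

/// Parse a "major.minor.patch" string (optionally prefixed with 'v').
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.trim_start_matches('v').splitn(3, '.');
    Some((
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
    ))
}

/// True while any worker in the fleet predates the hash-aware version.
/// Unparseable versions are treated conservatively as "old".
fn should_write_plaintext(worker_versions: &[&str]) -> bool {
    worker_versions
        .iter()
        .any(|v| parse_version(v).map_or(true, |ver| ver < MIN_VERSION_SUPPORTS_TOKEN_HASH))
}

fn main() {
    // One 1.648.x worker in the fleet keeps plaintext writes enabled...
    assert!(should_write_plaintext(&["1.648.2", "1.649.0"]));
    // ...once everyone is >= 1.649.0, the column can be written as NULL.
    assert!(!should_write_plaintext(&["1.649.0", "1.650.1"]));
    println!("gate ok");
}
```

Treating unparseable versions as old fails safe: a worker we can't identify keeps authenticating during the rollout.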
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/ai-fast review this PR
Resolve conflicts in monitor.rs and native-triggers/lib.rs, keeping the token_prefix-based queries from the store-hash branch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…OT NULL Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/ai-fast review this PR
Claude finished @hugocasa's task in 1m 55s — **PR Review: Store hashed tokens instead of plaintext**

Overall this is a well-designed security improvement.

**Bugs / Issues**

1. **Potential panic on short tokens.** There are 7 instances of `token[0..TOKEN_PREFIX_LEN]` slicing that will panic if the token is shorter than 10 characters (e.g. a malformed `Authorization` header). The superadmin path is affected as well.

2. **Google watch renewal fails when token is orphaned.** `renew_channel` hard-fails when the webhook token no longer exists.

**Migration Concerns**

3. **Native trigger migration deletes orphaned triggers.**

```sql
DELETE FROM native_trigger WHERE webhook_token_hash IS NULL;
```

This silently deletes any trigger whose token row no longer exists. The deletion is the right thing to do for data integrity, but consider logging or counting how many rows are deleted.

4. **Native trigger migration backfill uses prefix JOIN.**

```sql
UPDATE native_trigger nt SET webhook_token_hash = t.token_hash
FROM token t WHERE t.token_prefix = nt.webhook_token_prefix;
```

If two tokens share the same 10-char prefix (unlikely but possible), this JOIN could match the wrong token for a trigger. Since the prefix space is 62^10, the probability is negligible in practice, but it's worth noting.

**Design Observations (Non-blocking)**

5. …

6. The rotation creates a new token copying …

7. `INSERT INTO token_expiry_notification (token, expiration) VALUES ($1, $2)` — the column is still named `token` …

**Summary**

The core security goal (removing plaintext tokens from the DB) is well-implemented. The main actionable item is the short token panic (item 1), which could cause a runtime crash on malformed input. Everything else is either already acknowledged in the PR's TODO section or is a low-probability edge case.
/ai-fast
Replace all `token[0..TOKEN_PREFIX_LEN]` slicing with `token.get(..TOKEN_PREFIX_LEN).unwrap_or(token)` to prevent panics when a token shorter than 10 chars is provided (e.g. malformed Authorization header, short superadmin secret). Co-authored-by: hugocasa <hugocasa@users.noreply.github.com>
Replace all `token[0..TOKEN_PREFIX_LEN]` with safe `token.get(..TOKEN_PREFIX_LEN).unwrap_or(token)` to prevent panics on malformed tokens shorter than 10 characters. Co-authored-by: hugocasa <hugocasa@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
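The replacement pattern from these commits is easy to verify in isolation. `str::get(..n)` returns `None` when the string is shorter than `n` bytes (or when `n` is not a char boundary), so the fallback yields the whole token instead of panicking like `&token[0..10]` would:

```rust
// Panic-free prefix extraction, mirroring the fix in the commits above.
const TOKEN_PREFIX_LEN: usize = 10;

fn token_prefix(token: &str) -> &str {
    // `get` returns None for short strings (and non-char-boundary cuts),
    // in which case we just use the whole token as its own prefix.
    token.get(..TOKEN_PREFIX_LEN).unwrap_or(token)
}

fn main() {
    assert_eq!(token_prefix("SECRET_TOKEN"), "SECRET_TOK");
    assert_eq!(token_prefix("short"), "short"); // no panic on short input
    assert_eq!(token_prefix(""), "");
    println!("prefix ok");
}
```

Note the char-boundary caveat: for multi-byte UTF-8 near byte 10, `get` also returns `None` and the whole token is used, which is safe for lookup purposes since real tokens are ASCII.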
This reverts commit 37ec2e5.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/ai-fast review this PR
Claude finished @hugocasa's task in 1m 39s — **PR Review: Store hashed tokens instead of plaintext**

Overall this is a well-designed security improvement; the two-phase migration and the auth cache keyed by raw token are sound.

**Bugs**

1. **Panic on short tokens — still present.** The previous review flagged this and it hasn't been fixed. There are 7 instances of `token[0..TOKEN_PREFIX_LEN]` slicing that panic on tokens shorter than 10 characters. The fix is straightforward — use `token.get(..TOKEN_PREFIX_LEN).unwrap_or(token)`.

2. **Google channel renewal hard-fails when token is missing.**

```rust
None => {
    return Err(Error::InternalErr(format!(
        "Cannot renew channel {}: webhook token no longer exists and no user context to create a fresh one",
        trigger.external_id
    )));
}
```

This is acknowledged in the PR description's TODO section, but it's worth noting that this will break Google watch renewals once plaintext tokens stop being written.

**Migration Concerns**

3. **Native trigger backfill uses prefix JOIN — potential mismatch.**

```sql
UPDATE native_trigger nt SET webhook_token_hash = t.token_hash
FROM token t WHERE t.token_prefix = nt.webhook_token_prefix;
```

If two tokens share the same 10-char prefix, this JOIN could match the wrong token. Probability is negligible (62^10 space).

4. **Orphaned trigger deletion is silent.**

```sql
DELETE FROM native_trigger WHERE webhook_token_hash IS NULL;
```

This deletes any trigger whose token no longer exists. Consider logging the count.

**Design Notes (Non-blocking)**

5. …

6. The column is still named `token` …

7. The new token omits the …

8. **Down migration handles NULL tokens correctly.** The previous review flagged this and it's been addressed.

**Summary**

The core design is sound. The main actionable item is the short token panic (item 1) — a malformed token could crash at runtime.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ner/expiration in rotate Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…fication column Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/ai-fast review this PR
Claude finished @hugocasa's task in 1m 48s — **PR Review: Store hashed tokens instead of plaintext**

This is a well-designed security improvement.

**Bugs**

1. **Panic on short tokens — still present (7 sites).** This was flagged in both previous reviews and remains unfixed. The fix is straightforward — use `token.get(..TOKEN_PREFIX_LEN).unwrap_or(token)`.

**Previously Flagged Issues — Now Addressed**

2. **Native trigger orphan handling (migration) — fixed.** The previous reviews flagged that orphaned triggers were deleted silently; this has been addressed.

3. **Down migration NULL token handling — fixed.**

4. Migration …

5. …

**Design Observations (Non-blocking)**

6.

```sql
UPDATE token_expiry_notification
SET token = encode(sha256(token::bytea), 'hex');
```

This assumes all existing values in `token_expiry_notification.token` are plaintext tokens.

7. **Google channel renewal still hard-fails on missing token.** As acknowledged in the PR description's TODO, `renew_channel` fails when the webhook token no longer exists.

8. **Native trigger migration backfill prefix collision risk (very low).** The prefix JOIN could match the wrong token if two tokens share a 10-char prefix; the probability is negligible.

**Summary**

The core security design is sound and most previous review issues have been addressed. The only remaining actionable bug is the short token panic (item 1) — a malformed token could crash at runtime.
**Manual Testing Checklist**

- Token basics
- Backward compatibility
- Native triggers (requires Google or Nextcloud integration)
- Migration
- Edge cases
**Summary**

- Store hashed tokens (`token_hash`) in the `token` table instead of plaintext, so a DB compromise doesn't expose usable credentials
- Add a `token_prefix` (first 10 chars) column for listing/deletion without needing plaintext
- The `token` column is still written alongside the hash until `MIN_VERSION_SUPPORTS_TOKEN_HASH` (1.650.0) is reached by all workers, then `NULL` is written instead

**TODO for @hugocasademont**

Remove `get_token_by_prefix` (native triggers). `get_token_by_prefix` retrieves the plaintext token from the DB so it can be embedded in webhook callback URLs. It's used in two places:

- `handler.rs:237` — trigger update: already handles `None` gracefully (deletes old token, creates new one)
- `google/external.rs:327` — `renew_channel`: hard-fails on `None` ("Webhook token not found")

Once `MIN_VERSION_SUPPORTS_TOKEN_HASH` is met and plaintext stops being written, all new webhook tokens will have `token = NULL`, breaking Google trigger watch renewal.

Recommended fix — always delete + create a new token on renewal:

- Add `create_webhook_token_for_trigger(db, trigger)` that inserts a token directly using the trigger's `email`/`owner`/`workspace_id`/`script_path` (no `ApiAuthed` needed)
- Change `renew_channel` to: delete old token by prefix → create new token → use it in the webhook URL → update `webhook_token_prefix` on the trigger
- `get_token_by_prefix` can be removed entirely — every renewal just creates a fresh token

This matches the pattern `handler.rs` already uses on trigger update.

**Test plan**

- `cargo check` passes (CE + EE + native_trigger features)
- New tokens have `token_hash` + `token_prefix` populated; `token` column has plaintext (pre-1.650.0) or NULL (post-1.650.0)

🤖 Generated with Claude Code
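The delete-then-recreate renewal flow recommended above can be sketched with an in-memory stand-in for the `token` table. Everything here is hypothetical illustration — `TokenStore`, `renew_webhook_token`, and the fake token format are mine; the real implementation would issue SQL against `token` and update `webhook_token_prefix` on the trigger row:

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for the token table, illustrating that
// renewal never needs the old plaintext token back.
struct TokenStore {
    // prefix -> stored hash (stand-in for the token_hash column)
    tokens: HashMap<String, String>,
    counter: u64,
}

impl TokenStore {
    fn new() -> Self {
        Self { tokens: HashMap::new(), counter: 0 }
    }

    // Stand-in for: DELETE FROM token WHERE token_prefix = $1
    fn delete_by_prefix(&mut self, prefix: &str) {
        self.tokens.remove(prefix);
    }

    /// Create a fresh 32-char token. Only the prefix and a hash are stored;
    /// the plaintext is returned once, for embedding in the webhook URL.
    fn create(&mut self) -> (String, String) {
        self.counter += 1;
        // Fake stand-in for rd_string(32): 10 unique digits + 22 filler chars.
        let plaintext = format!("{:010}{}", self.counter, "x".repeat(22));
        let prefix = plaintext[..10].to_string();
        self.tokens.insert(prefix.clone(), format!("hash({plaintext})"));
        (plaintext, prefix)
    }

    /// Renewal = delete old by prefix, then create anew; the caller puts the
    /// new plaintext in the URL and stores the new prefix on the trigger.
    fn renew_webhook_token(&mut self, old_prefix: &str) -> (String, String) {
        self.delete_by_prefix(old_prefix);
        self.create()
    }
}

fn main() {
    let mut store = TokenStore::new();
    let (_old_plaintext, old_prefix) = store.create();
    let (new_plaintext, new_prefix) = store.renew_webhook_token(&old_prefix);
    assert!(!store.tokens.contains_key(&old_prefix));
    assert!(store.tokens.contains_key(&new_prefix));
    assert!(new_plaintext.starts_with(&new_prefix));
    println!("renewed: {old_prefix} -> {new_prefix}");
}
```

Because the new plaintext exists only in the return value, this flow stays compatible with a `token` column that is permanently `NULL`.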
Companion EE PR: https://github.com/windmill-labs/windmill-ee-private/pull/437