feat(temporal): add payload encryption codec with fail-open semantics#2297
feat(temporal): add payload encryption codec with fail-open semantics#2297daryllimyt wants to merge 15 commits intomainfrom
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
c5e5d08 to
35c2788
Compare
2e221d2 to
b356b85
Compare
35c2788 to
2ad828b
Compare
b356b85 to
d1915a5
Compare
d1915a5 to
b367063
Compare
This comment has been minimized.
This comment has been minimized.
…ass-codec-server # Conflicts: # docker-compose.dev.yml # docker-compose.local.yml # docker-compose.yml # tracecat/agent/session/service.py
|
✅ No security or compliance issues detected. Reviewed everything up to 9806b1f. Security Overview
Detected Code Changes
|
Move AWS Secrets Manager call to asyncio.to_thread to avoid blocking the event loop. Add double-checked locking to get_key for safe concurrent cache access across await boundaries. Fix pre-existing type error in alembic/env.py.
Update the codec authentication function to use FastAPI's dependency injection for improved handling of the authorization header. This change simplifies the decode endpoint by removing the direct authorization parameter and integrating the authentication check as a dependency.
…role Simplify Temporal payload encryption scope resolution by deriving the workspace scope directly from ctx_role.workspace_id instead of maintaining a separate ContextVar. Roles without a workspace_id (e.g. registry sync) now fall back to the global encryption scope automatically.
…r tests Cover edge cases for EncryptionPayloadCodec (binary/null passthrough, empty data, missing workspace/nonce, tampered ciphertext, cross-workspace isolation), keyring caching (TTL expiry, max-item eviction, missing config), CompressionPayloadCodec (roundtrip, threshold skip, disabled passthrough), CompositePayloadCodec factory, and wrong-bearer-token router rejection.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f26b1148a2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
2 issues found across 29 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tracecat/agent/session/service.py">
<violation number="1">
P1: Workflow update submission also lost explicit role context propagation, which can cause encrypted update payloads to use global scope instead of workspace scope.</violation>
<violation number="2">
P1: Temporal workflow start no longer enforces role context for payload encryption scope, so calls can silently fall back to global (`__global__`) key scope when `ctx_role` is unset.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Encode and decode now catch all exceptions and pass payloads through unmodified instead of killing the workflow execution. Error logs use only the exception class name to avoid leaking payload data or key material via stack traces.
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tracecat/temporal/codec.py">
<violation number="1" location="tracecat/temporal/codec.py:338">
P1: Fail-open on `encode` silently disables encryption that the operator explicitly enabled. A transient key-retrieval error (AWS outage, network blip, misconfiguration) causes sensitive workflow payloads to be persisted unencrypted in Temporal — exactly the scenario encryption was meant to prevent. Consider fail-closed here (re-raise the exception so the Temporal activity/workflow fails and retries) and reserve fail-open for the read-path (`decode`) where the breakglass use case needs it.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24db12701e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…l payloads The decode path was gated by config flags (compression_enabled, TEMPORAL__PAYLOAD_ENCRYPTION_ENABLED), meaning historical payloads written with compression or encryption became unreadable after config changes or rollbacks. Both codecs' decode methods are marker-driven (check encoding metadata, not config), so they are safe to always include. Encode remains gated by each codec's enabled flag.
There was a problem hiding this comment.
1 issue found across 6 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tracecat/temporal/codec.py">
<violation number="1" location="tracecat/temporal/codec.py:479">
P2: `decode_payloads` forces `compression_enabled=True`, which can raise `ValueError` on invalid compression config and prevent all payload decoding. Decode should not depend on encode-time compression validation.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00234c2cb8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
This comment has been minimized.
This comment has been minimized.
The ValueError in CompressionPayloadCodec.__init__ is inconsistent with fail-open design — the encode path already handles invalid algorithms gracefully via the match/case default branch. Removing it prevents decode_payloads from crashing on stale algorithm config.
The test_workflow_wait_until_past sleep mock counted asyncio.sleep(0) calls from cooperative() as timer sleeps. Now that the payload codec chain always runs (for historical payload decode support), cooperative yields inflate the count. Only count non-zero sleeps (actual timers).

Checklist
uv run pytest tests)?pre-commit run --all-files)?Description
Related Issues
Screenshots / Recordings
Steps to QA
Summary by cubic
Adds a protected break‑glass Temporal codec server and end‑to‑end AES‑GCM encryption for Temporal payloads with optional compression. Decode always runs both codecs for backward compatibility; compression init-time validation was removed to keep decode fail‑open on stale config.
New Features
/codec/decoderoute to decode Temporal payloads; secured with BearerTEMPORAL__CODEC_SERVER_SHARED_SECRET, returns 401/503 when unauthorized or unconfigured; optionalX-Namespacefor logs; mounted in the main API.ctx_role.workspace_id; platform ops use__global__.TEMPORAL__PAYLOAD_ENCRYPTION_KEYor...__ARN; async retrieval viaboto3Secrets Manager with in‑memory cache (TEMPORAL__PAYLOAD_ENCRYPTION_CACHE_TTL_SECONDS,..._CACHE_MAX_ITEMS); versioning viaTEMPORAL__PAYLOAD_ENCRYPTION_KEY_VERSION. Infra adds secret ARNstemporal_payload_encryption_key_arnandtemporal_codec_server_shared_secret_arnwith IAM updates;docker-compose*exposes the new env vars.Migration
TEMPORAL__PAYLOAD_ENCRYPTION_ENABLED=trueand provide a root key viaTEMPORAL__PAYLOAD_ENCRYPTION_KEYor..._KEY__ARN.TEMPORAL__CODEC_SERVER_SHARED_SECRETand call/codec/decodewithAuthorization: Bearer <secret>;X-Namespaceis optional.Written for commit c663184. Summary will update on new commits.