feat(temporal): add payload encryption codec with fail-open semantics#2297

Open
daryllimyt wants to merge 15 commits into main from feat/temporal-breakglass-codec-server

Conversation

@daryllimyt
Contributor

@daryllimyt daryllimyt commented Mar 8, 2026

Checklist

  • Read CONTRIBUTING.md.
  • PR title is short and non-generic (see previously merged PRs for examples).
  • PR only implements a single feature or fixes a single bug.
  • Tests passing (uv run pytest tests)?
  • Lint / pre-commits passing (pre-commit run --all-files)?

Description

Related Issues

Screenshots / Recordings

Steps to QA


Summary by cubic

Adds a protected break‑glass Temporal codec server and end‑to‑end AES‑GCM encryption for Temporal payloads with optional compression. Decode always runs both codecs for backward compatibility; compression init-time validation was removed to keep decode fail‑open on stale config.

  • New Features

    • /codec/decode route to decode Temporal payloads; secured with Bearer TEMPORAL__CODEC_SERVER_SHARED_SECRET, returns 401/503 when unauthorized or unconfigured; optional X-Namespace for logs; mounted in the main API.
    • Unified payload codec: AES‑GCM with HKDF‑derived per‑workspace keys and optional compression; encode runs compress→encrypt and decode reverses; decode always includes both codecs regardless of flags so historical payloads remain readable; encryption is fail‑open with minimal error logging; failure converter uses encoded attributes when encryption is on; memos, child workflow memos, and history payloads decode via the shared codec.
    • Workspace scoping: derived from ctx_role.workspace_id; platform ops use __global__.
    • Key management: root key from TEMPORAL__PAYLOAD_ENCRYPTION_KEY or ...__ARN; async retrieval via boto3 Secrets Manager with in‑memory cache (TEMPORAL__PAYLOAD_ENCRYPTION_CACHE_TTL_SECONDS, ..._CACHE_MAX_ITEMS); versioning via TEMPORAL__PAYLOAD_ENCRYPTION_KEY_VERSION. Infra adds secret ARNs temporal_payload_encryption_key_arn and temporal_codec_server_shared_secret_arn with IAM updates; docker-compose* exposes the new env vars.
  • Migration

    • Off by default. To enable encryption, set TEMPORAL__PAYLOAD_ENCRYPTION_ENABLED=true and provide a root key via TEMPORAL__PAYLOAD_ENCRYPTION_KEY or ..._KEY__ARN.
    • To enable the codec server, set TEMPORAL__CODEC_SERVER_SHARED_SECRET and call /codec/decode with Authorization: Bearer <secret>; X-Namespace is optional.
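The per-workspace key derivation mentioned in the summary could look roughly like the sketch below. It is a stdlib-only illustration of HKDF (RFC 5869) and not the code in `tracecat/temporal/codec.py`: the function names, the layout of the `info` string, and the default key version are assumptions, and the AES-GCM step is left out.

```python
import hashlib
import hmac
from typing import Optional

GLOBAL_SCOPE = "__global__"  # scope name taken from the summary above


def hkdf_sha256(root_key: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: an empty salt defaults to hash_len zero bytes.
    prk = hmac.new(salt or b"\x00" * hash_len, root_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output is available.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_workspace_key(
    root_key: bytes, workspace_id: Optional[str], key_version: str = "1"
) -> bytes:
    """Derive a 32-byte key bound to one workspace scope and key version."""
    scope = workspace_id or GLOBAL_SCOPE
    # Hypothetical info layout; the PR's actual context string is not shown.
    info = f"temporal-payload:{key_version}:{scope}".encode()
    return hkdf_sha256(root_key, info)
```

Binding the scope and version into `info` means a key derived for one workspace cannot decrypt another workspace's payloads, even though both come from the same root key.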

Written for commit c663184. Summary will update on new commits.

Contributor Author

daryllimyt commented Mar 8, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@daryllimyt daryllimyt force-pushed the feat/temporal-agent-history-encryption branch from c5e5d08 to 35c2788 Compare March 8, 2026 01:24
@daryllimyt daryllimyt force-pushed the feat/temporal-breakglass-codec-server branch 2 times, most recently from 2e221d2 to b356b85 Compare March 8, 2026 01:45
@daryllimyt daryllimyt force-pushed the feat/temporal-agent-history-encryption branch from 35c2788 to 2ad828b Compare March 8, 2026 01:45
@daryllimyt daryllimyt changed the base branch from feat/temporal-agent-history-encryption to main March 18, 2026 18:05
@daryllimyt daryllimyt force-pushed the feat/temporal-breakglass-codec-server branch from b356b85 to d1915a5 Compare March 18, 2026 18:05
@daryllimyt daryllimyt changed the base branch from main to feat/temporal-agent-history-encryption March 18, 2026 18:33
@daryllimyt daryllimyt changed the base branch from feat/temporal-agent-history-encryption to main March 18, 2026 18:38
@daryllimyt daryllimyt force-pushed the feat/temporal-breakglass-codec-server branch from d1915a5 to b367063 Compare March 18, 2026 18:38

…ass-codec-server

# Conflicts:
#	docker-compose.dev.yml
#	docker-compose.local.yml
#	docker-compose.yml
#	tracecat/agent/session/service.py
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci March 27, 2026 18:19 — with GitHub Actions Inactive
@daryllimyt daryllimyt had a problem deploying to internal-registry-ci March 27, 2026 18:19 — with GitHub Actions Failure
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci March 30, 2026 15:02 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci March 30, 2026 15:02 — with GitHub Actions Inactive
@zeropath-ai

zeropath-ai bot commented Mar 30, 2026

No security or compliance issues detected. Reviewed everything up to 9806b1f.

Security Overview
Detected Code Changes
Change type: Configuration changes
► deployments/fargate/main.tf
    Add Temporal payload encryption and codec server variables
► deployments/fargate/modules/ecs/iam.tf
    Add Temporal secrets to IAM policy
► deployments/fargate/modules/ecs/locals.tf
    Map new Temporal variables to environment variables
► deployments/fargate/modules/ecs/secrets.tf
    Add secrets for Temporal payload encryption and codec server
► deployments/fargate/modules/ecs/variables.tf
    Add variables for Temporal payload encryption and codec server
► deployments/fargate/variables.tf
    Add variables for Temporal payload encryption and codec server
► docker-compose.dev.yml
    Add Temporal payload encryption and codec server environment variables
► docker-compose.local.yml
    Add Temporal payload encryption and codec server environment variables
► docker-compose.yml
    Add Temporal payload encryption and codec server environment variables
► tracecat/config.py
    Define configuration for Temporal payload encryption and codec server
► tracecat/contexts.py
    Add context variable for Temporal workspace ID
    Add context manager for Temporal workspace ID override
► tracecat/dsl/_converter.py
    Switch to DefaultFailureConverterWithEncodedAttributes when encryption is enabled
    Use get_payload_codec for PayloadCodec
► tracecat/dsl/common.py
    Decode memo payloads using decode_payloads
► tracecat/dsl/compression.py
    Re-export CompressionPayloadCodec from temporal.codec

Change type: Enhancement
► tests/unit/test_dsl_converter.py
    Add anyio mark to tests
    Call AgentActionMemo.from_temporal asynchronously
► tests/unit/test_temporal_codec.py
    Add tests for Temporal payload codec encryption and decryption
    Add tests for Temporal payload codec requiring explicit workspace scope
    Add test for get_data_converter switching failure converters
► tests/unit/test_temporal_codec_router.py
    Add tests for codec router decoding encrypted payloads
    Add test for codec router rejecting unauthorized requests
    Add test for codec router requiring shared secret configuration
► tracecat/agent/session/service.py
    Use with_temporal_workspace_id context manager for workflow start and update
► tracecat/api/app.py
    Include Temporal codec router in the FastAPI app
► tracecat/temporal/codec.py
    Implement Temporal payload codec for encryption and decryption
    Implement get_payload_codec function
► tracecat/temporal/router.py
    Implement codec router API endpoints for decoding payloads
► tracecat/temporal/codec_router.py
    Implement codec router API endpoints for decoding payloads

Move AWS Secrets Manager call to asyncio.to_thread to avoid blocking
the event loop. Add double-checked locking to get_key for safe
concurrent cache access across await boundaries. Fix pre-existing
type error in alembic/env.py.
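As an illustration of the pattern this commit describes, here is a minimal sketch of double-checked locking around a blocking fetch that is offloaded with `asyncio.to_thread`. The `KeyCache` class, its `fetch` callable, and the TTL default are hypothetical stand-ins, not the PR's keyring code.

```python
import asyncio
import time
from typing import Callable, Dict, Optional, Tuple


class KeyCache:
    """Hypothetical cache; `fetch` stands in for a blocking Secrets Manager call."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float = 300.0) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._lock = asyncio.Lock()
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def _fresh(self, name: str) -> Optional[bytes]:
        entry = self._entries.get(name)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]
        return None

    async def get_key(self, name: str) -> bytes:
        # First check: the fast path takes no lock.
        cached = self._fresh(name)
        if cached is not None:
            return cached
        async with self._lock:
            # Second check: another task may have filled the cache
            # while this one was waiting for the lock.
            cached = self._fresh(name)
            if cached is not None:
                return cached
            # The blocking call runs in a worker thread, so the event loop stays free.
            value = await asyncio.to_thread(self._fetch, name)
            self._entries[name] = (time.monotonic() + self._ttl, value)
            return value
```

With this shape, several tasks that miss the cache at the same time should trigger only one fetch; the rest pick up the cached value on the second check.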
Update the codec authentication function to use FastAPI's dependency injection for improved handling of the authorization header. This change simplifies the decode endpoint by removing the direct authorization parameter and integrating the authentication check as a dependency.
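The check that such a dependency performs can be sketched without any framework. The function below is an assumption-laden illustration: `check_codec_auth` is a hypothetical name, and it returns a bare status code where the real FastAPI dependency would presumably raise an HTTP error. The 401 and 503 cases follow the summary earlier in this thread.

```python
import hmac
from typing import Optional


def check_codec_auth(authorization: Optional[str], shared_secret: Optional[str]) -> int:
    """Return the HTTP status the decode route would answer with."""
    if not shared_secret:
        # No shared secret configured: the codec server is unavailable.
        return 503
    scheme, _, token = (authorization or "").partition(" ")
    if scheme.lower() != "bearer":
        return 401
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(token.encode(), shared_secret.encode()):
        return 401
    return 200
```

In a FastAPI app this logic would sit in a function passed to `Depends`, so the route handler itself never sees the header.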
…role

Simplify Temporal payload encryption scope resolution by deriving the
workspace scope directly from ctx_role.workspace_id instead of
maintaining a separate ContextVar. Roles without a workspace_id
(e.g. registry sync) now fall back to the global encryption scope
automatically.
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 1, 2026 20:36 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 1, 2026 20:36 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 15:07 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 15:07 — with GitHub Actions Inactive
@daryllimyt daryllimyt marked this pull request as ready for review April 2, 2026 15:15
…r tests

Cover edge cases for EncryptionPayloadCodec (binary/null passthrough,
empty data, missing workspace/nonce, tampered ciphertext, cross-workspace
isolation), keyring caching (TTL expiry, max-item eviction, missing config),
CompressionPayloadCodec (roundtrip, threshold skip, disabled passthrough),
CompositePayloadCodec factory, and wrong-bearer-token router rejection.
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 15:20 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 15:20 — with GitHub Actions Inactive
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 25 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f26b1148a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 29 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tracecat/agent/session/service.py">

<violation number="1">
P1: Workflow update submission also lost explicit role context propagation, which can cause encrypted update payloads to use global scope instead of workspace scope.</violation>

<violation number="2">
P1: Temporal workflow start no longer enforces role context for payload encryption scope, so calls can silently fall back to global (`__global__`) key scope when `ctx_role` is unset.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Encode and decode now catch all exceptions and pass payloads through
unmodified instead of killing the workflow execution. Error logs use
only the exception class name to avoid leaking payload data or key
material via stack traces.
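A rough sketch of the behavior this commit describes, assuming a generic `transform` callable in place of the real encrypt or decrypt step (the `fail_open` helper and the logger name are hypothetical):

```python
import logging
from typing import Callable

logger = logging.getLogger("codec_sketch")


def fail_open(transform: Callable[[bytes], bytes], payload: bytes, *, op: str) -> bytes:
    """Apply `transform`; on any error, return the payload unmodified."""
    try:
        return transform(payload)
    except Exception as exc:
        # Log the class name only. str(exc) or a traceback could carry
        # payload bytes or key material into the logs.
        logger.error("payload %s failed: %s", op, type(exc).__name__)
        return payload
```

Note that on the encode side, passing through means the payload is stored as plaintext, which is the trade-off a later review comment in this thread raises.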
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 16:07 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 16:07 — with GitHub Actions Inactive
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).


<file name="tracecat/temporal/codec.py">

<violation number="1" location="tracecat/temporal/codec.py:338">
P1: Fail-open on `encode` silently disables encryption that the operator explicitly enabled. A transient key-retrieval error (AWS outage, network blip, misconfiguration) causes sensitive workflow payloads to be persisted unencrypted in Temporal — exactly the scenario encryption was meant to prevent. Consider fail-closed here (re-raise the exception so the Temporal activity/workflow fails and retries) and reserve fail-open for the read-path (`decode`) where the breakglass use case needs it.</violation>
</file>

@daryllimyt daryllimyt changed the title from "feat(temporal): add breakglass codec server" to "feat(temporal): add payload encryption codec with fail-open semantics" Apr 2, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24db12701e

…l payloads

The decode path was gated by config flags (compression_enabled,
TEMPORAL__PAYLOAD_ENCRYPTION_ENABLED), meaning historical payloads
written with compression or encryption became unreadable after config
changes or rollbacks. Both codecs' decode methods are marker-driven
(check encoding metadata, not config), so they are safe to always
include. Encode remains gated by each codec's enabled flag.
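To make the distinction concrete, here is a minimal sketch of a marker-driven codec. The `Payload` dataclass, the marker value, and the metadata key names are hypothetical stand-ins for Temporal's payload type and the markers this PR actually writes.

```python
import zlib
from dataclasses import dataclass, field

ENCODING_KEY = "encoding"
ZLIB_MARKER = b"binary/zlib"  # hypothetical marker value
ORIGINAL_KEY = "original-encoding"  # hypothetical metadata key


@dataclass
class Payload:
    data: bytes
    metadata: dict = field(default_factory=dict)


class CompressionCodecSketch:
    def __init__(self, enabled: bool, threshold_bytes: int = 64) -> None:
        self.enabled = enabled
        self.threshold_bytes = threshold_bytes

    def encode(self, payload: Payload) -> Payload:
        # Encode is gated by config: disabled or small payloads pass through.
        if not self.enabled or len(payload.data) < self.threshold_bytes:
            return payload
        metadata = dict(payload.metadata)
        metadata[ORIGINAL_KEY] = metadata.get(ENCODING_KEY, b"")
        metadata[ENCODING_KEY] = ZLIB_MARKER
        return Payload(zlib.compress(payload.data), metadata)

    def decode(self, payload: Payload) -> Payload:
        # Decode is gated by the marker only, never by self.enabled.
        if payload.metadata.get(ENCODING_KEY) != ZLIB_MARKER:
            return payload
        metadata = dict(payload.metadata)
        metadata[ENCODING_KEY] = metadata.pop(ORIGINAL_KEY, b"")
        return Payload(zlib.decompress(payload.data), metadata)
```

Because `decode` never consults `enabled`, a payload written while compression was on is still readable after the flag is turned off, and an unmarked payload passes through untouched.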
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 16:52 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 16:53 — with GitHub Actions Inactive
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 6 files (changes from recent commits).


<file name="tracecat/temporal/codec.py">

<violation number="1" location="tracecat/temporal/codec.py:479">
P2: `decode_payloads` forces `compression_enabled=True`, which can raise `ValueError` on invalid compression config and prevent all payload decoding. Decode should not depend on encode-time compression validation.</violation>
</file>


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00234c2cb8

The ValueError in CompressionPayloadCodec.__init__ is inconsistent
with fail-open design — the encode path already handles invalid
algorithms gracefully via the match/case default branch. Removing
it prevents decode_payloads from crashing on stale algorithm config.
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 17:42 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 17:42 — with GitHub Actions Inactive
The test_workflow_wait_until_past sleep mock counted asyncio.sleep(0)
calls from cooperative() as timer sleeps. Now that the payload codec
chain always runs (for historical payload decode support), cooperative
yields inflate the count. Only count non-zero sleeps (actual timers).
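The counting fix can be illustrated with a small stand-alone mock. This is a sketch of the idea, not the test in the repository: `count_timer_sleeps` is a hypothetical helper, and it patches `asyncio.sleep` on the module itself.

```python
import asyncio
from typing import Awaitable, Callable
from unittest.mock import patch


async def count_timer_sleeps(coro_fn: Callable[[], Awaitable[None]]) -> int:
    """Run `coro_fn` with asyncio.sleep mocked; count only non-zero sleeps."""
    timer_sleeps = 0
    real_sleep = asyncio.sleep  # captured before patching

    async def fake_sleep(delay, *args, **kwargs):
        nonlocal timer_sleeps
        if delay > 0:
            # Real timer: count it. sleep(0) is just a cooperative yield.
            timer_sleeps += 1
        # Yield to the loop without actually waiting.
        await real_sleep(0)

    with patch("asyncio.sleep", fake_sleep):
        await coro_fn()
    return timer_sleeps
```

Any number of `asyncio.sleep(0)` yields added by the codec chain leaves the count unchanged, so the assertion only tracks actual timers.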
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 17:59 — with GitHub Actions Inactive
@daryllimyt daryllimyt temporarily deployed to internal-registry-ci April 2, 2026 17:59 — with GitHub Actions Inactive