feat: support per-route timeout overrides for ingress path rules by djeebus · Pull Request #2250 · e2b-dev/infra

djeebus · 2026-03-28T03:14:38Z

Changes the additional_api_paths_handled_by_ingress variable from list(string) to list(object({ paths, timeout_sec? })). Each entry now creates its own path_rule, and when timeout_sec is set, it adds a route_action.timeout that overrides the ingress backend default (80s).

Also moves the Traefik configuration into a config file and adds support for extra config files.

Not strictly backwards compatible, but the migration is mechanical: existing callers just need to wrap their path lists, e.g.:

Before

additional_api_paths_handled_by_ingress = ["/v1/foo", "/v1/bar"]

After

additional_api_paths_handled_by_ingress = [
  { paths = ["/v1/foo", "/v1/bar"] }
]

Or with a custom timeout

additional_api_paths_handled_by_ingress = [
  { paths = ["/v1/foo"], timeout_sec = 300 },
  { paths = ["/v1/bar"] }
]

cursor · 2026-03-28T03:14:44Z

PR Summary

Medium Risk
Changes GCP load balancer routing to support optional per-path timeouts and refactors ingress/Traefik configuration inputs, which can break existing Terraform callers and alter request behavior if misconfigured. Risk is mainly deployment/configuration regressions rather than code security issues.

Overview
Adds support for per-route timeout overrides on GCP ingress URL map path rules by changing additional_api_paths_handled_by_ingress from list(string) to a list of objects with optional timeout_sec, and introduces a configurable ingress_timeout_seconds for the ingress backend service. Refactors the Nomad ingress job to run Traefik from a rendered traefik.toml plus optional additional dynamic config files (traefik_config_files), replacing the previous CLI-argument-based configuration and wiring the new variable through both AWS and GCP Terraform modules.
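As a rough illustration of the backend-level setting mentioned in the overview, the sketch below wires a variable into the backend service's timeout_sec argument. The default of 80 is an assumption taken from the "ingress backend default (80s)" in the PR description, the backend name is hypothetical, and the remaining backend service arguments are omitted.

```hcl
variable "ingress_timeout_seconds" {
  type    = number
  default = 80 # assumption: mirrors the 80s default mentioned in the PR description
}

resource "google_compute_backend_service" "ingress" {
  name        = "ingress" # hypothetical name
  timeout_sec = var.ingress_timeout_seconds

  # backend, health_checks, protocol, etc. omitted from this sketch
}
```

Routes without a timeout_sec override would then inherit this value, while routes with an override use their own route_action timeout.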

Written by Cursor Bugbot for commit 7b8b8d2.

claude · 2026-03-28T03:16:47Z

iac/provider-gcp/nomad-cluster/network/main.tf

        service = google_compute_backend_service.ingress.self_link
+
+        dynamic "route_action" {
+          for_each = path_rule.value.timeout_sec != null ? [path_rule.value.timeout_sec] : []


The != null guard allows timeout_sec = 0 to pass through, producing timeout { seconds = 0 }. GCP rejects URL map timeouts of 0 seconds at apply time. Consider adding a variable validation rule:

validation {
  condition = alltrue([
    for r in var.additional_api_paths_handled_by_ingress :
    # conditional instead of ||, since older Terraform versions evaluate both operands of ||
    r.timeout_sec == null ? true : r.timeout_sec > 0
  ])
  error_message = "timeout_sec must be greater than 0 when specified."
}

djeebus · Mar 31, 2026

Technically true, but not meaningfully true. GCP will happily provide an error message.

claude · 2026-03-28T03:16:47Z

iac/provider-gcp/nomad-cluster/network/main.tf

+          for_each = path_rule.value.timeout_sec != null ? [path_rule.value.timeout_sec] : []
+
+          content {
+            timeout {


The timeout block inside route_action is missing the nanos field. While nanos is optional in the provider (defaults to 0), the GCP API may still reject the configuration for certain provider versions. Explicitly setting nanos = 0 avoids any ambiguity:

timeout {
  seconds = route_action.value
  nanos   = 0
}

djeebus · Mar 31, 2026

Technically true, but not meaningfully true. GCP will happily provide an error message.

claude · 2026-03-28T14:35:07Z

iac/provider-gcp/nomad-cluster/network/variables.tf

 variable "additional_api_paths_handled_by_ingress" {
-  type = list(string)
+  type = list(object({
+    paths       = list(string)
+    timeout_sec = optional(number)
+  }))
 }


🔴 The new list(object({paths, timeout_sec})) variable type lacks validation in all three variable files, leaving three input classes unguarded: an entry with paths = [] generates a path_rule with an empty paths list that GCP's URL map API rejects at apply time; timeout_sec = 0 (already flagged in a PR inline comment) passes the null guard but is rejected by GCP; and timeout_sec = 1.5 is accepted by Terraform's number type but the provider's seconds field is int64, causing a plan-time type error or silent truncation. A single validation block in each of the three variable files would address all cases: condition = alltrue([for r in var.additional_api_paths_handled_by_ingress : length(r.paths) > 0 && (r.timeout_sec == null || (r.timeout_sec > 0 && floor(r.timeout_sec) == r.timeout_sec))]).

Extended reasoning...

What the bug is and how it manifests

The PR changes additional_api_paths_handled_by_ingress from list(string) to list(object({ paths = list(string), timeout_sec = optional(number) })). This structural change introduces three new unguarded input classes, none of which are caught by Terraform before reaching the GCP API or provider type system:

Empty paths list: Terraform's list(string) type permits zero-element lists, so { paths = [], timeout_sec = null } is syntactically valid. The dynamic path_rule block at main.tf:272 iterates directly over the variable (for_each = var.additional_api_paths_handled_by_ingress) and emits a path_rule block with paths = []. GCP's URL map API requires at least one path per path_rule and will return a 400 error at apply time.

timeout_sec = 0: The != null guard at main.tf:280 (for_each = path_rule.value.timeout_sec != null ? [...] : []) allows zero through, producing timeout { seconds = 0 }. GCP rejects URL map route timeouts of zero seconds. This was noted in an existing inline comment on the PR but the fix was suggested in main.tf; the correct place to enforce it is in the variable definitions.

Fractional timeout_sec: optional(number) is Terraform's float64. A caller can pass timeout_sec = 1.5. The GCP Terraform provider defines the seconds field inside route_action.timeout as TypeInt (int64). Depending on the provider version, this either causes a plan-time type coercion error with a confusing message, or silently truncates to 1 second, both unexpected outcomes for the caller.

The specific code path

variables.tf:106-111 (all three files) declares the variable with no validation blocks. At plan time, Terraform accepts any list(object({...})) that satisfies the type constraint. The value flows to main.tf:273 (for_each = var.additional_api_paths_handled_by_ingress), then path_rule.value.paths is emitted as the paths attribute and route_action.value (the float64 timeout_sec) is emitted as seconds.

Why existing code doesn't prevent it

The old code used for_each = length(var.additional_api_paths_handled_by_ingress) > 0 ? [{}] : [] with a flat list(string), meaning an empty outer list produced no path_rule at all — the empty-paths case was structurally impossible. The new per-entry iteration moves the responsibility of ensuring non-empty paths to each individual entry, but no validation block enforces it. Similarly, the timeout_sec sub-field is entirely new; there is no prior guard anywhere.
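To make the contrast concrete, here is a sketch of the two shapes side by side. These are fragments that would sit inside the URL map's path_matcher block, and they are alternatives rather than code meant to coexist. The old for_each expression is quoted from the paragraph above; how the old block passed the flat list to paths is an assumption.

```hcl
# Old shape (list(string)): at most one path_rule, emitted only when the
# outer list is non-empty, so paths could never be empty.
dynamic "path_rule" {
  for_each = length(var.additional_api_paths_handled_by_ingress) > 0 ? [{}] : []

  content {
    paths   = var.additional_api_paths_handled_by_ingress # assumption: flat list passed straight through
    service = google_compute_backend_service.ingress.self_link
  }
}

# New shape (list(object)): one path_rule per entry, so an entry with
# paths = [] still emits a rule unless a validation block rejects it.
dynamic "path_rule" {
  for_each = var.additional_api_paths_handled_by_ingress

  content {
    paths   = path_rule.value.paths
    service = google_compute_backend_service.ingress.self_link
  }
}
```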

Step-by-step proof for empty paths

1. Caller sets: additional_api_paths_handled_by_ingress = [{ paths = [], timeout_sec = null }]

2. Terraform type-checks list(object({ paths = list(string), ... })): ✅ valid (an empty list is a valid list(string))

3. No validation block exists → no plan-time error

4. for_each = var.additional_api_paths_handled_by_ingress iterates once with path_rule.value = { paths = [], timeout_sec = null }

5. Emitted HCL: path_rule { paths = [] service = ... }

6. GCP API call at apply time → 400 Bad Request: "path_rule must have at least one path"

Impact

The empty-paths case produces a confusing apply-time GCP error rather than a clear Terraform validation message, blocking the entire apply. The fractional timeout_sec case can silently truncate (e.g., a 1.5s timeout becomes 1s) without any warning, leading to subtly wrong routing behavior.

How to fix

Add a single validation block to each of the three variable definitions (and remove the separate timeout_sec > 0 suggestion from the inline main.tf comment):

validation {
  condition = alltrue([
    for r in var.additional_api_paths_handled_by_ingress :
    length(r.paths) > 0 && (
      # conditional instead of ||, since older Terraform versions evaluate both operands of ||
      r.timeout_sec == null ? true : (r.timeout_sec > 0 && floor(r.timeout_sec) == r.timeout_sec)
    )
  ])
  error_message = "Each entry must have at least one path, and timeout_sec (when set) must be a positive integer."
}

Addressing the refutation on duplicate paths (bug_003)

One verifier argued that cross-entry duplicate paths is not a meaningful regression since the old list(string) also allowed duplicate paths within the single flat path_rule. This is correct — duplicate-path detection across entries is a separate, lower-priority concern and would require nontrivial setproduct/flatten logic. The core bugs here (empty paths and invalid timeout_sec) are independent of this and clearly should be addressed.

djeebus · Mar 31, 2026

Technically true, but not meaningfully true. GCP will happily provide an error message.

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

cursor · 2026-04-01T00:23:25Z

iac/modules/job-ingress/jobs/traefik.toml

+    exposedByDefault = false
+    [providers.consulCatalog.endpoint]
+      address = "${consul_endpoint}"
+      token = "${consul_token}"


Access logging silently dropped in Traefik config migration

Medium Severity

The original CLI args included --accesslog=true, but the new traefik.toml has no [accessLog] section. Traefik disables HTTP access logging by default, so this migration silently turns off access logs for all ingress traffic. Every other original CLI flag maps to an equivalent TOML setting, but accessLog is the only one missing.

cursor · 2026-04-01T00:23:25Z

iac/provider-aws/nomad/variables.tf


-variable "additional_traefik_arguments" {
-  type    = list(string)
-  default = []


Missing default breaks AWS nomad module consumers

Low Severity

The old additional_traefik_arguments variable in provider-aws/nomad had default = [], making it optional. The replacement traefik_config_files has no default, making it required. Any direct caller of this module that previously relied on the implicit empty default now gets a Terraform error unless they explicitly pass traefik_config_files = {}.
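One way to restore the optional behavior is sketched below. The map(string) type is an assumption (inferred from the `{}` value callers would otherwise have to pass, read here as file name to file contents); the actual type in the module may differ.

```hcl
variable "traefik_config_files" {
  type    = map(string) # assumption: file name => file contents
  default = {}          # keeps the variable optional, as additional_traefik_arguments was
}
```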

e2b-request-same-site-reviewers bot assigned ValentaTomas Mar 28, 2026

claude bot reviewed Mar 28, 2026

djeebus marked this pull request as ready for review March 28, 2026 14:25

djeebus requested review from ValentaTomas, dobrac and jakubno as code owners March 28, 2026 14:25

claude bot reviewed Mar 28, 2026

djeebus and others added 6 commits March 31, 2026 13:11

use a traefik config file, support more files

197721e

support more ingress overrides

f6c1f74

chore: auto-commit generated changes

a1c15a5

clean up

f5aa7cf

remove a few more unused variables

5c87346

reduce diff

1d2ad7b

cursor bot reviewed Apr 1, 2026

create a dummy file to ensure the directoy exists

7b8b8d2

djeebus wants to merge 8 commits into main from joe/per-route-ingress-timeouts
