return chiprouter.EnsureStarted(ctx, relativePathToRepoRoot, "")
}

func loadPersistedBeholderState(relativePathToRepoRoot string) (*envconfig.ChipIngressConfig, error) {
Why, you ask? So that when you have Beholder running and you restart the env (which includes the router), we detect that Beholder is running and subscribe it to the router again.
Pull request overview
This PR unifies workflow-telemetry ingress in Local CRE + system tests by making Chip Router the single ingress owner, so both lightweight per-test sinks and real ChIP/Beholder subscribe downstream via the same router-based fanout topology.
Changes:
- Replace per-test “own the ingress port” / optional fanout sink logic with ephemeral-port sinks registered behind Chip Router.
- Wire Chip Router into Local CRE startup/setup/state handling (incl. image setup, env overrides, persisted state).
- Update system/regression tests and CI workflows to use the new router-based topology and safer teardown patterns.
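As a rough illustration of the ephemeral-port idea in the first bullet, a sink can bind to port 0 and let the OS pick a free port, then report the address it actually got so that address can be registered with the router. This is a minimal sketch using only the Go standard library; `listenEphemeral` is a hypothetical helper name, not a function from this PR.

```go
package main

import (
	"fmt"
	"net"
)

// listenEphemeral binds to port 0 so the OS assigns a free port, then
// returns the listener together with the address that was actually bound.
// Hypothetical helper for illustration only.
func listenEphemeral() (net.Listener, string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, "", err
	}
	return ln, ln.Addr().String(), nil
}

func main() {
	ln, addr, err := listenEphemeral()
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// In the topology described here, this is the address a sink would
	// hand to the router when subscribing.
	fmt.Println("sink listening on", addr)
}
```

Because every sink gets its own OS-assigned port, parallel tests cannot collide on a fixed ingress port.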
Reviewed changes
Copilot reviewed 49 out of 52 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| system-tests/tests/test-helpers/chip_testsink_helpers.go | Per-test sink registration behind Chip Router + new drain/teardown helper. |
| system-tests/tests/test-helpers/chip-testsink/server.go | Adds PublishBatch and tweaks upstream forwarding + listener addr reporting. |
| system-tests/tests/test-helpers/beholder_provider.go | Serializes Beholder startup to avoid parallel start races. |
| system-tests/tests/test-helpers/before_suite.go | Ensures Chip Router is started; updates default Beholder gRPC port config. |
| system-tests/tests/smoke/cre/v2_vault_don_test.go | Uses new sink shutdown helper + parallel gating update. |
| system-tests/tests/smoke/cre/v2_sharding_test.go | Uses new sink shutdown helper. |
| system-tests/tests/smoke/cre/v2_http_action_test.go | Uses new sink shutdown helper + parallel gating update. |
| system-tests/tests/smoke/cre/v2_grpc_source_test.go | Uses new sink shutdown helper. |
| system-tests/tests/smoke/cre/v2_evm_capability_test.go | Uses new sink shutdown helper + parallel gating update. |
| system-tests/tests/smoke/cre/v2_dontime_test.go | Uses new sink shutdown helper. |
| system-tests/tests/smoke/cre/v2_consensus_capability_test.go | Uses new sink shutdown helper. |
| system-tests/tests/smoke/cre/v2_aptos_capability_test.go | Uses new sink shutdown helper + parallel gating update. |
| system-tests/tests/smoke/cre/cre_suite_test.go | Removes fanout flag usage; enables broader parallelization. |
| system-tests/tests/regression/cre/v2_http_trigger_regression_test.go | Uses new sink shutdown helper. |
| system-tests/tests/regression/cre/v2_http_action_regression_test.go | Uses new sink shutdown helper. |
| system-tests/tests/regression/cre/v2_evm_regression_test.go | Uses new sink shutdown helper. |
| system-tests/tests/regression/cre/v2_cron_beholder_regression_test.go | Minor formatting cleanup. |
| system-tests/tests/regression/cre/v2_consensus_regression_test.go | Uses new sink shutdown helper. |
| system-tests/tests/regression/cre/cre_regression_suite_test.go | Removes fanout flag usage; enables broader parallelization. |
| system-tests/tests/go.mod | Adds chiprouter component dependency; adjusts dockercompose dep to indirect. |
| system-tests/tests/go.sum | Records new chiprouter component checksums. |
| system-tests/lib/go.mod | Adds chiprouter component dependency; bumps chipingress indirect version. |
| system-tests/lib/go.sum | Records new chiprouter component + chipingress version checksums. |
| system-tests/lib/cre/types.go | Removes startup timing logs. |
| system-tests/lib/cre/environment/environment.go | Starts Chip Router as part of environment setup. |
| system-tests/lib/cre/environment/config/config.go | Adds chip_router to env config + router state loading helper; shifts validations to tags. |
| system-tests/lib/cre/don.go | Removes timing logs and unused rounding helper. |
| system-tests/lib/cre/chiprouter/router.go | New helper client for register/unregister subscribers with router. |
| go.md | Updates dependency graph to include chiprouter component. |
| docs/local-cre/getting-started/index.md | Updates docs to reflect router-owned ingress and ECR env vars. |
| docs/local-cre/environment/index.md | Documents router topology, ports, and image override precedence. |
| core/scripts/go.mod | Adds chiprouter component dependency. |
| core/scripts/go.sum | Records new chiprouter component checksums. |
| core/scripts/cre/environment/environment/setup.go | Adds Chip Router image setup; supports MAIN/SDLC ECR env vars; refactors image ensure logic. |
| core/scripts/cre/environment/environment/environment.go | Applies router image override; ensures router image exists; persists state earlier; stage count updates. |
| core/scripts/cre/environment/environment/beholder.go | Registers/unregisters Beholder as a router subscriber; changes default Beholder port semantics. |
| core/scripts/cre/environment/configs/workflow-gateway-sharded-don.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-gateway-sharded-5-dons.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-gateway-mock-don.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-gateway-legacy-vault-don.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-gateway-don.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-gateway-don-grpc-source.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-gateway-don-aptos.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-gateway-capabilities-don.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-don-tron.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/workflow-don-solana.toml | Adds chip_router image config. |
| core/scripts/cre/environment/configs/setup.toml | Adds chip_router image build/pull config; switches ECR templating to MAIN/SDLC vars. |
| .github/workflows/devenv-compat.yml | Adds CTF_CHIP_ROUTER_IMAGE for CI runs. |
| .github/workflows/cre-system-tests.yaml | Adds CTF_CHIP_ROUTER_IMAGE; removes fanout env; adjusts test timeouts + artifact steps on cancel. |
| .github/workflows/cre-soak-memory-leak.yml | Adds CTF_CHIP_ROUTER_IMAGE; switches obs startup to go run. |
| .github/workflows/cre-regression-system-tests.yaml | Adds CTF_CHIP_ROUTER_IMAGE; removes fanout env; adjusts test timeouts + artifact steps on cancel. |
| .github/workflows/cre-local-env-tests.yaml | Updates AWS role + replaces AWS_ECR with MAIN/SDLC ECR env vars; adds router image env. |
Risk Rating: HIGH
Scrupulous human review recommended for:
- system-tests/tests/test-helpers/chip-testsink/server.go: upstream forwarding context semantics (async goroutine + request context cancellation).
- system-tests/lib/cre/chiprouter/router.go: client initialization/caching strategy (sync.Once) and failure recovery behavior.
- core/scripts/cre/environment/environment/environment.go + beholder.go: persisted Beholder/router reconciliation and state write ordering.
Comments suppressed due to low confidence (1)
system-tests/tests/test-helpers/chip-testsink/server.go:120
- In the async upstream-forwarding goroutine, context.WithTimeout(ctx, ...) uses the request context. Because Publish returns immediately after spawning the goroutine, the gRPC request context will typically be cancelled as soon as the handler returns, causing the forward to upstream to be cancelled almost immediately. Use a context that outlives the handler return (e.g., context.WithTimeout(context.WithoutCancel(ctx), ...) or context.WithTimeout(context.Background(), ...)) so forwarding can complete reliably.
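The cancellation issue described in this comment can be reproduced outside gRPC. The sketch below is not the code from server.go: `forward`, `handle`, and `simulate` are hypothetical stand-ins, and the "server" is simulated by cancelling the request context right after the handler returns. It assumes Go 1.21 or newer for `context.WithoutCancel`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// forward stands in for the upstream publish call (hypothetical). It fails
// if its context is cancelled before the simulated work completes.
func forward(ctx context.Context) error {
	select {
	case <-time.After(20 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// handle mimics a handler that returns immediately after spawning the
// forwarding goroutine. detach selects whether the goroutine derives its
// timeout from the request context or from a detached copy of it.
func handle(reqCtx context.Context, detach bool) <-chan error {
	result := make(chan error, 1)
	base := reqCtx
	if detach {
		// Keeps the request's values but drops its cancellation.
		base = context.WithoutCancel(reqCtx)
	}
	go func() {
		ctx, cancel := context.WithTimeout(base, time.Second)
		defer cancel()
		result <- forward(ctx)
	}()
	return result
}

// simulate runs one request and cancels the request context as soon as
// the handler has returned, which is what a server does after a response.
func simulate(detach bool) error {
	reqCtx, cancel := context.WithCancel(context.Background())
	result := handle(reqCtx, detach)
	cancel()
	return <-result
}

func main() {
	fmt.Println("request context: ", simulate(false))
	fmt.Println("detached context:", simulate(true))
}
```

With the request context the forward ends with a cancellation error; with the detached context it completes, bounded only by its own timeout.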
chainchad left a comment
Approving for .github/ changes.
This branch simplifies ChIP integration across local CRE and system tests by making Chip Router the single ingress entrypoint for workflow telemetry. Instead of having different tests reason about different ChIP paths, both lightweight test sinks and real ChIP / Beholder now sit behind the same router, which removes the old ingress ownership split and makes the topology much easier to understand.
Before this change, the ChIP path depended on the test. Most tests started their own test sink and, when running in parallel, relied on a programmatic fanout to avoid port collisions. A smaller set of tests used real ChIP / Beholder instead, with different startup, different ports, and different reasoning about where workflow events were actually going. That meant there was no single mental model for “how ChIP works in tests”, and understanding failures required knowing which path a given test had taken and how ingress ownership was being managed in that case.
It also meant that combining the two modes was awkward. If you wanted to run a test against the lightweight sink for assertions but also have real Beholder running so you could explore the same events in Redpanda, you had to work around ingress-port ownership manually. In practice that made “test sink for assertions + real Beholder for exploration” much harder than it should have been.
After this change, there is one ingress owner: Chip Router on 50051. Nodes always publish to that same entrypoint. Test sinks no longer pretend to be ingress owners; they start on ephemeral ports and subscribe behind the router. Real ChIP / Beholder also no longer competes for the ingress port; it runs downstream and subscribes through the same mechanism. The result is that the transport topology is the same regardless of whether a test uses a lightweight sink or the full ChIP stack, so the distinction between those modes is reduced to what the downstream consumer does with the events, not how telemetry reaches it.
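To make the topology concrete, here is a toy in-memory model of a single ingress owner fanning out to downstream subscribers. It is only a sketch of the idea: the real Chip Router is a separate service reached over gRPC, and the `Router`, `Event`, `Register`, `Unregister`, and `Publish` names below are hypothetical, not the API of the chiprouter package.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a stand-in for a workflow telemetry message (hypothetical type).
type Event string

// Router models one ingress owner that fans every event out to all
// currently registered subscribers.
type Router struct {
	mu   sync.Mutex
	subs map[string]chan Event
}

func NewRouter() *Router {
	return &Router{subs: make(map[string]chan Event)}
}

// Register adds a downstream subscriber and returns the channel it reads from.
func (r *Router) Register(name string, buffer int) <-chan Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan Event, buffer)
	r.subs[name] = ch
	return ch
}

// Unregister removes a subscriber and closes its channel.
func (r *Router) Unregister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if ch, ok := r.subs[name]; ok {
		close(ch)
		delete(r.subs, name)
	}
}

// Publish offers the event to every subscriber and returns how many
// accepted it. A subscriber whose buffer is full is skipped, not waited on.
func (r *Router) Publish(e Event) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	delivered := 0
	for _, ch := range r.subs {
		select {
		case ch <- e:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	router := NewRouter()
	sink := router.Register("test-sink", 8)    // lightweight per-test sink
	beholder := router.Register("beholder", 8) // real downstream consumer

	router.Publish("workflow-started")
	fmt.Println(<-sink, <-beholder) // both receive the same event

	router.Unregister("test-sink")
	fmt.Println(router.Publish("workflow-finished")) // only beholder remains
}
```

The publisher never needs to know which consumers exist, which is why a test sink and Beholder can subscribe side by side without either one owning the ingress port.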
That also makes mixed usage straightforward. A test can keep using its lightweight sink for assertions while real Beholder subscribes at the same time and receives the same event stream for interactive inspection. So one concrete benefit of this branch is that it becomes easy to use Beholder as an observability tool alongside normal sink-based tests, instead of treating those two modes as mutually exclusive.
It also turns Chip Router into standard environment infrastructure rather than a one-off extra. The router is now wired into normal environment startup, persisted in the main local CRE state, configurable through the env TOMLs, and managed like the other required images. That includes setup support, local image aliasing, runtime image override via environment variable, and updated docs in both chainlink and CTF so the architecture and operational flow match the code.
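One way to picture the runtime image override is a small resolver in which the environment variable, when set, wins over the image configured in the env TOML. This is a sketch under that assumption, not the PR's actual code: `resolveRouterImage` is a hypothetical helper, the image names are placeholders, and the exact precedence rules are the ones documented in docs/local-cre/environment/index.md. Only the variable name `CTF_CHIP_ROUTER_IMAGE` comes from the workflow changes listed above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// routerImageEnv is the override variable added to the CI workflows.
const routerImageEnv = "CTF_CHIP_ROUTER_IMAGE"

// resolveRouterImage returns the override from the environment when it is
// set and non-blank, otherwise the image configured in the env TOML.
// The lookup function is injected so the logic can be exercised without
// touching the real process environment. Hypothetical helper.
func resolveRouterImage(configured string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(routerImageEnv); ok {
		if trimmed := strings.TrimSpace(v); trimmed != "" {
			return trimmed
		}
	}
	return configured
}

func main() {
	// "chip-router:local" is a placeholder for whatever the TOML specifies.
	fmt.Println(resolveRouterImage("chip-router:local", os.LookupEnv))
}
```

Treating a blank value as unset avoids an empty image name when CI defines the variable but leaves it empty.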
Before, the router path was operationally special: it had separate lifecycle handling, weaker setup guarantees, and more room for local drift between code, config, and docs. After this change, the router follows the same environment contract as the rest of local CRE. You can see it in the env TOML, set it up through env setup, override its image explicitly when needed, and rely on it being started as part of the environment itself rather than as an ad hoc side path. So the main gain is not just fewer port conflicts, but a more uniform, more composable, and much more explainable ChIP model across local development and tests.
Other changes:
- go-validator
- fanoutEnabled flag since now we can always fanout
- t.Context() in t.Cleanup()
- PublishBatch for ChIP test sink
Requires:
- go test timeouts in system and regression jobs (go test sufficiently lower than job, so that we still get logs)
Tests:
- setup is run ✅