
Conversation

@HardMax71 (Owner) commented Jan 6, 2026


Summary by cubic

Migrated Kubernetes integrations to kubernetes_asyncio and tightened event/replay/DLQ flows for reliability. Modernized backend tests and CI; cancellation now blocks all terminal states, and schema/DB init moved to the FastAPI server (workers no longer init Beanie or use EventStore).

  • Refactors

    • Switched all K8s clients/services to kubernetes_asyncio; create/close are async and providers updated.
    • Added ExecutionStatus.is_terminal and updated cancel route to use it.
    • UnifiedProducer now receives Settings and uses the configured topic prefix; PodMonitor container includes RedisProvider.
    • Aligned replay documents with domain (ReplayFilter fields).
    • Removed unused repository methods; simplified EventRepository surface.
    • Moved schema/DB initialization to the FastAPI server; workers no longer initialize Beanie; K8s worker dropped EventStore.
    • CI workflows updated to use Confluent Kafka/Zookeeper and shared docker-bake; enabled typing checks in tests; added kubernetes-asyncio and fakeredis.
  • Bug Fixes

    • DLQ manager storage is idempotent and concurrency-safe (atomic upsert/duplicate skip); supports group_id_suffix for per-test consumer groups.
    • Serialized Kafka Consumer initialization with a global lock to avoid a librdkafka race.
    • DLQ processing skips already-terminal messages.

Written for commit 229e473. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Replay requests accept richer filters (event IDs, correlation/aggregate, exclusions) and enum-keyed topic mapping.
  • Bug Fixes

    • Cancellation now rejects any terminal execution; DLQ processing is idempotent and skips already-final messages; consumer init serialized to reduce races.
  • Refactor

    • Kubernetes/pod-monitoring and related tooling moved to async; event storage APIs simplified and archival handling improved.
  • Tests

    • Large test modernization: stricter typing, shared fixtures, deterministic IDs, backoff waiters, many updated integration/e2e/unit suites.
  • Chores

    • CI/build tooling updated (bake-based builds, compose/image changes, dependency tweaks).
  • Documentation

    • Added tests.md summarizing test coverage gaps.


@coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

Centralized time-filter helpers; removed multiple EventRepository batch/query/cleanup/replay helpers; DLQManager made _store_message idempotent with early-exit; UnifiedProducer now receives Settings; Kubernetes and PodMonitor code migrated to kubernetes_asyncio (async); large test refactor adding fixtures, typing, and async fakes; CI switched to buildx/bake and image-based compose.

Changes

Event repository (`backend/app/db/repositories/event_repository.py`)
Removed batch insert and several query/cleanup/replay helpers; archival now uses `model_dump(exclude={"id", "revision_id"})`; added private `_time_conditions`/`_build_time_filter` helpers; `get_events_by_aggregate` now maps `EventType` to `.value`. Review callers and aggregation logic.
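For orientation, a minimal sketch of what such centralized helpers could look like; the `timestamp` field name and exact shapes are assumptions, only the helper names come from this summary:

```python
from datetime import datetime
from typing import Any


def _time_conditions(start: datetime | None, end: datetime | None) -> list[dict[str, Any]]:
    # Collect only the bounds that are actually set, skipping None values.
    conditions: list[dict[str, Any]] = []
    if start is not None:
        conditions.append({"timestamp": {"$gte": start}})
    if end is not None:
        conditions.append({"timestamp": {"$lte": end}})
    return conditions


def _build_time_filter(start: datetime | None, end: datetime | None) -> dict[str, Any]:
    # Merge the per-bound conditions into a single Mongo filter document.
    merged: dict[str, Any] = {}
    for cond in _time_conditions(start, end):
        merged.update(cond["timestamp"])
    return {"timestamp": merged} if merged else {}
```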
Replay models & repo (`backend/app/db/docs/replay.py`, `backend/app/db/repositories/replay_repository.py`)
`ReplayFilter` gains selectors (`event_ids`, `correlation_id`, `aggregate_id`, `exclude_event_types`, `custom_query`); `ReplayConfig.target_topics` keys are now `EventType`; removed `count_sessions`. Check consumers/serializers.
DLQ manager & tests (`backend/app/dlq/manager.py`, `backend/tests/integration/dlq/*`, `backend/tests/integration/dlq/conftest.py`)
`_store_message` now returns bool (idempotent, skips terminal duplicates); `_process_dlq_message` early-exits on False; `create_dlq_manager` accepts `group_id_suffix`; tests use a DLQManager fixture and backoff waiters. Inspect DuplicateKeyError/race handling and state transitions.
Producer init / DI (`backend/app/events/core/producer.py`, `backend/app/core/providers.py`)
`UnifiedProducer.__init__` adds `settings: Settings`; the topic prefix is derived from the injected settings; providers updated accordingly. Update constructors/call sites.
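A hedged sketch of the constructor change; the `kafka_topic_prefix` field name and the producer internals are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Settings:
    kafka_topic_prefix: str = "dev"


class UnifiedProducer:
    def __init__(self, settings: Settings) -> None:
        # The topic prefix is now derived from injected Settings
        # instead of module-level configuration.
        self._topic_prefix = settings.kafka_topic_prefix

    def topic(self, name: str) -> str:
        return f"{self._topic_prefix}.{name}"
```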
Kubernetes async migration (`backend/app/core/k8s_clients.py`, `backend/app/services/k8s_worker/`, `backend/app/services/pod_monitor/`, `backend/app/services/result_processor/resource_cleaner.py`, `backend/app/services/k8s_worker/pod_builder.py`)
Replaced kubernetes with kubernetes_asyncio; client init/close and watch/list/read/delete converted to async; PodMonitor and PodEventMapper made async; ResourceCleaner gained an async close. Review lifecycle, ApiClient ownership, and tests using sync fakes.
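To illustrate the sync-to-async shift, a self-contained kubernetes_asyncio lifecycle sketch (the `watch_pods` function is hypothetical; only the library calls are real):

```python
import asyncio

from kubernetes_asyncio import client, config, watch


async def watch_pods(namespace: str = "default") -> None:
    await config.load_kube_config()  # async, unlike the sync kubernetes package
    api_client = client.ApiClient()
    try:
        v1 = client.CoreV1Api(api_client)
        w = watch.Watch()
        # Watch streams become `async for` loops instead of blocking generators.
        async for event in w.stream(v1.list_namespaced_pod, namespace=namespace, timeout_seconds=5):
            print(event["type"], event["object"].metadata.name)
    finally:
        await api_client.close()  # closing the client is async as well


asyncio.run(watch_pods())
```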
PodMonitor DI (`backend/app/core/container.py`)
Added `RedisProvider()` to the PodMonitor container.
Execution status / API (`backend/app/domain/enums/execution.py`, `backend/app/api/routes/execution.py`)
Added an `ExecutionStatus.is_terminal` property; `cancel_execution` now uses `status.is_terminal` to decide cancellation acceptance.
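A sketch of the new property and its route-level use; the enum members shown are assumptions:

```python
from enum import Enum


class ExecutionStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
    TIMEOUT = "timeout"

    @property
    def is_terminal(self) -> bool:
        # A finished execution can no longer be cancelled.
        return self in {
            ExecutionStatus.COMPLETED,
            ExecutionStatus.FAILED,
            ExecutionStatus.CANCELLED,
            ExecutionStatus.TIMEOUT,
        }


# In the cancel route the decision collapses to a single guard (sketch):
def can_cancel(status: ExecutionStatus) -> bool:
    return not status.is_terminal
```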
Test infra — fixtures & typing (`backend/tests/conftest.py`, `backend/tests/integration/conftest.py`, many tests)
Large test refactor: `unique_id` fixture, `MakeUser`, `authenticated_client`/`authenticated_admin_client`, broad typing and `AsyncContainer` annotations; many helpers removed (e.g., `eventually`) in favor of backoff; many tests converted to async and to the new fixtures. Verify fixture lifecycles and parallel isolation.
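A plausible shape for the `unique_id` fixture, matching how tests later in this review call it (e.g., `unique_id("exec-deco-")`); the implementation details are assumptions:

```python
import uuid
from collections.abc import Callable

import pytest


@pytest.fixture
def unique_id() -> Callable[[str], str]:
    # Factory appending a fresh suffix to a caller-supplied prefix,
    # so each test gets isolated keys/IDs for parallel runs.
    def _make(prefix: str) -> str:
        return f"{prefix}{uuid.uuid4().hex[:12]}"

    return _make
```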
Test helpers & protocols (`backend/tests/helpers/protocols.py`, `backend/tests/helpers/k8s_fakes.py`, `backend/tests/helpers/sse.py`, `backend/tests/helpers/__init__.py`)
Added runtime-checkable Protocols; converted k8s and SSE fakes to async-aware implementations; strengthened typing and set `__all__`. Confirm test imports and compatibility with the new async fakes.
Tests — add/remove modules (`backend/tests/integration/services/sse/test_redis_bus.py` removed, `backend/tests/unit/services/sse/test_redis_bus.py` added)
Removed the integration Redis SSE tests and added unit-level SSE Redis tests. Check test coverage intent and CI impact.
Consumer init safety (`backend/app/events/core/consumer.py`)
Added a module-level threading lock to serialize Kafka consumer initialization (mitigates a librdkafka race).
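A minimal sketch of the pattern; `create_consumer` is hypothetical, and `confluent_kafka.Consumer` is assumed from the librdkafka mention:

```python
import threading

from confluent_kafka import Consumer

# Module-level lock: constructing librdkafka-backed consumers concurrently
# from multiple threads can race, so creation is serialized process-wide.
_consumer_init_lock = threading.Lock()


def create_consumer(conf: dict) -> Consumer:
    with _consumer_init_lock:
        return Consumer(conf)
```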
DI / worker startup cleanup (`backend/workers/*.py`)
Removed inline Beanie/DB and schema-registry initialization from multiple worker entrypoints; workers now rely on DI-managed lifecycles.
ResultProcessor DI (`backend/workers/run_result_processor.py`, `backend/app/services/result_processor/processor.py`)
ResultProcessor is constructed via DI; the constructor now requires explicit dependencies (`execution_repo`, `producer`, `schema_registry`, `settings`, `idempotency_manager`, `logger`). Update callers.
Dev tooling & deps (`backend/pyproject.toml`)
Replaced kubernetes with kubernetes-asyncio; added kubernetes-asyncio and fakeredis (dev); bumped urllib3; added pydantic.mypy plugin config.
Build & CI (`.github/workflows/*.yml`, `docker-bake.hcl`, `docker-compose.yaml`; removed `docker-compose.ci.yaml`)
Introduced docker-bake.hcl, migrated GH Actions to buildx/bake and image-based compose; removed the CI-specific compose file; updated compose images/healthchecks; added k3s/cert steps for E2E.
Misc & formatting (assorted files)
Widespread typing, formatting, and fixture-signature updates, plus small behavioral tweaks across tests and some services.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Kafka as Kafka (DLQ topic)
  participant DLQ as DLQManager
  participant Mongo as MongoDB (DLQ collection)
  participant Worker as Retry/Processor

  Kafka->>DLQ: deliver DLQ message
  DLQ->>DLQ: _process_dlq_message(message)
  DLQ->>Mongo: _store_message(insert/update)
  alt _store_message returns False (duplicate terminal or DuplicateKeyError)
    Mongo-->>DLQ: existing terminal / DuplicateKeyError
    DLQ-->>Worker: abort processing (early exit)
  else _store_message returns True
    Mongo-->>DLQ: stored/updated as PENDING
    DLQ->>Worker: continue processing (retry/emit)
    Worker->>Kafka: produce retry/archival event
    Worker->>Mongo: update DLQ doc -> RETRIED/DISCARDED
  end
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

enhancement

Poem

🐇
I hopped through queues and async streams,
I nudged duplicates out of sleepy dreams.
Producers donned Settings like a crown,
Pods woke async — no more blocking town.
Tests found names and fakes took flight; we debugged till night.

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.05%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'chore: added/updated tests for backend' directly describes the main change in the PR: a significant overhaul of backend tests and related infrastructure updates. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |



@codecov-commenter commented Jan 6, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 76.12903% with 37 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| backend/app/services/k8s_worker/worker.py | 50.00% | 14 Missing ⚠️ |
| .../app/services/result_processor/resource_cleaner.py | 69.56% | 7 Missing ⚠️ |
| backend/app/core/k8s_clients.py | 44.44% | 5 Missing ⚠️ |
| backend/app/dlq/manager.py | 69.23% | 4 Missing ⚠️ |
| backend/app/api/routes/execution.py | 0.00% | 2 Missing ⚠️ |
| backend/app/core/providers.py | 33.33% | 2 Missing ⚠️ |
| backend/app/domain/enums/execution.py | 66.66% | 1 Missing ⚠️ |
| backend/app/services/pod_monitor/event_mapper.py | 95.23% | 1 Missing ⚠️ |
| backend/app/services/pod_monitor/monitor.py | 96.87% | 1 Missing ⚠️ |
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
| Flag | Coverage Δ |
| --- | --- |
| backend-e2e | 52.98% <45.80%> (-1.12%) ⬇️ |
| backend-integration | 73.89% <39.35%> (+1.98%) ⬆️ |
| backend-unit | 60.61% <51.61%> (+0.67%) ⬆️ |
| frontend | 54.86% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| backend/app/core/container.py | 65.00% <ø> (ø) |
| backend/app/db/docs/replay.py | 92.95% <100.00%> (+0.31%) ⬆️ |
| backend/app/db/repositories/event_repository.py | 99.06% <100.00%> (+34.68%) ⬆️ |
| backend/app/db/repositories/replay_repository.py | 43.75% <ø> (-0.19%) ⬇️ |
| backend/app/events/core/consumer.py | 81.98% <100.00%> (+1.60%) ⬆️ |
| backend/app/events/core/producer.py | 85.40% <100.00%> (ø) |
| backend/app/services/k8s_worker/pod_builder.py | 100.00% <100.00%> (ø) |
| backend/app/settings.py | 100.00% <100.00%> (ø) |
| backend/app/domain/enums/execution.py | 92.30% <66.66%> (-7.70%) ⬇️ |
| backend/app/services/pod_monitor/event_mapper.py | 89.83% <95.23%> (+0.04%) ⬆️ |

... and 7 more

... and 25 files with indirect coverage changes


@cubic-dev-ai bot left a comment

No issues found across 1 file

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @backend/app/db/repositories/event_repository.py:
- Around line 287-293: The aggregation returns nested event dicts that still
include MongoDB’s internal "_id", causing Event(**e) to raise TypeError; in the
EventDocument.aggregate handling (where doc = await
anext(EventDocument.aggregate(pipeline), None) and you build
EventReplayInfo(events=[Event(**e) for e in doc["events"]], ...)), filter out
MongoDB-specific keys from each event dict before constructing Event instances
(e.g., drop "_id" and any other DB-only fields like "id"/"revision_id"), or
modify the pipeline to exclude "_id" in the $push stage; implement the simpler
option: sanitize each e in doc["events"] by removing "_id" (and other DB-only
keys) before calling Event(**e) so EventReplayInfo and Event construction no
longer receive unexpected keys.
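As a short sketch of the simpler option (the exact set of DB-only keys is an assumption):

```python
DB_ONLY_KEYS = {"_id", "id", "revision_id"}


def _sanitize(event_dict: dict) -> dict:
    # Strip MongoDB/Beanie bookkeeping fields before constructing Event(**e).
    return {k: v for k, v in event_dict.items() if k not in DB_ONLY_KEYS}


# events = [Event(**_sanitize(e)) for e in doc["events"]]
```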
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd418c4 and f5fabe3.

📒 Files selected for processing (1)
  • backend/app/db/repositories/event_repository.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/db/repositories/event_repository.py (1)
backend/app/domain/events/event_models.py (3)
  • SortDirection (20-22)
  • EventReplayInfo (183-190)
  • Event (44-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Frontend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (8)
backend/app/db/repositories/event_repository.py (8)

1-25: LGTM!

Imports are correctly updated for Beanie 2.0.1 (LTE replacing LT), and the timezone addition supports the UTC-aware datetime usage throughout the file.


27-37: LGTM!

Clean helper methods that centralize time filtering logic. The list comprehension and dict comprehension patterns effectively filter out None values.


39-56: LGTM!

The method correctly ensures stored_at is always populated and filters None values to allow document defaults to apply.


58-88: LGTM!

Consistent use of model_dump(exclude={"id", "revision_id"}) for Event construction, and proper enum-to-value conversion in get_events_by_aggregate.


90-179: LGTM!

Well-structured aggregation pipeline with proper use of $facet for efficient multi-faceted statistics. The fallback to empty EventStatistics (line 179) handles the no-results case correctly.


181-209: LGTM!

Clean usage of the _time_conditions helper and consistent pagination pattern.


211-245: LGTM!

Standard utility methods with correct implementations.


247-268: LGTM!

Cleaner implementation using model_dump(exclude={"id", "revision_id"}) instead of explicit field construction. The archive-then-delete ordering ensures no data loss on failure.

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @backend/app/dlq/manager.py:
- Around line 279-295: In `_store_message`, avoid resetting state for reconsumed
records: if a DLQMessageDocument already exists, do not overwrite its
status/timestamps or clear scheduling info like next_retry_at; instead log and
return False to ignore redeliveries. Locate the existing lookup using
DLQMessageDocument.find_one(...) and replace the current flow that sets
message.status = DLQMessageStatus.PENDING and overwrites the document when
existing is truthy—keep that branch only for truly new messages and preserve
existing.status (including SCHEDULED) so scheduled retry metadata isn’t lost.
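A sketch of the suggested early-return, reusing the symbols named in this prompt (`DLQMessageDocument`, `message.event_id`); `_insert_pending` is a hypothetical helper standing in for the insert path:

```python
async def _store_message(self, message) -> bool:
    existing = await DLQMessageDocument.find_one(DLQMessageDocument.event_id == message.event_id)
    if existing is not None:
        # Redelivery: leave status (including SCHEDULED) and scheduling
        # metadata such as next_retry_at untouched; the monitor task owns them.
        self.logger.debug(
            "Skipping already tracked message: %s (status=%s)",
            message.event_id,
            existing.status,
        )
        return False
    await self._insert_pending(message)  # hypothetical: insert the new doc as PENDING
    return True
```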
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c0ea42 and 6ddf1e5.

📒 Files selected for processing (1)
  • backend/app/dlq/manager.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/dlq/manager.py (1)
backend/app/dlq/models.py (2)
  • DLQMessage (30-55)
  • DLQMessageStatus (10-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (1)
backend/app/dlq/manager.py (1)

254-256: LGTM! Good idempotency guard.

The early return when _store_message returns False correctly prevents re-processing of messages that have already reached a terminal state. This is a solid improvement for handling Kafka message redelivery scenarios.

@cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/dlq/manager.py">

<violation number="1" location="backend/app/dlq/manager.py:283">
P2: Consider including `SCHEDULED` status in the skip check. Messages in SCHEDULED status are already queued for retry by the monitor task. Re-consuming them could reset their status to PENDING and interfere with the scheduled retry process.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/dlq/manager.py">

<violation number="1" location="backend/app/dlq/manager.py:295">
P2: Race condition: `DuplicateKeyError` is not caught when two concurrent processes try to insert the same `event_id`. Since `event_id` has a unique index, catch the exception and return `False` to maintain idempotency without unnecessary errors.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @backend/app/dlq/manager.py:
- Around line 279-296: In _store_message, guard the final insert against
concurrent duplicate-key races by wrapping the await doc.insert() call in a
try/except that catches the duplicate-key error (e.g.,
pymongo.errors.DuplicateKeyError or the underlying DB driver exception used by
your ODM), log a debug message including message.event_id and the exception, and
return False to preserve idempotency instead of letting the exception bubble;
add the necessary import for the exception class.
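A sketch of that guard; `doc`, `message`, and the logger are the surrounding method's names, and `DuplicateKeyError` is pymongo's real exception class:

```python
from pymongo.errors import DuplicateKeyError


async def _insert_with_race_guard(self, doc, message) -> bool:
    # Method-body sketch for the final insert in _store_message.
    try:
        await doc.insert()
    except DuplicateKeyError:
        # A concurrent consumer won the unique-index race on event_id;
        # defer to the winner instead of letting the error bubble up.
        self.logger.debug("Duplicate DLQ insert for %s; skipping", message.event_id)
        return False
    return True
```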
🧹 Nitpick comments (1)
backend/app/dlq/manager.py (1)

288-288: Minor: Avoid f-string in debug logging for efficiency.

The f-string at line 288 is evaluated even when debug logging is disabled. Use lazy formatting instead.

🔎 Proposed refactor
-            self.logger.debug(f"Skipping already tracked message: {message.event_id} (status={existing.status})")
+            self.logger.debug("Skipping already tracked message: %s (status=%s)", message.event_id, existing.status)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ddf1e5 and 4afcf95.

📒 Files selected for processing (1)
  • backend/app/dlq/manager.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/dlq/manager.py (1)
backend/app/dlq/models.py (2)
  • DLQMessage (30-55)
  • DLQMessageStatus (10-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (1)
backend/app/dlq/manager.py (1)

254-256: LGTM! Idempotency check correctly prevents reprocessing.

The early return when _store_message returns False ensures that duplicate Kafka deliveries (due to rebalance, restart, etc.) are safely ignored without reprocessing.

@cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/dlq/manager.py">

<violation number="1" location="backend/app/dlq/manager.py:306">
P1: When `DuplicateKeyError` occurs with non-terminal status, the function now returns `True` even though the save failed. This will cause duplicate message processing since both the winning consumer and this losing consumer will proceed with the message. The original code correctly returned `False` to defer to the winner.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@cubic-dev-ai bot left a comment

2 issues found across 115 files (changes from recent commits).

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/tests/integration/app/test_main_app.py">

<violation number="1" location="backend/tests/integration/app/test_main_app.py:27">
P2: This assertion will pass even if CORS middleware is not working. The `or resp.status_code == 200` condition means the test passes whenever the endpoint returns 200, regardless of whether CORS headers are present. Consider testing CORS headers more directly, or if 200 without CORS headers is expected for some origins, add a comment explaining the logic.</violation>
</file>

<file name="backend/tests/integration/services/idempotency/test_redis_repository.py">

<violation number="1" location="backend/tests/integration/services/idempotency/test_redis_repository.py:159">
P2: Using `>=` instead of `==` weakens test precision. The test creates exactly 2 PROCESSING and 1 COMPLETED records with unique keys, so the counts should be exact. If test isolation is a concern (shared Redis state), consider using a unique `key_prefix` per test instead of weakening assertions, which could hide bugs where records are over-counted.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/tests/integration/notifications/test_notification_sse.py (1)

37-51: Add await sub.close() after assertions to clean up the Redis subscription.

The SSERedisSubscription class has a close() method that unsubscribes from the channel and closes the PubSub connection. The test should call await sub.close() after the assertions to properly release the Redis subscription and prevent connection leaks.

backend/tests/integration/test_events_routes.py (1)

63-72: Inconsistent metadata access pattern.

Line 64 accesses event.metadata.user_id as an attribute (implying metadata is an object), but lines 69-70 assert isinstance(event.metadata, dict). If metadata is a Pydantic model with a user_id attribute, the isinstance(..., dict) check will fail.

Either:

  1. Access metadata fields via dict notation: event.metadata["user_id"]
  2. Or update the type check to match the actual schema type
Suggested fix if metadata is a Pydantic model
-                if event.metadata:
-                    assert isinstance(event.metadata, dict)
+                # metadata is validated via Pydantic schema; no dict check needed
🤖 Fix all issues with AI agents
In @backend/tests/integration/app/test_main_app.py:
- Around line 25-27: The test's CORS preflight assertion is too permissive
because it allows a plain 200 without CORS headers; update the assertion so the
response MUST include the Access-Control-Allow-Origin header (check that
"access-control-allow-origin" is in resp.headers) and also verify its value is
either the request Origin ("http://test.com") or "*" and that resp.status_code
== 200; locate the call to client.options("/api/v1/health", headers={"Origin":
"http://test.com"}) and change the assertion on resp to require the header
presence and correct value rather than using an OR with the status code.
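Concretely, the tightened test could look like this (a sketch; the `client` fixture and route come from the prompt):

```python
import httpx
import pytest


@pytest.mark.asyncio
async def test_cors_preflight(client: httpx.AsyncClient) -> None:
    resp = await client.options("/api/v1/health", headers={"Origin": "http://test.com"})
    assert resp.status_code == 200
    allow_origin = resp.headers.get("access-control-allow-origin")
    # The header must be present and match the request origin or the wildcard.
    assert allow_origin in ("http://test.com", "*")
```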

In @backend/tests/integration/idempotency/test_consumer_idempotent.py:
- Around line 73-77: The test only asserts seen["n"] >= 1, which doesn't confirm
duplicates were blocked; update the assertion in the backoff-wrapped _wait_one
(and/or add a short await asyncio.sleep after it) to verify exactly one
processing occurred by asserting seen["n"] == 1; specifically, locate the async
def _wait_one() that currently asserts seen["n"] >= 1 and change it to assert
seen["n"] == 1 (or keep the backoff check for >=1 then immediately await a small
delay and add a final assert seen["n"] == 1) to ensure duplicate events were not
processed.

In @backend/tests/unit/core/test_csrf.py:
- Around line 30-33: The test test_csrf_missing_header_raises_when_authenticated
expects the wrong exception type; change the assertion to expect
CSRFValidationError instead of ValueError so it matches the behavior of
validate_csrf_token; update the pytest.raises() in
test_csrf_missing_header_raises_when_authenticated to use CSRFValidationError to
align with the implementation.

In @backend/tests/unit/services/saga/test_execution_saga_steps.py:
- Around line 174-179: The assertion for QueueExecutionStep().get_compensation()
is effectively disabled by the trailing "or True"; remove the "or True" and
replace the no-op with a meaningful check: either assert
isinstance(QueueExecutionStep().get_compensation(), DeletePodCompensation) (or
the actual expected compensation
class) to validate the type, or if the line is purely for coverage assert that
the result is not None (e.g., assert QueueExecutionStep().get_compensation() is
not None). Locate the assertion using the symbols QueueExecutionStep,
get_compensation, and DeletePodCompensation and update accordingly.

In @backend/tests/unit/services/saga/test_saga_orchestrator_unit.py:
- Around line 58-59: The test double's get_compensation currently returns None
and is typed -> None, violating the SagaStep contract; change get_compensation
to return a minimal CompensationStep stub instance (and update the return type
annotation to -> CompensationStep) so tests conform to the interface; ensure you
import or reference the real CompensationStep class used in production and
construct a no-op stub (e.g., a CompensationStep with a do/execute that is a
no-op) and return that from get_compensation in the test double.
🧹 Nitpick comments (37)
backend/tests/integration/test_notifications_routes.py (1)

78-83: Consider reverting to inline iteration for cleaner code.

The intermediate statuses list doesn't add value here. The previous inline iteration pattern is more idiomatic Python and equally clear.

♻️ Proposed simplification
-        statuses = [
-            NotificationStatus.READ.value,
-            NotificationStatus.DELIVERED.value,
-            NotificationStatus.SKIPPED.value,
-        ]
-        for status in statuses:
+        for status in [
+            NotificationStatus.READ.value,
+            NotificationStatus.DELIVERED.value,
+            NotificationStatus.SKIPPED.value,
+        ]:
             response = await client.get(f"/api/v1/notifications?status={status}&limit=5")
backend/tests/load/strategies.py (1)

50-51: Consider consolidating identical strategies.

Both labels and annot_strat are identical in implementation. While they serve different semantic purposes in the schema, consolidating them could reduce duplication. Additionally, the naming is inconsistent—annot_strat is abbreviated while labels is explicit.

♻️ Optional refactor to consolidate and improve naming
-labels: st.SearchStrategy[dict[str, str]] = st.dictionaries(label_key, label_val, max_size=8)
-annot_strat: st.SearchStrategy[dict[str, str]] = st.dictionaries(label_key, label_val, max_size=8)
+# Shared strategy for both labels and annotations (identical structure)
+string_dict_strat: st.SearchStrategy[dict[str, str]] = st.dictionaries(label_key, label_val, max_size=8)
+labels: st.SearchStrategy[dict[str, str]] = string_dict_strat
+annotations: st.SearchStrategy[dict[str, str]] = string_dict_strat

Then update usages:

     "annotations": annot_strat,
+    "annotations": annotations,
     "commonAnnotations": annot_strat,
+    "commonAnnotations": annotations,

Alternatively, if you prefer to keep them separate for potential future divergence, consider using a more explicit name like annotations instead of annot_strat for consistency with labels.

backend/tests/integration/services/saved_script/test_saved_script_service.py (1)

27-28: Consider a more explicit assertion (optional).

The any() check verifies the created script is present, which is appropriate for this CRUD test. If you want to be more thorough, you could verify the count or filter the list to ensure no duplicates, but the current assertion is sufficient for the test's purpose.

backend/tests/unit/services/coordinator/test_queue_manager.py (1)

12-13: Consider using ExecutionRequestedEvent instead of Any for better type safety.

The helper function returns make_execution_requested_event(...), which has a concrete return type of ExecutionRequestedEvent. Using Any loses this type information and reduces the benefits of static type checking.

🔧 Suggested refactor
-def ev(execution_id: str, priority: int = QueuePriority.NORMAL.value) -> Any:
+def ev(execution_id: str, priority: int = QueuePriority.NORMAL.value) -> ExecutionRequestedEvent:
     return make_execution_requested_event(execution_id=execution_id, priority=priority)

You'll also need to import ExecutionRequestedEvent:

from app.domain.events import ExecutionRequestedEvent
backend/tests/load/config.py (1)

24-26: Consider extracting boolean parsing logic.

Similar to the _get_mode() refactor, the repeated pattern for parsing boolean environment variables could be extracted into a helper function for better maintainability:

♻️ Proposed helper function
def _get_bool_env(key: str, default: str = "true") -> bool:
    """Parse boolean from env var, accepting '1', 'true', 'yes' (case-insensitive)."""
    return os.getenv(key, default).lower() in ("1", "true", "yes")

Then apply to the fields:

-    verify_tls: bool = field(
-        default_factory=lambda: os.getenv("LOAD_VERIFY_TLS", "false").lower() in ("1", "true", "yes")
-    )
+    verify_tls: bool = field(default_factory=lambda: _get_bool_env("LOAD_VERIFY_TLS", "false"))

And similarly for the other boolean fields.

Also applies to: 38-40, 46-57

backend/tests/unit/events/test_schema_registry_manager.py (1)

11-11: Type hints improve test quality.

The explicit type annotations for test_settings and caplog enhance IDE support and make test dependencies clear.

Note: The caplog fixture is declared but not used in either test function. If logging assertions aren't planned, you could optionally remove it for clarity.

🧹 Optional cleanup to remove unused fixture

If logging assertions aren't planned, consider removing the unused caplog parameter:

-def test_deserialize_json_execution_requested(test_settings: Settings, caplog: pytest.LogCaptureFixture) -> None:
+def test_deserialize_json_execution_requested(test_settings: Settings) -> None:
-def test_deserialize_json_missing_type_raises(test_settings: Settings, caplog: pytest.LogCaptureFixture) -> None:
+def test_deserialize_json_missing_type_raises(test_settings: Settings) -> None:

Also applies to: 36-36

backend/tests/unit/services/sse/test_kafka_redis_bridge.py (1)

51-57: LGTM!

The type: ignore[arg-type] comments are appropriate for passing test doubles to the bridge constructor. The test fakes serve their purpose well for isolated unit testing.

Optional: Consider using Protocol types

If you want to avoid type ignores in the future, you could define Protocol types for the dependencies and make the test fakes implement them. However, for test code, the current approach is perfectly acceptable.

backend/tests/unit/services/sse/test_sse_service.py (3)

79-81: Consider using a more specific return type.

The method returns self._evt (an asyncio.Event), so the return type could be asyncio.Event instead of Any for better type clarity.

🔍 Suggested refinement
-    async def register_connection(self, execution_id: str, connection_id: str) -> Any:
+    async def register_connection(self, execution_id: str, connection_id: str) -> asyncio.Event:
         self.registered.append((execution_id, connection_id))
         return self._evt

112-116: Redundant type annotation on local variable.

The explicit type annotation on result (line 115) is redundant since the function's return type annotation already declares dict[str, Any].

♻️ Simplification
 def _decode(evt: dict[str, Any]) -> dict[str, Any]:
     import json
 
-    result: dict[str, Any] = json.loads(evt["data"])
-    return result
+    return json.loads(evt["data"])

124-131: Consistent and explicit test double usage.

The multi-line SSEService instantiation with explicit type: ignore[arg-type] annotations makes the test setup clear and readable. The pattern is consistently applied across all test functions.

For future maintainability, consider extracting a factory helper to reduce duplication:

def _create_test_sse_service(
    repo: _FakeRepo | None = None,
    router: _FakeRouter | None = None,
    bus: _FakeBus | None = None,
    shutdown_manager: _FakeShutdown | None = None,
    settings: _FakeSettings | None = None,
) -> SSEService:
    return SSEService(
        repository=repo or _FakeRepo(),  # type: ignore[arg-type]
        router=router or _FakeRouter(),  # type: ignore[arg-type]
        sse_bus=bus or _FakeBus(),  # type: ignore[arg-type]
        shutdown_manager=shutdown_manager or _FakeShutdown(),  # type: ignore[arg-type]
        settings=settings or _FakeSettings(),  # type: ignore[arg-type]
        logger=_test_logger,
    )

This would reduce repetition while maintaining the explicit typing.

Also applies to: 168-175, 198-205, 240-247

backend/tests/unit/events/test_event_dispatcher.py (1)

15-16: Consider inlining the wrapper for directness.

While the consistent naming pattern is clear, make_requested_event() is a thin wrapper around make_execution_requested_event(execution_id="e1"). Calling the helper directly in the test (line 64) would eliminate a layer of indirection.

♻️ Optional refactor to eliminate wrapper

Remove the wrapper and call the helper directly in the test:

-def make_requested_event() -> ExecutionRequestedEvent:
-    return make_execution_requested_event(execution_id="e1")
-
-

Then at line 64:

-    await disp.dispatch(make_requested_event())
+    await disp.dispatch(make_execution_requested_event(execution_id="e1"))
backend/tests/integration/services/events/test_event_bus.py (1)

22-26: LGTM: Backoff-based polling is a solid improvement over eventuality-style waiting.

The backoff configuration (50ms polling interval, 2s max timeout) is appropriate for an integration test. The decorator will retry on AssertionError until the event is received or the timeout is reached.

Minor observation: _wait_received() is declared as async def but doesn't contain any await statements. While technically valid and consistent with the async test style, it could be a synchronous function. This is a very minor code smell but acceptable in this context.

backend/tests/unit/services/saga/test_execution_saga_steps.py (1)

187-192: Consider multi-line formatting for __init__ methods.

Single-line method definitions with pass work but are unconventional. For consistency with PEP8:

♻️ Suggested formatting
     class DummyProd(UnifiedProducer):
-        def __init__(self) -> None: pass
+        def __init__(self) -> None:
+            pass

     class DummyAlloc(ResourceAllocationRepository):
-        def __init__(self) -> None: pass
+        def __init__(self) -> None:
+            pass
backend/tests/unit/services/pod_monitor/test_event_mapper.py (3)

23-30: Consider removing or using the unused event_type parameter.

The event_type parameter is declared but never used within the function body, requiring a noqa: ARG001 suppression. Consider either:

  • Removing the parameter if it's not needed, or
  • Passing it to PodContext if it was intended to be used
♻️ Option 1: Remove the unused parameter
-def _ctx(pod: Pod, event_type: str = "ADDED") -> PodContext:  # noqa: ARG001
+def _ctx(pod: Pod) -> PodContext:
     return PodContext(
         pod=pod,
         execution_id="e1",
         metadata=AvroEventMetadata(service_name="t", service_version="1"),
         phase=pod.status.phase or "",
-        event_type=event_type,
+        event_type="ADDED",
     )

122-122: The smoke test is functional but could be more explicit.

The _ctx(p) call validates that context creation doesn't crash, but the intent could be clearer. Consider adding an assertion or a more descriptive comment.

♻️ Make the validation more explicit
-    _ctx(p)  # validate context creation works
+    # Validate context creation doesn't raise exceptions
+    ctx = _ctx(p)
+    assert ctx.execution_id == "L1"

168-168: Consider splitting the compound assertion for clearer failure messages.

The combined assertion makes it harder to identify which condition failed. Splitting into separate assertions improves debugging.

♻️ Split into separate assertions
-    assert logs is not None and logs.exit_code == 3 and logs.stdout == "x"
+    assert logs is not None
+    assert logs.exit_code == 3
+    assert logs.stdout == "x"
backend/tests/unit/services/sse/test_redis_bus.py (2)

48-52: Consider extracting the polling pattern to a helper function.

Both tests implement an identical polling loop with hardcoded retry counts and sleep intervals. Extracting this into a shared helper would improve maintainability and make timing adjustments easier.

♻️ Example refactor
async def _poll_for_message(sub, message_type, max_attempts: int = 20, delay: float = 0.05):
    """Poll subscription until message arrives or max attempts reached."""
    for _ in range(max_attempts):
        msg = await sub.get(message_type)
        if msg:
            return msg
        await asyncio.sleep(delay)
    return None

Then use it in tests:

# In test_publish_and_subscribe_round_trip
msg = await _poll_for_message(sub, RedisSSEMessage)

# In test_notifications_channels  
got = await _poll_for_message(nsub, RedisNotificationMessage)

Also applies to: 92-97


55-56: Consider more comprehensive assertions for message content.

The tests only verify the message IDs but don't check other fields like exit_code, stdout for ExecutionCompletedEvent, or severity, status, subject for RedisNotificationMessage. Adding assertions for these fields would strengthen the tests and catch serialization/deserialization regressions.

Also applies to: 100-101

backend/tests/unit/services/pod_monitor/test_monitor.py (1)

211-211: Consider broadening config type to accept float.

The type: ignore[assignment] comments suggest reconcile_interval_seconds expects an int but tests set it to 0.01 (float) for faster execution. If fractional seconds are valid in production, consider updating the config field type to int | float to avoid these suppressions.

Also applies to: 793-793

backend/tests/integration/result_processor/test_result_processor.py (2)

69-69: Consider using _ prefix or a more specific event type.

Since the event parameter is unused in the handler (it only sets a flag), consider either:

  1. Using _event: object or just _ to signal it's intentionally ignored, or
  2. Using the specific event type for RESULT_STORED if available for better type safety.

Using Any is permissive and bypasses type checking benefits.


108-114: LGTM! More explicit polling mechanism.

The backoff-based approach is clearer and more configurable than the previous eventually helper. The configuration (12 seconds max, 0.2s intervals) is reasonable for integration tests.

Optional: Consider extracting the timeout values as test constants if they're reused across multiple tests.

backend/tests/unit/services/sse/test_sse_shutdown_manager.py (2)

13-18: Rename stop() to aclose() for interface consistency.

The _FakeRouter defines a stop() method, but the actual SSEKafkaRedisBridge interface uses aclose() (as shown in the _force_close_connections method which calls await self._router.aclose()). File test_shutdown_manager.py correctly uses aclose() in its DummyRouter.

🔄 Proposed fix to align with the actual interface
 class _FakeRouter:
     def __init__(self) -> None:
         self.stopped = False
 
-    async def stop(self) -> None:
+    async def aclose(self) -> None:
         self.stopped = True

35-39: Consider extracting the backoff pattern into a reusable test helper.

The backoff-decorated assertion pattern is repeated in multiple tests. While functional, extracting this into a parameterized helper (e.g., async def wait_for_condition(condition_fn, max_time)) would reduce duplication and improve readability.

backend/tests/e2e/conftest.py (1)

12-12: Consider using more precise return type for async fixture.

The -> Any return type is functional but less precise than it could be. Since this fixture yields without a value, consider using -> AsyncGenerator[None, None] to match the pattern used in integration tests.

♻️ Suggested refinement
+from collections.abc import AsyncGenerator
+
-async def _cleanup(db: Database, redis_client: redis.Redis) -> Any:
+async def _cleanup(db: Database, redis_client: redis.Redis) -> AsyncGenerator[None, None]:
backend/tests/integration/services/idempotency/test_redis_repository.py (2)

100-106: Simplify TTL assertion.

The assertion ttl == sample_record.ttl_seconds or ttl > 0 is always true if ttl > 0 (which it must be for a valid key with expiry). Consider a more precise assertion.

♻️ Suggested improvement
-    ttl = await redis_client.ttl(key)
-    assert ttl == sample_record.ttl_seconds or ttl > 0
+    ttl = await redis_client.ttl(key)
+    assert 0 < ttl <= sample_record.ttl_seconds

118-120: Simplify TTL comparison assertion.

ttl_after == ttl or ttl_after <= ttl is logically equivalent to ttl_after <= ttl.

♻️ Suggested simplification
-    ttl_after = await redis_client.ttl(key)
-    assert ttl_after == ttl or ttl_after <= ttl  # ttl should not increase
+    ttl_after = await redis_client.ttl(key)
+    assert ttl_after <= ttl  # TTL should not increase
backend/tests/integration/test_admin_routes.py (1)

82-85: Dead comment referencing unused variable.

The comment mentions original_settings being preserved for rollback verification, but the variable from the previous line's response is never used. Either use it for verification or remove both the assignment and comment.

♻️ Suggested fix
         # Get original settings
         original_response = await client.get("/api/v1/admin/settings/")
         assert original_response.status_code == 200
-        # original_settings preserved for potential rollback verification
+        original_settings = original_response.json()

Then use original_settings in the reset verification, or simply remove the comment if not needed.

backend/tests/conftest.py (1)

138-145: Redundant intermediate variable in HTTP login helper.

The return type is already annotated in the signature, so the intermediate token: str variable is redundant; the parsed value can be returned directly.

♻️ Suggested simplification
 async def _http_login(client: httpx.AsyncClient, username: str, password: str) -> str:
     data = {"username": username, "password": password}
     resp = await client.post("/api/v1/auth/login", data=data)
     resp.raise_for_status()
-    token: str = resp.json().get("csrf_token", "")
-    return token
+    return resp.json().get("csrf_token", "")

backend/tests/integration/test_dlq_routes.py (1)

269-269: Unnecessary string cast.

The str() cast is unnecessary since policy_data["topic"] is already defined as a string literal on line 254. The original value can be used directly in the assertion.

♻️ Simplify the assertion
-        assert str(policy_data["topic"]) in result.message
+        assert policy_data["topic"] in result.message
backend/tests/unit/services/idempotency/test_middleware.py (1)

118-122: Consider removing empty test classes or adding a TODO.

These empty test classes appear to be placeholders. Given the comment on line 115 stating coverage is "by integration tests," consider either removing these stubs or adding explicit # TODO comments to clarify intent.

backend/tests/integration/services/rate_limit/test_rate_limit_service.py (1)

218-234: Test lacks prefix isolation.

This test writes to a key f"{svc.prefix}tb:user:/api" without using unique_id for prefix isolation. If run in parallel with other tests, there's potential for key collisions.

Suggested fix
 @pytest.mark.asyncio
-async def test_token_bucket_invalid_data(scope: AsyncContainer) -> None:
+async def test_token_bucket_invalid_data(scope: AsyncContainer, unique_id: Callable[[str], str]) -> None:
     svc: RateLimitService = await scope.get(RateLimitService)
-    key = f"{svc.prefix}tb:user:/api"
+    svc.prefix = f"{svc.prefix}{unique_id('')}:"
+    key = f"{svc.prefix}tb:user:/api"
backend/tests/integration/idempotency/test_decorator_idempotent.py (2)

27-31: Hardcoded execution_id may cause test flakiness in parallel runs.

The test uses a hardcoded execution_id="exec-deco-1" which interacts with Redis-based idempotency storage. If tests run in parallel or are retried, the idempotency key from a previous run could still exist, causing false negatives.

Consider using unique_id fixture for isolation, consistent with other tests in this PR.

Suggested fix
 @pytest.mark.asyncio
-async def test_decorator_blocks_duplicate_event(scope: AsyncContainer) -> None:
+async def test_decorator_blocks_duplicate_event(scope: AsyncContainer, unique_id: Callable[[str], str]) -> None:
     idm: IdempotencyManager = await scope.get(IdempotencyManager)

     calls = {"n": 0}

     @idempotent_handler(idempotency_manager=idm, key_strategy="event_based", logger=_test_logger)
     async def h(ev: Any) -> None:
         calls["n"] += 1

-    ev = make_execution_requested_event(execution_id="exec-deco-1")
+    ev = make_execution_requested_event(execution_id=unique_id("exec-deco-"))

     await h(ev)
     await h(ev)  # duplicate
     assert calls["n"] == 1

Add the import at the top:

from collections.abc import Callable

47-51: Same isolation concern for second test.

The test_decorator_custom_key_blocks test uses hardcoded execution IDs and a fixed custom key "fixed-key". The fixed key is particularly problematic for parallel/repeated test runs.

Suggested fix
 @pytest.mark.asyncio
-async def test_decorator_custom_key_blocks(scope: AsyncContainer) -> None:
+async def test_decorator_custom_key_blocks(scope: AsyncContainer, unique_id: Callable[[str], str]) -> None:
     idm: IdempotencyManager = await scope.get(IdempotencyManager)

     calls = {"n": 0}
+    fixed_key_value = unique_id("fixed-key-")

     def fixed_key(_ev: Any) -> str:
-        return "fixed-key"
+        return fixed_key_value
backend/tests/integration/test_auth_routes.py (1)

125-127: Consider clarifying the comment.

The inline comment on line 126 appears truncated. Consider completing it to explain why duplicate email registration might be allowed in some configurations.

Suggested clarification
         # Backend might allow duplicate emails but not duplicate usernames
-        # If it allows the registration, that's also valid behavior
+        # If it allows the registration, that's also valid behavior for this backend
         assert duplicate_response.status_code in [200, 201, 400, 409]
backend/tests/load/plot_report.py (1)

13-14: Consider returning directly for simplicity.

The intermediate variable with explicit type annotation doesn't add value since the function already declares its return type. The original direct return was cleaner.

♻️ Simplify by returning directly
-        result: Dict[str, Any] = json.load(f)
-        return result
+        return json.load(f)
backend/tests/unit/core/test_csrf.py (1)

6-15: LGTM! Cookie handling implementation looks correct.

The multiline function signature improves readability, and the cookie handling logic properly constructs the HTTP cookie header using the standard "; " separator format. This enhancement allows tests to simulate cookie-based authentication correctly.

For production code (if this pattern is reused outside tests), consider URL-encoding cookie values to handle special characters, though for current test scenarios the simple string concatenation is sufficient.

backend/pyproject.toml (1)

137-137: Consider using a more constrained version specifier for fakeredis.

The >=2.33.0 specifier allows any future version, which could introduce breaking changes or unexpected behavior. Consider using ~=2.33.0 (compatible release) to limit updates to patch versions within 2.33.x only, improving reproducibility and stability.

@cubic-dev-ai bot left a comment

3 issues found across 40 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/tests/integration/test_notifications_routes.py">

<violation number="1" location="backend/tests/integration/test_notifications_routes.py:346">
P1: Test logic flaw: After `make_user(UserRole.ADMIN)`, the client is logged in as ADMIN, not USER. The first `client.get("/api/v1/notifications")` fetches admin's notifications, not user's. Both fetches return the same user's data, so this test doesn't actually verify user isolation. Capture the USER credentials and explicitly login before fetching user notifications.</violation>
</file>

<file name="backend/tests/integration/test_saga_routes.py">

<violation number="1" location="backend/tests/integration/test_saga_routes.py:167">
P1: Bug: `make_user` logs in users, so calling it twice means user2 is logged in when fetching "user1's" sagas. The first user's sagas are never actually retrieved - both `response1` and `response2` reflect user2's session, defeating the access control test purpose. Store user1's data and fetch their sagas before creating user2, or logout/login explicitly between users.</violation>
</file>

<file name="backend/tests/conftest.py">

<violation number="1" location="backend/tests/conftest.py:187">
P2: This fixture mutates the injected `client` fixture's headers rather than creating a copy. If a test uses both `client` and `authenticated_client` parameters, they will be the same object instance - `client` will have auth headers applied unexpectedly. Consider creating a new client instance or documenting this behavior.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
backend/tests/integration/test_replay_routes.py (3)

297-307: Align payload structure with ReplayRequest model.

The raw dict payload uses field names (filters, target_topic, preserve_timing) that don't match the ReplayRequest model structure used in other tests (lines 34-44). This inconsistency will likely trigger 422 validation errors and prevent the test from properly exercising state transitions.

♻️ Refactor to use ReplayRequest model consistently
-        replay_request = {
-            "name": f"State Test Session {unique_id('')}",
-            "description": "Testing state transitions",
-            "filters": {
-                "event_types": ["state.test.event"],
-                "start_time": (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat(),
-                "end_time": datetime.now(timezone.utc).isoformat()
-            },
-            "target_topic": "state-test-topic",
-            "speed_multiplier": 1.0
-        }
+        replay_request = ReplayRequest(
+            replay_type=ReplayType.QUERY,
+            target=ReplayTarget.KAFKA,
+            filter=ReplayFilter(
+                event_types=[EventType.SYSTEM_ERROR],
+                start_time=datetime.now(timezone.utc) - timedelta(hours=1),
+                end_time=datetime.now(timezone.utc),
+            ),
+            speed_multiplier=1.0,
+        ).model_dump(mode="json")

336-357: Align complex filter payload with ReplayRequest and ReplayFilter models.

This test uses a raw dict with incorrect field names (filters, target_topic, preserve_timing, batch_size) that don't align with the ReplayRequest model. The filter fields like aggregate_id, correlation_id, and user_id should be nested within ReplayFilter as described in the PR summary.

♻️ Refactor to use ReplayRequest with ReplayFilter
-        replay_request = {
-            "name": f"Complex Filter Session {unique_id('')}",
-            "description": "Testing complex event filters",
-            "filters": {
-                "event_types": [
-                    "execution.requested",
-                    "execution.started",
-                    "execution.completed",
-                    "execution.failed"
-                ],
-                "start_time": (datetime.now(timezone.utc) - timedelta(days=30)).isoformat(),
-                "end_time": datetime.now(timezone.utc).isoformat(),
-                "aggregate_id": unique_id("aggregate-"),
-                "correlation_id": unique_id("corr-"),
-                "user_id": unique_id("user-"),
-                "service_name": "execution-service"
-            },
-            "target_topic": "complex-filter-topic",
-            "speed_multiplier": 0.1,  # Slow replay
-            "preserve_timing": False,
-            "batch_size": 100
-        }
+        replay_request = ReplayRequest(
+            replay_type=ReplayType.QUERY,
+            target=ReplayTarget.KAFKA,
+            filter=ReplayFilter(
+                event_types=[
+                    EventType.EXECUTION_REQUESTED,
+                    EventType.EXECUTION_STARTED,
+                    EventType.EXECUTION_COMPLETED,
+                    EventType.EXECUTION_FAILED,
+                ],
+                start_time=datetime.now(timezone.utc) - timedelta(days=30),
+                end_time=datetime.now(timezone.utc),
+                aggregate_id=unique_id("aggregate-"),
+                correlation_id=unique_id("corr-"),
+            ),
+            speed_multiplier=0.1,
+        ).model_dump(mode="json")

Note: Remove user_id, service_name, preserve_timing, and batch_size if they're not supported in the current ReplayFilter or ReplayRequest models.


376-386: Align payload structure with ReplayRequest model.

The raw dict payload uses incorrect field names (filters, target_topic) that don't match the ReplayRequest model used consistently in other tests.

♻️ Refactor to use ReplayRequest model
-        replay_request = {
-            "name": f"Progress Test Session {unique_id('')}",
-            "description": "Testing progress tracking",
-            "filters": {
-                "event_types": ["progress.test.event"],
-                "start_time": (datetime.now(timezone.utc) - timedelta(minutes=30)).isoformat(),
-                "end_time": datetime.now(timezone.utc).isoformat()
-            },
-            "target_topic": "progress-test-topic",
-            "speed_multiplier": 10.0  # Fast replay
-        }
+        replay_request = ReplayRequest(
+            replay_type=ReplayType.QUERY,
+            target=ReplayTarget.KAFKA,
+            filter=ReplayFilter(
+                event_types=[EventType.SYSTEM_ERROR],
+                start_time=datetime.now(timezone.utc) - timedelta(minutes=30),
+                end_time=datetime.now(timezone.utc),
+            ),
+            speed_multiplier=10.0,
+        ).model_dump(mode="json")
backend/tests/e2e/test_execution_routes.py (1)

505-506: Remove duplicate assertion.

Line 506 duplicates line 505. This appears to be a copy-paste error.

🧹 Proposed fix
 # Use idempotency header on both requests to guarantee keying
 r1 = await client.post("/api/v1/execute", json=execution_request, headers=headers)
 assert r1.status_code == 200
-assert r1.status_code == 200
 e1 = r1.json()["execution_id"]
🤖 Fix all issues with AI agents
In @backend/tests/e2e/test_execution_routes.py:
- Around line 176-184: The assertion compares the raw string result["status"] against
the enum instance ExecutionStatusEnum.COMPLETED, so the types never match. Convert
one side: either compare result["status"] to ExecutionStatusEnum.COMPLETED.value, or
construct the enum from the string (ExecutionStatusEnum(result["status"]) ==
ExecutionStatusEnum.COMPLETED). See the poll_until_terminal result handling and the
assertion using ExecutionStatusEnum.
- Around line 141-144: The test compares result["status"] (a string) against
ExecutionStatusEnum members (enum instances), so the assertion always fails. Compare
like types: either assert result["status"] in (ExecutionStatusEnum.FAILED.value,
ExecutionStatusEnum.ERROR.value), or lift the string into the enum first (assert
ExecutionStatusEnum(result["status"]) in (ExecutionStatusEnum.FAILED,
ExecutionStatusEnum.ERROR)). Adjust the assertion around the poll_until_terminal
result accordingly; both options are sketched after this list.
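
A minimal sketch of both options; the member names and values here are illustrative assumptions, the real enum lives in app/domain/enums/execution.py:

from enum import Enum

class ExecutionStatusEnum(Enum):  # stand-in for the app's ExecutionStatus
    COMPLETED = "completed"
    FAILED = "failed"
    ERROR = "error"

result = {"status": "failed"}  # shape returned by poll_until_terminal

# Option 1: compare string to string via .value
assert result["status"] in (ExecutionStatusEnum.FAILED.value, ExecutionStatusEnum.ERROR.value)

# Option 2: lift the string into the enum, then compare enums
assert ExecutionStatusEnum(result["status"]) in (ExecutionStatusEnum.FAILED, ExecutionStatusEnum.ERROR)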

In @backend/tests/integration/test_events_routes.py:
- Around line 456-487: test_events_isolation_between_users assumes the AsyncClient is
authenticated for each user but never sets the client's auth headers. After creating
the first user via make_user(UserRole.USER), update the client's headers (e.g.,
client.headers.update(user["headers"])) before calling
client.get("/api/v1/events/user?limit=10"). After creating/logging in the admin,
likewise switch the client to the admin's auth (update client.headers with
admin["headers"], or rely on the login POST to set cookies and then refresh the
headers) so each GET uses the correct user's credentials. A sketch follows this list.
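
A sketch of the intended flow, assuming make_user returns a dict exposing ready-to-use "headers" (as the conftest fixtures in this PR suggest) and that the UserRole/MakeUser import paths match the suite:

from httpx import AsyncClient

from app.domain.enums import UserRole      # import path assumed
from tests.conftest import MakeUser        # type alias mentioned in this review

async def test_events_isolation_between_users(client: AsyncClient, make_user: MakeUser) -> None:
    # Authenticate the shared client as the regular user before the first fetch
    user = await make_user(UserRole.USER)
    client.headers.update(user["headers"])
    user_events = await client.get("/api/v1/events/user?limit=10")
    assert user_events.status_code == 200

    # Re-authenticate the same client as the admin before the second fetch
    admin = await make_user(UserRole.ADMIN)
    client.headers.update(admin["headers"])
    admin_events = await client.get("/api/v1/events/user?limit=10")
    assert admin_events.status_code == 200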

In @backend/tests/integration/test_notifications_routes.py:
- Around line 141-149: Remove the redundant GET to
"/api/v1/notifications/unread-count": keep a single call (reuse the existing
count_response variable), assert its status_code is 200, then parse
count_response.json() and assert count_data["unread_count"] == 0. Delete the earlier
unused unread_response variable and its status assertion so only one network request
is made.
- Around line 342-371: The test fetches the first user's notifications after creating
the admin, and since make_user updates client.headers, both lists end up coming from
the admin. Preserve or fetch the first user's notifications before switching users:
either (A) move the user-notifications GET (the request that populates
user_notification_ids) to immediately after the first await make_user(UserRole.USER)
and before await make_user(UserRole.ADMIN), or (B) save a copy of client.headers
(e.g., user_headers = dict(client.headers)) right after creating the first user, then
restore client.headers = user_headers for the GET that builds user_notification_ids
after the admin is created. Either way, ensure the admin_notifications_response GET
uses the admin-authenticated headers so the two lists come from distinct users.
Option (B) is sketched after this list.
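
Option (B) sketched, assuming the listing endpoint and response field names ("notifications", "notification_id") match the suite:

async def test_notifications_isolation(client: AsyncClient, make_user: MakeUser) -> None:
    # Snapshot each user's auth headers as they are created
    await make_user(UserRole.USER)
    user_headers = dict(client.headers)     # first user's credentials

    await make_user(UserRole.ADMIN)         # mutates client.headers to the admin
    admin_headers = dict(client.headers)

    # Build the first user's list under their own credentials
    client.headers = user_headers
    resp = await client.get("/api/v1/notifications/")
    user_notification_ids = {n["notification_id"] for n in resp.json()["notifications"]}

    # Build the admin's list under the admin's credentials
    client.headers = admin_headers
    resp = await client.get("/api/v1/notifications/")
    admin_notification_ids = {n["notification_id"] for n in resp.json()["notifications"]}

    assert user_notification_ids.isdisjoint(admin_notification_ids)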

In @backend/tests/integration/test_saga_routes.py:
- Around line 163-190: The test logs in user2 immediately after creating user1, so
the first GET (client.get("/api/v1/sagas/")) actually runs under user2's session. Fix
the ordering: create user1 with make_user(UserRole.USER) and capture its credentials,
fetch user1_sagas via client.get("/api/v1/sagas/"), then log out, log in as user2 via
client.post("/api/v1/auth/login", data=...), and fetch user2_sagas with
client.get("/api/v1/sagas/"). Validate that SagaListResponse(**response.json()) for
each user contains only that user's sagas, and use make_user in a way that returns
username/password for manual login rather than auto-switching the client session.

In @backend/tests/integration/test_saved_scripts_routes.py:
- Around line 244-270: The test logs users in via POST to "/api/v1/auth/login" but
never extracts the CSRF token and sets it on client.headers, so subsequent POSTs such
as client.post("/api/v1/scripts", ...) may be rejected. After each login, capture the
csrf token from the login response JSON (as the authenticated_client fixture does)
and set client.headers["X-CSRF-Token"] = csrf, or replace the explicit login flow
with the authenticated_client / authenticated_admin_client fixtures, so the CSRF
header is present before create_response = await client.post("/api/v1/scripts",
json=user_script_data). A sketch follows this list.
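
A sketch of the login-plus-CSRF flow inside the async test body; the csrf_token field name follows the fixture code quoted later in this review:

login_resp = await client.post(
    "/api/v1/auth/login",
    data={"username": user["username"], "password": user["password"]},
)
assert login_resp.status_code == 200
csrf = login_resp.json().get("csrf_token")   # field name as used elsewhere in this suite
if csrf:
    client.headers["X-CSRF-Token"] = csrf

create_response = await client.post("/api/v1/scripts", json=user_script_data)
assert create_response.status_code in (200, 201)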

In @backend/tests/unit/services/sse/test_redis_bus.py:
- Around line 61-64: Remove the fragile sleep and make the invalid-JSON check
deterministic: either call sub.get(RedisSSEMessage) directly (if sub.get is
non-blocking), or replace the sleep-plus-assert with a timeout-based helper like
_wait_for_message that polls for a message up to a short timeout, and assert it
returns None. Update the test around redis.publish("sse:exec:exec-1", "not-json")
accordingly so the assertion no longer depends on asyncio.sleep(0.05). A sketch
follows this list.
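
A sketch of the timeout-helper variant; _get_or_none is a hypothetical helper built on asyncio.timeout (Python 3.11+):

import asyncio
from typing import Any

async def _get_or_none(sub: Any, model: type, timeout: float = 0.2) -> Any | None:
    # Keep polling the subscription until a parsed message arrives,
    # or return None once the overall timeout elapses.
    try:
        async with asyncio.timeout(timeout):
            while True:
                msg = await sub.get(model, timeout=0.05)
                if msg is not None:
                    return msg
                await asyncio.sleep(0.01)
    except TimeoutError:
        return None

# Inside the async test body: invalid JSON must never surface as a parsed message
await redis.publish("sse:exec:exec-1", "not-json")
assert await _get_or_none(sub, RedisSSEMessage) is None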
🧹 Nitpick comments (11)
backend/tests/integration/test_dlq_routes.py (2)

72-72: Optional: Remove redundant enum validation.

Since message is validated as DLQMessageResponse by Pydantic (line 66), and DLQMessageResponse.status is typed as DLQMessageStatus, this assertion is redundant—Pydantic already ensures status is a valid enum member.

♻️ Simplify by removing redundant check
-            assert message.status in DLQMessageStatus.__members__.values()

161-161: Optional: Remove redundant str() cast.

Since policy_data["topic"] is already the string literal "test-topic" (line 146), the str() cast is unnecessary.

♻️ Remove redundant cast
-        assert str(policy_data["topic"]) in result.message
+        assert policy_data["topic"] in result.message
backend/tests/integration/test_user_settings_routes.py (2)

218-244: Verify the restore actually restored to the correct state.

The test calls the restore endpoint and checks for a 200 response, but doesn't verify that settings were actually restored to the restore_point state. After the restore at Line 238, consider asserting that the current theme matches new_theme (the value at the restore point), not just that settings are retrievable.

✨ Suggested enhancement
 # Try to restore to the restore point
 restore_data = {"timestamp": restore_point}
 restore_resp = await authenticated_client.post("/api/v1/user/settings/restore", json=restore_data)
 assert restore_resp.status_code == 200

 # Verify we get valid settings back
 current_resp = await authenticated_client.get("/api/v1/user/settings/")
 assert current_resp.status_code == 200
+
+# Verify theme was restored to the restore point value
+current_settings = UserSettings(**current_resp.json())
+assert current_settings.theme == new_theme, f"Expected theme to be restored to {new_theme}"

269-304: Strengthen the isolation assertion logic.

The assertion at lines 300-303 uses an OR condition that only requires one of the two settings to differ. This is a weak check that could miss partial state leakage if one setting was somehow affected by user1's changes.

🔒 Stronger isolation verification
 # Verify second user's settings are not affected by first user's changes
-assert (
-    user2_settings["theme"] != user1_update["theme"]
-    or user2_settings["timezone"] != user1_update["timezone"]
-)
+# Both settings should differ from user1's updates (user2 has defaults)
+assert user2_settings["theme"] != user1_update["theme"], \
+    "User2's theme should not match user1's custom theme"
+assert user2_settings["timezone"] != user1_update["timezone"], \
+    "User2's timezone should not match user1's custom timezone"

This change ensures both settings are isolated, not just one.

backend/tests/integration/services/idempotency/test_redis_repository.py (2)

102-102: Simplify TTL assertion.

ttl == sample_record.ttl_seconds or ttl > 0 is redundant—if the first condition is true, the second is implied. Consider simplifying to assert 0 < ttl <= sample_record.ttl_seconds to clearly express the intent.

Suggested fix
-    assert ttl == sample_record.ttl_seconds or ttl > 0
+    assert 0 < ttl <= sample_record.ttl_seconds

120-120: Simplify redundant TTL comparison.

ttl_after == ttl or ttl_after <= ttl simplifies to ttl_after <= ttl since equality is already included in <=.

Suggested fix
-    assert ttl_after == ttl or ttl_after <= ttl  # ttl should not increase
+    assert ttl_after <= ttl  # ttl should not increase
backend/tests/unit/services/saga/test_execution_saga_steps.py (2)

22-23: Consider preserving specific return type for better type safety.

The return type Any reduces type checking benefits. If make_execution_requested_event returns a specific event type (e.g., ExecutionRequestedEvent), preserving that type would catch more errors at compile time.

♻️ Optional: Restore specific return type
-def _req(timeout: int = 30, script: str = "print('x')") -> Any:
+def _req(timeout: int = 30, script: str = "print('x')") -> ExecutionRequestedEvent:
     return make_execution_requested_event(execution_id="e1", script=script, timeout_seconds=timeout)

Note: You may need to import ExecutionRequestedEvent if you choose this approach.


188-192: Consider calling super().__init__() in dummy classes.

The dummy classes don't call their parent class constructors. While this works for the current test (which only checks type binding), it could cause issues if the parent classes expect initialization or if these dummies are later used more extensively.

♻️ Defensive initialization for dummy classes
     class DummyProd(UnifiedProducer):
-        def __init__(self) -> None: pass
+        def __init__(self) -> None:
+            super().__init__()

     class DummyAlloc(ResourceAllocationRepository):
-        def __init__(self) -> None: pass
+        def __init__(self) -> None:
+            super().__init__()

Alternatively, if the parent classes have complex initialization requirements that aren't needed for the test, consider using unittest.mock.Mock with spec instead:

from unittest.mock import Mock

prod = Mock(spec=UnifiedProducer)
alloc = Mock(spec=ResourceAllocationRepository)
s.bind_dependencies(producer=prod, alloc_repo=alloc, publish_commands=True)
backend/tests/helpers/polling.py (1)

10-21: Consider adding error handling for unexpected responses.

The function silently retries on non-200 responses and could raise unhandled exceptions:

  1. KeyError if the response JSON lacks a "status" field
  2. ValueError if ExecutionStatus(data["status"]) receives an invalid status string

For a test helper, this may be acceptable (test will fail with a clear error), but you might want to add explicit error handling or logging for easier debugging.

♻️ Optional: Add defensive checks
 async def poll_until_terminal(
     client: AsyncClient, execution_id: str, *, timeout: float = 30.0
 ) -> dict[str, Any]:
     """Poll result endpoint until execution reaches terminal state."""
     async with asyncio.timeout(timeout):
         while True:
             r = await client.get(f"/api/v1/result/{execution_id}")
             if r.status_code == 200:
                 data: dict[str, Any] = r.json()
-                if ExecutionStatus(data["status"]).is_terminal:
-                    return data
+                status_str = data.get("status")
+                if status_str and ExecutionStatus(status_str).is_terminal:
+                    return data
             await asyncio.sleep(0.5)
backend/tests/helpers/protocols.py (2)

15-17: Consider documenting the model parameter.

The docstring doesn't explain the purpose of the model parameter or how it's used (e.g., for deserialization, type validation, etc.). While the type hint shows it expects a type object, the usage pattern isn't immediately clear.

📝 Suggested documentation improvement
-    async def get(self, model: type[Any], timeout: float = 0.5) -> Any | None:
-        """Get the next message from the subscription."""
-        ...
+    async def get(self, model: type[Any], timeout: float = 0.5) -> Any | None:
+        """Get the next message from the subscription.
+        
+        Args:
+            model: Model class to deserialize/validate the message against.
+            timeout: Maximum time to wait for a message in seconds.
+        """
+        ...

134-136: Consider aligning parameter naming with actual implementations.

The protocol defines the parameter as event, but the actual test implementation (shown in relevant snippets) uses event_to_produce. While structural typing doesn't require parameter names to match, consistency aids readability and prevents confusion when developers reference the protocol.

♻️ Optional parameter name alignment
-    async def produce(self, event: Any, key: str | None = None) -> None:
-        """Produce an event."""
+    async def produce(self, event_to_produce: Any, key: str | None = None) -> None:
+        """Produce an event."""
         ...

Alternatively, if event is the preferred naming convention, update the test implementations to match.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ac646f and 6d49185.

📒 Files selected for processing (40)
  • backend/app/api/routes/execution.py
  • backend/app/domain/enums/execution.py
  • backend/tests/conftest.py
  • backend/tests/e2e/test_execution_routes.py
  • backend/tests/helpers/__init__.py
  • backend/tests/helpers/polling.py
  • backend/tests/helpers/protocols.py
  • backend/tests/integration/app/test_main_app.py
  • backend/tests/integration/core/__init__.py
  • backend/tests/integration/db/__init__.py
  • backend/tests/integration/idempotency/test_consumer_idempotent.py
  • backend/tests/integration/services/admin/__init__.py
  • backend/tests/integration/services/coordinator/__init__.py
  • backend/tests/integration/services/coordinator/test_execution_coordinator.py
  • backend/tests/integration/services/events/__init__.py
  • backend/tests/integration/services/execution/__init__.py
  • backend/tests/integration/services/idempotency/__init__.py
  • backend/tests/integration/services/idempotency/test_redis_repository.py
  • backend/tests/integration/services/notifications/__init__.py
  • backend/tests/integration/services/rate_limit/__init__.py
  • backend/tests/integration/services/replay/__init__.py
  • backend/tests/integration/services/saga/__init__.py
  • backend/tests/integration/services/saved_script/__init__.py
  • backend/tests/integration/services/user_settings/__init__.py
  • backend/tests/integration/test_admin_routes.py
  • backend/tests/integration/test_dlq_routes.py
  • backend/tests/integration/test_events_routes.py
  • backend/tests/integration/test_health_routes.py
  • backend/tests/integration/test_notifications_routes.py
  • backend/tests/integration/test_replay_routes.py
  • backend/tests/integration/test_saga_routes.py
  • backend/tests/integration/test_saved_scripts_routes.py
  • backend/tests/integration/test_sse_routes.py
  • backend/tests/integration/test_user_settings_routes.py
  • backend/tests/unit/core/test_csrf.py
  • backend/tests/unit/events/core/__init__.py
  • backend/tests/unit/services/pod_monitor/test_event_mapper.py
  • backend/tests/unit/services/saga/test_execution_saga_steps.py
  • backend/tests/unit/services/saga/test_saga_orchestrator_unit.py
  • backend/tests/unit/services/sse/test_redis_bus.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/tests/unit/services/pod_monitor/test_event_mapper.py
  • backend/tests/unit/core/test_csrf.py
  • backend/tests/unit/services/saga/test_saga_orchestrator_unit.py
🧰 Additional context used
🧬 Code graph analysis (20)
backend/tests/helpers/polling.py (1)
backend/app/domain/enums/execution.py (2)
  • ExecutionStatus (4-25)
  • is_terminal (17-25)
backend/tests/unit/services/sse/test_redis_bus.py (5)
backend/app/domain/execution/models.py (1)
  • ResourceUsageDomain (15-19)
backend/app/infrastructure/kafka/events/execution.py (1)
  • ExecutionCompletedEvent (86-93)
backend/app/infrastructure/kafka/events/metadata.py (1)
  • AvroEventMetadata (9-31)
backend/app/schemas_pydantic/sse.py (2)
  • RedisNotificationMessage (102-112)
  • RedisSSEMessage (63-68)
backend/app/services/sse/redis_bus.py (2)
  • SSERedisBus (44-87)
  • publish_notification (79-81)
backend/app/api/routes/execution.py (1)
backend/app/domain/enums/execution.py (1)
  • is_terminal (17-25)
backend/tests/integration/app/test_main_app.py (2)
backend/tests/conftest.py (2)
  • app (82-92)
  • client (103-111)
backend/tests/helpers/protocols.py (1)
  • get (15-17)
backend/tests/integration/idempotency/test_consumer_idempotent.py (9)
backend/app/events/core/types.py (1)
  • ConsumerConfig (64-102)
backend/app/events/core/dispatcher.py (1)
  • EventDispatcher (15-188)
backend/app/events/core/producer.py (3)
  • UnifiedProducer (30-289)
  • producer (64-65)
  • produce (175-206)
backend/app/events/schema/schema_registry.py (1)
  • SchemaRegistryManager (54-230)
backend/app/services/idempotency/idempotency_manager.py (1)
  • IdempotencyManager (69-315)
backend/app/services/idempotency/middleware.py (2)
  • IdempotentConsumerWrapper (122-279)
  • subscribe_idempotent_handler (180-261)
backend/tests/helpers/events.py (1)
  • make_execution_requested_event (8-50)
backend/tests/integration/conftest.py (1)
  • unique_id (26-39)
backend/tests/integration/idempotency/test_idempotency.py (1)
  • on_duplicate (352-353)
backend/tests/integration/test_dlq_routes.py (3)
backend/tests/conftest.py (2)
  • app (82-92)
  • authenticated_client (184-188)
backend/app/dlq/models.py (1)
  • DLQMessageStatus (10-16)
backend/app/schemas_pydantic/dlq.py (5)
  • DLQBatchRetryResponse (70-78)
  • DLQMessageDetail (95-117)
  • DLQMessageResponse (22-39)
  • DLQMessagesResponse (59-67)
  • DLQStats (10-19)
backend/tests/e2e/test_execution_routes.py (2)
backend/app/domain/enums/execution.py (1)
  • ExecutionStatus (4-25)
backend/tests/helpers/polling.py (1)
  • poll_until_terminal (10-21)
backend/tests/integration/test_notifications_routes.py (1)
backend/tests/conftest.py (4)
  • app (82-92)
  • authenticated_client (184-188)
  • client (103-111)
  • make_user (168-180)
backend/tests/integration/test_sse_routes.py (2)
backend/tests/conftest.py (4)
  • app (82-92)
  • client (103-111)
  • authenticated_client (184-188)
  • scope (121-123)
backend/app/services/sse/sse_service.py (2)
  • SSEService (24-262)
  • create_notification_stream (187-241)
backend/tests/unit/services/saga/test_execution_saga_steps.py (2)
backend/app/services/saga/execution_saga.py (9)
  • AllocateResourcesStep (58-106)
  • CreatePodStep (138-203)
  • DeletePodCompensation (292-329)
  • MonitorExecutionStep (206-232)
  • QueueExecutionStep (109-135)
  • ReleaseResourcesCompensation (238-263)
  • RemoveFromQueueCompensation (266-289)
  • ValidateExecutionStep (18-55)
  • ExecutionSaga (332-367)
backend/app/services/saga/saga_step.py (1)
  • SagaContext (14-70)
backend/tests/integration/test_replay_routes.py (2)
backend/tests/conftest.py (2)
  • authenticated_client (184-188)
  • authenticated_admin_client (192-196)
backend/tests/integration/conftest.py (1)
  • unique_id (26-39)
backend/tests/helpers/__init__.py (2)
backend/tests/helpers/events.py (1)
  • make_execution_requested_event (8-50)
backend/tests/helpers/polling.py (1)
  • poll_until_terminal (10-21)
backend/tests/integration/test_admin_routes.py (2)
backend/tests/conftest.py (2)
  • authenticated_admin_client (192-196)
  • authenticated_client (184-188)
backend/tests/integration/conftest.py (1)
  • unique_id (26-39)
backend/tests/integration/test_health_routes.py (1)
backend/tests/conftest.py (1)
  • authenticated_client (184-188)
backend/tests/integration/test_user_settings_routes.py (1)
backend/tests/conftest.py (4)
  • app (82-92)
  • authenticated_client (184-188)
  • client (103-111)
  • make_user (168-180)
backend/tests/integration/test_saga_routes.py (1)
backend/tests/integration/conftest.py (1)
  • unique_id (26-39)
backend/tests/helpers/protocols.py (3)
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (2)
  • close (43-44)
  • produce (38-39)
backend/app/services/coordinator/queue_manager.py (2)
  • execution_id (30-31)
  • user_id (34-35)
backend/tests/unit/services/pod_monitor/test_event_mapper.py (3)
  • read_namespaced_pod_log (37-40)
  • read_namespaced_pod_log (46-49)
  • read_namespaced_pod_log (55-58)
backend/tests/integration/services/idempotency/test_redis_repository.py (2)
backend/app/services/idempotency/redis_repository.py (4)
  • _iso (13-14)
  • _json_default (17-20)
  • _parse_iso_datetime (23-29)
  • _full_key (44-46)
backend/tests/integration/conftest.py (1)
  • unique_id (26-39)
backend/tests/integration/test_saved_scripts_routes.py (2)
backend/tests/conftest.py (4)
  • app (82-92)
  • authenticated_client (184-188)
  • client (103-111)
  • make_user (168-180)
backend/tests/integration/conftest.py (1)
  • unique_id (26-39)
backend/tests/integration/test_events_routes.py (3)
backend/tests/conftest.py (5)
  • app (82-92)
  • authenticated_client (184-188)
  • authenticated_admin_client (192-196)
  • client (103-111)
  • make_user (168-180)
backend/app/services/coordinator/queue_manager.py (2)
  • user_id (34-35)
  • execution_id (30-31)
backend/app/domain/events/event_models.py (1)
  • correlation_id (60-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Scan Backend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (77)
backend/tests/integration/services/coordinator/test_execution_coordinator.py (2)

1-1: LGTM! Improved type safety and test isolation.

The addition of type annotations and DI container parameters enhances code quality. The AsyncContainer scope and unique_id factory enable better test isolation and prevent cross-test interference, which aligns with the PR's goal of improving test reliability.

Also applies to: 5-5, 13-13


15-16: LGTM! Dynamic ID generation improves test reliability.

The consistent use of dynamically generated execution_id throughout the test flow—from event creation to queue operations and assertions—prevents test interference and Kafka cross-test consumption issues, as intended by this PR.

Also applies to: 21-21, 26-26, 29-29

backend/tests/integration/test_dlq_routes.py (3)

1-14: LGTM: Import refactoring aligns with module restructuring.

The import changes correctly reflect the relocation of DLQMessageStatus from app.schemas_pydantic.dlq to app.dlq, maintaining consistency with the codebase structure.


34-324: LGTM: Excellent refactoring to use authenticated_client fixture.

The consistent use of the authenticated_client fixture across all test methods eliminates repetitive login flows and improves test readability. The tests now focus on business logic validation rather than authentication setup.


40-40: LGTM: Excellent use of Pydantic for response validation.

The tests leverage Pydantic models (DLQStats, DLQMessagesResponse, DLQMessageDetail, etc.) for automatic validation of response structure and types, which is more robust and maintainable than manual assertions.

Also applies to: 55-55, 123-123, 159-159, 185-185, 220-220, 243-243

backend/tests/integration/idempotency/test_consumer_idempotent.py (7)

1-19: LGTM!

The imports are well-organized and appropriate for the test. The use of backoff for retry logic and unique_id fixture for test isolation aligns with the broader test suite patterns.


24-27: LGTM!

The test signature properly uses the scope and unique_id fixtures for dependency injection and test isolation, consistent with the integration test patterns described in conftest.py.


33-40: LGTM!

The separate tracking of processed vs duplicate events is the correct approach for verifying idempotency behavior. The on_duplicate callback signature aligns with the IdempotentConsumerWrapper.subscribe_idempotent_handler API.


42-66: LGTM!

Good test isolation: using unique_id for the consumer group prevents Kafka cross-test consumption issues. Setting enable_for_all_handlers=False is correct to allow explicit handler registration with the on_duplicate callback.


68-73: LGTM!

The handler registration correctly uses subscribe_idempotent_handler with the on_duplicate callback, enabling proper tracking of both processed events and blocked duplicates.


75-79: LGTM!

Producing two identical events before starting the consumer is the correct approach. This ensures both events are queued in Kafka and will be consumed in order, allowing the idempotency mechanism to detect the second as a duplicate.


81-95: Well-structured test assertion logic.

The two-phase approach effectively addresses the idempotency verification:

  1. Backoff waits until both events are handled (processed or detected as duplicate)
  2. Final assertions verify exactly 1 processed and 1 duplicate

This properly validates that the idempotent wrapper blocks duplicates, addressing the concern from the previous review about seen["n"] >= 1 being insufficient.
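
A condensed sketch of the two-phase pattern; processed and duplicates stand in for the test's tracking collections:

import backoff

@backoff.on_exception(backoff.expo, AssertionError, max_time=10)
async def _both_events_handled() -> None:
    # Phase 1: retry until both copies have been consumed (processed or duplicate)
    assert len(processed) + len(duplicates) >= 2

await _both_events_handled()

# Phase 2: exact-count assertions once consumption has settled
assert len(processed) == 1
assert len(duplicates) == 1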

backend/tests/integration/test_user_settings_routes.py (1)

31-195: LGTM: Clean refactoring to authenticated_client fixture.

The transition from manual login to the authenticated_client fixture significantly improves test readability and maintainability. The use of Pydantic models for response validation and separate variables for editor/notifications payloads are good practices.

backend/tests/integration/test_notifications_routes.py (4)

2-13: LGTM!

Import reorganization aligns well with the fixture-based refactoring. Domain enums are correctly sourced from app.domain.enums, and MakeUser type is properly imported for fixture typing.


33-36: LGTM!

Clean migration to fixture-based authentication. Using authenticated_client eliminates boilerplate login code and improves test maintainability.


63-77: LGTM!

Good use of enum values for parameterizing status filters. The assertion correctly validates that all returned notifications match the requested status.


374-385: LGTM!

Good edge-case test for invalid channel validation. Accepting multiple HTTP error codes is pragmatic given that the error could originate from routing or validation layers.

backend/tests/integration/test_replay_routes.py (2)

1-11: LGTM! Import updates align with refactoring.

The addition of Callable, backoff, ReplayFilter, and explicit AsyncClient imports properly support the updated test structure with type hints, polling, and the new nested filter model.


399-410: Excellent use of backoff for progress polling.

The backoff decorator eliminates fixed sleep calls and provides robust polling with automatic retries. The progress validation logic correctly uses the new replayed_events and total_events field names.
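
For illustration, a backoff-decorated waiter in the same spirit; the status endpoint path and session_id variable are assumptions, while the replayed_events/total_events field names follow this review:

import backoff

@backoff.on_exception(backoff.expo, AssertionError, max_time=15)
async def _wait_until_replay_done() -> dict:
    resp = await authenticated_client.get(f"/api/v1/replay/{session_id}/status")  # path assumed
    assert resp.status_code == 200
    body = resp.json()
    assert body["replayed_events"] >= body["total_events"]  # completion: all events replayed
    return body

status = await _wait_until_replay_done()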

backend/tests/integration/test_sse_routes.py (8)

3-14: LGTM! Import updates align with test improvements.

The new imports support the migration to:

  • Deterministic ID generation (Callable for unique_id fixture)
  • Robust event waiting (backoff for retry logic)
  • Proper DI typing (AsyncContainer from dishka)
  • Explicit HTTP client typing (AsyncClient from httpx)

25-33: LGTM! Improved test isolation with unique IDs.

The migration from hard-coded exec_id to unique_id("exec-") prevents test interference and aligns with the broader pattern across integration tests.


39-44: LGTM! Better fixture usage reduces duplication.

Switching to the authenticated_client fixture eliminates manual authentication setup and aligns with the DI pattern used across other tests.


47-101: LGTM! Robust event-driven waiting pattern.

The migration to backoff-based waiting (lines 73-77) with a 5-second timeout is more reliable than the previous approach, especially for integration tests with external dependencies (Redis, Kafka). The use of unique_id for user and notification IDs ensures proper test isolation.

The timeout increase from 2.0s to 5.0s (line 94) is reasonable for integration tests but may indicate CI timing sensitivity.


103-154: LGTM! Consistent improvements with notification stream test.

The same robust patterns applied here: unique IDs for isolation, backoff-based waiting for connection readiness, and extended timeout for CI stability. Good consistency across related tests.


157-166: LGTM! Consistent unique ID usage.

The migration to unique_id for test isolation is correctly applied.


169-180: LGTM! Validates client resilience after cancellation.

This test correctly verifies that the HTTP client remains functional after cancelling an SSE stream task. The immediate cancellation (line 174) without sleep is valid—asyncio task cancellation works at any execution point, and the test specifically exercises client resilience under abrupt cancellation.


183-212: LGTM! Validates concurrent connection handling.

The test correctly verifies that multiple concurrent SSE connections can be established through the service. The "at least 2 of 3 should succeed" assertion (line 211) provides reasonable tolerance for concurrent test execution while still validating the core functionality.

backend/tests/integration/services/idempotency/test_redis_repository.py (7)

19-47: LGTM!

Helper function tests are well-structured with proper type annotations. The combined assertions in test_parse_iso_datetime_variants efficiently cover multiple parsing scenarios.


50-68: LGTM!

Fixtures are well-typed and properly leverage unique_id for test isolation. The sample_record correctly includes all IdempotencyRecord fields.


71-73: LGTM!

Properly tests the idempotent key prefixing behavior of _full_key.


76-91: LGTM!

Roundtrip conversion test exercises all IdempotencyRecord fields. The key/status assertion validates core serialization correctness.


128-134: LGTM!

Correctly verifies the edge case where updating a non-existent record returns 0.


137-163: LGTM! Past review concern addressed.

The test now correctly uses a unique prefix for isolation (unique_id("idemp-agg-")) and exact equality assertions (== 2, == 1). This properly addresses the earlier feedback about test precision and isolation.


166-168: LGTM!

Simple smoke test verifying the health check doesn't raise exceptions.

backend/tests/unit/services/sse/test_redis_bus.py (4)

1-17: LGTM! Well-organized test setup.

The imports are properly organized, and using FakeAsyncRedis for unit testing is the right approach. The pytestmark correctly categorizes these as unit tests.


20-38: LGTM! Well-implemented helper functions.

Both helpers are cleanly implemented. The _wait_for_message function properly uses asyncio.timeout for explicit timeout handling and yields control with asyncio.sleep(0.01) to avoid busy-waiting.


41-60: LGTM! Solid test structure and execution.

The test properly:

  • Sets up isolated test dependencies (FakeAsyncRedis, SSERedisBus)
  • Uses the delayed publish pattern to ensure subscription is ready
  • Leverages the timeout helper for reliable message waiting
  • Validates the round-trip message correctly
  • Cleans up resources properly

Also applies to: 66-67


70-101: LGTM! Clean notification channel test.

The test follows the same solid pattern as the execution test:

  • Proper test isolation with fresh Redis and bus instances
  • Complete RedisNotificationMessage construction with all required fields
  • Consistent use of the delayed publish pattern and timeout helper
  • Clear assertion on the notification ID
  • Proper resource cleanup
backend/tests/integration/app/test_main_app.py (6)

11-16: Type hints and route filtering improvements look good.

The added type hints improve code clarity, and filtering routes by Route type before accessing the path attribute is correct—BaseRoute doesn't expose path, so the isinstance check properly narrows the type and prevents potential AttributeError.


18-19: Middleware count check intentionally simplified.

Replacing specific middleware class checks with a count assertion makes the test less brittle. The new test_middlewares_behavior function compensates by verifying actual middleware behavior via HTTP, which is a more robust approach.


25-32: CORS assertion properly addresses previous review feedback.

The new CORS test correctly verifies middleware behavior by:

  • Sending a proper preflight OPTIONS request with Origin and Access-Control-Request-Method headers
  • Checking the status code separately
  • Verifying the access-control-allow-origin header exactly matches the expected origin

This resolves the weak assertion logic flagged in previous reviews.
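
The preflight check boils down to a few lines; the route here is assumed, the origin matches the one discussed below:

# Preflight: ask whether a GET from the allowed origin is permitted
resp = await client.options(
    "/api/v1/example-scripts",   # any CORS-enabled route works; path assumed
    headers={
        "Origin": "https://localhost:5001",
        "Access-Control-Request-Method": "GET",
    },
)
assert resp.status_code in (200, 204)
assert resp.headers.get("access-control-allow-origin") == "https://localhost:5001"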


34-39: Correlation and cache-control checks are appropriate.

The presence checks for x-correlation-id and cache-control headers appropriately verify that the respective middleware is functioning. Both assertions use the response from the GET request at line 35.


42-45: Type hints added for consistency.

The added type hints improve code clarity while the test logic remains unchanged.


26-26: No action needed—hardcoded origin is correctly configured.

The hardcoded origin "https://localhost:5001" exactly matches the first entry in the allow_origins list configured in the CORSMiddleware (in backend/app/main.py). The test correctly validates CORS behavior for an allowed origin.

backend/tests/unit/services/saga/test_execution_saga_steps.py (6)

1-17: LGTM! Imports properly updated.

The imports correctly include the new RemoveFromQueueCompensation class and necessary typing utilities. The use of the make_execution_requested_event helper improves test maintainability.


53-53: LGTM! Appropriate typing for test fake.

Using Any for the parameter in the fake repository allows test flexibility without requiring full domain model imports.


99-104: LGTM! Test helper correctly matches real signature.

The _Ctx.set method signature correctly matches the real SagaContext.set(self, key: str, value: Any) method, ensuring the exception-raising test path exercises the actual interface.


72-72: Type ignores are acceptable for unit test fakes.

The # type: ignore[arg-type] comments are necessary where test fakes (which don't inherit from the actual repository/producer classes) are passed to methods expecting the real types. This is standard practice in unit testing to avoid requiring full dependency implementations.

Also applies to: 78-78, 128-128, 140-140, 148-148, 158-158, 177-177


175-180: Excellent fix! Previous issue resolved.

The assertions now properly verify compensation types without the or True bypass that was flagged in the previous review. Each get_compensation() call is meaningfully tested:

  • Steps without compensation correctly return None
  • Steps with compensation return the expected compensation class instances

This directly addresses the past review feedback and ensures proper test coverage.


183-200: LGTM! Test correctly verifies dependency binding.

The test properly validates that:

  1. The saga accepts and stores bound dependencies
  2. Steps created by get_steps() receive the bound configuration
  3. The publish_commands flag is correctly propagated to CreatePodStep

The test structure is clear and the assertions are meaningful.

backend/tests/integration/test_saga_routes.py (3)

2-2: LGTM! Imports support the new fixture-driven test pattern.

The new imports for Callable, UserRole, and MakeUser properly support the refactored test signatures and authentication flows.

Also applies to: 6-6, 13-14


20-24: LGTM! Consistent use of unique_id fixture ensures test isolation.

The fixture is correctly applied to tests that create or reference specific resources (sagas, executions), while tests querying existing data appropriately omit it. This pattern prevents cross-test interference.

Also applies to: 29-36, 40-46, 49-56, 60-74, 145-149, 152-160, 263-284


83-97: LGTM! Proper use of authenticated_client fixture simplifies authenticated test flows.

The transition from manual authentication to the authenticated_client fixture reduces boilerplate while maintaining test clarity. Tests requiring 401 verification correctly continue using the unauthenticated client.

Also applies to: 100-115, 118-130, 133-142, 192-212, 215-240, 243-260, 287-307, 310-328

backend/tests/helpers/__init__.py (1)

3-6: LGTM!

Clean module structure with explicit __all__ exports. The public API surface is well-defined and matches the imports.

backend/tests/integration/test_health_routes.py (1)

50-76: LGTM!

Good improvements:

  • Using authenticated_client fixture eliminates manual login boilerplate
  • Adding return type int | None to create_load improves type safety
  • Changing bare except to except Exception follows Python best practices
backend/tests/integration/test_admin_routes.py (2)

334-340: Verify login flow after logout updates client headers.

After logging out and logging back in with the new password, the authenticated_admin_client still has the old admin's CSRF token in its headers. The new login response's CSRF token is not being applied to the client headers, which could cause subsequent requests to fail if CSRF validation is enabled.

If this test is solely to verify the password reset worked, the assertion on line 340 is sufficient. However, if further requests were intended to use the newly logged-in user, the headers would need to be updated.


137-192: LGTM!

Comprehensive CRUD test with proper use of unique_id for test isolation. The test covers:

  • User creation with unique credentials
  • User retrieval and overview
  • User update
  • User deletion and verification

Good practice using Pydantic model validation (AdminUserOverview) for response structure validation.

backend/tests/conftest.py (2)

139-160: LGTM!

Well-structured authentication helper with CSRF token handling. Key points:

  • Accepts 400 status to handle "user already exists" gracefully
  • Uses raise_for_status() for login failure detection
  • Returns a complete credentials dict with headers for easy reuse
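
Piecing those points together, a hedged reconstruction of what such a factory looks like (client and unique_id are the surrounding fixtures; exact field names are assumptions):

import pytest
from collections.abc import Awaitable, Callable
from typing import Any

MakeUser = Callable[[UserRole], Awaitable[dict[str, Any]]]

async def make_user(role: UserRole) -> dict[str, Any]:
    creds = {"username": unique_id("user-"), "password": "Str0ng!pass", "role": role.value}

    r = await client.post("/api/v1/auth/register", json=creds)
    if r.status_code not in (200, 201, 400):   # 400 tolerated: user may already exist
        pytest.fail(f"Cannot create {role.value}: {r.status_code} - {r.text}")

    login = await client.post(
        "/api/v1/auth/login",
        data={"username": creds["username"], "password": creds["password"]},
    )
    login.raise_for_status()                   # fail loudly if login breaks

    headers = {"X-CSRF-Token": login.json()["csrf_token"]}
    client.headers.update(headers)             # note: mutates the shared client
    return {**creds, "headers": headers}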

183-196: The concern about simultaneous use of both fixtures is not supported by test patterns in the codebase.

The fixtures authenticated_client and authenticated_admin_client are never used together in any single test. Each test method requests one fixture or the other, never both. The mutual exclusivity is enforced by the test design pattern itself, so no header conflict can occur in practice.

Likely an incorrect or invalid review comment.

backend/tests/integration/test_events_routes.py (1)

63-64: LGTM!

Good update to check event_version and metadata.user_id fields, aligning with the updated event schema structure.

backend/app/api/routes/execution.py (1)

175-178: LGTM!

Clean refactor that replaces hard-coded terminal state checks with the centralized is_terminal property. This ensures consistency with the domain model and makes the code more maintainable when terminal states are added or changed.

The explicit check for CANCELLED on line 167 correctly handles the idempotency case with a distinct response before the general terminal state rejection.

backend/app/domain/enums/execution.py (1)

16-25: LGTM!

The is_terminal property provides a clean, centralized way to check for terminal execution states. The implementation correctly includes all five terminal states and follows Python best practices for enum properties.
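
For reference, a sketch of the pattern; the concrete member list is an assumption beyond the states named in this review (TIMEOUT in particular is a guess for the fifth terminal state):

from enum import Enum

class ExecutionStatus(str, Enum):
    # Non-terminal states (illustrative)
    QUEUED = "queued"
    RUNNING = "running"
    # Terminal states
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
    TIMEOUT = "timeout"
    ERROR = "error"

    @property
    def is_terminal(self) -> bool:
        """True once the execution can no longer change state."""
        return self in (
            ExecutionStatus.COMPLETED,
            ExecutionStatus.FAILED,
            ExecutionStatus.CANCELLED,
            ExecutionStatus.TIMEOUT,
            ExecutionStatus.ERROR,
        )

assert ExecutionStatus("failed").is_terminal
assert not ExecutionStatus.RUNNING.is_terminal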

backend/tests/e2e/test_execution_routes.py (4)

6-10: LGTM!

The imports are well-organized and support the test refactoring. Aliasing ExecutionStatus to ExecutionStatusEnum improves clarity, and the schema imports enable proper response validation.


248-256: LGTM!

The best-effort validation approach is appropriate here. Line 253 correctly compares the status string with "COMPLETED", avoiding the enum comparison issue.


337-347: LGTM!

The validation logic correctly uses e.value at line 347 to get string values from the enum for comparison. This avoids the string-vs-enum mismatch issue.


424-424: LGTM!

Using _ for unused loop variables is a Python convention and improves code readability.

backend/tests/integration/test_saved_scripts_routes.py (6)

1-16: LGTM!

Import structure is clean and all imports are used. The integration marker on the class is appropriate.


17-33: LGTM!

Good test for verifying authentication is required. Uses the unauthenticated client fixture correctly and validates the error response structure.


35-234: LGTM!

These CRUD tests follow good patterns:

  • Use authenticated_client fixture for proper authentication
  • Leverage unique_id for test isolation
  • Validate response structure with SavedScriptResponse
  • Handle timestamp comparison carefully with timezone normalization

282-339: LGTM!

The flexible status code assertions (accepting 200, 201, 400, 422) are well-documented with comments explaining the rationale. This is reasonable for integration tests where backend validation behavior may vary.


341-401: LGTM!

Good edge case coverage:

  • The 10MB payload test appropriately uses pytest.skip for environment-specific 5xx responses
  • Nonexistent resource tests verify proper 404/403 handling
  • The fake UUID uses a valid format, avoiding parse errors

427-431: Same auth pattern concern as test_cannot_access_other_users_scripts.

The explicit login at lines 427-430 doesn't update client.headers. If the API uses bearer tokens (as suggested by authenticated_client pattern), the subsequent client.get at line 434 may fail authentication despite the login succeeding.

Consider either:

  1. Updating headers after login: client.headers.update({"Authorization": f"Bearer {token}"}), or
  2. Using cookies if the API supports session-based auth (but this should be verified)
backend/tests/helpers/protocols.py (5)

1-9: LGTM! Clear module documentation and appropriate imports.

The module docstring effectively explains the purpose of these Protocol definitions, and the imports are minimal and well-chosen for defining runtime-checkable protocols.


41-52: LGTM! Well-defined execution repository protocol.

The protocol clearly defines the expected interface for execution repositories with appropriate async methods and return types.


54-104: LGTM! Appropriate protocol decomposition for router capabilities.

The separation of RouterProtocol, RouterWithStopProtocol, and RouterWithCloseProtocol allows for flexible composition where different router implementations can support different shutdown mechanisms via structural typing.


106-111: LGTM! Attribute-only protocol is valid and appropriate.

Defining a protocol with only an attribute is a valid pattern for structural type checking of configuration objects that need to provide specific settings.


148-156: LGTM! K8s API protocol matches actual implementation.

The protocol definition correctly matches the actual implementation shown in the relevant code snippets, with appropriate type hints and default values.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/tests/conftest.py (1)

139-160: Registration error handling could mask failures.

Line 151 accepts status code 400 during registration without distinguishing between "user already exists" (acceptable for test idempotency) and other validation errors (actual failures). If registration fails for a reason other than duplicate user, the code silently continues to login, which will then fail with a potentially confusing error.

Consider checking the error response body to differentiate between expected duplicates and actual registration errors, or at minimum add a comment explaining this trade-off.

♻️ Proposed improvement
 r = await client.post("/api/v1/auth/register", json=creds)
-if r.status_code not in (200, 201, 400):
+# Accept 400 for duplicate user (idempotency), fail on other errors
+if r.status_code not in (200, 201, 400):
     pytest.fail(f"Cannot create {role.value}: {r.status_code} - {r.text}")
+if r.status_code == 400 and "already exists" not in r.text.lower():
+    pytest.fail(f"Registration failed: {r.status_code} - {r.text}")
backend/tests/integration/test_saga_routes.py (1)

163-182: Test doesn't verify access control isolation.

While the session handling is now correct (user1's sagas are fetched before user2 logs in), the test only asserts that both responses are lists. It doesn't verify that:

  • User1 and user2 see different saga sets
  • Users cannot access each other's sagas
  • The lists contain the expected sagas for each user

Since fresh users likely have no sagas, both lists are empty and the test passes without validating actual access control behavior.

🔍 Suggested enhancement: Create sagas and verify isolation

Consider creating a saga for user1, then verifying:

  1. User1 can see their own saga
  2. User2 cannot see user1's saga
  3. Each user's list contains only their own sagas

Example approach:

# Create user1 and a saga for them
user1 = await make_user(UserRole.USER)
# ... create a saga for user1 via API ...

# Fetch user1's sagas - should include the saga just created
response1 = await client.get("/api/v1/sagas/")
assert response1.status_code == 200
user1_sagas = SagaListResponse(**response1.json())

# Create user2
await make_user(UserRole.USER)

# Fetch user2's sagas - should NOT include user1's saga
response2 = await client.get("/api/v1/sagas/")
assert response2.status_code == 200
user2_sagas = SagaListResponse(**response2.json())

# Verify isolation
user1_saga_ids = {s.saga_id for s in user1_sagas.sagas}
user2_saga_ids = {s.saga_id for s in user2_sagas.sagas}
assert len(user1_saga_ids) > 0, "User1 should have at least one saga"
assert len(user1_saga_ids.intersection(user2_saga_ids)) == 0, "Users should not share sagas"
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d49185 and c16a8dd.

📒 Files selected for processing (5)
  • backend/app/settings.py
  • backend/tests/conftest.py
  • backend/tests/e2e/test_execution_routes.py
  • backend/tests/integration/test_notifications_routes.py
  • backend/tests/integration/test_saga_routes.py
✅ Files skipped from review due to trivial changes (1)
  • backend/app/settings.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/integration/test_notifications_routes.py (2)
backend/tests/conftest.py (4)
  • app (82-92)
  • authenticated_client (184-193)
  • client (103-111)
  • make_user (168-180)
backend/tests/helpers/protocols.py (1)
  • get (15-17)
backend/tests/e2e/test_execution_routes.py (4)
backend/tests/conftest.py (3)
  • app (82-92)
  • authenticated_client (184-193)
  • client (103-111)
backend/app/domain/enums/execution.py (1)
  • ExecutionStatus (4-25)
backend/tests/helpers/polling.py (1)
  • poll_until_terminal (10-21)
backend/app/services/coordinator/queue_manager.py (1)
  • execution_id (30-31)
backend/tests/integration/test_saga_routes.py (2)
backend/tests/conftest.py (4)
  • app (82-92)
  • client (103-111)
  • authenticated_client (184-193)
  • make_user (168-180)
backend/tests/integration/conftest.py (1)
  • unique_id (26-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (19)
backend/tests/integration/test_notifications_routes.py (7)

1-18: LGTM!

Imports are well-organized, correctly pulling enums from domain layer and schemas from pydantic layer. The MakeUser type import from conftest enables proper type hints for the fixture.


20-131: LGTM!

The test methods correctly use authenticated_client for authenticated endpoints and client for the unauthenticated test. The status filtering test properly iterates over NotificationStatus enum values.


133-144: LGTM!

The previously flagged redundant API call has been addressed. The test now makes a single call to /api/v1/notifications/unread-count and properly asserts both the status code and the unread count value.


146-306: LGTM!

Subscription and deletion tests are well-structured with proper Pydantic schema validation. The Slack subscription test appropriately handles the case where Slack may be disabled (422 response).


336-359: LGTM - previously flagged issue resolved!

The test logic has been corrected. Notifications are now fetched immediately after each make_user call, before the next user is created. This ensures:

  1. USER notifications are fetched while client is authenticated as USER (lines 342-347)
  2. ADMIN notifications are fetched while client is authenticated as ADMIN (lines 350-355)

The test now correctly verifies user isolation.


308-334: LGTM!

Pagination test correctly validates that pages have no overlapping notification IDs and maintains consistent totals across pages.


361-373: LGTM!

The test appropriately accepts multiple valid error codes (400, 404, 422) for invalid channel input, as different validation layers may handle this differently.

backend/tests/conftest.py (4)

82-135: Fixture typing improvements look solid.

The updates to fixture signatures with explicit FastAPI and AsyncContainer types improve type safety and make the DI flow clearer. The removal of the hasattr check before closing the container (line 92) makes failures explicit, which is appropriate for test code.


163-180: Factory pattern for user creation is excellent.

The make_user factory provides clean isolation for multi-user tests and eliminates the need for shared test user fixtures. The type alias MakeUser makes the signature clear.


184-193: Mutation behavior now properly documented.

The explicit documentation (lines 187-189) clearly warns developers about the fixture mutation behavior and directs them to use make_user for multi-user scenarios. This adequately addresses the concern about shared fixture state.


197-205: Admin client fixture follows the same pattern.

Consistent with authenticated_client, with appropriate mutation warnings in the docstring.

backend/tests/e2e/test_execution_routes.py (6)

5-9: Clean imports for enhanced test infrastructure.

The addition of ExecutionStatus, execution schemas, and poll_until_terminal aligns the tests with the robust polling-based validation pattern.


35-67: Authenticated client adoption looks correct.

The test properly uses authenticated_client and validates the execution response structure. The enum comparisons at lines 61-66 correctly compare execution_response.status (an ExecutionStatusEnum instance from the Pydantic model) directly with enum members.


113-116: Enum comparison now correctly uses .value.

The fix properly compares the string result["status"] from the JSON response with the string values of the enum members. This resolves the type mismatch that would have caused assertion failures.


139-147: Resource tracking validation is robust.

The test correctly:

  • Uses poll_until_terminal to wait for completion
  • Compares status using .value for string-to-string comparison
  • Validates resource_usage presence and structure
  • Asserts reasonable bounds on execution time

354-368: Public endpoint test correctly uses unauthenticated client.

The test appropriately uses the client fixture (not authenticated_client) for the public examples endpoint and gracefully handles both 200 (implemented) and 404 (not implemented) responses.


387-409: Idempotency test demonstrates correct header usage.

The test properly validates that identical requests with the same Idempotency-Key header return the same execution_id, exercising the idempotency guarantee.
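
The guarantee under test reduces to a few lines (endpoint and field names as quoted earlier in this review):

key = unique_id("idem-")
headers = {"Idempotency-Key": key}

r1 = await client.post("/api/v1/execute", json=execution_request, headers=headers)
r2 = await client.post("/api/v1/execute", json=execution_request, headers=headers)
assert r1.status_code == 200 and r2.status_code == 200

# Same key, same payload -> the backend must return the same execution
assert r1.json()["execution_id"] == r2.json()["execution_id"]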

backend/tests/integration/test_saga_routes.py (2)

1-14: LGTM: Import additions support fixture-driven test pattern.

The added imports (Callable, UserRole, MakeUser) are all utilized in the refactored tests. Clean integration with the fixture-based authentication and ID generation pattern.


20-25: LGTM: Fixture-driven refactor improves test isolation and reduces boilerplate.

The migration to unique_id, authenticated_client, and make_user fixtures is consistent and correct throughout. This approach eliminates manual login/registration flows and ensures proper test isolation by generating unique IDs per test.

Also applies to: 28-36, 48-75, 83-97, 100-115, 118-142, 145-160, 184-320

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/tests/integration/test_saved_scripts_routes.py (1)

393-429: Authentication headers not applied after make_user call.

Although line 397 correctly captures the user dictionary from make_user, the test never updates client.headers with user["headers"]. This means:

  • POST at line 408 lacks the CSRF token
  • POST at line 413 (logout) lacks the CSRF token
  • Manual login at lines 416-420 doesn't extract/apply the CSRF token from the response

If CSRF protection is enforced, these requests will fail with 403.

Fix: Update client headers after authentication:

     user = await make_user(UserRole.USER)
+    client.headers.update(user["headers"])

     uid = unique_id("")
     script_data = {
         "name": f"Persistent Script {uid}",
         "script": "print('Should persist')",
         "lang": "python",
         "lang_version": "3.11",
         "description": "Testing persistence",
     }

     create_response = await client.post("/api/v1/scripts", json=script_data)
     assert create_response.status_code in [200, 201]
     script_id = create_response.json()["script_id"]

     # Logout
     await client.post("/api/v1/auth/logout")

     # Second session - login again and retrieve script
     login_resp = await client.post(
         "/api/v1/auth/login",
         data={"username": user["username"], "password": user["password"]},
     )
     assert login_resp.status_code == 200
+    csrf_token = login_resp.json().get("csrf_token")
+    if csrf_token:
+        client.headers["X-CSRF-Token"] = csrf_token
🧹 Nitpick comments (6)
backend/tests/unit/services/sse/test_redis_bus.py (2)

56-59: Consider verifying message content beyond execution_id.

The test only asserts msg.execution_id, but could also verify event_type, exit_code, or other fields from the published event to ensure complete serialization round-trip.

✨ Optional enhancement for more comprehensive validation
     msg = await _wait_for_message(sub, RedisSSEMessage)
     await pub_task
 
-    assert msg.execution_id == "exec-1"
+    assert msg.execution_id == "exec-1"
+    assert msg.event_type == "execution.completed"
+    assert msg.data["exit_code"] == 0
+    assert msg.data["stdout"] == "ok"

100-103: Consider verifying notification fields beyond ID.

Similar to the execution test, verifying additional fields like severity, status, subject, and body would provide more confidence in the full serialization round-trip.

✨ Optional enhancement for more comprehensive validation
     got = await _wait_for_message(nsub, RedisNotificationMessage)
     await pub_task
 
     assert got.notification_id == "n1"
+    assert got.severity == NotificationSeverity.LOW
+    assert got.status == NotificationStatus.PENDING
+    assert got.subject == "test"
+    assert got.body == "body"
backend/tests/integration/app/test_main_app.py (2)

18-19: Consider documenting the expected middlewares.

The assertion checks for at least 6 middlewares but doesn't specify which ones. While this intentional looseness makes the test less brittle, adding a comment listing the expected middlewares would improve maintainability.

📝 Suggested documentation improvement
-    # Verify middleware stack has expected count (6 custom middlewares)
+    # Verify middleware stack has expected count
+    # Expected middlewares: CORS, Correlation, Cache-Control, [list other 3]
     assert len(app.user_middleware) >= 6, "Expected at least 6 middlewares configured"

38-41: Consider verifying cache-control header value.

The test only checks for header presence. Verifying the actual cache directive (e.g., max-age, no-cache, public) would ensure the middleware is configured correctly, not just present.

♻️ Optional enhancement to validate header value
     # Cache-Control middleware: adds cache headers for configured endpoints
     resp = await client.get("/api/v1/example-scripts")
     assert resp.status_code == 200
-    assert "cache-control" in resp.headers
+    cache_control = resp.headers.get("cache-control")
+    assert cache_control is not None, "cache-control header should be present"
+    # Verify it contains expected directives (adjust based on your config)
+    assert "max-age" in cache_control or "no-cache" in cache_control
backend/tests/e2e/test_execution_routes.py (2)

242-276: Test name could be more precise.

The test is named test_execution_with_timeout but doesn't actually wait for or verify timeout behavior—it performs a "bounded check" that long-running executions are accepted and tracked. The docstring clarifies this, but the name might cause confusion.

Consider renaming to something like test_long_running_execution_accepted or test_execution_tracking_for_long_script to better reflect what's actually tested.


356-364: Minor: Extra blank line.

Line 359 has an extra blank line after the response assignment that's inconsistent with the rest of the file's style.

♻️ Optional style fix
     async def test_get_example_scripts(self, client: AsyncClient) -> None:
         """Test getting example scripts (public endpoint)."""
         response = await client.get("/api/v1/example-scripts")
-
         assert response.status_code == 200
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c16a8dd and 3266dcf.

📒 Files selected for processing (5)
  • backend/tests/e2e/test_execution_routes.py
  • backend/tests/integration/app/test_main_app.py
  • backend/tests/integration/idempotency/test_consumer_idempotent.py
  • backend/tests/integration/test_saved_scripts_routes.py
  • backend/tests/unit/services/sse/test_redis_bus.py
🧰 Additional context used
🧬 Code graph analysis (4)
backend/tests/integration/app/test_main_app.py (1)
backend/tests/conftest.py (2)
  • app (82-92)
  • client (103-111)
backend/tests/unit/services/sse/test_redis_bus.py (5)
backend/app/domain/execution/models.py (1)
  • ResourceUsageDomain (15-19)
backend/app/infrastructure/kafka/events/execution.py (1)
  • ExecutionCompletedEvent (86-93)
backend/app/infrastructure/kafka/events/metadata.py (1)
  • AvroEventMetadata (9-31)
backend/app/schemas_pydantic/sse.py (2)
  • RedisNotificationMessage (102-112)
  • RedisSSEMessage (63-68)
backend/app/services/sse/redis_bus.py (2)
  • SSERedisBus (44-87)
  • publish_notification (79-81)
backend/tests/e2e/test_execution_routes.py (4)
backend/tests/conftest.py (3)
  • app (82-92)
  • authenticated_client (184-193)
  • client (103-111)
backend/app/domain/enums/execution.py (1)
  • ExecutionStatus (4-25)
backend/tests/helpers/polling.py (1)
  • poll_until_terminal (10-21)
backend/app/services/coordinator/queue_manager.py (1)
  • execution_id (30-31)
backend/tests/integration/test_saved_scripts_routes.py (1)
backend/tests/conftest.py (4)
  • app (82-92)
  • authenticated_client (184-193)
  • client (103-111)
  • make_user (168-180)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (33)
backend/tests/unit/services/sse/test_redis_bus.py (4)

1-17: LGTM! Clean test setup with appropriate imports.

The imports are well-organized, and the use of FakeAsyncRedis for in-memory testing is a good practice for isolated unit tests.


20-38: LGTM! Well-designed test helpers.

The _wait_for_message helper with explicit timeout and polling pattern is a solid approach for async message retrieval, and _make_completed_event reduces test boilerplate effectively.
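
As an illustration, a timeout-bounded polling helper in this spirit could look like the sketch below (assuming a redis.asyncio-style PubSub subscription and Pydantic-style message models; the suite's actual helper may differ):

import asyncio
import json
from typing import TypeVar

T = TypeVar("T")

async def wait_for_message(sub, model: type[T], timeout: float = 5.0) -> T:
    """Poll a pub/sub subscription until a payload parses into `model`."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        raw = await sub.get_message(ignore_subscribe_messages=True, timeout=0.1)
        if raw is not None and raw.get("type") == "message":
            try:
                return model(**json.loads(raw["data"]))
            except (ValueError, TypeError):
                continue  # skip invalid JSON rather than failing the test
        await asyncio.sleep(0.05)
    raise TimeoutError(f"No {model.__name__} received within {timeout}s")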


41-72: LGTM! Invalid JSON handling improved from past review.

The invalid JSON test (lines 61-69) now uses a deterministic pattern: publishing invalid JSON followed by a valid message and using _wait_for_message to retrieve the valid one. This effectively verifies that invalid JSON is skipped without relying on timing assumptions, addressing the concern from the past review.


75-106: LGTM! Consistent test pattern for notification channels.

The test follows the same solid async pattern as the execution test, using timeout-based message waiting and proper cleanup.

backend/tests/integration/app/test_main_app.py (5)

5-6: LGTM: Import additions support new test pattern.

The AsyncClient and Route imports are correctly used for type-safe route filtering and the new HTTP-based middleware verification test.


15-15: LGTM: Type-safe route filtering.

Filtering to Route instances before accessing .path correctly prevents AttributeError on route types that don't expose a path attribute.


25-32: LGTM: CORS preflight test properly validates middleware.

The test correctly addresses past review feedback by:

  • Checking status code separately (line 31)
  • Verifying the CORS header value matches the allowed origin (line 32)
  • Using a properly formed preflight request with required headers

34-36: LGTM: Correlation ID middleware verification.

Simple and effective check for the correlation middleware behavior.


44-44: LGTM: Type annotation added for consistency.

backend/tests/integration/idempotency/test_consumer_idempotent.py (6)

1-17: LGTM: Imports align with new test structure.

The added imports (Callable, Any, backoff, AsyncContainer) are all used in the refactored test and support the fixture-driven, backoff-based approach.


24-32: LGTM: Improved test signature for isolation.

Using AsyncContainer and the unique_id fixture ensures proper dependency injection and test isolation, preventing cross-test interference in parallel runs.


33-41: Excellent: Separate tracking resolves previous review feedback.

The test now maintains distinct counters for processed vs duplicates, allowing precise verification that the idempotent wrapper blocks duplicate events. This directly addresses the prior concern about insufficient assertions.


42-73: LGTM: Proper configuration for idempotency testing.

The dispatcher and wrapper setup correctly uses manual handler registration via subscribe_idempotent_handler with the on_duplicate callback, enabling precise tracking of duplicate event handling.


83-96: Excellent: Assertions now correctly verify idempotency.

The test fully addresses the previous review feedback by:

  1. Waiting for both events to be handled (processed or duplicate detected)
  2. Asserting exactly 1 processed and 1 duplicate blocked

The backoff strategy with 30s max_time appropriately accounts for Kafka consumer startup latency in integration tests.
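
A condensed sketch of that counter-plus-backoff pattern (handler registration via subscribe_idempotent_handler is elided; the names follow the review):

import backoff

processed: list[str] = []
duplicates: list[str] = []

async def handler(event) -> None:
    processed.append(event.event_id)   # first delivery reaches the handler

def on_duplicate(event) -> None:
    duplicates.append(event.event_id)  # replayed delivery is blocked here

@backoff.on_exception(backoff.constant, AssertionError, max_time=30, interval=1)
async def wait_until_both_seen() -> None:
    # Retried until both deliveries are accounted for or 30s elapse.
    assert len(processed) + len(duplicates) == 2

# After publishing the same event twice:
#     await wait_until_both_seen()
#     assert len(processed) == 1 and len(duplicates) == 1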


75-82: Consider exposing a public flush method on UnifiedProducer.

The test accesses the private _producer attribute to flush messages, but UnifiedProducer does not expose a public flush() method. To avoid relying on internal implementation details, either add a public wrapper method to UnifiedProducer or document why this private access is necessary for testing.
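
One possible shape for such a wrapper (a sketch assuming the underlying confluent-kafka Producer is held in self._producer, as the test's private access implies):

class UnifiedProducer:
    # ... existing constructor assigns self._producer (confluent_kafka.Producer) ...

    def flush(self, timeout: float = 10.0) -> int:
        """Block until queued messages are delivered.

        Returns the number of messages still queued (0 on a full flush),
        mirroring confluent_kafka.Producer.flush().
        """
        return self._producer.flush(timeout)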

backend/tests/integration/test_saved_scripts_routes.py (5)

18-33: LGTM! Correct fixture usage for unauthenticated test.

The test appropriately uses the bare client fixture to verify that unauthenticated requests are rejected with 401 status.


36-202: LGTM! Proper use of authenticated_client fixture.

These tests correctly use the authenticated_client fixture for authenticated operations. The fixture ensures authentication headers (including CSRF tokens) are properly configured. Response validation using SavedScriptResponse provides strong type safety, and the unique_id fixture ensures test isolation.


205-235: LGTM! Proper cleanup verification.

The delete test correctly verifies both successful deletion and subsequent access denial, with appropriate status code checks for both 404 and 403.


272-390: LGTM! Comprehensive edge-case coverage with proper authentication.

These tests properly validate error handling for invalid languages, name constraints, content size limits, and operations on non-existent scripts. The use of authenticated_client ensures authentication is handled correctly, and the flexible status code assertions (e.g., [200, 201, 400, 422]) appropriately account for different backend validation strategies.


237-269: Authentication headers not applied after make_user calls.

The test calls make_user but doesn't capture the returned user dictionary or update client.headers with the authentication headers. This means the POST request at line 254 lacks the CSRF token required if CSRF protection is enforced.

Fix: Capture the return value and update headers, or refactor to use separate authenticated client instances:

-    await make_user(UserRole.USER)
+    user1 = await make_user(UserRole.USER)
+    client.headers.update(user1["headers"])
     uid = unique_id("")
     user_script_data = {
         "name": f"User Private Script {uid}",
         "script": "print('Private to user')",
         "lang": "python",
         "lang_version": "3.11",
         "description": "Should only be visible to creating user",
     }
     create_response = await client.post("/api/v1/scripts", json=user_script_data)
     assert create_response.status_code in [200, 201]
     user_script_id = create_response.json()["script_id"]

-    await make_user(UserRole.ADMIN)
+    user2 = await make_user(UserRole.ADMIN)
+    client.headers.update(user2["headers"])

Likely an incorrect or invalid review comment.

backend/tests/e2e/test_execution_routes.py (13)

1-12: LGTM: Imports and setup are clean.

The imports properly bring in the execution enums, Pydantic schemas, and the polling helper needed for async state verification.


35-67: LGTM: Authentication and validation updated correctly.

The test properly uses authenticated_client and validates the execution response structure with appropriate enum comparisons.


69-98: LGTM: Result retrieval test properly structured.

The test correctly uses authenticated_client for both execution submission and result retrieval, with appropriate validation of execution state and output.


100-118: LGTM: Error handling test correctly updated.

The enum comparison issue from the previous review has been properly addressed. The test now correctly compares the raw JSON status string to .value of the enum members.


120-150: LGTM: Resource tracking validation properly implemented.

The enum comparison has been fixed, and the new resource usage validation correctly checks that execution_time_wall_seconds is populated and non-negative after completion.


152-174: LGTM: Multi-version testing properly structured.

The test correctly handles different Python versions and appropriately validates both successful execution and version rejection scenarios.


176-203: LGTM: Large output test has appropriate best-effort validation.

The test correctly handles the asynchronous nature of execution by performing conditional validation based on the current execution state.


205-240: LGTM: Cancellation test has appropriate defensive handling.

The test properly handles scenarios where the cancel endpoint might not be available or the execution has already completed, with appropriate error handling and test skipping logic.


278-323: LGTM: Sandbox restriction test has comprehensive validation.

The test properly handles multiple dangerous operation scenarios with appropriate conditional logic to verify sandbox restrictions across different execution states.


325-354: LGTM: Concurrent execution test properly structured.

The test correctly creates coroutines and uses asyncio.gather to execute them concurrently, with appropriate validation of unique execution IDs and handling of potential rate limiting.


366-377: LGTM: Resource limits test enhanced with validation.

The added validation properly checks that the response matches the ResourceLimits schema and contains sensible values for timeout and supported runtimes.


379-385: LGTM: User executions list test properly updated.

The test correctly uses authenticated_client and validates the paginated response structure with all required fields.


387-409: LGTM: Idempotency test correctly validates same-key behavior.

The test properly verifies that submitting the same request with the same Idempotency-Key returns the same execution ID, which is the expected idempotency behavior.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @backend/tests/e2e/conftest.py:
- Around line 47-52: The fixture _cleanup currently annotates its return type as
Any; change it to AsyncGenerator[None, None] to match the integration tests and
improve type safety: update the signature of _cleanup to use
AsyncGenerator[None, None], add the required import (from typing import
AsyncGenerator) if missing, and keep the body the same (await
cleanup_db_and_redis(...) then yield).
🧹 Nitpick comments (3)
backend/tests/e2e/conftest.py (2)

20-21: Consider adding URL validation.

The BACKEND_URL is used directly without validation. While the default is safe, adding basic URL validation could help catch configuration errors early in CI/local development.

🔍 Optional validation example
 # Backend URL for E2E tests (set in CI, defaults to local HTTPS)
 BACKEND_URL = os.environ.get("BACKEND_URL", "https://localhost:443")
+
+# Validate BACKEND_URL format
+if not BACKEND_URL.startswith(("http://", "https://")):
+    raise ValueError(f"Invalid BACKEND_URL: {BACKEND_URL} (must start with http:// or https://)")

40-40: Long timeout for E2E tests.

The 60-second timeout is quite long. While reasonable for E2E tests that may involve real backend operations, consider whether this could mask performance issues or cause slow test failures.

.github/workflows/frontend-ci.yml (1)

155-158: Replace fixed sleep with container completion check.

Using sleep 10 is fragile—init jobs may take longer under load or finish sooner. Prefer waiting for the containers to exit successfully:

♻️ Proposed fix
          # Init jobs
          docker compose up -d --no-build user-seed kafka-init
-         sleep 10
+         # Wait for init containers to complete
+         timeout 120 bash -c 'until [ "$(docker compose ps user-seed --format "{{.State}}")" = "exited" ]; do sleep 2; done'
+         timeout 120 bash -c 'until [ "$(docker compose ps kafka-init --format "{{.State}}")" = "exited" ]; do sleep 2; done'
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54bb765 and bcdb8ea.

📒 Files selected for processing (4)
  • .github/workflows/frontend-ci.yml
  • backend/tests/e2e/conftest.py
  • docker-bake.hcl
  • docs/operations/cicd.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker-bake.hcl
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/e2e/conftest.py (6)
backend/tests/conftest.py (5)
  • app (82-92)
  • client (103-111)
  • db (127-129)
  • redis_client (133-135)
  • scope (121-123)
backend/tests/unit/conftest.py (4)
  • app (27-28)
  • client (22-23)
  • db (12-13)
  • redis_client (17-18)
backend/tests/helpers/cleanup.py (1)
  • cleanup_db_and_redis (8-22)
backend/app/services/sse/redis_bus.py (1)
  • get (23-35)
backend/app/events/core/consumer.py (1)
  • _cleanup (89-92)
backend/tests/integration/conftest.py (1)
  • _cleanup (14-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: E2E Tests
  • GitHub Check: Build Frontend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (9)
backend/tests/e2e/conftest.py (4)

1-15: LGTM!

The docstring clearly explains the E2E test setup, and the imports are appropriate for testing against a real backend.


18-18: LGTM!

Module-level logger is a good practice for E2E test observability.


54-62: LGTM!

The no-op fixture provides backwards compatibility while clearly documenting that workers run in containers. Module scope is appropriate.


33-36: SSL verification is appropriately disabled for E2E testing with self-signed certificates.

The concern about BACKEND_URL accidentally exposing a production endpoint is mitigated by the implementation: BACKEND_URL defaults to https://localhost:443 and can only point to a different endpoint if explicitly overridden via an environment variable. Since E2E tests are not run in the standard CI pipeline (only unit and integration tests), and no CI workflow sets BACKEND_URL, this fixture always uses the safe localhost default. The SSL verification disabling is appropriate for testing against local self-signed certificates.

Likely an incorrect or invalid review comment.

docs/operations/cicd.md (1)

155-160: LGTM!

Documentation accurately reflects the migration to bake-based builds. The explanation of build targets, dependencies, and cache strategy is clear and aligns with the actual workflow changes.

.github/workflows/frontend-ci.yml (4)

8-17: LGTM!

Trigger paths correctly updated to monitor docker-compose.yaml and docker-bake.hcl, reflecting the migration away from CI-specific compose files.


20-26: Centralizing image definitions is a good practice.

Note: The Kafka image here (confluentinc/cp-kafka:7.5.0) differs from the backend CI documentation reference (apache/kafka:3.9.0). Ensure this is intentional for Confluent Schema Registry compatibility.


87-101: LGTM!

Bake-action configuration is correct. Using load: true ensures images are available in the Docker daemon for compose, and the cache settings maximize layer reuse.


187-195: LGTM!

Log collection steps are properly aligned with the single docker-compose.yaml approach and include both container logs and Kubernetes events for debugging.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In @.github/workflows/backend-ci.yml:
- Around line 206-237: The workflow currently sets insecure-skip-tls-verify:
true in the generated kubeconfig.yaml which disables server TLS verification;
instead extract the cluster CA from k3s (e.g. CA_CERT=$(sudo k3s kubectl config
view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}'))
alongside CLIENT_CERT and CLIENT_KEY, and update the kubeconfig generation for
users/clusters to include certificate-authority-data: ${CA_CERT} (remove or
avoid insecure-skip-tls-verify) so the kubeconfig uses proper TLS verification
when creating kubeconfig.yaml and copying it to backend/kubeconfig.yaml.
- Around line 258-262: Replace the fragile fixed "sleep 10" after the "docker
compose up -d --no-build user-seed kafka-init" step with a condition-based wait:
poll for the specific readiness indicator emitted by the init job(s) (e.g.,
container exit/status, healthcheck, presence of created Kafka topics or
registered schemas) instead of sleeping; implement a loop that inspects the init
service/container status via "docker compose ps" or "docker inspect" (or checks
Kafka with kafka-topics.sh / schema registry endpoints) and only proceeds when
the init containers have succeeded or the expected resources exist, failing the
workflow with a clear error if a timeout is reached.

In @.github/workflows/frontend-ci.yml:
- Line 127: Replace the unsafe "insecure-skip-tls-verify: true" setting with a
proper CA-based verification: remove or set that key to false and configure the
kube client's TLS to use the k3s CA certificate (e.g., point the TLS/ca
certificate field or kubeconfig ca data to the extracted k3s CA file), ensure
the workflow extracts or references the CA (from the k3s server or cluster
secret) and supplies it to the kubectl/helm step so certificate validation is
performed instead of being skipped.
- Around line 145-146: Replace the hardcoded environment values for
MONGO_ROOT_USER and MONGO_ROOT_PASSWORD in the workflow with references to
GitHub Actions secrets; update the workflow to use the secrets (e.g., reference
secrets for those two variables) and remove the plaintext values, and ensure the
corresponding secrets are created and stored in the repository or organization
secrets settings so CI can access them securely.
- Around line 162-163: Replace the brittle "sleep 10" after "docker compose up
-d --no-build user-seed kafka-init" with an explicit readiness loop that polls
the init services until they finish successfully: e.g., repeatedly check the
containers for "user-seed" and "kafka-init" using docker compose/docker inspect
(exit code == 0 or health status == "healthy") or by tailing their logs for a
success message, sleeping briefly between polls and failing the job on timeout;
update the step that runs "docker compose up -d --no-build user-seed kafka-init"
to invoke this readiness check instead of the hardcoded sleep.
🧹 Nitpick comments (2)
.github/workflows/backend-ci.yml (1)

286-290: Clarify certificate verification behavior.

The comment "Trust self-signed certs" (line 286) is slightly misleading. Setting REQUESTS_CA_BUNDLE="" and CURL_CA_BUNDLE="" actually disables certificate verification entirely rather than trusting specific self-signed certificates.

While this is acceptable for CI environments, consider updating the comment for clarity:

- # Trust self-signed certs
+ # Disable certificate verification for self-signed certs
  REQUESTS_CA_BUNDLE: ""
  CURL_CA_BUNDLE: ""

Alternatively, you could provide the CA certificate path to properly trust only the specific self-signed CA:

REQUESTS_CA_BUNDLE: backend/certs/ca.crt
CURL_CA_BUNDLE: backend/certs/ca.crt
backend/tests/conftest.py (1)

184-205: Refactor to avoid client mutation anti-pattern.

Both authenticated_client and authenticated_admin_client mutate the shared client instance by calling client.headers.update(). This creates several issues:

  1. Test interference risk: If a test accidentally uses both client and authenticated_client, the client's state is mutated unexpectedly
  2. Fragile design: The docstrings explicitly warn against using multiple fixtures together, indicating the pattern is error-prone
  3. Hidden state mutation: Fixtures should ideally be pure and not mutate their dependencies

Recommended approach: Create a new client instance with auth headers instead of mutating the existing one:

♻️ Proposed refactor to avoid mutation
 @pytest_asyncio.fixture
-async def authenticated_client(client: httpx.AsyncClient) -> httpx.AsyncClient:
-    """HTTP client logged in as regular user.
-
-    Note: This fixture mutates and returns the same `client` instance with
-    auth headers applied. Do NOT use both `client` and `authenticated_client`
-    in the same test. For multi-user tests, use `client` + `make_user` fixture.
-    """
+async def authenticated_client(app: FastAPI) -> AsyncGenerator[httpx.AsyncClient, None]:
+    """HTTP client logged in as regular user."""
+    async with httpx.AsyncClient(
+        transport=ASGITransport(app=app),
+        base_url="https://test",
+        timeout=30.0,
+        follow_redirects=True,
+    ) as c:
-    user = await _register_and_login(client, UserRole.USER)
-    client.headers.update(user["headers"])
-    return client
+        user = await _register_and_login(c, UserRole.USER)
+        c.headers.update(user["headers"])
+        yield c
 
 
 @pytest_asyncio.fixture
-async def authenticated_admin_client(client: httpx.AsyncClient) -> httpx.AsyncClient:
-    """HTTP client logged in as admin.
-
-    Note: This fixture mutates and returns the same `client` instance with
-    admin auth headers applied. For multi-user tests, use `make_user` fixture.
-    """
+async def authenticated_admin_client(app: FastAPI) -> AsyncGenerator[httpx.AsyncClient, None]:
+    """HTTP client logged in as admin."""
+    async with httpx.AsyncClient(
+        transport=ASGITransport(app=app),
+        base_url="https://test",
+        timeout=30.0,
+        follow_redirects=True,
+    ) as c:
-    admin = await _register_and_login(client, UserRole.ADMIN)
-    client.headers.update(admin["headers"])
-    return client
+        admin = await _register_and_login(c, UserRole.ADMIN)
+        c.headers.update(admin["headers"])
+        yield c

This eliminates the mutation warning and allows tests to safely use multiple authenticated clients simultaneously.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcdb8ea and 1b4a85b.

📒 Files selected for processing (4)
  • .github/workflows/backend-ci.yml
  • .github/workflows/frontend-ci.yml
  • backend/tests/conftest.py
  • backend/tests/e2e/conftest.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/e2e/conftest.py (3)
backend/tests/conftest.py (5)
  • app (82-92)
  • scope (121-123)
  • client (103-111)
  • db (127-129)
  • redis_client (133-135)
backend/app/main.py (1)
  • create_app (44-127)
backend/tests/helpers/cleanup.py (1)
  • cleanup_db_and_redis (8-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (21)
.github/workflows/backend-ci.yml (8)

8-16: LGTM! Migration to bake-based workflow.

The updated trigger paths correctly reflect the migration from docker-compose.ci.yaml to docker-compose.yaml and docker-bake.hcl, ensuring the workflow runs when the relevant files change.


24-26: LGTM! Image version updates.

The Zookeeper and Kafka image references have been updated consistently with the workflow's migration to a new container setup.


77-77: LGTM! Complete cache coverage.

Adding ZOOKEEPER_IMAGE to the cached images ensures all infrastructure dependencies benefit from Docker layer caching.


93-96: LGTM! Explicit service startup.

The explicit list of services to start (mongo, redis, zookeeper-certgen, zookeeper, kafka, schema-registry) improves clarity and ensures only necessary infrastructure services run during integration tests.


142-150: LGTM! Improved log collection.

The updated log collection captures relevant service logs, and the artifact name integration-logs is more descriptive than the previous generic naming.


166-181: LGTM! Efficient image building with caching.

The Docker Buildx and bake-action setup with GitHub Actions cache (v2) is an excellent practice for speeding up CI builds. The mode=max setting ensures comprehensive layer caching for optimal performance.


195-204: LGTM! Proper K3s bootstrap.

The K3s setup correctly disables Traefik, configures kubeconfig with appropriate permissions, waits for cluster readiness, and idempotently creates the namespace. This provides a solid foundation for E2E testing.


308-326: LGTM! Comprehensive log collection.

The expanded log collection captures all relevant service logs (backend, saga-orchestrator, k8s-worker, pod-monitor, executor pods) and Kubernetes events, making debugging much easier when E2E tests fail. The artifact name e2e-logs is appropriately descriptive.

backend/tests/e2e/conftest.py (3)

16-21: LGTM: Proper session-scoped app fixture with cleanup.

The session-scoped app fixture correctly initializes the FastAPI application using test settings and ensures proper cleanup of the dishka container on teardown.


40-44: LGTM: Proper E2E test isolation with autouse cleanup.

The cleanup fixture correctly ensures each E2E test starts with a clean state by clearing both the database and Redis before each test.


24-37: SSL verification is disabled intentionally for local-only tests.

The client fixture disables SSL certificate verification (check_hostname=False, verify_mode=ssl.CERT_NONE), which is acceptable for local E2E tests. The design provides adequate safeguards:

  • The hostname is hardcoded to "localhost", preventing use against remote servers
  • TestSettings loads from .env.test, which configures all services to localhost endpoints
  • The fixture is part of the test-only conftest and relies on dependency injection

No additional runtime checks are necessary given the hardcoded localhost restriction.

backend/tests/conftest.py (5)

3-3: LGTM: Type annotations improve code clarity.

The addition of type imports (AsyncGenerator, Callable, Coroutine, Any, UserRole, FastAPI) improves type safety and IDE support for the test fixtures.

Also applies to: 5-5, 12-12, 16-16


40-40: LGTM: Using setdefault for CI flexibility.

The change from direct assignment to setdefault() correctly allows CI-provided environment variables to take precedence while still providing sensible defaults for local test execution. This improves the flexibility of the test infrastructure.

Also applies to: 47-47, 52-52, 57-57, 60-60, 63-66


82-82: LGTM: Proper async cleanup and type annotation.

The app fixture now correctly:

  1. Declares the return type as AsyncGenerator[FastAPI, None]
  2. Uses await to properly close the dishka container during teardown

Also applies to: 92-92


96-96: LGTM: Comprehensive type annotations for fixtures.

The type annotations for fixtures (AsyncContainer, AsyncGenerator[Database, None], AsyncGenerator[redis.Redis, None]) improve type safety and make the fixture contracts explicit. The use of await scope.get() for dependency resolution is also correct.

Also applies to: 103-103, 115-115, 121-121, 127-129, 133-135


167-180: LGTM: Excellent pattern for multi-user test isolation.

The make_user factory fixture provides a clean way to create multiple users with different roles in the same test, which is essential for testing authorization and multi-user scenarios. The type alias MakeUser and clear documentation make this fixture easy to use correctly.

.github/workflows/frontend-ci.yml (5)

8-9: LGTM: Centralized image management and updated triggers.

The migration to docker-bake.hcl and centralized environment variables for Docker images improves maintainability and aligns with the bake-based caching strategy.

Also applies to: 15-16, 20-26, 70-70


87-101: LGTM: Bake action with GitHub Actions cache.

The integration of docker/bake-action with GitHub Actions caching will improve build performance across CI runs.


111-111: LGTM: Idempotent namespace creation.

Using --dry-run=client with kubectl apply ensures the namespace creation is idempotent.


149-179: Well-structured staged startup with health checks.

The staged startup sequence (certs → infra → seeds → backend → frontend) with appropriate health checks and timeouts ensures reliable service initialization. The use of --wait --wait-timeout for infrastructure services is a good practice.


197-201: LGTM: Comprehensive log collection for debugging.

The consolidated log collection approach with separate component logs and Kubernetes events provides good diagnostic coverage for failure scenarios.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @backend/tests/conftest.py:
- Line 53: The SCHEMA_SUBJECT_PREFIX passed into TestSettings is ignored because
the Settings model does not define that field; add a new attribute
SCHEMA_SUBJECT_PREFIX: str = "" to the Settings class (the Pydantic model
declared at the top of the file) so Pydantic will include it, or alternatively
modify the test fixture that returns TestSettings (the function creating
TestSettings) to set os.environ["SCHEMA_SUBJECT_PREFIX"] = f"test.{worker_id}."
when worker_id != "master" so schema_registry.py will read the intended value
from the environment.

In @backend/tests/e2e/conftest.py:
- Around line 19-24: Remove the unused pytest_asyncio fixture named `app`:
delete the entire async fixture definition (the `app` function marked with
@pytest_asyncio.fixture) so the test suite no longer creates an in-process
FastAPI application or closes `application.state.dishka_container`; ensure no
other tests reference `app` before removal and run tests to confirm nothing else
imports or uses `create_app` in this file.

In @backend/tests/integration/conftest.py:
- Around line 8-22: The MyPy error comes from a type mismatch between the
annotated return Callable[[str], str] and the inner function _make which
currently has a default parameter; update the inner function signature in the
unique_id fixture so _make requires its prefix parameter (change _make from
having prefix: str = "" to prefix: str) so the implementation matches
Callable[[str], str]; ensure the docstring/examples remain consistent with the
required prefix usage.
🧹 Nitpick comments (3)
backend/tests/conftest.py (2)

132-133: Clarify error handling logic.

The condition checks for status codes not in (200, 201, 400), but if 400 is acceptable (e.g., user already exists), it should be handled explicitly rather than being lumped with success codes. The current logic is unclear about intent.

♻️ Proposed clarification

If 400 means user already exists and that's acceptable:

 r = await client.post("/api/v1/auth/register", json=creds)
-if r.status_code not in (200, 201, 400):
-    pytest.fail(f"Cannot create {role.value}: {r.status_code} - {r.text}")
+if r.status_code == 400:
+    # User already exists, continue to login
+    pass
+elif r.status_code not in (200, 201):
+    pytest.fail(f"Cannot create {role.value}: {r.status_code} - {r.text}")

If 400 is truly an error, remove it from the accepted list:

-if r.status_code not in (200, 201, 400):
+if r.status_code not in (200, 201):
     pytest.fail(f"Cannot create {role.value}: {r.status_code} - {r.text}")

164-186: Consider immutable pattern for authenticated clients.

Both authenticated_client and authenticated_admin_client mutate the shared client instance by updating headers. While documented, this pattern can lead to test pollution and confusion, especially if:

  • The base client fixture is used elsewhere after these fixtures run
  • Both fixtures are accidentally used in the same test
  • Headers persist across test boundaries unexpectedly
♻️ Safer alternative: Return new client instances
@pytest_asyncio.fixture
async def authenticated_client(app: FastAPI) -> AsyncGenerator[httpx.AsyncClient, None]:
    """HTTP client logged in as regular user (isolated instance)."""
    async with httpx.AsyncClient(
        transport=ASGITransport(app=app),
        base_url="https://test",
        timeout=30.0,
        follow_redirects=True,
    ) as client:
        user = await _register_and_login(client, UserRole.USER)
        client.headers.update(user["headers"])
        yield client

@pytest_asyncio.fixture
async def authenticated_admin_client(app: FastAPI) -> AsyncGenerator[httpx.AsyncClient, None]:
    """HTTP client logged in as admin (isolated instance)."""
    async with httpx.AsyncClient(
        transport=ASGITransport(app=app),
        base_url="https://test",
        timeout=30.0,
        follow_redirects=True,
    ) as client:
        admin = await _register_and_login(client, UserRole.ADMIN)
        client.headers.update(admin["headers"])
        yield client

This creates isolated client instances, eliminating mutation concerns while maintaining the same API.

backend/tests/e2e/conftest.py (1)

27-40: Consider adding a server availability check before yielding the client.

The fixture disables SSL verification (lines 30-32), which is acceptable for local testing. However, if the backend containers aren't running, tests will fail with cryptic connection errors.

🔍 Add a health check to provide clearer error messages
 @pytest_asyncio.fixture
 async def client(test_settings: Settings) -> AsyncGenerator[httpx.AsyncClient, None]:
     """HTTP client hitting real backend containers."""
     ssl_context = ssl.create_default_context()
     ssl_context.check_hostname = False
     ssl_context.verify_mode = ssl.CERT_NONE
 
     async with httpx.AsyncClient(
         base_url=f"https://localhost:{test_settings.SERVER_PORT}",
         timeout=60.0,
         follow_redirects=True,
         verify=ssl_context,
     ) as c:
+        # Verify server is reachable
+        try:
+            await c.get("/health", timeout=5.0)
+        except (httpx.ConnectError, httpx.TimeoutException) as e:
+            raise RuntimeError(
+                f"Backend container not reachable at https://localhost:{test_settings.SERVER_PORT}. "
+                f"Ensure containers are running. Error: {e}"
+            ) from e
         yield c
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b4a85b and cf513f8.

📒 Files selected for processing (4)
  • backend/tests/conftest.py
  • backend/tests/e2e/conftest.py
  • backend/tests/helpers/cleanup.py
  • backend/tests/integration/conftest.py
💤 Files with no reviewable changes (1)
  • backend/tests/helpers/cleanup.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/integration/conftest.py (1)
backend/tests/conftest.py (1)
  • _make (158-159)
backend/tests/e2e/conftest.py (2)
backend/tests/conftest.py (3)
  • app (63-73)
  • scope (102-104)
  • test_settings (40-58)
backend/app/main.py (1)
  • create_app (44-127)
backend/tests/conftest.py (3)
backend/tests/e2e/conftest.py (2)
  • app (20-24)
  • test_settings (14-16)
backend/app/main.py (1)
  • create_app (44-127)
backend/app/settings.py (1)
  • Settings (11-160)
🪛 GitHub Actions: MyPy Type Checking
backend/tests/integration/conftest.py

[error] 1-1: Command failed with exit code 1.

backend/tests/e2e/conftest.py

[error] 1-1: Command failed with exit code 1.

backend/tests/conftest.py

[error] 53-53: mypy error: Unexpected keyword argument "SCHEMA_SUBJECT_PREFIX" for "TestSettings" [call-arg]


[error] 1-1: Command failed with exit code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: E2E Tests
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (11)
backend/tests/integration/conftest.py (1)

1-5: LGTM! Clean imports and documentation.

The imports are minimal and appropriate for the new unique_id fixture approach. The module docstring clearly identifies this as integration test configuration.

backend/tests/conftest.py (7)

20-24: Good addition to prevent test stalls.

Disabling OpenTelemetry exporters in tests prevents connection attempts and stalls, improving test performance and reliability.
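
For context, a typical way to achieve this (the diff doesn't show the exact variables this conftest sets; these are the standard OpenTelemetry SDK environment knobs):

import os

# Silence exporters so tests never try to reach a collector.
os.environ.setdefault("OTEL_TRACES_EXPORTER", "none")
os.environ.setdefault("OTEL_METRICS_EXPORTER", "none")
os.environ.setdefault("OTEL_LOGS_EXPORTER", "none")
# Or switch the SDK off entirely.
os.environ.setdefault("OTEL_SDK_DISABLED", "true")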


27-35: LGTM!

The TestSettings configuration is clean and the updated docstring clearly explains the behavior.


62-80: LGTM!

The async conversion of fixtures is correctly implemented with proper cleanup and type hints. Session scope is appropriately chosen to avoid Pydantic schema validator memory issues.


83-92: LGTM!

The async client fixture is well-configured with appropriate timeout and redirect settings.


95-104: LGTM!

The container scope management is correctly implemented following async patterns and Dishka best practices.


107-116: LGTM!

Database and Redis fixtures correctly use async dependency injection through the Dishka container.


55-55: Redis database count assumption is correct—no action needed.

The code correctly assumes 16 databases, which is Redis's default across all versions. The docker-compose.yaml uses redis:7-alpine without a custom --databases configuration, confirming this default applies here.

backend/tests/e2e/conftest.py (3)

1-11: LGTM! Imports are appropriate for E2E testing.

The imports correctly include httpx for HTTP client testing, SSL support for certificate handling, and the necessary FastAPI/Settings types.


13-16: LGTM! Correct design for E2E container testing.

The session-scoped test_settings appropriately returns default Settings without worker isolation, since E2E tests target shared real containers rather than per-worker test databases.


1-1: Provide the actual MyPy error message for diagnosis.

The review mentions a MyPy pipeline failure but does not include the error output. The file's type annotations appear syntactically correct for Python 3.12+ (AsyncGenerator from collections.abc, proper async function signatures). MyPy's configuration disables import-untyped and import-not-found errors, which would suppress most import resolution issues.

Without the specific error message from the pipeline, this cannot be verified. Re-run the MyPy check and include the actual error output to proceed.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/tests/conftest.py">

<violation number="1" location="backend/tests/conftest.py:134">
P2: `raise_for_status()` will raise an exception on 400 (user already exists), whereas the original code explicitly allowed this case. This could cause test failures in edge cases like parallel test execution or reruns. Consider keeping the 400 tolerance or explicitly handling the user-exists case.</violation>
</file>


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.github/workflows/backend-ci.yml:
- Line 271: The workflow is using empty per-run prefixes (KAFKA_TOPIC_PREFIX set
to "") which removes E2E test isolation; change KAFKA_TOPIC_PREFIX (and the
analogous KAFKA_SCHEMA_PREFIX referenced in the comment) to use the per-run
prefix pattern used by integration tests (e.g., "ci.${{ github.run_id }}.") so
each E2E run gets unique topics/schemas and avoids cross-run collisions.

In @.github/workflows/frontend-ci.yml:
- Around line 113-142: The heredoc used to write kubeconfig.yaml preserves the
leading spaces and produces invalid YAML; fix it by switching the delimiter to
an unquoted, stripped-heredoc (change "cat > kubeconfig.yaml << EOF" to "cat >
kubeconfig.yaml <<-EOF") and ensure the indented heredoc lines use tabs (or
remove all leading spaces) so variable expansion (HOST_IP, CA_CERT, CLIENT_CERT,
CLIENT_KEY) still works; do not quote the EOF so variables continue to expand.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf513f8 and 1017815.

📒 Files selected for processing (5)
  • .github/workflows/backend-ci.yml
  • .github/workflows/frontend-ci.yml
  • backend/tests/conftest.py
  • backend/tests/e2e/conftest.py
  • backend/tests/e2e/test_execution_routes.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/conftest.py (2)
backend/app/main.py (1)
  • create_app (44-127)
backend/app/settings.py (1)
  • Settings (11-160)
backend/tests/e2e/conftest.py (2)
backend/tests/conftest.py (4)
  • app (65-79)
  • scope (108-110)
  • test_settings (40-60)
  • client (90-98)
backend/tests/unit/conftest.py (2)
  • app (27-28)
  • client (22-23)
backend/tests/e2e/test_execution_routes.py (3)
backend/tests/conftest.py (3)
  • app (65-79)
  • authenticated_client (178-187)
  • client (90-98)
backend/app/domain/enums/execution.py (1)
  • ExecutionStatus (4-25)
backend/tests/helpers/sse.py (1)
  • wait_for_execution_terminal (44-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: E2E Tests
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (40)
.github/workflows/backend-ci.yml (8)

3-26: LGTM: Workflow triggers and environment variables properly updated.

The path triggers now correctly reference docker-compose.yaml and docker-bake.hcl, and the addition of Confluent Zookeeper/Kafka images aligns with the PR's infrastructure modernization.


67-151: LGTM: Integration tests properly configured.

The migration to docker compose v2 syntax with explicit service lists and the addition of Zookeeper to the cache are appropriate updates. Per-run topic/schema prefixes maintain test isolation.


160-181: LGTM: Excellent addition of bake-based builds with caching.

Using Docker Buildx with bake-action and GitHub Actions cache significantly improves build performance and consistency. The mode=max cache strategy is appropriate for CI.


195-204: LGTM: K3s setup is properly configured.

The K3s installation and kubeconfig setup are correctly implemented with appropriate timeouts and namespace creation. The namespace creation uses --dry-run=client for idempotency.


240-262: LGTM: Service startup sequence is well-structured.

The sequential startup with certificate generation first, followed by infrastructure, then backend and workers, with appropriate waits and timeouts, ensures services are ready before tests run.


300-318: LGTM: Comprehensive log collection for E2E debugging.

The expanded log collection includes all relevant services (backend, orchestrators, workers, monitor) and Kubernetes resources, which will significantly aid in debugging E2E test failures.


206-238: No action needed—kubeconfig.yaml is already in the root .gitignore file, so the sensitive Kubernetes certificates will not be committed to the repository.


278-282: The environment variables REQUESTS_CA_BUNDLE="" and CURL_CA_BUNDLE="" are unused in the codebase. The e2e tests properly disable SSL verification in code by configuring an explicit ssl_context with verify_mode = ssl.CERT_NONE when creating the httpx.AsyncClient. The suggested alternative of using backend/certs/ca.crt is not viable—no such file exists, and the environment variables are never referenced by the actual test code.

Likely an incorrect or invalid review comment.

backend/tests/e2e/conftest.py (2)

11-14: Intentional difference from main conftest.py noted.

The E2E test_settings fixture returns Settings() directly without worker_id isolation, while backend/tests/conftest.py (lines 39-60) uses worker-specific isolation for pytest-xdist. The docstring correctly documents this as intentional ("no worker isolation") since E2E tests hit shared real containers.


17-29: LGTM - SSL disabled appropriately for localhost E2E tests.

Disabling SSL verification for localhost container tests is acceptable. The 60-second timeout is reasonable for E2E tests involving K8s pod lifecycle.

backend/tests/e2e/test_execution_routes.py (15)

18-33: LGTM!

Correctly validates that unauthenticated requests return 401 and includes meaningful error messages.


35-67: LGTM!

Good use of Pydantic schema validation and proper UUID verification. The status assertion correctly accounts for all valid states after submission.


69-98: LGTM!

Correctly handles the async nature of execution - validates schema immediately and conditionally checks output only if completed.


100-130: LGTM!

Good use of SSE-based terminal state waiting. The 120s timeout accounts for K8s pod lifecycle in E2E tests. Error validation correctly checks both stderr and stdout.
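
For reference, a polling-based equivalent of such a terminal-state waiter (a sketch only: the suite's wait_for_execution_terminal consumes SSE, and the result-endpoint path here is an assumption):

import asyncio
import httpx

TERMINAL = {"completed", "failed", "timeout", "cancelled"}

async def wait_terminal(client: httpx.AsyncClient, execution_id: str,
                        timeout: float = 120.0) -> str:
    """Poll an execution until it reaches a terminal status or times out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        resp = await client.get(f"/api/v1/executions/{execution_id}")  # assumed path
        status = str(resp.json().get("status", "")).lower()
        if status in TERMINAL:
            return status
        await asyncio.sleep(1.0)
    raise TimeoutError(f"{execution_id} not terminal within {timeout}s")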


132-171: LGTM!

Correctly validates resource tracking with schema validation and sensible bounds checking on execution time.


173-195: LGTM!

Appropriately flexible - accepts both successful execution and rejection of unsupported versions.


197-224: LGTM!

Reasonable smoke test for large output handling without blocking on completion.


226-262: LGTM!

Correctly handles the race condition between execution completion and cancellation. Good assertion message for debugging unexpected status codes.


264-298: LGTM!

Smart bounded test that verifies execution tracking without waiting for the platform's 300s timeout.


300-345: LGTM!

Comprehensive sandbox restriction testing with appropriate handling of various blocking outcomes (rejection, failure, or permission denied output).


347-376: LGTM!

Good concurrency test with proper handling of rate limiting and verification of execution ID uniqueness.


378-386: LGTM!

Correctly uses unauthenticated client for this public endpoint.


388-399: LGTM!

Good use of Pydantic model_validate for schema validation with sensible value assertions.


401-407: LGTM!

Correctly validates paginated response structure.


409-432: LGTM!

Critical test for idempotency guarantee - correctly validates that duplicate requests with the same key return the same execution ID.

backend/tests/conftest.py (8)

20-25: LGTM!

Good practice to disable OpenTelemetry exporters in tests to prevent stalls. Using setdefault correctly preserves any CI-specific overrides.


27-35: LGTM!

extra="ignore" is appropriate for test settings to allow flexibility in test environments.


38-60: LGTM - Well-designed pytest-xdist isolation.

Good worker isolation strategy. Note that setting os.environ["SCHEMA_SUBJECT_PREFIX"] (line 54) is a side effect that persists beyond the fixture scope, but this is acceptable since it's only read by schema registry clients during test initialization.


63-79: LGTM!

Session-scoped app with proper lifespan management avoids Pydantic schema validator memory issues. Container cleanup is correctly handled in teardown.


89-98: LGTM!

Function-scoped client with ASGITransport is correct for in-process API testing.


125-154: LGTM!

Well-designed helper that handles existing users gracefully (400/409 during registration). Using data= for login correctly implements OAuth2 password flow. CSRF token extraction and header setup is clean.
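
Sketched out, such a helper could look like this (endpoint paths and the csrf_token field mirror snippets quoted elsewhere in this review; the exact response shape is an assumption):

import httpx

async def register_and_login(client: httpx.AsyncClient,
                             username: str, password: str) -> dict:
    creds = {"username": username, "password": password}

    r = await client.post("/api/v1/auth/register", json=creds)
    if r.status_code not in (200, 201, 400, 409):  # 400/409: already registered
        raise RuntimeError(f"register failed: {r.status_code} - {r.text}")

    # OAuth2 password flow expects form data, hence data= rather than json=.
    login = await client.post("/api/v1/auth/login", data=creds)
    login.raise_for_status()

    headers: dict[str, str] = {}
    csrf = login.json().get("csrf_token")
    if csrf:
        headers["X-CSRF-Token"] = csrf
    return {"username": username, "password": password, "headers": headers}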


157-174: LGTM!

Clean factory pattern with proper type alias. The docstring examples are helpful for test authors.


177-199: LGTM - Documented mutation pattern.

The warning about not using both client and authenticated_client in the same test is important and well-documented. This efficient pattern avoids creating separate HTTP connections.

.github/workflows/frontend-ci.yml (7)

8-9: LGTM!

Adding docker-compose.yaml and docker-bake.hcl to the trigger paths ensures frontend E2E tests run when build configuration changes, which could affect the test environment.

Also applies to: 15-16


20-26: LGTM!

Centralizing Docker image versions as environment variables improves maintainability and ensures consistency across the workflow.


70-70: LGTM!

Adding ZOOKEEPER_IMAGE to the cache list ensures all infrastructure images are properly cached for faster CI runs.


111-111: LGTM!

The idempotent namespace creation using --dry-run=client -o yaml | kubectl apply -f - is a robust pattern that won't fail on re-runs.


144-169: LGTM!

The staged startup sequence with proper wait conditions (--wait --wait-timeout) ensures dependencies are ready before proceeding. The separation of concerns (certs → infra → backend → frontend) and explicit health check waiting is a robust approach for CI orchestration.


184-192: LGTM!

Comprehensive log collection for debugging CI failures, including both docker-compose and Kubernetes events. The || true on the kubectl command is a good defensive pattern.


87-101: The bake-action setup with GitHub Actions cache (type=gha, mode=max) is a good approach for efficient multi-target builds.

Regarding the version: latest in docker/setup-buildx-action@v3: Using latest for Docker Buildx in CI is acceptable and even recommended, as it ensures you receive security patches and bug fixes. If pinning is desired, the current stable version is v0.30.1, but this is not necessary for this workflow.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name=".github/workflows/backend-ci.yml">

<violation number="1" location=".github/workflows/backend-ci.yml:217">
P0: Invalid YAML: The heredoc content has no indentation, breaking the `run: |` literal block. GitHub Actions will fail to parse this workflow file. The content inside the heredoc must be indented consistently with the rest of the shell script block.</violation>
</file>


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/backend-ci.yml:
- Line 271: Add a GitHub Actions concurrency block to the E2E workflow to
prevent overlapping runs: insert a top-level (or E2E job-level) snippet such as
"concurrency: { group: backend-e2e-${{ github.ref }}, cancel-in-progress: true }"
alongside the existing workflow variables (near where KAFKA_TOPIC_PREFIX
and SCHEMA_SUBJECT_PREFIX are defined) so concurrent E2E runs on the same runner
are serialized and in-progress runs are canceled.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1017815 and b011e5e.

📒 Files selected for processing (2)
  • .github/workflows/backend-ci.yml
  • backend/tests/conftest.py
🧰 Additional context used
🪛 actionlint (1.7.10)
.github/workflows/backend-ci.yml

235-235: could not parse as YAML: could not find expected ':'

(syntax-check)

🪛 YAMLlint (1.37.1)
.github/workflows/backend-ci.yml

[error] 239-239: syntax error: could not find expected ':'

(syntax)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: E2E Tests
  • GitHub Check: cubic · AI code reviewer
🔇 Additional comments (14)
backend/tests/conftest.py (8)

20-30: LGTM! Environment setup prevents test infrastructure issues.

The environment variable configuration appropriately disables tracing/telemetry and rate limiting to prevent test flakiness. The use of setdefault allows override via environment if needed.

Note: These are module-level side effects that execute on import. This is standard for conftest.py but means all tests share this configuration.


32-40: LGTM! Test configuration follows best practices.

The TestSettings class correctly isolates test configuration by loading from .env.test instead of production .env.


43-65: LGTM! Worker isolation properly configured for parallel testing.

The fixture correctly provides worker-specific isolation for pytest-xdist, using unique database names, Kafka group suffixes, and Redis DB indices.

Minor note: Line 63 uses worker_num % 16 which means workers beyond 16 would cycle back (e.g., worker 0 and worker 16 share Redis DB 0). This is acceptable since DATABASE_NAME and KAFKA_GROUP_SUFFIX provide additional isolation, but with >16 workers there's a small risk of Redis key conflicts.
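A minimal sketch of the isolation scheme being described, assuming pytest-xdist's PYTEST_XDIST_WORKER environment variable and illustrative setting names:

import os

import pytest

@pytest.fixture(scope="session")
def worker_isolation() -> dict[str, object]:
    # pytest-xdist names workers gw0, gw1, ...; "gw0" is a safe single-process default
    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    worker_num = int(worker.removeprefix("gw") or "0")
    return {
        "DATABASE_NAME": f"testdb_{worker}",   # unique Mongo DB per worker
        "KAFKA_GROUP_SUFFIX": f"-{worker}",    # per-worker consumer groups
        "REDIS_DB": worker_num % 16,           # Redis ships with 16 logical DBs
    }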


68-91: LGTM! App lifecycle properly managed with async fixtures.

The app fixture correctly:

  • Uses worker-specific test_settings for isolation
  • Runs the lifespan context to initialize Beanie ODM and other startup tasks
  • Properly closes the Dishka container on teardown

Session scope is appropriate here for performance, and per-test fixtures like scope provide test isolation.


94-127: LGTM! Test client and dependency injection fixtures properly configured.

The fixtures correctly provide:

  • Async HTTPX client with ASGI transport for in-memory API testing
  • Per-test Dishka container scope for proper test isolation
  • Database and Redis client injection from the container scope

162-179: LGTM! Factory pattern enables flexible multi-user testing.

The MakeUser type alias and make_user fixture provide an excellent pattern for creating multiple users with different roles within a single test. This is much more flexible than fixed-role fixtures.

The docstring examples clearly demonstrate the intended usage.
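The shape of that pattern, sketched with assumed names (the real MakeUser alias, client fixture, and endpoint live in the repo's conftest.py):

from typing import Any, Awaitable, Callable

import pytest

MakeUser = Callable[..., Awaitable[dict[str, Any]]]

@pytest.fixture
def make_user(client) -> MakeUser:  # `client` assumed to be the async HTTPX fixture
    async def _make(role: str = "user", **overrides: Any) -> dict[str, Any]:
        payload = {"username": f"user-{role}", "role": role, **overrides}
        resp = await client.post("/api/v1/users", json=payload)  # endpoint is a guess
        return resp.json()

    return _make

# In a test: admin = await make_user(role="admin"); viewer = await make_user()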


182-204: Client mutation pattern is acceptable with clear documentation.

These fixtures mutate and return the same client instance with authentication headers applied. While fixture mutation is typically a code smell, the explicit warnings in the docstrings (lines 186-188, 200) mitigate confusion by directing users to the make_user fixture for multi-user scenarios.

For single-user convenience tests, this pattern reduces boilerplate.


130-160: The test helper correctly handles user creation and authentication with worker-isolated databases. The intentional lack of per-test user cleanup is appropriate given the pytest-xdist worker isolation strategy, where each worker has a dedicated database. No further action needed.

.github/workflows/backend-ci.yml (6)

166-181: LGTM: Docker Buildx and bake integration.

The Buildx setup and bake action configuration correctly implements build caching with GitHub Actions cache. The mode=max setting ensures maximum cache reuse, and load: true ensures images are available for subsequent steps.


196-204: LGTM: K3s setup is correctly configured.

The K3s installation disables Traefik to avoid port conflicts, configures kubeconfig with appropriate permissions, and creates the namespace idempotently. The 90-second timeout for cluster readiness is reasonable.


207-237: Kubeconfig generation logic is correct; ignore static analysis warnings.

The kubeconfig generation for containerized workers is well-designed: it extracts certificates from the k3s kubeconfig and creates a new configuration with the host IP so containers can reach the k3s API server.

The static analysis warnings at lines 235 and 239 are false positives—the YAML linters are incorrectly parsing the kubeconfig content inside the bash heredoc as part of the workflow YAML syntax.


279-282: LGTM: Certificate verification correctly disabled for E2E tests.

Setting empty CA bundle environment variables disables certificate verification for the self-signed certificates used in the E2E environment. This is appropriate for testing and is clearly documented in the comments.


93-96: Infrastructure service list is complete and correct.

All six services (mongo redis zookeeper-certgen zookeeper kafka schema-registry) are properly defined in docker-compose.yaml. The explicit list correctly provides the minimum infrastructure needed for integration tests without running application workers or backend services.


240-262: Service startup sequence is well-structured.

The three-phase startup (certificates → infrastructure → backend/workers) with explicit waits ensures proper initialization order. Docker Compose properly handles init job dependencies: kafka-init has service_completed_successfully condition and all workers depend on kafka-init, ensuring Kafka topics exist before workers start. The --no-build flag correctly relies on images pre-built by the bake action.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/frontend-ci.yml (1)

115-115: Consider more robust HOST_IP extraction.

The awk '{print $7}' assumes the host IP is always the 7th field in ip route get 1 output. This can be fragile if the output format varies across Linux distributions or versions.

♻️ More robust extraction
-          HOST_IP=$(ip route get 1 | awk '{print $7}')
+          HOST_IP=$(ip route get 1 | awk '{for(i=1;i<=NF;i++) if($i=="src") print $(i+1)}')

This parses by looking for the src keyword rather than relying on field position.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b011e5e and 4a00f3c.

📒 Files selected for processing (1)
  • .github/workflows/frontend-ci.yml
🧰 Additional context used
🪛 actionlint (1.7.10)
.github/workflows/frontend-ci.yml

141-141: could not parse as YAML: could not find expected ':'

(syntax-check)

🪛 YAMLlint (1.37.1)
.github/workflows/frontend-ci.yml

[error] 144-144: syntax error: could not find expected ':'

(syntax)

🔇 Additional comments (6)
.github/workflows/frontend-ci.yml (6)

20-26: Good practice: Centralized image version management.

Defining Docker images as environment variables improves maintainability and makes version updates easier.


87-101: LGTM: Bake-based build with caching.

The Docker Buildx setup with bake-action and GitHub Actions cache integration aligns well with the PR's move to docker-bake.hcl. This should improve build performance and consistency.


103-112: LGTM: Kubernetes setup with proper configuration.

The k3s setup is well-configured with appropriate TLS SAN for Docker container communication and idempotent namespace creation.


122-141: Static analysis false positive: heredoc YAML is valid.

The YAML linter reports a syntax error at line 141 because it's attempting to parse the heredoc content (lines 123-140) as part of the workflow YAML. This is a false positive—the shell heredoc syntax is correct. The EOF delimiter properly terminates the cat > kubeconfig.yaml << EOF command started at line 122.


144-170: LGTM: Well-structured staged startup.

The staged startup sequence (certificates → infrastructure → backend/workers → frontend) with appropriate wait times and health checks ensures services start in the correct order with proper dependency handling.


184-192: LGTM: Comprehensive log collection for debugging.

Log collection covers all critical services and includes Kubernetes events with appropriate error handling. This will aid in debugging CI failures.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In @backend/tests/integration/db/repositories/test_event_repository.py:
- Around line 209-225: The MyPy error is caused by pipeline being inferred as
list[object]; in test_aggregate_events, add an explicit type annotation to
pipeline as list[dict[str, Any]] and ensure Any is imported at the top of the
file; keep the same pipeline content and pass it to
EventRepository.aggregate_events as before.

In @backend/tests/integration/db/repositories/test_notification_repository.py:
- Around line 20-39: The factory _make_notification_create is constructing
DomainNotificationCreate with wrong fields; update it to match the create schema
by removing notification_id and status, rename title→subject and message→body,
and change channels (list)→channel (single NotificationChannel.IN_APP), and
ensure tags default remains; then update the test assertion that currently
checks retrieved.title to instead check retrieved.subject (DomainNotification
uses subject/body).
- Around line 42-60: The factory _make_notification_create is passing title and
message to DomainNotificationCreate but the domain model expects subject and
body, and the test later accesses retrieved.title which doesn't exist on
DomainNotification; update _make_notification_create to map title -> subject and
message -> body when constructing DomainNotificationCreate (use the
DomainNotificationCreate fields subject and body) and change the assertion in
test_create_and_get_notification from asserting retrieved.title to asserting
retrieved.subject so it matches DomainNotification's API.

In @backend/tests/unit/events/test_kafka_events.py:
- Around line 274-285: The test functions use a too-generic annotation
`event_cls: type`, causing MyPy to complain when accessing the class attribute
`topic`; change the annotation to `event_cls: type[BaseEvent]` (or the
appropriate BaseEvent subclass union) in
`test_execution_event_types_and_topics`, `test_saga_event_types`,
`test_saga_command_events`, `test_notification_event_types`, and
`test_user_event_types` so MyPy recognizes that `event_cls` has the `topic`
class variable and other BaseEvent members.
- Around line 106-118: The test test_ensure_correlation_id_generates_when_empty
has a tautological assertion; update it to call
AvroEventMetadata.ensure_correlation_id() and assert that the returned
result.correlation_id is non-empty and a valid UUID (e.g., parseable by
uuid.UUID) and that it differs from the original empty string; reference
AvroEventMetadata.ensure_correlation_id and validate the UUID format rather than
using the current always-true assertion.

In @backend/tests/unit/services/grafana/test_grafana_alert_processor.py:
- Line 51: Remove the now-unnecessary "# type: ignore" suppression from the test
tuple entry; specifically edit the test case tuple (None, None, "warning") in
test_grafana_alert_processor.py and delete the trailing " # type: ignore" so
mypy no longer reports an unused ignore comment.
- Line 184: The trailing "# type: ignore" on the tuple (None, "Alert triggered")
in the test_grafana_alert_processor.py unit test is unnecessary; remove that
unused type-ignore comment so the line becomes just the tuple literal. Locate
the tuple occurrence in the test case (search for (None, "Alert triggered") in
test_grafana_alert_processor.py) and delete the trailing "# type: ignore" token.
- Line 140: Remove the unused "# type: ignore" comment from the tuple literal
(None, None, "Grafana Alert") in the test data within
test_grafana_alert_processor.py so the test no longer contains an unnecessary
type-ignore annotation.
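For the first item above (the list[object] inference), the annotated test would look roughly like this; the pipeline stage contents are assumptions about the test's intent:

from typing import Any

async def test_aggregate_events(repo) -> None:  # repo fixture as in the suite
    # Explicit annotation prevents MyPy from inferring list[object] for the
    # heterogeneous stage dicts.
    pipeline: list[dict[str, Any]] = [
        {"$match": {"metadata.service_name": "test-service"}},
        {"$group": {"_id": "$event_type", "count": {"$sum": 1}}},
    ]
    result = await repo.aggregate_events(pipeline, limit=100)
    assert len(result.results) >= 1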
🧹 Nitpick comments (5)
backend/tests/unit/services/auth/test_auth_service.py (3)

156-160: Remove unnecessary parametrization.

Using @pytest.mark.parametrize with a single value adds overhead without benefit. Consider removing the decorator and testing UserRole.USER directly.

♻️ Simplify by removing parametrize
-    @pytest.mark.parametrize(
-        "role",
-        [UserRole.USER],
-        ids=["regular-user"],
-    )
     async def test_raises_for_non_admin_role(
         self,
         auth_service: AuthService,
         mock_logger: MagicMock,
-        role: UserRole,
     ) -> None:
         """Raises AdminAccessRequiredError for non-admin roles."""
         fake_user = FakeUser(
             user_id="user-123",
             username="normaluser",
             email="[email protected]",
-            role=role,
+            role=UserRole.USER,
         )

183-184: Logger assertion is fragile.

Asserting on the exact content of log messages couples the test to the log message format. If the log message changes, this test breaks even though the behavior remains correct.

Consider either:

  • Verifying only that warning was called (if the message content isn't critical)
  • Extracting log message format to a constant that both the code and test reference

199-207: Use FakeRequest for consistency.

This test uses MagicMock() while other tests use FakeRequest. Consider using FakeRequest for consistency across the test suite, even when testing edge cases.

♻️ Align with other tests

You could either:

  1. Extend FakeRequest to support this edge case, or
  2. Keep MagicMock but add a comment explaining why this test requires different mocking

If the intent is to test cookie handling when cookies.get() returns None (vs. an empty dict), the current approach works but the distinction should be documented.

backend/tests/integration/db/repositories/test_saga_repository.py (2)

10-10: Unused logger variable.

_test_logger is defined but never used in this file. Consider removing it or using it for test debugging output.

♻️ Suggested fix
-_test_logger = logging.getLogger("test.db.repositories.saga_repository")

And if logging is no longer needed:

-import logging

110-116: Consider verifying exact counts for stronger assertions.

Using >= 3 and >= 2 assertions is a pragmatic choice for shared-database integration tests, but since unique_id provides per-test isolation for the execution_id, you could potentially assert exact counts (== 3 and == 2) for the sagas created in this test. This would catch scenarios where fewer records than expected are inserted.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2153bf and 00e2f42.

📒 Files selected for processing (9)
  • backend/tests/integration/db/repositories/test_event_repository.py
  • backend/tests/integration/db/repositories/test_notification_repository.py
  • backend/tests/integration/db/repositories/test_saga_repository.py
  • backend/tests/unit/events/test_kafka_events.py
  • backend/tests/unit/services/auth/__init__.py
  • backend/tests/unit/services/auth/test_auth_service.py
  • backend/tests/unit/services/grafana/__init__.py
  • backend/tests/unit/services/grafana/test_grafana_alert_processor.py
  • tests.md
✅ Files skipped from review due to trivial changes (1)
  • tests.md
🧰 Additional context used
🧬 Code graph analysis (4)
backend/tests/integration/db/repositories/test_notification_repository.py (2)
backend/app/db/repositories/notification_repository.py (8)
  • get_notification (41-45)
  • update_notification (30-39)
  • count_notifications (94-96)
  • try_claim_pending (104-117)
  • find_pending_notifications (119-132)
  • upsert_subscription (163-180)
  • get_subscription (155-161)
  • get_all_subscriptions (182-190)
backend/tests/integration/conftest.py (1)
  • unique_id (9-22)
backend/tests/integration/db/repositories/test_event_repository.py (4)
backend/app/db/repositories/event_repository.py (1)
  • EventRepository (23-295)
backend/app/domain/events/event_metadata.py (1)
  • EventMetadata (10-25)
backend/app/domain/events/event_models.py (1)
  • correlation_id (60-61)
backend/tests/integration/conftest.py (1)
  • unique_id (9-22)
backend/tests/unit/events/test_kafka_events.py (6)
backend/app/domain/enums/auth.py (1)
  • LoginMethod (4-10)
backend/app/domain/enums/kafka.py (1)
  • KafkaTopic (7-53)
backend/app/domain/execution/models.py (1)
  • ResourceUsageDomain (15-19)
backend/app/infrastructure/kafka/events/base.py (3)
  • BaseEvent (13-37)
  • to_dict (32-34)
  • to_json (36-37)
backend/app/infrastructure/kafka/events/execution.py (7)
  • ExecutionCompletedEvent (86-93)
  • ExecutionFailedEvent (96-105)
  • ExecutionQueuedEvent (61-66)
  • ExecutionRequestedEvent (13-49)
  • ExecutionRunningEvent (69-74)
  • ExecutionStartedEvent (77-83)
  • ExecutionTimeoutEvent (108-115)
backend/app/infrastructure/kafka/events/metadata.py (2)
  • AvroEventMetadata (9-31)
  • ensure_correlation_id (28-31)
backend/tests/unit/services/grafana/test_grafana_alert_processor.py (1)
backend/app/services/grafana_alert_processor.py (7)
  • extract_severity (33-37)
  • map_severity (40-44)
  • extract_title (47-49)
  • build_message (52-61)
  • build_metadata (64-71)
  • process_single_alert (73-108)
  • process_webhook (110-150)
🪛 GitHub Actions: MyPy Type Checking
backend/tests/integration/db/repositories/test_notification_repository.py

[error] 30-30: Unexpected keyword argument "notification_id" for "DomainNotificationCreate" [call-arg]


[error] 30-30: Unexpected keyword argument "title" for "DomainNotificationCreate" [call-arg]


[error] 30-30: Unexpected keyword argument "message" for "DomainNotificationCreate" [call-arg]


[error] 30-30: Unexpected keyword argument "status" for "DomainNotificationCreate" [call-arg]


[error] 30-30: Unexpected keyword argument "channels" for "DomainNotificationCreate"; did you mean "channel"? [call-arg]


[error] 59-59: "DomainNotification" has no attribute "title" [attr-defined]

backend/tests/integration/db/repositories/test_event_repository.py

[error] 223-223: Argument 1 to "aggregate_events" of "EventRepository" has incompatible type "list[object]"; expected "list[dict[str, Any]]" [arg-type]

backend/tests/unit/events/test_kafka_events.py

[error] 285-285: "type" has no attribute "topic" [attr-defined]


[error] 409-409: "type" has no attribute "topic" [attr-defined]


[error] 491-491: "type" has no attribute "topic" [attr-defined]


[error] 576-576: "type" has no attribute "topic" [attr-defined]


[error] 639-639: "type" has no attribute "topic" [attr-defined]

backend/tests/unit/services/grafana/test_grafana_alert_processor.py

[error] 51-51: Unused "type: ignore" comment [unused-ignore]


[error] 140-140: Unused "type: ignore" comment [unused-ignore]


[error] 184-184: Unused "type: ignore" comment [unused-ignore]
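The five "type" has no attribute "topic" failures above all come from annotating the parametrized class as bare type; a sketch of the fix, with the parametrize values and assertion body illustrative only:

import pytest

from app.infrastructure.kafka.events.base import BaseEvent
from app.infrastructure.kafka.events.execution import (
    ExecutionQueuedEvent,
    ExecutionRequestedEvent,
)

@pytest.mark.parametrize("event_cls", [ExecutionRequestedEvent, ExecutionQueuedEvent])
def test_execution_event_types_and_topics(event_cls: type[BaseEvent]) -> None:
    # Annotating as type[BaseEvent] (not bare `type`) lets MyPy see the
    # `topic` class attribute declared on BaseEvent.
    assert event_cls.topic is not None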

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: cubic · AI code reviewer
🔇 Additional comments (35)
backend/tests/unit/services/auth/test_auth_service.py (4)

14-39: LGTM: Clean test mocks.

The FakeUser and FakeRequest classes provide minimal, focused mocks for testing authentication flows. The structure is clear and appropriate for unit tests.


41-57: LGTM: Well-structured fixtures.

The fixtures provide appropriate mocks for AuthService dependencies. The typing and structure are correct.


59-131: LGTM: Comprehensive authentication tests.

The test coverage for get_current_user is thorough, including:

  • Missing/empty token validation
  • Multiple role scenarios via parametrization
  • Exception propagation from the security service
  • Proper verification of security_service interactions

209-226: LGTM: Timestamp preservation verified.

This test ensures that created_at and updated_at fields propagate correctly from the domain user to the response, which is important for data integrity.

backend/tests/unit/services/grafana/test_grafana_alert_processor.py (6)

13-34: LGTM! Well-structured test fixtures.

The fixtures properly mock dependencies using AsyncMock for async operations and MagicMock with spec for the logger, ensuring type-safe testing of the GrafanaAlertProcessor.


62-74: LGTM! Comprehensive severity extraction tests.

The parameterized test covers all precedence scenarios, case-insensitivity, and default behavior effectively.


77-123: LGTM! Thorough severity mapping tests.

The test suite comprehensively validates severity mapping behavior, including status-aware overrides and edge cases for unknown severities.


197-207: LGTM! Message building tests cover all annotation scenarios.

The tests validate proper message construction from summary/description annotations with correct fallback behavior.


210-293: LGTM! Comprehensive metadata and single alert processing tests.

The test suite effectively validates:

  • Metadata construction with proper status and label precedence
  • Successful alert processing with correct notification payloads
  • Error handling with appropriate logging

296-402: LGTM! Excellent webhook processing and constants validation tests.

The test suite effectively validates:

  • Batch processing of multiple alerts with correct counts
  • Resilience to individual alert failures (continues processing)
  • Proper logging at webhook level
  • Class-level constants ensuring expected mappings and defaults

The use of async def side_effect with nonlocal for simulating partial failures is a good pattern for testing error resilience.
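A distilled, self-contained version of that pattern (pytest-asyncio style assumed):

from unittest.mock import AsyncMock

import pytest

@pytest.mark.asyncio
async def test_partial_failure_pattern() -> None:
    calls = 0

    async def flaky(*args: object, **kwargs: object) -> None:
        nonlocal calls  # closure state: fail exactly the second call
        calls += 1
        if calls == 2:
            raise RuntimeError("boom")

    service = AsyncMock(side_effect=flaky)

    outcomes = []
    for i in range(3):
        try:
            await service(alert=i)
            outcomes.append("ok")
        except RuntimeError:
            outcomes.append("failed")

    assert outcomes == ["ok", "failed", "ok"]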

backend/tests/integration/db/repositories/test_saga_repository.py (4)

15-32: Well-designed factory function.

The _make_saga factory provides sensible defaults and the execution_id derivation from saga_id is a practical approach for test isolation.


263-265: Same timestamp persistence concern for completed_at.

Similar to the created_at concern above, ensure that completed_at is preserved through upsert_saga. If not, the average_duration_seconds statistic might not be computed correctly for these test sagas.


35-81: Good test coverage for basic CRUD operations.

These tests properly cover the upsert behavior (insert vs update detection), retrieval by ID, and not-found scenarios. The use of the unique_id fixture ensures test isolation.


180-219: Thorough state update testing.

Good coverage of state updates including success path, error message propagation, and not-found behavior.

backend/tests/integration/db/repositories/test_event_repository.py (15)

1-14: LGTM! Clean test module setup.

The imports, test logger, and integration marker are well-organized and appropriate for integration testing of the EventRepository.


16-40: LGTM! Well-designed test factory.

The _make_event factory provides a clean way to create Event domain objects with sensible defaults. Tests that require isolation properly override the defaults using the unique_id fixture.


42-56: LGTM! Solid foundation test.

This test validates the basic store-and-retrieve flow, which is essential for all other repository operations.


58-64: LGTM! Good edge case coverage.

Validates the repository's behavior when querying non-existent events.


66-90: LGTM! Comprehensive aggregate query test.

Tests both unfiltered and type-filtered retrieval, with appropriate use of >= assertions for integration test isolation.


92-115: LGTM! Thorough pagination test.

Validates both the first and second page of results, including the has_more flag for pagination UX.


117-138: LGTM! Well-designed filtering test.

Validates the system event exclusion logic by creating both types of events and verifying the filter works correctly.


140-160: LGTM! Good statistics test.

Tests the aggregation pipeline with time filtering and validates the returned data structure.


162-188: LGTM! Comprehensive user event test.

Tests both unfiltered and type-filtered user event retrieval with proper validation.


190-207: LGTM! Good arbitrary query test.

Validates the generic query interface with metadata filtering.


227-243: LGTM! Good distinct values test.

Tests the list_event_types method with appropriate filtering.


245-263: LGTM! Thorough archival test.

Validates both the archival record creation and the removal of the original event.


265-273: LGTM! Good edge case coverage.

Validates graceful handling of deletion attempts on non-existent events.


275-299: LGTM! Comprehensive replay info test.

Tests the replay information aggregation with proper timestamp ordering and validation.


301-306: LGTM! Good edge case coverage.

Validates the behavior when querying replay info for non-existent aggregates.

backend/tests/unit/events/test_kafka_events.py (6)

59-78: LGTM! Well-structured test fixtures.

The metadata and resource_usage fixtures provide clean, reusable test data that aligns with the domain models.


121-207: LGTM! Comprehensive BaseEvent tests.

The tests thoroughly validate auto-generation of event_id and timestamp, plus both to_dict and to_json serialization paths with proper JSON integrity checks.


287-337: LGTM! Thorough execution event validation.

The tests for ExecutionCompleted, ExecutionFailed, and ExecutionTimeout events properly validate event types, topics, and domain-specific fields including resource usage.


411-425: LGTM! Comprehensive saga cancellation test.

The test properly validates all saga cancellation details including completed steps, compensated steps, and cancellation metadata.


641-653: LGTM! User settings event test validates correctly.

The test properly verifies event type, topic, and changed_fields for user settings updates.


655-716: LGTM! Excellent serialization edge case coverage.

The tests properly validate:

  • Nested structures (pod_spec, runtime_command lists)
  • Unicode characters and emoji in scripts
  • Optional fields with None values

This ensures robust event serialization across real-world scenarios.

@cubic-dev-ai cubic-dev-ai bot left a comment

5 issues found across 9 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/tests/unit/events/test_kafka_events.py">

<violation number="1" location="backend/tests/unit/events/test_kafka_events.py:118">
P2: Tautological assertion that always passes. Since `metadata.correlation_id` is set to `""`, the condition `metadata.correlation_id == ""` is always `True`, making this assertion meaningless. The test should verify that a new correlation_id was generated.</violation>
</file>

<file name="backend/tests/integration/db/repositories/test_event_repository.py">

<violation number="1" location="backend/tests/integration/db/repositories/test_event_repository.py:101">
P1: The `unique_id("evt-")` call inside the loop returns the same ID on every iteration because the fixture generates a fixed suffix per test. All 5 events will have duplicate `event_id` values, causing potential database errors or test failures. Each iteration needs a truly unique ID.</violation>

<violation number="2" location="backend/tests/integration/db/repositories/test_event_repository.py:149">
P1: The `unique_id("evt-")` call returns the same ID on every iteration because the fixture generates a fixed suffix per test. All 4 events will have duplicate `event_id` values. Consider using `f"{unique_id('evt-')}-{event_type.value}-{i}"` or similar to ensure uniqueness.</violation>

<violation number="3" location="backend/tests/integration/db/repositories/test_event_repository.py:198">
P1: The `unique_id("evt-")` call returns the same ID on every iteration. All 3 events will have duplicate `event_id` values.</violation>

<violation number="4" location="backend/tests/integration/db/repositories/test_event_repository.py:217">
P1: The `unique_id("evt-")` call returns the same ID on every iteration. All 3 events will have duplicate `event_id` values.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @backend/tests/integration/db/repositories/test_event_repository.py:
- Around line 163-189: The test test_get_user_events_paginated mixes enum formats (one call passes EventType.EXECUTION_REQUESTED.value while others pass the enum member) and uses a non-strict assertion on the total count; update the test to pass event types in whichever single format get_user_events_paginated expects (either EventType.EXECUTION_REQUESTED or its .value, consistently) and change the assertion to assert result.total == 3 so it requires the exact stored count; ensure the filtering call to get_user_events_paginated uses the same enum type as the earlier calls so the filtered.events check remains valid.

In @backend/tests/unit/events/test_kafka_events.py:
- Around line 106-118: The test test_ensure_correlation_id_generates_when_empty
is using a weak assertion; update it to assert that calling
AvroEventMetadata.ensure_correlation_id on an instance with correlation_id=""
returns a non-empty, valid UUID string and (optionally) differs from the
original empty value. Specifically, in the test for ensure_correlation_id
replace the final assertion with checks that result.correlation_id is not empty
and can be parsed as a UUID (use uuid.UUID(result.correlation_id) to validate).
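A sketch of the strengthened assertion for that last item; the AvroEventMetadata constructor fields other than correlation_id are assumptions:

import uuid

from app.infrastructure.kafka.events.metadata import AvroEventMetadata

def test_ensure_correlation_id_generates_when_empty() -> None:
    metadata = AvroEventMetadata(
        service_name="svc", service_version="1.0", correlation_id=""  # fields assumed
    )
    result = metadata.ensure_correlation_id()
    assert result.correlation_id != ""    # no longer tautological
    uuid.UUID(result.correlation_id)      # raises ValueError if not a valid UUID
    assert metadata.correlation_id == ""  # original instance untouched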
🧹 Nitpick comments (15)
backend/tests/integration/db/repositories/test_notification_repository.py (5)

19-34: Consider making the channel parameter configurable.

The factory helper always creates IN_APP notifications. While this works, adding a channel parameter with a default value would allow tests to more easily verify behavior across different notification channels.

♻️ Optional enhancement to add channel parameter
 def _make_notification_create(
     user_id: str,
     subject: str = "Test Notification",
     body: str = "Test message content",
     severity: NotificationSeverity = NotificationSeverity.MEDIUM,
     tags: list[str] | None = None,
+    channel: NotificationChannel = NotificationChannel.IN_APP,
 ) -> DomainNotificationCreate:
     """Factory for notification create data."""
     return DomainNotificationCreate(
         user_id=user_id,
-        channel=NotificationChannel.IN_APP,
+        channel=channel,
         subject=subject,
         body=body,
         severity=severity,
         tags=tags or ["test"],
     )

118-133: Consider using exact assertion for better test precision.

Line 132 uses assert count >= 3, but since the test creates exactly 3 DELIVERED notifications for a unique user, the assertion could be assert count == 3 for more precise verification.

♻️ Proposed change for exact assertion
     count = await repo.mark_all_as_read(user_id)
-    assert count >= 3
+    assert count == 3

150-177: Test coverage could be expanded for additional filters.

The list_notifications method supports exclude_tags, tag_prefix, and pagination (skip, limit) parameters, but these aren't tested here. Consider adding test cases for:

  • Filtering with exclude_tags
  • Filtering with tag_prefix (e.g., tag_prefix="alert" should match "alert:critical")
  • Pagination behavior with skip and limit
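These could look roughly like the following sketch; the creation helper and parameter names are taken from this review comment, not verified against the code:

async def test_list_notifications_filters(repo, unique_id) -> None:
    user_id = unique_id("user-")
    for tag in ("alert:critical", "alert:minor", "digest"):
        await repo.create_notification(_make_notification_create(user_id, tags=[tag]))  # creation method assumed

    by_prefix = await repo.list_notifications(user_id, tag_prefix="alert")
    assert all(any(t.startswith("alert") for t in n.tags) for n in by_prefix)

    excluded = await repo.list_notifications(user_id, exclude_tags=["digest"])
    assert all("digest" not in n.tags for n in excluded)

    page = await repo.list_notifications(user_id, skip=1, limit=1)  # pagination
    assert len(page) <= 1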

229-241: Missing edge case for scheduled notifications.

The find_pending_notifications method in the repository respects scheduled_for and only returns notifications where scheduled_for is None or in the past. Consider adding a test case that creates a notification with a future scheduled_for timestamp and verifies it's excluded from the results.

Additionally, the repository has a find_scheduled_notifications method that isn't tested at all.
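A sketch of that edge case; how scheduled_for is attached at creation time, and the find_pending_notifications signature, are assumptions:

from datetime import datetime, timedelta, timezone

async def test_pending_excludes_future_scheduled(repo, unique_id) -> None:
    user_id = unique_id("user-")
    future = datetime.now(timezone.utc) + timedelta(hours=1)
    created = await repo.create_notification(
        _make_notification_create(user_id), scheduled_for=future  # signature assumed
    )
    pending = await repo.find_pending_notifications(limit=100)
    # A notification scheduled for the future must not be claimed as pending
    assert all(n.notification_id != created.notification_id for n in pending)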


1-283: Recommend adding tests for remaining repository methods.

Several public methods in NotificationRepository lack integration test coverage:

  1. find_scheduled_notifications - Retrieves pending notifications with scheduled_for <= now
  2. cleanup_old_notifications - Deletes notifications older than a specified number of days
  3. get_users_by_roles - Queries users by role for notification targeting
  4. get_active_users - Identifies recently active users across multiple collections

Adding tests for these methods would provide more comprehensive coverage and confidence in the notification system's behavior.

Would you like me to generate test cases for these missing methods?

backend/tests/unit/services/grafana/test_grafana_alert_processor.py (1)

13-18: Consider simplifying the mock setup.

Line 17 explicitly assigns AsyncMock() to create_system_notification, but AsyncMock already provides async mocks for all attribute accesses automatically. The explicit assignment is redundant.

♻️ Simplified fixture
 @pytest.fixture
 def mock_notification_service() -> AsyncMock:
     """Mock notification service."""
-    service = AsyncMock()
-    service.create_system_notification = AsyncMock()
-    return service
+    return AsyncMock()
backend/tests/integration/db/repositories/test_event_repository.py (9)

17-40: Consider converting to a pytest fixture for better reusability.

The factory function works well as-is, but converting it to a @pytest.fixture would improve discoverability and allow pytest's dependency injection. This is a minor style preference.

♻️ Optional refactor to fixture pattern
-def _make_event(
-    event_id: str,
-    event_type: EventType = EventType.EXECUTION_REQUESTED,
-    aggregate_id: str | None = None,
-    correlation_id: str = "corr-test",
-    user_id: str | None = None,
-    service_name: str = "test-service",
-    timestamp: datetime | None = None,
-) -> Event:
-    """Factory for Event domain objects."""
-    return Event(
-        event_id=event_id,
-        event_type=event_type,
-        event_version="1.0",
-        timestamp=timestamp or datetime.now(timezone.utc),
-        metadata=EventMetadata(
-            service_name=service_name,
-            service_version="1.0.0",
-            correlation_id=correlation_id,
-            user_id=user_id,
-        ),
-        payload={"test": "data", "execution_id": aggregate_id},
-        aggregate_id=aggregate_id,
-    )
+@pytest.fixture
+def make_event() -> Callable[..., Event]:
+    """Factory fixture for Event domain objects."""
+    def _factory(
+        event_id: str,
+        event_type: EventType = EventType.EXECUTION_REQUESTED,
+        aggregate_id: str | None = None,
+        correlation_id: str = "corr-test",
+        user_id: str | None = None,
+        service_name: str = "test-service",
+        timestamp: datetime | None = None,
+    ) -> Event:
+        return Event(
+            event_id=event_id,
+            event_type=event_type,
+            event_version="1.0",
+            timestamp=timestamp or datetime.now(timezone.utc),
+            metadata=EventMetadata(
+                service_name=service_name,
+                service_version="1.0.0",
+                correlation_id=correlation_id,
+                user_id=user_id,
+            ),
+            payload={"test": "data", "execution_id": aggregate_id},
+            aggregate_id=aggregate_id,
+        )
+    return _factory

Then update test signatures to accept make_event fixture and call make_event(...) instead of _make_event(...).


67-91: Consider using exact count assertions for more precise tests.

Line 84 uses assert len(result) >= 3, which is less precise than assert len(result) == 3. While the >= comparison provides flexibility, exact assertions better validate expected behavior and catch regressions. The unique_id fixture suggests test isolation is intended, so exact counts should be safe.

The same applies to line 90's filtering assertion, which should verify the exact filtered count.

♻️ More precise assertions
     # Retrieve all
     result = await repo.get_events_by_aggregate(aggregate_id)
-    assert len(result) >= 3
+    assert len(result) == 3
 
     # Filter by event type
     filtered = await repo.get_events_by_aggregate(
         aggregate_id, event_types=[EventType.EXECUTION_QUEUED]
     )
+    assert len(filtered) == 1
     assert all(e.event_type == EventType.EXECUTION_QUEUED for e in filtered)

93-116: Use exact assertions for pagination test validation.

Lines 109 and 115 use >= comparisons, which reduce test precision. Since the test creates exactly 5 events, assertions should verify exact counts:

  • Line 109: assert result.total == 5
  • Line 115: assert len(page2.events) == 2

This ensures pagination logic works correctly and catches any off-by-one errors.

♻️ Exact pagination assertions
     result = await repo.get_events_by_correlation(correlation_id, limit=3, skip=0)
-    assert result.total >= 5
+    assert result.total == 5
     assert len(result.events) == 3
     assert result.has_more is True
 
     # Get second page
     page2 = await repo.get_events_by_correlation(correlation_id, limit=3, skip=3)
-    assert len(page2.events) >= 2
+    assert len(page2.events) == 2

118-139: Verify exact event count for better test precision.

Line 134 should assert all_events.total == 2 since exactly 2 events are stored. This provides stronger validation of the repository behavior.

♻️ Exact count assertion
     # Without filter
     all_events = await repo.get_execution_events(execution_id, exclude_system_events=False)
-    assert all_events.total >= 2
+    assert all_events.total == 2

141-161: Strengthen statistics validation with specific assertions.

The test creates 4 events but only verifies stats.total_events > 0, which is too weak. Consider asserting:

  • Exact total: assert stats.total_events >= 4
  • Type counts: verify that events_by_type contains expected counts
  • Service breakdown: validate events_by_service entries
♻️ Enhanced statistics validation
     stats = await repo.get_event_statistics(
         start_time=now - timedelta(hours=1),
         end_time=now + timedelta(hours=1),
     )
 
-    assert stats.total_events > 0
+    assert stats.total_events >= 4
     assert isinstance(stats.events_by_type, dict)
+    assert EventType.EXECUTION_REQUESTED.value in stats.events_by_type
+    assert EventType.EXECUTION_COMPLETED.value in stats.events_by_type
     assert isinstance(stats.events_by_service, dict)
+    assert "test-service" in stats.events_by_service

191-208: Use exact count assertion.

Line 206 should assert result.total == 3 since exactly 3 events are stored.


210-226: Consider validating aggregation result structure.

The test only checks that results exist (len(result.results) > 0). For more robust validation, assert the expected structure returned by the $group stage, such as verifying that results contain _id and count fields.

♻️ Enhanced aggregation validation
     result = await repo.aggregate_events(pipeline, limit=100)
-    assert len(result.results) > 0
+    assert len(result.results) >= 1
+    # Validate aggregation structure
+    for item in result.results:
+        assert "_id" in item
+        assert "count" in item
+        assert item["count"] >= 1

228-244: Strengthen event type validation.

Line 243 should use assert len(types) == 2 for precision. Additionally, verify that the expected event types are present in the results.

♻️ Precise type validation
     types = await repo.list_event_types(match={"metadata.service_name": service_name})
-    assert len(types) >= 2
+    assert len(types) == 2
+    assert EventType.EXECUTION_REQUESTED.value in types
+    assert EventType.EXECUTION_COMPLETED.value in types

276-300: Use exact count assertion for replay info.

Line 297 should assert info.event_count == 3 since exactly 3 events are stored. This ensures the replay info aggregation is accurate.

♻️ Exact replay count
     info = await repo.get_aggregate_replay_info(aggregate_id)
     assert info is not None
-    assert info.event_count >= 3
+    assert info.event_count == 3
-    assert len(info.event_types) >= 3
+    assert len(info.event_types) == 3
     assert info.start_time <= info.end_time
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00e2f42 and d1da7ed.

📒 Files selected for processing (4)
  • backend/tests/integration/db/repositories/test_event_repository.py
  • backend/tests/integration/db/repositories/test_notification_repository.py
  • backend/tests/unit/events/test_kafka_events.py
  • backend/tests/unit/services/grafana/test_grafana_alert_processor.py
🧰 Additional context used
🧬 Code graph analysis (2)
backend/tests/integration/db/repositories/test_notification_repository.py (1)
backend/app/db/repositories/notification_repository.py (9)
  • NotificationRepository (21-233)
  • get_notification (41-45)
  • update_notification (30-39)
  • count_notifications (94-96)
  • try_claim_pending (104-117)
  • find_pending_notifications (119-132)
  • upsert_subscription (163-180)
  • get_subscription (155-161)
  • get_all_subscriptions (182-190)
backend/tests/unit/services/grafana/test_grafana_alert_processor.py (1)
backend/app/services/grafana_alert_processor.py (7)
  • extract_severity (33-37)
  • map_severity (40-44)
  • extract_title (47-49)
  • build_message (52-61)
  • build_metadata (64-71)
  • process_single_alert (73-108)
  • process_webhook (110-150)
🔇 Additional comments (12)
backend/tests/integration/db/repositories/test_notification_repository.py (1)

1-16: LGTM!

The imports, logger setup, and test markers are well-structured and follow best practices for integration tests.

backend/tests/unit/services/grafana/test_grafana_alert_processor.py (10)

1-10: LGTM! Clean imports and test configuration.

The imports are well-organized and the module-level pytestmark ensures all tests in this file are marked as unit tests.


27-34: LGTM! Well-structured fixture composition.

The processor fixture properly combines the mocked dependencies using pytest's fixture composition pattern.


37-74: LGTM! Comprehensive severity extraction tests.

The parametrized tests thoroughly cover precedence rules, case handling, fallback behavior, and default values with clear test IDs.


77-123: LGTM! Thorough severity mapping coverage.

The tests comprehensively validate the severity mapping logic, including status-based overrides for resolved alerts and proper handling of unknown severities.


126-164: LGTM! Complete title extraction test coverage.

The parametrized tests verify all precedence rules and fallback scenarios for title extraction from labels and annotations.


167-207: LGTM! Comprehensive message construction tests.

The tests validate message formatting, including proper joining of summary and description fields, fallback behavior, and default values.


210-241: LGTM! Thorough metadata construction validation.

The tests verify status precedence, label merging behavior, and ensure alert labels correctly override webhook common labels.


244-293: LGTM! Solid single-alert processing tests.

The tests cover both the success path (with detailed argument verification) and error handling, ensuring the processor returns appropriate tuples and logs errors correctly.


296-378: LGTM! Comprehensive webhook processing tests.

The tests thoroughly validate batch processing including success cases, empty lists, partial failures, and logging behavior. The use of a closure with nonlocal in lines 342-346 to simulate partial failures is functional and correctly tests the error resilience.


381-402: LGTM! Useful constant validation tests.

The tests verify that class-level constants contain expected values and provide sensible defaults, serving as documentation and regression protection.

backend/tests/integration/db/repositories/test_event_repository.py (1)

246-264: LGTM!

Excellent test coverage for the archival flow. The test properly validates:

  • Archived event metadata
  • Original event removal
  • Return value structure

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
.github/workflows/frontend-ci.yml (1)

87-104: Aggressive cache pruning may slow CI.

The docker builder prune -af at line 104 removes all build cache immediately after building. This defeats the purpose of cache-to=type=gha,mode=max since subsequent jobs in the same run won't benefit from the local cache.

Consider removing this step or limiting it to dangling builders only (docker builder prune -f without -a) if disk space is the concern.

🔧 Suggested change
-      - name: Prune Docker build cache
-        run: docker builder prune -af

Or if disk space is a concern:

-        run: docker builder prune -af
+        run: docker builder prune -f
.github/workflows/backend-ci.yml (1)

162-186: Bake-based build for E2E is well configured.

The backend-e2e target with GHA cache matches the pattern used in frontend-ci.yml. Same note about docker builder prune -af potentially being too aggressive applies here.

Consider whether the aggressive cache pruning at line 186 is necessary, as noted in the frontend-ci.yml review.

backend/tests/integration/conftest.py (1)

8-28: Consider extending the test name prefix length.

The current implementation truncates test names to 15 characters (line 20), which could potentially cause collisions if tests have similar prefixes. While the 4-hex-char UUID suffix (65,536 possibilities) provides good collision resistance, increasing the test name length to 20-25 characters would further reduce risk, especially with test parallelization.

♻️ Optional: increase test name prefix length
-    base = f"{request.node.name[:15]}-{uuid.uuid4().hex[:4]}"
+    base = f"{request.node.name[:20]}-{uuid.uuid4().hex[:4]}"

And update the docstring example accordingly:

     Usage:
         def test_something(unique_id):
-            exec_id = unique_id("exec-")  # exec-test_somethin-a1b2-0
-            event_id = unique_id("evt-")  # evt-test_somethin-a1b2-1
+            exec_id = unique_id("exec-")  # exec-test_something_mor-a1b2-0
+            event_id = unique_id("evt-")  # evt-test_something_mor-a1b2-1
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1da7ed and 78498f6.

📒 Files selected for processing (7)
  • .github/workflows/backend-ci.yml
  • .github/workflows/frontend-ci.yml
  • backend/tests/integration/conftest.py
  • backend/tests/integration/db/repositories/test_event_repository.py
  • backend/tests/integration/events/test_event_store_consumer.py
  • backend/tests/unit/events/test_kafka_events.py
  • docker-compose.yaml
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/integration/conftest.py (1)
backend/tests/conftest.py (1)
  • _make (176-177)
backend/tests/integration/events/test_event_store_consumer.py (3)
backend/app/domain/enums/auth.py (1)
  • LoginMethod (4-10)
backend/app/infrastructure/kafka/events/metadata.py (1)
  • AvroEventMetadata (9-31)
backend/tests/integration/conftest.py (1)
  • unique_id (9-28)
backend/tests/unit/events/test_kafka_events.py (7)
backend/app/domain/enums/kafka.py (1)
  • KafkaTopic (7-53)
backend/app/domain/execution/models.py (1)
  • ResourceUsageDomain (15-19)
backend/app/infrastructure/kafka/events/base.py (3)
  • BaseEvent (13-37)
  • to_dict (32-34)
  • to_json (36-37)
backend/app/infrastructure/kafka/events/execution.py (5)
  • ExecutionAcceptedEvent (52-58)
  • ExecutionQueuedEvent (61-66)
  • ExecutionRunningEvent (69-74)
  • ExecutionStartedEvent (77-83)
  • ExecutionTimeoutEvent (108-115)
backend/app/infrastructure/kafka/events/metadata.py (2)
  • AvroEventMetadata (9-31)
  • ensure_correlation_id (28-31)
backend/app/services/coordinator/queue_manager.py (2)
  • user_id (34-35)
  • execution_id (30-31)
backend/app/domain/events/event_models.py (1)
  • correlation_id (60-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (38)
.github/workflows/frontend-ci.yml (5)

8-9: LGTM!

Adding docker-compose.yaml and docker-bake.hcl to path triggers ensures the frontend CI runs when shared infrastructure files change, which is appropriate given the E2E tests depend on these.

Also applies to: 15-16


20-25: LGTM!

Centralizing image versions as environment variables improves maintainability and ensures consistent versioning across workflow steps.


116-147: Kubeconfig generation looks correct.

The approach of extracting certs from k3s and generating a kubeconfig with embedded certificate data for container access is sound. Using envsubst with a heredoc and then stripping leading whitespace is a reasonable pattern.


149-175: LGTM — staged startup with dependency ordering.

The sequenced startup (certs → infra → backend/workers → frontend) with explicit waits ensures services are ready before dependents start. The --no-build flag correctly relies on pre-built bake images.


190-198: LGTM!

Simplified log collection using docker compose logs is cleaner and captures all relevant services for debugging failures.

docker-compose.yaml (8)

18-18: LGTM — explicit image declarations.

Adding explicit image: declarations (e.g., integr8scode-backend:latest, integr8scode-frontend:latest) enables the bake-based workflow where images are pre-built and loaded, then referenced by compose without rebuilding.

Also applies to: 76-76, 128-128


53-56: Healthcheck tuning looks reasonable.

Reducing MongoDB's healthcheck interval to 5s with 10 retries and a 10s start period provides faster failure detection while still allowing adequate startup time.


70-73: Redis healthcheck appropriately aggressive.

The 3s interval with 10 retries is suitable for Redis, which typically starts very quickly.


254-257: Zookeeper healthcheck has adequate start period.

15s start period with 15 retries at 5s intervals gives Zookeeper up to 90s total to become healthy, which should be sufficient.


299-301: Good use of environment variable substitution with default.

Using ${KAFKA_HEAP_OPTS:--Xms2G -Xmx2G} allows CI to override with smaller heap sizes while maintaining production defaults.


315-318: Kafka healthcheck tuning is appropriate.

The 5s interval with 20 retries and 20s start period (up to 120s total) should handle Kafka's variable startup time.


336-339: Schema Registry healthcheck appropriately configured.

15 retries at 5s intervals with 10s start period is reasonable for Schema Registry's dependency on Kafka being healthy first.


358-358: Worker service images properly declared.

All worker services now have explicit image declarations matching the bake targets, ensuring consistency between build and runtime.

Also applies to: 382-382, 405-405, 437-437, 476-476, 512-512, 550-550, 601-601, 633-633

.github/workflows/backend-ci.yml (9)

8-9: LGTM!

Path triggers correctly include shared infrastructure files.

Also applies to: 15-16


24-25: Good alignment with docker-compose.yaml.

Using Confluent Kafka/Zookeeper images (7.5.0) matches the docker-compose.yaml definitions, ensuring consistency between local dev and CI.


77-77: LGTM — Zookeeper added to cache.

Including the Zookeeper image in the cache action reduces pull times in subsequent runs.


91-98: Simplified integration infra startup.

Starting only the required infra services (no workers) with KAFKA_HEAP_OPTS override is appropriate for integration tests that don't need the full stack.


144-146: LGTM — cleaner log collection.

Using docker compose logs <service> pattern is consistent and easier to maintain.

Also applies to: 152-152


200-243: K3s setup and kubeconfig generation is correct.

The approach mirrors frontend-ci.yml with proper namespace creation and certificate embedding for containerized workers.


245-269: Staged service startup properly ordered.

Certificate generation → infra → backend/workers sequence ensures dependencies are met before services start.


278-289: E2E environment correctly configured for containerized backend.

Empty KAFKA_TOPIC_PREFIX and SCHEMA_SUBJECT_PREFIX make sense when tests run against the same Kafka instance as containerized services. Disabling CA bundle verification via empty env vars is a pragmatic approach for self-signed certs.

Verify that disabling SSL verification via REQUESTS_CA_BUNDLE="" and CURL_CA_BUNDLE="" works correctly with the Python requests library and any curl commands in tests. Some versions require explicit verify=False in code rather than empty env vars.


307-318: Enhanced failure logging improves debuggability.

Capturing logs from individual workers (saga-orchestrator, k8s-worker, pod-monitor) and executor pods provides better visibility into failures.

Also applies to: 324-324

backend/tests/integration/events/test_event_store_consumer.py (3)

2-2: LGTM! Clean migration to DI-based test infrastructure.

The addition of AsyncContainer, unique_id, and backoff imports aligns with the broader test infrastructure modernization. The updated test signature properly types the new fixtures.

Also applies to: 4-4, 7-7, 16-16, 24-24


37-37: LGTM! Improved test isolation.

Using unique_id("u-") provides better test isolation compared to UUID generation, ensuring unique identifiers scoped to this test.


61-66: No issues found. The backoff library (version 2.2.1) supports decorating async functions with backoff.on_exception, and interval is the correct parameter name for the backoff.constant strategy. The implementation is correct.
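For reference, the waiter pattern in question looks roughly like this (the repository method name is an assumption):

import backoff

@backoff.on_exception(backoff.constant, AssertionError, interval=1, max_time=30)
async def wait_for_stored_event(repo, event_id: str):
    event = await repo.get_event(event_id)  # method name assumed
    assert event is not None                # retried at 1s intervals, up to 30s
    return event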

backend/tests/unit/events/test_kafka_events.py (7)

59-78: LGTM! Well-structured fixtures.

The metadata and resource_usage fixtures provide clean, reusable test data. This promotes DRY principles across the test suite.


81-119: LGTM! Comprehensive metadata testing.

The test coverage for AvroEventMetadata is thorough, including:

  • Default UUID generation
  • Immutability verification
  • Edge case handling (empty string correlation_id)

The tests correctly verify that helper methods return new instances without mutating originals.


122-208: LGTM! Solid base event testing.

The tests verify core BaseEvent behavior including UUID generation, timestamp handling, and serialization. The explicit event creation in each test enhances readability and makes tests self-contained.


210-338: LGTM! Efficient execution event testing.

The parameterized approach efficiently validates event types and topics across multiple execution events. Individual tests appropriately cover event-specific fields like error_type and resource_usage.


340-493: LGTM! Comprehensive saga event coverage.

The tests effectively cover both saga lifecycle events and command events, correctly validating the topic separation between SAGA_EVENTS and SAGA_COMMANDS.


495-654: LGTM! Complete notification and user event testing.

Both test classes provide thorough coverage. The distinction between USER_EVENTS and USER_SETTINGS_EVENTS topics is correctly validated (line 652).


656-717: LGTM! Excellent serialization edge case coverage.

These tests validate critical serialization scenarios:

  • Complex nested structures (pod_spec, runtime_command)
  • Unicode/emoji handling (important for internationalization)
  • Null/None optional fields

This prevents common serialization pitfalls.

backend/tests/integration/db/repositories/test_event_repository.py (5)

17-40: LGTM! Clean event factory for tests.

The _make_event helper provides a flexible way to create test events with sensible defaults. The parameterization supports various test scenarios.


43-65: LGTM! Solid CRUD test coverage.

Basic repository operations are properly tested with appropriate assertions. The use of unique_id ensures test isolation.


67-139: LGTM! Thorough filtering and pagination tests.

The tests appropriately:

  • Use >= comparisons to allow for test parallelism
  • Verify pagination metadata (has_more, total)
  • Test filtering logic (event types, system events)

141-245: LGTM! Comprehensive query and aggregation testing.

The tests cover:

  • Statistical aggregations with time ranges
  • User-scoped queries with type filtering
  • Arbitrary MongoDB query filters
  • Aggregation pipelines
  • Distinct value queries

247-308: LGTM! Complete deletion and replay info testing.

The tests verify:

  • Archival metadata preservation
  • Post-deletion state
  • Replay info calculations (event counts, time ranges, event types)
  • Missing data handling
backend/tests/integration/conftest.py (1)

10-19: LGTM! Clear documentation and good design.

The docstring provides a clear usage example. The fixture design promotes test isolation through unique ID generation rather than cleanup, which is more robust for parallel test execution.
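Reconstructed from the behavior described in this review (truncated test name, 4-hex-char suffix, per-call counter), the fixture looks approximately like:

import itertools
import uuid

import pytest

@pytest.fixture
def unique_id(request: pytest.FixtureRequest):
    base = f"{request.node.name[:15]}-{uuid.uuid4().hex[:4]}"
    counter = itertools.count()

    def _make(prefix: str = "") -> str:
        # e.g. _make("exec-") -> "exec-test_somethin-a1b2-0", then "-1", ...
        return f"{prefix}{base}-{next(counter)}"

    return _make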

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/frontend-ci.yml:
- Around lines 20-26: The Confluent Platform images pinned in the workflow
(ZOOKEEPER_IMAGE, KAFKA_IMAGE, SCHEMA_REGISTRY_IMAGE) are vulnerable. Update
those env values to patched tags (at minimum confluentinc/cp-zookeeper:7.5.7,
confluentinc/cp-kafka:7.5.7, confluentinc/cp-schema-registry:7.5.7), or switch
all three to the supported 8.1.x line (e.g., confluentinc/cp-zookeeper:8.1.0,
confluentinc/cp-kafka:8.1.0, confluentinc/cp-schema-registry:8.1.0) so the
workflow uses non-vulnerable images. Validate any dependent configs or
compatibility constraints after bumping the image tags.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78498f6 and 5f5e025.

📒 Files selected for processing (3)
  • .github/workflows/backend-ci.yml
  • .github/workflows/frontend-ci.yml
  • backend/tests/integration/events/test_event_store_consumer.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/integration/events/test_event_store_consumer.py (7)
backend/app/domain/enums/auth.py (1)
  • LoginMethod (4-10)
backend/app/domain/enums/kafka.py (1)
  • KafkaTopic (7-53)
backend/app/events/event_store.py (1)
  • EventStore (29-330)
backend/app/events/event_store_consumer.py (1)
  • EventStoreConsumer (17-156)
backend/app/events/schema/schema_registry.py (2)
  • SchemaRegistryManager (54-230)
  • initialize_event_schemas (239-240)
backend/app/infrastructure/kafka/events/metadata.py (1)
  • AvroEventMetadata (9-31)
backend/tests/integration/conftest.py (1)
  • unique_id (9-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (20)
.github/workflows/backend-ci.yml (10)

8-9: LGTM!

Good addition of Docker-related paths to ensure CI runs when build configuration changes.

Also applies to: 15-16


24-25: LGTM!

Consistent use of Confluent Platform 7.5.0 across Zookeeper, Kafka, and Schema Registry ensures compatibility.


77-77: LGTM!

Cache list correctly updated to include the new Zookeeper image.


92-98: LGTM!

Good practices: memory limits for Kafka prevent CI OOM issues, explicit service list ensures only required infra starts for integration tests, and the 180s timeout provides adequate startup buffer.


144-152: LGTM!

Comprehensive log collection with distinct artifact naming aids debugging.


162-186: LGTM with a note on cache pruning.

The Buildx/bake setup with GHA caching is well-configured. Note that docker builder prune -af aggressively removes all build cache; that is acceptable on isolated CI runners where disk space matters, but it may increase build times if layers need rebuilding later in the same job (unlikely given the workflow structure).


200-211: LGTM!

Well-structured K8s setup: --tls-san host.docker.internal ensures container workers can authenticate to the k3s API server, and the kubeconfig sed replacement is the correct approach for container-to-host networking.


221-237: Well-structured service startup sequence.

The ordered startup (certs → infra → backend/workers) ensures dependencies are satisfied. The file existence check for certificates is acceptable, though if you experience intermittent failures, consider adding a brief sleep after detection or checking file size to ensure the write is complete.


246-257: LGTM!

Empty topic/subject prefixes are correct for E2E tests against containerized services. Setting REQUESTS_CA_BUNDLE and CURL_CA_BUNDLE to empty strings effectively disables SSL verification for self-signed certs—this is acceptable for CI, and the inline comments document the intent clearly.


279-292: LGTM!

Comprehensive log collection covering Docker services, Kubernetes events, and executor pods provides excellent debugging context for E2E failures.

backend/tests/integration/events/test_event_store_consumer.py (4)

2-16: LGTM! Import updates support the test refactor.

The new imports (Callable, backoff, AsyncContainer) properly support the refactored test infrastructure for dependency injection, test isolation, and improved polling mechanisms.


24-25: LGTM! Test signature properly updated for DI and isolation.

The xdist_group marker ensures safe parallel execution for this Kafka/MongoDB integration test, while the new parameters (scope, unique_id) enable dependency injection and test isolation.


38-38: LGTM! Using unique_id for test isolation.

The unique_id("u-") call generates a unique user ID for each test run, preventing collisions in parallel execution scenarios.


62-67: Backoff async implementation is correct and well-suited for integration testing.

The backoff-based polling is a solid improvement with appropriate timeout (30s) and interval (0.3s) parameters for this integration test. Async function decoration with on_exception is fully supported in backoff 2.2.1, and the implementation is correct. The assertion message aids debugging.
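
In minimal form, the pattern described here looks like this; the lookup inside the waiter is a stand-in for the test's real query:

```python
import backoff


@backoff.on_exception(backoff.constant, AssertionError, interval=0.3, max_time=30)
async def wait_for_persisted_event() -> None:
    # Retries every 0.3s on AssertionError, giving up after 30s.
    docs = await find_events_for_user(user_id)  # stand-in for the real lookup
    assert docs, f"no stored events for {user_id} yet"


# inside the async test body:
await wait_for_persisted_event()
```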

.github/workflows/frontend-ci.yml (6)

8-9: LGTM!

Adding docker-compose.yaml and docker-bake.hcl to the trigger paths ensures the workflow runs when these infrastructure files change, which is correct given the workflow's dependency on them.

Also applies to: 15-16


70-70: LGTM!

The images list correctly includes all the newly defined environment variables, ensuring proper caching of all required Docker images.


114-116: LGTM!

The idempotent namespace creation and kubeconfig generation with host.docker.internal correctly enable container-to-k3s communication. The approach aligns with the Docker-based testing setup.


163-166: LGTM!

The simplified logging approach using docker compose logs is cleaner and more maintainable, properly capturing logs from all services and Kubernetes events.


87-104: Good modernization to Docker Bake with efficient caching.

The migration to docker/bake-action with GitHub Actions cache is a solid improvement. The configuration properly loads images and uses mode=max for comprehensive layer caching. The docker-bake.hcl file correctly defines the all target with all required services (base, backend, coordinators, workers, certificate generators, and frontend), and the HCL configuration properly caches the base layer to optimize rebuild times.


118-144: Certificate path and error handling are correctly configured.

The certificate path ./backend/certs:/backend-certs is correctly mapped in docker-compose.yaml, and error handling is properly implemented through Docker Compose's service_completed_successfully dependency conditions. Both backend and frontend services won't start if cert-generator fails, preventing cascading failures.

The manual timeout 60 wait for backend/certs/server.key is somewhat redundant with Docker Compose's dependency management but provides explicit synchronization before infrastructure startup. Consider whether the 60-second timeout is sufficient for certificate generation, and verify it doesn't cause flaky failures in slow CI environments.

@sonarqubecloud sonarqubecloud bot commented Jan 8, 2026

@HardMax71
Owner Author

won't be working

@HardMax71 HardMax71 closed this Jan 8, 2026