
Conversation

@HardMax71 (Owner) commented Jan 8, 2026


Summary by cubic

Added CSRF protection middleware and expanded Settings-driven dependency injection across backend, events, and metrics to improve consistency and test stability. Tracing is gated in tests, metrics are skipped without an OTLP endpoint, Kafka/schema prefixes are standardized via Settings, and Kafka consumer tests run serially.

  • Bug Fixes

    • DLQ retry policies normalize topic names to ignore isolation prefixes, so policies match correctly.
  • Refactors

    • Centralized config via Settings and DI (producer, EventBus/KafkaEventService, SSE bridge, schema registry, admin utils, consumer group monitor, metrics, PodMonitor/K8s clients/watch); added CSRFMiddleware; always use Settings for Kafka bootstrap servers; registered SagaOrchestratorProvider; switched to app factory for server startup.
    • SecurityService via DI with bcrypt rounds from Settings; routes/services inject Settings/SecurityService; tracing only initializes when ENABLE_TRACING and not TESTING; metrics require OTLP endpoint and use DI with context initialized at app startup; schema subject prefix and Kafka group suffix read from Settings; tests add xdist groups and authenticated fixtures with CSRF-aware helpers to remove duplicate logins; SSE tests use service-driven streaming with timeouts.
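For orientation, here is a minimal sketch of what a middleware like the CSRFMiddleware described above could look like. It assumes the middleware delegates to SecurityService.validate_csrf_from_request; the exempt paths, constructor arguments, and response shape are placeholders, not the project's actual API.

```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse, Response

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

class CSRFMiddleware(BaseHTTPMiddleware):
    """Reject state-changing requests without a valid CSRF token (sketch only)."""

    def __init__(self, app, security_service, exempt_paths: set[str] | None = None) -> None:
        super().__init__(app)
        self._security = security_service
        # Exempt paths are assumptions; login/register must be reachable without a token.
        self._exempt = exempt_paths or {"/api/v1/auth/login", "/api/v1/auth/register"}

    async def dispatch(self, request: Request, call_next) -> Response:
        if request.method in SAFE_METHODS or request.url.path in self._exempt:
            return await call_next(request)
        # Assumed to return a truthy value when the X-CSRF-Token header matches the session.
        if not self._security.validate_csrf_from_request(request):
            return JSONResponse({"detail": "CSRF token missing or invalid"}, status_code=403)
        return await call_next(request)
```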

Written for commit 726e2f9. Summary will update on new commits.

Summary by CodeRabbit

  • Refactor

    • App components now receive centralized settings; tracing and metrics initialize only when configured; CSRF validation unified; saga orchestration wired at startup; bcrypt rounds made configurable.
  • New Features

    • Global CSRF middleware added; new test helper to register/login and obtain CSRF tokens.
  • Tests

    • Tests use fixture-driven authenticated clients; SSE tests moved to streaming/service-driven patterns; Kafka consumer tests serialized across workers.
  • Chores

    • Test environment defaults updated; CI test reporting enhanced.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 8, 2026

📝 Walkthrough

Inject Settings via DI across app and services; centralize test_settings in tests; introduce SecurityService CSRF methods and CSRFMiddleware; propagate Settings into metrics, tracing, Kafka/SSE, providers, pod-monitor; convert tests to fixture-driven authenticated AsyncClient and tighten typing.
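A minimal sketch of the Settings/SecurityService wiring the walkthrough describes, using dishka (the DI container the tests type-hint as AsyncContainer). The provider layout, scope, and the SecurityService constructor keyword are assumptions, not the project's exact code.

```python
from dishka import Provider, Scope, provide

from app.core.security import SecurityService
from app.settings import Settings

class CoreProvider(Provider):
    scope = Scope.APP

    @provide
    def settings(self) -> Settings:
        # In tests this would instead be the injected test_settings instance.
        return Settings()

    @provide
    def security_service(self, settings: Settings) -> SecurityService:
        # SecurityService now requires Settings (e.g. bcrypt rounds via BCRYPT_ROUNDS).
        return SecurityService(settings=settings)
```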

Changes

  • Test config & fixtures (backend/tests/conftest.py, backend/.env.test, backend/pyproject.toml): Centralize test Settings (_WORKER_ID + session UUID), add test_settings fixture, replace per-worker env orchestration, add authenticated client helper and test_user/test_admin/another_user fixtures; update pytest config and remove pytest-env. Attention: test bootstrap and CSRF header handling.
  • Settings & seed (backend/app/settings.py, backend/scripts/seed_users.py, backend/.env): Add BCRYPT_ROUNDS, SCHEMA_SUBJECT_PREFIX, ENVIRONMENT; seed script uses pydantic SeedSettings instead of os.getenv; update test env values.
  • Security & CSRF (backend/app/core/security.py, backend/app/core/middlewares/csrf.py, backend/app/api/routes/auth.py, backend/app/core/middlewares/__init__.py): SecurityService now requires Settings, owns CSRF validation (validate_csrf_from_request); new CSRFMiddleware added and exported; auth endpoints accept injected SecurityService/Settings. Attention: middleware integration and exempt paths.
  • Providers / DI / Container (backend/app/core/providers.py, backend/app/core/container.py): Propagate Settings and SecurityService through providers; update many provider signatures to accept settings/security; register SagaOrchestratorProvider in container.
  • Metrics & tracing (backend/app/core/dishka_lifespan.py, backend/app/core/middlewares/metrics.py, backend/app/core/metrics/base.py, backend/app/core/tracing/config.py, backend/app/core/metrics/context.py, backend/tests/unit/conftest.py, backend/app/main.py): Metrics/tracing initialization now uses injected Settings; metrics classes accept Settings; MetricsContext requires explicit initialize_all (get() raises if uninitialized). Attention: OTLP enablement now driven by settings and endpoint presence.
  • Kafka / events / schema (backend/app/events/*, backend/app/events/schema/schema_registry.py, backend/app/events/core/producer.py): Move topic/group/subject prefixes and bootstrap servers into Settings; UnifiedProducer and related consumers/monitors accept Settings; schema subject prefix from Settings.
  • SSE / Event bus / Kafka services (backend/app/services/sse/kafka_redis_bridge.py, backend/app/services/event_bus.py, backend/app/services/kafka_event_service.py): SSE bridge, EventBus, and KafkaEventService require injected Settings; group/client id and producer construction use Settings-sourced suffixes/prefixes.
  • Pod monitor & K8s clients (backend/app/core/k8s_clients.py, backend/app/services/pod_monitor/*, backend/tests/helpers/k8s_fakes.py, backend/app/services/pod_monitor/monitor.py): Add watch to K8sClients; PodMonitor now requires injected k8s_clients and PodEventMapper; factory supports creation/cleanup; k8s fakes expanded for DI-friendly tests. Attention: lifecycle/ownership and close_k8s_clients.
  • Workers, startup & main (backend/workers/*, backend/app/main.py, backend/Dockerfile): Pass Settings into tracing init calls; register CSRFMiddleware; metrics wiring updated to accept Settings; Gunicorn switched to factory mode.
  • Tests: auth, typing & flows (many backend/tests/integration/*, backend/tests/e2e/*, many backend/tests/unit/*): Replace inline login with authenticated AsyncClient fixtures; add extensive typing (AsyncContainer, Settings, redis.Redis); add xdist_group markers for Kafka consumers (see the marker sketch after this list); refactor SSE tests to streaming/service-driven style. Attention: many signatures changed—watch fixture consumers and test imports.
  • Load & helpers (backend/tests/load/*, backend/tests/helpers/*, backend/tests/helpers/kafka.py, backend/tests/helpers/k8s_fakes.py, backend/tests/helpers/sse.py, backend/tests/helpers/auth.py): Convert LoadConfig to Pydantic BaseSettings; narrow helpers (eventually -> async-only); richer k8s fake watch; add login_user helper and re-exported helper API. Attention: helper return types and auth helper usage.
  • Metrics tests & unit init (backend/tests/unit/conftest.py, backend/tests/unit/core/metrics/*): Unit test conftest initializes MetricsContext.initialize_all with Settings; all unit metric tests updated to construct metrics with Settings.
  • Misc: services, repos, configs (assorted backend/app/services/*, backend/app/db/repositories/*, backend/app/core/*): Many constructors and signatures updated to accept Settings and/or SecurityService (AuthService, AdminUserService, repositories, replay service, event bus manager, etc.). Attention: DI call sites and provider updates.
  • CI & packaging (.github/workflows/backend-ci.yml, backend/pyproject.toml): Add pytest durations in CI, add pydantic-mypy config, remove pytest-env dev dependency.
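The test cohorts above mention serializing Kafka consumer tests across pytest-xdist workers; a minimal sketch of the marker involved is shown below (the test name and group label are illustrative, not taken from the repository).

```python
import pytest

# With pytest-xdist's --dist loadgroup, all tests that share an xdist_group
# name are scheduled onto the same worker, so they run serially.
@pytest.mark.xdist_group(name="kafka_consumer")
def test_consumer_group_rebalance_placeholder() -> None:
    assert True  # placeholder body; the real tests exercise Kafka consumer behavior
```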

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant TR as Test Runner
  participant TS as test_settings (Settings)
  participant App as FastAPI App
  participant DI as DI Container / Providers
  participant Sec as SecurityService
  participant Client as Test AsyncClient
  participant Prod as UnifiedProducer
  participant Kafka as Kafka Cluster

  TR->>TS: instantiate session-scoped Settings (.env.test)
  TR->>App: start app with test_settings=TS
  App->>DI: build container/providers using TS
  DI->>Sec: construct SecurityService(settings=TS)
  App->>App: register CSRFMiddleware (uses SecurityService)
  Client->>App: POST /api/v1/auth/register (username,password)
  App->>Sec: hash password / create user credentials
  App->>Client: return set-cookie + CSRF token
  Client->>App: POST protected endpoint with X-CSRF-Token header
  App->>Sec: validate CSRF via validate_csrf_from_request
  App->>Prod: produce event (topic prefix from TS)
  Prod->>Kafka: send message
  Kafka-->>Prod: ack
  App->>Client: respond to API request

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Poem

🐇 I nibbled envs, stitched one Settings thread,
Cookies and CSRF tucked safe in my bed.
Fixtures now hop with tokens in hand,
Metrics and Kafka all follow my band.
A tiny rabbit cheers — tests pass, overhead.

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 40.16%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title 'fixes' is vague and does not convey meaningful information about the changeset. It uses a non-descriptive term that fails to summarize the primary changes. Resolution: revise the title to be specific and descriptive, for example 'Add CSRF protection and Settings-driven dependency injection' or 'Centralize configuration via Settings-based DI for security, metrics, and Kafka'.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@cubic-dev-ai bot left a comment

No issues found across 1 file


codecov-commenter commented Jan 8, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 83.42246% with 31 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/core/providers.py 73.17% 11 Missing ⚠️
backend/app/core/middlewares/csrf.py 75.86% 7 Missing ⚠️
backend/app/core/dishka_lifespan.py 14.28% 6 Missing ⚠️
backend/app/core/security.py 90.90% 2 Missing ⚠️
backend/app/core/tracing/config.py 33.33% 2 Missing ⚠️
backend/app/core/metrics/context.py 0.00% 1 Missing ⚠️
backend/app/events/admin_utils.py 75.00% 1 Missing ⚠️
backend/app/events/consumer_group_monitor.py 75.00% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Flag Coverage Δ
backend-e2e 55.37% <62.90%> (+1.26%) ⬆️
backend-integration 72.80% <72.04%> (+0.79%) ⬆️
backend-unit 59.72% <54.83%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
backend/app/api/routes/auth.py 92.30% <100.00%> (-0.10%) ⬇️
backend/app/api/routes/events.py 72.89% <100.00%> (+0.42%) ⬆️
backend/app/api/routes/execution.py 77.45% <100.00%> (-0.22%) ⬇️
backend/app/core/adaptive_sampling.py 76.92% <100.00%> (+0.45%) ⬆️
backend/app/core/container.py 65.00% <ø> (ø)
backend/app/core/k8s_clients.py 62.96% <100.00%> (+14.96%) ⬆️
backend/app/core/metrics/base.py 96.77% <100.00%> (+11.48%) ⬆️
backend/app/core/middlewares/__init__.py 100.00% <100.00%> (ø)
backend/app/core/middlewares/metrics.py 95.45% <100.00%> (+22.42%) ⬆️
...app/db/repositories/admin/admin_user_repository.py 91.66% <100.00%> (ø)
... and 22 more

... and 6 files with indirect coverage changes


@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @backend/tests/conftest.py:
- Around line 73-78: The app fixture currently instantiates TestSettings()
directly instead of using the test_settings fixture that generates the unique
Kafka topic prefix; update the app fixture to accept the test_settings fixture
(i.e., add test_settings as a parameter to the app fixture) and use that
TestSettings instance when constructing the FastAPI application so the app (and
derived fixtures like client, scope, db) inherit the unique KAFKA_TOPIC_PREFIX
rather than a fresh TestSettings().
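A sketch of the change this prompt asks for; the fixture names come from the prompt, while the create_app keyword argument and the omission of container cleanup are assumptions.

```python
import pytest_asyncio

from app.main import create_app
from app.settings import Settings

@pytest_asyncio.fixture
async def app(test_settings: Settings):
    # Reuse the session-scoped test_settings (unique KAFKA_TOPIC_PREFIX and friends)
    # instead of constructing a fresh TestSettings() inside this fixture.
    application = create_app(settings=test_settings)
    yield application
    # Dishka container shutdown omitted here for brevity.
```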
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd418c4 and 728743c.

📒 Files selected for processing (1)
  • backend/tests/conftest.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Frontend
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests

@cubic-dev-ai bot left a comment

3 issues found across 11 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/services/sse/kafka_redis_bridge.py">

<violation number="1" location="backend/app/services/sse/kafka_redis_bridge.py:65">
P1: Missing separator between base group_id and suffix. The original code used `.` separator for group_id and `-` for client_id. This is inconsistent with all other usages in the codebase (e.g., `coordinator.py`, `saga_orchestrator.py`, `notification_service.py`) which all use `f"{base}.{suffix}"` pattern.</violation>
</file>

<file name="backend/tests/conftest.py">

<violation number="1" location="backend/tests/conftest.py:25">
P2: Session ID doesn't check `PYTEST_SESSION_ID` environment variable. Consider using `os.environ.get("PYTEST_SESSION_ID") or uuid.uuid4().hex[:8]` to match the PR description and preserve the ability for CI to inject a consistent session ID.</violation>
</file>

<file name="backend/app/core/middlewares/metrics.py">

<violation number="1" location="backend/app/core/middlewares/metrics.py:125">
P2: Removing `OTEL_SDK_DISABLED` check breaks compatibility with the standard OpenTelemetry configuration. This environment variable is an official OTel SDK option to disable all signals. Users who have set `OTEL_SDK_DISABLED=true` (common in local dev or specific deployments) will unexpectedly have metrics enabled after this change, potentially causing connection timeouts or overhead. Consider keeping the standard env var check alongside `settings.TESTING`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/core/middlewares/metrics.py (1)

123-136: Fix misleading comment + unreachable service.environment="test" branch

Because you return on settings.TESTING (Line 125-127), "service.environment": "test" if settings.TESTING else "production" (Line 134) will always resolve to "production". Also, the comment “or when explicitly disabled” is no longer true.

Proposed fix
 def setup_metrics(app: FastAPI, logger: logging.Logger) -> None:
     """Set up OpenTelemetry metrics with OTLP exporter."""
     settings = get_settings()
-    # Fast opt-out for tests or when explicitly disabled
+    # Fast opt-out for tests
     if settings.TESTING:
         logger.info("OpenTelemetry metrics disabled (TESTING)")
         return

     # Configure OpenTelemetry resource
+    # NOTE: If you want "test" here, you can't return early above.
     resource = Resource.create(
         {
             SERVICE_NAME: settings.PROJECT_NAME,
             SERVICE_VERSION: "1.0.0",
-            "service.environment": "test" if settings.TESTING else "production",
+            "service.environment": "production",
         }
     )
🤖 Fix all issues with AI agents
In @backend/app/core/middlewares/metrics.py:
- Around line 138-142: The fallback endpoint for OTLP metrics uses a bare
"localhost:4317"; change the default to include the HTTP scheme by setting
endpoint = settings.OTEL_EXPORTER_OTLP_ENDPOINT or "http://localhost:4317" so
OTLPMetricExporter(endpoint=endpoint, insecure=True) uses the explicit
"http://..." format (refer to the endpoint variable and OTLPMetricExporter in
metrics.py and settings.OTEL_EXPORTER_OTLP_ENDPOINT).

In @backend/app/services/sse/kafka_redis_bridge.py:
- Around line 64-66: Update the group naming in kafka_redis_bridge.py to include
the dot separator before the KAFKA_GROUP_SUFFIX: change how group_id and
client_id are constructed so they append a "." before
self.settings.KAFKA_GROUP_SUFFIX (i.e., adjust the expressions that set group_id
and client_id to use a dot separator), ensuring they match the convention used
elsewhere (see group_id and client_id variables in this file and the
KAFKA_GROUP_SUFFIX setting).

In @backend/scripts/seed_users.py:
- Around line 28-35: The file docstring's Environment Variables section is
missing DATABASE_NAME; update the top-level docstring to document DATABASE_NAME
(matching SeedSettings.database_name) as an available env var, include a short
description like "database name for the application" and its default value
("integr8scode_db"), and ensure the name matches the code (DATABASE_NAME) so
environment users can find and override SeedSettings.database_name.
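For the first prompt above (metrics.py), the suggested fallback amounts to something like the following fragment; get_settings comes from app.settings, and the rest mirrors the prompt rather than the actual file.

```python
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

from app.settings import get_settings

settings = get_settings()
# Use a scheme-qualified local collector when OTEL_EXPORTER_OTLP_ENDPOINT is unset.
endpoint = settings.OTEL_EXPORTER_OTLP_ENDPOINT or "http://localhost:4317"
exporter = OTLPMetricExporter(endpoint=endpoint, insecure=True)
```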
🧹 Nitpick comments (2)
backend/app/core/dishka_lifespan.py (1)

54-54: Minor: enable_console_exporter parameter is always False in this context.

Since Line 48 already ensures not settings.TESTING, the enable_console_exporter=settings.TESTING parameter on Line 54 will always evaluate to False. This parameter could be simplified to enable_console_exporter=False for clarity.

♻️ Simplify the parameter
-            enable_console_exporter=settings.TESTING,
+            enable_console_exporter=False,
backend/tests/conftest.py (1)

22-37: Strong test isolation implementation.

The test_settings fixture correctly implements per-session and per-worker isolation by:

  • Loading base configuration from .env.test
  • Generating unique identifiers (session_id, worker-specific values)
  • Creating isolated resources (DATABASE_NAME, REDIS_DB, Kafka topics/groups)
  • Using Pydantic's model_copy for clean Settings customization

The worker_num calculation at line 27 (sum(_WORKER_ID.encode()) % 16) is a simple deterministic hash to distribute workers across Redis databases 0-15. While unconventional, it's functional and consistent.

Optional: Add clarifying comment for worker_num calculation
+    # Deterministic mapping of worker ID to Redis DB (0-15)
     worker_num = sum(_WORKER_ID.encode()) % 16
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1fcdd9 and 9c2b6e8.

📒 Files selected for processing (11)
  • backend/.env.test
  • backend/app/core/dishka_lifespan.py
  • backend/app/core/middlewares/metrics.py
  • backend/app/events/schema/schema_registry.py
  • backend/app/services/sse/kafka_redis_bridge.py
  • backend/app/settings.py
  • backend/pyproject.toml
  • backend/scripts/seed_users.py
  • backend/tests/conftest.py
  • backend/tests/integration/events/test_admin_utils.py
  • backend/tests/load/config.py
💤 Files with no reviewable changes (1)
  • backend/.env.test
🧰 Additional context used
🧬 Code graph analysis (4)
backend/tests/integration/events/test_admin_utils.py (2)
backend/tests/conftest.py (2)
  • app (42-53)
  • test_settings (22-37)
backend/app/settings.py (1)
  • Settings (11-161)
backend/app/core/dishka_lifespan.py (2)
backend/app/core/tracing/config.py (1)
  • init_tracing (177-197)
backend/app/core/tracing/models.py (2)
  • has_failures (61-63)
  • get_summary (57-59)
backend/tests/load/config.py (2)
frontend/src/lib/api/core/params.gen.ts (1)
  • Field (5-36)
backend/app/schemas_pydantic/replay.py (1)
  • duration_seconds (57-60)
backend/tests/conftest.py (2)
backend/app/settings.py (1)
  • Settings (11-161)
backend/app/main.py (1)
  • create_app (44-127)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Scan Backend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (12)
backend/pyproject.toml (1)

213-213: Change looks good—OpenTelemetry is properly disabled during tests.

The TESTING flag is correctly read from the environment variable set in pytest configuration and explicitly disables OpenTelemetry metrics in backend/app/core/middlewares/metrics.py (line 125) and prevents tracing initialization in backend/app/core/dishka_lifespan.py (line 48). This centralized approach is cleaner than the previous OTEL_SDK_DISABLED flag and properly prevents telemetry overhead during test runs.

backend/scripts/seed_users.py (1)

81-83: LGTM! Clean migration to Settings pattern.

The refactor from os.getenv() to Pydantic BaseSettings is well-executed. The settings instance is created once per invocation, and all configuration values (URLs, database name, passwords) are now sourced consistently from the SeedSettings object.

Also applies to: 95-95, 105-105

backend/app/core/dishka_lifespan.py (1)

47-72: LGTM! Tracing isolation for tests looks correct.

The conditional initialization of tracing based on ENABLE_TRACING and not TESTING appropriately prevents exporter retries during tests, which aligns with the PR's goal of fixing flaky tests. The failure checking and separate logging paths are well-structured.

backend/tests/load/config.py (2)

12-42: LGTM! Clean migration to Pydantic BaseSettings.

The conversion from dataclass to Pydantic BaseSettings is well-executed:

  • Environment variable loading with LOAD_ prefix is correctly configured
  • validation_alias is appropriately used for backward compatibility with existing env var names
  • All fields have explicit types and sensible defaults
  • The api() method logic remains unchanged

This refactoring improves maintainability and aligns with the broader PR pattern of centralizing configuration through Settings objects.
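As a rough sketch of the pattern described above, with invented field names; only the LOAD_ env prefix, the use of validation_alias for legacy variable names, and the unchanged api() helper come from the review.

```python
from pydantic import AliasChoices, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class LoadConfig(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="LOAD_")

    # Hypothetical fields; AliasChoices keeps older env var names working alongside LOAD_*.
    base_url: str = Field("http://localhost:8000", validation_alias=AliasChoices("LOAD_BASE_URL", "BASE_URL"))
    users: int = Field(10, validation_alias=AliasChoices("LOAD_USERS", "USERS"))

    def api(self, path: str) -> str:
        # Illustrative only; the review just notes the real api() logic is unchanged.
        return f"{self.base_url.rstrip('/')}/{path.lstrip('/')}"
```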


5-6: pydantic-settings is already in project dependencies.

pydantic-settings==2.5.2 is already declared in backend/pyproject.toml, so no further action is needed.

Likely an incorrect or invalid review comment.

backend/app/events/schema/schema_registry.py (1)

62-62: LGTM! Settings-based configuration for schema subject prefix.

The change from reading SCHEMA_SUBJECT_PREFIX via environment variable to using settings.SCHEMA_SUBJECT_PREFIX correctly centralizes configuration and enables the per-session/worker test isolation described in the PR objectives.

backend/app/settings.py (1)

76-76: LGTM! New setting for schema subject prefix isolation.

The addition of SCHEMA_SUBJECT_PREFIX with an empty string default is appropriate. This field enables per-session/worker isolation for Kafka schema subjects during tests while keeping production behavior unchanged (no prefix).

backend/tests/integration/events/test_admin_utils.py (2)

4-4: LGTM: Import addition supports test isolation.

The Settings import is required for the updated test signature and enables proper type hinting for the test_settings fixture.


11-12: Excellent: Settings-based test isolation implemented.

The switch from environment variables to the test_settings fixture enables proper per-session and per-worker topic isolation, directly addressing the flaky test issues mentioned in the PR objectives. The topic construction correctly combines the unique PREFIX and GROUP_SUFFIX to ensure test runs don't interfere with each other.
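Illustratively, the kind of construction being praised here; the helper name and literal values are invented, and only the Settings fields come from this PR.

```python
from app.settings import Settings

def isolated_kafka_names(test_settings: Settings) -> tuple[str, str]:
    # Topic and consumer-group names namespaced per test session and xdist worker.
    topic = f"{test_settings.KAFKA_TOPIC_PREFIX}execution-events"
    group_id = f"admin-utils-test.{test_settings.KAFKA_GROUP_SUFFIX}"
    return topic, group_id
```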

backend/tests/conftest.py (3)

17-17: LGTM: Standard pytest-xdist worker identification.

The worker ID extraction with a sensible default ("gw0") ensures the fixture works correctly both in parallel (pytest-xdist) and single-worker test runs.


42-48: Clean integration with test_settings fixture.

The app fixture now correctly receives and uses the test_settings instance, ensuring the FastAPI application is configured with the same isolated settings used throughout the test session. This maintains consistency across all test fixtures and components.


66-71: LGTM: Improved readability.

The multi-line formatting of the AsyncClient initialization enhances readability without any functional changes.

@cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/core/metrics/base.py">

<violation number="1" location="backend/app/core/metrics/base.py:54">
P2: Removing the `config.otlp_endpoint` check means metrics export will be attempted even when no endpoint is configured. This could cause connection errors to the default `localhost:4317` endpoint and spawn unnecessary background threads. Consider preserving the endpoint check:
```python
if settings.TESTING or not settings.ENABLE_TRACING or not config.otlp_endpoint:
```</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/core/metrics/base.py (2)

42-79: Resource leak: MeterProvider and background threads are not cleaned up.

The _create_meter method creates a PeriodicExportingMetricReader (line 61-64) that starts a background thread for periodic metric export, and a SdkMeterProvider (line 67) that manages these resources. However:

  1. The MeterProvider is not stored as an instance variable
  2. The close() method (lines 76-79) does nothing to shut down the provider or stop background threads

This causes resource leaks when BaseMetrics instances are destroyed, as the export threads continue running.

♻️ Proposed fix to store and clean up MeterProvider
 def _create_meter(self, config: MetricsConfig, meter_name: str) -> Meter:
     """Create a new meter instance for this collector.

     Args:
         config: Metrics configuration
         meter_name: Name for this meter

     Returns:
         A new meter instance
     """
     # If tracing/metrics disabled or no OTLP endpoint configured, use NoOp meter to avoid threads/network
     settings = get_settings()
     if settings.TESTING or not settings.ENABLE_TRACING:
+        self._meter_provider = None
         return NoOpMeterProvider().get_meter(meter_name)

     resource = Resource.create(
         {"service.name": config.service_name, "service.version": config.service_version, "meter.name": meter_name}
     )

     reader = PeriodicExportingMetricReader(
         exporter=OTLPMetricExporter(endpoint=config.otlp_endpoint),
         export_interval_millis=config.export_interval_millis,
     )

     # Each collector gets its own MeterProvider
     meter_provider = SdkMeterProvider(resource=resource, metric_readers=[reader])
+    self._meter_provider = meter_provider

     # Return a meter from this provider
     return meter_provider.get_meter(meter_name)

 def _create_instruments(self) -> None:
     """Create metric instruments. Override in subclasses."""
     pass

 def close(self) -> None:
     """Close the metrics collector and clean up resources."""
-    # Subclasses can override if they need cleanup
-    pass
+    if self._meter_provider is not None:
+        self._meter_provider.shutdown()

54-70: Add validation for missing OTLP endpoint before creating exporter.

The condition at line 54 returns a no-op meter when testing or tracing is disabled, but it does not validate that config.otlp_endpoint is configured. According to the documented behavior (metrics-reference.md), a no-op meter should be used when "no OTLP endpoint is configured," but the code only checks the ENABLE_TRACING flag. If ENABLE_TRACING=true, TESTING=false, and OTEL_EXPORTER_OTLP_ENDPOINT is not set (defaults to None), line 62 will instantiate OTLPMetricExporter(endpoint=None), causing a runtime error.

Add an endpoint check to the conditional:

-if settings.TESTING or not settings.ENABLE_TRACING:
+if settings.TESTING or not settings.ENABLE_TRACING or not config.otlp_endpoint:
     return NoOpMeterProvider().get_meter(meter_name)

This aligns the implementation with the documented behavior and matches the pattern used elsewhere in the codebase (e.g., middlewares/metrics.py line 140).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c2b6e8 and 884fb04.

📒 Files selected for processing (2)
  • backend/.env.test
  • backend/app/core/metrics/base.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/.env.test
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests

@cubic-dev-ai bot left a comment

1 issue found across 6 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/services/sse/kafka_redis_bridge.py">

<violation number="1" location="backend/app/services/sse/kafka_redis_bridge.py:65">
P2: Inconsistent separator used before `KAFKA_GROUP_SUFFIX`. The rest of the codebase uses `.` (dot) as the separator, but this change uses `-` (hyphen). Consider using `.{suffix}` for consistency with other Kafka consumer group naming patterns.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @backend/app/core/middlewares/metrics.py:
- Line 140: The code calls OTLPMetricExporter with
settings.OTEL_EXPORTER_OTLP_ENDPOINT but Settings defaults that value to None;
add fallback logic so OTLPMetricExporter receives a concrete endpoint (e.g., use
settings.OTEL_EXPORTER_OTLP_ENDPOINT or "localhost:4317") or, if you prefer not
to add a fallback, update the comment above line 140 to state that the endpoint
must be explicitly configured and no default is applied; reference the
OTLPMetricExporter instantiation and settings.OTEL_EXPORTER_OTLP_ENDPOINT when
making the change.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 884fb04 and 118bd6d.

📒 Files selected for processing (6)
  • backend/app/core/middlewares/metrics.py
  • backend/app/core/providers.py
  • backend/app/events/core/producer.py
  • backend/app/services/sse/kafka_redis_bridge.py
  • backend/scripts/seed_users.py
  • backend/tests/integration/events/test_producer_roundtrip.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/scripts/seed_users.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/integration/events/test_producer_roundtrip.py (3)
backend/tests/conftest.py (2)
  • scope (82-84)
  • test_settings (22-37)
backend/app/events/core/producer.py (1)
  • UnifiedProducer (30-290)
backend/app/events/core/types.py (1)
  • ProducerConfig (29-60)
backend/app/core/providers.py (1)
backend/app/events/core/producer.py (2)
  • UnifiedProducer (30-290)
  • producer (65-66)
backend/app/events/core/producer.py (1)
backend/app/settings.py (2)
  • Settings (11-161)
  • get_settings (165-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (4)
backend/app/services/sse/kafka_redis_bridge.py (1)

64-66: No changes needed. The KAFKA_GROUP_SUFFIX is properly defined with defaults.

The settings-based approach correctly supports test isolation. KAFKA_GROUP_SUFFIX is defined in backend/app/settings.py as a required str field with default value "suff", and tests override it with a session-and-worker-based suffix (f"{session_id}.{_WORKER_ID}"). The field is type-hinted as str (not str | None), so it cannot be None or empty. The code at lines 64–66 is correct and will never produce malformed Kafka consumer identifiers.

backend/app/core/providers.py (1)

158-158: LGTM! Settings injection enables test isolation.

The addition of settings=settings to the UnifiedProducer initialization correctly propagates the injected Settings instance, enabling per-session/per-worker topic prefix isolation as described in the PR objectives.

backend/tests/integration/events/test_producer_roundtrip.py (1)

17-26: LGTM! Test isolation properly implemented.

The test signature and producer initialization correctly use test_settings to derive dynamic bootstrap servers and topic prefixes, enabling the per-session/per-worker isolation described in the PR objectives.

backend/app/events/core/producer.py (1)

37-50: LGTM! Clean settings injection with backward compatibility.

The optional settings parameter with fallback to get_settings() maintains backward compatibility while enabling test-specific configuration injection. The topic prefix resolution correctly uses the provided settings when available.

@cubic-dev-ai bot left a comment

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/events/core/producer.py">

<violation number="1" location="backend/app/events/core/producer.py:37">
P1: Using `get_settings()` as a default argument evaluates it at module import time, not at call time. This means all `UnifiedProducer` instances will share the same Settings captured when the module was loaded, defeating the test isolation goals of this PR. Use `None` as default and call `get_settings()` at runtime.</violation>
</file>
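A small illustration of the pitfall flagged here and the recommended fix; this shows the pattern only, not the actual UnifiedProducer signature.

```python
from app.settings import Settings, get_settings

class ProducerSketch:
    # Problematic: a default like `settings: Settings = get_settings()` is evaluated
    # once, when the class body runs at import time, so every instance would share
    # that Settings snapshot and ignore per-test overrides.

    # Recommended: default to None and resolve at call time.
    def __init__(self, settings: Settings | None = None) -> None:
        self._settings = settings or get_settings()
        self._topic_prefix = self._settings.KAFKA_TOPIC_PREFIX
```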

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/dlq/manager.py (2)

264-267: Retry policy lookup: normalization + default fallback looks correct; consider a tiny cleanup

Behavior-wise this is solid (lookup by normalized topic, then fallback). If you want to keep it compact/readable:

Optional simplification
-        retry_policy = self._retry_policies.get(self._normalize_topic(message.original_topic))
-        retry_policy = retry_policy or self.default_retry_policy
+        retry_policy = (
+            self._retry_policies.get(self._normalize_topic(message.original_topic))
+            or self.default_retry_policy
+        )

438-440: Store key normalization is good, but keep RetryPolicy.topic consistent with the key (debuggability)

Since you’re normalizing the dict key, consider also aligning the RetryPolicy.topic value with that normalized key (or at least asserting they match). Otherwise, inspecting/logging policies can be confusing (policy says "test123.orders" but it’s stored under "orders", or vice versa).

One way to enforce consistency (shape depends on RetryPolicy mutability)
 def set_retry_policy(self, topic: str, policy: RetryPolicy) -> None:
-    self._retry_policies[self._normalize_topic(topic)] = policy
+    key = self._normalize_topic(topic)
+    # Option A: assert to keep invariant strict
+    # assert policy.topic == key, f"RetryPolicy.topic={policy.topic} must match normalized key={key}"
+    #
+    # Option B: normalize the policy's topic too (if mutable / supported by your model)
+    # policy.topic = key
+    self._retry_policies[key] = policy
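For reference, a guess at what the normalization helper might look like; the actual _normalize_topic implementation is not shown in this thread, and the prefix source is an assumption.

```python
class DLQManagerSketch:
    def __init__(self, topic_prefix: str) -> None:
        self._topic_prefix = topic_prefix  # e.g. the per-test "test123." isolation prefix

    def _normalize_topic(self, topic: str) -> str:
        # Strip the isolation prefix so a policy registered as "orders" also matches
        # messages whose original_topic was recorded as "test123.orders".
        if self._topic_prefix and topic.startswith(self._topic_prefix):
            return topic[len(self._topic_prefix):]
        return topic
```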
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 118bd6d and 277cabf.

📒 Files selected for processing (5)
  • backend/app/core/container.py
  • backend/app/core/providers.py
  • backend/app/dlq/manager.py
  • backend/app/events/core/producer.py
  • backend/app/services/sse/kafka_redis_bridge.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/app/services/sse/kafka_redis_bridge.py
  • backend/app/events/core/producer.py
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/core/container.py (1)
backend/app/core/providers.py (1)
  • SagaOrchestratorProvider (690-695)
backend/app/core/providers.py (1)
backend/app/events/core/producer.py (2)
  • UnifiedProducer (30-289)
  • producer (64-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (2)
backend/app/core/container.py (1)

38-57: Verify SagaOrchestratorProvider() scope/side-effects in the main app container (shared singleton across requests).
Adding SagaOrchestratorProvider() here means the web app container can now resolve (and potentially initialize) the saga orchestrator at Scope.APP, which may be shared across concurrent requests. Please confirm that’s intended (thread-safety / no request-specific mutable state) and that initializing it at web-app startup won’t introduce unwanted startup cost or background tasks.

backend/app/core/providers.py (1)

153-160: Good: pass settings into UnifiedProducer for deterministic topic-prefixing; verify constructor/API compatibility.
This makes the producer’s topic prefix come from injected Settings (vs ambient/global settings), which should help test isolation. Please double-check UnifiedProducer’s constructor signature and that overriding settings here can’t accidentally diverge from other producer configuration (e.g., any settings-derived behavior beyond ProducerConfig).

@coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/tests/integration/test_dlq_routes.py (1)

131-143: Use test isolation prefix for topic filtering test.

The test uses hardcoded topic "execution-events" without the test isolation prefix. Since other DLQ tests (test_dlq_retry_immediate, test_dlq_discard_policy, test_dlq_manager) all build topic names using test_settings.KAFKA_TOPIC_PREFIX, messages in the DLQ database are stored with prefixed topic names. The filtering does exact matching on original_topic, so filtering by the unprefixed name returns no results. The assertion on line 143 never executes.

Add test_settings parameter and use f"{test_settings.KAFKA_TOPIC_PREFIX}execution-events" instead of the hardcoded topic name.

🤖 Fix all issues with AI agents
In @backend/app/events/core/producer.py:
- Around line 9-19: The imports in producer.py are unsorted and include an
unused get_settings import causing ruff errors I001 and F401; reorder imports
into stdlib, third‑party, first‑party, then local groups (e.g., keep
confluent_kafka imports together with Producer and KafkaError in the third‑party
group), remove the unused get_settings symbol, and ensure first‑party imports
like app.core.lifecycle.LifecycleEnabled,
app.core.metrics.context.get_event_metrics,
app.dlq.models.DLQMessage/DLQMessageStatus, app.domain.enums.kafka.KafkaTopic,
app.events.schema.schema_registry.SchemaRegistryManager,
app.infrastructure.kafka.events.BaseEvent, app.settings.Settings, and local
.types imports (ProducerConfig, ProducerMetrics, ProducerState) are in the
correct group and sorted to satisfy ruff.
🧹 Nitpick comments (3)
backend/app/events/core/producer.py (2)

29-49: Settings injection looks right; consider letting the formatter handle indentation.
Requiring settings: Settings and deriving _topic_prefix from it aligns with the test-isolation goal. The parameter indentation in __init__ looks off relative to typical black/ruff-format output—worth running the formatter to avoid churn.


174-176: Signature indentation drift in produce / send_to_dlq—prefer black-style wrapping for stability.
This is cosmetic, but matching formatter output reduces future diffs.

Example formatting
-    async def produce(
-            self, event_to_produce: BaseEvent, key: str | None = None, headers: dict[str, str] | None = None
-    ) -> None:
+    async def produce(
+        self,
+        event_to_produce: BaseEvent,
+        key: str | None = None,
+        headers: dict[str, str] | None = None,
+    ) -> None:
@@
-    async def send_to_dlq(
-            self, original_event: BaseEvent, original_topic: str, error: Exception, retry_count: int = 0
-    ) -> None:
+    async def send_to_dlq(
+        self,
+        original_event: BaseEvent,
+        original_topic: str,
+        error: Exception,
+        retry_count: int = 0,
+    ) -> None:

Also applies to: 207-209

backend/tests/integration/test_dlq_routes.py (1)

35-35: Consider clarifying the purpose of the test_user parameter.

The test_user parameter appears in all test method signatures but is never explicitly used in the test bodies. If this fixture provides authentication via side effects (e.g., setting up auth tokens in the client), consider adding a brief comment to clarify this pattern. If it's genuinely unused, it could be removed.

Also applies to: 80-80, 116-116, 131-131, 146-146, 186-186, 198-198, 222-222, 259-259, 290-290, 328-328, 358-358

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 277cabf and 6027ac4.

📒 Files selected for processing (2)
  • backend/app/events/core/producer.py
  • backend/tests/integration/test_dlq_routes.py
🧰 Additional context used
🧬 Code graph analysis (2)
backend/tests/integration/test_dlq_routes.py (1)
backend/tests/conftest.py (1)
  • test_settings (22-37)
backend/app/events/core/producer.py (4)
backend/app/settings.py (2)
  • Settings (11-161)
  • get_settings (165-166)
backend/app/events/core/types.py (1)
  • ProducerConfig (29-60)
backend/app/events/schema/schema_registry.py (1)
  • SchemaRegistryManager (53-229)
backend/app/infrastructure/kafka/events/base.py (1)
  • BaseEvent (13-37)
🪛 GitHub Actions: Ruff Linting
backend/app/events/core/producer.py

[error] 1-19: I001 Import block is un-sorted or un-formatted. Organize imports (ruff) for the step: uv run ruff check . --config pyproject.toml


[error] 15-15: F401 'app.settings.get_settings' imported but unused. Remove unused import.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Scan Backend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (2)
backend/tests/integration/test_dlq_routes.py (2)

35-35: LGTM: Type annotations modernized to Python 3.9+ style.

The update from Dict[str, str] to dict[str, str] uses built-in generic types, which is the idiomatic approach for Python 3.9+.

Also applies to: 80-80, 116-116, 131-131, 146-146, 186-186, 198-198, 222-222, 259-259, 290-290, 328-328, 358-358


198-210: LGTM: Test isolation implemented correctly.

The addition of test_settings parameter and the prefixed topic construction at line 204 properly implements per-session and per-worker test isolation, which aligns with the PR's goal to fix flaky tests.

@cubic-dev-ai bot left a comment

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/dlq/manager.py">

<violation number="1">
P2: Removing topic normalization may cause retry policy lookup mismatches when `original_topic` in messages contains the Kafka prefix but policies are registered without it (or vice versa). The previous `_normalize_topic` method ensured both sides used unprefixed topic names for consistent matching. Consider ensuring that either: (1) all `original_topic` values are stored without prefix, and policies are set without prefix, or (2) restore normalization to handle inconsistencies gracefully.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
backend/tests/integration/test_events_routes.py (1)

310-325: Fix CSRF header in fixture-auth tests: use test_user["headers"] on mutating requests.
The test calls client.post("/api/v1/events/publish", json=publish_request) without passing test_user["headers"]. Since the backend enforces CSRF on POST/PUT/DELETE for authenticated sessions, this request fails CSRF validation before reaching the authorization check. Add headers=test_user["headers"] to all mutating requests in fixture-auth tests.

backend/tests/integration/test_user_settings_routes.py (1)

343-382: Assert login/logout success and replace fragile default-based comparison with user_id check.

The test relies on fixture execution order without verifying login success, and the final assertion uses OR logic (!= theme OR != timezone) which only needs one condition to pass—this can flake if defaults coincidentally match your chosen values. Additionally, the comment acknowledges the fixture order dependency but doesn't eliminate it robustly.

Asserting login/logout status codes and comparing user_id fields makes the test deterministic and independent of default settings.

Proposed patch
 async def test_settings_isolation_between_users(self, client: AsyncClient,
                                                 test_user: Dict[str, str],
                                                 another_user: Dict[str, str]) -> None:
     """Test that settings are isolated between users."""

-    # Login as first user (fixture logs in another_user last, so re-login as test_user)
+    # Login as first user (do not rely on fixture execution order)
     login_data = {
         "username": test_user["username"],
         "password": test_user["password"]
     }
-    await client.post("/api/v1/auth/login", data=login_data)
+    r_login_1 = await client.post("/api/v1/auth/login", data=login_data)
+    assert r_login_1.status_code == 200

     # Update first user's settings
     user1_update = {
         "theme": "dark",
         "timezone": "America/New_York"
     }
     response = await client.put("/api/v1/user/settings/", json=user1_update)
     assert response.status_code == 200
+    user1_settings = UserSettings(**response.json())

     # Log out
-    await client.post("/api/v1/auth/logout")
+    r_logout = await client.post("/api/v1/auth/logout")
+    assert r_logout.status_code in (200, 204)

     # Login as second user
     login_data = {
         "username": another_user["username"],
         "password": another_user["password"]
     }
-    await client.post("/api/v1/auth/login", data=login_data)
+    r_login_2 = await client.post("/api/v1/auth/login", data=login_data)
+    assert r_login_2.status_code == 200

     # Get second user's settings
     response = await client.get("/api/v1/user/settings/")
     assert response.status_code == 200
-    user2_settings = response.json()
+    user2_settings = UserSettings(**response.json())

-    # Verify second user's settings are not affected by first user's changes
-    # Second user should have default settings, not the first user's custom settings
-    assert user2_settings["theme"] != user1_update["theme"] or user2_settings["timezone"] != user1_update[
-        "timezone"]
+    # Verify isolation via identity (robust across varying defaults)
+    assert user2_settings.user_id != user1_settings.user_id
backend/tests/integration/test_admin_routes.py (1)

30-60: Avoid asserting hard-coded default values; prefer validating type, constraints, and valid ranges instead.

The test asserts exact values for monitoring settings (e.g., enable_tracing is True, sampling_rate == 0.1), but these are hard-coded Pydantic field defaults that may change or be configurable. While .env.test configuration doesn't affect these specific admin settings defaults, the test is still brittle by coupling to exact values rather than validating that returned values conform to their schema constraints.

Example (less brittle) assertions
-        assert settings.monitoring_settings.enable_tracing is True
-        assert settings.monitoring_settings.sampling_rate == 0.1
+        assert isinstance(settings.monitoring_settings.enable_tracing, bool)
+        assert isinstance(settings.monitoring_settings.sampling_rate, (int, float))
+        assert 0.0 <= settings.monitoring_settings.sampling_rate <= 1.0
backend/tests/conftest.py (1)

99-139: Auth fixtures: don't silently accept 400 on register + set CSRF header on authenticated client.

Accepting 400 on register can hide real regressions (e.g., if validation tightens unexpectedly). More critically, the CSRF token returned from login must be set on client.headers["X-CSRF-Token"] because the API enforces CSRF validation on POST/PUT/PATCH/DELETE requests, and the current fixtures leave it unset—this will cause authenticated requests to fail CSRF checks.

Proposed patch
 async def _http_login(client: httpx.AsyncClient, username: str, password: str) -> str:
     data = {"username": username, "password": password}
     resp = await client.post("/api/v1/auth/login", data=data)
     resp.raise_for_status()
-    return resp.json().get("csrf_token", "")
+    token = resp.json().get("csrf_token", "")
+    if not token:
+        raise RuntimeError("Login succeeded but no csrf_token returned")
+    return token

 @pytest_asyncio.fixture
 async def test_user(client: httpx.AsyncClient):
     """Function-scoped authenticated user."""
     uid = uuid.uuid4().hex[:8]
     creds = {
         "username": f"test_user_{uid}",
         "email": f"test_user_{uid}@example.com",
         "password": "TestPass123!",
         "role": "user",
     }
     r = await client.post("/api/v1/auth/register", json=creds)
-    if r.status_code not in (200, 201, 400):
+    if r.status_code not in (200, 201):
         pytest.fail(f"Cannot create test user (status {r.status_code}): {r.text}")
     csrf = await _http_login(client, creds["username"], creds["password"])
+    client.headers.update({"X-CSRF-Token": csrf})
     return {**creds, "csrf_token": csrf, "headers": {"X-CSRF-Token": csrf}}

 @pytest_asyncio.fixture
 async def test_admin(client: httpx.AsyncClient):
     """Function-scoped authenticated admin."""
     uid = uuid.uuid4().hex[:8]
     creds = {
         "username": f"admin_user_{uid}",
         "email": f"admin_user_{uid}@example.com",
         "password": "AdminPass123!",
         "role": "admin",
     }
     r = await client.post("/api/v1/auth/register", json=creds)
-    if r.status_code not in (200, 201, 400):
+    if r.status_code not in (200, 201):
         pytest.fail(f"Cannot create test admin (status {r.status_code}): {r.text}")
     csrf = await _http_login(client, creds["username"], creds["password"])
+    client.headers.update({"X-CSRF-Token": csrf})
     return {**creds, "csrf_token": csrf, "headers": {"X-CSRF-Token": csrf}}
🧹 Nitpick comments (1)
backend/tests/conftest.py (1)

63-73: HTTP client: consider whether follow_redirects=True is desirable for integration assertions.
If any endpoint accidentally redirects (302/307), this setting can mask issues by auto-following. If you want strictness, disable it; otherwise OK.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6027ac4 and 7cc31fe.

📒 Files selected for processing (11)
  • backend/app/events/core/producer.py
  • backend/tests/conftest.py
  • backend/tests/e2e/test_execution_routes.py
  • backend/tests/integration/test_admin_routes.py
  • backend/tests/integration/test_events_routes.py
  • backend/tests/integration/test_health_routes.py
  • backend/tests/integration/test_notifications_routes.py
  • backend/tests/integration/test_replay_routes.py
  • backend/tests/integration/test_saga_routes.py
  • backend/tests/integration/test_saved_scripts_routes.py
  • backend/tests/integration/test_user_settings_routes.py
💤 Files with no reviewable changes (6)
  • backend/tests/integration/test_saved_scripts_routes.py
  • backend/tests/e2e/test_execution_routes.py
  • backend/tests/integration/test_replay_routes.py
  • backend/tests/integration/test_health_routes.py
  • backend/tests/integration/test_saga_routes.py
  • backend/tests/integration/test_notifications_routes.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/events/core/producer.py (1)
backend/app/events/core/types.py (1)
  • ProducerConfig (29-60)
backend/tests/integration/test_user_settings_routes.py (1)
backend/tests/conftest.py (1)
  • another_user (142-162)
backend/tests/conftest.py (2)
backend/app/settings.py (1)
  • Settings (11-161)
backend/app/main.py (1)
  • create_app (44-127)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (6)
backend/app/events/core/producer.py (2)

49-49: Good architectural improvement for test isolation.

Using the injected settings instance for topic prefix configuration supports per-session/worker isolation and improves testability by eliminating runtime dependencies on global settings.


32-37: All instantiation sites correctly pass the settings parameter. The breaking change has been properly implemented across the codebase.

backend/tests/conftest.py (2)

41-54: App lifecycle: good move passing test_settings into create_app, and closing Dishka container.
This should reduce cross-worker config bleed and resource leakage.


21-37: Remove redundant % 16 modulo on REDIS_DB assignment.

The code applies modulo 16 twice: once in worker_num = sum(_WORKER_ID.encode()) % 16 and again in REDIS_DB: worker_num % 16. The second modulo is redundant and should be removed. Additionally, note that with pytest-xdist runs exceeding 16 workers, Redis database collisions are unavoidable given Redis's default 16-database limit (0–15).

Suggested fix
@pytest.fixture(scope="session")
def test_settings() -> Settings:
    """Provide test settings with a unique Kafka topic prefix for isolation."""
    base = Settings(_env_file=".env.test", _env_file_encoding="utf-8")
    session_id = uuid.uuid4().hex[:8]
    base_prefix = f"{base.KAFKA_TOPIC_PREFIX.rstrip('.')}."
    worker_num = sum(_WORKER_ID.encode()) % 16
    unique_prefix = f"{base_prefix}{session_id}.{_WORKER_ID}."
    return base.model_copy(
         update={
             "DATABASE_NAME": f"integr8scode_test_{session_id}_{_WORKER_ID}",
-            "REDIS_DB": worker_num % 16,
+            "REDIS_DB": worker_num,
             "KAFKA_GROUP_SUFFIX": f"{session_id}.{_WORKER_ID}",
             "SCHEMA_SUBJECT_PREFIX": f"test.{session_id}.{_WORKER_ID}.",
             "KAFKA_TOPIC_PREFIX": unique_prefix,
         }
    )
backend/tests/integration/test_admin_routes.py (1)

111-121: No issues found; test is correct.

The test_user fixture properly establishes both authentication and CSRF state: it logs in via _http_login() (which sets session cookies and retrieves the CSRF token), then returns the CSRF token in the dict. However, CSRF validation is skipped for GET requests per the security implementation (see validate_csrf_token() in backend/app/core/security.py, which exempts GET/HEAD/OPTIONS methods). The expected 403 status is correct and comes from the admin_user dependency enforcing the admin role requirement, not from missing CSRF setup.

Likely an incorrect or invalid review comment.
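For reference, a minimal sketch of the safe-method exemption described above — a double-submit check that skips GET/HEAD/OPTIONS. Names and the cookie/header fields are assumptions, not the project's exact implementation:

from fastapi import HTTPException, Request

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def validate_csrf(request: Request) -> None:
    # Safe methods are exempt; unsafe methods must echo the CSRF cookie
    # in the X-CSRF-Token header (double-submit pattern).
    if request.method in SAFE_METHODS:
        return
    header = request.headers.get("X-CSRF-Token")
    cookie = request.cookies.get("csrf_token")
    if not header or header != cookie:
        raise HTTPException(status_code=403, detail="CSRF token missing or invalid")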

backend/tests/integration/test_events_routes.py (1)

383-399: CSRF validation is not currently enforced on these endpoints.

The DELETE and POST routes (delete_event at line 277 and replay_aggregate_events at line 308 of backend/app/api/routes/events.py) only depend on admin_user for authorization. The validate_csrf_token dependency defined in backend/app/core/security.py is never applied to these endpoints, so CSRF token validation does not occur. The tests correctly return 403 for non-admin users due to role-based access control, not CSRF failures. The fixture infrastructure is sound (httpx.AsyncClient persists auth cookies automatically), but CSRF enforcement is not active on these routes.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/tests/integration/idempotency/test_consumer_idempotent.py (1)

62-63: Consider polling for consumer readiness instead of a fixed sleep.

While the 2-second sleep addresses the flaky test, a fixed delay may still fail on slower CI systems or waste time on faster ones. Consider polling for partition assignments instead:

♻️ Suggested more robust approach
 await wrapper.start([KafkaTopic.EXECUTION_EVENTS])
-# Allow time for consumer to join group and get partition assignments
-await asyncio.sleep(2)
+
+# Wait for consumer to join group and get partition assignments
+async def _consumer_ready():
+    assignments = wrapper.consumer.assignment()
+    assert assignments, "Consumer should have partition assignments"
+
+await eventually(_consumer_ready, timeout=5.0, interval=0.1)

This approach verifies the consumer is actually ready rather than assuming a fixed duration is sufficient.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cc31fe and 4abcb71.

📒 Files selected for processing (1)
  • backend/tests/integration/idempotency/test_consumer_idempotent.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/integration/idempotency/test_consumer_idempotent.py (1)
backend/tests/helpers/events.py (1)
  • make_execution_requested_event (8-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (1)
backend/tests/integration/idempotency/test_consumer_idempotent.py (1)

14-15: LGTM!

The import reorganization is clean and maintains the same functionality.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @backend/tests/integration/events/test_consume_roundtrip.py:
- Around line 16-18: The xdist_group("kafka_consumers") (and similar
kafka_consumers, dlq, user_settings) markers are ineffective because
pyproject.toml uses addopts = "-n 4 --dist loadfile"; change the pytest config
to use --dist=loadgroup (update pyproject.toml addopts to "-n 4 --dist
loadgroup") so the xdist_group markers are honored, or alternatively remove the
xdist_group(...) markers from the affected test files if you prefer to keep
--dist=loadfile; update whichever of these two places (the xdist_group markers
in the test files or the addopts line in pyproject.toml) to make the behavior
consistent.

In @backend/tests/integration/events/test_consumer_lifecycle.py:
- Around line 10-13: The xdist_group marker in the test prevents librdkafka
races only when pytest runs in parallel; update the GitHub Actions job that runs
the integration tests (the workflow step invoking "pytest tests/integration") to
add parallelization grouping by appending "-n auto --dist=loadgroup" to the
pytest command so the xdist_group("kafka_consumers") marker takes effect under
CI when tests run with xdist.

In @backend/tests/integration/events/test_event_store_consumer.py:
- Around line 16-23: Add the pytest marker registration for the xdist_group
marker in the pytest config so pytest no longer warns; specifically, update the
[tool.pytest.ini_options] markers list to include the entry "xdist_group: marks
tests for pytest-xdist grouping" so the marker used by pytestmark (xdist_group)
in test_event_store_consumer.py is registered.

In @backend/tests/integration/idempotency/test_consumer_idempotent.py:
- Around line 69-70: Replace the fixed await asyncio.sleep(2) with a readiness
wait that polls the consumer assignment; call the wrapper.assignment() via the
UnifiedConsumer wrapper inside eventually() (e.g. eventually(lambda:
bool(wrapper.assignment()), timeout=10.0, interval=0.1)) so the test waits for
actual partition assignments and avoids flakiness or extra delay; alternatively
you may remove the sleep entirely if you rely on the subsequent eventually(_one,
...) call to synchronize readiness.
🧹 Nitpick comments (1)
backend/tests/integration/idempotency/test_consumer_idempotent.py (1)

14-16: Avoid relying on tests.helpers package side effects for a single factory import.

If this import was moved to dodge circular-import/ordering issues, consider importing the concrete module instead (less likely to reintroduce cycles via tests/helpers/__init__.py), e.g. from tests.helpers.events import make_execution_requested_event.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4abcb71 and aa3c8ac.

📒 Files selected for processing (6)
  • backend/tests/integration/events/test_consume_roundtrip.py
  • backend/tests/integration/events/test_consumer_lifecycle.py
  • backend/tests/integration/events/test_event_dispatcher.py
  • backend/tests/integration/events/test_event_store_consumer.py
  • backend/tests/integration/idempotency/test_consumer_idempotent.py
  • backend/tests/integration/result_processor/test_result_processor.py
🧰 Additional context used
🧬 Code graph analysis (4)
backend/tests/integration/events/test_consumer_lifecycle.py (1)
backend/app/events/core/types.py (1)
  • ConsumerConfig (64-102)
backend/tests/integration/result_processor/test_result_processor.py (6)
backend/tests/conftest.py (2)
  • app (42-53)
  • db (88-90)
backend/tests/unit/conftest.py (2)
  • app (25-26)
  • db (10-11)
backend/app/db/repositories/execution_repository.py (1)
  • ExecutionRepository (17-91)
backend/app/domain/enums/kafka.py (1)
  • KafkaTopic (7-53)
backend/app/domain/execution/models.py (1)
  • DomainExecutionCreate (73-80)
backend/tests/helpers/eventually.py (1)
  • eventually (7-32)
backend/tests/integration/events/test_event_store_consumer.py (1)
backend/app/domain/enums/auth.py (1)
  • LoginMethod (4-10)
backend/tests/integration/idempotency/test_consumer_idempotent.py (2)
backend/tests/helpers/events.py (1)
  • make_execution_requested_event (8-50)
backend/tests/helpers/eventually.py (1)
  • eventually (7-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Scan Backend
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (6)
backend/tests/integration/events/test_event_dispatcher.py (1)

16-18: Ensure xdist_group is actually enforced (CI flags + marker registration).

pytest.mark.xdist_group("kafka_consumers") typically only changes scheduling if xdist is run with the grouping dist mode (e.g. --dist=loadgroup). Also, if your config enables strict markers, you’ll need to register xdist_group in pytest.ini/pyproject.toml to avoid “unknown marker” failures.
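For reference, the marker as it typically appears at module scope in the affected test files (a sketch):

import pytest

# All tests in this module land on the same xdist worker, but only when
# pytest-xdist runs with --dist=loadgroup.
pytestmark = pytest.mark.xdist_group("kafka_consumers")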

backend/tests/integration/events/test_consumer_lifecycle.py (1)

21-24: ConsumerConfig reformat is fine. No behavioral changes.

backend/tests/integration/events/test_event_store_consumer.py (1)

6-7: LoginMethod import looks correct and is used.

backend/tests/integration/result_processor/test_result_processor.py (2)

25-32: xdist_group("kafka_consumers") addition is consistent with the rest of the suite.
As above: make sure loadgroup is enabled; otherwise this won’t serialize anything.


5-5: No action required—DomainExecutionCreate is properly re-exported from app.domain.execution.

The __init__.py file in app/domain/execution correctly imports and exports DomainExecutionCreate, so the import path used here and elsewhere is valid.

backend/tests/integration/idempotency/test_consumer_idempotent.py (1)

18-25: The xdist_group containment strategy is properly supported and will work as intended.

pytest-xdist 3.6.1 is already in the dev dependencies (backend/pyproject.toml), and the xdist_group marker is used consistently across multiple Kafka-related test files. The strategy correctly addresses the documented librdkafka race condition in multi-worker environments.

Minor improvement: register xdist_group in the [tool.pytest.ini_options] markers list for consistency:

markers = [
    ...
    "xdist_group: marks tests to run serially in pytest-xdist",
]

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/backend-ci.yml (1)

50-50: Consider making test duration reporting configurable.

The --durations=0 flag reports execution time for all tests, which significantly increases CI log verbosity. While useful for profiling slow tests, this overhead may not be necessary for every CI run.

Consider making this configurable:

  • Use a workflow input or environment variable to enable/disable duration reporting
  • Document the rationale if this is intended to be permanent
  • Alternatively, use --durations=10 to report only the slowest 10 tests
♻️ Example: Make duration reporting configurable

Add a workflow input at the top of the file:

on:
  workflow_dispatch:
    inputs:
      durations:
        description: 'Number of slowest tests to report (0 for all, empty to disable)'
        required: false
        default: ''

Then use it conditionally in test commands:

       run: |
         cd backend
         uv run pytest tests/unit -v -rs \
-          --durations=0 \
+          ${{ github.event.inputs.durations && format('--durations={0}', github.event.inputs.durations) || '' }} \
           --cov=app \
           --cov-report=xml --cov-report=term

This allows enabling duration reporting on-demand via workflow_dispatch while keeping regular CI runs clean.

Also applies to: 120-120, 215-215

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5489e39 and 93a79e6.

📒 Files selected for processing (1)
  • .github/workflows/backend-ci.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/workers/run_pod_monitor.py (1)

67-74: LGTM! Settings injection aligns with the PR's DI refactor.

The addition of the settings parameter to init_tracing is consistent with the broader refactor to inject Settings via dependency injection across the stack. This enables tracing initialization to access configuration like the TESTING flag to conditionally disable tracing during tests.

Minor observation: There's some redundancy in passing both the settings object and individual extracted values (service_version=settings.TRACING_SERVICE_VERSION, sampling_rate=settings.TRACING_SAMPLING_RATE). This is likely intentional for backward compatibility during the transition, but could be simplified in a future cleanup if init_tracing can extract these directly from the settings parameter.
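If that cleanup happens, the worker call site could shrink to something like the following — purely a hypothetical sketch, assuming init_tracing would read the version and sampling rate from the settings argument:

# Hypothetical future call: service version and sampling rate are derived
# from settings inside init_tracing rather than passed explicitly.
init_tracing(service_name="pod-monitor", settings=settings)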

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e88e606 and 72a7733.

📒 Files selected for processing (6)
  • backend/workers/run_coordinator.py
  • backend/workers/run_event_replay.py
  • backend/workers/run_k8s_worker.py
  • backend/workers/run_pod_monitor.py
  • backend/workers/run_result_processor.py
  • backend/workers/run_saga_orchestrator.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (5)
backend/workers/run_k8s_worker.py (1)

67-67: LGTM! Settings properly passed to tracing initialization.

The addition of the settings parameter enables the tracing initialization to access the full Settings configuration, aligning with the broader Settings-based dependency injection refactor across the codebase.

backend/workers/run_saga_orchestrator.py (1)

67-67: LGTM! Settings properly passed to tracing initialization.

Consistent with the Settings-based DI pattern applied across all worker entry points.

backend/workers/run_coordinator.py (1)

67-67: LGTM! Settings properly passed to tracing initialization.

Consistent with the Settings-based DI pattern applied across all worker entry points.

backend/workers/run_event_replay.py (1)

74-74: LGTM! Settings properly passed to tracing initialization.

Consistent with the Settings-based DI pattern applied across all worker entry points.

backend/workers/run_result_processor.py (1)

83-83: LGTM! Settings properly passed to tracing initialization.

Consistent with the Settings-based DI pattern applied across all worker entry points.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @backend/pyproject.toml:
- Line 211: Remove the duplicate quiet flag from the addopts setting by editing
the addopts value (currently "-n 4 --dist loadgroup --tb=short -q --no-header
-q") to contain only a single "-q"; keep the rest of the options unchanged so
the test runner still uses 4 workers, loadgroup distribution, short traceback
formatting, and no header.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72a7733 and 2cb4d4d.

⛔ Files ignored due to path filters (1)
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • backend/pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Scan Backend
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
  • GitHub Check: Unit Tests

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/core/middlewares/metrics.py">

<violation number="1" location="backend/app/core/middlewares/metrics.py:132">
P2: Hardcoding `service.environment` to "production" removes the ability to distinguish test metrics from production metrics. If `OTEL_EXPORTER_OTLP_ENDPOINT` is configured in a test environment, metrics will be incorrectly tagged as production. Consider preserving the conditional logic based on `settings.TESTING`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@cubic-dev-ai cubic-dev-ai bot left a comment

3 issues found across 18 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/tests/unit/conftest.py">

<violation number="1" location="backend/tests/unit/conftest.py:29">
P3: Generator fixture has incorrect return type annotation. Since this fixture uses `yield`, it should either omit the return type or use `Generator[None, None, None]`/`Iterator[None]`.</violation>
</file>

<file name="backend/app/core/metrics/base.py">

<violation number="1" location="backend/app/core/metrics/base.py:51">
P1: The check for `settings.TESTING` and `settings.ENABLE_TRACING` was removed, but the PR description says "metrics NoOp in TESTING". This will cause metrics to be sent during tests (if OTLP endpoint is set), and when tracing is explicitly disabled. The `settings` parameter is passed but unused in the condition.</violation>
</file>

<file name="backend/app/main.py">

<violation number="1" location="backend/app/main.py:130">
P0: Removing module-level `app = create_app()` breaks production deployment. The Dockerfile uses `gunicorn app.main:app`, which requires `app` to be available at module level. Either restore `app = create_app()` or update the Dockerfile to use factory mode (e.g., `gunicorn 'app.main:create_app()'`, supported since gunicorn 20.1, or uvicorn's `--factory` flag).</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @backend/app/core/middlewares/metrics.py:
- Around line 128-134: The Resource.create call hard-codes "service.environment"
to "production"; change it to read the environment from your configuration
(e.g., use settings.ENVIRONMENT or settings.ENV with a sensible default) so
metrics are tagged by actual environment; update the Resource.create invocation
that sets SERVICE_NAME, SERVICE_VERSION and "service.environment" to use
settings.ENVIRONMENT (fallback to "production" if missing) instead of the
literal "production".
- Around line 121-125: The setup_metrics function should explicitly skip
initializing metrics when running tests; update setup_metrics to check
settings.TESTING (like the tracing check in dishka_lifespan.py:48) and return
early (with a log message) if TESTING is true, before attempting to read
OTEL_EXPORTER_OTLP_ENDPOINT or set up exporters so metrics cannot be initialized
in test mode even if environment variables are present.

In @backend/app/core/providers.py:
- Around line 630-632: The line constructing event_replay_service with
EventReplayService is >120 chars; break the constructor call across multiple
lines for readability: put EventReplayService(...) on its own line and place
each keyword argument (repository=replay_repository, producer=kafka_producer,
event_store=event_store, settings=settings, logger=logger) on separate indented
lines (or grouped to fit <120 chars) so the resulting lines all stay within the
120-character limit while preserving the same argument names and order.
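Concretely, the wrapped constructor call would look roughly like this (argument names taken from the lint output; surrounding provider code omitted):

event_replay_service = EventReplayService(
    repository=replay_repository,
    producer=kafka_producer,
    event_store=event_store,
    settings=settings,
    logger=logger,
)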
🧹 Nitpick comments (1)
backend/tests/unit/core/metrics/test_replay_and_security_metrics.py (1)

37-58: LGTM: Test correctly updated for Settings-based DI.

The test properly uses the test_settings fixture and instantiates SecurityMetrics with it.

💅 Optional: Consider consistent formatting

Line 43 uses semicolons to chain multiple calls, which is inconsistent with the formatting in the rest of the file where calls are on separate lines. For consistency, consider:

-    m.update_active_sessions(2); m.increment_active_sessions(); m.decrement_active_sessions()
+    m.update_active_sessions(2)
+    m.increment_active_sessions()
+    m.decrement_active_sessions()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cb4d4d and b3cdef8.

📒 Files selected for processing (19)
  • backend/app/core/metrics/base.py
  • backend/app/core/metrics/context.py
  • backend/app/core/middlewares/metrics.py
  • backend/app/core/providers.py
  • backend/app/main.py
  • backend/app/services/event_replay/replay_service.py
  • backend/app/services/k8s_worker/worker.py
  • backend/pyproject.toml
  • backend/tests/unit/conftest.py
  • backend/tests/unit/core/metrics/test_base_metrics.py
  • backend/tests/unit/core/metrics/test_connections_and_coordinator_metrics.py
  • backend/tests/unit/core/metrics/test_database_and_dlq_metrics.py
  • backend/tests/unit/core/metrics/test_execution_and_events_metrics.py
  • backend/tests/unit/core/metrics/test_health_and_rate_limit_metrics.py
  • backend/tests/unit/core/metrics/test_kubernetes_and_notifications_metrics.py
  • backend/tests/unit/core/metrics/test_metrics_classes.py
  • backend/tests/unit/core/metrics/test_metrics_context.py
  • backend/tests/unit/core/metrics/test_replay_and_security_metrics.py
  • backend/tests/unit/services/coordinator/test_queue_manager.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/app/services/k8s_worker/worker.py
  • backend/pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (9)
backend/tests/unit/core/metrics/test_execution_and_events_metrics.py (3)
backend/tests/unit/conftest.py (1)
  • app (66-67)
backend/app/core/metrics/execution.py (16)
  • ExecutionMetrics (5-108)
  • record_script_execution (65-66)
  • record_execution_duration (68-69)
  • increment_active_executions (71-72)
  • decrement_active_executions (74-75)
  • record_memory_usage (77-78)
  • record_error (80-81)
  • update_queue_depth (83-84)
  • record_queue_wait_time (86-87)
  • record_execution_assigned (89-90)
  • record_execution_queued (92-93)
  • record_execution_scheduled (95-96)
  • update_cpu_available (98-99)
  • update_memory_available (101-102)
  • update_gpu_available (104-105)
  • update_allocations_active (107-108)
backend/app/core/metrics/events.py (21)
  • EventMetrics (4-209)
  • record_event_published (98-110)
  • record_event_processing_duration (112-113)
  • record_pod_event_published (115-116)
  • record_event_replay_operation (118-119)
  • update_event_buffer_size (121-122)
  • record_event_buffer_dropped (124-125)
  • record_event_buffer_processed (127-128)
  • record_event_buffer_latency (130-131)
  • set_event_buffer_backpressure (133-135)
  • record_event_buffer_memory_usage (137-138)
  • record_event_stored (140-141)
  • record_events_processing_failed (143-154)
  • record_processing_duration (172-177)
  • record_kafka_message_produced (179-182)
  • record_kafka_message_consumed (184-185)
  • record_kafka_consumer_lag (187-190)
  • record_kafka_production_error (192-193)
  • record_kafka_consumption_error (195-198)
  • update_event_bus_queue_size (200-201)
  • set_event_bus_queue_size (203-209)
backend/app/core/metrics/base.py (1)
backend/app/settings.py (1)
  • Settings (11-164)
backend/app/core/providers.py (7)
backend/app/core/security.py (1)
  • SecurityService (23-107)
backend/app/settings.py (1)
  • Settings (11-164)
backend/app/core/metrics/context.py (2)
  • get_event_metrics (204-205)
  • get_event_metrics (256-257)
backend/app/core/metrics/events.py (1)
  • EventMetrics (4-209)
backend/app/db/repositories/admin/admin_user_repository.py (1)
  • AdminUserRepository (22-115)
backend/app/services/kafka_event_service.py (1)
  • KafkaEventService (25-295)
backend/app/services/event_replay/replay_service.py (1)
  • EventReplayService (23-368)
backend/tests/unit/conftest.py (1)
backend/app/core/metrics/context.py (3)
  • MetricsContext (113-233)
  • initialize_all (123-163)
  • reset_all (166-185)
backend/tests/unit/core/metrics/test_health_and_rate_limit_metrics.py (3)
backend/tests/unit/conftest.py (1)
  • app (66-67)
backend/app/settings.py (1)
  • Settings (11-164)
backend/app/core/metrics/health.py (7)
  • HealthMetrics (4-109)
  • record_health_check_duration (62-68)
  • record_health_check_failure (70-73)
  • update_health_check_status (75-77)
  • record_health_status (79-83)
  • record_service_health_score (85-86)
  • update_liveness_status (88-90)
backend/tests/unit/core/metrics/test_replay_and_security_metrics.py (4)
backend/tests/unit/conftest.py (1)
  • app (66-67)
backend/app/settings.py (1)
  • Settings (11-164)
backend/app/core/metrics/replay.py (20)
  • ReplayMetrics (4-180)
  • record_session_created (93-94)
  • update_active_replays (96-103)
  • increment_active_replays (105-106)
  • decrement_active_replays (108-109)
  • record_events_replayed (111-117)
  • record_event_replayed (119-120)
  • record_replay_duration (122-128)
  • record_event_processing_time (130-131)
  • record_replay_error (133-134)
  • record_status_change (136-141)
  • update_sessions_by_status (143-145)
  • record_replay_by_target (147-151)
  • record_speed_multiplier (153-154)
  • record_delay_applied (156-157)
  • record_batch_size (159-160)
  • record_events_filtered (162-163)
  • record_filter_effectiveness (165-168)
  • record_replay_memory_usage (170-171)
  • update_replay_queue_size (173-180)
backend/app/core/metrics/security.py (1)
  • SecurityMetrics (4-322)
backend/tests/unit/core/metrics/test_connections_and_coordinator_metrics.py (2)
backend/tests/unit/conftest.py (1)
  • app (66-67)
backend/app/settings.py (1)
  • Settings (11-164)
backend/tests/unit/core/metrics/test_database_and_dlq_metrics.py (3)
backend/tests/unit/conftest.py (1)
  • app (66-67)
backend/app/settings.py (1)
  • Settings (11-164)
backend/app/core/metrics/dlq.py (1)
  • DLQMetrics (4-94)
backend/app/main.py (1)
backend/app/core/middlewares/metrics.py (1)
  • setup_metrics (121-161)
🪛 GitHub Actions: Ruff Linting
backend/app/core/providers.py

[error] 631-631: E501: Line too long (124 > 120). The line 'repository=replay_repository, producer=kafka_producer, event_store=event_store, settings=settings, logger=logger' exceeds the maximum line length. Consider breaking into multiple lines. Command: uv run ruff check . --config pyproject.toml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Scan Backend
  • GitHub Check: Scan Frontend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (32)
backend/tests/unit/services/coordinator/test_queue_manager.py (1)

10-11: LGTM! Good addition for test organization.

The module-level pytestmark correctly applies the unit test marker to all tests in this module, which aligns with the PR's goal of improving test organization and categorization.

backend/tests/unit/core/metrics/test_connections_and_coordinator_metrics.py (2)

5-5: LGTM!

The import is correctly added to support Settings dependency injection in the test functions.


10-12: LGTM!

Both test functions correctly implement the Settings dependency injection pattern. The test_settings fixture is injected and passed to the metric constructors, aligning with the PR's goal of centralizing configuration via Settings. Since metrics become no-op under test mode, these smoke tests appropriately verify that metric methods can be called without errors.

Also applies to: 25-27

backend/app/core/metrics/context.py (1)

48-61: LGTM! Explicit initialization improves predictability.

The removal of lazy initialization in favor of explicit DI-based initialization is a solid architectural improvement. The RuntimeError provides a clear, actionable message that guides developers to call MetricsContext.initialize_all() during app startup.

backend/tests/unit/core/metrics/test_metrics_context.py (1)

15-24: LGTM! Test correctly validates session-scoped initialization.

The test properly verifies that metrics initialized via the session fixture maintain singleton behavior within the context, aligning with the new explicit initialization pattern.

backend/tests/unit/core/metrics/test_kubernetes_and_notifications_metrics.py (2)

11-37: LGTM! Proper Settings injection for DI pattern.

The test correctly injects the test_settings fixture into KubernetesMetrics, aligning with the new dependency injection approach for metrics initialization.


40-59: LGTM! Consistent Settings injection.

The test properly passes test_settings to NotificationMetrics, maintaining consistency with the DI-based initialization pattern.

backend/app/services/event_replay/replay_service.py (1)

24-41: LGTM! Settings properly propagated to metrics.

The service correctly accepts Settings via constructor injection and passes it to ReplayMetrics, maintaining the DI pattern across the service hierarchy.

backend/app/core/metrics/base.py (2)

22-36: LGTM! Clean dependency injection implementation.

The refactor from implicit get_settings() calls to explicit Settings injection improves testability and makes dependencies explicit. The configuration object construction is clear and straightforward.


39-67: ENABLE_TRACING check removed from metrics initialization, but intended behavior is preserved through environment configuration.

The concern about removing ENABLE_TRACING is valid: line 51 now only checks config.otlp_endpoint rather than also checking ENABLE_TRACING. However, the architecture ensures this works correctly:

  • ENABLE_TRACING is checked at the application initialization level (dishka_lifespan.py line 48) to gate tracing setup entirely
  • For TESTING mode, .env.test explicitly sets OTEL_EXPORTER_OTLP_ENDPOINT= (empty), ensuring the check at line 51 returns NoOpMeterProvider as intended
  • Metrics correctly achieve NoOp behavior in tests through the empty endpoint configuration

Potential edge case: If ENABLE_TRACING=False but OTEL_EXPORTER_OTLP_ENDPOINT is configured, the meter would be created as a real meter instead of NoOp. Consider adding explicit ENABLE_TRACING check in _create_meter() if this scenario needs to be explicitly handled.
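If that edge case should be handled, the guard could be extended roughly as follows (a sketch; the attribute names on config and the settings reference are assumptions):

from opentelemetry.metrics import NoOpMeterProvider

# Hypothetical extension of the existing endpoint check in _create_meter():
# fall back to a no-op meter when tracing is disabled, not only when the
# OTLP endpoint is missing.
if not config.otlp_endpoint or not self.settings.ENABLE_TRACING:
    return NoOpMeterProvider().get_meter(config.service_name)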

backend/app/core/providers.py (9)

146-148: LGTM: Clean SecurityService provisioning.

The SecurityService provider correctly instantiates the service with Settings, enabling centralized security configuration. The APP scope ensures a singleton instance.


247-293: LGTM: Consistent Settings injection across all metrics.

All metrics providers now consistently receive and pass Settings to their respective metric classes. This excellent refactoring enables:

  • Configuration-driven metrics initialization
  • Test isolation with unique settings per test session
  • Removal of global state dependencies

342-343: LGTM: Proper SecurityService injection into repository.

AdminUserRepository now receives SecurityService via dependency injection, eliminating internal instantiation and improving testability.


415-418: LGTM: AuthService properly wired with SecurityService.

The AuthService now receives SecurityService as a constructor parameter, maintaining the DI pattern consistently across authentication components.


431-443: LGTM: KafkaEventService updated with Settings parameter.

The KafkaEventService provider now correctly passes Settings to the service, enabling configuration-driven Kafka operations and metrics initialization.


636-652: LGTM: AdminUserService properly wired with SecurityService.

AdminUserService now receives SecurityService via dependency injection, completing the security service propagation to admin-related services.


721-735: LGTM: EventReplayProvider consistent with Settings pattern.

EventReplayService in the EventReplayProvider follows the same Settings injection pattern as other services, maintaining consistency across the codebase.


159-164: UnifiedProducer correctly accepts the settings parameter.

The settings=settings keyword argument passed to UnifiedProducer is properly supported by its constructor, which includes settings: Settings in its signature. The settings object is used within the class (e.g., self._topic_prefix = settings.KAFKA_TOPIC_PREFIX), confirming the integration is correct.


222-223: EventBusManager signature confirmed. The constructor at backend/app/services/event_bus.py accepts settings: Settings and logger: logging.Logger as parameters, matching the instantiation pattern at lines 222-223.

backend/app/main.py (2)

66-66: LGTM: setup_metrics correctly receives Settings.

The setup_metrics call now properly passes the Settings instance, enabling configuration-driven OpenTelemetry metrics setup.


144-146: LGTM: Factory pattern enables flexible app initialization.

The switch to factory=True with the "app.main:create_app" string is correct. The create_app function properly handles Settings loading by calling get_settings() when settings is None, ensuring configuration is loaded on each factory invocation.
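For context, a sketch of what factory-mode startup typically looks like with uvicorn (host/port values are illustrative):

import uvicorn

# Factory mode: uvicorn imports the string and calls create_app() per worker,
# so settings are loaded at startup rather than at module import time.
uvicorn.run("app.main:create_app", factory=True, host="0.0.0.0", port=8000)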

backend/tests/unit/core/metrics/test_database_and_dlq_metrics.py (1)

1-45: LGTM: Tests properly updated with Settings fixtures.

Both test functions correctly accept and use the test_settings fixture to construct DatabaseMetrics and DLQMetrics, ensuring test isolation and consistent configuration.

backend/tests/unit/conftest.py (1)

28-47: LGTM: Well-designed metrics initialization fixture.

The session-scoped autouse fixture properly initializes all metrics with test_settings and ensures cleanup via reset_all. This provides:

  • Consistent metrics availability across unit tests
  • Proper test isolation with unique settings
  • Efficient resource usage via session scope
  • Clean state between test runs
backend/tests/unit/core/metrics/test_health_and_rate_limit_metrics.py (1)

9-21: LGTM: Test properly updated with Settings fixture.

The test function correctly accepts and uses test_settings to construct HealthMetrics, maintaining consistency with the broader refactoring. The semicolon cleanup on Line 17 is also appreciated.

backend/tests/unit/core/metrics/test_base_metrics.py (2)

10-14: LGTM: DummyMetrics correctly updated for Settings-based DI.

The constructor now properly accepts and forwards the Settings instance to BaseMetrics, aligning with the broader DI configuration pattern.


19-23: LGTM: Test properly uses Settings fixture.

The test correctly receives the test_settings fixture and instantiates DummyMetrics with it, ensuring proper dependency injection throughout the test lifecycle.

backend/tests/unit/core/metrics/test_execution_and_events_metrics.py (2)

12-30: LGTM: Test correctly updated for Settings-based DI.

The test properly receives the test_settings fixture and instantiates ExecutionMetrics with it. The separation of chained calls into individual lines also improves readability.


33-61: LGTM: Test correctly updated for Settings-based DI.

The test properly receives the test_settings fixture and instantiates EventMetrics with it, consistent with the DI pattern applied throughout the codebase.

backend/tests/unit/core/metrics/test_replay_and_security_metrics.py (1)

11-35: LGTM: Test correctly updated for Settings-based DI.

The test properly receives the test_settings fixture and instantiates ReplayMetrics with it, aligning with the broader DI configuration pattern.

backend/tests/unit/core/metrics/test_metrics_classes.py (3)

20-30: LGTM: Test correctly updated for Settings-based DI.

The test properly receives the test_settings fixture and instantiates ConnectionMetrics with it, consistent with the DI pattern.


32-58: LGTM: Test correctly updated for Settings-based DI.

The test properly instantiates EventMetrics with test_settings and exercises a comprehensive set of metric methods.


60-71: LGTM: All metric classes correctly instantiated with Settings.

All metric classes are properly instantiated with test_settings, ensuring consistent DI configuration across the entire metrics suite.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/settings.py (1)

133-133: Consider using an Enum for stronger type safety.

The "production" default is appropriately fail-safe. However, since the comment lists specific expected values (production, staging, development), consider using a Literal type or an Enum to enforce valid values at the type level.

♻️ Optional: Use Literal for type safety
+from typing import Literal
+
 class Settings(BaseSettings):
     # ... other fields ...
-    ENVIRONMENT: str = "production"  # deployment environment (production, staging, development)
+    ENVIRONMENT: Literal["production", "staging", "development"] = "production"
backend/app/core/middlewares/metrics.py (1)

136-139: Verify insecure=True is appropriate for all environments.

The OTLP exporter is configured with insecure=True, which is suitable for local development but may not be appropriate for production. Ensure that production environments use secure connections (TLS) to the OTLP endpoint.

Consider making the insecure parameter configurable via Settings:

♻️ Suggested improvement

In backend/app/settings.py, add:

OTEL_EXPORTER_INSECURE: bool = True  # Set to False in production

Then update the exporter initialization:

-    otlp_exporter = OTLPMetricExporter(endpoint=settings.OTEL_EXPORTER_OTLP_ENDPOINT, insecure=True)
+    otlp_exporter = OTLPMetricExporter(
+        endpoint=settings.OTEL_EXPORTER_OTLP_ENDPOINT,
+        insecure=settings.OTEL_EXPORTER_INSECURE
+    )
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3cdef8 and 4d78cc1.

📒 Files selected for processing (6)
  • backend/.env.test
  • backend/Dockerfile
  • backend/app/core/middlewares/metrics.py
  • backend/app/core/providers.py
  • backend/app/settings.py
  • backend/tests/unit/conftest.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/unit/conftest.py (1)
backend/app/core/metrics/context.py (2)
  • initialize_all (123-163)
  • reset_all (166-185)
🪛 dotenv-linter (4.0.0)
backend/.env.test

[warning] 29-29: [UnorderedKey] The BCRYPT_ROUNDS key should go before the SECURE_COOKIES key

(UnorderedKey)


[warning] 33-33: [UnorderedKey] The ENABLE_TRACING key should go before the RATE_LIMIT_ENABLED key

(UnorderedKey)


[warning] 41-41: [UnorderedKey] The ENVIRONMENT key should go before the LOG_LEVEL key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (18)
backend/app/settings.py (2)

47-48: LGTM!

The configurable bcrypt rounds with a secure default of 12 is appropriate. The comment clearly explains the tradeoff, and the test environment override to 4 (shown in .env.test) provides the intended speed improvement.


79-79: LGTM!

Empty string default for schema subject prefix is sensible, allowing flexible per-environment configuration for schema isolation.

backend/Dockerfile (1)

30-31: LGTM!

The factory pattern with --factory flag correctly aligns with the Settings-based DI refactor. This allows create_app() to initialize the application with proper dependency injection.

backend/.env.test (3)

29-29: LGTM!

Setting BCRYPT_ROUNDS=4 for tests provides the intended speed improvement while maintaining sufficient security for the test environment.


35-36: LGTM!

The simplified comment is clearer. The empty OTEL_EXPORTER_OTLP_ENDPOINT effectively disables metrics as intended for tests.


41-41: LGTM!

Setting ENVIRONMENT=test appropriately identifies the test environment, aligning with the new Settings field.

backend/app/core/middlewares/metrics.py (2)

121-125: LGTM!

The Settings injection pattern is correct, and the conditional skip when OTEL_EXPORTER_OTLP_ENDPOINT is not configured prevents unnecessary initialization errors in test/development environments.


128-134: LGTM!

Resource attributes now correctly use Settings fields for service name, version, and environment instead of hardcoded values.

backend/tests/unit/conftest.py (1)

28-47: Dependency on test_settings at session scope is properly satisfied.

The test_settings fixture is correctly defined at session scope in the parent backend/tests/conftest.py, making it available to this fixture in the unit test conftest. The metrics initialization correctly depends on this session-scoped fixture, and cleanup is properly handled with MetricsContext.reset_all().

backend/app/core/providers.py (9)

640-656: Verify AdminUserService constructor accepts security_service parameter.

AdminUserService is now injected with SecurityService via dependency injection. Ensure the service constructor has been updated to accept the security_service parameter along with all other dependencies shown in the get_admin_user_service method.


342-343: AdminUserRepository correctly accepts SecurityService via dependency injection.

The constructor signature in backend/app/db/repositories/admin/admin_user_repository.py has been properly updated to accept security_service: SecurityService, and the provider correctly passes it. This removes tight coupling and follows good DI practices.


163-163: No action needed. The UnifiedProducer constructor correctly accepts the settings parameter and the initialization at line 163 is valid.


415-418: AuthService constructor properly accepts SecurityService.

The AuthService constructor signature correctly matches the dependency injection call, with SecurityService properly injected and utilized within the service for authentication operations.


431-443: KafkaEventService constructor signature verified.

The KafkaEventService __init__ method correctly accepts all four parameters (event_repository, kafka_producer, settings, and logger) with the proper type annotations. All parameters are stored as instance variables, and settings is actively used within the service to access configuration values like SERVICE_NAME and SERVICE_VERSION.


222-223: No action needed. Verification shows no other EventBusManager instantiation sites exist in the codebase outside of the DI container. The signature change is properly contained, and the code at lines 222-223 correctly uses the new (settings, logger) signature.

Likely an incorrect or invalid review comment.


146-148: SecurityService is stateless and thread-safe for APP scope.

Verification confirms SecurityService stores only immutable configuration (Settings reference) and a thread-safe CryptContext instance from passlib. All methods (verify_password, get_password_hash, create_access_token, get_current_user) are pure operations without side effects. This makes the service safe for APP-scoped (singleton) usage with concurrent requests.


248-293: Consistent Settings propagation across all metrics.

All metric classes inherit from BaseMetrics, which properly accepts the Settings parameter in its constructor. The provider pattern correctly instantiates each metric with the settings, centralizing configuration across the metrics layer.


627-636: EventReplayService construction is consistent across both providers.

EventReplayService is correctly and uniformly constructed with all required parameters, including settings, in both get_replay_service (lines 627-636) and get_event_replay_service (lines 730-738). The constructor signature matches the parameters being passed in both locations.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @backend/app/main.py:
- Around line 69-85: Wrap the MetricsContext.initialize_all(...) call in a
try/except block to catch any exceptions thrown by the metric constructors
(e.g., ConnectionMetrics, CoordinatorMetrics, DatabaseMetrics, etc.), remove any
redundant OTLP endpoint conditional checks here (since base metric constructors
and setup_metrics already handle otlp_endpoint/NoOpMeterProvider), and in the
except handler log the full exception with logger.exception("Failed to
initialize metrics") and then either re-raise to fail fast or log and continue
with a graceful fallback (e.g., proceed without metrics or re-initialize a NoOp
metrics context) depending on your startup policy.
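A minimal sketch of the fail-fast variant described above (the call signature mirrors how the tests invoke initialize_all):

try:
    MetricsContext.initialize_all(settings)
except Exception:
    logger.exception("Failed to initialize metrics")
    raise  # fail fast; alternatively log and continue without metrics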
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d78cc1 and 57fd0b1.

📒 Files selected for processing (1)
  • backend/app/main.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/main.py (1)
backend/app/core/metrics/context.py (1)
  • initialize_all (123-163)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (2)
backend/app/main.py (2)

99-99: LGTM: Settings-driven metrics setup.

The expanded signature setup_metrics(app, settings, logger) aligns with the PR's goal of centralizing Settings-driven dependency injection. This propagates the settings instance through to metrics configuration.


178-179: LGTM: Factory pattern enables proper DI.

The factory pattern ("app.main:create_app" with factory=True) is the correct approach for dependency injection frameworks. Benefits include:

  • Settings loaded per-worker instead of at module import time
  • Better testability with injectable settings
  • Proper lifecycle management for DI containers

This aligns with the PR's broader Settings-driven DI goals.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 23 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/services/pod_monitor/monitor.py">

<violation number="1" location="backend/app/services/pod_monitor/monitor.py:149">
P1: Blocking synchronous call in async method will block the event loop. The `get_api_resources()` call makes a network request to the Kubernetes API and should be wrapped in `asyncio.to_thread()` to avoid blocking, consistent with how `list_namespaced_pod` is called elsewhere in this file.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
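Regarding the pod_monitor violation above, a minimal sketch of the suggested non-blocking call, assuming the monitor keeps a CoreV1Api client on an attribute (the attribute name is illustrative):

import asyncio

# Run the blocking Kubernetes client call in a worker thread so the event
# loop stays responsive, mirroring how list_namespaced_pod is handled.
resources = await asyncio.to_thread(self._core_v1.get_api_resources)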

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
backend/tests/integration/test_replay_routes.py (3)

307-317: Inconsistent test payload structure.

This test uses a raw dictionary with "filters" (plural) and fields like "name", "description", and "target_topic" that don't appear in the ReplayRequest schema used by other tests. This inconsistency bypasses Pydantic validation and may not align with the actual API schema.

♻️ Refactor to use ReplayRequest and ReplayFilter models
-        replay_request = {
-            "name": f"State Test Session {uuid4().hex[:8]}",
-            "description": "Testing state transitions",
-            "filters": {
-                "event_types": ["state.test.event"],
-                "start_time": (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat(),
-                "end_time": datetime.now(timezone.utc).isoformat()
-            },
-            "target_topic": "state-test-topic",
-            "speed_multiplier": 1.0
-        }
+        replay_request = ReplayRequest(
+            replay_type=ReplayType.QUERY,
+            target=ReplayTarget.KAFKA,
+            filter=ReplayFilter(
+                event_types=[EventType.SYSTEM_ERROR],  # or appropriate event type
+                start_time=datetime.now(timezone.utc) - timedelta(hours=1),
+                end_time=datetime.now(timezone.utc),
+            ),
+            speed_multiplier=1.0,
+        ).model_dump(mode="json")

382-392: Inconsistent test payload structure.

This test also uses a raw dictionary with "filters" (plural) and extra fields not present in the ReplayRequest schema used elsewhere in the file.

♻️ Refactor to use ReplayRequest and ReplayFilter models
-        replay_request = {
-            "name": f"Progress Test Session {uuid4().hex[:8]}",
-            "description": "Testing progress tracking",
-            "filters": {
-                "event_types": ["progress.test.event"],
-                "start_time": (datetime.now(timezone.utc) - timedelta(minutes=30)).isoformat(),
-                "end_time": datetime.now(timezone.utc).isoformat()
-            },
-            "target_topic": "progress-test-topic",
-            "speed_multiplier": 10.0  # Fast replay
-        }
+        replay_request = ReplayRequest(
+            replay_type=ReplayType.QUERY,
+            target=ReplayTarget.KAFKA,
+            filter=ReplayFilter(
+                event_types=[EventType.SYSTEM_ERROR],  # or appropriate event type
+                start_time=datetime.now(timezone.utc) - timedelta(minutes=30),
+                end_time=datetime.now(timezone.utc),
+            ),
+            speed_multiplier=10.0,
+        ).model_dump(mode="json")

343-365: Fix test payload to use ReplayRequest and ReplayFilter models.

The test uses a raw dictionary that doesn't match the ReplayRequest schema:

  • Uses "filters" (plural) instead of "filter" (singular)
  • Uses "preserve_timing" instead of "preserve_timestamps"
  • Uses "target_topic" (string) instead of "target_topics" (Dict[EventType, str])
  • Includes unsupported fields "name" and "description"
  • Missing required fields replay_type and target

The filter fields (aggregate_id, correlation_id, user_id, service_name) and batch_size are all valid and supported.

♻️ Refactor to use ReplayRequest and ReplayFilter models
-        replay_request = {
-            "name": f"Complex Filter Session {uuid4().hex[:8]}",
-            "description": "Testing complex event filters",
-            "filters": {
-                "event_types": [
-                    "execution.requested",
-                    "execution.started",
-                    "execution.completed",
-                    "execution.failed"
-                ],
-                "start_time": (datetime.now(timezone.utc) - timedelta(days=30)).isoformat(),
-                "end_time": datetime.now(timezone.utc).isoformat(),
-                "aggregate_id": str(uuid4()),
-                "correlation_id": str(uuid4()),
-                "user_id": test_admin.get("user_id"),
-                "service_name": "execution-service"
-            },
-            "target_topic": "complex-filter-topic",
-            "speed_multiplier": 0.1,  # Slow replay
-            "preserve_timing": False,
-            "batch_size": 100
-        }
+        replay_request = ReplayRequest(
+            replay_type=ReplayType.QUERY,
+            target=ReplayTarget.KAFKA,
+            filter=ReplayFilter(
+                event_types=[
+                    EventType.EXECUTION_REQUESTED,
+                    EventType.EXECUTION_STARTED,
+                    EventType.EXECUTION_COMPLETED,
+                    EventType.EXECUTION_FAILED,
+                ],
+                start_time=datetime.now(timezone.utc) - timedelta(days=30),
+                end_time=datetime.now(timezone.utc),
+                aggregate_id=str(uuid4()),
+                correlation_id=str(uuid4()),
+                user_id=test_admin.get("user_id"),
+                service_name="execution-service",
+            ),
+            speed_multiplier=0.1,
+            preserve_timestamps=False,
+            batch_size=100,
+            target_topics={EventType.EXECUTION_COMPLETED: "complex-filter-topic"},
+        ).model_dump(mode="json")
backend/tests/integration/services/sse/test_partitioned_event_router.py (1)

58-65: Avoid mutating session-scoped test_settings (leaks across tests). Line 59 sets test_settings.SSE_CONSUMER_POOL_SIZE = 1, but test_settings is scope="session", so this can make unrelated tests order-dependent/flaky.

Proposed fix (copy settings for this test)
 async def test_router_start_and_stop(redis_client, test_settings: Settings) -> None:
-    test_settings.SSE_CONSUMER_POOL_SIZE = 1
+    settings = test_settings.model_copy(update={"SSE_CONSUMER_POOL_SIZE": 1})
     suffix = uuid4().hex[:6]
     router = SSEKafkaRedisBridge(
-        schema_registry=SchemaRegistryManager(settings=test_settings, logger=_test_logger),
-        settings=test_settings,
-        event_metrics=EventMetrics(test_settings),
+        schema_registry=SchemaRegistryManager(settings=settings, logger=_test_logger),
+        settings=settings,
+        event_metrics=EventMetrics(settings),
         sse_bus=SSERedisBus(
             redis_client,
             exec_prefix=f"sse:exec:{suffix}:",
             notif_prefix=f"sse:notif:{suffix}:",
             logger=_test_logger,
         ),
         logger=_test_logger,
     )
backend/tests/conftest.py (1)

114-120: Don’t silently accept missing CSRF token. Line 119 returning "" will turn failures into confusing 403/CSRF errors later.

Proposed fix
 async def _http_login(client: httpx.AsyncClient, username: str, password: str) -> str:
@@
     resp.raise_for_status()
     json_data: dict[str, str] = resp.json()
-    return json_data.get("csrf_token", "")
+    csrf = json_data.get("csrf_token")
+    assert csrf, "Login response missing csrf_token"
+    return csrf
backend/tests/integration/test_events_routes.py (1)

35-67: Fix type annotation: test_user should be typed as UserCredentials, not Dict[str, str].

The fixture returns a UserCredentials TypedDict with fields {username, email, password, role, csrf_token, headers}. Using the exact type eliminates type checking noise and improves IDE support. Note: CSRF validation is not currently wired in the application (the validate_csrf_from_request method exists but is never called in route handlers or middleware), so the missing headers on unsafe requests do not cause test failures today. However, the type annotation should still be corrected for clarity.

🤖 Fix all issues with AI agents
In @backend/tests/conftest.py:
- Around line 34-51: The session-scoped test_settings fixture is mutable, so changes can
leak between tests; add a function-scoped fixture (e.g., settings) that takes
test_settings and returns test_settings.model_copy(), giving each test an independent
Settings instance for per-test overrides while leaving test_settings untouched and
session-scoped; update any test that mutates settings attributes to use the new fixture
(see the sketch after this list).
- Around line 107-111: The redis_client fixture yields an async redis client but has no
teardown, which leaks connections; update the redis_client pytest_asyncio.fixture so the
AsyncGenerator yields the client and then awaits client.aclose() in a finally block to
clean up the redis.asyncio connection (see the sketch after this list).

In @backend/tests/integration/test_dlq_routes.py:
- Around line 194-196: The test_set_retry_policy function is missing a type for
the test_settings parameter; import the Settings class (e.g., from app.settings
import Settings) at the top of the test file and update the
test_set_retry_policy signature to annotate test_settings: Settings so the
fixture type matches the conftest.py definition and type checkers recognize it.

In @backend/tests/unit/core/test_security.py:
- Around line 225-230: The test's claims dict is typed too narrowly as data:
dict[str, str | datetime], which causes mypy friction when it is passed to jwt.encode;
change the annotation for data (and to_encode, if annotated) to dict[str, Any]
(importing Any from typing) so the claims dict accepts the mixed value types jwt.encode
expects, removing the strict-mypy complaint while keeping the same runtime behavior.
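
For the two conftest.py items above, a minimal sketch of what the fixtures could look like (fixture names and the REDIS_URL field are assumptions, not taken from the repository):

from collections.abc import AsyncGenerator

import pytest
import pytest_asyncio
import redis.asyncio as redis

from app.settings import Settings


@pytest.fixture
def settings(test_settings: Settings) -> Settings:
    # Function-scoped copy: per-test overrides never leak into the session-scoped instance.
    return test_settings.model_copy()


@pytest_asyncio.fixture
async def redis_client(test_settings: Settings) -> AsyncGenerator[redis.Redis, None]:
    # The REDIS_URL attribute is assumed; use whatever the Settings model actually exposes.
    client = redis.Redis.from_url(test_settings.REDIS_URL)
    try:
        yield client
    finally:
        # Always close the connection, even if the test body raised.
        await client.aclose()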
🧹 Nitpick comments (6)
backend/tests/unit/core/test_adaptive_sampling.py (2)

70-71: Move imports to the top of the file.

Imports placed inside functions reduce readability and are non-standard. Move these imports to the module level at the top of the file.

♻️ Proposed fix

Move the imports to the top of the file with the other imports:

 import time
 from unittest.mock import patch
+from unittest.mock import MagicMock

 import pytest

 from app.core.adaptive_sampling import AdaptiveSampler, create_adaptive_sampler
+from app.core.config import Settings

And remove them from inside the function:

         s._running = False

-        from unittest.mock import MagicMock
-        from app.core.config import Settings
-
         mock_settings = MagicMock(spec=Settings)
         mock_settings.TRACING_SAMPLING_RATE = 0.2

73-79: Remove dead code and update outdated comment.

The environment variable setup at line 76 appears to be unused since create_adaptive_sampler now receives settings directly via the parameter. The comment at line 77 is also outdated—it states the function "pulls settings via get_settings," but the implementation now requires an explicit Settings parameter.

♻️ Proposed fix
         mock_settings = MagicMock(spec=Settings)
         mock_settings.TRACING_SAMPLING_RATE = 0.2

-        monkeypatch.setenv("TRACING_SAMPLING_RATE", "0.2")
-        # create_adaptive_sampler pulls settings via get_settings; just ensure it constructs
+        # Verify create_adaptive_sampler constructs correctly with settings
         sampler = create_adaptive_sampler(mock_settings)
         sampler._running = False
backend/tests/helpers/k8s_fakes.py (1)

202-209: Signature inconsistency with real API and other test fakes.

list_namespaced_pod uses positional parameters (namespace, label_selector), but the real CoreV1Api.list_namespaced_pod accepts namespace positionally with other parameters as keyword arguments. This differs from TrackingV1.list_namespaced_pod in test_monitor.py (line 673) which uses **kwargs.

Consider aligning the signature for consistency:

Suggested alignment
-    def list_namespaced_pod(self, namespace: str, label_selector: str) -> Any:  # noqa: ARG002
+    def list_namespaced_pod(self, namespace: str, **kwargs: Any) -> Any:  # noqa: ARG002
         """Return configured pods for reconciliation tests."""

         class PodList:
             def __init__(self, items: list[Pod]) -> None:
                 self.items = items

         return PodList(list(self._pods))
backend/tests/unit/services/pod_monitor/test_monitor.py (1)

677-680: Return type inconsistency in TrackingWatch.stream.

TrackingWatch.stream returns an empty list while FakeWatch.stream returns FakeWatchStream. This works here because the test only checks watch_kwargs, but it breaks the interface contract and could cause confusion in future test modifications.

Consider returning FakeWatchStream([]) for consistency:

Suggested fix
 class TrackingWatch(FakeWatch):
-    def stream(self, func: Any, **kwargs: Any) -> list[dict[str, Any]]:
+    def stream(self, func: Any, **kwargs: Any) -> FakeWatchStream:
         watch_kwargs.append(kwargs)
-        return []
+        return FakeWatchStream([], "rv1")

Note: You'll need to import FakeWatchStream from tests.helpers.k8s_fakes.

backend/app/services/pod_monitor/monitor.py (1)

148-150: Blocking Kubernetes API call in async method.

self._v1.get_api_resources() is a synchronous network call that could block the event loop during startup. Other Kubernetes API calls in this file (e.g., list_namespaced_pod at line 387-389) correctly use asyncio.to_thread.

Consider wrapping for consistency:

Suggested fix
         # Verify K8s connectivity (all clients already injected via __init__)
-        self._v1.get_api_resources()
+        await asyncio.to_thread(self._v1.get_api_resources)
         self.logger.info("Successfully connected to Kubernetes API")
backend/tests/unit/core/test_logging_and_correlation.py (1)

41-42: Consider direct return if MyPy allows.

The intermediate typed variables (result and fallback_result) add verbosity. If --strict MyPy permits, you could return directly:

return json.loads(output)

and

return json.loads(s)

However, if this pattern is needed to satisfy type inference, the current approach is acceptable.

Also applies to: 49-50

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57fd0b1 and 8120957.

📒 Files selected for processing (23)
  • backend/app/core/k8s_clients.py
  • backend/app/services/pod_monitor/monitor.py
  • backend/pyproject.toml
  • backend/tests/conftest.py
  • backend/tests/e2e/test_resource_cleaner_orphan.py
  • backend/tests/helpers/eventually.py
  • backend/tests/helpers/k8s_fakes.py
  • backend/tests/helpers/kafka.py
  • backend/tests/integration/services/sse/test_partitioned_event_router.py
  • backend/tests/integration/test_alertmanager.py
  • backend/tests/integration/test_dlq_routes.py
  • backend/tests/integration/test_events_routes.py
  • backend/tests/integration/test_replay_routes.py
  • backend/tests/unit/conftest.py
  • backend/tests/unit/core/metrics/test_metrics_classes.py
  • backend/tests/unit/core/test_adaptive_sampling.py
  • backend/tests/unit/core/test_logging_and_correlation.py
  • backend/tests/unit/core/test_security.py
  • backend/tests/unit/schemas_pydantic/test_events_schemas.py
  • backend/tests/unit/schemas_pydantic/test_execution_schemas.py
  • backend/tests/unit/schemas_pydantic/test_notification_schemas.py
  • backend/tests/unit/services/pod_monitor/test_event_mapper.py
  • backend/tests/unit/services/pod_monitor/test_monitor.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/tests/unit/conftest.py
🧰 Additional context used
🧬 Code graph analysis (12)
backend/tests/unit/core/test_logging_and_correlation.py (1)
backend/app/core/logging.py (1)
  • format (61-98)
backend/tests/unit/core/test_security.py (1)
backend/app/core/security.py (1)
  • SecurityService (23-107)
backend/tests/conftest.py (2)
backend/app/main.py (1)
  • create_app (59-160)
backend/app/settings.py (1)
  • Settings (11-165)
backend/tests/unit/core/test_adaptive_sampling.py (2)
backend/app/settings.py (1)
  • Settings (11-165)
backend/app/core/adaptive_sampling.py (1)
  • create_adaptive_sampler (242-251)
backend/tests/unit/core/metrics/test_metrics_classes.py (2)
backend/tests/unit/conftest.py (1)
  • app (68-69)
backend/app/settings.py (1)
  • Settings (11-165)
backend/tests/unit/services/pod_monitor/test_monitor.py (3)
backend/app/core/k8s_clients.py (1)
  • K8sClients (10-17)
backend/app/services/pod_monitor/event_mapper.py (1)
  • PodEventMapper (57-519)
backend/app/services/pod_monitor/monitor.py (6)
  • MonitorState (42-48)
  • PodEvent (72-77)
  • PodMonitor (91-457)
  • WatchEventType (34-39)
  • create_pod_monitor (461-501)
  • state (140-142)
backend/tests/integration/test_alertmanager.py (2)
backend/tests/conftest.py (1)
  • client (78-86)
backend/tests/unit/conftest.py (1)
  • client (63-64)
backend/tests/unit/services/pod_monitor/test_event_mapper.py (3)
backend/app/infrastructure/kafka/events/execution.py (3)
  • ExecutionCompletedEvent (86-93)
  • ExecutionFailedEvent (96-105)
  • ExecutionTimeoutEvent (108-115)
backend/app/services/pod_monitor/event_mapper.py (3)
  • PodContext (31-38)
  • PodEventMapper (57-519)
  • _extract_logs (446-474)
backend/tests/helpers/k8s_fakes.py (10)
  • ContainerStatus (54-59)
  • FakeApi (130-135)
  • Meta (12-25)
  • Pod (78-94)
  • Spec (28-31)
  • State (47-51)
  • Status (62-75)
  • Terminated (34-38)
  • read_namespaced_pod_log (134-135)
  • read_namespaced_pod_log (195-196)
backend/tests/integration/services/sse/test_partitioned_event_router.py (2)
backend/app/core/metrics/events.py (1)
  • EventMetrics (4-209)
backend/tests/conftest.py (1)
  • test_settings (35-51)
backend/tests/integration/test_replay_routes.py (2)
backend/tests/conftest.py (1)
  • app (56-67)
backend/tests/unit/conftest.py (1)
  • app (68-69)
backend/tests/integration/test_dlq_routes.py (1)
backend/tests/conftest.py (3)
  • client (78-86)
  • test_user (123-146)
  • test_settings (35-51)
backend/tests/helpers/k8s_fakes.py (3)
backend/tests/unit/services/pod_monitor/test_event_mapper.py (3)
  • read_namespaced_pod_log (169-170)
  • read_namespaced_pod_log (173-174)
  • read_namespaced_pod_log (177-178)
backend/tests/unit/services/idempotency/test_middleware.py (1)
  • event (35-39)
backend/tests/unit/services/pod_monitor/test_monitor.py (4)
  • stream (678-680)
  • Pod (391-393)
  • list_namespaced_pod (244-245)
  • list_namespaced_pod (673-675)
🪛 GitHub Actions: MyPy Type Checking
The same failure is reported for every file listed below:

[error] 1-1: Mypy type checking failed during CI: 438 errors across 68 files. Command: 'uv run mypy --config-file pyproject.toml --strict .'. Process exited with code 1.

  • backend/pyproject.toml
  • backend/tests/unit/core/test_logging_and_correlation.py
  • backend/tests/unit/schemas_pydantic/test_notification_schemas.py
  • backend/tests/helpers/kafka.py
  • backend/tests/helpers/eventually.py
  • backend/tests/unit/schemas_pydantic/test_execution_schemas.py
  • backend/tests/unit/core/test_security.py
  • backend/tests/e2e/test_resource_cleaner_orphan.py
  • backend/tests/conftest.py
  • backend/tests/unit/schemas_pydantic/test_events_schemas.py
  • backend/tests/unit/core/test_adaptive_sampling.py
  • backend/tests/integration/test_events_routes.py
  • backend/app/core/k8s_clients.py
  • backend/tests/unit/core/metrics/test_metrics_classes.py
  • backend/tests/unit/services/pod_monitor/test_monitor.py
  • backend/tests/integration/test_alertmanager.py
  • backend/tests/unit/services/pod_monitor/test_event_mapper.py
  • backend/tests/integration/services/sse/test_partitioned_event_router.py
  • backend/app/services/pod_monitor/monitor.py
  • backend/tests/integration/test_replay_routes.py
  • backend/tests/integration/test_dlq_routes.py
  • backend/tests/helpers/k8s_fakes.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Scan Backend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (44)
backend/tests/helpers/eventually.py (2)

2-3: LGTM: Modern import style.

Using collections.abc for Awaitable and Callable aligns with Python 3.9+ best practices and PEP 585.


9-9: No issues found. All existing calls to eventually() pass async functions, confirming the signature change is not a breaking change in practice. The implementation is internally consistent with the async-only signature.

Regarding MyPy errors: these cannot be investigated from the codebase alone and require access to the CI logs. The workflow runs mypy in --strict mode, so pre-existing errors elsewhere in the codebase are possible but unverifiable without the actual report.

backend/tests/integration/test_replay_routes.py (3)

11-11: LGTM!

The import of ReplayFilter is correctly added to support the new filter-based replay request structure.


46-56: LGTM!

The refactoring to use ReplayFilter correctly wraps event filtering criteria (event_types, start_time, end_time) in a nested structure. The serialization via model_dump(mode="json") ensures proper JSON conversion.


410-414: LGTM!

The field name changes from events_replayed/events_total to replayed_events/total_events are correctly applied, and the progress calculation logic remains sound.

backend/tests/unit/services/pod_monitor/test_event_mapper.py (7)

7-13: LGTM - Explicit event type imports strengthen test assertions.

The addition of explicit event type imports (ExecutionCompletedEvent, ExecutionFailedEvent, ExecutionTimeoutEvent, PodRunningEvent) enables stronger type checking via isinstance assertions rather than relying on duck typing with string comparisons alone.


44-47: LGTM - Type hints improve test helper clarity.

Adding type hints to the local Cond test helper class improves readability and type safety.


64-64: LGTM - isinstance checks strengthen test assertions.

The added isinstance checks (lines 64, 74, 86, 95) verify that the mapper returns the correct event class types, not just events with matching event_type string values. This catches type-related bugs more effectively than duck typing alone.

Also applies to: 74-74, 86-86, 95-95


136-139: LGTM - Consistent type hints on test helper.

Type hints are consistent with the first Cond class definition (lines 44-47).


154-154: LGTM - Proper type hints on test fixtures and mock overrides.

The type hints are appropriate:

  • Line 154: Standard pytest.LogCaptureFixture annotation for the caplog fixture
  • Lines 169-178: Mock method signatures match the FakeApi base class, with appropriate # noqa: ARG002 suppressions for intentionally unused parameters in test doubles

Also applies to: 169-178


160-160: LGTM - Explicit non-None assertion improves test clarity.

The assertion explicitly verifies that log extraction succeeded before checking log field values, making test expectations clearer.


1-223: No action needed. The MyPy workflow references a non-existent pyproject.toml file, and there is no evidence in the repository of "438 errors across 68 files." The test file itself already contains proper type hints on all functions and local classes. If MyPy failures exist in the broader codebase, they should be documented separately with specific evidence rather than referenced without context.

Likely an incorrect or invalid review comment.

backend/app/core/k8s_clients.py (1)

6-17: LGTM! Clean DI extension for Kubernetes watch client.

Adding watch to the K8sClients bundle aligns well with the DI pattern. The Watch object is stateless and reusable, making it appropriate to include in the frozen dataclass.

One minor note: close_k8s_clients only closes api_client, which appears correct since Watch.stop() is called separately in PodMonitor._on_stop(). This separation of concerns is appropriate.

Also applies to: 40-40

backend/tests/helpers/k8s_fakes.py (2)

138-165: LGTM! Well-structured fake watch implementation.

The FakeWatchStream correctly mimics the real Kubernetes watch stream's _stop_event attribute that PodMonitor._update_resource_version accesses. The iterator pattern is properly implemented.


212-224: LGTM!

Clean factory helper that provides DI-friendly fake clients for PodMonitor tests.

backend/tests/unit/services/pod_monitor/test_monitor.py (6)

33-46: LGTM! Clean test double for KafkaEventService.

Inheriting from KafkaEventService while bypassing __init__ is a practical approach for type safety in tests. The published_events tracking list enables verification without real Kafka.


52-78: LGTM! Well-designed test utilities.

SpyMapper and make_k8s_clients_di provide clean, minimal test doubles. The # type: ignore[arg-type] comments are appropriate given the fake types don't match the real Kubernetes client types.


81-97: LGTM! Excellent test factory pattern.

make_pod_monitor centralizes test PodMonitor creation with sensible defaults while allowing dependency overrides. This reduces boilerplate and improves test maintainability.


103-122: LGTM!

The lifecycle test properly verifies state transitions and cleanup behavior (cache clearing) using the SpyMapper.


240-261: LGTM!

Clean approach to testing error handling by extending FakeV1Api and overriding list_namespaced_pod to raise. The assertions correctly verify the failure result.


485-519: LGTM! Thorough context manager test with proper mocking.

The dual monkeypatch.setattr calls (lines 507-508) correctly handle both the original module and the monitor's local import. This ensures the mock is used regardless of import timing.

backend/app/services/pod_monitor/monitor.py (3)

100-123: LGTM! Clean DI refactor with required dependencies.

The constructor now enforces dependency injection by requiring k8s_clients and event_mapper. This eliminates nullability checks throughout the class and makes dependencies explicit. The docstring appropriately directs users to create_pod_monitor() for automatic dependency creation.


460-501: LGTM! Well-designed factory with proper resource management.

The create_pod_monitor factory correctly:

  1. Tracks ownership of created resources via owns_clients
  2. Creates dependencies only when not provided (enabling both production and test paths)
  3. Ensures cleanup in finally block only for factory-created clients

This is a clean implementation of the production dependency factory pattern.
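
As a generic illustration of that ownership rule (self-contained toy classes, not the actual PodMonitor types):

from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass
class Clients:
    name: str

    def close(self) -> None:
        print(f"closing {self.name}")


class Monitor:
    def __init__(self, clients: Clients) -> None:
        self.clients = clients


@asynccontextmanager
async def create_monitor(clients: Clients | None = None) -> AsyncIterator[Monitor]:
    # The factory only owns (and therefore closes) clients it created itself;
    # injected clients remain the caller's responsibility.
    owns_clients = clients is None
    if clients is None:
        clients = Clients("factory-created")
    try:
        yield Monitor(clients)
    finally:
        if owns_clients:
            clients.close()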


252-258: Defensive handling of internal stream attribute.

The try/except for AttributeError correctly handles the case where _stop_event may not be present. This is appropriate given _stop_event is an internal implementation detail of the kubernetes-client library.

backend/pyproject.toml (2)

215-215: No changes needed — pytest configuration is properly distributed.

The addopts = "--tb=short" is appropriate as a default traceback format. All required test configuration (coverage reporting, verbosity, test durations, and report style) is explicitly specified in the CI workflows for each test type (unit, integration, e2e), which is a reasonable and maintainable design pattern. No functionality has been lost.


184-189: Pydantic.mypy plugin integration: Monitor type-checking progress.

The pydantic.mypy plugin with strict settings (init_forbid_extra, init_typed, warn_required_dynamic_aliases) has been integrated into the type-checking pipeline. The codebase currently uses 184 # type: ignore directives to manage type-checking issues while incremental fixes are applied (noted as "part 1").

To monitor progress on eliminating these suppressions, consider:

  • Running cd backend && uv run mypy --config-file pyproject.toml . locally to identify remaining issues
  • Incrementally removing # type: ignore comments as the codebase improves
  • Tracking the number of suppressions across releases to measure type-safety improvements
backend/tests/unit/schemas_pydantic/test_notification_schemas.py (1)

9-9: LGTM! Return type annotations improve type safety.

The explicit -> None annotations correctly indicate that these test functions don't return values, enhancing type safety and aligning with the broader PR's typing improvements.

Also applies to: 31-31

backend/tests/unit/core/test_logging_and_correlation.py (1)

88-88: LGTM! Return type annotation improves type safety.

The explicit -> None annotation correctly indicates this test function doesn't return a value.

backend/tests/unit/schemas_pydantic/test_events_schemas.py (1)

7-7: LGTM! Return type annotations improve type safety.

The explicit -> None annotations correctly indicate that these test functions don't return values, consistent with the PR's typing improvements.

Also applies to: 16-16

backend/tests/unit/schemas_pydantic/test_execution_schemas.py (1)

8-8: LGTM! Return type annotations improve type safety.

The explicit -> None annotations correctly indicate that these test functions don't return values, aligning with the broader typing improvements in this PR.

Also applies to: 13-13, 19-19

backend/tests/unit/core/metrics/test_metrics_classes.py (5)

1-18: LGTM! Imports and test marker properly configured.

The addition of ExecutionStatus and Settings imports, along with the pytestmark = pytest.mark.unit marker, correctly supports the Settings-based DI pattern and test categorization.


21-31: LGTM! Settings DI pattern correctly implemented.

The test now accepts test_settings: Settings and passes it to ConnectionMetrics, aligning with the broader PR's dependency injection migration. The test coverage remains comprehensive.


33-59: LGTM! Settings DI pattern correctly implemented.

The test now accepts test_settings: Settings and passes it to EventMetrics, maintaining comprehensive coverage of the EventMetrics interface while following the new DI pattern.


61-72: LGTM! Settings DI pattern correctly applied across all metrics classes.

All metric class instantiations now receive test_settings, following the Settings-based DI pattern. The use of ExecutionStatus.QUEUED for ExecutionMetrics is appropriate and demonstrates proper enum usage.


1-72: Type annotations in this test file are properly configured.

All test functions have proper type hints (test_settings: Settings parameter, -> None return type), all imports resolve correctly, and the Settings class and metrics base class properly accept the Settings type. This file should not contribute to MyPy errors.

backend/tests/integration/services/sse/test_partitioned_event_router.py (1)

30-36: Good alignment: EventMetrics(test_settings) keeps metrics construction consistent with Settings-driven DI.

backend/tests/e2e/test_resource_cleaner_orphan.py (2)

16-21: Type-only improvement looks good. _ensure_kubeconfig() -> None and the explicit -> None on the async test are consistent with current behavior.

Also applies to: 23-25


44-48: _has_cm() -> None is appropriate for an “eventually(assert…)” poller.

backend/tests/integration/test_alertmanager.py (1)

1-14: LGTM: type annotation matches the client fixture (httpx.AsyncClient).

backend/tests/integration/test_events_routes.py (1)

309-398: Comment-only tweak (no behavioral change).

backend/tests/unit/core/test_security.py (1)

11-21: Good: tests now construct SecurityService(test_settings) consistently.

Also applies to: 75-78, 286-291

backend/tests/helpers/kafka.py (1)

1-21: LGTM: correct typing for AsyncContainer scope and proper async scope.get() API usage.

The fixture correctly uses AsyncContainer (dishka 1.6.0) and await scope.get(UnifiedProducer) pattern, which matches established patterns throughout the codebase (conftest.py, integration tests). Imports from collections.abc and the fixture structure provide proper typing for both the producer and the async callable.

backend/tests/integration/test_dlq_routes.py (2)

35-35: LGTM! Modern type annotations.

The update from Dict[str, str] to dict[str, str] correctly uses Python 3.9+ built-in generic types, improving readability and reducing reliance on the typing module.

Also applies to: 80-80, 112-112, 127-127, 142-142, 182-182, 218-218, 255-255, 286-286, 324-324, 354-354


200-200: Correct use of topic prefix for test isolation.

Prepending test_settings.KAFKA_TOPIC_PREFIX ensures proper test isolation and aligns with the PR's Settings-driven approach. The DLQ system normalizes topic names to match retry policies correctly.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 63 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/tests/unit/services/sse/test_sse_service.py">

<violation number="1" location="backend/tests/unit/services/sse/test_sse_service.py:17">
P0: Incorrect import path - module `app.services.sse.shutdown_manager` does not exist. Should be `app.services.sse.sse_shutdown_manager`. This will cause an `ImportError` at test runtime.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (13)
backend/tests/load/user_runner.py (2)

64-64: Fix import order: time module used before import.

Line 64 references time.time() but the time module is imported later on line 85. This will cause a NameError at runtime when _flow_saved_scripts is executed.

🐛 Proposed fix

Move the time import to the top of the file with other imports:

 from __future__ import annotations
 
 import asyncio
 import random
+import time
 from collections.abc import Awaitable
 from dataclasses import dataclass
 from typing import Callable

And remove the duplicate import from line 85:

         await c.mark_all_read()
 
 
-import time
-
-
 async def run_user_swarm(cfg: LoadConfig, stats: StatsCollector, clients: int) -> None:

Also applies to: 85-85


89-89: Add type parameter to fix MyPy error.

The asyncio.Task generic type requires a type parameter. Since the spawned tasks return None, specify asyncio.Task[None].

🔧 Proposed fix
-    tasks: list[asyncio.Task] = []
+    tasks: list[asyncio.Task[None]] = []
backend/tests/load/http_client.py (1)

70-70: Remove the unused type: ignore comment.

The mypy type checker indicates this type: ignore directive is no longer needed and should be removed to fix the pipeline failure.

🔧 Proposed fix
-            for cookie in self.client.cookies.jar:  # type: ignore[attr-defined]
+            for cookie in self.client.cookies.jar:
backend/tests/unit/events/test_event_dispatcher.py (1)

54-54: Remove the unused type: ignore comment.

MyPy indicates this type: ignore comment is no longer needed, suggesting the type system now correctly recognizes event_type as assignable.

🧹 Proposed fix
-    e.event_type = EventType.EXECUTION_FAILED  # type: ignore[attr-defined]
+    e.event_type = EventType.EXECUTION_FAILED
backend/tests/integration/app/test_main_app.py (1)

27-35: Fix MyPy type errors in middleware assertions.

MyPy reports non-overlapping container checks on lines 30-35. The type of elements in middleware_classes (_MiddlewareFactory[P]) doesn't match the middleware class types being checked (e.g., CORSMiddleware). This causes the pipeline to fail and suggests the assertions might not work as intended.

The issue stems from line 27 where m.cls extracts the class from middleware factories, but the type system doesn't recognize the relationship.

🔧 Investigate the correct approach

Consider one of these approaches:

Option 1: Extract classes correctly and ensure type compatibility:

# If m.cls returns the actual class, you may need to cast or adjust the type
middleware_classes = {type(m.cls) for m in app.user_middleware}

Option 2: Use a different attribute or method to get middleware classes:

# Check FastAPI/Starlette documentation for the correct way to extract middleware classes
middleware_classes = {m.__class__ for m in app.user_middleware}

Option 3: Inspect the actual middleware stack structure:

Run this script to understand the middleware structure:

#!/bin/bash
# Description: Search for middleware-related code and FastAPI app configuration

# Find how middlewares are added to the app
rg -n 'add_middleware|middleware' backend/app/main.py -A 3 -B 3

# Look for similar test patterns
rg -n 'user_middleware' --type=py -C 3

Verify the correct approach based on the FastAPI/Starlette version in use.

backend/tests/integration/dlq/test_dlq_discard_policy.py (1)

48-48: Use test_settings.KAFKA_BOOTSTRAP_SERVERS instead of hardcoded address.

The Producer is created with hardcoded "localhost:9092" while the rest of the test uses test_settings for configuration. This inconsistency could cause test failures if Kafka runs on a different address.

Suggested fix
-    producer = Producer({"bootstrap.servers": "localhost:9092"})
+    producer = Producer({"bootstrap.servers": test_settings.KAFKA_BOOTSTRAP_SERVERS})
backend/tests/integration/dlq/test_dlq_retry_immediate.py (1)

51-51: Inconsistent with Settings-driven configuration.

The Producer is hardcoded to use "localhost:9092" instead of test_settings.KAFKA_BOOTSTRAP_SERVERS, which contradicts the PR objective of centralizing Settings-driven dependency injection. Line 30 correctly uses test_settings for the DLQ manager.

🔧 Proposed fix
-    prod = Producer({"bootstrap.servers": "localhost:9092"})
+    prod = Producer({"bootstrap.servers": test_settings.KAFKA_BOOTSTRAP_SERVERS})
backend/tests/integration/dlq/test_dlq_manager.py (1)

46-46: Inconsistent with Settings-driven configuration.

The Producer is hardcoded to use "localhost:9092" instead of test_settings.KAFKA_BOOTSTRAP_SERVERS, which contradicts the PR objective of centralizing Settings-driven dependency injection. Line 29 correctly uses test_settings for the DLQ manager.

🔧 Proposed fix
-    producer = Producer({"bootstrap.servers": "localhost:9092"})
+    producer = Producer({"bootstrap.servers": test_settings.KAFKA_BOOTSTRAP_SERVERS})
backend/tests/integration/test_events_routes.py (2)

39-43: Be consistent about allowing 404 vs requiring 200 for /api/v1/events/user to avoid re-flaking tests.
You already allow 404 in some tests for portability, but others hard-assert 200 on the same endpoint family. Consider a single helper/guard: if 404, pytest.skip(...).

Also applies to: 95-97, 188-190, 400-402, 433-435


459-502: test_events_isolation_between_users: the user_id assertions are ineffective.

The test_user and test_admin fixtures return dictionaries with username, email, password, role, csrf_token, and headers—but no user_id. Consequently, test_user.get("user_id", meta["user_id"]) defaults to meta["user_id"], making assertions like assert meta["user_id"] == test_user.get("user_id", meta["user_id"]) equivalent to assert meta["user_id"] == meta["user_id"]. This tautology fails to validate isolation. Extract and assert the actual user IDs from the login or registration response instead.
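
A sketch of a real isolation check, assuming the login (or registration) response exposes a user_id and the event metadata carries the owner's id (the field names and login endpoint path are assumptions):

import httpx


async def _login_user_id(client: httpx.AsyncClient, username: str, password: str) -> str:
    resp = await client.post("/api/v1/auth/login", data={"username": username, "password": password})
    resp.raise_for_status()
    user_id: str = resp.json()["user_id"]  # assumed to be part of the login payload
    return user_id


# In the test body:
#   user_a_id = await _login_user_id(client, test_user["username"], test_user["password"])
#   events = (await client.get("/api/v1/events/user")).json()["events"]
#   for event in events:
#       assert event["metadata"]["user_id"] == user_a_id  # compares against a concrete id, not itself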

backend/tests/integration/services/idempotency/test_redis_repository.py (1)

56-69: Increase TTL in tests to reduce CI flakiness (5s is tight).
A slow runner can legitimately expire the key before update/ttl checks.

Proposed adjustment
 def sample_record() -> IdempotencyRecord:
     return IdempotencyRecord(
         key="test-key",
         status=IdempotencyStatus.PROCESSING,
         event_type="test.event",
         event_id="event-123",
         created_at=datetime(2025, 1, 15, 10, 30, 45, tzinfo=timezone.utc),
-        ttl_seconds=5,
+        ttl_seconds=60,
         completed_at=None,
         processing_duration_ms=None,
         error=None,
         result_json=None,
     )

Also applies to: 95-124

backend/tests/integration/test_dlq_routes.py (2)

133-146: Update topic filter to use correct name with test environment prefix.

This test filters DLQ messages by the hardcoded topic "execution-events" (line 136), but:

  1. The enum value is KafkaTopic.EXECUTION_EVENTS = "execution_events" (underscore, not hyphen)
  2. Test environment topics are prefixed with KAFKA_TOPIC_PREFIX (e.g., "prefix.execution_events"), as shown in other DLQ tests (test_dlq_manager.py, test_dlq_retry_immediate.py)
  3. Repository filtering uses exact equality, so the unprefixed topic name will never match actual DLQ messages

The test will silently pass with an empty message list since the for loop (line 144) doesn't execute. Replace the hardcoded topic with the prefixed enum value:

Suggested fix
test_topic = f"{test_settings.KAFKA_TOPIC_PREFIX}{KafkaTopic.EXECUTION_EVENTS}"

Also import KafkaTopic at the top of the test file.


1-22: Fix the MyPy type export error in app/schemas_pydantic/dlq.py and correct the test_user parameter type annotation.

The MyPy failure is due to DLQMessageStatus being imported but not explicitly exported. Add an __all__ list to app/schemas_pydantic/dlq.py that includes all exported schema classes. Additionally, the test_user parameter is annotated as dict[str, str] throughout the test file, but the fixture returns UserCredentials (a TypedDict with username: str, email: str, password: str, role: str, csrf_token: str, headers: dict[str, str]). Update all method signatures to use test_user: UserCredentials and import UserCredentials from conftest.
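
Sketch of both changes (the __all__ contents, endpoint path, test name, and conftest import path are illustrative):

# app/schemas_pydantic/dlq.py
__all__ = [
    "DLQMessageStatus",
    # ...plus the other schema classes this module exports
]

# backend/tests/integration/test_dlq_routes.py
import httpx

from tests.conftest import UserCredentials  # import path is an assumption


async def test_list_dlq_messages(client: httpx.AsyncClient, test_user: UserCredentials) -> None:
    resp = await client.get("/api/v1/dlq/messages", headers=test_user["headers"])
    ...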

🤖 Fix all issues with AI agents
In @backend/tests/e2e/conftest.py:
- Line 12: The MyPy error is caused by using the generic redis.Redis without
type arguments in the _cleanup fixture signature; keep the explicit return type
AsyncGenerator[None, None] but suppress the missing type-arg complaint by adding
a per-annotation ignore to the redis.Redis parameter (i.e., change the parameter
annotation for redis_client: redis.Redis to include a "# type: ignore[type-arg]"
comment), leaving function name _cleanup and other logic unchanged.

In @backend/tests/integration/dlq/test_dlq_discard_policy.py:
- Line 16: The tests.helpers package is missing an explicit export list, causing
MyPy to complain about make_execution_requested_event; update
backend/tests/helpers/__init__.py to define an __all__ that includes
"make_execution_requested_event" so the symbol is explicitly exported from the
package and recognized by the type checker.

In @backend/tests/integration/events/test_consume_roundtrip.py:
- Line 16: The import error comes from tests.helpers not exporting
make_execution_requested_event; update tests/helpers/__init__.py to re-export
that symbol (e.g., add "from .some_module import make_execution_requested_event"
and include it in __all__ or otherwise expose it) so that "from tests.helpers
import make_execution_requested_event" in test_consume_roundtrip.py resolves
correctly.

In @backend/tests/integration/events/test_schema_registry_roundtrip.py:
- Line 8: The test imports make_execution_requested_event from tests.helpers but
that symbol is defined in tests.helpers.events; update the import to "from
tests.helpers.events import make_execution_requested_event" so the name resolves
for mypy, or alternatively export it from tests.helpers by importing
make_execution_requested_event into tests/helpers/__init__.py and adding it to
__all__ so tests.helpers exposes the symbol.

In @backend/tests/integration/test_sse_routes.py:
- Around line 46-88: The second test uses a fixed user_id which risks cross-test
interference and also relies on asyncio.sleep(0.1) which is brittle; change the
constant user_id to a randomized one (like how test_notification_stream_service
creates user_id) and remove the sleep by waiting for an explicit
readiness/confirmation from the SSE stream or Redis bus before publishing.
Specifically, update the test that currently sets user_id = "test-user-id" to
generate a unique id (e.g., via uuid4) and replace the await asyncio.sleep(0.1)
used before bus.publish_notification with a readiness check (e.g., wait for the
"connected" event from create_notification_stream or use a subscription
confirmation method on SSERedisBus) so publishing only happens after the
subscription is established (see the sketch after this list).

In @backend/tests/integration/test_user_settings_routes.py:
- Line 2: Import the actual fixture TypedDict (UserCredentials) used in
conftest.py and change all test parameter annotations from dict[str, str] to
UserCredentials: add an import for UserCredentials at the top of the file, then
update every test function signature that accepts test_user or another_user
(e.g., functions referencing parameters at the listed usages) to use the
UserCredentials type instead of dict[str, str].

In @backend/tests/load/plot_report.py:
- Around line 46-47: The mypy error comes from annotating tl as TimelineData
while using report.get("timeline", {}) whose default {} isn’t provably
TimelineData; in function plot_timeline replace the annotated pattern by either
casting the result (import cast from typing and set tl = cast(TimelineData,
report.get("timeline", {}))) or remove the explicit annotation and let mypy
infer tl from report: ReportDict (i.e., tl = report.get("timeline", {})); update
the single usage of tl accordingly.
- Around line 71-72: The mypy error comes from using report.get(...) with a
TypedDict and the explicit eps: Dict[str, EndpointData] annotation; update the
assignment in _top_endpoints so it matches the fix used at line 47 — remove the
strict annotation or add a type ignore on the .get() call for consistency (e.g.,
assign eps = report.get("endpoints", {})  # type: ignore) so mypy stops
complaining while preserving the intended default empty dict behavior for eps.

In @backend/tests/unit/services/sse/test_kafka_redis_bridge.py:
- Around line 49-56: The model_dump return annotation on _DummyEvent is too
narrow; change its return type from dict[str, str | None] to dict[str, Any] to
match BaseEvent.model_dump, and add the necessary Any import (from typing import
Any) at the top of the test file; keep the method body unchanged.
- Around line 35-43: The _StubDispatcher currently stores a single handler per
EventType causing overwrites; change its handlers type to dict[EventType,
list[Callable[[BaseEvent], Awaitable[None]]]] and implement register_handler on
_StubDispatcher to append the fn to the list (creating the list if missing) to
match EventDispatcher's contract, and update tests that access a handler (e.g.,
retrieving h from disp.handlers[EventType.EXECUTION_STARTED]) to use the first
element of the list (index 0).

In @backend/tests/unit/services/sse/test_shutdown_manager.py:
- Line 49: The test uses cast(SSEKafkaRedisBridge, router) to bypass type checks
even though DummyRouter doesn't implement that interface; update the test to
preserve type safety by either (a) introducing a small Protocol with the
required async aclose method (e.g., RouterProtocol) and make DummyRouter
implement it and change SSEShutdownManager.set_router to accept RouterProtocol,
or (b) replace DummyRouter with a proper AsyncMock that defines aclose and pass
that to SSEShutdownManager.set_router (no cast). Ensure references to
set_router, SSEShutdownManager, DummyRouter, and SSEKafkaRedisBridge are updated
accordingly.

In @backend/tests/unit/services/sse/test_sse_service.py:
- Around line 122-126: The function _make_fake_settings currently returns a
MagicMock typed as Settings which can trigger MyPy "incompatible return value";
fix it by importing and using typing.cast to cast the MagicMock to Settings
(e.g., return cast(Settings, mock)) so the return type matches the annotation,
ensuring you reference the MagicMock created in _make_fake_settings and the
Settings type.
🧹 Nitpick comments (21)
backend/tests/load/http_client.py (1)

108-110: Reconsider the cookie merge strategy.

The change from assignment to .update() seems unnecessary here. Since s is a fresh AsyncClient instance created on line 108, its cookie jar starts empty. Using .update() suggests preserving pre-existing cookies, but there are none in this context. The previous assignment pattern (s.cookies = self.client.cookies.copy()) more clearly expressed the intent to copy all authentication cookies to the streaming client.

♻️ Proposed revert to assignment pattern
-            s.cookies.update(self.client.cookies)
+            s.cookies = self.client.cookies.copy()
backend/tests/integration/events/test_schema_registry_roundtrip.py (1)

23-23: Consider using direct attribute access or explicit attribute check.

The defensive getattr(back, "execution_id", None) obscures the test's intent. In a roundtrip serialization test, execution_id should always be present after deserialization. If it's missing, that's a bug that should raise an AttributeError with a clear message rather than a generic assertion failure.

💡 Clearer alternatives

Option 1: Direct attribute access (recommended)

-    assert back.event_id == ev.event_id and getattr(back, "execution_id", None) == ev.execution_id
+    assert back.event_id == ev.event_id and back.execution_id == ev.execution_id

Option 2: Explicit attribute check with better error message

-    assert back.event_id == ev.event_id and getattr(back, "execution_id", None) == ev.execution_id
+    assert back.event_id == ev.event_id
+    assert hasattr(back, "execution_id"), "Deserialized event missing execution_id attribute"
+    assert back.execution_id == ev.execution_id
backend/tests/load/strategies.py (1)

53-54: Good refactor for clarity.

Renaming to label_dict and annotation_dict improves readability by making it clear these are dictionary strategy builders, distinguishing them from the string keys used in the data structures.

backend/tests/unit/services/sse/test_sse_shutdown_manager.py (2)

3-3: Avoid cast() here; use a properly-typed fake router to keep type-safety (and reduce runtime surprise).

mgr.set_router(cast(SSEKafkaRedisBridge, _FakeRouter())) bypasses mypy instead of satisfying it. Since this PR is about DI + typing stability, it’s better if the fake actually conforms (similar to backend/tests/unit/services/sse/test_sse_service.py’s _FakeRouter(SSEKafkaRedisBridge) pattern).

Proposed change
-from typing import cast
+from typing import Any

 from app.services.sse.kafka_redis_bridge import SSEKafkaRedisBridge
 from app.services.sse.sse_shutdown_manager import SSEShutdownManager

@@
-class _FakeRouter:
+class _FakeRouter(SSEKafkaRedisBridge):
     def __init__(self) -> None:
-        self.stopped = False
+        # Skip parent __init__
+        self.stopped = False

     async def stop(self) -> None:
         self.stopped = True
+
+    def get_stats(self) -> dict[str, int | bool]:  # type: ignore[override]
+        # Only if SSEShutdownManager reads stats; harmless otherwise.
+        return {"num_consumers": 0, "active_executions": 0, "is_running": False, "total_buffers": 0}
@@
-    mgr.set_router(cast(SSEKafkaRedisBridge, _FakeRouter()))
+    mgr.set_router(_FakeRouter())

Also applies to: 9-9, 26-26


65-68: The current code is type-safe; prefer explicit return bool for clarity.

The _initiated() function with -> None and an assertion is technically valid for eventually(), which accepts Callable[[], Awaitable[T]] and returns T (None in this case). However, asserting inside the function is less idiomatic than explicitly returning a boolean predicate. The proposed change improves code clarity without fixing a type error.

Suggested change
-    async def _initiated() -> None:
-        assert mgr.is_shutting_down() is True
+    async def _initiated() -> bool:
+        return mgr.is_shutting_down() is True
backend/tests/load/plot_report.py (1)

34-37: Consider adding runtime validation for the loaded JSON.

The function assumes json.load() returns data matching the ReportDict structure. While the TypedDict with total=False handles missing fields gracefully, malformed JSON could cause runtime errors downstream.

Optional: Add basic validation

You could add a simple validation check:

def _load_report(path: str | Path) -> ReportDict:
    with open(path, "r", encoding="utf-8") as f:
        result: ReportDict = json.load(f)
        # Basic validation
        if not isinstance(result, dict):
            raise ValueError(f"Invalid report format in {path}")
        return result

However, for a load testing utility script, the current approach is pragmatic and acceptable.

backend/tests/integration/services/notifications/test_notification_service.py (1)

18-21: Consider simplifying the redundant get operation.

Lines 20-21 fetch the notification immediately after creation just to assert it's not None. Since you already have the created object, you could:

  • Assert directly on created (e.g., assert created.notification_id is not None)
  • Or if verification is needed, assert that got matches the created notification's properties

The current pattern adds an unnecessary database round-trip without validating the data integrity.

♻️ Proposed simplification
 n = DomainNotificationCreate(user_id="u1", severity=NotificationSeverity.MEDIUM, tags=["x"], channel=NotificationChannel.IN_APP, subject="s", body="b")
 created = await repo.create_notification(n)
-got = await repo.get_notification(created.notification_id, "u1")
-assert got is not None
+assert created.notification_id is not None

Or, if you want to verify the round-trip:

 n = DomainNotificationCreate(user_id="u1", severity=NotificationSeverity.MEDIUM, tags=["x"], channel=NotificationChannel.IN_APP, subject="s", body="b")
 created = await repo.create_notification(n)
 got = await repo.get_notification(created.notification_id, "u1")
-assert got is not None
+assert got is not None
+assert got.notification_id == created.notification_id
+assert got.subject == "s"
backend/tests/unit/services/sse/test_sse_service.py (1)

140-155: Prevent test hangs: wrap all agen.__anext__() calls with timeouts (you only do it once today).

Several await agen.__anext__() calls have no timeout; if the generator stalls, the suite can hang indefinitely (especially with “return None on any exception” in the fake subscription).

One way to do it (minimal local helper)
@@
 _test_logger = logging.getLogger("test.services.sse.sse_service")
+
+async def _anext(agen: Any, timeout: float = 1.0) -> Any:
+    return await asyncio.wait_for(agen.__anext__(), timeout=timeout)
@@
-    first = await agen.__anext__()
+    first = await _anext(agen)
@@
-    stat = await agen.__anext__()
+    stat = await _anext(agen)
@@
-    failed = await agen.__anext__()
+    failed = await _anext(agen)
@@
-    await agen.__anext__()  # connected
-    await agen.__anext__()  # status
+    await _anext(agen)  # connected
+    await _anext(agen)  # status
@@
-    evt = await agen.__anext__()
+    evt = await _anext(agen)
@@
-    connected = await agen.__anext__()
+    connected = await _anext(agen)
@@
-    hb = await agen.__anext__()
+    hb = await _anext(agen)
@@
-    notif = await agen.__anext__()
+    notif = await _anext(agen)

Also applies to: 177-189, 200-229

backend/tests/unit/services/pod_monitor/test_monitor.py (4)

485-509: Prefer module-object monkeypatch over string targets (often nicer with typing + refactors).

Using "pkg.mod.attr" string targets works, but tends to be less type-checker-friendly and more brittle during refactors.

Proposed refactor
 async def test_create_pod_monitor_context_manager(monkeypatch: pytest.MonkeyPatch) -> None:
@@
-    monkeypatch.setattr(k8s_clients_module, "create_k8s_clients", mock_create_clients)
-    monkeypatch.setattr("app.services.pod_monitor.monitor.create_k8s_clients", mock_create_clients)
+    import app.services.pod_monitor.monitor as pod_monitor_module
+
+    monkeypatch.setattr(k8s_clients_module, "create_k8s_clients", mock_create_clients)
+    monkeypatch.setattr(pod_monitor_module, "create_k8s_clients", mock_create_clients)

176-206: Add asyncio.wait_for around awaits of looping coroutines to prevent suite hangs.

A regression in the loop exit conditions could hang CI indefinitely (especially _watch_pods() / _reconciliation_loop()).

Example pattern to apply in these tests
-    await pm._watch_pods()
+    await asyncio.wait_for(pm._watch_pods(), timeout=1.0)
-    task = asyncio.create_task(pm._reconciliation_loop())
+    task = asyncio.create_task(asyncio.wait_for(pm._reconciliation_loop(), timeout=2.0))

Also applies to: 413-434, 436-482, 640-662, 698-721


576-594: Assert spawned tasks are cancelled/done after aclose() (leak prevention).

test_stop_with_tasks relies on aclose() to stop two tasks that otherwise wait forever; asserting task completion helps catch leaks (and avoids “Task was destroyed but it is pending” warnings if behavior changes).

Proposed assertion add
     await pm.aclose()

     assert pm._state == MonitorState.STOPPED
     assert len(pm._tracked_pods) == 0
+    assert pm._watch_task is None or pm._watch_task.done()
+    assert pm._reconcile_task is None or pm._reconcile_task.done()

52-63: Consider a small Protocol for the mapper used by make_pod_monitor to avoid type: ignore[arg-type].

Right now the tests pass non-PodEventMapper instances (e.g., SpyMapper) via # type: ignore[arg-type]. A local Protocol for the minimal surface (clear_cache, map_pod_event) would keep the flexibility without suppressions.

Also applies to: 81-97, 103-123
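
A minimal sketch of such a Protocol (method signatures are assumptions beyond the two names mentioned above):

from typing import Any, Protocol


class MapperLike(Protocol):
    """Only the surface the tests and make_pod_monitor actually touch."""

    def clear_cache(self) -> None: ...

    def map_pod_event(self, event: dict[str, Any]) -> Any: ...


# SpyMapper satisfies MapperLike structurally, so a helper typed as
# def make_pod_monitor(mapper: MapperLike, ...) accepts it without type: ignore[arg-type].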

backend/tests/integration/services/rate_limit/test_rate_limit_service.py (1)

275-276: Minor: String forward reference unnecessary.

Since Awaitable is already imported at line 3, you can use Awaitable[int] directly instead of the string form "Awaitable[int]".

Suggested simplification
-    awaitable_result = cast("Awaitable[int]", svc.redis.sadd(index_key, sw_key))
+    awaitable_result = cast(Awaitable[int], svc.redis.sadd(index_key, sw_key))
backend/tests/integration/idempotency/test_consumer_idempotent.py (1)

71-72: Replace fixed sleep with polling for consumer readiness.

The 2-second sleep is a timing-based workaround that could introduce test flakiness. If the consumer joins faster than 2 seconds, the test wastes time; if it takes longer, the test could fail intermittently.

♻️ Suggested improvement using eventually helper

The codebase already has an eventually helper (imported on line 18) that could replace the sleep:

     await wrapper.start([KafkaTopic.EXECUTION_EVENTS])
-    # Allow time for consumer to join group and get partition assignments
-    await asyncio.sleep(2)
+    
+    # Wait for consumer to join group and get partition assignments
+    async def _consumer_ready() -> None:
+        # Consumer is ready when it has partition assignments
+        # This could check wrapper.consumer._consumer.assignment() or similar
+        assert True  # Replace with actual readiness check
+    
+    await eventually(_consumer_ready, timeout=5.0, interval=0.1)

Note: You'll need to determine the appropriate readiness check (e.g., checking partition assignments).

backend/tests/integration/idempotency/test_idempotency.py (2)

406-418: Use a stable hash for custom idempotency keys (avoid hash() randomness).
hash(script) changes between processes; swapping to sha256 makes the test reflect real-world idempotency key expectations.

Proposed refactor
+import hashlib
 ...
         def extract_script_key(event: BaseEvent) -> str:
             # Custom key based on script content only
             script: str = getattr(event, "script", "")
-            return f"script:{hash(script)}"
+            digest = hashlib.sha256(script.encode("utf-8")).hexdigest()
+            return f"script:{digest}"

61-72: Direct _repo access in tests is fine, but it’s tight coupling.
If there’s a public test hook / repository accessor, prefer it to reduce churn when internals change.

Also applies to: 121-125, 169-171, 246-247, 338-342

backend/tests/integration/test_sse_routes.py (1)

149-171: Concurrent stream test: consider bounding the per-stream wait.
If create_notification_stream() ever yields non-data events first, this can return False too early. A small bounded loop/time limit per stream would make this more robust.
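
A sketch of a bounded per-stream wait that skips non-data frames rather than deciding on the first event (the event field name is an assumption):

import asyncio
from typing import Any


async def received_data_event(stream: Any, timeout: float = 5.0) -> bool:
    """Return True once a data event arrives; False if the deadline passes first."""

    async def _drain() -> bool:
        async for event in stream:
            if event.get("event") == "data":  # skip heartbeats / connected frames
                return True
        return False

    try:
        return await asyncio.wait_for(_drain(), timeout=timeout)
    except asyncio.TimeoutError:
        return False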

backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (2)

58-90: Fakes inheriting from concrete classes while skipping __init__ is brittle.
Prefer small fakes that implement the needed interface (or MagicMock(spec=...)) rather than subclassing UnifiedProducer / IdempotencyManager / EventStore with uninitialized base state.

Also applies to: 116-127
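
A self-contained illustration of the MagicMock(spec=...) approach (RealProducer stands in for UnifiedProducer here):

from unittest.mock import AsyncMock, MagicMock


class RealProducer:
    """Stand-in for UnifiedProducer; only its interface matters for the spec."""

    async def produce(self, event: object) -> None: ...


producer = MagicMock(spec=RealProducer)
producer.produce = AsyncMock(return_value=None)

producer.produce  # fine: part of the spec'd interface
# producer.flush_everything would raise AttributeError, so the fake cannot
# silently drift away from the real class the way an uninitialized subclass can.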


134-143: Unit-test-only lifecycle toggling is OK, but consider a helper to reduce private coupling.
E.g., a local _start_orch(orch) helper to centralize the private flag if it changes again.

backend/app/core/providers.py (1)

99-120: Redis client creation: consider moving timeouts to Settings (and confirm redis-py async API).
Hardcoding connect/read timeouts is fine, but making them configurable tends to help in CI/prod variance; also confirm aclose() availability for your redis version.
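
A sketch of Settings-driven timeouts (the REDIS_* field names are hypothetical and would need to be added to Settings; socket_connect_timeout and socket_timeout are standard redis-py options):

import redis.asyncio as redis

from app.settings import Settings


def create_redis_client(settings: Settings) -> redis.Redis:
    # Field names below are hypothetical; wire them to whatever Settings actually exposes.
    return redis.Redis(
        host=settings.REDIS_HOST,
        port=settings.REDIS_PORT,
        socket_connect_timeout=settings.REDIS_CONNECT_TIMEOUT,
        socket_timeout=settings.REDIS_READ_TIMEOUT,
        decode_responses=True,
    )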

backend/tests/integration/test_dlq_routes.py (1)

20-21: Consider applying the _RetryRequest TypedDict consistently.

The _RetryRequest TypedDict is defined but only used once (line 377). The same payload structure is created at lines 237-239 without the type annotation. For consistency and better type safety, consider annotating that usage as well.

♻️ Suggested fix for consistent typing
         if messages_data["total"] > 0 and messages_data["messages"]:
             # Collect event IDs to retry
             event_ids = [msg["event_id"] for msg in messages_data["messages"][:2]]
 
             # Retry the messages
-            retry_request = {
+            retry_request: _RetryRequest = {
                 "event_ids": event_ids
             }

from app.infrastructure.kafka.events.base import BaseEvent
from app.settings import Settings

from tests.helpers import make_execution_requested_event

⚠️ Potential issue | 🟡 Minor

Same tests.helpers export issue.

This file has the same MyPy error as test_dlq_discard_policy.py. Once tests/helpers/__init__.py is fixed to export make_execution_requested_event, this will be resolved.

🧰 Tools
🪛 GitHub Actions: MyPy Type Checking

[error] 16-16: Module "tests.helpers" does not explicitly export attribute "make_execution_requested_event" [attr-defined]

🤖 Prompt for AI Agents
In @backend/tests/integration/events/test_consume_roundtrip.py at line 16, The
import error comes from tests.helpers not exporting
make_execution_requested_event; update tests/helpers/__init__.py to re-export
that symbol (e.g., add "from .some_module import make_execution_requested_event"
and include it in __all__ or otherwise expose it) so that "from tests.helpers
import make_execution_requested_event" in test_consume_roundtrip.py resolves
correctly.

from app.events.schema.schema_registry import MAGIC_BYTE, SchemaRegistryManager
from app.settings import Settings

from tests.helpers import make_execution_requested_event

⚠️ Potential issue | 🔴 Critical

Fix the import to resolve mypy pipeline failure.

The import fails type checking because make_execution_requested_event is not exported from tests.helpers. Based on the relevant code snippets, the function is defined in tests.helpers.events.

🔧 Proposed fix
-from tests.helpers import make_execution_requested_event
+from tests.helpers.events import make_execution_requested_event

Alternatively, if you prefer to keep the current import path, add the function to tests/helpers/__init__.py:

from tests.helpers.events import make_execution_requested_event

__all__ = ["make_execution_requested_event"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from tests.helpers import make_execution_requested_event
+from tests.helpers.events import make_execution_requested_event
🧰 Tools
🪛 GitHub Actions: MyPy Type Checking

[error] 8-8: Module "tests.helpers" does not explicitly export attribute "make_execution_requested_event" [attr-defined]

🤖 Prompt for AI Agents
In @backend/tests/integration/events/test_schema_registry_roundtrip.py at line
8, The test imports make_execution_requested_event from tests.helpers but that
symbol is defined in tests.helpers.events; update the import to "from
tests.helpers.events import make_execution_requested_event" so the name resolves
for mypy, or alternatively export it from tests.helpers by importing
make_execution_requested_event into tests/helpers/__init__.py and adding it to
__all__ so tests.helpers exposes the symbol.

Comment on lines +46 to 88
async def test_notification_stream_service(self, scope: AsyncContainer, test_user: dict[str, str]) -> None:
"""Test SSE notification stream directly through service (httpx doesn't support SSE streaming)."""
sse_service: SSEService = await scope.get(SSEService)
bus: SSERedisBus = await scope.get(SSERedisBus)
user_id = f"user-{uuid4().hex[:8]}"

# Create notification stream generator
stream_gen = sse_service.create_notification_stream(user_id)

# Collect events with timeout
events = []
async def collect_events():
async for event in stream_gen:
if "data" in event:
data = json.loads(event["data"])
events.append(data)
if data.get("event_type") == "notification" and data.get("subject") == "Hello":
break

# Start collecting events
collect_task = asyncio.create_task(collect_events())

# Wait until the initial 'connected' event is received
async def _connected() -> None:
assert len(events) > 0 and events[0].get("event_type") == "connected"
await eventually(_connected, timeout=2.0, interval=0.05)

# Publish a notification
notification = RedisNotificationMessage(
notification_id=f"notif-{uuid4().hex[:8]}",
severity=NotificationSeverity.MEDIUM,
status=NotificationStatus.PENDING,
tags=[],
subject="Hello",
body="World",
action_url="",
created_at="2024-01-01T00:00:00Z",
)
await bus.publish_notification(user_id, notification)

# Wait for collection to complete
try:
await asyncio.wait_for(collect_task, timeout=2.0)
except asyncio.TimeoutError:
collect_task.cancel()

# Verify we got notification
notif_events = [e for e in events if e.get("event_type") == "notification" and e.get("subject") == "Hello"]
assert len(notif_events) > 0

events: list[dict[str, Any]] = []
notification_received = False

async with aclosing(sse_service.create_notification_stream(user_id)) as stream:
try:
async with asyncio.timeout(3.0):
async for event in stream:
if "data" not in event:
continue
data = json.loads(event["data"])
events.append(data)

# After connected, publish the notification
if data.get("event_type") == "connected":
# Small delay to let Redis pubsub subscription fully establish
await asyncio.sleep(0.1)
notification = RedisNotificationMessage(
notification_id=f"notif-{uuid4().hex[:8]}",
severity=NotificationSeverity.MEDIUM,
status=NotificationStatus.PENDING,
tags=[],
subject="Hello",
body="World",
action_url="",
created_at="2024-01-01T00:00:00Z",
)
await bus.publish_notification(user_id, notification)

# Stop when we receive the notification
if data.get("event_type") == "notification" and data.get("subject") == "Hello":
notification_received = True
break
except TimeoutError:
pass

assert notification_received, f"Expected notification, got events: {events}"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find the CI configuration files
find . -type f -name "*.yml" -o -name "*.yaml" | head -20

Repository: HardMax71/Integr8sCode

Length of output: 820


🏁 Script executed:

# Also check for pyproject.toml or setup.py to see Python version requirements
fd -e toml -e cfg -e ini | grep -E "(pyproject|setup|tox|poetry)"

Repository: HardMax71/Integr8sCode

Length of output: 88


🏁 Script executed:

# Read the test file to see lines 90-130
wc -l backend/tests/integration/test_sse_routes.py

Repository: HardMax71/Integr8sCode

Length of output: 114


🏁 Script executed:

# Read the full test file to understand context
cat -n backend/tests/integration/test_sse_routes.py | head -150

Repository: HardMax71/Integr8sCode

Length of output: 8065


🏁 Script executed:

cat backend/pyproject.toml | head -50

Repository: HardMax71/Integr8sCode

Length of output: 1347


🏁 Script executed:

cat .github/workflows/backend-ci.yml | grep -A 20 "python-version"

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

# Search for any version guards or fallbacks for asyncio.timeout
rg "asyncio.timeout|version_info|sys.version" backend/tests/integration/test_sse_routes.py

Repository: HardMax71/Integr8sCode

Length of output: 163


🏁 Script executed:

# Check if there's a conftest or common imports that might provide fallback
fd conftest.py | xargs cat -n 2>/dev/null | head -100

Repository: HardMax71/Integr8sCode

Length of output: 3913


🏁 Script executed:

# Get more info from backend-ci.yml
cat .github/workflows/backend-ci.yml | head -100

Repository: HardMax71/Integr8sCode

Length of output: 2620


🏁 Script executed:

# Double-check the specific lines mentioned for user_id usage
sed -n '46,130p' backend/tests/integration/test_sse_routes.py | grep -n "user_id"

Repository: HardMax71/Integr8sCode

Length of output: 420


Constant user_id in test_execution_event_stream_service (line 93) creates test isolation risk; replace fixed sleep with readiness signal.

The first test (lines 46-88) correctly randomizes user_id, but the second test (lines 90-130) uses a constant user_id = "test-user-id" which can cause interference across parallel test runs and violate test isolation. Additionally, both tests rely on asyncio.sleep(0.1) to allow Redis pubsub subscription to establish—this is timing-sensitive and fragile. Replace the sleep with an explicit readiness signal (e.g., a subscription confirmation event or status flag) if available in the Redis bus implementation.

asyncio.timeout (Python 3.11+) is fully supported by the project's Python 3.12+ requirement.

🤖 Prompt for AI Agents
In @backend/tests/integration/test_sse_routes.py around lines 46 - 88, The
second test uses a fixed user_id which risks cross-test interference and also
relies on asyncio.sleep(0.1) which is brittle; change the constant user_id to a
randomized one (like how test_notification_stream_service creates user_id) and
remove the sleep by waiting for an explicit readiness/confirmation from the SSE
stream or Redis bus before publishing. Specifically, update the test that
currently sets user_id = "test-user-id" to generate a unique id (e.g., via
uuid4) and replace the await asyncio.sleep(0.1) used before
bus.publish_notification with a readiness check (e.g., wait for the "connected"
event from create_notification_stream or use a subscription confirmation method
on SSERedisBus) so publishing only happens after the subscription is
established.
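
A hedged sketch of replacing the sleep with an explicit readiness poll, assuming the bus publishes on a plain Redis pub/sub channel whose name the test can compute (the channel name below is an assumption; pubsub_numsub is a real redis-py call):

    from uuid import uuid4
    import redis.asyncio as aioredis

    user_id = f"user-{uuid4().hex[:8]}"

    async def _wait_for_subscriber(r: aioredis.Redis, channel: str, limit: float = 2.0) -> None:
        async with asyncio.timeout(limit):
            while True:
                counts = await r.pubsub_numsub(channel)  # [(channel, subscriber_count)]
                if counts and counts[0][1] > 0:
                    return
                await asyncio.sleep(0.01)

    # e.g. await _wait_for_subscriber(redis_client, f"notifications:{user_id}") before publishing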

Comment on lines +35 to 43
 class _StubDispatcher(EventDispatcher):
     """Stub EventDispatcher for testing."""

     def __init__(self) -> None:
-        self.handlers: dict[EventType, object] = {}
+        self.handlers: dict[EventType, Callable[[BaseEvent], Awaitable[None]]] = {}

-    def register_handler(self, et: EventType, fn: object) -> None:
+    def register_handler(self, et: EventType, fn: Callable[[BaseEvent], Awaitable[None]]) -> None:
         self.handlers[et] = fn


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate the EventDispatcher class
fd -t f -name "*.py" | xargs rg "class EventDispatcher" -B 2 -A 10

Repository: HardMax71/Integr8sCode

Length of output: 1173


🏁 Script executed:

# Also check imports in the test file to understand where EventDispatcher comes from
head -30 backend/tests/unit/services/sse/test_kafka_redis_bridge.py

Repository: HardMax71/Integr8sCode

Length of output: 1021


🏁 Script executed:

# Read the EventDispatcher class to find register_handler method
cat backend/app/events/core/dispatcher.py

Repository: HardMax71/Integr8sCode

Length of output: 7723


🏁 Script executed:

# Read the test file to see _StubDispatcher usage
cat backend/tests/unit/services/sse/test_kafka_redis_bridge.py | head -100

Repository: HardMax71/Integr8sCode

Length of output: 3161


🏁 Script executed:

# Search for uses of _StubDispatcher in the test file
rg "_StubDispatcher" backend/tests/unit/services/sse/test_kafka_redis_bridge.py -A 5

Repository: HardMax71/Integr8sCode

Length of output: 495


🏁 Script executed:

# Find and read _register_routing_handlers to see if it registers multiple handlers
rg "_register_routing_handlers" --type py -A 30

Repository: HardMax71/Integr8sCode

Length of output: 9579


Fix _StubDispatcher to properly inherit from EventDispatcher.

The stub's handler storage is incompatible with the parent class. EventDispatcher stores handlers as a list per event type and appends during registration, but _StubDispatcher stores a single handler and replaces it. This violates the Liskov Substitution Principle—if multiple handlers are registered for the same event type, the second call will overwrite the first instead of appending.

Update the stub to match the parent's contract:

class _StubDispatcher(EventDispatcher):
    """Stub EventDispatcher for testing."""

    def __init__(self) -> None:
        self.handlers: dict[EventType, list[Callable[[BaseEvent], Awaitable[None]]]] = {}

    def register_handler(self, et: EventType, fn: Callable[[BaseEvent], Awaitable[None]]) -> None:
        if et not in self.handlers:
            self.handlers[et] = []
        self.handlers[et].append(fn)

Then update the test to retrieve the handler from the list: h = disp.handlers[EventType.EXECUTION_STARTED][0]

🤖 Prompt for AI Agents
In @backend/tests/unit/services/sse/test_kafka_redis_bridge.py around lines 35 -
43, The _StubDispatcher currently stores a single handler per EventType causing
overwrites; change its handlers type to dict[EventType,
list[Callable[[BaseEvent], Awaitable[None]]]] and implement register_handler on
_StubDispatcher to append the fn to the list (creating the list if missing) to
match EventDispatcher's contract, and update tests that access a handler (e.g.,
retrieving h from disp.handlers[EventType.EXECUTION_STARTED]) to use the first
element of the list (index 0).

Comment on lines +49 to 56
class _DummyEvent(BaseEvent):
    """Dummy event for testing."""
    execution_id: str | None = None
    topic: ClassVar[KafkaTopic] = KafkaTopic.EXECUTION_EVENTS

    def model_dump(self, **kwargs: object) -> dict[str, str | None]:
        return {"execution_id": self.execution_id}


⚠️ Potential issue | 🔴 Critical

Fix the return type annotation to match BaseEvent.

The model_dump return type dict[str, str | None] is too narrow and likely causes the MyPy failure. The parent class BaseEvent.model_dump typically returns dict[str, Any].

🔧 Proposed fix
-    def model_dump(self, **kwargs: object) -> dict[str, str | None]:
+    def model_dump(self, **kwargs: object) -> dict[str, Any]:
         return {"execution_id": self.execution_id}

Don't forget to add the import at the top:

 from collections.abc import Callable
-from typing import Awaitable, ClassVar
+from typing import Any, Awaitable, ClassVar
 from unittest.mock import MagicMock

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @backend/tests/unit/services/sse/test_kafka_redis_bridge.py around lines 49 -
56, The model_dump return annotation on _DummyEvent is too narrow; change its
return type from dict[str, str | None] to dict[str, Any] to match
BaseEvent.model_dump, and add the necessary Any import (from typing import Any)
at the top of the test file; keep the method body unchanged.

     mgr = SSEShutdownManager(drain_timeout=0.01, notification_timeout=0.01, force_close_timeout=0.01, logger=_test_logger)
     router = DummyRouter()
-    mgr.set_router(router)
+    mgr.set_router(cast(SSEKafkaRedisBridge, router))

⚠️ Potential issue | 🟠 Major

Avoid using cast() to bypass type safety in tests.

The cast(SSEKafkaRedisBridge, router) suppresses type checking but DummyRouter doesn't actually implement the SSEKafkaRedisBridge interface. This undermines the type safety improvements introduced elsewhere in this PR.

♻️ Better approaches to maintain type safety

Option 1: Define a Protocol (recommended)

from typing import Protocol

class RouterProtocol(Protocol):
    async def aclose(self) -> None: ...

class DummyRouter:
    def __init__(self) -> None:
        self.stopped = False

    async def aclose(self) -> None:
        self.stopped = True

Then update SSEShutdownManager.set_router to accept RouterProtocol instead of the concrete class:

# In backend/app/services/sse/sse_shutdown_manager.py
def set_router(self, router: RouterProtocol) -> None:
    self._router = router

Option 2: Use pytest-mock or unittest.mock

from unittest.mock import AsyncMock

async def test_shutdown_force_close_calls_router_stop_and_rejects_new() -> None:
    mgr = SSEShutdownManager(...)
    router = AsyncMock()
    router.aclose = AsyncMock()
    mgr.set_router(router)
    # ... rest of test
    router.aclose.assert_called_once()
🤖 Prompt for AI Agents
In @backend/tests/unit/services/sse/test_shutdown_manager.py at line 49, The
test uses cast(SSEKafkaRedisBridge, router) to bypass type checks even though
DummyRouter doesn't implement that interface; update the test to preserve type
safety by either (a) introducing a small Protocol with the required async aclose
method (e.g., RouterProtocol) and make DummyRouter implement it and change
SSEShutdownManager.set_router to accept RouterProtocol, or (b) replace
DummyRouter with a proper AsyncMock that defines aclose and pass that to
SSEShutdownManager.set_router (no cast). Ensure references to set_router,
SSEShutdownManager, DummyRouter, and SSEKafkaRedisBridge are updated
accordingly.

Comment on lines +122 to +126
def _make_fake_settings() -> Settings:
    mock = MagicMock(spec=Settings)
    mock.SSE_HEARTBEAT_INTERVAL = 0
    return mock


⚠️ Potential issue | 🟠 Major

Make _make_fake_settings() type-check cleanly (avoid MyPy “incompatible return value”).

Returning a MagicMock from a function annotated -> Settings can trip MyPy depending on its configuration. Casting keeps the intent explicit.

Proposed fix
-from unittest.mock import MagicMock
+from typing import cast
+from unittest.mock import MagicMock
@@
 def _make_fake_settings() -> Settings:
     mock = MagicMock(spec=Settings)
     mock.SSE_HEARTBEAT_INTERVAL = 0
-    return mock
+    return cast(Settings, mock)
🤖 Prompt for AI Agents
In @backend/tests/unit/services/sse/test_sse_service.py around lines 122 - 126,
The function _make_fake_settings currently returns a MagicMock typed as Settings
which can trigger MyPy "incompatible return value"; fix it by importing and
using typing.cast to cast the MagicMock to Settings (e.g., return cast(Settings,
mock)) so the return type matches the annotation, ensuring you reference the
MagicMock created in _make_fake_settings and the Settings type.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/tests/unit/services/pod_monitor/test_monitor.py (1)

176-208: Potential flake: reconcile_interval_seconds = 1 with wait_for(..., timeout=1.0) is too tight.
Depending on how _reconciliation_loop() schedules its first iteration (and CI load), this can intermittently time out.

Proposed fix (reduce interval + increase timeout a bit)
@@
-    cfg.reconcile_interval_seconds = 1  # Use 1 second for testing
+    cfg.reconcile_interval_seconds = 0.01  # Keep loop responsive in CI
@@
-    await asyncio.wait_for(evt.wait(), timeout=1.0)
+    await asyncio.wait_for(evt.wait(), timeout=2.0)
@@
-    cfg.reconcile_interval_seconds = 1  # Use 1 second for testing
+    cfg.reconcile_interval_seconds = 0.01  # Keep loop responsive in CI
@@
-    await asyncio.wait_for(hit.wait(), timeout=1.0)
+    await asyncio.wait_for(hit.wait(), timeout=2.0)

Also applies to: 698-722

backend/tests/integration/test_notifications_routes.py (1)

33-33: Fix type annotation: test_user and test_admin should be UserCredentials, not Dict[str, str].

The fixture definitions in backend/tests/conftest.py return UserCredentials (a TypedDict with fields: username, email, password, role, csrf_token, headers), but the test method signatures declare test_user: Dict[str, str] and test_admin: Dict[str, str]. This creates a type inconsistency that misleads readers and type checkers.

📝 Proposed fix: Update type annotations

Update the imports at the top of the file:

+from tests.conftest import UserCredentials

Then update all affected test signatures, for example:

-async def test_list_user_notifications(self, client: AsyncClient, test_user: Dict[str, str]) -> None:
+async def test_list_user_notifications(self, client: AsyncClient, test_user: UserCredentials) -> None:

Apply similar changes to all other test methods that accept test_user or test_admin parameters.

Also applies to: 60-60, 75-75, 91-91, 118-119, 134-134, 152-152, 185-185, 222-222, 246-246, 273-273, 302-302, 314-314, 342-344, 381-381

🤖 Fix all issues with AI agents
In @backend/tests/integration/events/test_producer_roundtrip.py:
- Around line 41-53: The test creates a UnifiedProducer instance bypassing
__init__ (UP via object.__new__) but doesn't set p.logger, so UP._handle_stats
may raise AttributeError when the exception handler runs; set p.logger =
logging.getLogger(__name__) in the test setup before calling UP._handle_stats to
ensure the logger exists (also import logging at top of test if missing).

In @backend/tests/unit/services/pod_monitor/test_monitor.py:
- Around line 322-344: Remove the now-unnecessary "# type: ignore[arg-type]"
from the test invocation(s) so mypy won't fail; specifically, delete the `#
type: ignore[arg-type]` after the make_pod(...) call (and also remove the one
after make_pod_monitor(...) if it exists/is unused) in
test_process_pod_event_exception_handling so the test no longer contains an
unused ignore.

In @backend/tests/unit/services/sse/test_sse_service.py:
- Around line 118-119: Remove the unnecessary MyPy suppression on the test
helper: in the get_stats method definition inside test_sse_service.py (the def
get_stats(self) -> dict[str, int | bool] declaration) delete the trailing "  #
type: ignore[override]" comment since the signature is compatible with the
parent and the ignore is not needed; update only that line to remove the comment
and keep the return dict unchanged.
🧹 Nitpick comments (5)
backend/tests/helpers/sse.py (1)

8-8: Type annotations improved!

The updated type hints make the return types more specific and improve type safety. The change from generic dict to dict[str, object] is correct for SSE event data.

💡 Optional: Consider using Any instead of object for JSON data

While dict[str, object] is correct, dict[str, Any] is more idiomatic in Python typing for JSON data structures. This would require importing Any from typing.

-from typing import AsyncIterator, Iterable
+from typing import Any, AsyncIterator, Iterable

Then update the return types:

-async def stream_sse(client: AsyncClient, url: str, timeout: float = 20.0) -> AsyncIterator[dict[str, object]]:
+async def stream_sse(client: AsyncClient, url: str, timeout: float = 20.0) -> AsyncIterator[dict[str, Any]]:
-) -> dict[str, object]:
+) -> dict[str, Any]:

Apply similar changes to lines 48 and 58.

Also applies to: 34-34, 48-48, 58-58

backend/tests/unit/services/pod_monitor/test_monitor.py (2)

30-98: Nice consolidation of test doubles/factories (FakeKafkaEventService, make_* helpers).
This should reduce mock sprawl and make future DI refactors cheaper.

Optional hardening
  • Consider storing published_events as list[tuple[Any, str | None]] (or a small Protocol) to avoid inheriting from KafkaEventService while skipping super().__init__()—less coupling to internal fields.
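
A small Protocol-based sketch of that hardening, so the fake can record events without inheriting from KafkaEventService (the publish method name is illustrative, not the service's confirmed API):

    from typing import Any, Protocol

    class EventSink(Protocol):
        async def publish(self, event: Any, key: str | None = None) -> None: ...

    class RecordingSink:
        def __init__(self) -> None:
            self.published_events: list[tuple[Any, str | None]] = []

        async def publish(self, event: Any, key: str | None = None) -> None:
            self.published_events.append((event, key))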

550-594: Idempotency + task cleanup tests are useful guardrails.
Optional: consider asserting the tasks are actually cancelled (if aclose() guarantees it) to prevent “pending task” leaks from being silently ignored.

backend/tests/conftest.py (1)

114-119: Consider failing fast if CSRF token is missing.

The function returns an empty string when csrf_token is not present in the response (line 119). If CSRF tokens are required for authenticated requests, returning an empty string could lead to confusing test failures downstream rather than immediately flagging the authentication issue.

💡 Proposed alternative: Raise error if CSRF token missing
 async def _http_login(client: httpx.AsyncClient, username: str, password: str) -> str:
     data = {"username": username, "password": password}
     resp = await client.post("/api/v1/auth/login", data=data)
     resp.raise_for_status()
     json_data: dict[str, str] = resp.json()
-    return json_data.get("csrf_token", "")
+    csrf_token = json_data.get("csrf_token")
+    if not csrf_token:
+        raise ValueError("Login response missing csrf_token")
+    return csrf_token
backend/tests/unit/services/sse/test_sse_service.py (1)

32-39: Consider extracting the timeout constant.

The hardcoded 0.5-second timeout in asyncio.wait_for is a magic number that appears in test logic. Extract it to a module-level constant for easier maintenance and clearer intent.

♻️ Proposed refactor
+_FAKE_SUBSCRIPTION_TIMEOUT = 0.5
+
 class _FakeSubscription(SSERedisSubscription):
     def __init__(self) -> None:
         # Skip parent __init__ - no real Redis pubsub
         self._q: asyncio.Queue[dict[str, Any] | None] = asyncio.Queue()
         self.closed = False
 
     async def get[T: BaseModel](self, model: type[T]) -> T | None:
         try:
-            raw = await asyncio.wait_for(self._q.get(), timeout=0.5)
+            raw = await asyncio.wait_for(self._q.get(), timeout=_FAKE_SUBSCRIPTION_TIMEOUT)
             if raw is None:
                 return None
             return model.model_validate(raw)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3907bb and 47d1215.

📒 Files selected for processing (14)
  • backend/app/settings.py
  • backend/tests/conftest.py
  • backend/tests/helpers/__init__.py
  • backend/tests/helpers/sse.py
  • backend/tests/integration/app/test_main_app.py
  • backend/tests/integration/events/test_producer_roundtrip.py
  • backend/tests/integration/test_dlq_routes.py
  • backend/tests/integration/test_notifications_routes.py
  • backend/tests/load/cli.py
  • backend/tests/load/http_client.py
  • backend/tests/load/user_runner.py
  • backend/tests/unit/events/test_event_dispatcher.py
  • backend/tests/unit/services/pod_monitor/test_monitor.py
  • backend/tests/unit/services/sse/test_sse_service.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/app/settings.py
  • backend/tests/load/http_client.py
  • backend/tests/unit/events/test_event_dispatcher.py
🧰 Additional context used
🧬 Code graph analysis (5)
backend/tests/integration/test_notifications_routes.py (1)
backend/tests/conftest.py (1)
  • app (56-67)
backend/tests/integration/app/test_main_app.py (1)
backend/tests/conftest.py (1)
  • app (56-67)
backend/tests/integration/test_dlq_routes.py (5)
backend/tests/conftest.py (3)
  • app (56-67)
  • client (78-86)
  • test_settings (35-51)
backend/app/dlq/models.py (1)
  • DLQMessageStatus (10-16)
backend/app/schemas_pydantic/dlq.py (5)
  • DLQBatchRetryResponse (70-78)
  • DLQMessageDetail (95-117)
  • DLQMessageResponse (22-39)
  • DLQMessagesResponse (59-67)
  • DLQTopicSummaryResponse (81-92)
backend/app/settings.py (1)
  • Settings (11-165)
backend/tests/load/http_client.py (1)
  • result (98-99)
backend/tests/helpers/sse.py (1)
backend/tests/conftest.py (1)
  • client (78-86)
backend/tests/load/user_runner.py (1)
backend/tests/load/http_client.py (1)
  • APIClient (29-180)
🪛 GitHub Actions: MyPy Type Checking
backend/tests/unit/services/pod_monitor/test_monitor.py

[error] 337-337: Unused "type: ignore" comment [unused-ignore] while running 'uv run mypy --config-file pyproject.toml --strict .'.

backend/tests/unit/services/sse/test_sse_service.py

[error] 118-118: Unused "type: ignore" comment [unused-ignore] while running 'uv run mypy --config-file pyproject.toml --strict .'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Scan Backend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (35)
backend/tests/helpers/__init__.py (1)

3-5: Good: explicit re-export via __all__ keeps the test-helper public API stable.

No issues with the change. Only thing to watch long-term: importing from .events at package import time can introduce side effects if that module starts doing heavier setup—keep .events import-light.

backend/tests/load/cli.py (1)

78-78: LGTM! Type suppression removal improves type safety.

The removal of the inline type suppression is appropriate now that LoadConfig uses Pydantic BaseSettings with proper field typing.

backend/tests/integration/test_dlq_routes.py (4)

1-22: LGTM! Good type safety improvements.

The addition of TypedDict for _RetryRequest and expanded DLQ schema imports improve type safety and align with the PR's Settings-driven configuration goals. All imports are properly utilized throughout the tests.


41-41: LGTM! Consistent type annotations for authentication fixture.

The test_user: dict[str, str] type annotation is consistently applied across all test methods, improving type safety and aligning with the fixture-based authentication approach described in the PR objectives.

Also applies to: 86-86, 118-118, 133-133, 148-148, 188-188, 225-225, 262-262, 293-293, 331-331, 361-361


377-379: LGTM! Proper use of TypedDict for request payload.

The _RetryRequest TypedDict annotation provides clear documentation of the expected payload structure and enables type checking for the retry request.


200-223: LGTM! Settings-driven topic prefix correctly tests isolation.

The integration of test_settings.KAFKA_TOPIC_PREFIX for topic construction properly tests the DLQ retry policy with isolated topic prefixes. The endpoint /api/v1/dlq/retry-policy returns a MessageResponse with the message format "Retry policy set for topic {topic}", making the test assertions on message content and included topic name valid.

backend/tests/unit/services/pod_monitor/test_monitor.py (18)

4-22: Imports/DI test wiring looks coherent and keeps the tests type-oriented.
The new imports and explicit dependency types (K8sClients, KafkaEventService, PodMonitor, etc.) make the DI-centric tests easier to read/maintain.


103-123: Lifecycle test is deterministic by bypassing the real watch loop.
The SpyMapper.clear_cache() assertion is a good regression check for cleanup on stop.


125-138: Watch flow test validates RV tracking without a real stream.
Setting pm._state = RUNNING explicitly makes the intent clear.


140-152: Good “invalid raw event + reconnect attempts” coverage.
Setting watch_reconnect_delay = 0 keeps this test fast/stable.


161-174: Status shape assertions cover the config echo + counters.
This is a solid contract-style test for get_status().


210-237: Reconciliation success case asserts missing/extra pod handling and pruning.
The assertions on processed and _tracked_pods are the right behavioral checks.


239-262: Exception path test is clear and checks error propagation into ReconciliationResult.


264-320: Full _process_pod_event coverage is strong (add/delete/ignored phase).
This hits tracking + RV updates + publish gating in one place without being too brittle.


345-365: Publish happy-path verifies partition key selection (execution-id).
Asserting published_events[0][1] == "exec1" is the key behavior here—nice.


367-398: Publish exception handling test is good “should not raise” coverage.


399-411: Max reconnect attempts transitions to STOPPING as expected.


413-434: Main loop test is deterministic and avoids real sleeps.


436-460: APIException vs generic exception branches are covered cleanly.
Resetting _last_resource_version on 410 is a good behavioral assertion.

Also applies to: 640-662, 462-482


484-520: create_pod_monitor context-manager test correctly monkeypatches K8s client construction.
Patch coverage here is valuable given the factory/DI shift.


522-548: Injected k8s_clients DI path is validated (_clients, _v1 identity checks).


612-638: Raw event metadata parsing test is a good contract check for RV extraction.


664-696: Field selector propagation is covered (checks kwargs passed into the watch layer).


724-743: Start-with-reconciliation wires both tasks and asserts they’re created.

backend/tests/load/user_runner.py (1)

5-5: LGTM! Type annotations improved for clarity.

The updated type annotations correctly reflect that the task functions are async callables returning Awaitable[None] rather than asyncio.Future, and the task list is properly typed as list[asyncio.Task[None]]. These changes improve type safety and align with the actual function signatures.

Also applies to: 18-18, 89-89

backend/tests/integration/events/test_producer_roundtrip.py (1)

19-28: LGTM! Settings-based DI integration is correct.

The test now properly uses dependency-injected Settings and AsyncContainer, with bootstrap servers sourced from test_settings.KAFKA_BOOTSTRAP_SERVERS. This aligns with the PR's goal of centralizing Settings-driven configuration.

backend/tests/integration/app/test_main_app.py (1)

19-19: LGTM! Type hints and middleware introspection improved.

The added type annotations and the shift to name-based middleware comparison improve robustness. Filtering routes to Route instances avoids potential issues with other routing objects.

Also applies to: 23-23, 27-35, 38-38

backend/tests/integration/test_notifications_routes.py (1)

342-378: LGTM! Manual login in isolation test is appropriate.

The test_notifications_isolation_between_users test correctly performs manual login for both users to verify notification isolation. This is the right approach since the test needs to actively switch between authenticated sessions.

backend/tests/conftest.py (3)

19-28: LGTM! Well-structured UserCredentials type.

The UserCredentials TypedDict clearly defines the structure of test user credentials with all necessary fields (username, email, password, role, csrf_token, headers). This improves type safety across test fixtures.


34-51: LGTM! Session-scoped settings with proper test isolation.

The session-scoped test_settings fixture provides good test isolation through unique prefixes for Kafka topics, consumer groups, schema subjects, and per-worker database/Redis separation. Loading from .env.test establishes a consistent baseline configuration.


123-146: Don't silently accept status 400 during registration in fixtures.

The registration endpoint returns 400 only for duplicate usernames (not validation failures, which return 422). Accepting 400 without verifying the user was created will cause the fixture to fail later at login with an unclear error. Either ensure unique credentials to prevent 400 entirely, or explicitly handle the duplicate case rather than continuing to _http_login().

The same applies to the test_admin and another_user fixtures.

Likely an incorrect or invalid review comment.

backend/tests/unit/services/sse/test_sse_service.py (4)

128-131: LGTM!

The _decode helper cleanly extracts and parses JSON payloads from SSE events, simplifying test assertions.


134-189: LGTM!

Test updates properly integrate the new Settings parameter and use the helper functions effectively while maintaining test clarity.


191-229: LGTM!

The test correctly validates the notification stream lifecycle, including heartbeat emission, message delivery, and graceful shutdown. The use of None to signal stream termination aligns with the updated push signature.


231-236: LGTM!

The health status test is correctly updated to use the new Settings-based initialization.

Comment on lines +41 to 53
 def test_producer_handle_stats_path() -> None:
     # Directly run stats parsing to cover branch logic; avoid relying on timing
-    from app.events.core.producer import ProducerMetrics
-    from app.events.core.producer import UnifiedProducer as UP
+    from app.events.core import ProducerMetrics
+    from app.events.core import UnifiedProducer as UP

     m = ProducerMetrics()
     p = object.__new__(UP)  # bypass __init__ safely for method call
     # Inject required attributes
-    p._metrics = m  # type: ignore[attr-defined]
-    p._stats_callback = None  # type: ignore[attr-defined]
+    p._metrics = m
+    p._stats_callback = None
     payload = json.dumps({"msg_cnt": 1, "topics": {"t": {"partitions": {"0": {"msgq_cnt": 2, "rtt": {"avg": 5}}}}}})
-    UP._handle_stats(p, payload)  # type: ignore[misc]
+    UP._handle_stats(p, payload)
     assert m.queue_size == 1 and m.avg_latency_ms > 0

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate the UnifiedProducer class and its _handle_stats method
find . -type f -name "*.py" | xargs grep -l "class UnifiedProducer"

Repository: HardMax71/Integr8sCode

Length of output: 103


🏁 Script executed:

# Read the producer.py file to find _handle_stats method and examine its implementation
cat -n ./backend/app/events/core/producer.py | head -200

Repository: HardMax71/Integr8sCode

Length of output: 9140


Initialize self.logger in test setup.

The _handle_stats method accesses self.logger at line 109 in the exception handler. If any exception occurs during stats parsing, the test will fail with AttributeError. While the current test payload parses successfully and avoids this path, the incomplete initialization leaves the test fragile to edge cases or invalid input.

Add p.logger = logging.getLogger(__name__) to the test setup.

🤖 Prompt for AI Agents
In @backend/tests/integration/events/test_producer_roundtrip.py around lines 41
- 53, The test creates a UnifiedProducer instance bypassing __init__ (UP via
object.__new__) but doesn't set p.logger, so UP._handle_stats may raise
AttributeError when the exception handler runs; set p.logger =
logging.getLogger(__name__) in the test setup before calling UP._handle_stats to
ensure the logger exists (also import logging at top of test if missing).

@cubic-dev-ai cubic-dev-ai bot left a comment

3 issues found across 18 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/tests/conftest.py">

<violation number="1" location="backend/tests/conftest.py:113">
P1: This Redis client is managed by Dishka's DI container, which already closes it when the scope exits (see `RedisProvider.get_redis_client` in providers.py). Calling `aclose()` here causes a double-close issue. The `db` fixture follows the correct pattern by not closing the container-managed dependency.</violation>
</file>

<file name="backend/app/core/middlewares/csrf.py">

<violation number="1" location="backend/app/core/middlewares/csrf.py:37">
P1: Security middleware should fail-closed, not fail-open. If the SecurityService cannot be obtained, the request is allowed through without CSRF validation, leaving authenticated users vulnerable to CSRF attacks. Consider returning a 500 error instead of allowing the request to proceed.</violation>

<violation number="2" location="backend/app/core/middlewares/csrf.py:40">
P2: Accessing `asgi_app.state.dishka_container` may raise an AttributeError if the container was not initialized. Use `getattr` with a default value for safer access.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

    try:
        yield client
    finally:
        await client.aclose()
@cubic-dev-ai cubic-dev-ai bot Jan 9, 2026

P1: This Redis client is managed by Dishka's DI container, which already closes it when the scope exits (see RedisProvider.get_redis_client in providers.py). Calling aclose() here causes a double-close issue. The db fixture follows the correct pattern by not closing the container-managed dependency.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/tests/conftest.py, line 113:

<comment>This Redis client is managed by Dishka's DI container, which already closes it when the scope exits (see `RedisProvider.get_redis_client` in providers.py). Calling `aclose()` here causes a double-close issue. The `db` fixture follows the correct pattern by not closing the container-managed dependency.</comment>

<file context>
@@ -107,7 +107,10 @@ async def db(scope: AsyncContainer) -> AsyncGenerator[Database, None]:
+    try:
+        yield client
+    finally:
+        await client.aclose()
 
 
</file context>
Suggested change
-        await client.aclose()
+        pass  # Redis client lifecycle managed by DI container

Comment on lines +37 to +41
        if self.security_service is None:
            asgi_app = scope.get("app")
            if asgi_app:
                container = asgi_app.state.dishka_container
                async with container() as container_scope:
@cubic-dev-ai cubic-dev-ai bot Jan 9, 2026

P1: Security middleware should fail-closed, not fail-open. If the SecurityService cannot be obtained, the request is allowed through without CSRF validation, leaving authenticated users vulnerable to CSRF attacks. Consider returning a 500 error instead of allowing the request to proceed.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/app/core/middlewares/csrf.py, line 37:

<comment>Security middleware should fail-closed, not fail-open. If the SecurityService cannot be obtained, the request is allowed through without CSRF validation, leaving authenticated users vulnerable to CSRF attacks. Consider returning a 500 error instead of allowing the request to proceed.</comment>

<file context>
@@ -0,0 +1,64 @@
+            return
+
+        # Try to get service from Dishka container if not initialized
+        if self.security_service is None:
+            asgi_app = scope.get("app")
+            if asgi_app:
</file context>
Suggested change
-        if self.security_service is None:
-            asgi_app = scope.get("app")
-            if asgi_app:
-                container = asgi_app.state.dishka_container
-                async with container() as container_scope:
+        if self.security_service is None:
+            # Fail closed: if we can't get the security service, reject the request
+            response = JSONResponse(
+                status_code=500,
+                content={"detail": "Security service unavailable"},
+            )
+            await response(scope, receive, send)
+            return

        if self.security_service is None:
            asgi_app = scope.get("app")
            if asgi_app:
                container = asgi_app.state.dishka_container
@cubic-dev-ai cubic-dev-ai bot Jan 9, 2026

P2: Accessing asgi_app.state.dishka_container may raise an AttributeError if the container was not initialized. Use getattr with a default value for safer access.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/app/core/middlewares/csrf.py, line 40:

<comment>Accessing `asgi_app.state.dishka_container` may raise an AttributeError if the container was not initialized. Use `getattr` with a default value for safer access.</comment>

<file context>
@@ -0,0 +1,64 @@
+        if self.security_service is None:
+            asgi_app = scope.get("app")
+            if asgi_app:
+                container = asgi_app.state.dishka_container
+                async with container() as container_scope:
+                    self.security_service = await container_scope.get(SecurityService)
</file context>
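
A minimal sketch of the safer lookup (attribute names and response shape follow the quoted context; whether a missing container should fail closed is the policy question raised in the P1 above):

    from starlette.responses import JSONResponse

    container = getattr(asgi_app.state, "dishka_container", None)
    if container is None:
        response = JSONResponse(status_code=500, content={"detail": "Security service unavailable"})
        await response(scope, receive, send)
        return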

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @backend/app/core/middlewares/csrf.py:
- Around line 36-48: The middleware must not lazily mutate self.security_service
during requests; change the constructor of the CSRF middleware to require a
ready SecurityService instance (or a synchronous provider) and store it in
self.security_service at initialization time, remove the async container lookup
block (the async with container() and get(SecurityService) code) from the
__call__/dispatch path, and instead raise an explicit error (e.g., RuntimeError)
if __init__ is given no SecurityService; update the app factory to resolve and
pass the SecurityService into the middleware at startup so no async
initialization or silent fallback occurs.

In @backend/tests/unit/services/pod_monitor/test_monitor.py:
- Around line 36-41: FakeKafkaEventService.__init__ currently sets underscored
attributes (_event_repository, _kafka_producer, _settings, _logger) and omits
self.metrics, breaking LSP/type compatibility and risking AttributeError; update
the constructor to initialize the public names used by KafkaEventService
(event_repository, kafka_producer, settings, logger) and also set self.metrics
(e.g., to a MagicMock) and self.published_events so the fake matches the real
service, or alternatively invoke KafkaEventService.__init__ with mock arguments
to ensure all expected attributes are initialized.
🧹 Nitpick comments (15)
backend/tests/unit/services/sse/test_sse_service.py (3)

32-39: Consider making the timeout configurable to prevent CI flakiness.

The hardcoded 0.5-second timeout could cause intermittent test failures in slower CI environments or under heavy load. Consider making this configurable via a class parameter or constant, or at minimum document why 0.5s was chosen.

♻️ Proposed refactor to make timeout configurable
 class _FakeSubscription(SSERedisSubscription):
-    def __init__(self) -> None:
+    def __init__(self, timeout: float = 0.5) -> None:
         # Skip parent __init__ - no real Redis pubsub
         self._q: asyncio.Queue[dict[str, Any] | None] = asyncio.Queue()
         self.closed = False
+        self._timeout = timeout
 
     async def get[T: BaseModel](self, model: type[T]) -> T | None:
         try:
-            raw = await asyncio.wait_for(self._q.get(), timeout=0.5)
+            raw = await asyncio.wait_for(self._q.get(), timeout=self._timeout)
             if raw is None:
                 return None
             return model.model_validate(raw)

63-79: Consider aligning the return type with the parent interface.

The type: ignore[override] on line 74 indicates that _FakeRepo._Status doesn't match the expected return type from SSERepository.get_execution_status(). While acceptable for testing, this weakens type safety. Consider either:

  • Returning the actual expected type from the parent interface
  • Using a Protocol if the test only needs specific attributes

196-198: Remove redundant assignment (nitpick).

SSE_HEARTBEAT_INTERVAL is already set to 0 in _make_fake_settings() (line 124), so the reassignment on line 197 is redundant.

♻️ Proposed cleanup
-    settings = _make_fake_settings()
-    settings.SSE_HEARTBEAT_INTERVAL = 0  # emit immediately
-    svc = SSEService(repository=repo, router=_FakeRouter(), sse_bus=bus, shutdown_manager=sm, settings=settings, logger=_test_logger)
+    svc = SSEService(repository=repo, router=_FakeRouter(), sse_bus=bus, shutdown_manager=sm, settings=_make_fake_settings(), logger=_test_logger)

Or, if you want to emphasize the zero interval for readability, add a comment to _make_fake_settings() instead.

backend/tests/unit/core/test_adaptive_sampling.py (1)

71-76: Remove redundant environment variable setup.

Line 74 sets the TRACING_SAMPLING_RATE environment variable, but this is no longer needed since the test now passes a mock Settings object directly to create_adaptive_sampler. The environment variable was only necessary when settings were retrieved globally.

♻️ Proposed simplification
     mock_settings = MagicMock(spec=Settings)
     mock_settings.TRACING_SAMPLING_RATE = 0.2
 
-    monkeypatch.setenv("TRACING_SAMPLING_RATE", "0.2")
     # create_adaptive_sampler pulls settings via get_settings; just ensure it constructs
     sampler = create_adaptive_sampler(mock_settings)
backend/tests/helpers/auth.py (2)

36-36: Type annotation is overly restrictive.

The type dict[str, str] assumes all values in the JSON response are strings. In practice, response.json() returns Any and could contain nested objects, arrays, or other types. Use a more flexible type annotation.

♻️ Proposed fix
-    json_data: dict[str, str] = response.json()
+    json_data: dict[str, Any] = response.json()

Don't forget to import Any from typing:

-from typing import TypedDict
+from typing import Any, TypedDict

37-42: Validate that CSRF token is present and non-empty.

The function uses .get("csrf_token", "") which silently returns an empty string if the token is missing from the response. Tests consuming this helper would then proceed with an empty CSRF token, potentially masking authentication issues.

♻️ Proposed fix to fail fast when token is missing
     json_data: dict[str, str] = response.json()
-    csrf_token = json_data.get("csrf_token", "")
+    csrf_token = json_data.get("csrf_token", "")
+    assert csrf_token, "CSRF token missing from login response"
 
     return AuthResult(
         csrf_token=csrf_token,
         headers={"X-CSRF-Token": csrf_token},
     )
backend/app/core/security.py (1)

84-121: Consider using a more explicit return type for skip scenarios.

The method returns "skip" as a magic string for cases where validation is skipped. While functional, this could be clearer with an Optional[str] return type (returning None for skip) or a dedicated result type.

Current consumers must check if result == "skip" which is error-prone.

♻️ Optional: Use Optional[str] for clearer semantics
-    def validate_csrf_from_request(self, request: Request) -> str:
+    def validate_csrf_from_request(self, request: Request) -> str | None:
         """Validate CSRF token from HTTP request using double-submit cookie pattern.
 
         Returns:
-            "skip" if validation was skipped (safe method, exempt path, or unauthenticated)
+            None if validation was skipped (safe method, exempt path, or unauthenticated)
             The validated token string if validation passed
 
         Raises:
             CSRFValidationError: If token is missing or invalid
         """
         # Skip CSRF validation for safe methods
         if request.method in ("GET", "HEAD", "OPTIONS"):
-            return "skip"
+            return None
 
         # Skip CSRF validation for auth endpoints
         if request.url.path in self.CSRF_EXEMPT_PATHS:
-            return "skip"
+            return None
 
         # Skip CSRF validation for non-API endpoints (health, metrics, etc.)
         if not request.url.path.startswith("/api/"):
-            return "skip"
+            return None
 
         # Check if user is authenticated first (has access_token cookie)
         access_token = request.cookies.get("access_token")
         if not access_token:
             # If not authenticated, skip CSRF validation (auth will be handled by other dependencies)
-            return "skip"
+            return None
backend/tests/integration/test_dlq_routes.py (1)

42-42: Inconsistent type annotation - should use UserCredentials for consistency.

This test uses dict[str, str] while other tests in this file (lines 201, 226, 263, 363) use UserCredentials. For consistency and type safety across the test suite, all tests should use UserCredentials.

♻️ Use UserCredentials type consistently
     @pytest.mark.asyncio
-    async def test_get_dlq_statistics(self, client: AsyncClient, test_user: dict[str, str]) -> None:
+    async def test_get_dlq_statistics(self, client: AsyncClient, test_user: UserCredentials) -> None:

Apply similar changes to:

  • Line 87: test_list_dlq_messages
  • Line 119: test_filter_dlq_messages_by_status
  • Line 134: test_filter_dlq_messages_by_topic
  • Line 149: test_get_single_dlq_message_detail
  • Line 189: test_get_nonexistent_dlq_message
  • Line 295: test_get_dlq_topics_summary
  • Line 333: test_dlq_message_pagination
backend/tests/integration/test_user_settings_routes.py (1)

60-60: Inconsistent type annotation - should use UserCredentials.

This test uses dict[str, str] while other tests in this class use UserCredentials. For consistency, this should also use UserCredentials.

♻️ Use UserCredentials type
     @pytest.mark.asyncio
-    async def test_get_user_settings(self, client: AsyncClient, test_user: dict[str, str]) -> None:
+    async def test_get_user_settings(self, client: AsyncClient, test_user: UserCredentials) -> None:
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (4)

31-44: Consider the implications of the metadata default factory.

The default_factory for metadata creates a new AvroEventMetadata instance for each _FakeEvent, which is correct for avoiding shared mutable state. However, all instances will have identical values (service_name="test", etc.). If tests need to verify metadata propagation or uniqueness, consider whether the helper function _make_event() should allow customizing metadata.


82-93: Minimal test doubles may cause fragile tests.

Both _FakeStore and _FakeAlloc implement no methods beyond __init__. This means:

  • If SagaOrchestrator calls any methods on these dependencies, tests will raise AttributeError.
  • This is acceptable if these dependencies truly aren't used during these specific unit tests.
  • However, it makes tests fragile: if the orchestrator implementation changes to use these dependencies, tests will break in non-obvious ways.

Consider whether these fakes should implement stub methods (returning None or empty collections) to be more resilient, or use MagicMock like the other dependencies (SchemaRegistryManager, Settings).

♻️ Alternative: use MagicMock for unused dependencies
 def _orch() -> SagaOrchestrator:
     return SagaOrchestrator(
         config=SagaConfig(name="t", enable_compensation=True, store_events=True, publish_commands=False),
         saga_repository=_FakeRepo(),
         producer=_FakeProd(),
         schema_registry_manager=MagicMock(spec=SchemaRegistryManager),
         settings=MagicMock(spec=Settings),
-        event_store=_FakeStore(),
+        event_store=MagicMock(spec=EventStore),
-        idempotency_manager=_FakeIdem(),
+        idempotency_manager=MagicMock(spec=IdempotencyManager),
-        resource_allocation_repository=_FakeAlloc(),
+        resource_allocation_repository=MagicMock(spec=ResourceAllocationRepository),
         logger=_test_logger,
     )

138-146: Consider testing through public API instead of internal state.

This test directly manipulates _lifecycle_started (a private property) and calls _handle_event() (a private method). While unit tests sometimes need to test internal logic, this approach makes the test fragile to refactoring:

  • If the lifecycle management changes (e.g., renamed property, different state model), the test breaks even if public behavior is correct.
  • The test is coupled to implementation details rather than the orchestrator's contract.

Since the comment notes "deep behavior covered by integration", consider whether this unit test should also operate through the public API (e.g., using start() and stop() methods if available), or if testing these internals is essential for isolating complex logic.


149-169: Internal method testing has more justification here, but still consider public API.

This test verifies the short-circuit behavior when a saga already exists, which is valuable edge-case testing. However, it still accesses internal methods (_should_trigger_saga, _start_saga), making it fragile to refactoring.

The test logic is sound: it verifies that when a saga with the same execution_id and saga_name already exists, _start_saga returns the existing ID rather than creating a duplicate. If this behavior is critical and hard to trigger through the public API, testing it directly is reasonable. Otherwise, consider whether integration tests can cover this scenario through the public interface.

backend/tests/unit/services/pod_monitor/test_monitor.py (1)

373-381: Consider extracting inline test doubles to module level.

FailingKafkaEventService (lines 373-381), similar inline doubles like FailMapper (lines 326-332), and MockMapper (lines 269-280) are defined inside test functions. While this works, extracting commonly useful test doubles to module level (near FakeKafkaEventService and SpyMapper) would improve reusability.
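For illustration, a module-level version of the failing double might look like this; the method name and signature follow how the fakes in this file are described, and the base class is assumed to be the existing FakeKafkaEventService:

class FailingKafkaEventService(FakeKafkaEventService):
    """Module-level double that simulates a broker failure on publish."""

    async def publish_base_event(self, event: BaseEvent, key: str | None = None) -> None:
        raise RuntimeError("simulated Kafka publish failure")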

backend/app/services/pod_monitor/monitor.py (1)

213-213: Clarify initialization guarantee comment.

The comment states "guaranteed initialized by start()" but the clients are actually guaranteed by __init__ (as correctly noted in lines 238 and 386). Consider updating for consistency.

📝 Suggested clarification
-        # self._v1 and self._watch are guaranteed initialized by start()
+        # self._v1 and self._watch are guaranteed initialized by __init__
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47d1215 and 8cde784.

📒 Files selected for processing (18)
  • backend/.env
  • backend/app/core/middlewares/__init__.py
  • backend/app/core/middlewares/csrf.py
  • backend/app/core/security.py
  • backend/app/main.py
  • backend/app/services/pod_monitor/monitor.py
  • backend/tests/conftest.py
  • backend/tests/helpers/__init__.py
  • backend/tests/helpers/auth.py
  • backend/tests/integration/test_dlq_routes.py
  • backend/tests/integration/test_notifications_routes.py
  • backend/tests/integration/test_saved_scripts_routes.py
  • backend/tests/integration/test_user_settings_routes.py
  • backend/tests/unit/core/test_adaptive_sampling.py
  • backend/tests/unit/services/pod_monitor/test_monitor.py
  • backend/tests/unit/services/saga/test_saga_orchestrator_unit.py
  • backend/tests/unit/services/sse/test_kafka_redis_bridge.py
  • backend/tests/unit/services/sse/test_sse_service.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/.env
  • backend/tests/unit/services/sse/test_kafka_redis_bridge.py
🧰 Additional context used
🧬 Code graph analysis (9)
backend/tests/integration/test_saved_scripts_routes.py (2)
backend/tests/conftest.py (5)
  • app (56-67)
  • UserCredentials (19-27)
  • client (78-86)
  • test_user (126-149)
  • test_admin (153-176)
backend/tests/helpers/auth.py (1)
  • login_user (13-42)
backend/tests/helpers/auth.py (1)
backend/tests/conftest.py (1)
  • client (78-86)
backend/app/main.py (2)
backend/app/core/metrics/context.py (1)
  • initialize_all (123-163)
backend/app/core/middlewares/metrics.py (2)
  • setup_metrics (121-161)
  • MetricsMiddleware (19-118)
backend/tests/integration/test_dlq_routes.py (2)
backend/tests/conftest.py (3)
  • app (56-67)
  • UserCredentials (19-27)
  • client (78-86)
backend/app/dlq/models.py (1)
  • DLQMessageStatus (10-16)
backend/tests/conftest.py (2)
backend/app/main.py (1)
  • create_app (60-162)
backend/app/settings.py (1)
  • Settings (11-165)
backend/tests/unit/core/test_adaptive_sampling.py (2)
backend/app/core/adaptive_sampling.py (2)
  • AdaptiveSampler (15-239)
  • create_adaptive_sampler (242-251)
backend/app/settings.py (1)
  • Settings (11-165)
backend/tests/integration/test_user_settings_routes.py (2)
backend/tests/conftest.py (5)
  • app (56-67)
  • UserCredentials (19-27)
  • client (78-86)
  • test_user (126-149)
  • another_user (180-203)
backend/tests/helpers/auth.py (1)
  • login_user (13-42)
backend/tests/helpers/__init__.py (2)
backend/tests/helpers/auth.py (2)
  • AuthResult (6-10)
  • login_user (13-42)
backend/tests/helpers/events.py (1)
  • make_execution_requested_event (8-50)
backend/tests/integration/test_notifications_routes.py (2)
backend/tests/conftest.py (5)
  • app (56-67)
  • UserCredentials (19-27)
  • client (78-86)
  • test_user (126-149)
  • test_admin (153-176)
backend/tests/helpers/auth.py (1)
  • login_user (13-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Scan Frontend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: E2E Tests
  • GitHub Check: Integration Tests
🔇 Additional comments (48)
backend/tests/unit/services/sse/test_sse_service.py (5)

1-24: LGTM!

The imports and test setup are well-structured. Adding the pytestmark = pytest.mark.unit marker is a good practice for organizing test categories.


50-61: LGTM!

The fake bus implementation properly inherits from the interface and returns correctly typed subscriptions.


81-120: LGTM!

Both fake classes properly implement their interfaces with appropriate test instrumentation. The shutdown manager's tracking of registered/unregistered connections is useful for test verification.


122-131: LGTM!

Using MagicMock(spec=Settings) is good practice as it provides attribute-level type checking. The _decode helper is clean and properly typed.


134-236: Test logic is sound and comprehensive.

The tests effectively verify:

  • Stream termination on failure events
  • Result payload inclusion
  • Notification delivery and heartbeats
  • Health status structure

The use of DI-friendly fakes and explicit typing improves test maintainability.

backend/app/core/middlewares/__init__.py (1)

2-2: LGTM!

The CSRFMiddleware export follows the existing pattern for other middleware in this module.

Also applies to: 9-9

backend/tests/helpers/__init__.py (1)

3-3: LGTM!

The auth helper exports are properly added to the public API.

Also applies to: 6-6

backend/app/core/middlewares/csrf.py (1)

56-56: Remove misleading comment or clarify that return value is intentionally unused.

The method does return a meaningful value ("skip" or the validated token string), but the return value is discarded in the middleware. Either remove the comment about the return value, or update it to clarify that the return value is not used because the middleware only cares about the validation side effects (raising CSRFValidationError on failure or allowing the request to proceed on success).

backend/tests/integration/test_saved_scripts_routes.py (4)

7-9: LGTM!

Import updates are correctly structured - SavedScriptResponse for response validation, UserCredentials TypedDict for type safety, and login_user helper for re-authentication flows.


35-50: LGTM!

The test correctly uses UserCredentials type annotation and passes CSRF headers for the POST request. The comment on line 49 clearly documents the header requirement.


236-257: LGTM!

The re-authentication flow is correctly implemented:

  1. Re-login as regular user to get fresh CSRF token
  2. Create script with user's headers
  3. Re-login as admin before access checks

This properly handles the shared client scenario where fixtures may have logged in different users.


392-416: LGTM!

Session persistence test correctly demonstrates the re-authentication flow:

  1. First session: login, create script with fresh headers
  2. Logout
  3. Second session: login again, verify script persists

The explicit login steps ensure CSRF tokens are properly refreshed across session boundaries.

backend/app/main.py (4)

34-48: LGTM!

The metrics imports are well-organized and comprehensive, covering all service domains (Connection, Coordinator, Database, DLQ, Event, Execution, Health, Kubernetes, Notification, RateLimit, Replay, Security).


70-86: LGTM!

Eager initialization of all metrics contexts at app startup ensures metrics are available before any requests are processed. Each metrics instance receives the settings for configuration consistency.


100-108: Middleware ordering appears correct for CSRF handling.

The middleware stack order (bottom-to-top execution for requests):

  1. CORSMiddleware (handles preflight OPTIONS)
  2. CacheControlMiddleware
  3. RequestSizeLimitMiddleware
  4. CorrelationMiddleware
  5. CSRFMiddleware (skips OPTIONS via validate_csrf_from_request)
  6. RateLimitMiddleware
  7. MetricsMiddleware

Since CSRFMiddleware skips validation for OPTIONS requests (as seen in SecurityService.validate_csrf_from_request), CORS preflight requests will pass through correctly.
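For reference, a self-contained snippet showing the Starlette rule this ordering relies on: the middleware added last is the outermost one and sees requests first, so a stack listed in add_middleware order executes bottom-to-top. The two middlewares here are placeholders, not the project's real ones.

from fastapi import FastAPI
from starlette.middleware.base import BaseHTTPMiddleware

order: list[str] = []

class AddedFirst(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        order.append("added-first")
        return await call_next(request)

class AddedLast(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        order.append("added-last")
        return await call_next(request)

app = FastAPI()
app.add_middleware(AddedFirst)  # innermost: sees requests last
app.add_middleware(AddedLast)   # outermost: sees requests first
# A request records ["added-last", "added-first"], i.e. execution runs
# bottom-to-top relative to the add_middleware call order.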


179-181: LGTM!

Using factory=True with the import string pattern is the recommended approach for multi-worker uvicorn deployments. This ensures each worker process creates its own app instance, avoiding shared state issues.

backend/app/core/security.py (2)

24-30: LGTM!

Good DI pattern - SecurityService now accepts Settings and configures bcrypt rounds from self.settings.BCRYPT_ROUNDS. This makes the service testable and configurable.


77-82: LGTM!

Using frozenset for CSRF_EXEMPT_PATHS is appropriate - it's immutable and provides O(1) lookup. The exempt paths (login, register, logout) correctly cover the authentication endpoints that can't have CSRF tokens yet.
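A minimal sketch of that check, assuming these exact path strings and that safe methods besides OPTIONS are also skipped (the real SecurityService.validate_csrf_from_request may differ):

from fastapi import Request

CSRF_EXEMPT_PATHS = frozenset({
    "/api/v1/auth/login",
    "/api/v1/auth/register",
    "/api/v1/auth/logout",
})

def needs_csrf_check(request: Request) -> bool:
    # frozenset gives O(1) membership tests and cannot be mutated at runtime.
    if request.method in ("GET", "HEAD", "OPTIONS"):
        return False
    return request.url.path not in CSRF_EXEMPT_PATHS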

backend/tests/integration/test_dlq_routes.py (3)

21-22: LGTM!

The _RetryRequest TypedDict provides clear typing for the retry payload structure.


201-223: LGTM!

Good use of test_settings.KAFKA_TOPIC_PREFIX to derive topic names for retry policy tests. This ensures test isolation when running with pytest-xdist.
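As a sketch, deriving a policy topic from the injected settings looks roughly like this; "execution-requests" is a placeholder suffix, not necessarily the real topic name, and the import path is assumed:

from app.settings import Settings

def retry_policy_topic(test_settings: Settings) -> str:
    # The xdist-unique prefix keeps parallel workers from colliding on topics.
    return f"{test_settings.KAFKA_TOPIC_PREFIX}execution-requests"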


379-382: LGTM!

Good use of the _RetryRequest TypedDict to type the empty retry request payload.

backend/tests/conftest.py (5)

19-27: LGTM!

The UserCredentials TypedDict provides a clear contract for test fixtures, including all necessary fields for authentication: username, email, password, role, csrf_token, and headers.


34-51: LGTM!

Excellent isolation strategy for pytest-xdist:

  • Unique session ID per test session
  • Worker-specific prefixes for Kafka topics, consumer groups, and schema subjects
  • Isolated database names per worker
  • Redis DB selection based on worker number

This prevents resource conflicts when running tests in parallel.
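A sketch of how such per-worker settings can be derived, mirroring the model_copy(update=...) excerpt quoted later in this review; the field names follow the conftest excerpts, while the import path and helper name are assumptions:

import os
import uuid

from app.settings import Settings

_WORKER_ID = os.environ.get("PYTEST_XDIST_WORKER", "master")
_SESSION_ID = uuid.uuid4().hex[:8]

def build_test_settings(base: Settings) -> Settings:
    worker_num = sum(_WORKER_ID.encode()) % 16  # small stable number per worker
    prefix = f"{base.KAFKA_TOPIC_PREFIX}{_SESSION_ID}.{_WORKER_ID}."
    return base.model_copy(update={
        "DATABASE_NAME": f"integr8scode_test_{_SESSION_ID}_{_WORKER_ID}",
        "REDIS_DB": worker_num,  # one of Redis' 16 logical databases
        "KAFKA_TOPIC_PREFIX": prefix,
        "KAFKA_GROUP_SUFFIX": f"{_SESSION_ID}.{_WORKER_ID}",
    })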


107-113: LGTM!

The redis_client fixture properly closes the connection in the finally block, ensuring cleanup even if tests fail.


117-122: LGTM!

The _http_login helper correctly extracts the csrf_token from the JSON response, enabling tests to obtain fresh CSRF tokens for authenticated requests.
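For illustration, a helper of that shape, assuming the login endpoint, form fields, and csrf_token response field shown in the excerpts elsewhere in this review:

import httpx

async def _http_login(client: httpx.AsyncClient, username: str, password: str) -> str:
    resp = await client.post("/api/v1/auth/login", data={
        "username": username,
        "password": password,
    })
    resp.raise_for_status()
    csrf = resp.json().get("csrf_token", "")
    # Subsequent mutating requests on this client carry the token automatically.
    client.headers["X-CSRF-Token"] = csrf
    return csrf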


125-149: LGTM!

The test_user fixture follows a clean pattern:

  1. Generate unique credentials
  2. Register via API
  3. Login to obtain CSRF token
  4. Return complete UserCredentials with headers

The status code check for 400 handles the case where the user might already exist (idempotent registration).

backend/tests/integration/test_user_settings_routes.py (4)

13-37: LGTM!

Well-structured TypedDicts for test payloads:

  • _NotificationSettings and _EditorSettings for nested settings
  • _UpdateSettingsData with total=False for partial updates

This improves test maintainability and catches type errors early.
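Illustrative shapes only; field names beyond those quoted in this review are assumptions:

from typing import TypedDict

class _NotificationSettings(TypedDict):
    execution_completed: bool
    execution_failed: bool

class _EditorSettings(TypedDict):
    theme: str
    font_size: int

class _UpdateSettingsData(TypedDict, total=False):
    # total=False makes every key optional, matching partial PUT payloads.
    notifications: _NotificationSettings
    editor: _EditorSettings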


39-40: LGTM!

Using xdist_group(name="user_settings") ensures these tests run sequentially on a single worker, avoiding state conflicts from concurrent settings modifications.
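As a sketch, the module-level marker looks like this; note that xdist_group only affects scheduling when pytest-xdist runs with --dist loadgroup:

import pytest

# Every test in this module joins one group, so xdist schedules them on a single
# worker and they run sequentially relative to each other.
pytestmark = pytest.mark.xdist_group(name="user_settings")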


369-399: LGTM!

The isolation test correctly:

  1. Re-authenticates as first user for fresh CSRF token
  2. Updates settings with proper headers
  3. Logs out explicitly
  4. Re-authenticates as second user
  5. Verifies settings are isolated

The re-authentication pattern properly handles the shared client scenario.


402-442: LGTM!

The persistence test uses the typed _EditorSettings and _UpdateSettingsData to construct update payloads, then verifies settings persist across logout/login cycles. Good demonstration of the TypedDict usage.

backend/tests/integration/test_notifications_routes.py (4)

4-13: LGTM!

Imports are correctly updated to include:

  • NotificationChannel and NotificationStatus enums for validation
  • DeleteNotificationResponse for delete operation assertions
  • UserCredentials and login_user for auth handling

33-57: LGTM!

The test correctly uses UserCredentials type and validates notification structure against the domain enums (NotificationChannel, NotificationStatus).


273-299: LGTM!

The delete notification test:

  1. Uses headers=test_user["headers"] for the DELETE request
  2. Validates response against DeleteNotificationResponse schema
  3. Verifies the notification is actually removed from the list

341-368: LGTM!

The isolation test correctly uses login_user helper to re-authenticate between users, ensuring fresh CSRF tokens are obtained before making requests. This properly tests that notifications are scoped per-user.

backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (8)

1-24: LGTM!

The imports are well-organized and all necessary for the comprehensive test doubles layer introduced in this file.


47-59: LGTM!

Clean test double with both saved and existing storage, allowing tests to verify persistence operations and pre-populate state.


72-79: Same concern: verify parent __init__ can be safely skipped.

Like _FakeProd, this test double skips IdempotencyManager.__init__(). Ensure the parent class doesn't initialize required state that would cause AttributeError during test execution.


96-104: LGTM!

Clean test double for a saga step that always succeeds. The type parameter _FakeEvent is correctly applied.


107-117: LGTM!

Proper test saga implementation with correct type annotations and integration with the test step.


120-131: LGTM!

The factory function creates a properly configured orchestrator with a sensible mix of test doubles and mocks.


134-135: LGTM!

Helpful utility function that improves test readability and reduces boilerplate.


62-69: The skipped parent __init__ is safe in this context.

The _FakeProd.produce() method completely overrides the parent implementation, so the uninitialized attributes (_config, _schema_registry, _producer, _topic_prefix, etc.) are never accessed. The real produce() method relies on these attributes, but the fake bypasses it entirely by returning None. Since tests only call produce() and don't invoke lifecycle methods or access producer state, this test double pattern is appropriate.

backend/tests/unit/services/pod_monitor/test_monitor.py (2)

65-78: LGTM: Clean DI-friendly test helper.

This factory properly assembles K8sClients with test doubles, enabling DI-based construction in tests.


90-90: No issues found. FakeApi("{}") is the expected default for the test factory function. The empty JSON object string appropriately serves as a minimal placeholder when specific pod logs are not relevant to individual tests.

backend/app/services/pod_monitor/monitor.py (4)

100-124: LGTM: Clean dependency injection pattern.

The constructor now requires all dependencies explicitly, eliminating lazy initialization and improving type safety. The docstring appropriately directs users to the create_pod_monitor() factory for automatic dependency creation.


148-150: Verify error handling strategy for connectivity check.

The Kubernetes connectivity verification (line 149) has no error handling, so failures will propagate to the caller. Confirm this fail-fast behavior is intentional. If K8s is temporarily unavailable, should the monitor implement retry logic, or is it expected that the caller/orchestrator will handle restarts?
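If bounded retries were preferred over fail-fast, a wrapper along these lines could be applied around the probe; the function and parameter names are placeholders, not the monitor's actual API:

import asyncio
from collections.abc import Awaitable, Callable

async def verify_with_retry(
    probe: Callable[[], Awaitable[None]],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> None:
    for attempt in range(attempts):
        try:
            await probe()  # e.g. a cheap version/list call against the API server
            return
        except Exception:
            if attempt == attempts - 1:
                raise  # surface the failure after the final attempt
            await asyncio.sleep(delay_seconds)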


496-501: LGTM: Proper resource ownership and cleanup.

The factory correctly tracks whether it created the k8s_clients and ensures cleanup only for owned resources. The finally block guarantees cleanup even if monitor initialization or startup fails, which is the correct behavior.


485-486: No issues found. PodEventMapper does not require cleanup—it holds no unmanaged resources (the logger, API client, and event cache are all safely managed as references or in-memory data structures). The factory pattern correctly tracks cleanup only for K8sClients, which owns actual resources.

Comment on lines +36 to +48
# Try to get service from Dishka container if not initialized
if self.security_service is None:
asgi_app = scope.get("app")
if asgi_app:
container = asgi_app.state.dishka_container
async with container() as container_scope:
self.security_service = await container_scope.get(SecurityService)

if self.security_service is None:
# If we can't get the service, allow the request through
# (other auth mechanisms will handle protection)
await self.app(scope, receive, send)
return

⚠️ Potential issue | 🟠 Major

Lazy initialization pattern has multiple issues.

This lazy initialization approach has several concerns:

  1. Mutating instance state in async context (line 42): Setting self.security_service in an ASGI middleware is problematic because middleware instances are shared across all requests. If two requests arrive concurrently before initialization, both might try to initialize simultaneously, potentially causing race conditions.

  2. Potential resource leak (lines 41-42): The async context manager async with container() might not properly clean up if await container_scope.get(SecurityService) raises an exception.

  3. Silent fallback is a security risk (lines 44-48): If the SecurityService cannot be retrieved, the middleware allows all requests through without CSRF protection. This could create a security vulnerability if the service fails to initialize for any reason.

🔒 Recommended approach: require SecurityService at initialization

Instead of lazy initialization, require the SecurityService to be provided at middleware construction time (via DI/app factory):

     def __init__(
         self,
         app: ASGIApp,
-        security_service: SecurityService | None = None,
+        security_service: SecurityService,
     ) -> None:
         self.app = app
         self.security_service = security_service
 
     async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
         if scope["type"] != "http":
             await self.app(scope, receive, send)
             return
 
-        # Try to get service from Dishka container if not initialized
-        if self.security_service is None:
-            asgi_app = scope.get("app")
-            if asgi_app:
-                container = asgi_app.state.dishka_container
-                async with container() as container_scope:
-                    self.security_service = await container_scope.get(SecurityService)
-
-            if self.security_service is None:
-                # If we can't get the service, allow the request through
-                # (other auth mechanisms will handle protection)
-                await self.app(scope, receive, send)
-                return
-
         # Build request object for validation
         request = Request(scope, receive=receive)

Then ensure the middleware is constructed with the SecurityService in your app factory/main.py.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @backend/app/core/middlewares/csrf.py around lines 36 - 48, The middleware
must not lazily mutate self.security_service during requests; change the
constructor of the CSRF middleware to require a ready SecurityService instance
(or a synchronous provider) and store it in self.security_service at
initialization time, remove the async container lookup block (the async with
container() and get(SecurityService) code) from the __call__/dispatch path, and
instead raise an explicit error (e.g., RuntimeError) if __init__ is given no
SecurityService; update the app factory to resolve and pass the SecurityService
into the middleware at startup so no async initialization or silent fallback
occurs.

Comment on lines +36 to +41
def __init__(self) -> None:
# Skip parent __init__ - use MagicMocks for required attributes
self._event_repository = MagicMock()
self._kafka_producer = MagicMock()
self._settings = MagicMock()
self._logger = MagicMock()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the KafkaEventService class
find . -type f -name "*.py" | xargs rg "class KafkaEventService" -A 1

Repository: HardMax71/Integr8sCode

Length of output: 200


🏁 Script executed:

#!/bin/bash
# Look for files in the services directory related to kafka/events
fd -e py "kafka|event" --search-path backend/services 2>/dev/null | head -20

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Read the KafkaEventService class and __init__ method
cat ./backend/app/services/kafka_event_service.py

Repository: HardMax71/Integr8sCode

Length of output: 10714


🏁 Script executed:

#!/bin/bash
# Read the test file to see the actual FakeKafkaEventService implementation
cat -n ./backend/tests/unit/services/pod_monitor/test_monitor.py | sed -n '30,60p'

Repository: HardMax71/Integr8sCode

Length of output: 1241


🏁 Script executed:

#!/bin/bash
# Check how FakeKafkaEventService is used in the test file
rg "FakeKafkaEventService\|publish_base_event\|publish_event\|publish_execution" ./backend/tests/unit/services/pod_monitor/test_monitor.py -A 2 -B 2

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Read more of the test file to see how FakeKafkaEventService is used
cat -n ./backend/tests/unit/services/pod_monitor/test_monitor.py | head -100

Repository: HardMax71/Integr8sCode

Length of output: 3808


🏁 Script executed:

#!/bin/bash
# Check if metrics is accessed in KafkaEventService methods
rg "self\.metrics" ./backend/app/services/kafka_event_service.py

Repository: HardMax71/Integr8sCode

Length of output: 400


🏁 Script executed:

#!/bin/bash
# Check if PodMonitor accesses any KafkaEventService attributes or methods beyond publish_base_event
rg "kafka_event_service\." ./backend/services/pod_monitor/monitor.py -B 2 -A 2

Repository: HardMax71/Integr8sCode

Length of output: 147


🏁 Script executed:

#!/bin/bash
# Find the PodMonitor class file
find . -name "monitor.py" -type f | grep -i pod

Repository: HardMax71/Integr8sCode

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Check how PodMonitor uses kafka_event_service
rg "kafka_event_service\." ./backend/app/services/pod_monitor/monitor.py -B 2 -A 2

Repository: HardMax71/Integr8sCode

Length of output: 308


Fix attribute naming and add missing metrics initialization to FakeKafkaEventService.

Skipping KafkaEventService.__init__ has two issues:

  1. Attribute naming mismatch: The fake sets _event_repository, _kafka_producer, _settings, _logger (with underscores), but the parent sets event_repository, kafka_producer, settings, logger (without underscores). This breaks type compatibility and LSP.

  2. Missing self.metrics initialization: The parent calls get_event_metrics() and assigns to self.metrics. While the overridden publish_base_event in the fake doesn't use it, this creates a fragile test double. If PodMonitor or another component calls other KafkaEventService methods (like publish_event or publish_execution_event), it will fail with an AttributeError.

Either call the parent __init__ with mock arguments or explicitly initialize all attributes with correct names:

def __init__(self) -> None:
    self.event_repository = MagicMock()
    self.kafka_producer = MagicMock()
    self.settings = MagicMock()
    self.logger = MagicMock()
    self.metrics = MagicMock()  # Mock metrics to prevent AttributeError
    self.published_events: list[tuple[BaseEvent, str | None]] = []
🤖 Prompt for AI Agents
In @backend/tests/unit/services/pod_monitor/test_monitor.py around lines 36 -
41, FakeKafkaEventService.__init__ currently sets underscored attributes
(_event_repository, _kafka_producer, _settings, _logger) and omits self.metrics,
breaking LSP/type compatibility and risking AttributeError; update the
constructor to initialize the public names used by KafkaEventService
(event_repository, kafka_producer, settings, logger) and also set self.metrics
(e.g., to a MagicMock) and self.published_events so the fake matches the real
service, or alternatively invoke KafkaEventService.__init__ with mock arguments
to ensure all expected attributes are initialized.

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 10 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/tests/integration/test_saved_scripts_routes.py">

<violation number="1" location="backend/tests/integration/test_saved_scripts_routes.py:226">
P1: Test logic is broken: after logout, the test attempts to retrieve the script without re-authenticating. The `test_user` fixture does not automatically re-login after logout - it only sets up authentication once. This test either fails with 401 or doesn't test cross-session persistence correctly. You need to re-login after logout, or use a different approach to test persistence.</violation>
</file>

<file name="backend/tests/integration/test_user_settings_routes.py">

<violation number="1" location="backend/tests/integration/test_user_settings_routes.py:418">
P1: The `test_user` fixture returns a standard `httpx.AsyncClient`, which does not have an `auth_data` attribute. This will cause an `AttributeError` at runtime. Either the fixture needs to store auth credentials on the client, or the test should hardcode the credentials here.</violation>
</file>

<file name="backend/tests/conftest.py">

<violation number="1" location="backend/tests/conftest.py:134">
P2: Resource leak: if `raise_for_status()` throws, the `httpx.AsyncClient` is never closed. Wrap the post-creation logic in try/except to ensure cleanup on login failure.</violation>
</file>

<file name="backend/tests/integration/test_events_routes.py">

<violation number="1" location="backend/tests/integration/test_events_routes.py:486">
P2: Test assertion weakened: `assert meta["user_id"] is not None` does not verify user isolation. Consider storing user_id during fixture creation (e.g., from the login response) and comparing it here, or at minimum verify that `user_event_ids` and `admin_event_ids` don't overlap.</violation>
</file>



# Verify it's deleted by trying to get it
-        get_response = await client.get(f"/api/v1/scripts/{script_id}")
+        get_response = await test_user.get(f"/api/v1/scripts/{script_id}")
@cubic-dev-ai cubic-dev-ai bot Jan 9, 2026

P1: Test logic is broken: after logout, the test attempts to retrieve the script without re-authenticating. The test_user fixture does not automatically re-login after logout - it only sets up authentication once. This test either fails with 401 or doesn't test cross-session persistence correctly. You need to re-login after logout, or use a different approach to test persistence.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/tests/integration/test_saved_scripts_routes.py, line 226:

<comment>Test logic is broken: after logout, the test attempts to retrieve the script without re-authenticating. The `test_user` fixture does not automatically re-login after logout - it only sets up authentication once. This test either fails with 401 or doesn't test cross-session persistence correctly. You need to re-login after logout, or use a different approach to test persistence.</comment>

<file context>
@@ -215,30 +213,27 @@ async def test_delete_saved_script(self, client: AsyncClient, test_user: UserCre
 
         # Verify it's deleted by trying to get it
-        get_response = await client.get(f"/api/v1/scripts/{script_id}")
+        get_response = await test_user.get(f"/api/v1/scripts/{script_id}")
         assert get_response.status_code in [404, 403]
 
</file context>

"username": username,
"password": password,
})
login_resp.raise_for_status()
@cubic-dev-ai cubic-dev-ai bot Jan 9, 2026

P2: Resource leak: if raise_for_status() throws, the httpx.AsyncClient is never closed. Wrap the post-creation logic in try/except to ensure cleanup on login failure.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/tests/conftest.py, line 134:

<comment>Resource leak: if `raise_for_status()` throws, the `httpx.AsyncClient` is never closed. Wrap the post-creation logic in try/except to ensure cleanup on login failure.</comment>

<file context>
@@ -113,91 +103,80 @@ async def redis_client(scope: AsyncContainer) -> AsyncGenerator[redis.Redis, Non
-        "headers": {"X-CSRF-Token": csrf},
-    }
+    })
+    login_resp.raise_for_status()
+    csrf = login_resp.json().get("csrf_token", "")
+    c.headers["X-CSRF-Token"] = csrf
</file context>
Suggested change
-    login_resp.raise_for_status()
+    try:
+        login_resp.raise_for_status()
+    except httpx.HTTPStatusError:
+        await c.aclose()
+        raise

# User events should belong to the user
# We can't access the user_id directly from the fixture anymore
# but we can verify events are returned (isolation is enforced server-side)
assert meta["user_id"] is not None
@cubic-dev-ai cubic-dev-ai bot Jan 9, 2026

P2: Test assertion weakened: assert meta["user_id"] is not None does not verify user isolation. Consider storing user_id during fixture creation (e.g., from the login response) and comparing it here, or at minimum verify that user_event_ids and admin_event_ids don't overlap.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/tests/integration/test_events_routes.py, line 486:

<comment>Test assertion weakened: `assert meta["user_id"] is not None` does not verify user isolation. Consider storing user_id during fixture creation (e.g., from the login response) and comparing it here, or at minimum verify that `user_event_ids` and `admin_event_ids` don't overlap.</comment>

<file context>
@@ -457,46 +457,37 @@ async def test_event_pagination(self, client: AsyncClient, test_user: dict[str,
+                # User events should belong to the user
+                # We can't access the user_id directly from the fixture anymore
+                # but we can verify events are returned (isolation is enforced server-side)
+                assert meta["user_id"] is not None
 
         for event in admin_events["events"]:
</file context>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
backend/tests/integration/test_saved_scripts_routes.py (1)

383-410: Test logic is flawed after logout.

The comment on line 404 claims "test_user fixture handles re-authentication," but fixtures don't re-authenticate after explicit logout. After test_user.post("/api/v1/auth/logout"), the session cookie is invalidated, and subsequent requests with the same client will fail authentication since fixtures are created once per test invocation.

This test will likely return 401 on line 405-406, not 200.

🐛 Proposed fix: Remove logout/re-login expectation or use fresh client

Either remove the logout test (since cross-session persistence is better tested with a new client instance) or acknowledge the test now validates that scripts exist after logout but require re-authentication:

         # Logout
         logout_response = await test_user.post("/api/v1/auth/logout")
         assert logout_response.status_code == 200
 
-        # Script should still exist after logout/login (test_user fixture handles re-authentication)
+        # After logout, the script should still exist in the database
+        # but accessing it requires re-authentication (expect 401)
         get_response = await test_user.get(f"/api/v1/scripts/{script_id}")
-        assert get_response.status_code == 200
+        assert get_response.status_code == 401  # Session invalidated after logout

Alternatively, if the intent is to test true persistence, this test should be removed or restructured to use a fresh client with a new login.

backend/tests/integration/test_user_settings_routes.py (1)

391-432: Critical: AsyncClient has no auth_data attribute — test will fail.

The pipeline failure confirms this: line 418 references test_user.auth_data["username"] and test_user.auth_data["password"], but httpx.AsyncClient has no auth_data attribute. The _create_authenticated_client helper in conftest.py does not attach credentials to the returned client.

🐛 Proposed fix: Store credentials or restructure the test

Option 1: Extend the helper to attach auth_data (preferred)

In backend/tests/conftest.py, modify _create_authenticated_client:

 async def _create_authenticated_client(
     app: FastAPI, username: str, email: str, password: str, role: str
 ) -> httpx.AsyncClient:
     """Create and return an authenticated client with CSRF header set."""
     c = httpx.AsyncClient(
         transport=ASGITransport(app=app),
         base_url="https://test",
         timeout=30.0,
         follow_redirects=True,
     )
+    # Store credentials for tests that need to re-login
+    c.auth_data = {"username": username, "password": password}  # type: ignore[attr-defined]
     r = await c.post("/api/v1/auth/register", json={

Option 2: Restructure the test to avoid re-login

     @pytest.mark.asyncio
-    async def test_settings_persistence(self, client: AsyncClient, test_user: AsyncClient) -> None:
+    async def test_settings_persistence(self, test_user: AsyncClient) -> None:
         """Test that settings persist across login sessions."""
-        # Already authenticated via test_user fixture
-
         # Update settings
         editor_settings: _EditorSettings = {
             ...
         }
         ...
         response = await test_user.put("/api/v1/user/settings/", json=update_data)
         assert response.status_code == 200
 
-        # Log out
-        await client.post("/api/v1/auth/logout")
-
-        # Log back in as same user
-        await login_user(client, test_user.auth_data["username"], test_user.auth_data["password"])
-
-        # Get settings again
-        response = await client.get("/api/v1/user/settings/")
+        # Verify settings are returned correctly (persistence is inherent to DB storage)
+        response = await test_user.get("/api/v1/user/settings/")
         assert response.status_code == 200
         persisted_settings = UserSettings(**response.json())
         ...
backend/tests/integration/test_notifications_routes.py (1)

89-114: Fix false-positive in “mark as read” verification (no assertion the notification was found).
If the updated notification isn’t in the returned page, the test currently passes anyway.

Proposed fix
-            for notif in updated_data["notifications"]:
+            found = False
+            for notif in updated_data["notifications"]:
                 if notif["notification_id"] == notification_id:
                     assert notif["status"] == "read"
+                    found = True
                     break
+            assert found, "Updated notification not present in list response; pagination may have hidden it"
backend/tests/integration/test_events_routes.py (2)

32-53: Be consistent about “this endpoint may be 404” handling for /api/v1/events/user.
You allow 404 in test_get_user_events, but several later tests hard-require 200 from the same resource; that inconsistency will break in the exact deployments you mention.

Example fix pattern (skip when endpoint missing)
-        user_events_response = await test_user.get("/api/v1/events/user?limit=10")
-        assert user_events_response.status_code == 200
+        user_events_response = await test_user.get("/api/v1/events/user?limit=10")
+        if user_events_response.status_code == 404:
+            pytest.skip("Deployment does not expose /api/v1/events/user")
+        assert user_events_response.status_code == 200

Also applies to: 175-205, 247-260, 395-409, 430-457, 460-493


282-303: common_types is never asserted (test currently doesn’t test what it claims).

Proposed fix
-        # At least some common types should be present
-        for event_type in event_types:
+        for event_type in event_types:
             assert isinstance(event_type, str)
             assert len(event_type) > 0
+
+        assert any(t in event_types for t in common_types), "Expected at least one common event type"
🤖 Fix all issues with AI agents
In @backend/tests/integration/test_notifications_routes.py:
- Around line 131-147: In test_mark_all_notifications_as_read, avoid asserting
unread_count == 0 which is flaky; instead call GET
/api/v1/notifications/unread-count once before calling POST
/api/v1/notifications/mark-all-read to capture initial_count, then POST to mark
all read, then GET unread-count again and assert new_count <= initial_count (and
optionally assert new_count == 0 only in controlled test environments); also
remove the duplicated unread-count call so the test only queries the count twice
(once before, once after) and uses monotonic non-increasing assertion to prevent
races from background notification creation.
🧹 Nitpick comments (2)
backend/tests/conftest.py (1)

23-41: Minor redundancy in REDIS_DB calculation.

On line 30, worker_num = sum(_WORKER_ID.encode()) % 16 already produces a value in range [0, 15]. Then line 35 applies worker_num % 16 again, which is redundant.

♻️ Simplify REDIS_DB assignment
     worker_num = sum(_WORKER_ID.encode()) % 16
     unique_prefix = f"{base_prefix}{session_id}.{_WORKER_ID}."
     return base.model_copy(
         update={
             "DATABASE_NAME": f"integr8scode_test_{session_id}_{_WORKER_ID}",
-            "REDIS_DB": worker_num % 16,
+            "REDIS_DB": worker_num,
             "KAFKA_GROUP_SUFFIX": f"{session_id}.{_WORKER_ID}",
backend/tests/integration/test_events_routes.py (1)

73-105: Filtering assertion is overly permissive and a bit brittle.
Substring matching can pass unexpectedly (and the or len(...)==0 is redundant inside the loop).

Suggested tightening
-            for event in events_response.events:
-                if event.event_type:  # Some events might have been created
-                    assert any(event_type in event.event_type for event_type in event_types) or len(
-                        events_response.events) == 0
+            allowed = set(event_types)
+            for event in events_response.events:
+                if event.event_type:
+                    assert event.event_type in allowed
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cde784 and 5796a27.

📒 Files selected for processing (10)
  • backend/tests/conftest.py
  • backend/tests/e2e/test_execution_routes.py
  • backend/tests/integration/test_admin_routes.py
  • backend/tests/integration/test_dlq_routes.py
  • backend/tests/integration/test_events_routes.py
  • backend/tests/integration/test_notifications_routes.py
  • backend/tests/integration/test_replay_routes.py
  • backend/tests/integration/test_saga_routes.py
  • backend/tests/integration/test_saved_scripts_routes.py
  • backend/tests/integration/test_user_settings_routes.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/tests/e2e/test_execution_routes.py
🧰 Additional context used
🧬 Code graph analysis (9)
backend/tests/integration/test_dlq_routes.py (4)
backend/tests/conftest.py (3)
  • app (46-57)
  • test_user (141-152)
  • test_settings (24-41)
backend/tests/unit/conftest.py (1)
  • app (68-69)
backend/app/dlq/models.py (1)
  • DLQMessageStatus (10-16)
backend/app/schemas_pydantic/dlq.py (5)
  • DLQBatchRetryResponse (70-78)
  • DLQMessageDetail (95-117)
  • DLQMessageResponse (22-39)
  • DLQMessagesResponse (59-67)
  • DLQTopicSummaryResponse (81-92)
backend/tests/integration/test_saga_routes.py (1)
backend/tests/conftest.py (2)
  • test_user (141-152)
  • another_user (171-182)
backend/tests/integration/test_notifications_routes.py (1)
backend/tests/conftest.py (3)
  • app (46-57)
  • test_user (141-152)
  • test_admin (156-167)
backend/tests/integration/test_replay_routes.py (1)
backend/tests/conftest.py (3)
  • app (46-57)
  • test_user (141-152)
  • test_admin (156-167)
backend/tests/conftest.py (2)
backend/app/main.py (1)
  • create_app (60-162)
backend/app/settings.py (1)
  • Settings (11-165)
backend/tests/integration/test_saved_scripts_routes.py (1)
backend/tests/conftest.py (3)
  • app (46-57)
  • test_user (141-152)
  • test_admin (156-167)
backend/tests/integration/test_events_routes.py (3)
backend/tests/conftest.py (3)
  • app (46-57)
  • test_user (141-152)
  • test_admin (156-167)
backend/app/services/coordinator/queue_manager.py (1)
  • user_id (34-35)
backend/app/domain/events/event_models.py (1)
  • correlation_id (60-61)
backend/tests/integration/test_user_settings_routes.py (3)
backend/tests/conftest.py (4)
  • app (46-57)
  • test_user (141-152)
  • client (68-76)
  • another_user (171-182)
backend/tests/helpers/auth.py (1)
  • login_user (13-42)
backend/tests/helpers/eventually.py (1)
  • eventually (8-36)
backend/tests/integration/test_admin_routes.py (1)
backend/tests/conftest.py (2)
  • test_admin (156-167)
  • test_user (141-152)
🪛 GitHub Actions: MyPy Type Checking
backend/tests/integration/test_user_settings_routes.py

[error] 418-418: mypy --config-file pyproject.toml --strict .: 'AsyncClient' has no attribute 'auth_data' (tests/integration/test_user_settings_routes.py:418).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Scan Backend
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (20)
backend/tests/integration/test_saga_routes.py (3)

24-31: LGTM!

The migration from manual login to the test_user fixture is correctly applied. The test properly validates 404 behavior for non-existent sagas using the authenticated client.


143-164: LGTM!

Multi-user access control test correctly uses both test_user and another_user fixtures. This properly validates that saga isolation works between different authenticated users.


274-291: LGTM!

Concurrent saga access test is well-structured. The test properly validates that multiple concurrent requests succeed with the authenticated fixture.

backend/tests/integration/test_saved_scripts_routes.py (2)

32-87: LGTM!

The create and retrieve test is well-structured. It properly validates UUID format, response structure, and data integrity using the authenticated fixture.


233-264: LGTM!

Cross-user access control test correctly uses test_user and test_admin fixtures to validate that admin cannot access scripts created by regular users. This is a proper isolation test.

backend/tests/integration/test_dlq_routes.py (3)

20-21: LGTM!

The _RetryRequest TypedDict provides proper type safety for the retry request payload structure.


199-222: LGTM!

The retry policy test correctly uses test_settings.KAFKA_TOPIC_PREFIX to construct an isolated topic name. This aligns with the broader Settings-driven isolation approach in the PR.


376-382: LGTM!

Good use of the _RetryRequest TypedDict for explicit typing of the empty retry request, improving code clarity and type safety.

backend/tests/integration/test_replay_routes.py (3)

19-31: LGTM!

The test correctly validates that non-admin users receive 403 Forbidden when accessing replay endpoints. Using test_user fixture for this negative test case is appropriate.


37-62: LGTM!

The migration to ReplayFilter nested structure for filter parameters is clean and aligns with the domain model changes. The test properly validates session creation with the admin fixture.


393-402: LGTM!

Progress tracking test correctly uses the eventually helper to poll for session progress without fixed sleeps. The field name changes to replayed_events and total_events appear consistent with the ReplaySession model updates.

backend/tests/integration/test_admin_routes.py (3)

28-59: LGTM!

Admin settings retrieval test is well-structured. The test validates the complete SystemSettings structure including execution limits, security settings, and monitoring settings with appropriate default value assertions.


159-215: LGTM!

Complete user CRUD lifecycle test is comprehensive. It properly validates create, get, get-overview, update, delete operations and verifies deletion by confirming 404 on subsequent get.


357-364: Potential CSRF header staleness after logout.

After logout on line 358, the test_admin client still has the original CSRF token in headers. The subsequent login on lines 360-363 uses form data (data=), which typically bypasses CSRF for login endpoints. However, if the new session returns a new CSRF token, subsequent POST requests in this test would use the stale token.

Since this is the last operation in the test, it doesn't cause issues here, but this pattern could be problematic if extended.

Consider whether the login endpoint is CSRF-exempt. If not, the stale CSRF token could cause failures.

backend/tests/conftest.py (2)

110-137: LGTM!

The _create_authenticated_client helper is well-designed:

  • Handles both fresh registration and existing user (400) scenarios
  • Extracts CSRF token from login response and sets it in client headers
  • Properly cleans up client on failure

This centralizes authenticated client creation and improves test consistency.


140-152: LGTM!

The test_user fixture correctly uses the helper to create an authenticated user client with CSRF headers pre-configured. The unique UID prevents collisions in parallel test execution.

backend/tests/integration/test_user_settings_routes.py (1)

12-36: LGTM!

The TypedDict definitions provide excellent type safety for settings payloads. The total=False on _UpdateSettingsData correctly allows partial updates.

backend/tests/integration/test_notifications_routes.py (2)

31-35: Fixture-based authenticated clients are a solid simplification.
Using test_user / test_admin directly makes these tests shorter and avoids repeated login flows.

Also applies to: 58-63, 73-77, 116-124, 149-154, 182-193, 219-232, 243-260, 270-309, 311-337, 339-359, 361-370


4-11: No changes needed—assertions work correctly with StringEnum.

The test assertions comparing Enum objects to string values are safe. StringEnum extends Python's StrEnum, which mixes in str and natively supports equality with string values (e.g., NotificationChannel.IN_APP == "in_app" is True). Pydantic v2 returns StringEnum objects for enum-typed fields, and these objects behave transparently as strings in comparisons and membership checks. The assertions at lines 4–11, 47–55, 162–166, 199–204, 237–241, and 264–268 will function correctly without the proposed _val() helper.

backend/tests/integration/test_events_routes.py (1)

1-13: Schema-aligned assertions look good (metadata + event_version + hourly stats typing).

Also applies to: 33-65, 107-173, 207-245, 221-245, 305-349, 351-377, 379-393, 395-424

Comment on lines +460 to +493
async def test_events_isolation_between_users(self, test_user: AsyncClient,
test_admin: AsyncClient) -> None:
"""Test that events are properly isolated between users."""
# Get events as regular user
user_login_data = {
"username": test_user["username"],
"password": test_user["password"]
}
user_login_response = await client.post("/api/v1/auth/login", data=user_login_data)
assert user_login_response.status_code == 200

user_events_response = await client.get("/api/v1/events/user?limit=10")
user_events_response = await test_user.get("/api/v1/events/user?limit=10")
assert user_events_response.status_code == 200

user_events = user_events_response.json()
user_event_ids = [e["event_id"] for e in user_events["events"]]

# Get events as admin (without include_all_users flag)
admin_login_data = {
"username": test_admin["username"],
"password": test_admin["password"]
}
admin_login_response = await client.post("/api/v1/auth/login", data=admin_login_data)
assert admin_login_response.status_code == 200

admin_events_response = await client.get("/api/v1/events/user?limit=10")
admin_events_response = await test_admin.get("/api/v1/events/user?limit=10")
assert admin_events_response.status_code == 200

admin_events = admin_events_response.json()
admin_event_ids = [e["event_id"] for e in admin_events["events"]]

# Events should be different (unless users share some events)
# But user IDs in events should be different
# Each user should only see their own events
# We verify this by checking that if events exist, they have user_id in metadata
for event in user_events["events"]:
meta = event.get("metadata") or {}
if meta.get("user_id"):
assert meta["user_id"] == test_user.get("user_id", meta["user_id"])
# User events should belong to the user
# We can't access the user_id directly from the fixture anymore
# but we can verify events are returned (isolation is enforced server-side)
assert meta["user_id"] is not None

for event in admin_events["events"]:
meta = event.get("metadata") or {}
if meta.get("user_id"):
assert meta["user_id"] == test_admin.get("user_id", meta["user_id"])
# Admin events should belong to the admin
# We verify events are returned (isolation is enforced server-side)
assert meta["user_id"] is not None

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find any "me", "whoami", or "current" endpoints in the API
rg -nP '("/api/v1/[^"]*(?:me|whoami|current)[^"]*")' backend/tests backend/app -S

Repository: HardMax71/Integr8sCode

Length of output: 3750


🏁 Script executed:

# Also search more broadly for endpoint patterns that might return user info
rg -n 'def.*(?:me|current|whoami)' backend/app --type py -i

Repository: HardMax71/Integr8sCode

Length of output: 45377


🏁 Script executed:

# Examine conftest.py to see how test clients are created
fd conftest.py backend/tests

Repository: HardMax71/Integr8sCode

Length of output: 190


🏁 Script executed:

# Check test_auth_routes.py to see what /api/v1/auth/me returns
grep -A 10 'me_response = await client.get("/api/v1/auth/me")' backend/tests/integration/test_auth_routes.py | head -30

Repository: HardMax71/Integr8sCode

Length of output: 1153


🏁 Script executed:

# Look at the auth endpoint handler
grep -A 20 'async def get_current_user_profile' backend/app/api/routes/auth.py

Repository: HardMax71/Integr8sCode

Length of output: 652


🏁 Script executed:

# Examine conftest.py for the integration tests
cat backend/tests/integration/conftest.py

Repository: HardMax71/Integr8sCode

Length of output: 790


🏁 Script executed:

# Search for test_user and test_admin fixtures
rg -n "def test_user|def test_admin" backend/tests

Repository: HardMax71/Integr8sCode

Length of output: 1133


🏁 Script executed:

# Look at backend/tests/conftest.py (parent conftest)
cat backend/tests/conftest.py

Repository: HardMax71/Integr8sCode

Length of output: 5832


🏁 Script executed:

# Search for _create_authenticated_client
rg -n "_create_authenticated_client|create_authenticated_client" backend/tests

Repository: HardMax71/Integr8sCode

Length of output: 357


🏁 Script executed:

# Check UserResponse schema to see what fields it has
rg -n "class UserResponse" backend/app -A 15

Repository: HardMax71/Integr8sCode

Length of output: 1170


🏁 Script executed:

# Look at the actual test in test_events_routes.py around lines 460-493
sed -n '450,500p' backend/tests/integration/test_events_routes.py

Repository: HardMax71/Integr8sCode

Length of output: 2262


🏁 Script executed:

# Check if there are other tests that verify user isolation more effectively
rg -n "user_id\|metadata.user_id" backend/tests/integration/test_events_routes.py -B 2 -A 2

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

# Search for any tests that do verify user_id equality in events
rg -n "assert.*user_id\|assertEqual.*user_id" backend/tests/integration

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

# Check the event API response structure
rg -n "class.*EventResponse\|class EventDetail" backend/app/schemas_pydantic -A 10

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

# Check if there are patterns in other tests that fetch current user and verify isolation
rg -n '"/api/v1/auth/me"' backend/tests/integration -B 3 -A 5

Repository: HardMax71/Integr8sCode

Length of output: 4061


🏁 Script executed:

# Check if there are any other multi-user isolation tests in the repo
rg -n "test.*isolation\|another_user\|second.*user" backend/tests/integration/test_events_routes.py

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

# Look at the event response schema to understand metadata structure
rg -n "class.*Event.*Response\|metadata.*:" backend/app/schemas_pydantic/events.py -A 5 | head -50

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

# Check if the /api/v1/auth/me endpoint is already used in test_events_routes.py
rg -n "/auth/me" backend/tests/integration/test_events_routes.py

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

# Look at the full test to see if events are created before checking isolation
sed -n '400,500p' backend/tests/integration/test_events_routes.py

Repository: HardMax71/Integr8sCode

Length of output: 4357


🏁 Script executed:

# Check the events API route handler to see how filtering works
rg -n "async def get_user_events\|/events/user" backend/app/api/routes/events.py -B 5 -A 20

Repository: HardMax71/Integr8sCode

Length of output: 48


🏁 Script executed:

# Check if there are fixture setups that create events in test_events_routes.py
rg -n "def.*setup\|@pytest.fixture" backend/tests/integration/test_events_routes.py | head -20

Repository: HardMax71/Integr8sCode

Length of output: 48


Isolation test too weak to catch regressions.

The test only asserts metadata.user_id is not None without verifying that events actually belong to the authenticated user. This cannot detect if one user accidentally gains access to another user's events.

Suggested fix: Use the existing /api/v1/auth/me endpoint to fetch the authenticated user's ID, then assert that all returned events have matching event.metadata.user_id:

me_response = await test_user.get("/api/v1/auth/me")
assert me_response.status_code == 200
user_id = me_response.json()["user_id"]

user_events_response = await test_user.get("/api/v1/events/user?limit=10")
user_events = user_events_response.json()
for event in user_events["events"]:
    assert event["metadata"]["user_id"] == user_id

Repeat similarly for test_admin. This ensures the test catches any regression in user isolation enforcement.

Comment on lines +131 to 147
async def test_mark_all_notifications_as_read(self, test_user: AsyncClient) -> None:
"""Test marking all notifications as read."""
# Login first
login_data = {
"username": test_user["username"],
"password": test_user["password"]
}
login_response = await client.post("/api/v1/auth/login", data=login_data)
assert login_response.status_code == 200

# Mark all as read
mark_all_response = await client.post("/api/v1/notifications/mark-all-read")
mark_all_response = await test_user.post("/api/v1/notifications/mark-all-read")
assert mark_all_response.status_code == 204

# Verify all are now read
# Verify via unread-count only (list endpoint pagination can hide remaining)
unread_response = await client.get("/api/v1/notifications/unread-count")
unread_response = await test_user.get("/api/v1/notifications/unread-count")
assert unread_response.status_code == 200

# Also verify unread count is 0
count_response = await client.get("/api/v1/notifications/unread-count")
count_response = await test_user.get("/api/v1/notifications/unread-count")
assert count_response.status_code == 200
count_data = count_response.json()
assert count_data["unread_count"] == 0


⚠️ Potential issue | 🟡 Minor

Avoid flakiness: unread_count == 0 can be unstable in “real backend” integration runs.
If background activity creates notifications between calls, this will intermittently fail. Also, the unread-count endpoint is called twice with no extra assertion value.

Less flaky assertion (monotonic non-increasing)
-        unread_response = await test_user.get("/api/v1/notifications/unread-count")
-        assert unread_response.status_code == 200
-
-        # Also verify unread count is 0
-        count_response = await test_user.get("/api/v1/notifications/unread-count")
-        assert count_response.status_code == 200
-        count_data = count_response.json()
-        assert count_data["unread_count"] == 0
+        before = (await test_user.get("/api/v1/notifications/unread-count")).json()["unread_count"]
+        after_resp = await test_user.get("/api/v1/notifications/unread-count")
+        assert after_resp.status_code == 200
+        after = after_resp.json()["unread_count"]
+        assert isinstance(after, int) and after >= 0
+        assert after <= before
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    async def test_mark_all_notifications_as_read(self, test_user: AsyncClient) -> None:
        """Test marking all notifications as read."""
        # Mark all as read
        mark_all_response = await test_user.post("/api/v1/notifications/mark-all-read")
        assert mark_all_response.status_code == 204
        # Verify all are now read
        # Verify via unread-count only (list endpoint pagination can hide remaining)
        before = (await test_user.get("/api/v1/notifications/unread-count")).json()["unread_count"]
        after_resp = await test_user.get("/api/v1/notifications/unread-count")
        assert after_resp.status_code == 200
        after = after_resp.json()["unread_count"]
        assert isinstance(after, int) and after >= 0
        assert after <= before
🤖 Prompt for AI Agents
In @backend/tests/integration/test_notifications_routes.py around lines 131-147:
in test_mark_all_notifications_as_read, avoid asserting unread_count == 0
which is flaky; instead call GET /api/v1/notifications/unread-count once before
calling POST /api/v1/notifications/mark-all-read to capture initial_count, then
POST to mark all read, then GET unread-count again and assert new_count <=
initial_count (and optionally assert new_count == 0 only in controlled test
environments); also remove the duplicated unread-count call so the test only
queries the count twice (once before, once after) and uses monotonic
non-increasing assertion to prevent races from background notification creation.

@sonarqubecloud

sonarqubecloud bot commented Jan 9, 2026


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @backend/tests/conftest.py:
- Around lines 111-138: The _create_authenticated_client function can leak the
httpx.AsyncClient 'c' if login_resp.raise_for_status() raises; ensure the client
is closed on login failures by catching the exception from the login request (or
checking login_resp.status_code similar to registration), calling await
c.aclose() before re-raising or failing the test, and only returning 'c' after a
successful login and CSRF extraction; target symbols:
_create_authenticated_client, c, login_resp, login_resp.raise_for_status(), and
c.aclose().
🧹 Nitpick comments (3)
backend/app/core/dishka_lifespan.py (1)

46-72: Consider simplifying the redundant enable_console_exporter parameter.

On line 54, enable_console_exporter=settings.TESTING is always False because the enclosing condition on line 47 ensures not settings.TESTING. Consider simplifying this to enable_console_exporter=False for clarity, or verify whether console exporter should be enabled in some non-test scenarios (e.g., development).

♻️ Proposed simplification
     if settings.ENABLE_TRACING and not settings.TESTING:
         instrumentation_report = init_tracing(
             service_name=settings.TRACING_SERVICE_NAME,
             settings=settings,
             logger=logger,
             service_version=settings.TRACING_SERVICE_VERSION,
             sampling_rate=settings.TRACING_SAMPLING_RATE,
-            enable_console_exporter=settings.TESTING,
+            enable_console_exporter=False,
             adaptive_sampling=settings.TRACING_ADAPTIVE_SAMPLING,
         )
backend/tests/integration/test_user_settings_routes.py (2)

305-309: The _tick assertion is always true and provides no meaningful validation.

The inner async function _tick only asserts that time is monotonically non-decreasing, which is virtually always true. The eventually call with a 0.5s timeout is essentially just a 50ms sleep (first iteration succeeds immediately). If the intent is to ensure a time gap for restore point distinction, a simple await asyncio.sleep(0.05) would be clearer.

♻️ Suggested simplification
-        async def _tick() -> None:
-            now = datetime.now(timezone.utc)
-            assert (now - prev).total_seconds() >= 0
-
-        await eventually(_tick, timeout=0.5, interval=0.05)
+        import asyncio
+        await asyncio.sleep(0.05)  # Ensure restore point timestamp is distinct

367-388: Test isolation assertion may produce false negatives.

The assertion at lines 387-388 uses or logic, meaning the test passes if either theme or timezone differs. A newly created user might coincidentally have matching defaults for one of these values. Consider asserting that the second user's settings were not modified by checking against their original values before user1's update.

♻️ Proposed fix for more robust isolation verification
+        # Get second user's original settings before any changes
+        response = await another_user.get("/api/v1/user/settings/")
+        assert response.status_code == 200
+        user2_original = response.json()
+
         # Update first user's settings
         user1_update = {
             "theme": "dark",
             "timezone": "America/New_York"
         }
         response = await test_user.put("/api/v1/user/settings/", json=user1_update)
         assert response.status_code == 200

         # Get second user's settings
         response = await another_user.get("/api/v1/user/settings/")
         assert response.status_code == 200
         user2_settings = response.json()

-        # Verify second user's settings are not affected by first user's changes
-        # Second user should have default settings, not the first user's custom settings
-        assert user2_settings["theme"] != user1_update["theme"] or user2_settings["timezone"] != user1_update[
-            "timezone"]
+        # Verify second user's settings remain unchanged
+        assert user2_settings["theme"] == user2_original["theme"]
+        assert user2_settings["timezone"] == user2_original["timezone"]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5796a27 and 726e2f9.

📒 Files selected for processing (3)
  • backend/app/core/dishka_lifespan.py
  • backend/tests/conftest.py
  • backend/tests/integration/test_user_settings_routes.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/integration/test_user_settings_routes.py (2)
backend/tests/conftest.py (4)
  • app (47-58)
  • test_user (142-153)
  • client (69-77)
  • another_user (172-183)
backend/tests/helpers/eventually.py (1)
  • eventually (8-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Integration Tests
  • GitHub Check: E2E Tests
🔇 Additional comments (11)
backend/app/core/dishka_lifespan.py (2)

17-17: LGTM: Import updated for DI-based Settings.

The import change from get_settings to Settings correctly supports the new dependency injection pattern.


30-33: LGTM: Clean DI-based settings retrieval.

Retrieving Settings and logger from the DI container is the correct approach and aligns with the PR's goal of centralizing configuration via dependency injection.
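
As a rough sketch of that pattern (not the PR's actual code; the app.state.dishka_container attribute, the Settings import path, and a registered Logger provider are all assumptions), pulling app-scoped dependencies out of a dishka container in a lifespan hook can look like:

import logging
from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.settings import Settings  # assumed import path


@asynccontextmanager
async def lifespan(app: FastAPI):
    # dishka's FastAPI integration stores the container on app.state
    container = app.state.dishka_container
    settings: Settings = await container.get(Settings)            # app-scoped Settings
    logger: logging.Logger = await container.get(logging.Logger)  # assumes a Logger provider is registered
    logger.info("tracing active: %s", settings.ENABLE_TRACING and not settings.TESTING)
    yield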

backend/tests/conftest.py (4)

23-42: Well-structured test isolation via Settings fixture.

The test_settings fixture correctly creates isolated test environments per worker using a combination of session UUID and worker ID. The approach of using model_copy(update=...) to override settings is clean and idiomatic for Pydantic v2.
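
For readers skimming the PR, a minimal sketch of the per-worker isolation idea; the overridden field names below are illustrative, not necessarily the PR's exact settings:

import os
import uuid

import pytest

from app.settings import Settings  # assumed import path

_WORKER_ID = os.environ.get("PYTEST_XDIST_WORKER", "gw0")  # set by pytest-xdist
_SESSION_UUID = uuid.uuid4().hex[:8]


@pytest.fixture(scope="session")
def test_settings() -> Settings:
    base = Settings()  # loads .env.test via pydantic-settings
    suffix = f"{_WORKER_ID}-{_SESSION_UUID}"
    # model_copy(update=...) returns a new Settings object without mutating the original
    return base.model_copy(update={
        "SCHEMA_SUBJECT_PREFIX": f"test.{suffix}.",  # illustrative field name
        "KAFKA_GROUP_SUFFIX": suffix,                # illustrative field name
    })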


98-104: Explicit Redis cleanup is good, but consider the fixture scope.

The finally block ensures the Redis client is closed properly, which is a solid improvement. Both the scope fixture that provides the Redis client and the redis_client fixture appear to be function-scoped, so the explicit aclose() call may be redundant if the container scope already handles cleanup. Still, explicit cleanup is the safer choice, so this is fine.
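
A sketch of the cleanup shape described above, assuming redis-py 5+ (which provides aclose()) and an illustrative REDIS_URL settings field:

import pytest_asyncio
from redis.asyncio import Redis


@pytest_asyncio.fixture
async def redis_client(test_settings):
    client = Redis.from_url(test_settings.REDIS_URL)
    try:
        yield client
    finally:
        # Close explicitly even if the DI scope also tears the client down.
        await client.aclose()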


141-153: LGTM! Clean authenticated client fixtures.

The test_user, test_admin, and another_user fixtures follow a consistent pattern: generate unique credentials, create authenticated clients, and properly close them in the teardown phase. The use of uuid.uuid4().hex[:8] for unique IDs is appropriate for test isolation.

Also applies to: 156-168, 171-183


46-58: Session-scoped app fixture with proper cleanup.

The docstring explains the rationale for session scope (avoiding Pydantic schema validator memory issues), and the cleanup correctly checks for the container before closing. This is well-designed.
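
Roughly, the fixture shape being described (create_app as the factory name and the dishka_container attribute are assumptions; a session-scoped event loop is assumed to be configured for pytest-asyncio):

import pytest_asyncio

from app.main import create_app  # assumed factory location


@pytest_asyncio.fixture(scope="session")
async def app(test_settings):
    application = create_app(test_settings)  # assumed to accept Settings
    try:
        yield application
    finally:
        # Close the DI container only if the integration actually attached one.
        container = getattr(application.state, "dishka_container", None)
        if container is not None:
            await container.close()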

backend/tests/integration/test_user_settings_routes.py (5)

11-35: Good use of TypedDict for payload typing.

The TypedDicts (_NotificationSettings, _EditorSettings, _UpdateSettingsData) provide clear type documentation for test payloads. Using an underscore prefix signals internal/private usage, which is appropriate.

Note: _UpdateSettingsData uses total=False making all fields optional, which aligns with partial update semantics.
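
For context, total=False makes every key optional to the type checker, which is exactly what a partial-update payload needs. A minimal, self-contained sketch (field names are illustrative):

from typing import TypedDict


class _EditorSettings(TypedDict):
    theme: str
    tab_size: int


class _UpdateSettingsData(TypedDict, total=False):
    theme: str
    timezone: str
    editor: _EditorSettings


# Valid: callers may supply any subset of keys.
payload: _UpdateSettingsData = {"theme": "dark", "timezone": "America/New_York"}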


37-38: Appropriate xdist grouping for stateful tests.

Using xdist_group(name="user_settings") ensures these tests run sequentially on a single worker, which is necessary given that some tests modify shared user state and verify persistence.
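
For reference, the marker only takes effect when pytest-xdist is invoked with --dist loadgroup; a typical module-level declaration looks like:

import pytest

# All tests in this module are scheduled onto the same xdist worker,
# so they run one after another instead of in parallel with each other.
pytestmark = pytest.mark.xdist_group(name="user_settings")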


57-101: Comprehensive validation of settings structure.

The test thoroughly validates the response structure including nested objects, value ranges, and optional fields. The conditional check for language field (line 74) handles deployment variance gracefully.


103-161: Well-structured settings update test with proper assertions.

The test validates both the update response and verifies all nested fields were correctly persisted. Using _UpdateSettingsData type hint improves readability.


390-423: Good use of TypedDict for persistence test.

Properly uses _EditorSettings and _UpdateSettingsData to type the payloads, and the assertions correctly verify that settings were persisted across requests.

Comment on lines +111 to +138
+async def _create_authenticated_client(
+    app: FastAPI, username: str, email: str, password: str, role: str
+) -> httpx.AsyncClient:
+    """Create and return an authenticated client with CSRF header set."""
+    c = httpx.AsyncClient(
+        transport=ASGITransport(app=app),
+        base_url="https://test",
+        timeout=30.0,
+        follow_redirects=True,
+    )
+    r = await c.post("/api/v1/auth/register", json={
+        "username": username,
+        "email": email,
+        "password": password,
+        "role": role,
+    })
+    if r.status_code not in (200, 201, 400):
+        await c.aclose()
+        pytest.fail(f"Cannot create {role} (status {r.status_code}): {r.text}")

-@pytest.fixture
-def test_admin_credentials():
-    uid = uuid.uuid4().hex[:8]
-    return {
-        "username": f"admin_user_{uid}",
-        "email": f"admin_user_{uid}@example.com",
-        "password": "AdminPass123!",
-        "role": "admin",
-    }
+    login_resp = await c.post("/api/v1/auth/login", data={
+        "username": username,
+        "password": password,
+    })
+    login_resp.raise_for_status()
+    csrf = login_resp.json().get("csrf_token", "")
+    c.headers["X-CSRF-Token"] = csrf
+    return c

⚠️ Potential issue | 🟡 Minor

Missing error handling for login failures in authenticated client creation.

The function calls raise_for_status() on the login response (line 135), but if it fails, the client c is never closed, leading to a resource leak. This differs from the registration error handling at lines 127-129 which properly closes the client.

🐛 Proposed fix to ensure client cleanup on login failure
     login_resp = await c.post("/api/v1/auth/login", data={
         "username": username,
         "password": password,
     })
-    login_resp.raise_for_status()
+    if login_resp.status_code >= 400:
+        await c.aclose()
+        pytest.fail(f"Login failed for {username} (status {login_resp.status_code}): {login_resp.text}")
     csrf = login_resp.json().get("csrf_token", "")
     c.headers["X-CSRF-Token"] = csrf
     return c
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def _create_authenticated_client(
    app: FastAPI, username: str, email: str, password: str, role: str
) -> httpx.AsyncClient:
    """Create and return an authenticated client with CSRF header set."""
    c = httpx.AsyncClient(
        transport=ASGITransport(app=app),
        base_url="https://test",
        timeout=30.0,
        follow_redirects=True,
    )
    r = await c.post("/api/v1/auth/register", json={
        "username": username,
        "email": email,
        "password": password,
        "role": role,
    })
    if r.status_code not in (200, 201, 400):
        await c.aclose()
        pytest.fail(f"Cannot create {role} (status {r.status_code}): {r.text}")
    login_resp = await c.post("/api/v1/auth/login", data={
        "username": username,
        "password": password,
    })
    if login_resp.status_code >= 400:
        await c.aclose()
        pytest.fail(f"Login failed for {username} (status {login_resp.status_code}): {login_resp.text}")
    csrf = login_resp.json().get("csrf_token", "")
    c.headers["X-CSRF-Token"] = csrf
    return c
🤖 Prompt for AI Agents
In @backend/tests/conftest.py around lines 111-138: the
_create_authenticated_client function can leak the httpx.AsyncClient 'c' if
login_resp.raise_for_status() raises; ensure the client is closed on login
failures by catching the exception from the login request (or checking
login_resp.status_code similar to registration), calling await c.aclose() before
re-raising or failing the test, and only returning 'c' after a successful login
and CSRF extraction; target symbols: _create_authenticated_client, c,
login_resp, login_resp.raise_for_status(), and c.aclose().
