2 events system (domain, avro) -> single one #84
base: main
Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
📝 Walkthrough
Domain events were centralized into an Avro-backed DomainEvent union; DLQ models/repositories were converted to Pydantic/Beanie with nested …
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Prod as UnifiedProducer
    participant Schema as SchemaRegistryManager
    participant Kafka as Kafka Broker
    participant DLQMgr as DLQManager
    participant Mongo as MongoDB
    participant DLQEvents as DLQ Events Consumer
    Prod->>Schema: serialize_event(DomainEvent)
    Schema-->>Prod: bytes
    Prod->>Kafka: publish to EVENT_TYPE_TO_TOPIC[event.event_type]
    alt publish fails or send_to_dlq invoked
        Prod->>Kafka: publish DLQ payload to DLQ topic
    end
    Kafka->>DLQMgr: deliver DLQ topic message
    DLQMgr->>Mongo: persist DLQMessage (message.model_dump())
    DLQMgr->>Kafka: emit DLQ_MESSAGE_RECEIVED on DLQ_EVENTS
    Kafka->>DLQEvents: deliver DLQ_MESSAGE_RECEIVED
    DLQEvents->>DLQMgr: request manual/batch retry or discard
    alt retry requested
        DLQMgr->>Kafka: republish original DomainEvent
        DLQMgr->>Kafka: emit DLQ_MESSAGE_RETRIED
    else discard requested
        DLQMgr->>Mongo: update status = DISCARDED
        DLQMgr->>Kafka: emit DLQ_MESSAGE_DISCARDED
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
📜 Recent review details: Configuration used: defaults | Review profile: CHILL | Plan: Pro
📒 Files selected for processing (11)
💤 Files with no reviewable changes (3)
🚧 Files skipped from review as they are similar to previous changes (2)
🧰 Additional context used: 🧬 Code graph analysis (1): backend/app/core/providers.py (1)
🪛 dotenv-linter (4.0.0): backend/.env
[warning] 27-27: [UnorderedKey] The KAFKA_SESSION_TIMEOUT_MS key should go before the SCHEMA_REGISTRY_URL key
[warning] 28-28: [UnorderedKey] The KAFKA_HEARTBEAT_INTERVAL_MS key should go before the KAFKA_SESSION_TIMEOUT_MS key
[warning] 29-29: [UnorderedKey] The KAFKA_REQUEST_TIMEOUT_MS key should go before the KAFKA_SESSION_TIMEOUT_MS key
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
🔇 Additional comments (9)
1 issue found across 7 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/tests/unit/services/saga/test_execution_saga_steps.py">
<violation number="1" location="backend/tests/unit/services/saga/test_execution_saga_steps.py:6">
P3: Duplicate import source: `app.domain.events.typed` is imported twice. Consider consolidating with the existing import on line 3: `from app.domain.events.typed import BaseEvent, ExecutionRequestedEvent`</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
15 issues found across 92 files
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/architecture/model-conversion.md">
<violation number="1" location="docs/architecture/model-conversion.md:28">
P3: Path inconsistency in architecture diagram: `app/domain/events/typed.py` is placed in the Infrastructure Layer box, but `app/domain/` is shown as the Domain Layer. Consider either moving the events to an infrastructure path (e.g., `app/infrastructure/events/typed.py`) to match the layer designation, or clarify in the documentation why Pydantic event models live within the domain directory structure.</violation>
</file>
<file name="backend/tests/integration/dlq/test_dlq_manager.py">
<violation number="1" location="backend/tests/integration/dlq/test_dlq_manager.py:60">
P2: Broad exception catching with debug-level logging may mask test failures. If deserialization consistently fails, the test will timeout after 15 seconds with no actionable error message. Consider logging at warning/error level, or re-raising after a few failures to make issues visible.</violation>
</file>
<file name="backend/tests/integration/idempotency/test_consumer_idempotent.py">
<violation number="1" location="backend/tests/integration/idempotency/test_consumer_idempotent.py:84">
P1: Test no longer verifies idempotency correctly. The Future resolves after the first event, but the second duplicate event may not have been processed yet. The assertion `seen["n"] >= 1` doesn't prove duplicates were blocked—it would pass even if idempotency is broken. Consider waiting for both events to be attempted (e.g., brief delay after Future resolves) and asserting `seen["n"] == 1` to verify exactly one event was processed.</violation>
</file>
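A minimal sketch of the tightened check suggested above, assuming the test keeps a `seen` counter dict and resolves an asyncio Future when the first event is handled (both names follow the violation text; the surrounding fixture wiring is not shown):

```python
import asyncio


async def assert_duplicate_blocked(processed: "asyncio.Future[None]", seen: dict[str, int]) -> None:
    # Wait until the first event has been handled.
    await asyncio.wait_for(processed, timeout=10.0)
    # Give the consumer a brief window in which a broken idempotency layer
    # would also process the duplicate.
    await asyncio.sleep(1.0)
    # Exactly one invocation proves the duplicate was actually blocked.
    assert seen["n"] == 1, f"Expected exactly 1 invocation (duplicate blocked), got {seen['n']}"
```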
<file name="backend/app/dlq/models.py">
<violation number="1" location="backend/app/dlq/models.py:52">
P0: Breaking change: `age_seconds` property was removed but is still used by `dlq_processor.py` for filtering old messages and by `DLQMessageResponse` schema. This will cause `AttributeError` at runtime.</violation>
</file>
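One way to restore the removed accessor without persisting a derived field is a computed property on the Pydantic model. This is only a sketch, assuming `DLQMessage.created_at` is a timezone-aware datetime (the remaining fields are omitted); computing the age at the call sites from `created_at` is the other option.

```python
from datetime import datetime, timezone

from pydantic import BaseModel


class DLQMessage(BaseModel):
    created_at: datetime  # remaining fields omitted for brevity

    @property
    def age_seconds(self) -> float:
        """Seconds elapsed since the message was first written to the DLQ."""
        return (datetime.now(timezone.utc) - self.created_at).total_seconds()
```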
<file name="backend/app/events/core/producer.py">
<violation number="1" location="backend/app/events/core/producer.py:121">
P2: Using direct dictionary access `EVENT_TYPE_TO_TOPIC[...]` may raise `KeyError` if an event type is not in the mapping. Consider using the safer `get_topic_for_event()` function from the same module, which provides a default fallback to `KafkaTopic.SYSTEM_EVENTS`.</violation>
</file>
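A sketch of the safer lookup. `EVENT_TYPE_TO_TOPIC`, `get_topic_for_event()`, and the `KafkaTopic.SYSTEM_EVENTS` fallback are all referenced by the violation; the standalone helper shown here is only illustrative.

```python
from app.domain.enums.events import EventType
from app.domain.enums.kafka import KafkaTopic
from app.infrastructure.kafka.mappings import EVENT_TYPE_TO_TOPIC


def resolve_topic(event_type: EventType) -> KafkaTopic:
    # dict.get avoids a KeyError for event types missing from the mapping
    # and mirrors the fallback behaviour described for get_topic_for_event().
    return EVENT_TYPE_TO_TOPIC.get(event_type, KafkaTopic.SYSTEM_EVENTS)
```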
<file name="backend/app/services/notification_service.py">
<violation number="1" location="backend/app/services/notification_service.py:330">
P2: Changing from `asyncio.create_task()` to `await` blocks the caller until notification delivery completes. Since `_deliver_notification` can make external HTTP calls (webhooks/Slack with 30s timeouts), this could significantly increase latency for callers of `create_notification`. If this blocking behavior is intentional, consider whether the performance impact on API response times is acceptable. Otherwise, restore the background task pattern.</violation>
</file>
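If the previous fire-and-forget behaviour is preferred, a sketch of the background-task pattern, keeping strong references so pending tasks are not garbage-collected mid-flight. Method names follow the violation text; the rest of the service is elided and hypothetical.

```python
import asyncio


class NotificationServiceSketch:
    def __init__(self) -> None:
        self._delivery_tasks: set[asyncio.Task[None]] = set()

    async def create_notification(self, notification: object) -> None:
        # Schedule delivery without blocking the caller on webhook/Slack timeouts.
        task = asyncio.create_task(self._deliver_notification(notification))
        self._delivery_tasks.add(task)
        task.add_done_callback(self._delivery_tasks.discard)

    async def _deliver_notification(self, notification: object) -> None:
        ...  # external HTTP calls with long timeouts happen here
```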
<file name="docs/components/schema-manager.md">
<violation number="1" location="docs/components/schema-manager.md:21">
P3: Inconsistent path format: should be `app/domain/events/typed.py` to match the convention used elsewhere in this document (e.g., `app/db/schema/schema_manager.py`).</violation>
</file>
<file name="backend/app/dlq/manager.py">
<violation number="1" location="backend/app/dlq/manager.py:75">
P1: Missing null safety in header decoding. If any Kafka header value is `None`, this will crash with `AttributeError`. The old code handled this with `v.decode("utf-8") if isinstance(v, (bytes, bytearray)) else (v or "")`.</violation>
<violation number="2" location="backend/app/dlq/manager.py:137">
P2: Missing error backoff sleep after exception. If persistent errors occur (e.g., database unavailable), the loop will spin rapidly without pause, causing excessive logging and CPU usage. The old code had `await asyncio.sleep(5)` after errors.</violation>
</file>
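A sketch combining both suggestions: null-safe header decoding plus a backoff sleep in the consume loop. The loop shape and helper names are assumptions for illustration, not the actual manager code.

```python
import asyncio
import logging


def decode_headers(raw_headers: list[tuple[str, bytes | None]] | None) -> dict[str, str]:
    # Tolerate None and non-bytes header values coming back from Kafka.
    return {
        k: v.decode("utf-8") if isinstance(v, (bytes, bytearray)) else (v or "")
        for k, v in (raw_headers or [])
    }


async def consume_loop(consumer, handle_message, logger: logging.Logger) -> None:
    while True:
        try:
            msg = await consumer.getone()
            await handle_message(msg)
        except asyncio.CancelledError:
            raise
        except Exception as exc:
            logger.error(f"DLQ consume loop error: {exc}", exc_info=True)
            await asyncio.sleep(5)  # back off so persistent failures do not spin the loop
```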
<file name="backend/app/services/sse/sse_shutdown_manager.py">
<violation number="1" location="backend/app/services/sse/sse_shutdown_manager.py:68">
P1: The `initiated_event` is created but never set. External code awaiting this event will hang indefinitely. Add `self.initiated_event.set()` in `initiate_shutdown()` after setting `_shutdown_initiated = True`.</violation>
<violation number="2" location="backend/app/services/sse/sse_shutdown_manager.py:69">
P1: The `notifying_event` is created but never set. External code awaiting this event will hang indefinitely. Add `self.notifying_event.set()` in `_notify_connections()` after setting the phase to NOTIFYING.</violation>
</file>
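A sketch of wiring the two events, with attribute and method names taken from the violations; the rest of the class is elided.

```python
import asyncio


class SSEShutdownManagerSketch:
    def __init__(self) -> None:
        self.initiated_event = asyncio.Event()
        self.notifying_event = asyncio.Event()
        self._shutdown_initiated = False

    async def initiate_shutdown(self) -> None:
        self._shutdown_initiated = True
        self.initiated_event.set()  # unblock external code awaiting shutdown start
        await self._notify_connections()

    async def _notify_connections(self) -> None:
        # Signal the NOTIFYING phase before any awaits yield control.
        self.notifying_event.set()
        ...  # broadcast shutdown notices to registered connections
```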
<file name="backend/app/events/core/dispatcher.py">
<violation number="1" location="backend/app/events/core/dispatcher.py:29">
P2: Dead code: `_topic_event_types` is initialized but never populated or used. The `_build_topic_mapping` method that populated this attribute was removed, but the attribute declaration was left behind. Consider removing this unused attribute.</violation>
</file>
<file name="backend/app/services/kafka_event_service.py">
<violation number="1" location="backend/app/services/kafka_event_service.py:201">
P2: Logging was reduced to only `event_id`, removing `event_type` and `aggregate_id` that were previously logged. This makes debugging and tracing events harder. Consider adding these fields back for consistency with `publish_event` and better observability.</violation>
</file>
<file name="backend/app/services/coordinator/coordinator.py">
<violation number="1" location="backend/app/services/coordinator/coordinator.py:246">
P1: Off-by-one error: queue positions are 1-indexed, so the front of queue is `position == 1`, not `position == 0`. The current check will never trigger for items at the front of the queue - it only triggers when position was `None` (error case) and defaulted to 0.</violation>
</file>
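An illustrative guard using 1-indexed positions; the surrounding coordinator logic here is hypothetical and only the comparison matters.

```python
async def maybe_schedule(position: int | None, schedule_now) -> None:
    # Queue positions are 1-indexed, so 1 means front of the queue.
    # A None position signals a lookup failure and must not be treated as "ready".
    if position == 1:
        await schedule_now()
```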
<file name="backend/tests/unit/services/idempotency/test_middleware.py">
<violation number="1" location="backend/tests/unit/services/idempotency/test_middleware.py:31">
P2: `MagicMock(spec=DomainEvent)` is problematic because `DomainEvent` is an `Annotated` union type alias, not a class. The `spec` parameter is designed for concrete classes. Consider using `spec=BaseEvent` instead, which is the base class that defines `event_type` and `event_id`.</violation>
</file>
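A sketch of the suggested change, assuming `BaseEvent` defines `event_type` and `event_id` as stated in the violation; the attribute values are placeholders.

```python
from unittest.mock import MagicMock

from app.domain.events.typed import BaseEvent


def make_fake_event() -> MagicMock:
    # spec=BaseEvent restricts the mock to attributes present on the real base
    # class; an Annotated union alias such as DomainEvent cannot be used as a spec.
    event = MagicMock(spec=BaseEvent)
    event.event_type = "execution.requested"
    event.event_id = "evt-123"
    return event
```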
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
backend/app/services/notification_service.py (1)
631-633: Fix null-pointer dereference on `resource_usage`. The pipeline failure indicates `resource_usage` can be `None` (typed as `ResourceUsageDomain | None`), but line 632 accesses `.execution_time_wall_seconds` without a null check. This will raise `AttributeError` at runtime.
🐛 Proposed fix
```diff
  title = f"Execution Completed: {event.execution_id}"
- body = (
-     f"Your execution completed successfully. Duration: {event.resource_usage.execution_time_wall_seconds:.2f}s."
- )
+ if event.resource_usage:
+     body = f"Your execution completed successfully. Duration: {event.resource_usage.execution_time_wall_seconds:.2f}s."
+ else:
+     body = "Your execution completed successfully."
```
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (1)
30-43: Pipeline failure: MyPy reports BaseEvent has type "Any". The metadata field update to `EventMetadata` is correct. However, MyPy reports that `BaseEvent` (imported from `app.infrastructure.kafka.events` on line 17) has type `Any`, causing the "Class cannot subclass" error. This suggests `BaseEvent` may lack proper type annotations or exports. Consider:
- Ensuring `BaseEvent` is properly typed in `app.infrastructure.kafka.events`
- Or updating `_FakeEvent` to extend from `DomainEvent` (from `app.domain.events.typed`) to align with the migration
```bash
#!/bin/bash
# Check BaseEvent definition and exports in infrastructure module
ast-grep --pattern 'class BaseEvent($_) { $$$ }'
# Also check if there's a __all__ that might be missing BaseEvent
rg -n "BaseEvent" --type py -C 3 backend/app/infrastructure/kafka/events/
```
backend/tests/unit/services/saga/test_saga_step_and_base.py (1)
75-76: Type error: `SagaStep[BaseEvent]` generic constraint violation. `SagaStep` is generic over `DomainEvent`, which is a union of concrete event types. Using `BaseEvent` as the type argument violates this constraint. Replace with a concrete event type or use `DomainEvent` directly if the step should handle any domain event.
🐛 Proposed fix
def get_steps(self) -> list[SagaStep[BaseEvent]]: + def get_steps(self) -> list[SagaStep[DomainEvent]]: return []- class S(SagaStep[BaseEvent]): - async def execute(self, context: SagaContext, event: BaseEvent) -> bool: + class S(SagaStep[DomainEvent]): + async def execute(self, context: SagaContext, event: DomainEvent) -> bool: return TrueAlso add the import at line 6:
-from app.domain.events.typed import BaseEvent, EventMetadata +from app.domain.events.typed import BaseEvent, DomainEvent, EventMetadataAlso applies to: 82-84
backend/tests/integration/events/test_schema_registry_real.py (1)
17-24: `PodCreatedEvent` has no `topic` attribute. The pipeline fails because `PodCreatedEvent` doesn't expose a `topic` attribute. Based on the PR's migration to domain events, topic resolution is now handled via centralized mappings (likely in `mappings.py`), not as an event attribute.
🐛 Proposed fix using topic mapping or constant
```diff
+from app.domain.enums.kafka import KafkaTopic
```
Then update the deserialization call:
- obj = await m.deserialize_event(data, topic=str(ev.topic)) + obj = await m.deserialize_event(data, topic=KafkaTopic.POD_EVENTS.value)Alternatively, if there's a topic resolver utility:
+from app.infrastructure.kafka.mappings import get_topic_for_event - obj = await m.deserialize_event(data, topic=str(ev.topic)) + obj = await m.deserialize_event(data, topic=get_topic_for_event(ev))backend/workers/dlq_processor.py (1)
67-71: `DLQMessage` does not have an `age_seconds` attribute; use `created_at` to compute message age. The filter function accesses `message.age_seconds`, which does not exist on the `DLQMessage` class. The class has a `created_at` datetime field that should be used instead.
Fix
```diff
+from datetime import datetime, timezone

 def filter_old_messages(message: DLQMessage) -> bool:
     max_age_days = 7
-    return message.age_seconds < (max_age_days * 24 * 3600)
+    max_age_seconds = max_age_days * 24 * 3600
+    age = (datetime.now(timezone.utc) - message.created_at).total_seconds()
+    return age < max_age_seconds
```
backend/tests/integration/services/sse/test_redis_bus.py (1)
21-28: Change BaseEvent import to correct module path. Line 12 imports `BaseEvent` from a non-existent module `app.infrastructure.kafka.events`. `BaseEvent` is defined in `app.domain.events.typed` and exported from `app.domain.events`. Change the import to:
```python
from app.domain.events import BaseEvent
```
This non-existent module causes MyPy to treat `BaseEvent` as type `Any`, which triggers the subclass error for `_DummyEvent`.
backend/tests/unit/services/sse/test_kafka_redis_bridge.py (1)
26-29: Type mismatch: `_FakeBus.publish_event` signature differs from `SSERedisBus`. The real `SSERedisBus.publish_event` now accepts `DomainEvent` (line 65 in `redis_bus.py`), but this fake uses `BaseEvent`. This inconsistency could mask integration issues and cause type errors.
🔧 Proposed fix
+from app.domain.events.typed import DomainEvent + class _FakeBus(SSERedisBus): """Fake SSERedisBus for testing.""" def __init__(self) -> None: - self.published: list[tuple[str, BaseEvent]] = [] + self.published: list[tuple[str, DomainEvent]] = [] - async def publish_event(self, execution_id: str, event: BaseEvent) -> None: + async def publish_event(self, execution_id: str, event: DomainEvent) -> None: self.published.append((execution_id, event))backend/app/services/result_processor/processor.py (2)
177-183: Fix null dereference on `resource_usage` (pipeline failure). `ExecutionCompletedEvent.resource_usage` is typed as `ResourceUsageDomain | None`. Accessing `.execution_time_wall_seconds` and `.peak_memory_kb` without a null check will raise `AttributeError` when `resource_usage` is `None`.
Proposed fix
- runtime_seconds = event.resource_usage.execution_time_wall_seconds - self._metrics.record_execution_duration(runtime_seconds, lang_and_version) - - # Record memory utilization - memory_mib = event.resource_usage.peak_memory_kb / 1024 + if event.resource_usage is not None: + runtime_seconds = event.resource_usage.execution_time_wall_seconds + self._metrics.record_execution_duration(runtime_seconds, lang_and_version) + + # Record memory utilization + memory_mib = event.resource_usage.peak_memory_kb / 1024 + else: + self.logger.warning(f"No resource_usage for completed execution {event.execution_id}") + runtime_seconds = 0.0 + memory_mib = 0.0Also guard the memory utilization percentage calculation (lines 186-191) since it depends on
memory_mib.
218-218: Fix type mismatch in `record_error` call (pipeline failure). `event.error_type` is `ExecutionErrorType | None`, but `record_error` expects a `str`. Additionally, if `error_type` is `None`, this will fail.
Proposed fix
- self._metrics.record_error(event.error_type) + if event.error_type is not None: + self._metrics.record_error(event.error_type.value if hasattr(event.error_type, 'value') else str(event.error_type))Or if
record_errorshould accept the enum directly, update its signature instead.backend/tests/integration/dlq/test_dlq_manager.py (1)
63-70: Fix: `DomainEvent` uses `model_dump()`, not `to_dict()`. The pipeline failure indicates `ExecutionRequestedEvent` has no attribute `to_dict`. Pydantic v2 models use `model_dump()` for serialization.
🐛 Proposed fix
```diff
 payload = {
-    "event": ev.to_dict(),
+    "event": ev.model_dump(mode="json"),
     "original_topic": f"{prefix}{str(KafkaTopic.EXECUTION_EVENTS)}",
     "error": "handler failed",
     "retry_count": 0,
     "failed_at": datetime.now(timezone.utc).isoformat(),
     "producer_id": "tests",
 }
```
backend/app/events/core/producer.py (1)
119-148: Potential `UnboundLocalError` if `event_type` not in topic mapping. If `EVENT_TYPE_TO_TOPIC[event_to_produce.event_type]` raises a `KeyError` at line 121, the `topic` variable will be undefined when the `except KafkaError` block executes at line 146, causing an `UnboundLocalError`.
🐛 Proposed fix: define topic before the try block or handle KeyError separately
```diff
 async def produce(
     self, event_to_produce: DomainEvent, key: str | None = None, headers: dict[str, str] | None = None
 ) -> None:
     """Produce a message to Kafka."""
     if not self._producer:
         self.logger.error("Producer not running")
         return

+    topic = f"{self._topic_prefix}{EVENT_TYPE_TO_TOPIC[event_to_produce.event_type]}"
     try:
         serialized_value = await self._schema_registry.serialize_event(event_to_produce)
-        topic = f"{self._topic_prefix}{EVENT_TYPE_TO_TOPIC[event_to_produce.event_type]}"

         # Convert headers to list of tuples format
         header_list = [(k, v.encode()) for k, v in headers.items()] if headers else None
```
🤖 Fix all issues with AI agents
In @backend/app/services/sse/sse_shutdown_manager.py:
- Around line 67-70: The new asyncio.Events initiated_event and notifying_event
are never set; in initiate_shutdown() call self.initiated_event.set()
immediately after transitioning state to the initiated/shutdown-started phase
(so external coordinators see shutdown begin), and in _notify_connections() call
self.notifying_event.set() right before or as you begin the notifying/broadcast
phase so external observers know notifying has started; ensure these calls are
safe to call idempotently (no blocking) and placed before any awaits that yield
control.
In @backend/tests/integration/services/sse/test_partitioned_event_router.py:
- Around line 49-51: The test is flaky because using
asyncio.wait_for(subscription.get(RedisSSEMessage), timeout=2.0) only protects
against hangs but subscription.get has an internal 0.5s timeout and can return
None; replace the single-call approach with a retry/poll loop (or reuse the
existing eventual() helper) that repeatedly calls
subscription.get(RedisSSEMessage) until a non-None message is returned or a
total timeout elapses, and then assert msg is not None; target the test code
around subscription.get, RedisSSEMessage, and asyncio.wait_for to implement the
retry semantics.
In @backend/tests/unit/services/saga/test_execution_saga_steps.py:
- Around line 3-6: The _FakeProducer test double currently uses BaseEvent types
which mismatches UnifiedProducer.produce's contract; update the imports to
include DomainEvent from app.domain.events.typed and change _FakeProducer so its
events list is typed as list[DomainEvent] and its async produce method
signature/parameter uses DomainEvent (not BaseEvent), matching
UnifiedProducer.produce; ensure you remove or stop using BaseEvent in
_FakeProducer and keep behavior of appending the event to self.events.
In @backend/tests/unit/services/saga/test_saga_step_and_base.py:
- Around line 46-50: The test constructs BaseEvent (an abstract base) and passes
it to SagaContext.add_event, but add_event expects a concrete DomainEvent
subtype; replace the BaseEvent instantiation with a concrete event class (e.g.,
a SystemErrorEvent or other concrete subclass that corresponds to
EventType.SYSTEM_ERROR) and pass that to ctx.add_event, or if such a concrete
class does not exist, add a concrete subclass of BaseEvent (and include it in
the DomainEvent union) with appropriate metadata and use that instance in the
test.
🧹 Nitpick comments (9)
backend/tests/integration/services/events/test_event_bus.py (1)
51-53: Consider extending assertions for robustness. The Future-based await with timeout is cleaner than polling. Optionally, consider also asserting on the payload to ensure full event integrity:
```python
assert received.event_type == "test.created"
assert received.payload == {"x": 1}
```
This would catch potential serialization/deserialization issues in the Kafka path.
backend/tests/integration/test_replay_routes.py (1)
388-392: Validate start response and align with the file's error-handling pattern. Two concerns:
- Line 388: The start request response is discarded. If the session fails to start, the test proceeds anyway and may produce misleading failures on line 392.
- Line 392: Uses a strict `== 200` assertion, but other tests in this file (e.g., lines 102, 135, 168) use `in [200, 404]` with early return. This inconsistency could cause flaky test failures if the session becomes unavailable.
Suggested fix to align with file patterns
- await test_admin.post(f"/api/v1/replay/sessions/{session_id}/start") - - # Check progress immediately - session state available right after start - detail_response = await test_admin.get(f"/api/v1/replay/sessions/{session_id}") - assert detail_response.status_code == 200 + start_response = await test_admin.post(f"/api/v1/replay/sessions/{session_id}/start") + assert start_response.status_code in [200, 404] + if start_response.status_code != 200: + return + + # Check progress immediately - session state available right after start + detail_response = await test_admin.get(f"/api/v1/replay/sessions/{session_id}") + assert detail_response.status_code in [200, 404] + if detail_response.status_code != 200: + returnbackend/app/events/schema/schema_registry.py (1)
16-21: Consider adding defensive handling for union extraction. The logic assumes `DomainEvent` is always `Annotated[Union[...], Discriminator]`. If the type structure changes, this could break silently. The current implementation is correct for the existing design, but consider adding validation.
💡 Optional: Add defensive check
@lru_cache(maxsize=1) def _get_all_event_classes() -> list[type[DomainEvent]]: """Get all concrete event classes from DomainEvent union.""" union_type = get_args(DomainEvent)[0] # Annotated[Union[...], Discriminator] -> Union + if not get_args(union_type): + raise RuntimeError("DomainEvent union has no concrete types - check type definition") return list(get_args(union_type)) if get_origin(union_type) else [union_type]backend/app/services/k8s_worker/worker.py (1)
215-225: Consider replacing `assert` with explicit type guards for production robustness. Using `assert isinstance(...)` for type narrowing works but can be silently disabled when Python runs with the `-O` (optimize) flag. In production, this could lead to unexpected `AttributeError` exceptions instead of clear type errors.
♻️ Suggested defensive pattern
async def _handle_create_pod_command_wrapper(self, event: DomainEvent) -> None: """Wrapper for handling CreatePodCommandEvent with type safety.""" - assert isinstance(event, CreatePodCommandEvent) + if not isinstance(event, CreatePodCommandEvent): + self.logger.error(f"Expected CreatePodCommandEvent, got {type(event).__name__}") + return self.logger.info(f"Processing create_pod_command for execution {event.execution_id} from saga {event.saga_id}") await self._handle_create_pod_command(event) async def _handle_delete_pod_command_wrapper(self, event: DomainEvent) -> None: """Wrapper for handling DeletePodCommandEvent.""" - assert isinstance(event, DeletePodCommandEvent) + if not isinstance(event, DeletePodCommandEvent): + self.logger.error(f"Expected DeletePodCommandEvent, got {type(event).__name__}") + return self.logger.info(f"Processing delete_pod_command for execution {event.execution_id} from saga {event.saga_id}") await self._handle_delete_pod_command(event)backend/tests/integration/idempotency/test_consumer_idempotent.py (1)
82-84: Consider tightening the assertion for idempotency verification. The assertion `seen["n"] >= 1` verifies the handler ran at least once, but since two identical events are produced and the test is verifying idempotent behavior, asserting `seen["n"] == 1` would more precisely confirm that the duplicate was blocked.
Suggested fix
```diff
- assert seen["n"] >= 1
+ assert seen["n"] == 1, f"Expected exactly 1 invocation (duplicate blocked), got {seen['n']}"
```
backend/tests/unit/domain/events/test_event_schema_coverage.py (1)
6-6: Consider updating docstring terminology for consistency. The docstrings still reference "Kafka event class" terminology, but after this PR's migration, all events are unified under `DomainEvent`. The distinction between "domain" and "Kafka" event classes is now less meaningful since they share the same `BaseEvent` hierarchy.
Also applies to: 52-54
backend/app/infrastructure/kafka/mappings.py (1)
102-104: Consider caching for `get_event_types_for_topic`. Unlike `get_event_class_for_type` and `get_topic_for_event`, this function iterates through the entire mapping on each call. If called frequently, consider adding `@lru_cache`.
♻️ Optional: Add caching
```diff
+@lru_cache(maxsize=32)
 def get_event_types_for_topic(topic: KafkaTopic) -> list[EventType]:
     """Get all event types that publish to a given topic."""
     return [et for et, t in EVENT_TYPE_TO_TOPIC.items() if t == topic]
```
67-73: Correlation ID precedence logic is correct but could be simplified. The logic ensures the explicit `correlation_id` parameter takes precedence over metadata's value. Consider simplifying by always setting correlation_id on metadata creation if provided.
♻️ Optional simplification
- event_metadata = metadata or EventMetadata( - service_name=self.settings.SERVICE_NAME, - service_version=self.settings.SERVICE_VERSION, - correlation_id=correlation_id or str(uuid4()), - ) - if correlation_id and event_metadata.correlation_id != correlation_id: - event_metadata = event_metadata.model_copy(update={"correlation_id": correlation_id}) + effective_correlation_id = correlation_id or (metadata.correlation_id if metadata else None) or str(uuid4()) + event_metadata = metadata.model_copy(update={"correlation_id": effective_correlation_id}) if metadata else EventMetadata( + service_name=self.settings.SERVICE_NAME, + service_version=self.settings.SERVICE_VERSION, + correlation_id=effective_correlation_id, + )backend/app/services/idempotency/middleware.py (1)
206-250: Remove unused `async_handler` function. The `async_handler` function is defined but never used. The code path either registers `dispatch_handler` with the dispatcher (line 258) or logs an error (line 261); neither path uses `async_handler`. This appears to be dead code left over from a previous implementation.
♻️ Proposed fix
- # Create an async handler that processes the message - async def async_handler(message: Any) -> Any: - self.logger.info(f"IDEMPOTENT HANDLER CALLED for {event_type}") - - # Extract event from confluent-kafka Message - if not hasattr(message, "value"): - self.logger.error(f"Received non-Message object for {event_type}: {type(message)}") - return None - - # Debug log to check message details - self.logger.info( - f"Handler for {event_type} - Message type: {type(message)}, " - f"has key: {hasattr(message, 'key')}, " - f"has topic: {hasattr(message, 'topic')}" - ) - - raw_value = message.value - - # Debug the raw value - self.logger.info(f"Raw value extracted: {raw_value[:100] if raw_value else 'None or empty'}") - - # Handle tombstone messages (null value for log compaction) - if raw_value is None: - self.logger.warning(f"Received empty message for {event_type} - tombstone or consumed value") - return None - - # Handle empty messages - if not raw_value: - self.logger.warning(f"Received empty message for {event_type} - empty bytes") - return None - - try: - # Deserialize using schema registry if available - event = await self.consumer._schema_registry.deserialize_event(raw_value, message.topic) - if not event: - self.logger.error(f"Failed to deserialize event for {event_type}") - return None - - # Call the idempotent wrapper directly in async context - await idempotent_wrapper(event) - - self.logger.debug(f"Successfully processed {event_type} event: {event.event_id}") - return None - except Exception as e: - self.logger.error(f"Failed to process message for {event_type}: {e}", exc_info=True) - raise - # Register with the dispatcher if available
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (92)
backend/app/api/routes/dlq.pybackend/app/api/routes/events.pybackend/app/api/routes/execution.pybackend/app/db/docs/dlq.pybackend/app/dlq/manager.pybackend/app/dlq/models.pybackend/app/domain/enums/events.pybackend/app/domain/enums/kafka.pybackend/app/domain/events/typed.pybackend/app/events/core/consumer.pybackend/app/events/core/dispatcher.pybackend/app/events/core/dlq_handler.pybackend/app/events/core/producer.pybackend/app/events/event_store.pybackend/app/events/event_store_consumer.pybackend/app/events/schema/schema_registry.pybackend/app/infrastructure/kafka/__init__.pybackend/app/infrastructure/kafka/events/__init__.pybackend/app/infrastructure/kafka/events/base.pybackend/app/infrastructure/kafka/events/execution.pybackend/app/infrastructure/kafka/events/metadata.pybackend/app/infrastructure/kafka/events/notification.pybackend/app/infrastructure/kafka/events/pod.pybackend/app/infrastructure/kafka/events/saga.pybackend/app/infrastructure/kafka/events/system.pybackend/app/infrastructure/kafka/events/user.pybackend/app/infrastructure/kafka/mappings.pybackend/app/services/coordinator/coordinator.pybackend/app/services/event_replay/replay_service.pybackend/app/services/execution_service.pybackend/app/services/idempotency/idempotency_manager.pybackend/app/services/idempotency/middleware.pybackend/app/services/k8s_worker/pod_builder.pybackend/app/services/k8s_worker/worker.pybackend/app/services/kafka_event_service.pybackend/app/services/notification_service.pybackend/app/services/pod_monitor/event_mapper.pybackend/app/services/pod_monitor/monitor.pybackend/app/services/result_processor/processor.pybackend/app/services/saga/execution_saga.pybackend/app/services/saga/saga_orchestrator.pybackend/app/services/saga/saga_step.pybackend/app/services/sse/kafka_redis_bridge.pybackend/app/services/sse/redis_bus.pybackend/app/services/sse/sse_shutdown_manager.pybackend/tests/e2e/test_k8s_worker_create_pod.pybackend/tests/e2e/test_resource_cleaner_orphan.pybackend/tests/helpers/events.pybackend/tests/helpers/eventually.pybackend/tests/helpers/kafka.pybackend/tests/integration/dlq/test_dlq_manager.pybackend/tests/integration/events/test_consume_roundtrip.pybackend/tests/integration/events/test_dlq_handler.pybackend/tests/integration/events/test_event_dispatcher.pybackend/tests/integration/events/test_event_store.pybackend/tests/integration/events/test_schema_registry_real.pybackend/tests/integration/idempotency/test_consumer_idempotent.pybackend/tests/integration/idempotency/test_decorator_idempotent.pybackend/tests/integration/idempotency/test_idempotency.pybackend/tests/integration/idempotency/test_idempotent_handler.pybackend/tests/integration/notifications/test_notification_sse.pybackend/tests/integration/result_processor/test_result_processor.pybackend/tests/integration/services/coordinator/test_execution_coordinator.pybackend/tests/integration/services/events/test_event_bus.pybackend/tests/integration/services/sse/test_partitioned_event_router.pybackend/tests/integration/services/sse/test_redis_bus.pybackend/tests/integration/test_notifications_routes.pybackend/tests/integration/test_replay_routes.pybackend/tests/integration/test_sse_routes.pybackend/tests/integration/test_user_settings_routes.pybackend/tests/unit/domain/events/test_event_schema_coverage.pybackend/tests/unit/events/test_event_dispatcher.pybackend/tests/unit/events/test_metadata_model.pybackend/tests/unit/events/test_schema_registry_manager.pybackend/tests/unit/services/coordinator/test_queue_manager.pybackend/tests/unit/services/
idempotency/test_idempotency_manager.pybackend/tests/unit/services/idempotency/test_middleware.pybackend/tests/unit/services/pod_monitor/test_event_mapper.pybackend/tests/unit/services/pod_monitor/test_monitor.pybackend/tests/unit/services/saga/test_execution_saga_steps.pybackend/tests/unit/services/saga/test_saga_comprehensive.pybackend/tests/unit/services/saga/test_saga_orchestrator_unit.pybackend/tests/unit/services/saga/test_saga_step_and_base.pybackend/tests/unit/services/sse/test_kafka_redis_bridge.pybackend/tests/unit/services/sse/test_sse_shutdown_manager.pybackend/tests/unit/services/test_pod_builder.pybackend/workers/dlq_processor.pydocs/architecture/event-system-design.mddocs/architecture/kafka-topic-architecture.mddocs/architecture/model-conversion.mddocs/architecture/user-settings-events.mddocs/components/schema-manager.md
💤 Files with no reviewable changes (10)
- backend/tests/helpers/eventually.py
- backend/app/infrastructure/kafka/events/base.py
- backend/app/infrastructure/kafka/events/execution.py
- backend/app/infrastructure/kafka/events/notification.py
- backend/app/infrastructure/kafka/events/metadata.py
- backend/app/infrastructure/kafka/events/system.py
- backend/app/infrastructure/kafka/events/user.py
- backend/app/infrastructure/kafka/events/saga.py
- backend/app/infrastructure/kafka/events/init.py
- backend/app/infrastructure/kafka/events/pod.py
🧰 Additional context used
🧬 Code graph analysis (56)
backend/app/events/core/dlq_handler.py (1)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)
backend/tests/e2e/test_resource_cleaner_orphan.py (1)
backend/app/services/result_processor/resource_cleaner.py (1)
cleanup_orphaned_resources(149-173)
backend/app/services/idempotency/idempotency_manager.py (1)
backend/app/domain/events/typed.py (1)
BaseEvent(30-42)
backend/tests/integration/services/coordinator/test_execution_coordinator.py (2)
backend/app/services/coordinator/coordinator.py (1)
_handle_execution_requested(215-251)backend/tests/unit/services/coordinator/test_queue_manager.py (1)
ev(14-15)
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (1)
backend/app/domain/events/typed.py (1)
EventMetadata(16-27)
backend/tests/unit/events/test_schema_registry_manager.py (1)
backend/app/domain/events/typed.py (1)
ExecutionRequestedEvent(48-62)
backend/app/domain/events/typed.py (1)
backend/app/domain/enums/events.py (1)
EventType(4-87)
backend/app/services/k8s_worker/pod_builder.py (1)
backend/app/domain/events/typed.py (1)
CreatePodCommandEvent(382-397)
backend/app/services/saga/saga_step.py (1)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)
backend/app/services/pod_monitor/event_mapper.py (2)
backend/app/domain/events/typed.py (7)
EventMetadata(16-27)ExecutionCompletedEvent(95-101)ExecutionFailedEvent(104-112)ExecutionTimeoutEvent(115-121)PodRunningEvent(149-153)PodScheduledEvent(142-146)PodTerminatedEvent(176-182)backend/app/domain/execution/models.py (1)
ResourceUsageDomain(13-19)
backend/tests/unit/services/sse/test_sse_shutdown_manager.py (2)
backend/app/services/sse/sse_shutdown_manager.py (6)
initiate_shutdown(127-154)unregister_connection(109-125)get_shutdown_status(273-286)SSEShutdownManager(22-307)register_connection(80-107)is_shutting_down(269-271)backend/tests/unit/services/sse/test_sse_service.py (4)
unregister_connection(91-92)get_shutdown_status(97-104)register_connection(87-89)is_shutting_down(94-95)
backend/app/services/pod_monitor/monitor.py (1)
backend/app/services/kafka_event_service.py (1)
publish_domain_event(177-202)
backend/tests/unit/services/coordinator/test_queue_manager.py (1)
backend/app/domain/events/typed.py (1)
ExecutionRequestedEvent(48-62)
backend/tests/integration/services/sse/test_partitioned_event_router.py (3)
backend/app/services/sse/redis_bus.py (1)
get(23-35)backend/tests/unit/services/sse/test_sse_service.py (1)
get(32-41)backend/app/schemas_pydantic/sse.py (1)
RedisSSEMessage(58-63)
backend/tests/integration/events/test_dlq_handler.py (1)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)SagaStartedEvent(326-331)
backend/tests/helpers/kafka.py (1)
backend/app/events/core/producer.py (2)
producer(56-57)UnifiedProducer(23-218)
backend/app/api/routes/execution.py (1)
backend/app/domain/events/typed.py (2)
BaseEvent(30-42)EventMetadata(16-27)
backend/tests/integration/notifications/test_notification_sse.py (3)
backend/app/services/sse/redis_bus.py (1)
get(23-35)backend/tests/unit/services/sse/test_sse_service.py (1)
get(32-41)backend/app/schemas_pydantic/sse.py (1)
RedisNotificationMessage(91-101)
backend/tests/integration/idempotency/test_idempotency.py (2)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)backend/app/services/idempotency/middleware.py (1)
idempotent_handler(93-119)
backend/tests/unit/services/idempotency/test_idempotency_manager.py (1)
backend/app/domain/events/typed.py (1)
BaseEvent(30-42)
backend/tests/unit/events/test_event_dispatcher.py (2)
backend/app/events/core/dispatcher.py (1)
EventDispatcher(15-177)backend/tests/helpers/events.py (1)
make_execution_requested_event(7-50)
backend/app/services/k8s_worker/worker.py (1)
backend/app/domain/events/typed.py (5)
CreatePodCommandEvent(382-397)DeletePodCommandEvent(400-406)ExecutionFailedEvent(104-112)ExecutionStartedEvent(80-85)PodCreatedEvent(135-139)
backend/app/events/event_store.py (3)
backend/tests/unit/services/pod_monitor/test_monitor.py (1)
store_event(53-55)backend/app/db/repositories/event_repository.py (2)
store_event(39-52)get_event(68-72)backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)
backend/tests/integration/events/test_event_store.py (3)
backend/tests/conftest.py (1)
app(79-91)backend/tests/unit/conftest.py (1)
app(62-63)backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)
backend/app/services/saga/saga_orchestrator.py (2)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)SagaCancelledEvent(351-360)backend/app/services/saga/base_saga.py (1)
BaseSaga(8-52)
backend/tests/integration/test_sse_routes.py (1)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)PodCreatedEvent(135-139)
backend/app/services/notification_service.py (1)
backend/app/domain/events/typed.py (3)
ExecutionCompletedEvent(95-101)ExecutionFailedEvent(104-112)ExecutionTimeoutEvent(115-121)
backend/tests/unit/services/saga/test_execution_saga_steps.py (1)
backend/app/domain/events/typed.py (1)
ExecutionRequestedEvent(48-62)
backend/tests/unit/events/test_metadata_model.py (2)
backend/app/domain/events/typed.py (1)
EventMetadata(16-27)backend/app/services/coordinator/queue_manager.py (1)
user_id(34-35)
backend/app/api/routes/dlq.py (3)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)backend/tests/integration/services/sse/test_redis_bus.py (1)
model_dump(27-28)backend/tests/unit/services/sse/test_kafka_redis_bridge.py (1)
model_dump(41-42)
backend/tests/integration/idempotency/test_consumer_idempotent.py (4)
backend/app/events/core/dispatcher.py (2)
EventDispatcher(15-177)register(36-57)backend/app/domain/enums/events.py (1)
EventType(4-87)backend/tests/helpers/events.py (1)
make_execution_requested_event(7-50)backend/app/events/core/producer.py (2)
producer(56-57)produce(111-148)
backend/app/services/coordinator/coordinator.py (6)
backend/app/domain/events/typed.py (4)
CreatePodCommandEvent(382-397)EventMetadata(16-27)ExecutionAcceptedEvent(65-70)ExecutionRequestedEvent(48-62)backend/app/events/core/dispatcher.py (1)
EventDispatcher(15-177)backend/app/events/core/consumer.py (1)
UnifiedConsumer(23-256)backend/app/events/core/producer.py (1)
UnifiedProducer(23-218)backend/app/events/event_store.py (1)
EventStore(19-314)backend/app/events/schema/schema_registry.py (1)
SchemaRegistryManager(35-119)
backend/tests/unit/services/saga/test_saga_step_and_base.py (2)
backend/app/domain/events/typed.py (2)
BaseEvent(30-42)EventMetadata(16-27)backend/app/domain/enums/events.py (1)
EventType(4-87)
backend/app/services/execution_service.py (1)
backend/app/domain/events/typed.py (3)
EventMetadata(16-27)ExecutionCancelledEvent(124-129)ExecutionRequestedEvent(48-62)
backend/tests/e2e/test_k8s_worker_create_pod.py (1)
backend/app/domain/events/typed.py (2)
CreatePodCommandEvent(382-397)EventMetadata(16-27)
backend/app/services/idempotency/middleware.py (3)
backend/tests/unit/events/test_event_dispatcher.py (2)
handler(37-38)handler(48-49)backend/app/services/idempotency/idempotency_manager.py (1)
IdempotencyManager(69-315)backend/tests/integration/idempotency/test_idempotency.py (1)
on_duplicate(351-352)
backend/app/events/event_store_consumer.py (2)
backend/app/services/saga/saga_orchestrator.py (1)
_handle_event(151-170)backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)
backend/app/services/result_processor/processor.py (4)
backend/app/domain/events/typed.py (4)
EventMetadata(16-27)ExecutionCompletedEvent(95-101)ExecutionFailedEvent(104-112)ExecutionTimeoutEvent(115-121)backend/app/domain/execution/exceptions.py (1)
ExecutionNotFoundError(4-8)backend/app/domain/execution/models.py (1)
ExecutionResultDomain(40-51)backend/app/events/core/dispatcher.py (1)
EventDispatcher(15-177)
backend/tests/helpers/events.py (1)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)ExecutionRequestedEvent(48-62)
backend/app/events/core/producer.py (1)
backend/app/events/schema/schema_registry.py (1)
serialize_event(70-78)
backend/tests/integration/result_processor/test_result_processor.py (2)
backend/app/domain/events/typed.py (3)
EventMetadata(16-27)ExecutionCompletedEvent(95-101)ResultStoredEvent(195-200)backend/app/domain/execution/models.py (1)
ResourceUsageDomain(13-19)
backend/app/dlq/models.py (2)
backend/app/core/utils.py (1)
StringEnum(6-31)backend/app/domain/enums/events.py (1)
EventType(4-87)
backend/tests/integration/services/events/test_event_bus.py (1)
backend/app/services/event_bus.py (2)
EventBusEvent(21-29)subscribe(162-188)
backend/app/db/docs/dlq.py (2)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)backend/app/dlq/models.py (1)
DLQMessageStatus(12-18)
backend/tests/unit/services/test_pod_builder.py (2)
backend/app/domain/events/typed.py (2)
CreatePodCommandEvent(382-397)EventMetadata(16-27)backend/app/services/coordinator/queue_manager.py (1)
user_id(34-35)
backend/app/api/routes/events.py (1)
backend/app/domain/events/typed.py (2)
BaseEvent(30-42)EventMetadata(16-27)
backend/tests/integration/services/sse/test_redis_bus.py (2)
backend/app/domain/events/typed.py (1)
EventMetadata(16-27)backend/tests/unit/services/sse/test_kafka_redis_bridge.py (1)
_make_metadata(32-33)
backend/tests/integration/dlq/test_dlq_manager.py (3)
backend/app/dlq/manager.py (1)
create_dlq_manager(443-480)backend/app/domain/enums/kafka.py (1)
KafkaTopic(7-54)backend/app/domain/events/typed.py (1)
DLQMessageReceivedEvent(522-532)
backend/tests/unit/services/saga/test_saga_comprehensive.py (2)
backend/app/domain/events/typed.py (1)
ExecutionRequestedEvent(48-62)backend/app/services/saga/saga_step.py (3)
SagaStep(73-100)execute(80-88)SagaContext(14-70)
backend/app/services/kafka_event_service.py (4)
backend/app/domain/events/typed.py (1)
EventMetadata(16-27)backend/app/db/repositories/event_repository.py (1)
store_event(39-52)backend/app/events/core/producer.py (2)
produce(111-148)metrics(52-53)backend/app/core/tracing/utils.py (1)
inject_trace_context(88-100)
backend/app/events/schema/schema_registry.py (1)
backend/app/domain/enums/events.py (1)
EventType(4-87)
backend/tests/unit/services/pod_monitor/test_monitor.py (2)
backend/app/domain/events/typed.py (3)
EventMetadata(16-27)ExecutionCompletedEvent(95-101)ExecutionStartedEvent(80-85)backend/app/events/core/producer.py (1)
produce(111-148)
backend/tests/integration/idempotency/test_decorator_idempotent.py (1)
backend/app/services/idempotency/middleware.py (1)
idempotent_handler(93-119)
backend/tests/unit/domain/events/test_event_schema_coverage.py (1)
backend/app/domain/events/typed.py (1)
BaseEvent(30-42)
backend/tests/integration/events/test_schema_registry_real.py (1)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)PodCreatedEvent(135-139)
backend/app/events/core/dispatcher.py (2)
backend/app/infrastructure/kafka/mappings.py (1)
get_event_class_for_type(91-93)backend/app/domain/enums/events.py (1)
EventType(4-87)
🪛 GitHub Actions: docs
backend/app/dlq/manager.py
[error] 26-26: ModuleNotFoundError: No module named 'app.infrastructure.kafka.events' while importing DLQManager. Step './deploy.sh openapi' failed with exit code 1.
🪛 GitHub Actions: MyPy Type Checking
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py
[error] 30-30: Class cannot subclass "BaseEvent" (has type "Any")
backend/workers/dlq_processor.py
[error] 69-69: Returning Any from function declared to return "bool"
[error] 69-69: "DLQMessage" has no attribute "age_seconds"
backend/tests/unit/services/sse/test_kafka_redis_bridge.py
[error] 36-36: Class cannot subclass "BaseEvent" (has type "Any")
backend/app/services/notification_service.py
[error] 632-632: Item "None" of "ResourceUsageDomain | None" has no attribute "execution_time_wall_seconds"
backend/tests/unit/services/saga/test_saga_step_and_base.py
[error] 50-50: Argument 1 to "add_event" of "SagaContext" has incompatible type "BaseEvent"; expected one of ExecutionRequestedEvent | ExecutionAcceptedEvent | ExecutionQueuedEvent | ExecutionStartedEvent | ExecutionRunningEvent | <51 more items>
[error] 75-75: Type argument "BaseEvent" of "SagaStep" must be a subtype of "ExecutionRequestedEvent | ExecutionAcceptedEvent | ExecutionQueuedEvent | ExecutionStartedEvent | ExecutionRunningEvent | <51 more items>"
[error] 82-82: Type argument "BaseEvent" of "SagaStep" must be a subtype of "ExecutionRequestedEvent | ExecutionAcceptedEvent | ExecutionQueuedEvent | ExecutionStartedEvent | ExecutionRunningEvent | <51 more items>"
backend/app/services/result_processor/processor.py
[error] 178-178: Item "None" of "ResourceUsageDomain | None" has no attribute "execution_time_wall_seconds"
[error] 182-182: Item "None" of "ResourceUsageDomain | None" has no attribute "peak_memory_kb"
[error] 218-218: Argument 1 to "record_error" of "ExecutionMetrics" has incompatible type "ExecutionErrorType | None"; expected "str"
backend/tests/integration/services/sse/test_redis_bus.py
[error] 21-21: Class cannot subclass "BaseEvent" (has type "Any")
backend/tests/integration/dlq/test_dlq_manager.py
[error] 64-64: "ExecutionRequestedEvent" has no attribute "to_dict"
backend/tests/integration/events/test_schema_registry_real.py
[error] 24-24: "PodCreatedEvent" has no attribute "topic"
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Scan Backend
- GitHub Check: cubic · AI code reviewer
- GitHub Check: E2E Tests
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
backend/tests/integration/services/sse/test_redis_bus.py (1)
21-28: Fix type incompatibility causing pipeline failure. The MyPy error indicates `_DummyEvent` is not part of the `DomainEvent` union expected by `publish_event`. After the unification, only specific domain event types are accepted. Two options to fix:
Use a real event type (preferred): Import and use an actual event class like
ExecutionCompletedEventinstead of_DummyEvent.Cast for test purposes: If the dummy event structure is intentional for isolation, cast at the call site.
Option 1: Use a real event type
-from app.domain.events.typed import EventMetadata -from app.domain.events.typed import BaseEvent +from app.domain.events.typed import EventMetadata, ExecutionCompletedEventThen replace the
_DummyEventusage in the test:-class _DummyEvent(BaseEvent): - """Dummy event for testing.""" - execution_id: str = "" - status: str | None = None - topic: ClassVar[KafkaTopic] = KafkaTopic.EXECUTION_EVENTS - - def model_dump(self, **kwargs: object) -> dict[str, Any]: - return {"execution_id": self.execution_id, "status": self.status}# Publish event - evt = _DummyEvent( - event_type=EventType.EXECUTION_COMPLETED, - metadata=_make_metadata(), - execution_id="exec-1", - status="completed" - ) + evt = ExecutionCompletedEvent( + metadata=_make_metadata(), + execution_id="exec-1", + )Option 2: Cast for test isolation
+from typing import cast +from app.domain.events.typed import DomainEvent- await bus.publish_event("exec-1", evt) + await bus.publish_event("exec-1", cast(DomainEvent, evt))Also applies to: 90-96
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (1)
30-44: Fix MyPy type errors: `_FakeEvent` is not compatible with the expected `DomainEvent` union. The pipeline reports 4 MyPy errors because `_FakeEvent` is not part of the `DomainEvent` type union expected by `SagaStep` and `SagaOrchestrator` methods. The type system now enforces that only registered domain event types can be used. Options to resolve:
- Use an existing event type (e.g., `ExecutionRequestedEvent`) instead of a custom fake event
- Add `# type: ignore` comments on the affected lines if test flexibility is more important than strict typing
- Cast the fake event where needed: `cast(DomainEvent, _make_event(...))`
Option 2: Add type ignore comments
-class _StepOK(SagaStep[_FakeEvent]): +class _StepOK(SagaStep[_FakeEvent]): # type: ignore[type-arg]- def get_steps(self) -> list[SagaStep[_FakeEvent]]: + def get_steps(self) -> list[SagaStep[_FakeEvent]]: # type: ignore[type-arg]- await orch._handle_event(_make_event(EventType.EXECUTION_REQUESTED, "e")) + await orch._handle_event(_make_event(EventType.EXECUTION_REQUESTED, "e")) # type: ignore[arg-type]- assert orch._should_trigger_saga(_Saga, _make_event(EventType.EXECUTION_REQUESTED, "e")) is True + assert orch._should_trigger_saga(_Saga, _make_event(EventType.EXECUTION_REQUESTED, "e")) is True # type: ignore[arg-type]backend/tests/integration/dlq/test_dlq_manager.py (1)
63-70: Fix: `to_dict()` method does not exist on domain events. The pipeline failure indicates `ExecutionRequestedEvent` has no attribute `to_dict`. Domain events now use Pydantic's `model_dump()` method.
Proposed fix
```diff
 payload = {
-    "event": ev.to_dict(),
+    "event": ev.model_dump(mode="json"),
     "original_topic": f"{prefix}{str(KafkaTopic.EXECUTION_EVENTS)}",
     "error": "handler failed",
     "retry_count": 0,
     "failed_at": datetime.now(timezone.utc).isoformat(),
     "producer_id": "tests",
 }
```
backend/app/dlq/manager.py (2)
1-32: Fix: Import block is un-sorted. The Ruff linter reports unsorted imports. The `app.domain.events.typed` import should be grouped with the other `app.*` imports in alphabetical order. Run `ruff check --fix` to auto-fix, or manually reorder:
Suggested import ordering
from app.core.lifecycle import LifecycleEnabled from app.core.metrics.context import get_dlq_metrics from app.core.tracing import EventAttributes from app.core.tracing.utils import extract_trace_context, get_tracer, inject_trace_context from app.db.docs import DLQMessageDocument from app.dlq.models import ( DLQBatchRetryResult, DLQMessage, DLQMessageStatus, DLQMessageUpdate, DLQRetryResult, RetryPolicy, RetryStrategy, ) from app.domain.enums.kafka import GroupId, KafkaTopic -from app.events.schema.schema_registry import SchemaRegistryManager from app.domain.events.typed import ( DLQMessageDiscardedEvent, DLQMessageReceivedEvent, DLQMessageRetriedEvent, EventMetadata, ) +from app.events.schema.schema_registry import SchemaRegistryManager from app.settings import Settings
206-255: Non-atomic dual-topic send could leave inconsistent state. The method sends to `retry_topic` then `original_topic` sequentially. If the second `send_and_wait` fails after the first succeeds, the message appears in the retry topic but not the original, and the status update won't run. Consider wrapping in a try/except to handle partial failure, or at minimum updating status to reflect the partial state.
Suggested improvement
+ # Send to both topics - if second fails, we still want to track partial retry + sent_to_retry = False + sent_to_original = False # Send to retry topic - await self.producer.send_and_wait( - topic=retry_topic, - value=json.dumps(event.model_dump(mode="json")).encode(), - key=message.event.event_id.encode(), - headers=kafka_headers, - ) - - # Send to original topic - await self.producer.send_and_wait( - topic=message.original_topic, - value=json.dumps(event.model_dump(mode="json")).encode(), - key=message.event.event_id.encode(), - headers=kafka_headers, - ) + try: + await self.producer.send_and_wait( + topic=retry_topic, + value=json.dumps(event.model_dump(mode="json")).encode(), + key=message.event.event_id.encode(), + headers=kafka_headers, + ) + sent_to_retry = True + + await self.producer.send_and_wait( + topic=message.original_topic, + value=json.dumps(event.model_dump(mode="json")).encode(), + key=message.event.event_id.encode(), + headers=kafka_headers, + ) + sent_to_original = True + except Exception as e: + self.logger.error( + f"Partial retry failure: retry_topic={sent_to_retry}, original={sent_to_original}, error={e}", + extra={"event_id": message.event.event_id}, + ) + raise
🧹 Nitpick comments (4)
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (1)
12-17: Consider consolidating imports from the same module. Both `EventMetadata` (line 12) and `BaseEvent` (line 17) are imported from `app.domain.events.typed`. These can be combined into a single import statement.
Suggested fix
```diff
-from app.domain.events.typed import EventMetadata
 from app.domain.saga.models import Saga, SagaConfig
 from app.events.core import UnifiedProducer
 from app.events.event_store import EventStore
 from app.events.schema.schema_registry import SchemaRegistryManager
-from app.domain.events.typed import BaseEvent
+from app.domain.events.typed import BaseEvent, EventMetadata
```
backend/tests/integration/dlq/test_dlq_manager.py (1)
48-61: Consider logging more context on deserialization failures. The exception handler only logs at debug level, which could hide issues during test debugging. For test code, consider logging at warning level or including the raw message value.
Suggested improvement
```diff
 except Exception as e:
-    _test_logger.debug(f"Error deserializing DLQ event: {e}")
+    _test_logger.warning(f"Error deserializing DLQ event: {e}, raw: {msg.value[:100] if msg.value else None}")
```
backend/app/dlq/manager.py (2)
72-76: Consider defensive handling for header decoding.

If a header value contains non-UTF-8 bytes, the `.decode()` call will raise `UnicodeDecodeError`. Consider using `errors="replace"` or `errors="ignore"` for resilience.

Suggested fix
```diff
 def _kafka_msg_to_message(self, msg: Any) -> DLQMessage:
     """Parse Kafka ConsumerRecord into DLQMessage."""
     data = json.loads(msg.value)
-    headers = {k: v.decode() for k, v in (msg.headers or [])}
+    headers = {k: v.decode(errors="replace") for k, v in (msg.headers or [])}
     return DLQMessage(**data, dlq_offset=msg.offset, dlq_partition=msg.partition, headers=headers)
```
107-138: Span attribute uses enum value instead of full topic name.

Line 125 sets `EventAttributes.KAFKA_TOPIC: self.dlq_topic`, which is the `KafkaTopic` enum value, not the prefixed topic name. For consistency with how topics are referenced elsewhere, consider using the full topic name.

Suggested fix
```diff
 with get_tracer().start_as_current_span(
     name="dlq.consume",
     context=ctx,
     kind=SpanKind.CONSUMER,
     attributes={
-        EventAttributes.KAFKA_TOPIC: self.dlq_topic,
+        EventAttributes.KAFKA_TOPIC: f"{self.settings.KAFKA_TOPIC_PREFIX}{self.dlq_topic}",
         EventAttributes.EVENT_TYPE: dlq_msg.event.event_type,
         EventAttributes.EVENT_ID: dlq_msg.event.event_id,
     },
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- backend/app/dlq/manager.py
- backend/app/services/coordinator/queue_manager.py
- backend/tests/integration/dlq/test_dlq_manager.py
- backend/tests/integration/services/sse/test_redis_bus.py
- backend/tests/unit/services/saga/test_execution_saga_steps.py
- backend/tests/unit/services/saga/test_saga_orchestrator_unit.py
- backend/tests/unit/services/sse/test_kafka_redis_bridge.py
✅ Files skipped from review due to trivial changes (1)
- backend/app/services/coordinator/queue_manager.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/tests/unit/services/sse/test_kafka_redis_bridge.py
- backend/tests/unit/services/saga/test_execution_saga_steps.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/tests/integration/services/sse/test_redis_bus.py (2)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)
BaseEvent(30-42)
backend/tests/unit/services/sse/test_kafka_redis_bridge.py (1)
_make_metadata(32-33)
backend/tests/integration/dlq/test_dlq_manager.py (5)
backend/app/dlq/manager.py (1)
create_dlq_manager(443-480)
backend/app/domain/enums/events.py (1)
EventType(4-87)
backend/app/domain/enums/kafka.py (1)
KafkaTopic(7-54)
backend/app/events/schema/schema_registry.py (2)
SchemaRegistryManager(35-119)
deserialize_event(80-95)
backend/app/domain/events/typed.py (1)
DLQMessageReceivedEvent(522-532)
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (1)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)
BaseEvent(30-42)
🪛 GitHub Actions: MyPy Type Checking
backend/tests/integration/services/sse/test_redis_bus.py
[error] 96-96: Argument 2 to "publish_event" of "SSERedisBus" has incompatible type "_DummyEvent"; expected a union of ExecutionRequestedEvent | ExecutionAcceptedEvent | ExecutionQueuedEvent | ExecutionStartedEvent | ExecutionRunningEvent | <51 more items>
backend/tests/integration/dlq/test_dlq_manager.py
[error] 64-64: "ExecutionRequestedEvent" has no attribute "to_dict".
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py
[error] 97-97: Type argument "_FakeEvent" of "SagaStep" must be a subtype of ExecutionRequestedEvent | ExecutionAcceptedEvent | ExecutionQueuedEvent | ExecutionStartedEvent | ExecutionRunningEvent | <51 more items>.
[error] 117-117: Type argument "_FakeEvent" of "SagaStep" must be a subtype of ExecutionRequestedEvent | ExecutionAcceptedEvent | ExecutionQueuedEvent | ExecutionStartedEvent | ExecutionRunningEvent | <51 more items>.
[error] 145-145: Argument 1 to "_handle_event" of "SagaOrchestrator" has incompatible type "_FakeEvent"; expected one of ExecutionRequestedEvent | ExecutionAcceptedEvent | ExecutionQueuedEvent | ExecutionStartedEvent | ExecutionRunningEvent | <51 more items>.
[error] 165-165: Argument 2 to "_should_trigger_saga" of "SagaOrchestrator" has incompatible type "_FakeEvent"; expected one of ExecutionRequestedEvent | ExecutionAcceptedEvent | ExecutionQueuedEvent | ExecutionStartedEvent | ExecutionRunningEvent | <51 more items>.
🪛 GitHub Actions: Ruff Linting
backend/app/dlq/manager.py
[error] 1-32: Ruff check failed: Import block is un-sorted or un-formatted. Found 1 error fixable with the --fix option. Tool: Ruff (Python linter). Run 'ruff --fix' to auto-fix imports.
🔇 Additional comments (10)
backend/tests/integration/services/sse/test_redis_bus.py (2)
11-12: LGTM on import migration.

The imports are correctly updated to use the unified domain event model (`EventMetadata` and `BaseEvent` from `app.domain.events.typed`), consistent with the PR's objective to consolidate the event systems.
75-76: LGTM!

The metadata factory correctly uses `EventMetadata` with the required fields, matching the pattern used in other test files (e.g., `test_kafka_redis_bridge.py`).
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (1)
40-42: LGTM!

The `metadata` field correctly uses the new `EventMetadata` type with appropriate default values for testing. The migration from `AvroEventMetadata` to `EventMetadata` aligns with the PR objectives.
backend/tests/integration/dlq/test_dlq_manager.py (2)
1-14: Imports look correct for the new unified event model.

The imports properly reference the new domain event types (`EventType`, `DLQMessageReceivedEvent`) from the unified `app.domain` packages, aligning with the PR's migration objectives.
88-103: LGTM! Solid async test structure with proper cleanup.

The test correctly:
- Starts the consumer before the manager to ensure message capture
- Uses `asyncio.wait_for` with timeout to prevent hangs
- Properly cancels and cleans up the consumer task in the `finally` block
- Validates key fields of the emitted `DLQMessageReceivedEvent`
backend/app/dlq/manager.py (5)
170-183: Event emission failure is silently logged, not propagated.

The `_emit_message_received_event` call at line 183 will swallow exceptions (as seen in `_produce_dlq_event`). This is reasonable for telemetry events, but consider whether failed emission should trigger any compensating action or metric. The log-and-continue choice is appropriate here: DLQ storage is the primary concern, and emission is secondary telemetry.
322-372: LGTM! Clean event emission pattern.

The three emission methods follow a consistent pattern:
- Create a typed event with appropriate fields
- Delegate to `_produce_dlq_event` for serialization and send
- Fail silently with logging, which is appropriate for telemetry events

The use of `schema_registry.serialize_event` ensures proper Avro serialization aligned with the unified event model.
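As a rough illustration of that pattern (a minimal sketch only; the event constructor fields, the synchronous `serialize_event` call, and the `send_and_wait` signature are assumptions drawn from snippets quoted elsewhere in this review, not the actual manager code):

```python
import logging
from uuid import uuid4

from app.domain.events.typed import DLQMessageReceivedEvent, EventMetadata

logger = logging.getLogger("dlq-manager-sketch")


async def emit_message_received(schema_registry, producer, dlq_events_topic: str, dlq_msg) -> None:
    # Build a typed event; field names here are assumed for illustration.
    event = DLQMessageReceivedEvent(
        event_id=str(uuid4()),
        metadata=EventMetadata(service_name="dlq-manager", service_version="1.0.0"),
        original_topic=dlq_msg.original_topic,
    )
    try:
        payload = schema_registry.serialize_event(event)  # Avro-encoded bytes
        await producer.send_and_wait(topic=dlq_events_topic, value=payload, key=event.event_id.encode())
    except Exception as exc:
        # Log-and-continue: emission is secondary telemetry, storage is primary.
        logger.error(f"Failed to emit DLQ event {event.event_type}: {exc}")
```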
275-304: LGTM! Monitor loop handles scheduled retries appropriately.

The monitor (a rough sketch of this loop shape follows the list):
- Queries with sensible limit (100 messages)
- Processes individually, allowing partial progress
- Has different sleep intervals for normal (10s) vs error (60s) states
- Failed retries remain SCHEDULED for next iteration
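A minimal sketch of that loop shape, for illustration only (helper names such as `get_messages_ready_for_retry` are hypothetical, not the manager's actual API):

```python
import asyncio


async def monitor_scheduled_retries(manager) -> None:
    """Illustrative monitor loop: bounded batch, per-message error isolation,
    and separate sleep intervals for the normal and error paths."""
    while True:
        try:
            messages = await manager.get_messages_ready_for_retry(limit=100)  # hypothetical helper
            for message in messages:
                try:
                    await manager.retry_message(message)  # hypothetical helper
                except Exception as exc:
                    # Failed retries stay SCHEDULED and are picked up next iteration.
                    manager.logger.error(f"Retry failed for {message.event.event_id}: {exc}")
            await asyncio.sleep(10)   # normal cadence
        except Exception as exc:
            manager.logger.error(f"Monitor loop error: {exc}")
            await asyncio.sleep(60)   # back off on unexpected errors
```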
374-440: LGTM! Manual operations have proper state guards.

Both `retry_message_manually` and `discard_message_manually`:
- Correctly query using the nested `{"event.event_id": event_id}` path
- Guard against terminal states (DISCARDED, RETRIED)
- Return boolean success indicator
The batch retry properly aggregates results without short-circuiting on individual failures.
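The guard pattern reads roughly like the following sketch (the nested `event.event_id` query and the status enum come from snippets in this review; the internal `_retry_message` helper is assumed):

```python
from app.db.docs import DLQMessageDocument
from app.dlq.models import DLQMessageStatus


async def retry_message_manually(self, event_id: str) -> bool:
    """Illustrative only: look up by the nested event.event_id path, refuse to
    act on terminal states, and report success as a boolean."""
    doc = await DLQMessageDocument.find_one({"event.event_id": event_id})
    if doc is None:
        return False
    if doc.status in (DLQMessageStatus.DISCARDED, DLQMessageStatus.RETRIED):
        return False  # terminal states are never retried
    await self._retry_message(doc)  # assumed internal helper
    return True
```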
443-480: LGTM! Factory creates well-configured Kafka clients.

Producer configuration includes:
- `acks="all"` for durability
- `enable_idempotence=True` for exactly-once semantics
- Compression and batching for efficiency
Consumer configuration properly uses manual commit with earliest offset reset.
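For illustration, those settings map onto aiokafka client constructors roughly as follows (a sketch assuming aiokafka, which the `send_and_wait` usage elsewhere in this review suggests; the values are examples rather than the project's actual configuration):

```python
from aiokafka import AIOKafkaConsumer, AIOKafkaProducer


def build_dlq_clients(bootstrap_servers: str, dlq_topic: str, group_id: str):
    """Example wiring of the settings named above; not the project's factory."""
    producer = AIOKafkaProducer(
        bootstrap_servers=bootstrap_servers,
        acks="all",                # durability: wait for all in-sync replicas
        enable_idempotence=True,   # exactly-once-style producer semantics
        compression_type="gzip",   # compression for efficiency
        linger_ms=10,              # small batching window
    )
    consumer = AIOKafkaConsumer(
        dlq_topic,
        bootstrap_servers=bootstrap_servers,
        group_id=group_id,
        enable_auto_commit=False,  # manual commit after processing
        auto_offset_reset="earliest",
    )
    return producer, consumer
```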
1 issue found across 14 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/tests/integration/db/repositories/test_dlq_repository.py">
<violation number="1" location="backend/tests/integration/db/repositories/test_dlq_repository.py:89">
P1: Test will fail: `msg.event.event_id` will be a random UUID, not `"id1"`. The test data setup passes `event_id="id1"` as a top-level field (which the new model ignores), but the `event` dict doesn't include `event_id`. The test data needs to include `"event_id": "id1"` inside each `event` dict.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/tests/integration/db/repositories/test_dlq_repository.py (1)
24-33: Test assertion on line 89 is broken and will fail.The assertion
msg.event.event_id == "id1"has two critical issues:
Incorrect dict access syntax: The
eventfield is adict[str, Any], so accessing it with attribute notation (msg.event.event_id) raisesAttributeError. Use dict key access instead:msg.event["event_id"]ormsg.event.get("event_id").Missing
event_idin fixture data: Even with correct access syntax, theeventdicts in test fixtures (lines 28-33, 44-49, 60-65) do not contain theevent_idkey.Add
"event_id"to each event dict in the fixtures, and fix the assertion to use proper dict key access.backend/workers/dlq_processor.py (1)
76-107: DLQ manager is never started - processing will not occur.The
main()function initializes the manager and configures policies/filters, but it never callsmanager.start()(or usesasync with manager). The function just waits onstop_eventindefinitely without starting any processing. TheDLQManagerinherits fromLifecycleEnabledand requires explicit startup.🐛 Proposed fix
async with AsyncExitStack() as stack: stack.push_async_callback(container.close) + await manager.start() + stack.push_async_callback(manager.stop) await stop_event.wait()Alternatively, use the context manager pattern:
async with AsyncExitStack() as stack: stack.push_async_callback(container.close) + await stack.enter_async_context(manager) await stop_event.wait()
🧹 Nitpick comments (8)
backend/app/services/result_processor/processor.py (1)
155-165: Consider using runtime checks instead of assert for type narrowing.The
assert isinstance(...)pattern works for type narrowing, butassertstatements are removed when Python runs with optimization flags (-O). If the dispatcher ever routes an incorrect event type in production with optimizations enabled, these assertions would be silently skipped, leading to attribute errors downstream.♻️ Suggested alternative using explicit checks
async def _handle_completed_wrapper(self, event: DomainEvent) -> None: - assert isinstance(event, ExecutionCompletedEvent) + if not isinstance(event, ExecutionCompletedEvent): + raise TypeError(f"Expected ExecutionCompletedEvent, got {type(event).__name__}") await self._handle_completed(event) async def _handle_failed_wrapper(self, event: DomainEvent) -> None: - assert isinstance(event, ExecutionFailedEvent) + if not isinstance(event, ExecutionFailedEvent): + raise TypeError(f"Expected ExecutionFailedEvent, got {type(event).__name__}") await self._handle_failed(event) async def _handle_timeout_wrapper(self, event: DomainEvent) -> None: - assert isinstance(event, ExecutionTimeoutEvent) + if not isinstance(event, ExecutionTimeoutEvent): + raise TypeError(f"Expected ExecutionTimeoutEvent, got {type(event).__name__}") await self._handle_timeout(event)backend/tests/integration/db/repositories/test_dlq_repository.py (1)
90-91: Assertions always pass regardless of return value.
assert await repo.mark_message_retried("id1") in (True, False)will always pass since any boolean is in(True, False). If the intent is to verify these methods don't raise exceptions, consider making that explicit. If you expect specific outcomes (e.g., success on first call), assert those values directly.💡 Suggested improvement
- assert await repo.mark_message_retried("id1") in (True, False) - assert await repo.mark_message_discarded("id1", "r") in (True, False) + # Verify methods execute without error; first call should succeed + result_retried = await repo.mark_message_retried("id1") + assert result_retried is True, "Expected successful retry mark" + result_discarded = await repo.mark_message_discarded("id1", "r") + # After retry, discard may or may not succeed depending on business logic + assert isinstance(result_discarded, bool)backend/tests/integration/dlq/test_dlq_manager.py (1)
48-61: Silent exception handling may hide deserialization issues.The
consume_dlq_eventsfunction catches all exceptions and logs atdebuglevel, which could mask real problems during test execution. Consider logging at a higher level or re-raising after a certain number of failures.♻️ Suggested improvement
except Exception as e: - _test_logger.debug(f"Error deserializing DLQ event: {e}") + _test_logger.warning(f"Error deserializing DLQ event: {e}")backend/app/dlq/manager.py (5)
107-138: Message commit occurs even if storage or event emission fails.In
_process_messages, the commit happens after_process_dlq_messagereturns, but_process_dlq_messagecould partially succeed (e.g., store the message but fail during event emission). If the emit fails silently (caught in_produce_dlq_event), the message is committed but the received event was never emitted. Consider whether this is the desired behavior.Also, if an exception occurs in
_process_dlq_message, the message is not committed (good), but the loop continues to the next message without any backoff, which could cause tight-loop failures.♻️ Consider adding backoff on repeated failures
except Exception as e: self.logger.error(f"Error processing DLQ message: {e}") + await asyncio.sleep(1) # Brief backoff to prevent tight-loop failures
170-183: Upsert pattern may have race condition.The
_store_messagemethod performs a find-then-save pattern which is not atomic. If two instances process the same message concurrently, both could find no existing document and attempt to insert, potentially causing duplicate key errors (though the unique index onevent.event_idwould prevent actual duplicates).Consider using
upsert=Truewithfind_one_and_updateor catching the duplicate key exception.♻️ Consider using Beanie's upsert capability
- existing = await DLQMessageDocument.find_one({"event.event_id": message.event.event_id}) - if existing: - doc.id = existing.id - await doc.save() + await DLQMessageDocument.find_one( + {"event.event_id": message.event.event_id} + ).upsert( + Set(message.model_dump()), + on_insert=doc + )Alternatively, catch
DuplicateKeyErrorand update:try: await doc.insert() except DuplicateKeyError: existing = await DLQMessageDocument.find_one({"event.event_id": message.event.event_id}) if existing: doc.id = existing.id await doc.save()
360-372: DLQ event emission failures are silently logged - consider retry or metrics.The
_produce_dlq_eventmethod catches all exceptions and only logs the error. For observability events, this is acceptable, but you lose visibility into emission failures. Consider:
- Recording a metric for emission failures
- Adding structured logging with event details
♻️ Add metrics for emission failures
except Exception as e: self.logger.error(f"Failed to emit DLQ event {event.event_type}: {e}") + self.metrics.record_dlq_event_emission_failure(str(event.event_type))
389-416: Batch retry processes sequentially - consider concurrency for large batches.The
retry_messages_batchmethod processes each event ID sequentially. For large batches, this could be slow. Consider usingasyncio.gatherwith a concurrency limit for better throughput.♻️ Concurrent batch processing with semaphore
async def retry_messages_batch(self, event_ids: list[str]) -> DLQBatchRetryResult: semaphore = asyncio.Semaphore(10) # Limit concurrency async def retry_with_limit(event_id: str) -> DLQRetryResult: async with semaphore: try: success = await self.retry_message_manually(event_id) return DLQRetryResult( event_id=event_id, status="success" if success else "failed", error=None if success else "Retry failed" ) except Exception as e: self.logger.error(f"Error retrying message {event_id}: {e}") return DLQRetryResult(event_id=event_id, status="failed", error=str(e)) results = await asyncio.gather(*[retry_with_limit(eid) for eid in event_ids]) successful = sum(1 for r in results if r.status == "success") return DLQBatchRetryResult( total=len(event_ids), successful=successful, failed=len(event_ids) - successful, details=list(results) )
69-70: Hardcoded service metadata - consider making configurable.The
_event_metadatauses hardcoded values forservice_nameandservice_version. Consider deriving these from settings or environment variables for better maintainability across deployments.♻️ Use settings for service metadata
- self._event_metadata = EventMetadata(service_name="dlq-manager", service_version="1.0.0") + self._event_metadata = EventMetadata( + service_name=getattr(settings, "SERVICE_NAME", "dlq-manager"), + service_version=getattr(settings, "SERVICE_VERSION", "1.0.0") + )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
backend/app/dlq/manager.pybackend/app/services/notification_service.pybackend/app/services/result_processor/processor.pybackend/tests/integration/db/repositories/test_dlq_repository.pybackend/tests/integration/dlq/test_dlq_manager.pybackend/tests/integration/events/test_producer_roundtrip.pybackend/tests/integration/events/test_schema_registry_real.pybackend/tests/integration/events/test_schema_registry_roundtrip.pybackend/tests/integration/services/sse/test_redis_bus.pybackend/tests/unit/services/saga/test_execution_saga_steps.pybackend/tests/unit/services/saga/test_saga_orchestrator_unit.pybackend/tests/unit/services/saga/test_saga_step_and_base.pybackend/tests/unit/services/sse/test_kafka_redis_bridge.pybackend/workers/dlq_processor.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/tests/integration/services/sse/test_redis_bus.py
🧰 Additional context used
🧬 Code graph analysis (12)
backend/tests/integration/db/repositories/test_dlq_repository.py (1)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)
backend/app/services/result_processor/processor.py (5)
backend/app/domain/events/typed.py (4)
EventMetadata(16-27)ExecutionCompletedEvent(95-101)ExecutionFailedEvent(104-112)ExecutionTimeoutEvent(115-121)backend/app/domain/execution/models.py (1)
ExecutionResultDomain(40-51)backend/app/events/core/dispatcher.py (1)
EventDispatcher(15-177)backend/app/events/schema/schema_registry.py (1)
SchemaRegistryManager(35-119)backend/app/core/metrics/execution.py (3)
record_execution_duration(68-69)record_memory_usage(77-78)record_error(80-81)
backend/tests/unit/services/saga/test_saga_step_and_base.py (2)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)SystemErrorEvent(497-502)backend/app/services/saga/saga_step.py (4)
SagaStep(73-100)execute(80-88)SagaContext(14-70)can_execute(95-97)
backend/app/services/notification_service.py (1)
backend/app/domain/events/typed.py (3)
ExecutionCompletedEvent(95-101)ExecutionFailedEvent(104-112)ExecutionTimeoutEvent(115-121)
backend/workers/dlq_processor.py (3)
backend/app/services/coordinator/queue_manager.py (1)
age_seconds(38-39)backend/app/dlq/manager.py (1)
add_filter(319-320)backend/app/db/docs/dlq.py (1)
Settings(34-43)
backend/tests/integration/events/test_producer_roundtrip.py (1)
backend/app/infrastructure/kafka/mappings.py (1)
get_topic_for_event(97-99)
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (3)
backend/app/domain/events/typed.py (1)
ExecutionRequestedEvent(48-62)backend/app/services/saga/saga_step.py (4)
CompensationStep(103-121)SagaContext(14-70)SagaStep(73-100)execute(80-88)backend/tests/helpers/events.py (1)
make_execution_requested_event(7-50)
backend/tests/integration/dlq/test_dlq_manager.py (6)
backend/app/dlq/manager.py (1)
create_dlq_manager(443-480)backend/app/domain/enums/events.py (1)
EventType(4-87)backend/app/domain/enums/kafka.py (1)
KafkaTopic(7-54)backend/app/domain/events/typed.py (1)
DLQMessageReceivedEvent(522-532)backend/app/settings.py (1)
Settings(11-169)backend/app/events/schema/schema_registry.py (2)
SchemaRegistryManager(35-119)deserialize_event(80-95)
backend/tests/unit/services/sse/test_kafka_redis_bridge.py (2)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)ExecutionStartedEvent(80-85)backend/app/services/sse/redis_bus.py (1)
publish_event(65-71)
backend/tests/integration/events/test_schema_registry_real.py (2)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)PodCreatedEvent(135-139)backend/app/infrastructure/kafka/mappings.py (1)
get_topic_for_event(97-99)
backend/tests/integration/events/test_schema_registry_roundtrip.py (2)
backend/app/infrastructure/kafka/mappings.py (1)
get_topic_for_event(97-99)backend/app/events/schema/schema_registry.py (1)
deserialize_event(80-95)
backend/tests/unit/services/saga/test_execution_saga_steps.py (1)
backend/app/domain/events/typed.py (1)
ExecutionRequestedEvent(48-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Integration Tests
- GitHub Check: E2E Tests
🔇 Additional comments (32)
backend/tests/integration/events/test_schema_registry_roundtrip.py (3)
5-5: LGTM!Import correctly added to support centralized topic resolution via the mappings module.
23-24: LGTM!Topic derivation correctly updated to use the centralized
get_topic_for_eventmapping. Thestr()conversion is appropriate sinceget_topic_for_eventreturns aKafkaTopicenum whiledeserialize_eventexpects a string parameter.
27-28: LGTM!Good addition to exercise the idempotent initialization path of
initialize_schemas(). This helps ensure coverage of the lazy initialization behavior mentioned in the comment at line 19.backend/tests/integration/events/test_schema_registry_real.py (2)
4-6: LGTM!Imports correctly updated to use the unified domain event model. The switch from
AvroEventMetadatatoEventMetadataand sourcingPodCreatedEventfromapp.domain.events.typedaligns with the PR objective of consolidating to a single event system.
18-26: LGTM!Test correctly updated to use the unified event model:
EventMetadatareplacesAvroEventMetadatawith the same required fields- Topic derivation via
get_topic_for_event(ev.event_type)follows the new centralized mapping pattern- The
str()conversion appropriately handles theKafkaTopicenum for thedeserialize_eventstring parameterbackend/tests/unit/services/saga/test_execution_saga_steps.py (2)
3-3: LGTM!Import correctly updated from the infrastructure layer to the domain layer, aligning with the unified event model migration.
112-120: LGTM!The
_FakeProducercorrectly updated to useDomainEventtype hints instead ofBaseEvent, consistent with the unified event model. The minimal implementation (skipping parent__init__) is appropriate for test isolation.backend/tests/integration/events/test_producer_roundtrip.py (2)
7-7: LGTM!Import correctly reflects the migration to the centralized topic mapping helper.
35-36: LGTM!The topic derivation now correctly uses the centralized mapping via
get_topic_for_event, and the explicitstr()cast properly converts theKafkaTopicenum to the string expected bysend_to_dlq.backend/tests/unit/services/sse/test_kafka_redis_bridge.py (4)
7-7: LGTM!Import correctly updated to use the unified domain event types from
app.domain.events.typed, consistent with the PR's migration to the single event model.
19-26: LGTM!The
_FakeBustype hints correctly updated toDomainEvent, matching the productionSSERedisBus.publish_eventsignature.
29-30: LGTM!Helper correctly updated to use
EventMetadatawith the required fields. The rename fromAvroEventMetadataaligns with the PR migration.
54-61: LGTM!Good improvement: using the actual
ExecutionStartedEventtype instead of a custom dummy class provides more realistic test coverage and validates the real event structure works correctly with the routing handlers.backend/app/services/notification_service.py (4)
22-27: LGTM!The imports correctly reflect the migration to the unified
DomainEventmodel. All imported types are used appropriately within the file.
330-330: Verify that blocking on delivery is intentional.This change from fire-and-forget to awaiting
_deliver_notificationmeanscreate_notificationwill block until delivery completes. For webhook and Slack channels, this could add up to 30 seconds of latency (the configured timeout) if external services are slow.If immediate API response is desired, consider restoring async task dispatch or applying this only to the
IN_APPchannel.
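If the non-blocking behaviour is preferred, one option is a fire-and-forget dispatch along these lines (a sketch only; `_deliver_notification` is the coroutine named above, everything else is illustrative):

```python
import asyncio


def dispatch_delivery(self, notification) -> None:
    """Sketch: start delivery without blocking create_notification, but keep a
    done-callback so failures are not silently lost."""
    task = asyncio.create_task(self._deliver_notification(notification))

    def _log_failure(t: asyncio.Task) -> None:
        if t.cancelled():
            return
        exc = t.exception()
        if exc is not None:
            self.logger.error(f"Notification delivery failed: {exc}")

    task.add_done_callback(_log_failure)
```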
631-632: LGTM!The defensive null-check on
event.resource_usagepreventsAttributeErrorwhen the field isNone, with a sensible fallback to0.0.
644-656: LGTM!The unified handler correctly uses
isinstancechecks to narrow the genericDomainEventto specific typed handlers, with appropriate fallback logging for unhandled types and exception handling.backend/tests/unit/services/saga/test_saga_step_and_base.py (3)
6-6: LGTM!Import correctly updated to use the new domain-typed event system.
46-51: LGTM!Test event construction correctly uses
SystemErrorEventwith proper required fields (error_type,message,service_name) and embeddedEventMetadata. This aligns with the unified event model.
77-78: LGTM!Type annotations consistently updated to
SystemErrorEventacross the saga step definition, execute signature, and mock spec. The changes align with the broader migration to domain-typed events.Also applies to: 84-86, 94-94
backend/tests/unit/services/saga/test_saga_orchestrator_unit.py (4)
9-9: LGTM!Imports correctly updated to use
DomainEventandExecutionRequestedEventfrom the domain layer, with the test helpermake_execution_requested_eventproperly separated as a local import.Also applies to: 19-20
48-51: LGTM!The fake producer signature correctly accepts
DomainEvent, matching the updatedUnifiedProducerinterface.
78-86: LGTM!
_StepOKand_Sagacorrectly parameterized withExecutionRequestedEvent. Theexecutemethod signature matches theSagaStepabstract interface.Also applies to: 98-99
122-122: LGTM!Consistent use of
make_execution_requested_eventhelper withexecution_idoverride. This is cleaner than the previous approach and leverages the test helper's sensible defaults.Also applies to: 142-142, 146-146
backend/app/services/result_processor/processor.py (3)
15-26: LGTM!Imports correctly updated to include
DomainEventandEventMetadatafrom the unified domain events module.
178-192: LGTM on conditional metrics; acknowledge the brittle memory limit parsing.The conditional check for
resource_usageis a good defensive change that preventsAttributeErrorwhen resource data is unavailable.However, line 188's parsing of
K8S_POD_MEMORY_LIMITis fragile as noted in the TODO. It assumes the format ends with"Mi", but will fail for:
"256Gi"→int("256G")→ValueError"256M"→int("256")but semantically wrong (megabytes vs mebibytes)"268435456"(raw bytes) →int("268435456")→ incorrect valueConsider extracting a utility function to parse Kubernetes resource quantities properly.
219-219: LGTM!Reasonable fallback to
"unknown"whenerror_typeis not present, ensuring metrics recording doesn't fail.backend/workers/dlq_processor.py (1)
68-71: Age calculation approach is correct.The switch to explicit datetime calculation using
datetime.now(timezone.utc) - message.failed_atis appropriate sinceDLQMessagedoesn't have anage_secondsproperty (that exists inQueuedItemfrom the queue_manager, per the relevant snippets).backend/tests/integration/dlq/test_dlq_manager.py (2)
84-103: Good async-first test pattern using Future instead of polling.The approach of using
asyncio.Futurewithasyncio.wait_foris cleaner than polling-based verification. The cleanup in thefinallyblock properly cancels the consumer task and stops the consumer.
27-28: No changes needed. Beanie is properly initialized through the app's lifespan context, which runs when theappfixture is created. Although the test doesn't explicitly request thedbfixture, theautouse_cleanupfixture inbackend/tests/integration/conftest.pyrequires it, which establishes the dependency chain:_cleanup→db→scope→app_container→app(with lifespan). This ensuresinit_beanieis called at app startup before the test runs.Likely an incorrect or invalid review comment.
backend/app/dlq/manager.py (2)
206-255: Retry sends to both retry and original topics - potential duplicate processing.The
_retry_messagemethod sends the event to both a retry topic (line 222-227) and the original topic (line 230-235). This design choice means:
- The retry topic serves as an audit/monitoring trail
- The original topic gets the actual retry
This is valid if intentional, but consumers on the original topic should be idempotent since they'll receive the same event again.
Confirm this dual-publish behavior is intentional for the retry flow. The original consumers should handle idempotency to prevent duplicate processing.
322-358: DLQ lifecycle event emission methods are well-structured.The three emit methods (
_emit_message_received_event,_emit_message_retried_event,_emit_message_discarded_event) follow a consistent pattern and correctly construct the domain events with proper metadata. The common_produce_dlq_eventmethod handles serialization and publishing with error resilience.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @backend/app/db/repositories/dlq_repository.py:
- Around line 26-32: Update usages that assume a top-level event_type to
reference the nested event object: in the aggregation pipeline change the group
key from "$event_type" to "$event.event_type" so grouping uses the nested
DomainEvent field, and in query conditions replace any
DLQMessageDocument.event_type references with
DLQMessageDocument.event.event_type (e.g., in the code that builds filters or
queries) so they access the nested event.event_type property.
🧹 Nitpick comments (1)
backend/app/domain/events/typed.py (1)
16-27: Consider consistency:use_enum_values=Truepresent here but missing inBaseEvent.
EventMetadatahasuse_enum_values=Truein itsConfigDict, butBaseEvent(line 33) only hasfrom_attributes=True. If enums should serialize consistently across all event types, consider addinguse_enum_values=TruetoBaseEventas well to ensure uniform serialization behavior.♻️ Optional: Add use_enum_values for consistency
class BaseEvent(AvroBase): """Base fields for all domain events.""" - model_config = ConfigDict(from_attributes=True) + model_config = ConfigDict(from_attributes=True, use_enum_values=True)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/app/db/repositories/dlq_repository.pybackend/app/domain/events/typed.pybackend/app/services/idempotency/idempotency_manager.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/db/repositories/dlq_repository.py (2)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)backend/app/infrastructure/kafka/mappings.py (1)
get_event_class_for_type(91-93)
backend/app/domain/events/typed.py (1)
backend/app/domain/enums/events.py (1)
EventType(4-87)
backend/app/services/idempotency/idempotency_manager.py (1)
backend/app/domain/events/typed.py (1)
BaseEvent(30-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Integration Tests
- GitHub Check: E2E Tests
🔇 Additional comments (7)
backend/app/domain/events/typed.py (5)
1-6: LGTM - Clean migration to AvroBase.The import restructuring correctly brings in
AvroBasefrompydantic_avroto enable Avro schema generation across all domain events.
30-40: LGTM - BaseEvent properly extends AvroBase.The
event_type: EventTypefield is correctly added as a required discriminator field, enabling polymorphic deserialization via theDomainEventunion.
517-553: LGTM - Well-structured DLQ event classes.The three new DLQ event types (
DLQMessageReceivedEvent,DLQMessageRetriedEvent,DLQMessageDiscardedEvent) follow the established pattern with appropriate fields and correctLiteraldiscriminators matching theEventTypeenum values fromapp.domain.enums.events.
558-574: Intentional design:ArchivedEventduplicates base fields rather than inheriting.
ArchivedEventextendsAvroBasedirectly and duplicates fields fromBaseEvent. This appears intentional since it's a wrapper for archived event data with additional archive-specific metadata (deleted_at,deleted_by,deletion_reason). The separate hierarchy avoidsArchivedEventbeing part of theDomainEventunion.
643-649: LGTM - DomainEvent union correctly updated.The discriminated union now includes all three new DLQ event types, maintaining the
event_typediscriminator for polymorphic deserialization.backend/app/services/idempotency/idempotency_manager.py (2)
12-12: LGTM - Import aligned with unified event model.The import path change from the infrastructure layer to
app.domain.events.typedaligns with the PR's goal of consolidating to a single domain event system.
43-53: LGTM -mode="json"ensures consistent content hashing.Using
model_dump(mode="json")correctly serializes complex types (datetime, enums, UUIDs) to their JSON string representations before hashing. This prevents hash inconsistencies that could occur from serializing Python objects directly.
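As a small illustration of the point (the helper name and hash choice are illustrative, not the manager's actual code):

```python
import hashlib
import json


def content_hash(event) -> str:
    """Hash the JSON-mode dump so datetimes, enums, and UUIDs are reduced to
    stable strings first; sort_keys keeps the digest independent of field order."""
    payload = json.dumps(event.model_dump(mode="json"), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```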
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/db/repositories/dlq_repository.py (2)
61-69: Aggregation pipeline uses incorrect field path for event_type.The pipeline groups by
"$event_type"(top-level), but according to theDLQMessageDocumentschema,event_typeis now nested under theeventfield. This should be"$event.event_type"to match the new document structure.Proposed fix
# Get counts by event type event_type_pipeline: list[Mapping[str, object]] = [ - {"$group": {"_id": "$event_type", "count": {"$sum": 1}}}, + {"$group": {"_id": "$event.event_type", "count": {"$sum": 1}}}, {"$sort": {"count": -1}}, {"$limit": 10}, ]
107-111: Filter uses non-existent top-levelevent_typeattribute.
DLQMessageDocument.event_typereferences a top-level field that no longer exists. According to the document schema,event_typeis nested underevent. This condition will either raise anAttributeErroror silently fail to match documents.Proposed fix
conditions: list[Any] = [ DLQMessageDocument.status == status if status else None, DLQMessageDocument.original_topic == topic if topic else None, - DLQMessageDocument.event_type == event_type if event_type else None, + {"event.event_type": str(event_type)} if event_type else None, ]Note: Beanie's
find()accepts raw query dicts mixed with ODM expressions. Alternatively, verify if Beanie 2.0.1 supports nested attribute access likeDLQMessageDocument.event.event_type.
🧹 Nitpick comments (1)
backend/tests/integration/dlq/test_dlq_discard.py (1)
19-44: Consider extracting the shared helper to reduce duplication.The
_create_dlq_documenthelper is identical to the one intest_dlq_retry.py. Consider extracting it to a shared test utility (e.g., intests/helpers/) to avoid maintaining duplicate code.# In tests/helpers/dlq.py or similar async def create_test_dlq_document( event_id: str | None = None, status: DLQMessageStatus = DLQMessageStatus.PENDING, ) -> DLQMessageDocument: ...
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/app/db/repositories/dlq_repository.pybackend/tests/integration/db/repositories/test_dlq_repository.pybackend/tests/integration/dlq/test_dlq_discard.pybackend/tests/integration/dlq/test_dlq_retry.py
🧰 Additional context used
🧬 Code graph analysis (4)
backend/tests/integration/dlq/test_dlq_retry.py (2)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)backend/app/db/docs/dlq.py (1)
DLQMessageDocument(11-43)
backend/app/db/repositories/dlq_repository.py (3)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)backend/app/infrastructure/kafka/mappings.py (1)
get_event_class_for_type(91-93)backend/app/db/docs/dlq.py (1)
DLQMessageDocument(11-43)
backend/tests/integration/dlq/test_dlq_discard.py (2)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)backend/app/db/docs/dlq.py (1)
DLQMessageDocument(11-43)
backend/tests/integration/db/repositories/test_dlq_repository.py (1)
backend/tests/unit/services/idempotency/test_middleware.py (1)
event(30-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: E2E Tests
- GitHub Check: Integration Tests
🔇 Additional comments (7)
backend/app/db/repositories/dlq_repository.py (2)
26-32: LGTM!The
_doc_to_messagemethod correctly extractsevent_typefrom the nesteddoc.event.event_typeand reconstructs the event using the appropriate event class. The error message now properly references the localevent_typevariable.
125-127: LGTM!The query pattern
{"event.event_id": event_id}is consistently applied acrossget_message_by_id,mark_message_retried, andmark_message_discarded. This aligns with the new nested event structure and the index defined onevent.event_id.Also applies to: 165-174, 176-186
backend/tests/integration/db/repositories/test_dlq_repository.py (2)
24-69: LGTM!Test documents are correctly constructed with the nested
eventdictionary structure containingevent_id,event_type,metadata, and event-specific fields. This matches the newDLQMessageDocumentschema.
85-86: LGTM!The assertion correctly validates the nested event structure by accessing
msg.event.event_id.backend/tests/integration/dlq/test_dlq_retry.py (2)
27-44: LGTM!The helper correctly constructs a DLQ document by:
- Creating a valid event via
make_execution_requested_event- Dumping to dict and overriding
event_idfor test predictability- Passing the dict to
DLQMessageDocument(event=event_dict, ...)This approach maintains type consistency while allowing controlled test data.
62-65: LGTM!All
find_onequeries consistently use{"event.event_id": event_id}to match the new document structure. The test assertions properly validate status transitions, timestamps, and reason fields.Also applies to: 94-97, 115-117, 158-160, 181-183, 201-204, 222-225
backend/tests/integration/dlq/test_dlq_discard.py (1)
63-67: LGTM!All
find_onequeries consistently use{"event.event_id": event_id}. The tests comprehensively cover discard scenarios including status transitions, timestamps, and custom reasons.Also applies to: 99-102, 119-121, 139-141
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/db/repositories/user_settings_repository.py (1)
73-78: Delete orphanedUserSettingsDocumentrecords indelete_user_settings.The
delete_user_settingsmethod deletes fromUserSettingsSnapshotDocumentandEventDocument, but not fromUserSettingsDocument—the primary settings collection wherecreate_snapshotandget_snapshotoperate. This leaves orphanedUserSettingsDocumentrecords when a user is deleted.Add deletion of
UserSettingsDocumentalongside the snapshot and event deletions:Suggested fix
async def delete_user_settings(self, user_id: str) -> None: doc = await UserSettingsDocument.find_one(UserSettingsDocument.user_id == user_id) if doc: await doc.delete() await UserSettingsSnapshotDocument.find(UserSettingsSnapshotDocument.user_id == user_id).delete() await EventDocument.find(EventDocument.aggregate_id == f"user_settings_{user_id}").delete() self.logger.info(f"Deleted all settings data for user {user_id}")backend/app/db/repositories/resource_allocation_repository.py (1)
23-28: The TOCTOU race is real, but the suggested fix is incomplete; useupdate_one()for true atomicity.Two concurrent callers can both read the same doc in an "active" state and both update it to "released"—the check-then-act pattern is inherently racy. Adding a status filter to the query is a good step, but the suggested code still uses
doc.set(), which is a separate operation after theifcheck and does not close the race window.For atomic conditional updates with Beanie 2.0.1, use
find_one(...filters...).update_one()instead:Atomic fix with Beanie 2.0.1 API
from beanie.operators import Set async def release_allocation(self, allocation_id: str) -> bool: now = datetime.now(timezone.utc) result = await ResourceAllocationDocument.find_one( (ResourceAllocationDocument.allocation_id == allocation_id) & (ResourceAllocationDocument.status == "active") ).update_one( Set({ ResourceAllocationDocument.status: "released", ResourceAllocationDocument.released_at: now }) ) return result.modified_count > 0This ensures the status filter and update happen in a single server-side operation, eliminating the race.
backend/app/db/repositories/user_repository.py (1)
29-29: Type annotation mismatch with field expressions in conditions list.At line 29,
conditionsis typed aslist[BaseFindOperator], but at line 41 it appendsUserDocument.role == role, which is a field expression, not aBaseFindOperator. Whilefind()accepts both types at runtime, this type annotation is incomplete. Line 34–37 appendsOr(...)which does conform toBaseFindOperator, creating a heterogeneous list that violates the declared type.Suggested fix
- conditions: list[BaseFindOperator] = [] + conditions: list[BaseFindOperator | Any] = []Alternatively, use
list[Any]for simplicity, or investigate whether Beanie exports a more specific union type that encompasses both field expressions and operators.
🧹 Nitpick comments (4)
backend/app/db/repositories/notification_repository.py (1)
202-213: Consider batching the subscription queries to avoid N+1 pattern.Currently makes one database query per channel. While the number of channels is likely small, fetching all subscriptions in a single query would be more efficient.
♻️ Suggested optimization
async def get_all_subscriptions(self, user_id: str) -> dict[NotificationChannel, DomainNotificationSubscription]: subs: dict[NotificationChannel, DomainNotificationSubscription] = {} - for channel in NotificationChannel: - doc = await NotificationSubscriptionDocument.find_one( - NotificationSubscriptionDocument.user_id == user_id, - NotificationSubscriptionDocument.channel == channel, - ) - if doc: - subs[channel] = DomainNotificationSubscription.model_validate(doc, from_attributes=True) - else: - subs[channel] = DomainNotificationSubscription(user_id=user_id, channel=channel, enabled=True) + docs = await NotificationSubscriptionDocument.find( + NotificationSubscriptionDocument.user_id == user_id, + ).to_list() + existing = {doc.channel: doc for doc in docs} + for channel in NotificationChannel: + if channel in existing: + subs[channel] = DomainNotificationSubscription.model_validate(existing[channel], from_attributes=True) + else: + subs[channel] = DomainNotificationSubscription(user_id=user_id, channel=channel, enabled=True) return subsbackend/app/db/repositories/admin/admin_events_repository.py (1)
71-73: Consider using Beanie operators for consistency.This query still uses dict syntax while the surrounding code now uses Beanie expressions. For full consistency, you could use Beanie's
NEoperator:from beanie.operators import NE related_query = EventDocument.find( EventDocument.metadata.correlation_id == doc.metadata.correlation_id, NE(EventDocument.event_id, event_id) )However, the current dict-based approach works correctly and this is purely a stylistic consideration.
backend/app/db/repositories/resource_allocation_repository.py (1)
9-13: Multi-argfind()syntax is correct in Beanie 2.0.1. Beanie'sfind()method explicitly accepts multiple filter expressions as separate positional arguments, so the code at lines 10–13 is valid and requires no changes.The
release_allocation()method (lines 23–28) follows a read-then-write pattern without an atomic check. If the document is deleted or modified between thefind_one()anddoc.set()calls, the update may fail or succeed silently depending on Beanie's behavior. Consider guarding the release with a status check (e.g., only release if currently "active") or using an atomic update if the business logic requires it, otherwise document that releases are intentionally idempotent.backend/app/db/repositories/admin/admin_user_repository.py (1)
4-5: Type annotation may be inaccurate for Beanie 2.0 expression API.In Beanie 2.0, direct field equality expressions (e.g.,
UserDocument.role == role) return expression objects that may not beBaseFindOperatorinstances. Sincefind()acceptsMapping | boolas well, this works at runtime but the type hint could be more accurate.Consider using a broader type or
Anyfor the conditions list if strict type checking is enabled.Also applies to: 33-33
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
backend/app/db/repositories/admin/admin_events_repository.pybackend/app/db/repositories/admin/admin_settings_repository.pybackend/app/db/repositories/admin/admin_user_repository.pybackend/app/db/repositories/event_repository.pybackend/app/db/repositories/execution_repository.pybackend/app/db/repositories/notification_repository.pybackend/app/db/repositories/replay_repository.pybackend/app/db/repositories/resource_allocation_repository.pybackend/app/db/repositories/saga_repository.pybackend/app/db/repositories/saved_script_repository.pybackend/app/db/repositories/sse_repository.pybackend/app/db/repositories/user_repository.pybackend/app/db/repositories/user_settings_repository.py
🧰 Additional context used
🧬 Code graph analysis (10)
backend/app/db/repositories/event_repository.py (2)
backend/app/db/docs/event.py (1)
EventDocument(13-71)backend/app/services/coordinator/queue_manager.py (1)
execution_id(30-31)
backend/app/db/repositories/admin/admin_user_repository.py (2)
backend/app/domain/user/user_models.py (2)
User(44-57)UserUpdate(60-80)backend/app/db/repositories/user_repository.py (1)
update_user(53-62)
backend/app/db/repositories/saved_script_repository.py (3)
backend/app/api/routes/saved_scripts.py (1)
list_saved_scripts(31-38)backend/app/services/saved_script_service.py (1)
list_saved_scripts(104-118)backend/app/domain/saved_script/models.py (1)
DomainSavedScript(21-28)
backend/app/db/repositories/sse_repository.py (1)
backend/app/services/coordinator/queue_manager.py (1)
execution_id(30-31)
backend/app/db/repositories/replay_repository.py (2)
backend/app/services/event_replay/replay_service.py (1)
get_session(324-325)backend/app/domain/replay/models.py (1)
ReplaySessionState(119-146)
backend/app/db/repositories/admin/admin_events_repository.py (1)
backend/app/db/docs/event.py (1)
EventDocument(13-71)
backend/app/db/repositories/notification_repository.py (1)
backend/app/domain/notification/models.py (1)
DomainNotification(16-45)
backend/app/db/repositories/user_repository.py (1)
backend/app/services/coordinator/queue_manager.py (1)
user_id(34-35)
backend/app/db/repositories/execution_repository.py (1)
backend/app/services/coordinator/queue_manager.py (1)
execution_id(30-31)
backend/app/db/repositories/user_settings_repository.py (2)
backend/app/services/coordinator/queue_manager.py (1)
user_id(34-35)backend/app/domain/user/settings_models.py (1)
DomainUserSettings(34-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Integration Tests
- GitHub Check: E2E Tests
🔇 Additional comments (37)
backend/app/db/repositories/sse_repository.py (2)
9-17: LGTM!The migration from dict-style filter to Beanie expression-based query (
ExecutionDocument.execution_id == execution_id) is correct and follows Beanie 2.x idiomatic patterns. This improves type safety and IDE support.
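Side by side, the two styles look roughly like this (a sketch; `ExecutionDocument.execution_id` is taken from the snippet above, while the import path and `DomainExecution` stand in for whatever the repository actually returns):

```python
from app.db.docs import ExecutionDocument  # import path assumed for the sketch


async def get_execution(execution_id: str):
    # old style: stringly-typed dict filter
    # doc = await ExecutionDocument.find_one({"execution_id": execution_id})

    # new style: typed field expression, refactor- and IDE-friendly
    doc = await ExecutionDocument.find_one(ExecutionDocument.execution_id == execution_id)
    if doc is None:
        return None
    # Pydantic v2 conversion mirroring model_validate(..., from_attributes=True) noted above;
    # DomainExecution is a placeholder name for the domain model.
    return DomainExecution.model_validate(doc, from_attributes=True)
```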
19-23: LGTM!Consistent use of Beanie expression-based query matching the pattern in
get_execution_status. Themodel_validatewithfrom_attributes=Trueis the correct Pydantic v2 approach for converting ORM/document objects to domain models.backend/app/db/repositories/replay_repository.py (3)
18-29: LGTM! Clean migration to Beanie expression syntax.The conversion from dict-based filters to Beanie field-expression syntax (
ReplaySessionDocument.session_id == session_id) is the correct approach for Beanie 2.x. This provides better type safety and IDE support while maintaining identical behavior.
48-54: LGTM!Consistent use of Beanie expression syntax with proper null handling.
71-79: LGTM!The
doc.set()method is the correct Beanie approach for partial atomic updates, and the expression syntax change is consistent with the rest of the file.backend/app/db/repositories/execution_repository.py (1)
29-29: LGTM! Clean migration to Beanie field expressions.The changes consistently replace dict-based queries with Beanie's type-safe field expression syntax across
get_execution,update_execution,write_terminal_result, anddelete_execution. This improves type safety and IDE support while maintaining identical runtime behavior.Also applies to: 38-38, 49-49, 84-84
backend/app/db/repositories/saved_script_repository.py (4)
11-16: LGTM!The migration from
Eq()to direct field equality expressions is the idiomatic Beanie query syntax. Thefind_onecorrectly combines both conditions with AND logic.
24-27: LGTM!Consistent use of field equality expressions for the query filter.
35-43: LGTM!Consistent query pattern for the delete operation.
45-47: LGTM!The
find()with field equality expression followed byto_list()is the standard Beanie pattern for retrieving multiple documents.backend/app/db/repositories/notification_repository.py (7)
32-35: LGTM!Clean migration to Beanie expression-based filters. The compound query ensures user-scoped access control.
44-47: LGTM!Consistent use of Beanie expressions for the lookup query.
53-56: LGTM!Expression syntax is correct and consistent with the repository pattern.
64-65: LGTM!Correct usage of Beanie expressions with
update_many()for bulk status updates.
70-73: LGTM!Properly scoped deletion with expression-based filter.
171-174: LGTM!Expression-based filter with appropriate default fallback when no subscription exists.
183-186: LGTM!Expression-based filter correctly used in the upsert pattern.
backend/app/db/repositories/admin/admin_settings_repository.py (1)
19-19: LGTM! Clean migration to Beanie expression syntax.The refactor from dict-based queries (
{"settings_id": "global"}) to Beanie's typed field comparison (SystemSettingsDocument.settings_id == "global") is correct and consistent across all three methods. This provides better type safety and IDE support while maintaining identical behavior.Also applies to: 43-43, 71-71
backend/app/db/repositories/admin/admin_events_repository.py (3)
65-65: LGTM! Consistent use of Beanie expression syntax for document lookups.The refactor to typed field comparisons (
EventDocument.event_id == event_id) is correct and aligns with the indexed field defined inEventDocument(seeevent.pyline 19:event_id: Indexed(str, unique=True)).Also applies to: 89-89
193-196: LGTM! Clean migration of replay session queries.The conversion to Beanie expression syntax for
ReplaySessionDocumentlookups is correct. The surrounding logic (None checks, model validation) remains unchanged and appropriate.Also applies to: 202-206, 209-211
254-254: LGTM!Correctly migrated to Beanie expression syntax for
ExecutionDocumentlookup within the replay progress calculation.backend/app/db/repositories/event_repository.py (3)
68-72: LGTM!The migration from dict-based query
{"event_id": event_id}to Beanie field expressionEventDocument.event_id == event_idis correct and aligns with Beanie 2.x typed query patterns. This provides better type safety and IDE support.
148-152: LGTM!The
Orcondition correctly uses Beanie field expressions for bothexecution_idandaggregate_idcomparisons. This maintains the existing query semantics while adopting the typed query style.
325-330: LGTM!Consistent migration to Beanie field expression syntax, matching the pattern used in
get_event.backend/app/db/repositories/user_settings_repository.py (2)
16-20: LGTM!Correct migration to Beanie field expression syntax for querying user settings.
22-28: LGTM!The upsert pattern is correctly implemented: find existing document, copy the
idif found, thensave()will update the existing record or insert a new one. The added logging improves observability.backend/app/db/repositories/saga_repository.py (4)
30-36: LGTM!The upsert pattern correctly migrates to Beanie field expressions. Finding the existing document by
saga_id, copying theidif found, then callingsave()ensures proper update-or-insert semantics.
45-47: LGTM!Consistent migration to Beanie field expression syntax.
80-90: LGTM!The migration to Beanie field expression is correct. The update logic properly modifies the document fields and saves.
Note: The condition
if error_message:means an empty string won't update the field. If clearing the error message is needed, consider usingif error_message is not None:instead.
92-94: LGTM!Correct migration to Beanie field expression for the
ExecutionDocumentquery.backend/app/db/repositories/admin/admin_user_repository.py (5)
44-45: LGTM!Direct field equality expression is idiomatic for Beanie 2.0.
53-84: LGTM!Field expression queries are correctly implemented and consistent with the pattern used in
user_repository.py.
86-96: LGTM!Cascade delete queries using field expressions are correctly implemented.
106-114: LGTM!Field expression query is correctly implemented.
98-102: Both nested field query patterns are valid for Beanie 2.0.Line 98's
EventDocument.metadata.user_idcorrectly accesses the nested Pydantic model (EventMetadata), producing the expected MongoDB dot-notation querymetadata.user_id. This is confirmed by the multiple indexes in EventDocument.Settings that referencemetadata.user_id,metadata.correlation_id, andmetadata.service_name.Line 101's
SagaDocument.context_data["user_id"]correctly queries the dict field using bracket notation, which is the standard Beanie pattern for dynamic dict fields. This same pattern is already used elsewhere in the codebase (saga_repository.py:19).backend/app/db/repositories/user_repository.py (2)
4-5: LGTM!Import cleanup correctly removes
Eqsince direct field equality (==) is now used instead. The retainedOrandRegExoperators are still in use.
13-15: LGTM!The migration from dictionary-based queries to Beanie field expressions (
UserDocument.field == value) is idiomatic and provides better type safety and IDE refactoring support.Also applies to: 22-24, 53-56, 64-68
4 issues found across 11 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/pyproject.toml">
<violation number="1" location="backend/pyproject.toml:126">
P2: Inconsistent version pinning: all other dependencies use exact versions with `==`, but this uses `>=`. This breaks reproducible builds and is inconsistent with the project's dependency management approach. Consider pinning to an exact version like `monggregate==0.22.1`.</violation>
</file>
<file name="backend/app/db/repositories/dlq_repository.py">
<violation number="1" location="backend/app/db/repositories/dlq_repository.py:74">
P1: Incorrect average age calculation. This formula computes the midpoint between oldest and newest timestamps, not the true average age across all messages. Consider computing the sum of ages in the aggregation pipeline or iterating over all timestamps to calculate the actual mean.</violation>
</file>
<file name="backend/app/db/repositories/event_repository.py">
<violation number="1" location="backend/app/db/repositories/event_repository.py:349">
P1: Using `EventDocument.aggregate_id` as a dictionary key in `$match` will not work correctly. MongoDB expects string field names as keys. The field descriptor object won't serialize to `"aggregate_id"` - it should be a string literal.</violation>
<violation number="2" location="backend/app/db/repositories/event_repository.py:350">
P1: Using `EventDocument.timestamp` as a dictionary key in `$sort` will not work correctly. MongoDB expects string field names as keys. Use a string literal instead.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
backend/tests/integration/test_dlq_routes.py (3)
97-108: Fix broken attribute references causing MyPy failures.
DLQMessageResponseno longer hasevent_id,event_type, orage_secondsattributes. Per the migration, access event properties viamessage.event.event_idandmessage.event.event_type. Theage_secondsfield was removed entirely.🐛 Proposed fix
# If there are messages, validate their structure for message in messages_response.messages: assert isinstance(message, DLQMessageResponse) - assert message.event_id is not None - assert message.event_type is not None + assert message.event.event_id is not None + assert message.event.event_type is not None assert message.original_topic is not None assert message.retry_count >= 0 assert message.failed_at is not None assert message.status in DLQMessageStatus.__members__.values() - - # Check age_seconds is reasonable - if message.age_seconds is not None: - assert message.age_seconds >= 0
148-178: Fix broken attribute references in test_get_single_dlq_message_detail.Similar to the previous issue,
DLQMessageDetailno longer hasevent_idorevent_typeat the top level. Additionally, the test extractsevent_idfrom the raw JSON response but then asserts against the model's non-existent attribute.🐛 Proposed fix
# Validate detailed response detail_data = detail_response.json() message_detail = DLQMessageDetail(**detail_data) # Verify all fields are present - assert message_detail.event_id == event_id + assert message_detail.event.event_id == event_id assert message_detail.event is not None - assert isinstance(message_detail.event, dict) - assert message_detail.event_type is not None + assert message_detail.event.event_type is not None assert message_detail.original_topic is not NoneNote: Also update the extraction of
event_idfrom JSON at line 150 to use the nested path:- event_id = messages_data["messages"][0]["event_id"] + event_id = messages_data["messages"][0]["event"]["event_id"]
346-351: Fix broken attribute references in pagination test.The set comprehensions reference
msg.event_idwhich no longer exists.🐛 Proposed fix
# Messages should be different if page1.messages and page2.messages: - page1_ids = {msg.event_id for msg in page1.messages} - page2_ids = {msg.event_id for msg in page2.messages} + page1_ids = {msg.event.event_id for msg in page1.messages} + page2_ids = {msg.event.event_id for msg in page2.messages} # Should have no overlap assert len(page1_ids.intersection(page2_ids)) == 0backend/app/db/repositories/dlq_repository.py (1)
89-94: Fix incorrectevent_typefield path inget_messagesfilter.Line 92 uses
DLQMessageDocument.event_typeat the root level, butevent_typeis nested inside theeventobject. The docstring at line 12 ofdlq.pyexplicitly states this: "Access event_id/event_type via event.event_id, event.event_type". The database index on line 39 ofdlq.pyalso confirms the nested path:("event.event_type", ASCENDING).Other methods in this file access it correctly: line 49 uses
DLQMessageDocument.event.event_type, and line 108 uses the nested path string"event.event_id".The current code will silently fail to match any documents when filtering by
event_type.Proposed fix
conditions: list[Any] = [ DLQMessageDocument.status == status if status else None, DLQMessageDocument.original_topic == topic if topic else None, - DLQMessageDocument.event_type == event_type if event_type else None, + DLQMessageDocument.event.event_type == event_type if event_type else None, ]backend/app/db/repositories/event_repository.py (2)
74-93: Type mismatch:event_typeparameter isstrbut should beEventType.The
EventDocument.event_typefield is typed asEventType(per the model definition), but this method acceptsstr. This inconsistency may cause comparison issues or require implicit conversion. Other methods likeget_events_by_aggregateuseEventTypeorlist[EventType].Suggested fix
async def get_events_by_type( self, - event_type: str, + event_type: EventType, start_time: datetime | None = None, end_time: datetime | None = None, limit: int = 100, skip: int = 0, ) -> list[DomainEvent]:
347-362: Critical: Invalid aggregation pipeline keys - Beanie field references cannot be used as dict keys.Lines 349-350 use
EventDocument.aggregate_idandEventDocument.timestampas dictionary keys in raw MongoDB aggregation stages. These resolve toFieldInfoobjects at runtime, not strings. MongoDB expects string field names like"aggregate_id".This will cause the aggregation to fail or produce incorrect results.
Suggested fix
async def get_aggregate_replay_info(self, aggregate_id: str) -> EventReplayInfo | None: pipeline = [ - {"$match": {EventDocument.aggregate_id: aggregate_id}}, - {"$sort": {EventDocument.timestamp: 1}}, + {"$match": {"aggregate_id": aggregate_id}}, + {"$sort": {"timestamp": 1}}, { "$group": { "_id": None, "events": {"$push": "$$ROOT"}, "event_count": {"$sum": 1}, "event_types": {"$addToSet": S.field(EventDocument.event_type)}, "start_time": {"$min": S.field(EventDocument.timestamp)}, "end_time": {"$max": S.field(EventDocument.timestamp)}, } }, {"$project": {"_id": 0}}, ]
🤖 Fix all issues with AI agents
In @backend/app/db/repositories/dlq_repository.py:
- Around line 71-75: The avg_age_seconds currently sets a midpoint; replace it
with the true mean by summing all failed_at timestamps and dividing by count:
query for sum_failed_at using S.sum(S.field(DLQMessageDocument.failed_at)) and
compute avg_age_seconds as (now_timestamp * count - sum_failed_at) / count (or
equivalently total_age_seconds / count), making sure to handle count == 0 to
avoid division by zero; update the AgeStatistics construction in the age_stats
assignment to use this computed value (or rename if you intend to keep the
midpoint).
In @backend/app/db/repositories/event_repository.py:
- Around line 18-20: The import block is unsorted; reorder the imports into
proper lexical order by module and by symbol to satisfy Ruff (I001). Ensure the
module order is beanie.odm.enums, then beanie.operators, then monggregate, and
alphabetize the names in the beanie.operators import so that GTE, In, LT, LTE,
Not, Or, RegEx are listed in alphabetical order; alternatively run `uv run ruff
check . --fix` to auto-fix.
In @backend/pyproject.toml:
- Line 126: The dependency declaration for monggregate uses a range specifier
(monggregate>=0.22.1) which is inconsistent with the repo's exact-pin policy;
update the pyproject dependency for monggregate to an exact version (e.g.,
change monggregate>=0.22.1 to monggregate==0.22.1) so it matches other entries
and ensures reproducible builds.
🧹 Nitpick comments (5)
backend/app/api/routes/dlq.py (2)
47-50: Consider adding from_attributes=True for consistency. Lines 31 and 58 use model_validate(obj, from_attributes=True), but this line omits it. While it may work if msg is already a dict-like object, adding from_attributes=True would be consistent with other endpoints and more robust if the source type changes.
♻️ Suggested change for consistency
- messages = [DLQMessageResponse.model_validate(msg) for msg in result.messages] + messages = [DLQMessageResponse.model_validate(msg, from_attributes=True) for msg in result.messages]
104-106: Accessing private method _discard_message breaks encapsulation. The route directly calls dlq_manager._discard_message(), which is a private method (indicated by the underscore prefix). Consider exposing a public method on DLQManager for discarding messages, or document that this private method is intended for internal use by the route layer.
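One possible shape for a public wrapper, sketched against an assumed method signature (the real _discard_message arguments may differ):

```python
class DLQManager:
    async def _discard_message(self, event_id: str, reason: str) -> None:
        ...  # existing internal implementation

    async def discard_message(self, event_id: str, reason: str) -> None:
        """Public entry point for the route layer; keeps the underscore
        method an internal detail of the manager."""
        await self._discard_message(event_id, reason)
```

The route would then call dlq_manager.discard_message(...) and no private attribute access is needed.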
backend/app/db/repositories/dlq_repository.py (1)
111-150: LGTM! The two-stage aggregation pipeline correctly computes topic summaries with status breakdowns. The use of $arrayToObject to pivot status counts is a clean approach.
Minor observation: doc_count (line 133) and total_messages (line 129) compute the same value. You could reuse $total_messages in the divide expression on line 145 to avoid the redundant field.
backend/app/db/repositories/admin/admin_events_repository.py (2)
126-131: Inconsistent query pattern with Beanie field references as dict keys. This query uses Beanie field objects as dict keys (EventDocument.timestamp, EventDocument.event_type), which differs from the standard string-key pattern used elsewhere in this file (e.g., line 71: {"metadata.correlation_id": doc.metadata.correlation_id}). For consistency and reliability, consider using either of the following (see the sketch after this list):
- String keys: {"timestamp": {"$gte": start_time}, "event_type": {"$regex": ...}}
- Beanie operators: And(), GTE(), RegEx() as in _event_filter_conditions()
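Both alternatives side by side, as a sketch with assumed field names and an assumed prefix filter rather than the file's actual conditions:

```python
from datetime import datetime

from beanie import Document
from beanie.operators import GTE, RegEx


class EventDocument(Document):
    timestamp: datetime
    event_type: str


async def recent_events_raw(prefix: str, start_time: datetime) -> list[EventDocument]:
    # Plain string keys: exactly the filter document MongoDB receives.
    return await EventDocument.find(
        {"timestamp": {"$gte": start_time}, "event_type": {"$regex": f"^{prefix}"}}
    ).to_list()


async def recent_events_typed(prefix: str, start_time: datetime) -> list[EventDocument]:
    # Typed Beanie operators over the same two fields.
    return await EventDocument.find(
        GTE(EventDocument.timestamp, start_time),
        RegEx(EventDocument.event_type, f"^{prefix}"),
    ).to_list()
```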
157-158: Type annotation is overly broad. The type annotation list[HourlyEventCount | dict[str, Any]] is misleading since the list comprehension only produces HourlyEventCount instances. The dict[str, Any] alternative is never used.
♻️ Suggested fix
-events_by_hour: list[HourlyEventCount | dict[str, Any]] = [HourlyEventCount(**doc) for doc in hourly_result] +events_by_hour = [HourlyEventCount(**doc) for doc in hourly_result]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
backend/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
backend/app/api/routes/dlq.pybackend/app/db/repositories/admin/admin_events_repository.pybackend/app/db/repositories/dlq_repository.pybackend/app/db/repositories/event_repository.pybackend/app/db/repositories/saga_repository.pybackend/app/dlq/models.pybackend/app/domain/events/query_builders.pybackend/app/schemas_pydantic/dlq.pybackend/pyproject.tomlbackend/tests/integration/test_dlq_routes.py
💤 Files with no reviewable changes (1)
- backend/app/domain/events/query_builders.py
🧰 Additional context used
🧬 Code graph analysis (7)
backend/app/db/repositories/saga_repository.py (2)
backend/app/domain/events/event_models.py (1)
SortDirection(20-22)backend/app/domain/saga/models.py (1)
Saga(10-27)
backend/app/schemas_pydantic/dlq.py (1)
backend/app/dlq/models.py (5)
AgeStatistics(138-145)DLQMessageStatus(12-18)EventTypeStatistic(129-135)RetryStrategy(21-28)TopicStatistic(119-126)
backend/app/db/repositories/admin/admin_events_repository.py (3)
backend/app/db/docs/event.py (1)
EventDocument(13-71)backend/app/services/coordinator/queue_manager.py (2)
user_id(34-35)execution_id(30-31)backend/app/domain/events/event_models.py (2)
HourlyEventCount(114-116)UserEventCount(120-122)
backend/app/api/routes/dlq.py (1)
backend/app/schemas_pydantic/dlq.py (3)
DLQStats(10-19)DLQMessageDetail(93-113)DLQTopicSummaryResponse(79-90)
backend/tests/integration/test_dlq_routes.py (1)
backend/app/dlq/models.py (4)
AgeStatistics(138-145)DLQMessageStatus(12-18)EventTypeStatistic(129-135)TopicStatistic(119-126)
backend/app/db/repositories/event_repository.py (2)
backend/app/db/docs/event.py (1)
EventDocument(13-71)backend/app/events/event_store.py (1)
get_events_by_type(123-147)
backend/app/dlq/models.py (2)
backend/app/core/utils.py (1)
StringEnum(6-31)backend/app/domain/enums/events.py (1)
EventType(4-87)
🪛 GitHub Actions: MyPy Type Checking
backend/tests/integration/test_dlq_routes.py
[error] 99-99: "DLQMessageResponse" has no attribute "event_id"; maybe "event"? [attr-defined]
[error] 100-100: "DLQMessageResponse" has no attribute "event_type" [attr-defined]
[error] 107-107: "DLQMessageResponse" has no attribute "age_seconds" [attr-defined]
[error] 108-108: "DLQMessageResponse" has no attribute "age_seconds" [attr-defined]
[error] 160-160: "DLQMessageDetail" has no attribute "event_id"; maybe "event"? [attr-defined]
[error] 348-348: "DLQMessageResponse" has no attribute "event_id"; maybe "event"? [attr-defined]
[error] 349-349: "DLQMessageResponse" has no attribute "event_id"; maybe "event"? [attr-defined]
🪛 GitHub Actions: Ruff Linting
backend/app/db/repositories/saga_repository.py
[error] 1-1: I001 Import block is un-sorted or un-formatted. Organize imports. Fixable with '--fix'. Command: uv run ruff check . --config pyproject.toml
backend/app/db/repositories/event_repository.py
[error] 1-1: I001 Import block is un-sorted or un-formatted. Organize imports. Fixable with '--fix'. Command: uv run ruff check . --config pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Frontend
- GitHub Check: Build Backend
- GitHub Check: cubic · AI code reviewer
- GitHub Check: E2E Tests
- GitHub Check: Integration Tests
🔇 Additional comments (33)
backend/app/schemas_pydantic/dlq.py (4)
6-7: LGTM! Clean migration to typed models and DomainEvent.The imports correctly pull in the typed statistics models from
app.dlqandDomainEventfrom the domain layer, aligning with the unified event model architecture.
15-18: Type safety improvement for DLQStats.Using
dict[DLQMessageStatus, int],list[TopicStatistic],list[EventTypeStatistic], andAgeStatisticsinstead of genericdict[str, int]andlist[dict[str, Any]]provides better type checking and IDE support.
22-37: DLQMessageResponse correctly mirrors DLQMessage with nested DomainEvent.The model now embeds
event: DomainEvent as the central payload container. Consumers should access message.event.event_id and message.event.event_type instead of the previous top-level fields, as noted in the migration guide.
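A condensed sketch of the shape being described, with DomainEvent stubbed down to two fields and invented example values (the real schemas carry many more fields):

```python
from datetime import datetime, timezone

from pydantic import BaseModel, ConfigDict, Field


class DomainEvent(BaseModel):
    event_id: str
    event_type: str


class DLQMessageResponse(BaseModel):
    # from_attributes=True lets model_validate(obj, from_attributes=True)
    # read attribute-bearing objects such as Beanie documents.
    model_config = ConfigDict(from_attributes=True)

    event: DomainEvent
    original_topic: str
    retry_count: int = 0
    failed_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))


msg = DLQMessageResponse(
    event=DomainEvent(event_id="e-1", event_type="execution.requested"),
    original_topic="dlq.executions",
)
# Event properties now live on the nested object:
print(msg.event.event_id, msg.event.event_type)
```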
93-113: DLQMessageDetail aligned with DLQMessage structure.Consistent with DLQMessageResponse, this detailed model now uses the embedded
event: DomainEventpattern withproducer_idexplicitly exposed.backend/app/dlq/models.py (3)
31-52: Well-structured migration from dataclass to Pydantic BaseModel.The DLQMessage model correctly:
- Uses
from_attributes=Truefor ORM compatibility- Embeds
DomainEventas the unified event payload- Provides sensible defaults for error handling scenarios
- Documents the access pattern change in the docstring
The
Field(default_factory=...)pattern for mutable defaults (datetime, dict) is the correct Pydantic approach.
118-157: Statistics models converted to Pydantic BaseModel with from_attributes.TopicStatistic, EventTypeStatistic, AgeStatistics, and DLQStatistics are now Pydantic models enabling
model_validatefrom ORM/domain objects. This aligns with the route layer changes that usemodel_validate(obj, from_attributes=True).
189-200: DLQTopicSummary converted to Pydantic BaseModel.Consistent with other statistics models, enabling direct validation from repository results.
backend/tests/integration/test_dlq_routes.py (1)
50-76: LGTM! Statistics assertions correctly updated to typed models.The test properly validates:
AgeStatisticsinstance for age stats- Iteration over
DLQMessageStatusenum valuesTopicStatisticandEventTypeStatistictyped instancesbackend/app/api/routes/dlq.py (2)
28-31: LGTM! Correct use of model_validate with from_attributes.The
DLQStats.model_validate(stats, from_attributes=True)pattern correctly converts the domain statistics object to the Pydantic response model.
109-112: LGTM! Updated to modern Python type hints.Using
list[DLQTopicSummaryResponse](lowercase) instead ofList[DLQTopicSummaryResponse]aligns with Python 3.9+ conventions.backend/app/db/repositories/dlq_repository.py (4)
1-20: LGTM!Imports are correctly updated to use monggregate's
PipelineandShelpers, and unused typing imports have been cleaned up.
100-105: LGTM!The use of
model_validate(d, from_attributes=True)is the correct Pydantic v2 pattern for converting Beanie documents to domain models.
107-109: LGTM!The nested path query for
event.event_idis correct for the new data model structure.
152-173: LGTM!Both methods correctly use the nested
event.event_idpath for lookups, consistent with the migrated data model. The find-then-save pattern is appropriate for DLQ operations where concurrent updates to the same message are unlikely.backend/app/db/repositories/event_repository.py (5)
68-72: LGTM!The migration to Beanie field-based query syntax is correct.
241-259: LGTM!The
cleanup_old_eventsmethod correctly uses Beanie operators for building query conditions.
316-328: LGTM!The
list_event_typesmethod correctly usesS.field()within aggregation pipeline values.
330-342: LGTM!The
delete_event_with_archivalmethod correctly uses Beanie field-based query syntax.
181-234: The usage ofS.field()with Beanie field references is correct. Integration tests confirm the aggregation pipeline executes successfully and produces valid EventStatistics objects at runtime with monggregate 0.22.1. The pattern is used consistently across multiple repositories (saga_repository.py, dlq_repository.py, admin_events_repository.py) and correctly handles both simple fields (e.g.,event_type) and nested fields (e.g.,metadata.service_name).backend/app/db/repositories/admin/admin_events_repository.py (6)
6-6: LGTM!The monggregate import for
PipelineandSis appropriate for the new aggregation pipeline implementations.
65-65: LGTM!Correct use of Beanie's field-based expression syntax for
find_one(). This is more readable and type-safe than raw dict queries.
134-142: Same field reference concern applies here.The
by=EventDocument.event_typeparameter passes a Beanie field object to monggregate'sgroup(). Covered by the verification request above.
169-169: Good defensive filtering for null user_ids.The
if doc["user_id"]filter correctly excludes documents whereuser_idisNoneor empty, preventing invalidUserEventCountinstances.
232-232: LGTM!Consistent use of Beanie field-based expressions for
find_one()lookups acrossReplaySessionDocumentandExecutionDocumentqueries (lines 232, 241, 248, 293).
98-119: This pattern is correct and follows established codebase conventions.The use of Beanie field references (e.g.,
EventDocument.timestamp) as dict keys in monggregate Pipeline is not non-standard—it's the standard pattern used consistently across the entire codebase. The same pattern appears insaga_repository.py,dlq_repository.py, and multiple other repositories withSagaDocument,DLQMessageDocument, andExecutionDocument.Beanie field references are designed to serialize correctly when used with monggregate Pipeline operations like
.match(),.group(), andS.field(). This integration works reliably throughout the codebase. The alternative of using string field names would be inconsistent with established conventions and is not used anywhere in the repository.No changes are needed.
backend/app/db/repositories/saga_repository.py (8)
30-36: LGTM!The migration to field-equality query syntax (
SagaDocument.saga_id == saga.saga_id) aligns with Beanie 2.x patterns. The upsert logic correctly preserves the documentidfor updates.
45-47: LGTM!Clean field-equality query pattern.
49-66: LGTM!Formatting adjustments with unchanged semantics.
80-90: LGTM!Field-equality query migration looks correct. The conditional
if error_message:appropriately only updates the error when provided.
92-94: LGTM!Field-equality query on
ExecutionDocumentis correct.
103-118: LGTM!Signature formatting adjusted; logic unchanged.
137-145: S.field() with Beanie field descriptors is the established pattern across the codebase.The code correctly uses
S.field(SagaDocument.completed_at)andS.field(SagaDocument.created_at). This same pattern is used consistently throughout dlq_repository.py, event_repository.py, and admin_events_repository.py with various document fields and nested fields. No changes needed.Likely an incorrect or invalid review comment.
96-101: Use string field names instead of Beanie field descriptors with Pipeline.group().
byparameter should receive a string field name (e.g.,by="state"orby="$state"), not a Beanie field descriptor likeSagaDocument.state. Monggregate is driver-agnostic and does not have special integration for Beanie field descriptors; it expects standard MongoDB aggregation pipeline inputs. The correct pattern is:pipeline = Pipeline().group(by="state", query={"count": S.sum(1)})
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
backend/tests/integration/test_dlq_routes.py (3)
144-148: Inconsistent JSON access pattern will cause test failure.Line 146 accesses the raw JSON dict using
["event_id"], but per the new domain event structure,event_idis nested underevent. This is inconsistent with the updated Pydantic model access elsewhere (e.g., lines 99-100, 157-158).🐛 Proposed fix
- event_id = messages_data["messages"][0]["event_id"] + event_id = messages_data["messages"][0]["event"]["event_id"]
220-222: Inconsistent JSON access pattern will cause test failure.Same issue as line 146: raw JSON dict access uses old
["event_id"]pattern instead of["event"]["event_id"].🐛 Proposed fix
- event_ids = [msg["event_id"] for msg in messages_data["messages"][:2]] + event_ids = [msg["event"]["event_id"] for msg in messages_data["messages"][:2]]
256-258: Inconsistent JSON access pattern will cause test failure.Same issue: raw JSON dict access uses old
["event_id"]pattern instead of["event"]["event_id"].🐛 Proposed fix
- event_id = messages_data["messages"][0]["event_id"] + event_id = messages_data["messages"][0]["event"]["event_id"]backend/app/db/repositories/event_repository.py (1)
348-363: Bug: Invalid field references in$matchand$sortstages.Lines 350-351 use
EventDocument.aggregate_idandEventDocument.timestampdirectly as dictionary keys. In Python, this evaluates to Beanie field descriptor objects, not string field names. MongoDB aggregation expects string keys in$matchand$sortstages, so this pipeline will fail to match any documents.The
S.field()calls on lines 357-359 are correct because those aggregation operators ($addToSet,$min,$max) expect field reference strings like"$event_type".🐛 Proposed fix
async def get_aggregate_replay_info(self, aggregate_id: str) -> EventReplayInfo | None: pipeline = [ - {"$match": {EventDocument.aggregate_id: aggregate_id}}, - {"$sort": {EventDocument.timestamp: 1}}, + {"$match": {"aggregate_id": aggregate_id}}, + {"$sort": {"timestamp": 1}}, { "$group": { "_id": None, "events": {"$push": "$$ROOT"}, "event_count": {"$sum": 1}, "event_types": {"$addToSet": S.field(EventDocument.event_type)}, "start_time": {"$min": S.field(EventDocument.timestamp)}, "end_time": {"$max": S.field(EventDocument.timestamp)}, } }, {"$project": {"_id": 0}}, ]
🤖 Fix all issues with AI agents
In @backend/app/db/repositories/saga_repository.py:
- Around line 138-147: S.field() is currently being called with Beanie document
attributes; change calls like S.field(SagaDocument.completed_at) and
S.field(SagaDocument.created_at) to use string field names ("completed_at",
"created_at") inside the duration_pipeline projection, and apply the same fix
wherever S.field(...) is used in dlq_repository.py, event_repository.py, and
admin_events_repository.py so all aggregates pass string field names; keep the
rest of the aggregation (group and milliseconds-to-seconds conversion) unchanged
and ensure references to SagaDocument.find(...) aggregate remain the same.
- Around line 97-102: The aggregation group calls are using raw Beanie
FieldDescriptor objects (e.g., SagaDocument.state, EventDocument.event_type,
EventDocument.metadata.user_id, DLQMessageDocument.status,
DLQMessageDocument.event.event_type) which prevents monggregate from resolving
field paths; update each Pipeline.group(by=...) to wrap the field descriptors
with S.field(...) (for example change by=SagaDocument.state to
by=S.field(SagaDocument.state)) in the methods count_sagas_by_state and the
other group usages noted so the group key is a proper field path.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/app/db/repositories/event_repository.pybackend/app/db/repositories/saga_repository.pybackend/tests/integration/test_dlq_routes.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/db/repositories/event_repository.py (1)
backend/app/db/docs/event.py (1)
EventDocument(13-71)
backend/app/db/repositories/saga_repository.py (1)
backend/app/domain/saga/models.py (1)
Saga(10-27)
backend/tests/integration/test_dlq_routes.py (1)
backend/app/dlq/models.py (4)
AgeStatistics(138-145)DLQMessageStatus(12-18)EventTypeStatistic(129-135)TopicStatistic(119-126)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: E2E Tests
- GitHub Check: Integration Tests
🔇 Additional comments (20)
backend/tests/integration/test_dlq_routes.py (6)
5-5: LGTM!Import updated correctly to use the new typed models from
app.dlq.
50-76: LGTM!Tests correctly updated to validate the new typed statistic models (
AgeStatistics,TopicStatistic,EventTypeStatistic) and iterate overDLQMessageStatusenum values.
99-104: LGTM!Access pattern correctly updated to use
message.event.event_idandmessage.event.event_typefor Pydantic model access.
155-158: LGTM!Correctly updated to access event fields via
message_detail.event.event_idandmessage_detail.event.event_type.
188-190: LGTM!Minor formatting adjustment, no functional change.
343-344: LGTM!Pagination test correctly updated to use
msg.event.event_idfor Pydantic model access.backend/app/db/repositories/event_repository.py (7)
7-7: LGTM!The
Shelper import from monggregate is correctly added to support type-safe field references in aggregation pipelines.
69-73: LGTM!The refactor from dict-based query to Beanie field comparison (
EventDocument.event_id == event_id) is correct and provides better type safety.
146-167: LGTM!The
get_execution_eventsmethod correctly uses Beanie field comparisons within theOroperator for matching eitherexecution_idoraggregate_id.
182-235: LGTM!The aggregation pipeline correctly uses
S.field()for type-safe field references in grouping and date operations. The faceted aggregation for statistics by type, service, and hour is well-structured.
317-329: LGTM!The
list_event_typesmethod correctly usesS.field()for the group stage, consistent with other aggregation methods in this file.
331-343: LGTM!The refactor to use Beanie field comparison for finding the event is correct and consistent with other methods.
242-260: LGTM!The
cleanup_old_eventsmethod correctly uses Beanie operators (LT,In) with field references for the time-based cleanup query.backend/app/db/repositories/saga_repository.py (7)
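As a hedged illustration of that operator style — a standalone sketch with assumed field names and retention rules, not the repository's actual cleanup method:

```python
from datetime import datetime, timedelta, timezone

from beanie import Document
from beanie.operators import In, LT


class EventDocument(Document):
    event_type: str
    timestamp: datetime


async def delete_old_events(days: int, event_types: list[str]) -> int:
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    result = await EventDocument.find(
        LT(EventDocument.timestamp, cutoff),        # timestamp < cutoff
        In(EventDocument.event_type, event_types),  # event_type in the given list
    ).delete()
    return result.deleted_count if result else 0
```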
7-7: LGTM!The monggregate import aligns with the migration to Pipeline-based aggregations across the repository.
31-37: LGTM!The migration to field equality expressions and the upsert pattern (assigning
existing.idto preserve the document identity) is correct for Beanie.
46-48: LGTM!Clean migration to field equality expression.
50-52: LGTM!Formatting adjustment with no semantic change.
81-91: LGTM!Field equality expression is consistent with the other query migrations in this file.
93-95: LGTM!Consistent migration to field equality for
ExecutionDocument.
104-109: LGTM!Formatting adjustment with no semantic change.
| async def count_sagas_by_state(self) -> dict[str, int]: | ||
| pipeline = [{"$group": {"_id": "$state", "count": {"$sum": 1}}}] | ||
| pipeline = Pipeline().group(by=SagaDocument.state, query={"count": S.sum(1)}) | ||
| result = {} | ||
| async for doc in SagaDocument.aggregate(pipeline): | ||
| async for doc in SagaDocument.aggregate(pipeline.export()): | ||
| result[doc["_id"]] = doc["count"] | ||
| return result |
🧩 Analysis chain
🌐 Web query:
monggregate Pipeline group by parameter documentation
💡 Result:
You likely mean MongoDB's aggregation $group stage (used to "group by" in pipelines). Key points:
- Syntax: { $group: { _id: , : { : }, ... } } — _id is the group key (use null to aggregate all docs). [1]
- Group key: any valid expression (field path like "$field", compound object, or computed expression). [1]
- Accumulators: common operators include $sum, $avg, $min, $max, $first, $last, $push, $addToSet, $count, and newer ones like $top/$bottom/$firstN/$lastN. [1]
- Memory/disk: $group uses up to 100 MB RAM by default; set allowDiskUse: true on db.collection.aggregate(...) to spill to disk for large groups. [1]
- Performance: $group is a blocking stage; sorting+grouping patterns can be optimized with indexes when using $first/$last. [1]
- Examples: grouping by a field: db.collection.aggregate([{ $group: { _id: "$item", total: { $sum: "$amount" } } }]). [1]
For full reference and complete list of accumulators, examples, and behavior, see the official MongoDB docs: [1] and an alternate manual copy: [2].
Sources
[1] MongoDB Manual — $group (aggregation stage).
[2] $group (aggregation) — MongoDB Manual (mirror).
🏁 Script executed:
# Find SagaDocument definition and understand its structure
find . -name "*.py" -type f | xargs grep -l "class SagaDocument" | head -5Repository: HardMax71/Integr8sCode
Length of output: 95
🏁 Script executed:
# Look for other uses of Pipeline().group() in the codebase to see patterns
rg "Pipeline\(\)\.group" --type py -A 2 -B 1Repository: HardMax71/Integr8sCode
Length of output: 1722
🏁 Script executed:
# Read the SagaDocument definition
cat -n ./backend/app/db/docs/saga.py | head -50Repository: HardMax71/Integr8sCode
Length of output: 1905
🌐 Web query:
monggregate Python library documentation group method
💡 Result:
The monggregate Pipeline.group(...) method builds a $group stage; call signature (examples from the docs):
pipeline.group(
by="year",
query={
"movie_count": {"$sum": 1},
"movie_titles": {"$push": "$title"}
}
)
Notes:
- monggregate accepts a "by" argument instead of MongoDB's _id (convenience alias).
- The "query" argument contains accumulator expressions (e.g., $sum, $push) as in standard MongoDB $group usage. (vianneymi.github.io)
Citations:
🏁 Script executed:
# Check if there's a requirements.txt or pyproject.toml to understand monggregate version
find . -name "requirements*.txt" -o -name "pyproject.toml" -o -name "poetry.lock" | head -5Repository: HardMax71/Integr8sCode
Length of output: 90
🏁 Script executed:
# Search for any monggregate imports or usage examples in the codebase
rg "from monggregate" --type py -A 2 | head -30Repository: HardMax71/Integr8sCode
Length of output: 988
🏁 Script executed:
# Read pyproject.toml to see monggregate version
cat ./backend/pyproject.toml | grep -A 5 -B 5 monggregateRepository: HardMax71/Integr8sCode
Length of output: 284
🏁 Script executed:
# Search for test files related to saga_repository
find . -path "*/test*" -name "*saga*" -type f | head -10Repository: HardMax71/Integr8sCode
Length of output: 430
🏁 Script executed:
# Look for any tests of aggregation pipelines
rg "count_sagas_by_state" --type pyRepository: HardMax71/Integr8sCode
Length of output: 172
🌐 Web query:
monggregate Beanie field reference support MongoDB aggregation
💡 Result:
Short answer: Yes — Beanie lets you run MongoDB aggregation pipelines (including stages that reference other collections such as $lookup/$graphLookup), but Beanie does not automatically "follow" Beanie Link/Reference fields for you inside an aggregation; you must express the join/field-reference with standard aggregation stages or project fields yourself. Use Document.aggregate(...) (or .find(...).aggregate(...)) with a pipeline and, if you want typed outputs, pass a projection_model. See examples and notes below.
Key refs:
- Beanie aggregation usage and projection_model: Beanie docs (Aggregation). [1]
- Beanie relations (Link fields are supported for top‑level fields, but fetching/resolving is an explicit operation — not automatic in arbitrary aggregation pipelines). [2]
- MongoDB aggregation reference (use $lookup/$graphLookup and standard aggregation expressions to reference fields/other collections). [3]
If you want, tell me your Beanie Document models and the aggregation you want to run and I’ll draft the exact pipeline + projection_model code.
🌐 Web query:
monggregate GitHub source code Pipeline group method implementation
💡 Result:
The Pipeline.group implementation is in the monggregate GitHub repo at src/monggregate/pipeline.py — see the repository and documentation for the group method and examples. [1][2]
Sources:
- VianneyMI/monggregate — GitHub repository (root; file path: src/monggregate/pipeline.py). https://github.com/VianneyMI/monggregate
- Monggregate documentation (pipeline/group usage examples). https://vianneymi.github.io/monggregate/
If you want, I can fetch and paste the exact group() method source from src/monggregate/pipeline.py.
Citations:
- 1: https://github.com/VianneyMI/monggregate
- 2: https://vianneymi.github.io/monggregate/?utm_source=openai
🏁 Script executed:
# Check integration tests to see if count_sagas_by_state is actually invoked
grep -r "count_sagas_by_state" ./backend/tests/ --type pyRepository: HardMax71/Integr8sCode
Length of output: 183
🏁 Script executed:
# Look for any admin routes or tests that call aggregation methods
rg "count_sagas" --type py -B 2 -A 5Repository: HardMax71/Integr8sCode
Length of output: 794
🏁 Script executed:
# Check what type SagaDocument.state actually is at runtime
# by looking at how it's used in other parts of the code
rg "SagaDocument\.state" --type py | head -20Repository: HardMax71/Integr8sCode
Length of output: 772
🌐 Web query:
site:github.com/VianneyMI/monggregate group method by parameter type handling
💡 Result:
I can look that up in the repository and show how the group(...) method handles different parameter types. Do you want me to search the monggregate code on GitHub now and return the exact implementation and examples?
🌐 Web query:
monggregate "by=" parameter field descriptor Python object
💡 Result:
The "by=" parameter in monggregate is the field/field-expression descriptor used to specify the field(s) (or expression) to operate on for stages like group() and sort(). In practice you pass:
- a string field name (e.g. by="year"),
- an _id/compound/group expression (for group by) or
- an expression built with monggregate's S helpers (S.*) when you need computed keys.
Examples in the official docs show sort(by="year") and group(by="year", query={...}) and using S expressions for more complex keys. [1][2]
Sources:
- monggregate usage examples (sort/group). [1]
- monggregate docs / tutorial pages (expressions & S helper usage). [2]
🏁 Script executed:
# Check if there are any exception handlers or error logs around aggregation calls
rg "aggregate\(" ./backend/app/db/repositories/ --type py -A 3 | head -40Repository: HardMax71/Integr8sCode
Length of output: 3659
🏁 Script executed:
# Check imports to see what S is and understand monggregate API better
rg "from monggregate import" ./backend/app/db/repositories/ --type pyRepository: HardMax71/Integr8sCode
Length of output: 409
🏁 Script executed:
# Look at pyproject.toml more carefully for any monggregate-related config
cat ./backend/pyproject.toml | head -100Repository: HardMax71/Integr8sCode
Length of output: 3029
🏁 Script executed:
# Search for any code that calls count_sagas_by_state
rg "count_sagas_by_state" . --type pyRepository: HardMax71/Integr8sCode
Length of output: 174
🏁 Script executed:
# Check if there are any admin or debug endpoints that might call aggregation methods
rg "get_saga_stats\|count_sagas_by_state\|state_pipeline" ./backend --type py -B 2 -A 5Repository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Look for Beanie FieldInfo type and its __str__ method
rg "class.*FieldInfo" --type py | head -10Repository: HardMax71/Integr8sCode
Length of output: 48
🌐 Web query:
Beanie FieldInfo __str__ conversion to field name string
💡 Result:
Use the model's model_fields keys (or the attribute name you defined) instead of relying on FieldInfo.str. FieldInfo is metadata for a field; it does not hold the attribute name as its string representation. Read its attributes (alias / serialization_alias) or inspect the model_fields mapping to get the field name.
Example (Beanie / Pydantic v2 style):
-
preferred: get names from the model_fields dict
for name, info in MyDoc.model_fields.items():
print(name) # actual attribute/field name
print(info.alias) # alias if set (may be_Unset/None) -
fallback: use serialization alias when present
field_name = info.serialization_alias or info.alias or name
Reference: Pydantic FieldInfo documentation (attributes like alias / serialization_alias; FieldInfo is field metadata). [1]
Sources
[1] Pydantic — Fields / FieldInfo.
🏁 Script executed:
# Check actual test files to see if they mock or test aggregations
cat -n ./backend/tests/integration/test_saga_routes.py | head -80Repository: HardMax71/Integr8sCode
Length of output: 3816
🏁 Script executed:
# Search for calls to get_saga_stats
rg "get_saga_stats" . --type pyRepository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Look for admin routes or services that call repository methods
rg "saga_repository\|SagaRepository" ./backend/app --type py -B 2 -A 3 | head -50Repository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Check if there are any admin endpoints that trigger aggregations
fd "admin.*route\|route.*admin" --type f ./backend/app --extension pyRepository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Search for any actual usage of aggregation pipelines in service/route layers
rg "\.aggregate\(" ./backend/app --type py -B 3 | head -60Repository: HardMax71/Integr8sCode
Length of output: 4748
🏁 Script executed:
# Check what methods call get_saga_stats or use aggregations in services/routes
rg "get_saga_stats\|\.aggregate\(" ./backend/app/db --type py -lRepository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Look at admin routes specifically
find ./backend/app -name "*admin*" -type f -name "*.py" | xargs grep -l "stats\|aggregate" 2>/dev/nullRepository: HardMax71/Integr8sCode
Length of output: 333
🏁 Script executed:
# Read the complete get_saga_stats method to understand its context
rg "async def get_saga_stats" -A 30 ./backend/app/db/repositories/saga_repository.pyRepository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Search for calls to get_saga_stats in services or routes
rg "get_saga_stats" ./backend --type py -rRepository: HardMax71/Integr8sCode
Length of output: 479
🌐 Web query:
monggregate Pipeline group by= Beanie field descriptor object support
💡 Result:
Short answer: Yes — Beanie’s built‑in aggregate helpers (avg, sum, count, etc.) accept Beanie FieldDescriptor properties (e.g. Product.price). For raw aggregation pipelines you must still use MongoDB field paths (strings like "$field" or "$nested.field") in the pipeline; Beanie’s .aggregate() accepts a native pipeline (dicts) and a projection_model. Example patterns:
-
Using helper (accepts FieldDescriptor):
- await Product.avg(Product.price). [1][2]
-
Using raw pipeline (use "$..." field paths):
- await Product.find(...).aggregate([{"$group": {"_id": "$category.name", "total": {"$avg": "$price"}}}], projection_model=OutModel).to_list() . [1][2]
Sources:
[1] Beanie docs — Aggregations (examples showing avg(Product.price) and native pipeline usage).
[2] Beanie PyPI / tutorial pages with same examples.
🏁 Script executed:
# Search more broadly for calls to saga stats or repository in services
rg "saga_repository\|SagaRepository" ./backend/app/services --type py -B 2 -A 5Repository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Check admin events service to see the pattern
cat -n ./backend/app/services/admin/admin_events_service.py | head -100Repository: HardMax71/Integr8sCode
Length of output: 3877
🏁 Script executed:
# Look at the complete saga_repository to find get_saga_stats signature
cat -n ./backend/app/db/repositories/saga_repository.py | grep -A 40 "get_saga_stats"Repository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Test what monggregate exports when using a FieldDescriptor
python3 << 'EOF'
# Simulate the issue
class MockFieldDescriptor:
def __init__(self, name):
self.name = name
def __str__(self):
return self.name
def __repr__(self):
return f"MockFieldDescriptor({self.name})"
# If monggregate just converts to string, it would produce "state" not "$state"
field = MockFieldDescriptor("state")
print(f"String conversion: {str(field)}")
print(f"Repr: {repr(field)}")
# MongoDB expects "$state" for field reference in aggregation
EOFRepository: HardMax71/Integr8sCode
Length of output: 152
🏁 Script executed:
# Search for admin routes
find ./backend/app -path "*/routes/*admin*" -o -path "*/admin/*route*" | head -10Repository: HardMax71/Integr8sCode
Length of output: 263
🏁 Script executed:
# Look for any routes that reference saga stats
rg "saga.*stats\|stats.*saga" ./backend/app --type py -iRepository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Check the admin_events_repository to see the pattern
cat -n ./backend/app/db/repositories/admin/admin_events_repository.py | head -120Repository: HardMax71/Integr8sCode
Length of output: 5623
🏁 Script executed:
# Directly check what happens with the field descriptor - look for actual test or usage
rg "SagaDocument\.state\|DLQMessageDocument\.status" ./backend/app --type py -B 1 -A 1 | head -50Repository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Continue reading admin_events_repository to see more pattern examples
cat -n ./backend/app/db/repositories/admin/admin_events_repository.py | sed -n '95,180p'Repository: HardMax71/Integr8sCode
Length of output: 4540
🏁 Script executed:
# Search for other uses of group(by=...) to see if there's a consistent pattern
rg "\.group\(by=" ./backend --type py -A 1 -B 1Repository: HardMax71/Integr8sCode
Length of output: 2647
🏁 Script executed:
# Read more of saga_repository to see all group calls
cat -n ./backend/app/db/repositories/saga_repository.py | sed -n '90,180p'Repository: HardMax71/Integr8sCode
Length of output: 2977
🏁 Script executed:
# Search for any Beanie/monggregate integration documentation or test
find ./backend/tests -name "*.py" -type f -exec grep -l "Pipeline\|group\|aggregate" {} \; | head -5Repository: HardMax71/Integr8sCode
Length of output: 1221
🏁 Script executed:
# Look at one of the test files to see if aggregations are tested
cat -n ./backend/tests/integration/services/saga/test_saga_service.py | head -100Repository: HardMax71/Integr8sCode
Length of output: 1060
🌐 Web query:
monggregate source code Beanie FieldDescriptor support how convert field reference
💡 Result:
Short answer + code: detect Beanie field/descriptor and extract its actual field name (alias) before building the mongo field reference.
Example helper to add to monggregate source (Python):
from typing import Any
from beanie.odm.fields import ExpressionField # optional import for exact type-check
from pydantic import BaseModel
from pydantic.fields import ModelField, FieldInfo
def to_field_ref(field: Any) -> str:
"""
Convert possible inputs (string, Beanie expression/field descriptor,
Pydantic FieldInfo/ModelField, dataclass attr, etc.) to a Mongo field
reference string (e.g. "a.b" or "$a.b").
"""
# already a string -> return as-is (no leading $)
if isinstance(field, str):
return field
# Beanie ExpressionField (and similar descriptors) implement attr access
# and behave like strings — try str() first
try:
s = str(field)
except Exception:
s = None
if s:
# ExpressionField.__repr__/__str__ yields the path (e.g. "comments.count")
# so prefer that when available
return s
# Pydantic ModelField or FieldInfo -> use name or alias
if isinstance(field, ModelField):
return field.alias or field.name
if isinstance(field, FieldInfo):
# FieldInfo has no 'name' — caller should provide name; fallback to repr
return getattr(field, "alias", repr(field))
# generic object with common attributes
for attr in ("alias", "name", "_name", "field_name"):
if hasattr(field, attr):
val = getattr(field, attr)
if isinstance(val, str):
return val
# last resort: attribute access for simple descriptor objects
if hasattr(field, "__getattr__"):
try:
return str(field)
except Exception:
pass
raise TypeError(f"Unsupported field descriptor: {type(field)!r}")
Usage when building the pipeline:
- call to_field_ref(x) to get "a.b", then use f"${to_field_ref(x)}" wherever monggregate expects a $-prefixed field path.
Notes / rationale:
- monggregate expects plain field paths when composing expressions; convert Beanie ExpressionField / descriptors to their string path first. See monggregate docs for how expressions use Expression.field(...) and string paths. [1]
- Beanie field types and expression fields are in Beanie's fields API (ExpressionField supports attribute access / indexing and is string-like). Use its str()/repr() or .name/.alias when available. [2]
References:
[1] Monggregate docs (usage / expressions). (vianneymi.github.io)
[2] Beanie fields / ExpressionField docs. (beanie-odm.dev)
If you want, I can make a PR patch for monggregate showing exactly where to call this helper and add unit tests — tell me the repo/branch.
Citations:
- 1: https://vianneymi.github.io/monggregate/?utm_source=openai
- 2: https://beanie-odm.dev/api-documentation/fields/?utm_source=openai
🏁 Script executed:
# Check if there's any monggregate version history or changelog that addresses this
find ./backend -name "*.lock" -o -name "*.txt" | xargs grep -i "monggregate" 2>/dev/null | head -20Repository: HardMax71/Integr8sCode
Length of output: 760
🏁 Script executed:
# Final check: Look for S.field() usage patterns to confirm best practice
rg "S\.field\(" ./backend/app/db/repositories --type py -B 1 -A 1 | head -40Repository: HardMax71/Integr8sCode
Length of output: 3577
🏁 Script executed:
# Check if admin_events_service actually calls get_event_stats (production usage)
rg "get_event_stats" ./backend/app --type py -B 2 -A 2Repository: HardMax71/Integr8sCode
Length of output: 2104
🏁 Script executed:
# Confirm the pattern across all affected locations one more time
rg "\.group\(by=[A-Z]" ./backend/app/db/repositories --type py -A 1Repository: HardMax71/Integr8sCode
Length of output: 1915
Wrap Beanie field descriptors with S.field() in Pipeline.group(by=...) calls.
monggregate's by= parameter requires explicit field path conversion. Passing raw Beanie FieldDescriptor objects (e.g., SagaDocument.state) will fail to group correctly. Use S.field() wrapper instead, consistent with the accumulators in the query= parameter. This affects production code: admin_events_repository.get_event_stats() is exposed via GET /api/admin/events/stats.
Affected locations
saga_repository.py:98-by=SagaDocument.statesaga_repository.py:127-by=SagaDocument.stateadmin_events_repository.py:138-by=EventDocument.event_typeadmin_events_repository.py:163-by=EventDocument.metadata.user_iddlq_repository.py-by=DLQMessageDocument.statusandby=DLQMessageDocument.event.event_type
Fix: Wrap with S.field(), e.g., by=S.field(EventDocument.event_type) (already used correctly in query accumulators on the same lines).
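Schematically (a sketch only — this comment's literal suggestion passes the Beanie attribute itself into S.field(), while a later comment in this thread argues for plain string paths, so both spellings are shown here with strings to keep the snippet self-contained):

```python
from monggregate import Pipeline, S

# Group key expressed through S.field() with an explicit field path.
wrapped = Pipeline().group(by=S.field("event_type"), query={"count": S.sum(1)})

# Group key expressed as a plain field-name string.
plain = Pipeline().group(by="event_type", query={"count": S.sum(1)})
```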
🤖 Prompt for AI Agents
In @backend/app/db/repositories/saga_repository.py around lines 97 - 102, The
aggregation group calls are using raw Beanie FieldDescriptor objects (e.g.,
SagaDocument.state, EventDocument.event_type, EventDocument.metadata.user_id,
DLQMessageDocument.status, DLQMessageDocument.event.event_type) which prevents
monggregate from resolving field paths; update each Pipeline.group(by=...) to
wrap the field descriptors with S.field(...) (for example change
by=SagaDocument.state to by=S.field(SagaDocument.state)) in the methods
count_sagas_by_state and the other group usages noted so the group key is a
proper field path.
| duration_pipeline = ( | ||
| Pipeline() | ||
| .project( | ||
| duration={"$subtract": [S.field(SagaDocument.completed_at), S.field(SagaDocument.created_at)]} | ||
| ) | ||
| .group(by=None, query={"avg_duration": S.avg("$duration")}) | ||
| ) | ||
| avg_duration = 0.0 | ||
| async for doc in SagaDocument.find(*completed_conditions).aggregate(duration_pipeline): | ||
| async for doc in SagaDocument.find(*completed_conditions).aggregate(duration_pipeline.export()): | ||
| avg_duration = doc["avg_duration"] / 1000.0 if doc["avg_duration"] else 0.0 |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check monggregate S.field() usage patterns in the codebase
echo "=== Searching for S.field() usage ==="
rg -n "S\.field\(" --type py -C2
echo -e "\n=== Checking imports of monggregate ==="
rg -n "from monggregate|import monggregate" --type py
echo -e "\n=== Checking requirements or setup files ==="
fd -e "requirements*.txt" -e "setup.py" -e "pyproject.toml" | head -10 | xargs cat 2>/dev/null | grep -i monggregate || echo "No monggregate found"Repository: HardMax71/Integr8sCode
Length of output: 7501
🏁 Script executed:
#!/bin/bash
# Look at the sagadocument and S imports in the repository
echo "=== Checking SagaDocument definition ==="
rg -n "class SagaDocument" --type py -A5
echo -e "\n=== Checking S import ==="
rg -n "from.*import.*S\b|import.*\bS\b" --type py | head -20
echo -e "\n=== Checking the file in question for context ==="
head -50 backend/app/db/repositories/saga_repository.py 2>/dev/null || echo "File not found, trying to locate it"
fd saga_repository.pyRepository: HardMax71/Integr8sCode
Length of output: 4215
🏁 Script executed:
#!/bin/bash
# Check how Beanie fields are used - if they serialize to strings
echo "=== Checking SagaDocument field definitions ==="
cat backend/app/db/docs/saga.py | head -100
echo -e "\n=== Searching for S.field() calls with raw MongoDB operators (strongest evidence) ==="
rg -n 'S\.field\(' backend/app/db/repositories/ -C1 | grep -E '(\$subtract|\$group|\$min|\$max|\$sum|_id)' -A1 -B1Repository: HardMax71/Integr8sCode
Length of output: 4103
🏁 Script executed:
#!/bin/bash
# Check if there's any issue or failure logs related to S.field usage
echo "=== Looking for tests that verify S.field() works correctly ==="
fd -e "test_*.py" -e "*_test.py" | xargs grep -l "S\.field" 2>/dev/null | head -5
echo -e "\n=== Check git history or issues mentioning S.field ==="
git log --all --oneline --grep="S\.field" 2>/dev/null | head -5 || echo "No git history available"
echo -e "\n=== Verify that these operations work in practice by checking if there are actual usages ==="
rg -n "get_saga_statistics|get_dlq_stats" backend/app --type py -A2 -B2 | head -40Repository: HardMax71/Integr8sCode
Length of output: 2596
🌐 Web query:
monggregate S.field() Beanie document attributes
💡 Result:
Use S.field("dot.path") to reference document attributes (as stored in MongoDB) inside monggregate expressions. Example pattern from a real project:
- S.field("ingredients.quantity") — accesses nested ingredients.quantity.
- Use it inside operators: S.multiply(S.field("ingredients.quantity"), S.divide(...)).
- Use the document's stored field names (dot notation for nested fields); you don't need Beanie model attributes/objects — pass the DB field path as a string.
Sources: monggregate docs/examples and a code example using S.field("ingredients.quantity") in a project writeup. [1][2]
References
- Monggregate documentation/examples. [1]
- Example usage (Baker project using monggregate showing S.field("ingredients.quantity")). [2]
Use string field names with S.field() instead of Beanie document attributes.
The S.field() function from monggregate expects string field names (e.g., S.field("completed_at")), not Beanie document attributes. Lines 141 should use:
Suggested fix
duration_pipeline = (
Pipeline()
.project(
duration={"$subtract": [S.field("completed_at"), S.field("created_at")]}
)
.group(by=None, query={"avg_duration": S.avg("$duration")})
)This same pattern issue exists across dlq_repository.py, event_repository.py, and admin_events_repository.py—all should pass string field names to S.field().
The milliseconds-to-seconds conversion on line 147 is correct.
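Pulling the suggestion together, a hedged sketch of the duration aggregation with string field paths — the completed-saga filter and the document fields are assumptions, not the repository's exact conditions:

```python
from datetime import datetime

from beanie import Document
from monggregate import Pipeline, S


class SagaDocument(Document):
    state: str
    created_at: datetime
    completed_at: datetime | None = None


async def average_duration_seconds() -> float:
    # $subtract on two Date fields yields milliseconds; divide by 1000 at the end.
    pipeline = (
        Pipeline()
        .project(duration={"$subtract": [S.field("completed_at"), S.field("created_at")]})
        .group(by=None, query={"avg_duration": S.avg("$duration")})
    )
    completed = SagaDocument.find(SagaDocument.state == "completed")
    async for doc in completed.aggregate(pipeline.export()):
        return (doc["avg_duration"] or 0.0) / 1000.0
    return 0.0
```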
🤖 Prompt for AI Agents
In @backend/app/db/repositories/saga_repository.py around lines 138 - 147,
S.field() is currently being called with Beanie document attributes; change
calls like S.field(SagaDocument.completed_at) and
S.field(SagaDocument.created_at) to use string field names ("completed_at",
"created_at") inside the duration_pipeline projection, and apply the same fix
wherever S.field(...) is used in dlq_repository.py, event_repository.py, and
admin_events_repository.py so all aggregates pass string field names; keep the
rest of the aggregation (group and milliseconds-to-seconds conversion) unchanged
and ensure references to SagaDocument.find(...) aggregate remain the same.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/db/repositories/dlq_repository.py (1)
93-98: Bug:event_typefilter references non-existent field.
DLQMessageDocument.event_typedoesn't exist — perDLQMessageDocument(seebackend/app/db/docs/dlq.py),event_typeis nested insideevent. This will cause the filter to fail or return no matches.Proposed fix
conditions: list[Any] = [ DLQMessageDocument.status == status if status else None, DLQMessageDocument.original_topic == topic if topic else None, - DLQMessageDocument.event_type == event_type if event_type else None, + DLQMessageDocument.event.event_type == event_type if event_type else None, ]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
backend/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
backend/app/db/repositories/dlq_repository.pybackend/pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/db/repositories/dlq_repository.py (3)
backend/app/domain/events/event_models.py (1)
SortDirection(20-22)backend/app/db/docs/dlq.py (1)
DLQMessageDocument(11-43)backend/app/dlq/models.py (7)
TopicStatistic(119-126)EventTypeStatistic(129-135)AgeStatistics(138-145)DLQStatistics(148-157)DLQMessageStatus(12-18)DLQMessage(31-52)DLQTopicSummary(189-200)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Scan Backend
- GitHub Check: Scan Frontend
- GitHub Check: cubic · AI code reviewer
- GitHub Check: E2E Tests
- GitHub Check: Integration Tests
🔇 Additional comments (5)
backend/pyproject.toml (1)
126-126: monggregate 0.22.1 is compatible with existing dependencies.Version 0.22.1 is the latest stable release and satisfies all dependency requirements: pydantic (project has 2.9.2, requires ≥2.6.0), pymongo (project has 4.12.1, requires ≥3.0.0), and typing-extensions (project has 4.12.2, requires ≥4.0). The addition aligns with the project's pinned dependency versioning approach.
backend/app/db/repositories/dlq_repository.py (4)
111-113: LGTM!Correctly uses the nested
event.event_idpath and Pydantic v2'smodel_validatefor conversion.
115-155: LGTM!The two-stage aggregation approach is well-structured. Using
$arrayToObjectfor status breakdown and computing averages via$divide/$roundis appropriate.
157-166: LGTM!Correctly queries by nested
event.event_idand updates the appropriate fields before saving.
168-178: LGTM!Correctly queries by nested
event.event_idand updates the appropriate fields before saving.
```python
# Age statistics - get timestamps and sum for true average, calculate ages in Python
time_pipeline = Pipeline().group(
    by=None,
    query={
        "oldest": S.min(S.field(DLQMessageDocument.failed_at)),
        "newest": S.max(S.field(DLQMessageDocument.failed_at)),
        "count": S.sum(1),
        "sum_failed_at": S.sum(S.field(DLQMessageDocument.failed_at)),
    },
]
age_result = []
async for doc in DLQMessageDocument.aggregate(age_pipeline):
    age_result.append(doc)
age_stats_data = age_result[0] if age_result else {}
age_stats = AgeStatistics(
    min_age_seconds=age_stats_data.get("min_age", 0.0),
    max_age_seconds=age_stats_data.get("max_age", 0.0),
    avg_age_seconds=age_stats_data.get("avg_age", 0.0),
)
time_results = await DLQMessageDocument.aggregate(time_pipeline.export()).to_list()
now = datetime.now(timezone.utc)
if time_results and time_results[0].get("oldest"):
    r = time_results[0]
    oldest, newest, count = r["oldest"], r["newest"], r["count"]
    # sum_failed_at is sum of timestamps in ms since epoch, compute true average age
    now_ms = now.timestamp() * 1000
    avg_age_seconds = (now_ms * count - r["sum_failed_at"]) / count / 1000 if count > 0 else 0.0
    age_stats = AgeStatistics(
        min_age_seconds=(now - newest).total_seconds(),
        max_age_seconds=(now - oldest).total_seconds(),
        avg_age_seconds=avg_age_seconds,
    )
```
🧩 Analysis chain
🌐 Web query:
MongoDB $sum operator Date field aggregation behavior milliseconds
💡 Result:
Short answer: $sum treats BSON Date as non‑numeric and will ignore it (so it won't add ISODate values directly). To sum dates as milliseconds convert them first (e.g. {$toLong: "$date"}) and then $sum the converted longs. See MongoDB docs for $sum behavior, $toLong conversion (Date → milliseconds), and date subtraction returning milliseconds. [1][2][3]
Example (aggregation):
{ $group: { _id: null, totalMs: { $sum: { $toLong: "$date" } } } }
Sources:
- $sum behavior (ignores non‑numeric values). [1]
- $toLong converts Date to milliseconds since epoch. [2]
- $subtract / date ops use milliseconds for date arithmetic. [3]
🏁 Script executed:
# Search for monggregate usage examples with datetime/date fields
rg -i "addfields|subtract" --type py -A 3 -B 1 | head -50Repository: HardMax71/Integr8sCode
Length of output: 529
🏁 Script executed:
# Check if there are any examples in the codebase using datetime in pipelines
rg "Pipeline\(\)" --type py -A 10 | grep -i "datetime\|now\|date" | head -30Repository: HardMax71/Integr8sCode
Length of output: 691
🏁 Script executed:
# Look at how DLQMessageDocument is used with dates
rg "DLQMessageDocument" --type py -B 2 -A 2 | grep -i "failed_at\|date\|time" | head -30Repository: HardMax71/Integr8sCode
Length of output: 3070
🏁 Script executed:
# Look for $$NOW or similar time-related pipeline operations
rg '\$\$NOW|current_date' --type py
Repository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Check how other repositories handle age/time calculations in pipelines
rg 'timestamp\(\)' --type py -B 3 -A 3 | head -40
Repository: HardMax71/Integr8sCode
Length of output: 3297
🏁 Script executed:
# Look at the actual dlq_repository.py file to see the full context and imports
head -30 backend/app/db/repositories/dlq_repository.py
Repository: HardMax71/Integr8sCode
Length of output: 967
Incorrect fix: MongoDB pipeline cannot evaluate Python datetime objects.
The original code has a real bug—$sum on a Date field doesn't yield epoch milliseconds (it returns null/0)—but the proposed fix compounds the problem by attempting to use a Python datetime object directly in the MongoDB pipeline expression. MongoDB pipelines operate on BSON data and cannot evaluate Python objects.
To compute average age correctly, either:
- Calculate in Python (simplest): Remove the buggy `sum_failed_at` aggregation field. Use the `oldest` and `newest` timestamps already retrieved to compute all three statistics (min/max/avg ages) in Python after the aggregation.
- Pass milliseconds as a literal: If computing in the pipeline is preferred, convert `now` to milliseconds before sending to MongoDB: `now_ms = int(now.timestamp() * 1000)`, then use it as a literal value in the pipeline (not a Python object).

The min/max age calculations—`(now - newest).total_seconds()` and `(now - oldest).total_seconds()`—are already correct as-is. One way to recover a true average is sketched below.
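For illustration, here is a sketch of the approach suggested by the query result above, using a raw `$group` stage with `$toLong` (plain dicts rather than the project's monggregate helpers); field names such as `sum_failed_at_ms` are illustrative:

```python
from datetime import datetime, timezone

# Raw aggregation stage: sum failed_at as epoch milliseconds via $toLong so a
# true average age can be derived afterwards in Python.
group_stage = {
    "$group": {
        "_id": None,
        "oldest": {"$min": "$failed_at"},
        "newest": {"$max": "$failed_at"},
        "count": {"$sum": 1},
        "sum_failed_at_ms": {"$sum": {"$toLong": "$failed_at"}},
    }
}


def age_stats_from_result(r: dict) -> dict[str, float]:
    """Compute min/max/avg ages in seconds from one $group result document.

    Assumes the driver returns tz-aware datetimes (e.g. a client created with
    tz_aware=True); otherwise normalize the timestamps before subtracting.
    """
    now = datetime.now(timezone.utc)
    now_ms = now.timestamp() * 1000
    count = r["count"]
    avg_age = (now_ms * count - r["sum_failed_at_ms"]) / count / 1000 if count else 0.0
    return {
        "min_age_seconds": (now - r["newest"]).total_seconds(),
        "max_age_seconds": (now - r["oldest"]).total_seconds(),
        "avg_age_seconds": avg_age,
    }
```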
1 issue found across 7 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/app/services/k8s_worker/config.py">
<violation number="1" location="backend/app/services/k8s_worker/config.py:11">
P1: Consumer group mismatch: `consumer_group` is `"kubernetes-worker"` but `GroupId.K8S_WORKER` resolves to `"k8s-worker"`. This inconsistency can cause issues with Kafka consumer group coordination. Consider using the enum value for consistency.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```diff
     # Kafka settings
     consumer_group: str = "kubernetes-worker"
-    topics: list[KafkaTopic] = field(default_factory=lambda: [KafkaTopic.EXECUTION_TASKS])
+    topics: list[KafkaTopic] = field(default_factory=lambda: list(CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.K8S_WORKER]))
```
P1: Consumer group mismatch: consumer_group is "kubernetes-worker" but GroupId.K8S_WORKER resolves to "k8s-worker". This inconsistency can cause issues with Kafka consumer group coordination. Consider using the enum value for consistency.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/app/services/k8s_worker/config.py, line 11:
<comment>Consumer group mismatch: `consumer_group` is `"kubernetes-worker"` but `GroupId.K8S_WORKER` resolves to `"k8s-worker"`. This inconsistency can cause issues with Kafka consumer group coordination. Consider using the enum value for consistency.</comment>
<file context>
@@ -1,14 +1,14 @@
# Kafka settings
consumer_group: str = "kubernetes-worker"
- topics: list[KafkaTopic] = field(default_factory=lambda: [KafkaTopic.EXECUTION_TASKS])
+ topics: list[KafkaTopic] = field(default_factory=lambda: list(CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.K8S_WORKER]))
# Kubernetes settings
</file context>
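If the mismatch is confirmed, one possible adjustment (hypothetical; it assumes `GroupId` is a value-backed enum whose `K8S_WORKER` member carries "k8s-worker", as the resolved value above suggests) is to derive the group name from the enum rather than hardcoding it:

```python
from dataclasses import dataclass, field

from app.domain.enums.kafka import CONSUMER_GROUP_SUBSCRIPTIONS, GroupId, KafkaTopic


@dataclass
class K8sWorkerConfig:
    # Derive the group name from the enum so it cannot drift from GroupId.K8S_WORKER.
    consumer_group: str = GroupId.K8S_WORKER.value
    topics: list[KafkaTopic] = field(
        default_factory=lambda: list(CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.K8S_WORKER])
    )
```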
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/domain/enums/kafka.py (1)
117-119: Remove stale K8S_WORKER entry from unused `CONSUMER_GROUP_EVENTS` constant.
`CONSUMER_GROUP_EVENTS` is dead code—it's defined in kafka.py but never imported or used anywhere in the codebase. The `GroupId.K8S_WORKER` entry listing `EventType.EXECUTION_STARTED` is inaccurate: K8S_WORKER actually subscribes to `SAGA_COMMANDS` and handles `CREATE_POD_COMMAND` and `DELETE_POD_COMMAND` events. Either remove the unused `CONSUMER_GROUP_EVENTS` constant entirely or delete the stale K8S_WORKER entry.
🤖 Fix all issues with AI agents
In @backend/app/services/coordinator/coordinator.py:
- Line 12: Import KafkaTopic is unused in coordinator, causing a linter F401;
remove KafkaTopic from the import statement so only CONSUMER_GROUP_SUBSCRIPTIONS
and GroupId are imported (i.e., change "from app.domain.enums.kafka import
CONSUMER_GROUP_SUBSCRIPTIONS, GroupId, KafkaTopic" to omit KafkaTopic), then run
the linter/tests to confirm the warning is resolved.
In @backend/app/services/k8s_worker/worker.py:
- Around line 13-34: The import block is unsorted; group and alphabetize imports
with standard library first, third-party next, then local app imports, and
ensure all Kubernetes imports are adjacent and ordered consistently—combine
"from kubernetes import client as k8s_client" and "from kubernetes import config
as k8s_config" with "from kubernetes.client.rest import ApiException" into the
third-party group and alphabetize them (keep references to k8s_client,
k8s_config, ApiException intact), then re-run the linter to confirm I001 is
resolved.
- Line 11: Remove the unused KafkaTopic import from the import statement in
worker.py: change the line importing from app.domain.enums.kafka so it only
imports CONSUMER_GROUP_SUBSCRIPTIONS and GroupId (remove KafkaTopic) to satisfy
the linter and avoid the unused-import warning.
🧹 Nitpick comments (1)
backend/app/services/result_processor/processor.py (1)
45-48: Clarify KafkaTopic usage for linter compliance.
`KafkaTopic` is used in the type annotations on Lines 45 and 48. The linter flags it as unused (F401), which may be a false positive due to its use in type hints. Consider using a `TYPE_CHECKING` import guard if the linter continues to flag this:

♻️ Alternative approach if linter persists

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from app.domain.enums.kafka import KafkaTopic
```

However, since `KafkaTopic` appears to be needed at runtime for `Field(default=KafkaTopic.EXECUTION_RESULTS)` on Line 48, the import is actually required. The linter may need a `# noqa: F401` comment or configuration update.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/domain/enums/kafka.py
backend/app/services/coordinator/coordinator.py
backend/app/services/k8s_worker/config.py
backend/app/services/k8s_worker/worker.py
backend/app/services/result_processor/processor.py
backend/app/services/sse/kafka_redis_bridge.py
backend/tests/integration/result_processor/test_result_processor.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/services/k8s_worker/config.py (1)
backend/app/domain/enums/kafka.py (2)
GroupId (57-69), KafkaTopic (7-54)
backend/tests/integration/result_processor/test_result_processor.py (1)
backend/app/domain/events/typed.py (3)
EventMetadata (16-27), ExecutionCompletedEvent (93-99), ResultStoredEvent (193-198)
backend/app/services/coordinator/coordinator.py (2)
backend/app/domain/enums/kafka.py (2)
GroupId (57-69), KafkaTopic (7-54)
backend/app/events/core/dispatcher.py (1)
  EventDispatcher (15-177)
🪛 GitHub Actions: Ruff Linting
backend/app/services/k8s_worker/worker.py
[error] 11-11: F401: 'KafkaTopic' imported but unused. Remove unused import: app.domain.enums.kafka.KafkaTopic.
[warning] I001: Import block is un-sorted or un-formatted.
backend/app/services/result_processor/processor.py
[error] 11-11: F401: 'KafkaTopic' imported but unused. Remove unused import: app.domain.enums.kafka.KafkaTopic.
[warning] I001: Import block is un-sorted or un-formatted.
backend/app/services/coordinator/coordinator.py
[error] 12-12: F401: 'KafkaTopic' imported but unused. Remove unused import: app.domain.enums.kafka.KafkaTopic.
[warning] I001: Import block is un-sorted or un-formatted.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Scan Backend
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Integration Tests
- GitHub Check: E2E Tests
🔇 Additional comments (19)
backend/tests/integration/result_processor/test_result_processor.py (4)
11-11: LGTM! Import correctly updated to use the unified domain event types as per the migration.
72-75: LGTM! Good filtering by `execution_id` to prevent test interference from concurrent test runs or leftover events from previous tests.
109-122: LGTM! Clean test flow: wait for the `ResultStoredEvent` signal before verifying DB persistence, eliminating the need for polling. The 12-second timeout is reasonable for an integration test, and the finally block ensures proper cleanup.
91-108: The review comment's concern is not valid. The `ResultProcessor`'s consumer is explicitly configured with `auto_offset_reset="earliest"` at line 122 of `backend/app/services/result_processor/processor.py`, not the Kafka default of "latest". This means the processor will read the pre-produced event, and the test setup is correct. The comment in the test at line 92 is accurate.
Likely an incorrect or invalid review comment.
backend/app/services/sse/kafka_redis_bridge.py (3)
8-9: LGTM! Import changes correctly migrate to the unified domain event model, importing `DomainEvent` and the centralized subscription mappings.
90-91: LGTM! Good use of centralized subscription configuration. Using `CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.WEBSOCKET_GATEWAY]` aligns the SSE bridge with the WebSocket gateway's topic subscriptions, which is appropriate since both serve real-time client delivery. The comment clearly explains the rationale.
118-131: LGTM! The handler signature is correctly updated to accept `DomainEvent`, consistent with the unified event model. The event data extraction via `model_dump()` and execution_id lookup remains unchanged in behavior.
backend/app/domain/enums/kafka.py (2)
52-52: LGTM! New `DLQ_EVENTS` topic correctly added to support DLQ telemetry event publishing as described in PR objectives.
78-80: LGTM! The subscription change to `SAGA_COMMANDS` correctly reflects that K8S_WORKER now receives pod commands from the saga orchestrator rather than raw execution events. The inline comment documents the rationale well.
backend/app/services/k8s_worker/config.py (2)
11-11: LGTM! Good refactor to derive topics dynamically from the centralized `CONSUMER_GROUP_SUBSCRIPTIONS` mapping. This ensures K8sWorkerConfig stays in sync with the source-of-truth topic configuration.
4-4: Keep the `KafkaTopic` import; it is used for the type annotation. `KafkaTopic` is directly referenced on line 11 as the type annotation for the `topics` field: `topics: list[KafkaTopic]`. The import is necessary and removing it would break the type hint.
backend/app/services/k8s_worker/worker.py (2)
137-138: LGTM! Good refactor to use centralized subscriptions. The comment clearly documents that topics come from the centralized config rather than being hardcoded.
214-224: LGTM with a note on the assert pattern. The wrapper pattern using `assert isinstance(event, ...)` is consistent with the unified event model approach. The assertion provides runtime type safety while allowing the dispatcher to use a common `DomainEvent` signature.
Note: In production, if an unexpected event type is dispatched to these handlers, an `AssertionError` will be raised. This is acceptable as it indicates a configuration/routing bug that should surface during development/testing.
backend/app/services/result_processor/processor.py (4)
148-160: LGTM! Handler wrappers correctly updated to accept `DomainEvent` with type assertions, consistent with the unified event model pattern used across this PR.
173-187: Good defensive improvement for optional resource_usage. The guard `if event.resource_usage:` prevents `AttributeError` when `resource_usage` is `None`. This is a sensible change since completed events may not always have resource usage data available.
214-214: LGTM! Graceful handling of `error_type` with string conversion and fallback to `"unknown"` is defensive and prevents potential issues if `error_type` is `None`.
11-11: No action needed — `KafkaTopic` is actively used. The import on line 11 is used on lines 45 and 48 as type annotations for `topics: list[KafkaTopic]` and `result_topic: KafkaTopic`. The F401 linter warning is either absent or a false positive; the import is necessary and correctly placed.
Likely an incorrect or invalid review comment.
backend/app/services/coordinator/coordinator.py (2)
14-27: Import migration looks correct; address linter warning for sorting. The consolidation of event imports from `app.domain.events.typed` aligns with the PR's goal of unifying event systems. However, the Ruff linter (I001) flags the import block as un-sorted. Consider running the formatter or sorting imports to satisfy CI.
160-160: Good refactor: centralized topic subscriptions. Using `CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.EXECUTION_COORDINATOR]` centralizes topic configuration, improving maintainability and consistency across services.
```python
# Schedule immediately if at front of queue (position 0)
if position == 0:
    await self._schedule_execution(event)
```
🧩 Analysis chain
🏁 Script executed:
# First, locate the QueueManager class
fd -t f "*.py" | xargs rg -l "class QueueManager"
Repository: HardMax71/Integr8sCode
Length of output: 416
🏁 Script executed:
# Find the get_next_execution method implementation
rg -n -A 20 "def get_next_execution" --type pyRepository: HardMax71/Integr8sCode
Length of output: 2094
🏁 Script executed:
# Check the add_execution method
rg -n -A 20 "def add_execution" --type pyRepository: HardMax71/Integr8sCode
Length of output: 2108
🏁 Script executed:
# Examine the _schedule_execution method in coordinator.py
sed -n '340,380p' backend/app/services/coordinator/coordinator.py
Repository: HardMax71/Integr8sCode
Length of output: 2213
🏁 Script executed:
# Check the entire _handle_execution_requested method to understand context
rg -n -B 5 -A 30 "async def _handle_execution_requested" backend/app/services/coordinator/coordinator.py
Repository: HardMax71/Integr8sCode
Length of output: 1804
🏁 Script executed:
# Look for _active_executions initialization and usage
rg -n "_active_executions" backend/app/services/coordinator/coordinator.pyRepository: HardMax71/Integr8sCode
Length of output: 1166
🏁 Script executed:
rg -n -A 60 "async def _schedule_execution" backend/app/services/coordinator/coordinator.py | head -80
Repository: HardMax71/Integr8sCode
Length of output: 3347
🏁 Script executed:
# Also check the _scheduling_loop implementation to confirm it doesn't have additional guards
rg -n -B 5 -A 20 "async def _scheduling_loop" backend/app/services/coordinator/coordinator.py
Repository: HardMax71/Integr8sCode
Length of output: 1238
🏁 Script executed:
# Check if there are any locks or deduplication mechanisms between queue removal and scheduling
rg -n "_queue_lock\|_execution_resources\|_active_executions" backend/app/services/coordinator/queue_manager.pyRepository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
rg -n "_scheduling_semaphore" backend/app/services/coordinator/coordinator.pyRepository: HardMax71/Integr8sCode
Length of output: 201
🏁 Script executed:
# Check the semaphore initialization in __init__
rg -n -A 50 "def __init__" backend/app/services/coordinator/coordinator.py | grep -A 50 "ExecutionCoordinator"Repository: HardMax71/Integr8sCode
Length of output: 48
🏁 Script executed:
# Get more context around semaphore initialization
sed -n '80,120p' backend/app/services/coordinator/coordinator.py
Repository: HardMax71/Integr8sCode
Length of output: 1940
🏁 Script executed:
# Verify how max_concurrent_scheduling is configured and what value it has
rg -n "max_concurrent_scheduling" backend/app/services/coordinator/coordinator.pyRepository: HardMax71/Integr8sCode
Length of output: 268
🏁 Script executed:
# Check the initialization to understand the full picture
sed -n '60,100p' backend/app/services/coordinator/coordinator.py
Repository: HardMax71/Integr8sCode
Length of output: 1678
Race condition between immediate scheduling and scheduling loop.
When position == 0, this code immediately awaits _schedule_execution(event). Concurrently, _scheduling_loop (line 301) calls get_next_execution() which removes the same event from the queue and creates a task to schedule it.
Both execution paths can enter _schedule_execution() for the same execution_id before either reaches the deduplication point at line 342 (_active_executions.add()). The guard at line 321 is checked before that addition, creating a time-of-check-time-of-use (TOCTOU) window where both paths request resource allocation concurrently.
The `_scheduling_semaphore` only limits the concurrency count; it does not prevent duplicates.
Consider:

- Removing the event from the queue before scheduling, or
- Adding a check in `_schedule_execution()` that persists the execution_id before resource allocation, or
- Removing this eager scheduling and relying solely on the loop.

A sketch of the second option is shown below.
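A minimal sketch of that second option, under stated assumptions: the coordinator keeps an `_active_executions` set as described above, and the lock, method names, and event shape here are illustrative rather than the project's actual API.

```python
import asyncio
from typing import Any


class CoordinatorSketch:
    """Illustrative only: claim the execution_id atomically before any awaits."""

    def __init__(self) -> None:
        self._active_executions: set[str] = set()
        self._claim_lock = asyncio.Lock()

    async def _schedule_execution(self, event: Any) -> None:
        async with self._claim_lock:
            if event.execution_id in self._active_executions:
                return  # the other path (eager or loop) already claimed it
            self._active_executions.add(event.execution_id)
        try:
            await self._allocate_and_dispatch(event)
        except Exception:
            # Release the claim so a later retry can schedule the execution again.
            async with self._claim_lock:
                self._active_executions.discard(event.execution_id)
            raise

    async def _allocate_and_dispatch(self, event: Any) -> None:
        ...  # placeholder for resource allocation and pod-command publishing
```

Because the membership check and the `add()` happen under the same lock before any await, the eager path and the scheduling loop can no longer both pass the guard for the same execution.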
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/k8s_worker/worker.py (1)
215-225: Replace `assert` with explicit type checks for production safety.
Using `assert isinstance(...)` for runtime type validation is fragile—assertions are stripped when Python runs with the `-O` flag (common in production containers). If an unexpected event type reaches these handlers, it would proceed without validation and likely fail with a confusing `AttributeError` later.

🔒️ Proposed fix using explicit type checks

```diff
 async def _handle_create_pod_command_wrapper(self, event: DomainEvent) -> None:
     """Wrapper for handling CreatePodCommandEvent with type safety."""
-    assert isinstance(event, CreatePodCommandEvent)
+    if not isinstance(event, CreatePodCommandEvent):
+        raise TypeError(f"Expected CreatePodCommandEvent, got {type(event).__name__}")
     self.logger.info(f"Processing create_pod_command for execution {event.execution_id} from saga {event.saga_id}")
     await self._handle_create_pod_command(event)

 async def _handle_delete_pod_command_wrapper(self, event: DomainEvent) -> None:
     """Wrapper for handling DeletePodCommandEvent."""
-    assert isinstance(event, DeletePodCommandEvent)
+    if not isinstance(event, DeletePodCommandEvent):
+        raise TypeError(f"Expected DeletePodCommandEvent, got {type(event).__name__}")
     self.logger.info(f"Processing delete_pod_command for execution {event.execution_id} from saga {event.saga_id}")
     await self._handle_delete_pod_command(event)
```
🧹 Nitpick comments (1)
backend/app/services/result_processor/processor.py (1)
151-161: Consider replacing `assert` with explicit type checks for production safety.
Using `assert isinstance(...)` for type validation can be problematic because assertions are disabled when Python runs with the `-O` (optimize) flag. If that happens, a misrouted event would cause cryptic `AttributeError` failures instead of a clear type error.

♻️ Suggested refactor using explicit type guard

```diff
 async def _handle_completed_wrapper(self, event: DomainEvent) -> None:
-    assert isinstance(event, ExecutionCompletedEvent)
+    if not isinstance(event, ExecutionCompletedEvent):
+        raise TypeError(f"Expected ExecutionCompletedEvent, got {type(event).__name__}")
     await self._handle_completed(event)

 async def _handle_failed_wrapper(self, event: DomainEvent) -> None:
-    assert isinstance(event, ExecutionFailedEvent)
+    if not isinstance(event, ExecutionFailedEvent):
+        raise TypeError(f"Expected ExecutionFailedEvent, got {type(event).__name__}")
     await self._handle_failed(event)

 async def _handle_timeout_wrapper(self, event: DomainEvent) -> None:
-    assert isinstance(event, ExecutionTimeoutEvent)
+    if not isinstance(event, ExecutionTimeoutEvent):
+        raise TypeError(f"Expected ExecutionTimeoutEvent, got {type(event).__name__}")
     await self._handle_timeout(event)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/app/services/coordinator/coordinator.py
backend/app/services/k8s_worker/worker.py
backend/app/services/result_processor/processor.py
backend/tests/unit/services/result_processor/test_processor.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/services/coordinator/coordinator.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Integration Tests
- GitHub Check: E2E Tests
🔇 Additional comments (8)
backend/app/services/k8s_worker/worker.py (3)
15-24: LGTM! The import changes correctly align with the PR objective of consolidating to a single Avro-backed domain event model. The centralized subscription configuration (`CONSUMER_GROUP_SUBSCRIPTIONS`, `GroupId`) and domain-typed events are now properly sourced from `app.domain.events.typed`.
50-59: Formatting adjustment only. No functional changes; parameter indentation was reformatted.
138-139: Good refactor to centralized subscription configuration. This change correctly moves topic subscription from hardcoded values to a centralized configuration, improving maintainability and consistency across consumers.
Verify that `CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.K8S_WORKER]` includes the expected topics for the K8S worker (e.g., the saga commands topic containing `CREATE_POD_COMMAND` and `DELETE_POD_COMMAND` events):

```bash
#!/bin/bash
# Find the CONSUMER_GROUP_SUBSCRIPTIONS definition to verify K8S_WORKER topics
rg -n -A 20 'CONSUMER_GROUP_SUBSCRIPTIONS\s*[=:]' --type py
```

backend/tests/unit/services/result_processor/test_processor.py (1)
18-20: LGTM! Test assertions properly validate centralized topic mapping. The updated assertions correctly verify that `config.topics` matches the centralized `CONSUMER_GROUP_SUBSCRIPTIONS` mapping while still ensuring `EXECUTION_EVENTS` is present as a sanity check. This aligns well with the PR's goal of deriving topics from a single source of truth.
backend/app/services/result_processor/processor.py (4)
46-48: LGTM! Topics derived from centralized mapping. Using `default_factory` with list conversion from the centralized `CONSUMER_GROUP_SUBSCRIPTIONS` is correct and aligns with the single-source-of-truth approach.
174-188: LGTM! Good defensive handling of optional `resource_usage`. Guarding the metrics recording with `if event.resource_usage:` prevents `AttributeError` when the field is `None`. The existing TODO on line 184 appropriately acknowledges the memory limit parsing brittleness.
215-215: LGTM! Defensive error_type handling. The fallback to `"unknown"` ensures metrics always receive a valid label even when `error_type` is `None`.
267-296: LGTM! Event publishing aligned with unified domain model. Both `_publish_result_stored` and `_publish_result_failed` correctly use the new `EventMetadata` class from the unified domain events module.
+ fixed conftests to clean up dbs created
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/tests/conftest.py">
<violation number="1" location="backend/tests/conftest.py:88">
P2: The cleanup code is described as 'best-effort' that 'may fail silently', but it lacks a try-except block. Any failure during cleanup will raise an exception during fixture teardown, potentially causing all tests using this fixture to report errors.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @backend/tests/conftest.py:
- Around line 85-90: The teardown's "best-effort cleanup" should be wrapped in a
try/except so failures in drop_database do not propagate; in the async with
application.router.lifespan_context(application) block, after obtaining
container via application.state.dishka_container and resolving db with await
container.get(Database), call
db.client.drop_database(test_settings.DATABASE_NAME) inside a try/except that
catches Exception and either logs the error or suppresses it (matching the "may
fail silently" behavior) to avoid masking test results.
🧹 Nitpick comments (1)
backend/tests/conftest.py (1)
27-36: Consider logging swallowed exceptions for debuggability.
The bare `except Exception: pass` silently hides all failures, including misconfigurations or unexpected errors. While acceptable for unit tests where Kafka is unavailable, this could mask issues in integration test runs.

♻️ Suggested improvement

```diff
 @pytest.hookimpl(trylast=True)
 def pytest_configure() -> None:
     """Create Kafka topics once in master process before xdist workers spawn."""
     # PYTEST_XDIST_WORKER is only set in workers, not master
     if os.environ.get("PYTEST_XDIST_WORKER"):
         return
     try:
         asyncio.run(create_topics(Settings(_env_file=".env.test")))
-    except Exception:
-        pass  # Kafka unavailable (unit tests)
+    except Exception as e:
+        # Kafka unavailable (unit tests) - log for debugging integration test issues
+        import logging
+        logging.getLogger(__name__).debug("Kafka topic creation skipped: %s", e)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/.env.test
backend/app/settings.py
backend/pyproject.toml
backend/tests/conftest.py
docker-compose.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/.env.test
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/conftest.py (1)
backend/app/settings.py (1)
Settings(9-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Scan Backend
- GitHub Check: Scan Frontend
- GitHub Check: cubic · AI code reviewer
- GitHub Check: E2E Tests
- GitHub Check: Integration Tests
🔇 Additional comments (5)
docker-compose.yaml (1)
50-53: LGTM! Adding `ulimits.nofile` for MongoDB is a sensible operational improvement. The values (65536 soft/hard) are appropriate for production-like workloads and consistent with the limits already configured for Zookeeper and Kafka in this file.
backend/pyproject.toml (2)
219-221: Good practice: Disabling OTEL SDK during tests. Setting `OTEL_SDK_DISABLED="true"` is an effective way to prevent test teardown delays caused by OTLP exporter retry attempts when no collector is available. This aligns with the commit message about removing the 40s teardown timeout.
Note: If any tests specifically validate OpenTelemetry instrumentation behavior, they may need to explicitly override this setting.
126-126: monggregate version 0.22.1 is valid and current. The pinned version is available on PyPI and is the latest release, making it an appropriate dependency for MongoDB aggregation pipeline operations.
backend/app/settings.py (1)
162-167: LGTM! The hardcoded `.env` default is appropriate. Tests correctly override via `Settings(_env_file=".env.test")` as seen in `conftest.py`, and production uses the default. This simplifies configuration by removing the unnecessary `DOTENV_PATH` indirection.
backend/tests/conftest.py (1)
40-65: LGTM! The per-worker isolation strategy is well-designed. Using `Settings(_env_file=".env.test")` with `model_copy(update={...})` for dynamic overrides cleanly separates static test config from runtime isolation concerns. The docstring clearly documents what's shared vs. isolated.
1 issue found across 14 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/app/events/core/types.py">
<violation number="1" location="backend/app/events/core/types.py:81">
P1: New `request_timeout_ms` field is not used in `to_consumer_config()`. The field is added but the conversion method doesn't include `"request.timeout.ms": self.request_timeout_ms`, so this setting will have no effect on the actual Kafka consumer.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/dlq/manager.py (1)
206-255: Potential partial failure when sending to both topics.
The method sends to both `retry_topic` (lines 222-227) and `original_topic` (lines 230-235) without transactional guarantees. If the first send succeeds but the second fails, the system will be in an inconsistent state—the retry topic has the message but the original topic doesn't, yet the exception will prevent the status update.
Consider wrapping these in a try/except to handle partial failures gracefully (a sketch follows), or document this as acceptable behavior given the retry nature of this code.
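A rough sketch of that option, with hedged assumptions: `producer.send`, `retry_topic`, and `original_topic` are illustrative names, not the project's actual producer API.

```python
# Illustrative only: treat the two sends as one unit of work and surface a
# partial failure instead of letting the second send's exception hide the
# fact that the first one already succeeded.
async def _republish_to_both(self, message) -> bool:
    sent_to_retry = False
    try:
        await self.producer.send(self.retry_topic, message.payload)
        sent_to_retry = True
        await self.producer.send(message.original_topic, message.payload)
        return True
    except Exception:
        self.logger.exception(
            "DLQ republish failed (retry topic delivered: %s)", sent_to_retry
        )
        # Persist a PARTIAL/FAILED status here so the monitor loop can reconcile.
        return False
```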
backend/app/events/core/types.py (1)
81-105: Add missing `request_timeout_ms` to `to_consumer_config()` for consistency.
The `to_consumer_config()` method does not include the `request_timeout_ms` field (line 81), even though it exists in the `ConsumerConfig` dataclass. While production code currently uses direct field access (bypassing this method), the mapping should be added for consistency and to prevent silent field drops if the method is ever used.

🐛 Proposed fix to add missing mapping

```diff
 def to_consumer_config(self) -> dict[str, object]:
     """Convert to Confluent Kafka consumer configuration."""
     return {
         "bootstrap.servers": self.bootstrap_servers,
         "group.id": self.group_id,
         "client.id": self.client_id,
         "auto.offset.reset": self.auto_offset_reset,
         "enable.auto.commit": self.enable_auto_commit,
         "session.timeout.ms": self.session_timeout_ms,
         "heartbeat.interval.ms": self.heartbeat_interval_ms,
         "max.poll.interval.ms": self.max_poll_interval_ms,
+        "request.timeout.ms": self.request_timeout_ms,
         "fetch.min.bytes": self.fetch_min_bytes,
         "fetch.wait.max.ms": self.fetch_max_wait_ms,
         "statistics.interval.ms": self.statistics_interval_ms,
     }
```
🤖 Fix all issues with AI agents
In @backend/.env.test:
- Around line 28-33: Update the KAFKA_REQUEST_TIMEOUT_MS in the test env so it
is greater than KAFKA_SESSION_TIMEOUT_MS: change KAFKA_REQUEST_TIMEOUT_MS
(currently 5000) to at least 8000 to maintain a safe margin relative to
KAFKA_SESSION_TIMEOUT_MS=6000 while keeping SSE_CONSUMER_POOL_SIZE and
KAFKA_HEARTBEAT_INTERVAL_MS as-is.
In @backend/app/services/event_bus.py:
- Around line 96-99: The Kafka timeout settings violate required relationships:
request_timeout_ms (KAFKA_REQUEST_TIMEOUT_MS) must be less than
session_timeout_ms (KAFKA_SESSION_TIMEOUT_MS); currently 40000 ≥ 30000. Fix by
adjusting the settings so KAFKA_REQUEST_TIMEOUT_MS < KAFKA_SESSION_TIMEOUT_MS
(e.g., lower KAFKA_REQUEST_TIMEOUT_MS to <30000) or by increasing
KAFKA_SESSION_TIMEOUT_MS above the request timeout, and ensure the values loaded
into session_timeout_ms, heartbeat_interval_ms, max_poll_interval_ms, and
request_timeout_ms in event_bus.py respect the documented constraints.
♻️ Duplicate comments (4)
backend/app/services/notification_service.py (1)
334-334: Blocking notification delivery may increase API latency. This change from `asyncio.create_task()` to `await` blocks the caller until delivery completes, including external HTTP calls with 30s timeouts. If intentional for delivery guarantees, document the tradeoff; otherwise, restore the background task pattern.
backend/app/dlq/manager.py (2)
72-76: Previous concerns about null safety and error handling remain valid. The issues flagged in previous reviews still apply:
- `v.decode()` on line 75 will crash if header value is `None`
107-137: Async iteration pattern is clean, but error backoff concern remains.The refactoring to
async for msg in self.consumer:is a cleaner approach than manual polling. However, the missing error backoff after exception on line 137 was already flagged in a previous review—without a sleep, persistent errors (e.g., database unavailable) will cause rapid spinning.backend/app/services/coordinator/coordinator.py (1)
246-249: Off-by-one error: queue positions are 1-indexed.The condition
position == 0will never match a legitimate front-of-queue scenario. PerQueueManager._get_queue_position, positions are 1-indexed (first item = position 1). The value0only occurs whenpositionwasNone(error case) and defaulted to0at line 236.To schedule immediately when at the front of the queue, the condition should be
position == 1.Additionally, even with the fix, there's a potential TOCTOU race with
_scheduling_loopthat could schedule the same execution twice before_active_executions.add()is reached.🐛 Proposed fix for off-by-one
```diff
-        # Schedule immediately if at front of queue (position 0)
-        if position == 0:
+        # Schedule immediately if at front of queue (position 1, since 1-indexed)
+        if position == 1:
             await self._schedule_execution(event)
```
🧹 Nitpick comments (2)
backend/app/dlq/manager.py (1)
389-416: Consider batch concurrency for large retry operations.The sequential
for event_id in event_idsloop processes retries one at a time. For large batches, this could be slow. Consider usingasyncio.gatherwith a semaphore to limit concurrency:async def retry_messages_batch(self, event_ids: list[str], max_concurrent: int = 10) -> DLQBatchRetryResult: semaphore = asyncio.Semaphore(max_concurrent) async def retry_with_limit(event_id: str) -> DLQRetryResult: async with semaphore: try: success = await self.retry_message_manually(event_id) return DLQRetryResult(event_id=event_id, status="success" if success else "failed", error=None if success else "Retry failed") except Exception as e: return DLQRetryResult(event_id=event_id, status="failed", error=str(e)) results = await asyncio.gather(*[retry_with_limit(eid) for eid in event_ids]) # ... aggregate resultsbackend/app/services/k8s_worker/worker.py (1)
219-229: Reasonable runtime type narrowing pattern.The wrapper methods accept
DomainEvent(the union type) and useassert isinstance(...)to narrow to the specific event type. This is acceptable since the dispatcher guarantees routing byevent_type, making the assertion a defensive check rather than validation logic.However, if assertions are disabled (
python -O), this would silently pass incorrect types. Consider using an explicit check with an exception if stricter runtime safety is desired.♻️ Optional: Explicit type check instead of assert
async def _handle_create_pod_command_wrapper(self, event: DomainEvent) -> None: """Wrapper for handling CreatePodCommandEvent with type safety.""" - assert isinstance(event, CreatePodCommandEvent) + if not isinstance(event, CreatePodCommandEvent): + raise TypeError(f"Expected CreatePodCommandEvent, got {type(event).__name__}") self.logger.info(f"Processing create_pod_command for execution {event.execution_id} from saga {event.saga_id}") await self._handle_create_pod_command(event)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
backend/.env.test
backend/app/dlq/manager.py
backend/app/events/consumer_group_monitor.py
backend/app/events/core/consumer.py
backend/app/events/core/types.py
backend/app/events/event_store_consumer.py
backend/app/services/coordinator/coordinator.py
backend/app/services/event_bus.py
backend/app/services/k8s_worker/worker.py
backend/app/services/notification_service.py
backend/app/services/result_processor/processor.py
backend/app/services/saga/saga_orchestrator.py
backend/app/services/sse/kafka_redis_bridge.py
backend/app/settings.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/events/event_store_consumer.py
- backend/app/settings.py
- backend/app/services/result_processor/processor.py
🧰 Additional context used
🧬 Code graph analysis (5)
backend/app/services/coordinator/coordinator.py (6)
backend/app/domain/enums/kafka.py (1)
GroupId(57-69)backend/app/domain/events/typed.py (2)
CreatePodCommandEvent(380-395)ExecutionRequestedEvent(46-60)backend/app/events/core/types.py (1)
ConsumerConfig(66-105)backend/app/events/core/dispatcher.py (1)
EventDispatcher(15-177)backend/app/events/core/consumer.py (2)
UnifiedConsumer(23-257)start(45-71)backend/app/events/schema/schema_registry.py (1)
SchemaRegistryManager(35-119)
backend/app/services/sse/kafka_redis_bridge.py (4)
backend/app/domain/enums/kafka.py (1)
GroupId(57-69)backend/app/settings.py (1)
Settings(9-170)backend/app/core/metrics/events.py (1)
EventMetrics(4-209)backend/app/services/sse/redis_bus.py (1)
SSERedisBus(44-89)
backend/app/services/saga/saga_orchestrator.py (3)
backend/app/domain/events/typed.py (2)
EventMetadata(16-27)SagaCancelledEvent(349-358)backend/app/events/event_store_consumer.py (1)
_handle_event(111-119)backend/app/services/saga/base_saga.py (1)
BaseSaga(8-52)
backend/app/services/notification_service.py (1)
backend/app/domain/events/typed.py (3)
ExecutionCompletedEvent(93-99)ExecutionFailedEvent(102-110)ExecutionTimeoutEvent(113-119)
backend/app/services/k8s_worker/worker.py (6)
backend/app/domain/enums/kafka.py (1)
GroupId(57-69)backend/app/domain/events/typed.py (5)
CreatePodCommandEvent(380-395)DeletePodCommandEvent(398-404)ExecutionFailedEvent(102-110)ExecutionStartedEvent(78-83)PodCreatedEvent(133-137)backend/app/services/k8s_worker/config.py (1)
K8sWorkerConfig(8-35)backend/app/events/core/producer.py (2)
producer(56-57)UnifiedProducer(23-218)backend/app/events/core/consumer.py (1)
start(45-71)backend/app/services/idempotency/middleware.py (1)
start(263-271)
🪛 dotenv-linter (4.0.0)
backend/.env.test
[warning] 31-31: [UnorderedKey] The KAFKA_SESSION_TIMEOUT_MS key should go before the SSE_CONSUMER_POOL_SIZE key
(UnorderedKey)
[warning] 32-32: [UnorderedKey] The KAFKA_HEARTBEAT_INTERVAL_MS key should go before the KAFKA_SESSION_TIMEOUT_MS key
(UnorderedKey)
[warning] 33-33: [UnorderedKey] The KAFKA_REQUEST_TIMEOUT_MS key should go before the KAFKA_SESSION_TIMEOUT_MS key
(UnorderedKey)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Scan Backend
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Integration Tests
- GitHub Check: E2E Tests
🔇 Additional comments (30)
backend/app/services/notification_service.py (4)
22-27: LGTM! Import changes correctly migrate from the infrastructure Kafka events layer to the unified domain events as per the PR objectives.
648-660: LGTM! The unified event handler correctly uses the `DomainEvent` type hint and dispatches to specific typed handlers via `isinstance` checks. The defensive error handling and logging for unhandled event types are appropriate.
224-228: Good addition of Kafka consumer timeout settings. These configuration parameters improve consumer resilience by allowing environment-specific tuning. The Settings class defines all required attributes with appropriate defaults: `KAFKA_SESSION_TIMEOUT_MS` (30000ms), `KAFKA_HEARTBEAT_INTERVAL_MS` (3000ms), `KAFKA_MAX_POLL_INTERVAL_MS` (300000ms), and `KAFKA_REQUEST_TIMEOUT_MS` (40000ms).
635-636: LGTM! Safe handling of optional `resource_usage` with fallback to `0.0`. The `execution_time_wall_seconds` attribute exists in `ResourceUsageDomain` as a `float` with a default value of `0.0`, and the duration extraction and formatting are correct.
backend/app/dlq/manager.py (9)
69-70: LGTM! The initialization of `_dlq_events_topic` with proper prefix and `_event_metadata` for lifecycle events is clean and follows the domain event model pattern.
139-168: LGTM! The message processing flow with filters, storage, retry policy evaluation, and status updates is well-structured. The transition to nested `message.event.*` field access is consistent throughout.
170-183: LGTM! The upsert pattern using Beanie with nested field lookup is correct, and emitting the received event after storage ensures events are only published for successfully stored messages.
257-273: LGTM! The discard flow properly updates metrics, persists status, emits the lifecycle event, and logs the action.
275-304: LGTM! The monitor loop correctly uses `is_running` for graceful shutdown, has proper error backoff (60s sleep), and the Beanie query for scheduled messages is well-structured. Note that `from_attributes=True` is the correct approach for validating Beanie documents into Pydantic models.
360-372: LGTM with a note on error handling. The `_produce_dlq_event` method correctly uses schema registry for Avro serialization. The error handling that logs and swallows exceptions is appropriate for telemetry/lifecycle events—failure to emit these events should not disrupt core DLQ processing.
374-387: LGTM! Proper state validation guards against retrying messages in terminal states (DISCARDED, RETRIED), and the nested field query is consistent with the new model.
418-440: LGTM! The manual discard implementation mirrors the retry pattern with proper state validation and clear docstring.
443-484: LGTM! The factory function now properly configures Kafka client settings from the application settings, including session timeout, heartbeat interval, max poll interval, and request timeout. The producer configuration with `enable_idempotence=True` ensures exactly-once semantics for retried messages.
backend/.env.test (1)
43-45: LGTM! Clear documentation of OpenTelemetry disabling mechanism. The updated comments effectively explain the dual safety mechanism (empty endpoint + OTEL_SDK_DISABLED flag) for disabling OpenTelemetry in tests. This improves developer understanding of the test configuration.
backend/app/services/sse/kafka_redis_bridge.py (3)
74-77: LGTM! Kafka timeout settings are now configurable. The consumer configuration now correctly uses settings-based timeout values (`KAFKA_MAX_POLL_INTERVAL_MS`, `KAFKA_SESSION_TIMEOUT_MS`, `KAFKA_HEARTBEAT_INTERVAL_MS`, `KAFKA_REQUEST_TIMEOUT_MS`) instead of hardcoded defaults, enabling environment-specific tuning.
91-92: LGTM! Centralized topic subscriptions. Using `CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.WEBSOCKET_GATEWAY]` aligns with the PR's goal of centralizing consumer group→topic mappings and ensures the SSE bridge subscribes to the same topics as the WebSocket gateway.
119-126: LGTM! DomainEvent type migration. The `route_event` handler now correctly accepts `DomainEvent`, consistent with the unified event model. The `model_dump()` call and `execution_id` extraction remain functionally equivalent.
backend/app/services/saga/saga_orchestrator.py (3)
122-126: LGTM! Kafka timeout settings centralized. The consumer configuration now correctly sources timeout values from settings, providing consistency with other consumers in the system.
155-183: LGTM! DomainEvent type migration in event handlers. The `_handle_event` and `_should_trigger_saga` methods now correctly use the `DomainEvent` type, aligning with the unified event model. The logic remains unchanged.
185-239: LGTM! Saga trigger and execution methods updated. The `_start_saga` and `_execute_saga` methods now accept `DomainEvent` as the trigger event type. The `getattr(trigger_event, "execution_id", None)` pattern safely handles dynamic attribute access on the union type.
backend/app/services/coordinator/coordinator.py (2)
115-118: LGTM! Kafka timeout settings centralized. The consumer configuration now correctly uses settings-based timeout values, consistent with other consumers in the codebase.
161-161: LGTM! Centralized topic subscriptions. Using `CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.EXECUTION_COORDINATOR]` correctly centralizes topic management.
backend/app/services/k8s_worker/worker.py (2)
111-114: LGTM! Kafka timeout settings centralized. The consumer configuration now correctly uses settings-based timeout values, consistent with other consumers.
142-143: LGTM! Centralized topic subscriptions. Using `CONSUMER_GROUP_SUBSCRIPTIONS[GroupId.K8S_WORKER]` aligns with the centralized subscription model.
backend/app/events/consumer_group_monitor.py (2)
150-150: LGTM! Storing the settings reference enables consistent timeout configuration for the lag-monitor consumer created in `_get_consumer_group_lag`.
382-390: LGTM! The timeout configuration is now consistent with the centralized settings. For this ephemeral lag-monitoring consumer, the `request_timeout_ms` is the most relevant setting for controlling how long offset fetch operations wait.
backend/app/events/core/consumer.py (4)
15-15: LGTM! Import correctly updated to use `DomainEvent` from the unified domain events module as part of the migration.
41-41: LGTM! Type annotation correctly updated to `DomainEvent`, consistent with the deserialized event type used when invoking the callback (line 192).
194-195: LGTM! The callback signature is correctly updated to use `DomainEvent`, consistent with the field type annotation on line 41.
60-60: LGTM! Adding `request_timeout_ms` provides explicit control over broker response timeouts, complementing the existing session/heartbeat/poll interval settings. The field is properly defined in `ConsumerConfig` with a 40-second default for production use.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="docs/testing/kafka-test-stability.md">
<violation number="1" location="docs/testing/kafka-test-stability.md:28">
P2: Broken MkDocs snippet directive. The `--8<--` syntax is used by pymdownx.snippets to include external code. Adding spaces (`--8 < --`) breaks this directive and the code snippet will not be rendered.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docs/testing/kafka-test-stability.md:
- Line 28: The snippet inclusion marker currently has spaces around the '<'
operator (`--8 < --`); remove the spaces so it uses the standard marker `--8<--
"..."` (i.e., change `--8 < --` to `--8<-- "..."`) so the snippet will be
correctly included in the rendered documentation.
🧹 Nitpick comments (2)
docs/reference/environment-variables.md (1)
58-61: Consider documenting timeout relationships. While the defaults are correct, consider adding a brief note explaining the required relationships between these Kafka timeout values for users who need to customize them. For example:
- `heartbeat_interval_ms` should typically be ≤ 1/3 of `session_timeout_ms`
- `request_timeout_ms` should be > `session_timeout_ms`
- `max_poll_interval_ms` should be > `session_timeout_ms`
You could add a note below the Kafka Configuration table:
```markdown
## Kafka Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| ...      | ...     | ...         |

> **Note:** When customizing Kafka timeouts, ensure: `heartbeat_interval_ms` ≤ `session_timeout_ms` / 3,
> `request_timeout_ms` > `session_timeout_ms`, and `max_poll_interval_ms` > `session_timeout_ms`.
```

docs/testing/kafka-test-stability.md (1)
85-87: Fix line break within sentence.The word "or" is isolated on line 86, breaking sentence flow.
📝 Proposed fix
```diff
-The `request_timeout_ms` parameter in aiokafka defaults to **40000ms** (40 seconds). When the Kafka coordinator is slow
-or
-unresponsive during test teardown, the consumer waits the full timeout before giving up.
+The `request_timeout_ms` parameter in aiokafka defaults to **40000ms** (40 seconds). When the Kafka coordinator is slow or unresponsive during test teardown, the consumer waits the full timeout before giving up.
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/reference/environment-variables.md
docs/testing/kafka-test-stability.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/testing/kafka-test-stability.md
25-25: Spaces inside link text
(MD039, no-space-in-links)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: E2E Tests
- GitHub Check: Integration Tests
🔇 Additional comments (3)
docs/reference/environment-variables.md (1)
46-62: Documentation updates verified—all Kafka defaults match code. All documented Kafka configuration defaults (KAFKA_BOOTSTRAP_SERVERS, SCHEMA_REGISTRY_URL, SCHEMA_REGISTRY_AUTH, ENABLE_EVENT_STREAMING, EVENT_RETENTION_DAYS, KAFKA_TOPIC_PREFIX, KAFKA_GROUP_SUFFIX, KAFKA_CONSUMER_GROUP_ID, KAFKA_AUTO_OFFSET_RESET, KAFKA_ENABLE_AUTO_COMMIT, KAFKA_SESSION_TIMEOUT_MS, KAFKA_HEARTBEAT_INTERVAL_MS, KAFKA_MAX_POLL_INTERVAL_MS, KAFKA_REQUEST_TIMEOUT_MS, KAFKA_MAX_POLL_RECORDS) match their corresponding values in backend/app/settings.py. The three new timeout variables are properly defined in code, follow Kafka best practices (heartbeat < session < request), and the explicit "(ms)" unit annotations improve clarity.
docs/testing/kafka-test-stability.md (2)
71-130: Excellent documentation addition! The new "Consumer teardown delays" section is comprehensive and well-structured. It provides:
- Clear problem statement with concrete error examples
- Root cause explanation with external issue reference
- Practical configuration fix
- Before/after metrics showing dramatic improvement (40s → <1s)
- Detailed timeout explanations table
This will be very helpful for developers encountering similar issues.
99-102: Kafka heartbeat interval at configuration boundary. The heartbeat interval (2000ms) equals exactly session_timeout_ms / 3 (6000 / 3), which is the outer edge of the recommended range (heartbeat.interval.ms ≲ 1/3 of session.timeout.ms). This leaves no margin if a heartbeat is missed before a rebalance triggers. Adjust to `KAFKA_HEARTBEAT_INTERVAL_MS=1800` or `KAFKA_SESSION_TIMEOUT_MS=6500` to add a safety margin.
1 issue found across 11 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/app/events/core/producer.py">
<violation number="1" location="backend/app/events/core/producer.py:83">
P3: The `client_id` format string is duplicated. Consider storing it as an instance variable during `_on_start()` or as a property to ensure consistency between the actual producer configuration and what `get_status()` reports.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| "batch_size": self._config.batch_size, | ||
| "compression_type": self._config.compression_type, | ||
| "bootstrap_servers": self._settings.KAFKA_BOOTSTRAP_SERVERS, | ||
| "client_id": f"{self._settings.SERVICE_NAME}-producer", |
P3: The client_id format string is duplicated. Consider storing it as an instance variable during _on_start() or as a property to ensure consistency between the actual producer configuration and what get_status() reports.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/app/events/core/producer.py, line 83:
<comment>The `client_id` format string is duplicated. Consider storing it as an instance variable during `_on_start()` or as a property to ensure consistency between the actual producer configuration and what `get_status()` reports.</comment>
<file context>
@@ -62,28 +61,26 @@ async def _on_start(self) -> None:
- "batch_size": self._config.batch_size,
- "compression_type": self._config.compression_type,
+ "bootstrap_servers": self._settings.KAFKA_BOOTSTRAP_SERVERS,
+ "client_id": f"{self._settings.SERVICE_NAME}-producer",
},
"metrics": {
</file context>



Summary by cubic
Unifies the two event systems into one Avro-backed domain event model, removing the infrastructure/kafka/events layer. Simplifies serialization, routing, DLQ handling, and tests; adds DLQ telemetry events, configurable Kafka timeouts, and small SSE shutdown hooks.
Refactors
Migration
Written for commit d4a7bec. Summary will update on new commits.
Summary by CodeRabbit
New Features
Improvements
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.