30 changes: 24 additions & 6 deletions backend/AGENTS.md
@@ -40,10 +40,7 @@ See `dev/guidelines/backend/python.md` for detailed coding standards including:

## Testing

- Unit tests: no external dependencies except database
- Integration tests: require Neo4j via testcontainers
- Test files mirror source: `infrahub/core/node.py` → `tests/unit/core/test_node.py`
- Async tests auto-configured via pytest-asyncio
See `dev/knowledge/backend/testing.md` for detailed testing infrastructure documentation.

## Boundaries

@@ -68,5 +68,26 @@ See `dev/guidelines/backend/python.md` for detailed coding standards including:

## See Also

- `dev/guidelines/backend/python.md` - Detailed Python coding standards
- `dev/knowledge/backend/` - Backend architecture documentation (to be created)
### Guidelines

- `dev/guidelines/backend/python.md` - Python coding standards

### Knowledge (How the system works)

- `dev/knowledge/backend/architecture.md` - Backend architecture overview
- `dev/knowledge/backend/testing.md` - Testing infrastructure and patterns
- `dev/knowledge/backend/events.md` - Events system
- `dev/knowledge/backend/async-tasks.md` - Asynchronous tasks (Prefect)
- `dev/knowledge/backend/message-bus.md` - Message bus system

### Guides (How to do X)

- `dev/guides/backend/creating-events.md` - Creating new events
- `dev/guides/backend/creating-async-tasks.md` - Creating async tasks
- `dev/guides/backend/creating-messages.md` - Creating message bus messages

### ADRs (Why we decided)

- `dev/adr/0002-events-system.md` - Events system design
- `dev/adr/0003-asynchronous-tasks.md` - Async tasks design
- `dev/adr/0004-message-bus.md` - Message bus design
49 changes: 34 additions & 15 deletions dev/README.md
@@ -25,6 +25,7 @@ Internal documentation for contributors. For user-facing docs, see `/docs/`.
- `backend/` - Backend architecture and patterns
- `frontend/` - Frontend architecture and patterns
- **guides/**: Step-by-step procedures for specific tasks.
- `backend/` - Backend-specific guides (events, tasks, messages)
- `docs/` - Documentation-specific guides
- `frontend/` - Frontend-specific guides
- **adr/**: Architecture Decision Records. Why we chose what we chose.
@@ -43,22 +44,40 @@ Mark deprecated docs clearly. Don't delete—update with pointers to replacement

## Current Guidelines

- **Repository Organization**: `guidelines/repository-organization.md` - How to organize content in dev/
- **Python Backend**: `guidelines/backend/python.md`
- **TypeScript Frontend**: `guidelines/frontend/typescript.md`
- **Git Workflow**: `guidelines/git-workflow.md`
- **Markdown Formatting**: `guidelines/markdown.md`
- **Writing Documentation**: `guidelines/documentation.md` - How to write user-facing documentation
- **Repository Organization**: [guidelines/repository-organization.md](guidelines/repository-organization.md) - How to organize content in dev/
- **Python Backend**: [guidelines/backend/python.md](guidelines/backend/python.md)
- **TypeScript Frontend**: [guidelines/frontend/typescript.md](guidelines/frontend/typescript.md)
- **Git Workflow**: [guidelines/git-workflow.md](guidelines/git-workflow.md)
- **Markdown Formatting**: [guidelines/markdown.md](guidelines/markdown.md)
- **Writing Documentation**: [guidelines/documentation.md](guidelines/documentation.md) - How to write user-facing documentation

## Current Knowledge

Backend architecture documentation in [knowledge/backend/](knowledge/backend/):

- [architecture.md](knowledge/backend/architecture.md) - Backend architecture overview
- [testing.md](knowledge/backend/testing.md) - Testing infrastructure and patterns
- [events.md](knowledge/backend/events.md) - Events system
- [async-tasks.md](knowledge/backend/async-tasks.md) - Asynchronous tasks (Prefect)
- [message-bus.md](knowledge/backend/message-bus.md) - Message bus system

## Current Guides

Backend guides in [guides/backend/](guides/backend/):

- [creating-events.md](guides/backend/creating-events.md) - How to create new events
- [creating-async-tasks.md](guides/backend/creating-async-tasks.md) - How to create async tasks
- [creating-messages.md](guides/backend/creating-messages.md) - How to create message bus messages

## Current Commands

Available agent commands in `commands/`:
Available agent commands in [commands/](commands/):

- `_shared.md` - Shared instructions for all flows
- `new-component.md` - React component creation flow
- `guided-task.md` - General task flow
- `add-docs.md` - Documentation flow
- `fix-bug.md` - Bug fixing flow
- `fix-github-issue.md` - GitHub issue fixing
- `fix-mypy-module.md` - Mypy type fixes
- `fix-ruff-rule.md` - Ruff linting fixes
- [_shared.md](commands/_shared.md) - Shared instructions for all flows
- [new-component.md](commands/new-component.md) - React component creation flow
- [guided-task.md](commands/guided-task.md) - General task flow
- [add-docs.md](commands/add-docs.md) - Documentation flow
- [fix-bug.md](commands/fix-bug.md) - Bug fixing flow
- [fix-github-issue.md](commands/fix-github-issue.md) - GitHub issue fixing
- [fix-mypy-module.md](commands/fix-mypy-module.md) - Mypy type fixes
- [fix-ruff-rule.md](commands/fix-ruff-rule.md) - Ruff linting fixes
4 changes: 2 additions & 2 deletions dev/adr/0001-context-nuggets-pattern.md
@@ -108,5 +108,5 @@ We adopt the **Context Nuggets** pattern for organizing our repository, based on
## References

- Repository Organization for AI-Assisted Development @ OpsMill
- `dev/guidelines/repository-organization.md` - Detailed guidelines for organizing content
- `dev/README.md` - Quick navigation guide
- [Repository Organization](../guidelines/repository-organization.md) - Detailed guidelines for organizing content
- [Dev README](../README.md) - Quick navigation guide
100 changes: 100 additions & 0 deletions dev/adr/0002-events-system.md
@@ -0,0 +1,100 @@
# 2. Prefect Events System

**Status:** Accepted
**Date:** 2024-12-26
**Author:** @opsmill-team

## Context

Infrahub requires an event-driven architecture to support:

1. **Event Emission**: Notify the system when state changes occur (node mutations, branch operations, schema updates)
2. **Event Querying**: Provide queryable and filterable events for debugging, monitoring, and audit trails
3. **Automation Triggers**: Enable automated actions and workflows based on event patterns
4. **Rich Metadata**: Include sufficient context for complex filtering and routing decisions

The system must handle both internal operational events (infrastructure-level tasks like git sync, registry updates) and user-visible events (for automation and observability).

## Decision

We implement a **Prefect Events-based system** as the foundation for event-driven architecture, leveraging Prefect's built-in storage, querying, and automation capabilities.

### Dual-Channel Dispatch

Events are dispatched through two channels simultaneously:

1. **Message Bus (RabbitMQ/NATS)**: For internal operational tasks
- Point-to-point or broadcast communication
- Handles specific operational tasks (git repository sync, registry updates)
- Only certain events send messages, via the `get_messages()` method

2. **Prefect Events**: For user-visible automation and audit trails
- All `InfrahubEvent` instances are sent to Prefect
- Stored in Prefect database and queryable via APIs
- Trigger Prefect Automation workflows
- Include rich metadata for flexible filtering

### Event Structure

Events follow Prefect's model with Infrahub-specific extensions:

- **Event Name**: Namespaced under `infrahub.*` (e.g., `infrahub.node.created`, `infrahub.branch.merged`)
- **Resource**: Primary identifier with metadata (node ID, kind, branch)
- **Related Resources**: Additional context (account, branch, parent events, related nodes)
- **Payload**: Event-specific data including changelog information and context

### Core Implementation

The `InfrahubEventService` adapter handles dual dispatch:

```python
async def send(self, event: InfrahubEvent) -> None:
    tasks = [self._send_bus(event=event), self._send_prefect(event=event)]
    await asyncio.gather(*tasks)
```

Events inherit from `InfrahubEvent` and implement:

- `event_name`: Class variable defining the event namespace
- `get_resource()`: Returns primary resource metadata
- `get_related()`: Returns additional context (optional override)
- `get_messages()`: Returns message bus messages (optional override)
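These hooks can be sketched with a minimal, self-contained subclass. Everything below is illustrative: the stand-in base classes and the `NodeCreatedEvent` name, fields, and resource keys are assumptions made for the example, not the real definitions in `backend/infrahub/events/`.

```python
from dataclasses import dataclass
from typing import ClassVar


class InfrahubMessage:
    """Illustrative stand-in for a message-bus message."""


@dataclass
class InfrahubEvent:
    """Illustrative stand-in for the real base event class."""

    def get_resource(self) -> dict[str, str]:
        raise NotImplementedError

    def get_related(self) -> list[dict[str, str]]:
        # Optional override: additional context (account, branch, parents).
        return []

    def get_messages(self) -> list[InfrahubMessage]:
        # Optional override: only some events also emit bus messages.
        return []


@dataclass
class NodeCreatedEvent(InfrahubEvent):
    event_name: ClassVar[str] = "infrahub.node.created"
    node_id: str = ""
    kind: str = ""
    branch: str = "main"

    def get_resource(self) -> dict[str, str]:
        # Primary resource metadata used for filtering and routing.
        return {
            "infrahub.node.id": self.node_id,
            "infrahub.node.kind": self.kind,
            "infrahub.branch.name": self.branch,
        }


event = NodeCreatedEvent(node_id="abc123", kind="InfraDevice")
print(event.get_resource()["infrahub.node.kind"])  # InfraDevice
```

Because `get_related()` and `get_messages()` default to empty lists, an event only pays for the context it actually provides.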

## Consequences

### Positive

- **Mature infrastructure**: Leverages Prefect's battle-tested event system
- **Built-in storage**: Events stored in Prefect database with retention policies
- **Powerful querying**: Filter events via Prefect Automation rules and APIs
- **Workflow integration**: Direct integration with existing Prefect-based task execution
- **Rich metadata**: Support for parent-child relationships, related resources, and flexible attributes
- **Multiple access methods**: GraphQL, REST API, and Prefect Client for event querying
- **Audit capabilities**: Complete event history for compliance and debugging

### Negative

- **Prefect coupling**: Tight dependency on Prefect infrastructure
- **Storage overhead**: All events stored in Prefect database increases storage requirements
- **Learning curve**: Developers must understand Prefect Automation concepts
- **Availability dependency**: Event querying requires Prefect server availability
- **Dual-system complexity**: Managing both message bus and Prefect events adds complexity

### Neutral

- **Event model adoption**: Following Prefect's event model provides consistency but requires adaptation for Infrahub-specific needs

## Implementation Notes

Key implementation locations:

- Event definitions: `backend/infrahub/events/`
- Service adapter: `backend/infrahub/services/adapters/event/`
- Trigger models: `backend/infrahub/trigger/models.py`
- Trigger setup: `backend/infrahub/trigger/setup.py`
- GraphQL queries: `backend/infrahub/graphql/queries/event.py`

See also:

- [Events Knowledge](../knowledge/backend/events.md) - How the event system works
- [Creating Events Guide](../guides/backend/creating-events.md) - How to create a new event
92 changes: 92 additions & 0 deletions dev/adr/0003-asynchronous-tasks.md
@@ -0,0 +1,92 @@
# 3. Asynchronous Tasks Execution with Prefect

**Status:** Accepted
**Date:** 2024-12-26
**Author:** @opsmill-team

## Context

Infrahub requires an asynchronous task execution framework to support:

1. **Task Reporting & Execution Tracking**: Centralized management of flow runs, state tracking, and log aggregation
2. **Asynchronous Execution**: Support for multiple worker types with future Kubernetes/Docker extensibility
3. **Event-Driven Automation**: React to system state changes with automated workflows
4. **Distributed Execution**: Scale task processing across multiple workers

The system must handle both internal infrastructure operations (branch merges, schema migrations) and user-defined workflows (transforms, generators) while maintaining observability and testability.

## Decision

We adopt **Prefect** as the asynchronous task orchestration framework. This pure-Python implementation enables seamless embedding within the application while simplifying testing and deployment.

### Core Architecture

The task system consists of:

- **Prefect Server**: Central orchestration hub for flow management
- **InfrahubWorkerAsync**: Custom async workers for task execution
- **WorkflowDefinition**: Declarative configurations registered in a catalogue
- **Flow Functions**: Async business logic decorated with `@flow`
- **Task Functions**: Discrete work units decorated with `@task`

### Declarative Workflow Catalogue

We adopt a declarative model where all workflows are centralized in a single catalogue. Every flow must be declared as a `WorkflowDefinition` in this catalogue, specifying its name, type, module path, and optional scheduling/concurrency configuration.

This centralized approach provides:

- **Single source of truth**: All available workflows discoverable in one location
- **Consistent configuration**: Uniform structure for workflow metadata and behavior
- **Automatic deployment**: Workflows are deployed to Prefect on system initialization
- **Extensibility**: Enterprise extensions can inject additional workflows via dependency injection

### Workflow Adapter Pattern

The system decouples execution from implementation:

- **WorkflowWorkerExecution**: Production mode submitting flows to Prefect servers
- **WorkflowLocalExecution**: Testing mode executing flows in-process without infrastructure

This pattern enables identical code across environments while avoiding Prefect dependencies in unit tests.
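The local-execution side of the pattern can be sketched as follows, with hypothetical names (the real adapters live under `backend/infrahub/workflows/`):

```python
import asyncio
from typing import Any, Awaitable, Callable, Protocol


class WorkflowExecution(Protocol):
    """Common interface shared by worker-backed and in-process execution."""

    async def execute(self, flow: Callable[..., Awaitable[Any]], **kwargs: Any) -> Any: ...


class WorkflowLocalExecution:
    """Testing mode: run the flow in-process, no Prefect server required."""

    async def execute(self, flow: Callable[..., Awaitable[Any]], **kwargs: Any) -> Any:
        return await flow(**kwargs)


async def merge_branch(branch: str) -> str:
    # Stand-in for a real @flow-decorated function.
    return f"merged {branch}"


result = asyncio.run(WorkflowLocalExecution().execute(merge_branch, branch="feature-1"))
print(result)  # merged feature-1
```

A production `WorkflowWorkerExecution` would implement the same protocol but submit the flow to a Prefect server instead of awaiting it directly, so calling code is identical in both environments.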

## Consequences

### Positive

- **Robust observability**: Centralized logging, state tracking, and execution history via Prefect UI/API
- **Scalable execution**: Distributed task processing across multiple workers
- **Pure-Python simplicity**: No external DSLs or configuration languages
- **Testability**: Local execution mode enables unit testing without infrastructure
- **Event integration**: Workflows can be triggered by Prefect Events (see ADR-0002)
- **Concurrency control**: Built-in support for concurrency limits and collision strategies
- **Cron scheduling**: Native support for scheduled workflow execution
- **Centralized catalogue**: Single location for all workflow definitions improves discoverability and maintainability
- **Import isolation**: Using string module paths in the catalogue avoids circular import issues that would occur if all workflows were imported in a single file

### Negative

- **Prefect dependency**: Tight coupling to Prefect infrastructure
- **Worker management**: Requires deploying and managing worker processes
- **Deployment complexity**: More complex than in-process task handling
- **Learning curve**: Developers must understand Prefect concepts (flows, tasks, deployments)

### Neutral

- **Tagging system**: Workflows receive metadata tags for organization and filtering
- **Dependency injection**: Services injected via `fast-depends` pattern

## Implementation Notes

Key implementation locations:

- Workflow definitions: [`backend/infrahub/workflows/catalogue.py`](../../../backend/infrahub/workflows/catalogue.py)
- Workflow models: [`backend/infrahub/workflows/models.py`](../../../backend/infrahub/workflows/models.py)
- Constants & types: [`backend/infrahub/workflows/constants.py`](../../../backend/infrahub/workflows/constants.py)
- Initialization: [`backend/infrahub/workflows/initialization.py`](../../../backend/infrahub/workflows/initialization.py)
- Task functions: Various `tasks.py` files across the codebase

See also:

- [Async Tasks Knowledge](../knowledge/backend/async-tasks.md) - How the async task system works
- [Creating Workflows Guide](../guides/backend/creating-async-tasks.md) - How to create a new workflow
- [ADR-0002: Events System](0002-events-system.md) - Event-driven automation triggers
89 changes: 89 additions & 0 deletions dev/adr/0004-message-bus.md
@@ -0,0 +1,89 @@
# 4. Message Bus Architecture

**Status:** Accepted
**Date:** 2024-12-26
**Author:** @opsmill-team

## Context

Infrahub requires asynchronous inter-component communication to support:

1. **Loose Coupling**: Components should communicate without direct dependencies
2. **Horizontal Scaling**: Workers should scale independently
3. **Rapid Response Operations**: Some operations require faster response times than workflow-based execution
4. **Broadcast Communication**: Ability to notify multiple workers simultaneously (e.g., registry refresh)

Historically, the message bus handled both task execution and broadcasts. With the adoption of Prefect (see ADR-0003), most asynchronous work migrated to workflows. The message bus now focuses on operations requiring rapid responses and broadcast notifications.

## Decision

We implement a message bus using **RabbitMQ** (primary) or **NATS** (alternative) for asynchronous inter-component communication.

### Core Principles

- **Typed Messages**: All messages are Pydantic models extending `InfrahubMessage`
- **Declarative Registration**: Messages map to routing keys via centralized `MESSAGE_MAP`
- **Two Messaging Patterns**: Support both broadcast (one-to-many) and RPC (request/reply)
- **Loop Prevention**: Initiator tracking prevents workers from processing their own broadcasts

### Message Patterns

**Broadcast Messaging**: One-to-many distribution using routing key patterns

- Used for notifications that all workers should process (e.g., `refresh.registry.branches`)
- Workers ignore messages originating from themselves via `initiator_id` check

**Request/Reply (RPC)**: Synchronous-style calls with response correlation

- Used when a response is required (e.g., `git.file.get`)
- Responses correlated via `correlation_id` in message metadata

### Scope of Message Bus vs Workflows

| Use Case | Message Bus | Prefect Workflow |
|----------|-------------|------------------|
| Rapid response needed (< 1s) | Yes | No |
| Broadcast to all workers | Yes | No |
| Long-running operations (> 1s) | No | Yes |
| Needs execution tracking in UI | No | Yes |
| Database modifications | Avoid | Yes |

## Consequences

### Positive

- **Decoupled communication**: Components interact without direct dependencies
- **Horizontal scaling**: Workers scale independently based on message load
- **Non-blocking execution**: Operations don't block HTTP request handling
- **Automatic retries**: Built-in retry logic with exponential backoff
- **Multiple broker support**: RabbitMQ and NATS implementations available
- **Priority support**: Messages can be prioritized (1-5 scale)

### Negative

- **Infrastructure overhead**: Requires deploying and managing message broker
- **Delivery guarantees**: Message ordering and delivery require careful configuration
- **Debugging complexity**: Distributed message flow harder to trace than direct calls (mitigated when distributed tracing is enabled)
- **Monitoring needs**: Requires dedicated observability for message queues
- **Potential message loss**: Without proper configuration, messages may be lost

### Neutral

- **Reduced scope**: Most task execution now handled by Prefect workflows
- **Specialized use cases**: Message bus reserved for specific patterns (broadcast, rapid RPC)

## Implementation Notes

Key implementation locations:

- Base message classes: `backend/infrahub/message_bus/__init__.py`
- Message definitions: `backend/infrahub/message_bus/messages/`
- Operation handlers: `backend/infrahub/message_bus/operations/`
- Message registry: `backend/infrahub/message_bus/messages/__init__.py`

See also:

- [Message Bus Knowledge](../knowledge/backend/message-bus.md) - How the message bus works
- [Creating Messages Guide](../guides/backend/creating-messages.md) - How to create a new message
- [ADR-0002: Events System](0002-events-system.md) - Event system (uses message bus for internal dispatch)
- [ADR-0003: Asynchronous Tasks](0003-asynchronous-tasks.md) - Workflow system (preferred for most async operations)