Commit 5afd5c3

Update the architecture overview
1 parent 2ff8a77 commit 5afd5c3

1 file changed

+56
-43
lines changed

docs/platform/understanding-airbyte/high-level-view.md

Lines changed: 56 additions & 43 deletions
@@ -4,42 +4,18 @@ description: A high level view of Airbyte's components.
44

55
# Architecture overview
66

7-
Airbyte is conceptually composed of two parts: platform and connectors.
7+
Think of Airbyte as two things:
88

9-
The platform provides all the horizontal services required to configure and run data movement operations e.g: the UI, configuration API, job scheduling, logging, alerting, etc. and is structured as a set of microservices.
9+
- The platform
10+
- Connectors
1011

11-
Connectors are independent modules which push/pull data to/from sources and destinations. Connectors are built in accordance with the [Airbyte Specification](./airbyte-protocol.md), which describes the interface with which data can be moved between a source and a destination using Airbyte. Connectors are packaged as Docker images, which allows total flexibility over the technologies used to implement them.
12+
The platform provides all the horizontal services required to configure and run data movement operations. This includes the UI, API, job scheduling, logging, alerting, etc. These functions exist as a set of microservices.
1213

13-
## Data Transfer Modes
14+
Connectors are independent modules which push/pull data to/from sources and destinations. Connectors follow the [Airbyte Specification](./airbyte-protocol.md), which describes the interface with which Airbyte can move data between a source and a destination. Connectors are Docker images, which allows flexibility over the technologies used to implement them.
1415

15-
Airbyte supports two data transfer modes that are automatically selected based on connector capabilities:
16+
## Platform architecture
1617

17-
- **Socket Mode**: Records flow directly from source to destination via Unix domain sockets, enabling high-throughput parallel data transfer. A lightweight bookkeeper process handles control messages, state, and logs.
18-
- **Legacy Mode**: Records flow through an orchestrator middleware that sits between source and destination, using standard input/output streams.
19-
20-
Socket mode is used when both source and destination connectors support it, providing significantly higher performance for data movement operations.
21-
22-
### Data Flow Comparison
23-
24-
```mermaid
25-
---
26-
title: Data Transfer Modes
27-
---
28-
flowchart LR
29-
subgraph Legacy["Legacy Mode"]
30-
SRC1[Source] --> ORCH[Orchestrator] --> DEST1[Destination]
31-
end
32-
33-
subgraph Socket["Socket Mode"]
34-
SRC2[Source] -.->|control| BK[Bookkeeper]
35-
SRC2 ==>|records via sockets| DEST2[Destination]
36-
DEST2 -.->|state| BK
37-
end
38-
```
39-
40-
## Platform Architecture
41-
42-
A more concrete diagram of the platform orchestration can be seen below:
18+
This diagram describes platform orchestration at a high level.
4319

4420
```mermaid
4521
---
@@ -64,26 +40,63 @@ flowchart LR
6440
WL -->|queues workload| Q
6541
Q -->|reads from| L
6642
L -->|launches| OP
67-
O -->|reports status to| WL
43+
OP -->|reports status to| WL
6844
```
6945

46+
### Steady state operation
47+
7048
- **Config API Server** [`airbyte-server`, `airbyte-server-api`]: Airbyte's main controller and graphical user interface. All operations in Airbyte such as creating sources, destinations, connections, managing configurations, etc. are configured and invoked from the API.
71-
- **Database Config & Jobs** [`airbyte-db`]: Stores all the configuration \(credentials, frequency...\) and job history.
72-
- **Temporal Service** [`airbyte-temporal`]: Manages the scheduling and sequencing task queues and workflows.
73-
- **Worker** [`airbyte-worker`]: Reads from the task queues and executes the connection scheduling and sequencing logic, making calls to the workload API.
74-
- **Workload API** [`airbyte-workload-api-server`]: The HTTP interface for enqueuing workloads — the discrete pods that run the connector operations.
75-
- **Launcher** [`airbyte-workload-launcher`]: Consumes events from the workload API and interfaces with k8s to launch workloads.
7649

77-
### Data Transfer Middleware
50+
- **Database config & jobs** [`airbyte-db`]: stores all the configuration \(credentials, frequency...\) and job history.
51+
52+
- **Temporal service** [`airbyte-temporal`]: manages the scheduling and sequencing task queues and workflows.
53+
54+
- **Worker** [`airbyte-worker`]: reads from the task queues and executes the connection scheduling and sequencing logic, making calls to the workload API.
55+
56+
- **Workload API** [`airbyte-workload-api-server`]: the HTTP interface for enqueuing workloads, which are the discrete pods that run the connector operations.
57+
58+
- **Launcher** [`airbyte-workload-launcher`]: consumes events from the workload API and interfaces with k8s to launch workloads.
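
As a rough illustration of the flow in the diagram and component list above (workload API queues a workload, the launcher reads it and launches an operation pod, the pod reports status back), here is a minimal in-memory sketch. It is not Airbyte's implementation: the class, function names, and status strings are all hypothetical, and a plain Python queue stands in for the real queue and for Kubernetes.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class WorkloadApi:
    """Hypothetical in-memory stand-in for the workload API and its queue."""

    pending: "queue.Queue[str]" = field(default_factory=queue.Queue)
    status: dict = field(default_factory=dict)

    def enqueue(self, workload_id: str) -> None:
        # WL -->|queues workload| Q
        self.status[workload_id] = "pending"  # status names are assumptions
        self.pending.put(workload_id)

    def report_status(self, workload_id: str, status: str) -> None:
        # OP -->|reports status to| WL
        self.status[workload_id] = status


def run_operation_pod(api: WorkloadApi, workload_id: str) -> None:
    """Stand-in for a connector operation pod; it only reports success."""
    api.report_status(workload_id, "success")


def launcher_drain(api: WorkloadApi) -> list:
    """Stand-in for the launcher: read queued workloads and launch each one."""
    launched = []
    while not api.pending.empty():
        workload_id = api.pending.get()      # Q -->|reads from| L
        run_operation_pod(api, workload_id)  # L -->|launches| OP
        launched.append(workload_id)
    return launched
```

The point of the sketch is the direction of the arrows: the operation pod, not the launcher, is what reports status back to the workload API.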
59+
60+
### Additional components
61+
62+
- **Cron** [`airbyte-cron`]: cleans the server and sync logs (when using local logs). Regularly updates connector definitions and sweeps old workloads ensuring eventual consensus.
7863

79-
Within connector operation pods, Airbyte runs middleware containers to process connector output:
64+
- **Bootloader** [`airbyte-bootloader`]: upgrades and migrates database tables and confirms the environment is ready to work.
65+
66+
### Data transfer middleware
67+
68+
Airbyte supports two data transfer modes.
69+
70+
- **Socket mode**: Records flow directly from source to destination via Unix domain sockets. This is a high-throughput parallel data transfer. A lightweight bookkeeper process handles control messages, state, and logs.
71+
72+
- **Legacy mode**: Records flow through an orchestrator middleware that sits between source and destination, using standard input/output streams.
73+
74+
Airbyte selects the mode automatically, based on the capabilities of the connectors used in a connection. It uses socket mode when both source and destination connectors support it. Socket mode provides between four and ten times the performance of legacy mode.
75+
76+
Within connector operation pods, Airbyte runs middleware containers to process connector output.
8077

8178
- **Bookkeeper** [`airbyte-bookkeeper`]: Used in socket mode. Processes control messages, state, and logs while records flow directly between connectors via sockets.
79+
8280
- **Container Orchestrator** [`airbyte-container-orchestrator`]: Used in legacy mode. Sits between source and destination connectors, processing all data and control messages.
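
The selection rule and the middleware list above can be sketched as follows. This is a minimal sketch, not Airbyte's code: the `supports_socket_mode` flag and all class and function names are assumptions made for illustration, since the document does not say how a connector advertises its capabilities. Only the two container names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class TransferMode(Enum):
    SOCKET = "socket"
    LEGACY = "legacy"


@dataclass(frozen=True)
class Connector:
    # Hypothetical model: the real capability flag is not named in the docs.
    name: str
    supports_socket_mode: bool


def select_transfer_mode(source: Connector, destination: Connector) -> TransferMode:
    """Use socket mode only when both ends support it, else fall back to legacy."""
    if source.supports_socket_mode and destination.supports_socket_mode:
        return TransferMode.SOCKET
    return TransferMode.LEGACY


def middleware_for(mode: TransferMode) -> str:
    """Map a transfer mode to the middleware container listed above."""
    return {
        TransferMode.SOCKET: "airbyte-bookkeeper",
        TransferMode.LEGACY: "airbyte-container-orchestrator",
    }[mode]
```

For example, pairing a socket-capable source with a destination that lacks socket support yields `TransferMode.LEGACY`, so the pod would run the container orchestrator rather than the bookkeeper.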
8381

84-
The diagram shows the steady-state operation of Airbyte, there are components not described you'll see in your deployment:
82+
#### Data flow comparison
8583

86-
- **Cron** [`airbyte-cron`]: Clean the server and sync logs (when using local logs). Regularly updates connector definitions and sweeps old workloads ensuring eventual consenus.
87-
- **Bootloader** [`airbyte-bootloader`]: Upgrade and Migrate the Database tables and confirm the environment is ready to work.
84+
```mermaid
85+
---
86+
title: Legacy Mode
87+
---
88+
flowchart LR
89+
direction LR
90+
SRC1[Source] --> ORCH[Orchestrator] --> DEST1[Destination]
91+
```
8892

89-
This is a holistic high-level description of each component. For Airbyte deployed in Kubernetes the structure is very similar with a few changes.
93+
```mermaid
94+
---
95+
title: Socket Mode
96+
---
97+
flowchart LR
98+
direction LR
99+
SRC2[Source] -.->|control| BK[Bookkeeper]
100+
SRC2 ==>|records via sockets| DEST2[Destination]
101+
DEST2 -.->|state| BK[Bookkeeper]
102+
```
