docs/platform/understanding-airbyte/high-level-view.md
56 additions & 43 deletions
Original file line number
Diff line number
Diff line change
@@ -4,42 +4,18 @@ description: A high level view of Airbyte's components.
4
4
5
5
# Architecture overview
6
6
7
-
Airbyte is conceptually composed of two parts: platform and connectors.
7
+
Think of Airbyte as two things:
8
8
9
-
The platform provides all the horizontal services required to configure and run data movement operations e.g: the UI, configuration API, job scheduling, logging, alerting, etc. and is structured as a set of microservices.
9
+
- The platform
10
+
- Connectors
10
11
11
-
Connectors are independent modules which push/pull data to/from sources and destinations. Connectors are built in accordance with the [Airbyte Specification](./airbyte-protocol.md), which describes the interface with which data can be moved between a source and a destination using Airbyte. Connectors are packaged as Docker images, which allows total flexibility over the technologies used to implement them.
12
+
The platform provides all the horizontal services required to configure and run data movement operations. This includes the UI, API, job scheduling, logging, alerting, etc. These functions exist as a set of microservices.
12
13
13
-
## Data Transfer Modes
14
+
Connectors are independent modules which push/pull data to/from sources and destinations. Connectors follow the [Airbyte Specification](./airbyte-protocol.md), which describes the interface with which Airbyte can move data between a source and a destination. Connectors are Docker images, which allows flexibility over the technologies used to implement them.
14
15
15
-
Airbyte supports two data transfer modes that are automatically selected based on connector capabilities:
16
+
## Platform architecture
16
17
17
-
-**Socket Mode**: Records flow directly from source to destination via Unix domain sockets, enabling high-throughput parallel data transfer. A lightweight bookkeeper process handles control messages, state, and logs.
18
-
-**Legacy Mode**: Records flow through an orchestrator middleware that sits between source and destination, using standard input/output streams.
19
-
20
-
Socket mode is used when both source and destination connectors support it, providing significantly higher performance for data movement operations.
A more concrete diagram of the platform orchestration can be seen below:
18
+
This diagram describes platform orchestration at a high level.
43
19
44
20
```mermaid
45
21
---
@@ -64,26 +40,63 @@ flowchart LR
64
40
WL -->|queues workload| Q
65
41
Q -->|reads from| L
66
42
L -->|launches| OP
67
-
O -->|reports status to| WL
43
+
OP -->|reports status to| WL
68
44
```
69
45
46
+
### Steady state operation
47
+
70
48
- **Config API Server** [`airbyte-server`, `airbyte-server-api`]: Airbyte's main controller and graphical user interface. All operations in Airbyte such as creating sources, destinations, connections, managing configurations, etc. are configured and invoked from the API.
71
-
-**Database Config & Jobs**[`airbyte-db`]: Stores all the configuration \(credentials, frequency...\) and job history.
72
-
-**Temporal Service**[`airbyte-temporal`]: Manages the scheduling and sequencing task queues and workflows.
73
-
-**Worker**[`airbyte-worker`]: Reads from the task queues and executes the connection scheduling and sequencing logic, making calls to the workload API.
74
-
-**Workload API**[`airbyte-workload-api-server`]: The HTTP interface for enqueuing workloads — the discrete pods that run the connector operations.
75
-
-**Launcher**[`airbyte-workload-launcher`]: Consumes events from the workload API and interfaces with k8s to launch workloads.
76
49
77
-
### Data Transfer Middleware
50
+
- **Database config & jobs** [`airbyte-db`]: stores all the configuration \(credentials, frequency...\) and job history.
51
+
52
+
- **Temporal service** [`airbyte-temporal`]: manages the scheduling and sequencing task queues and workflows.
53
+
54
+
- **Worker** [`airbyte-worker`]: reads from the task queues and executes the connection scheduling and sequencing logic, making calls to the workload API.
55
+
56
+
- **Workload API** [`airbyte-workload-api-server`]: the HTTP interface for enqueuing workloads, which are the discrete pods that run the connector operations.
57
+
58
+
- **Launcher** [`airbyte-workload-launcher`]: consumes events from the workload API and interfaces with k8s to launch workloads.
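
The hand-off between the workload API, the launcher, and the operation pods can be hard to picture from the component list alone. The following is a minimal in-memory sketch of that flow, not Airbyte's implementation: the class `WorkloadApi`, the functions `run_operation_pod` and `launcher_drain`, the workload IDs, and the status strings are all hypothetical names chosen for illustration, and a plain Python queue stands in for the real queue.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class WorkloadApi:
    """Toy stand-in for the workload API: accepts workloads and tracks their status."""
    pending: queue.Queue = field(default_factory=queue.Queue)
    status: dict = field(default_factory=dict)

    def enqueue(self, workload_id: str) -> None:
        # In the described architecture, the worker makes this call.
        self.status[workload_id] = "pending"  # hypothetical status name
        self.pending.put(workload_id)

    def report(self, workload_id: str, state: str) -> None:
        # Operation pods report their status back to the workload API.
        self.status[workload_id] = state


def run_operation_pod(api: WorkloadApi, workload_id: str) -> None:
    """Toy operation pod: does no real work, only reports a final status."""
    api.report(workload_id, "succeeded")  # hypothetical status name


def launcher_drain(api: WorkloadApi) -> list[str]:
    """Toy launcher: consume queued workloads and 'launch' a pod for each one."""
    launched = []
    while not api.pending.empty():
        workload_id = api.pending.get()
        # A real launcher would ask Kubernetes to start a pod at this point.
        run_operation_pod(api, workload_id)
        launched.append(workload_id)
    return launched


api = WorkloadApi()
api.enqueue("sync-1")
api.enqueue("sync-2")
launched = launcher_drain(api)
```

The sketch keeps the queue inside the API object only for brevity; in the diagram the queue is a separate component between the workload API and the launcher.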
59
+
60
+
### Additional components
61
+
62
+
- **Cron** [`airbyte-cron`]: cleans the server and sync logs (when using local logs). Regularly updates connector definitions and sweeps old workloads ensuring eventual consensus.
78
63
79
-
Within connector operation pods, Airbyte runs middleware containers to process connector output:
64
+
- **Bootloader** [`airbyte-bootloader`]: upgrades and migrates database tables and confirms the environment is ready to work.
65
+
66
+
### Data transfer middleware
67
+
68
+
Airbyte supports two data transfer modes.
69
+
70
+
- **Socket mode**: Records flow directly from source to destination via Unix domain sockets, which enables high-throughput parallel data transfer. A lightweight bookkeeper process handles control messages, state, and logs.
71
+
72
+
- **Legacy mode**: Records flow through an orchestrator middleware that sits between source and destination, using standard input/output streams.
73
+
74
+
Airbyte selects the mode automatically, based on the capabilities of the connectors used in a connection. It uses socket mode when both source and destination connectors support it. Socket mode provides between four and ten times the performance of legacy mode.
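
The selection rule described above reduces to a single condition: socket mode only when both connectors support it, legacy mode otherwise. The following is a minimal sketch of that rule, not Airbyte's actual code. The `Connector` class, its `supports_socket` flag, and the function `select_transfer_mode` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TransferMode(Enum):
    SOCKET = "socket"
    LEGACY = "legacy"


@dataclass(frozen=True)
class Connector:
    # Hypothetical model: `supports_socket` is an assumed flag, not a real Airbyte field.
    name: str
    supports_socket: bool


def select_transfer_mode(source: Connector, destination: Connector) -> TransferMode:
    """Pick socket mode only when both ends support it; otherwise fall back to legacy."""
    if source.supports_socket and destination.supports_socket:
        return TransferMode.SOCKET
    return TransferMode.LEGACY


# Example with made-up connectors: one side lacking support forces legacy mode.
src = Connector(name="example-source", supports_socket=True)
dst = Connector(name="example-destination", supports_socket=False)
mode = select_transfer_mode(src, dst)
```

Because the fallback is the default branch, a connector with unknown or missing socket support would be treated as legacy in this sketch.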
75
+
76
+
Within connector operation pods, Airbyte runs middleware containers to process connector output.
80
77
81
78
- **Bookkeeper** [`airbyte-bookkeeper`]: Used in socket mode. Processes control messages, state, and logs while records flow directly between connectors via sockets.
79
+
82
80
- **Container Orchestrator** [`airbyte-container-orchestrator`]: Used in legacy mode. Sits between source and destination connectors, processing all data and control messages.
83
81
84
-
The diagram shows the steady-state operation of Airbyte, there are components not described you'll see in your deployment:
82
+
#### Data flow comparison
85
83
86
-
-**Cron**[`airbyte-cron`]: Clean the server and sync logs (when using local logs). Regularly updates connector definitions and sweeps old workloads ensuring eventual consenus.
87
-
-**Bootloader**[`airbyte-bootloader`]: Upgrade and Migrate the Database tables and confirm the environment is ready to work.