Multi-hop declarative data pipelines
Hoptimator is an SQL-based control plane for complex data pipelines.
Hoptimator turns high-level SQL _subscriptions_ into multi-hop data pipelines. Pipelines may involve an auto-generated Flink job (or similar) and any arbitrary resources required for the job to run.
## How does it work?
Hoptimator has a pluggable _adapter_ framework, which lets you wire up arbitrary data sources. Adapters loosely correspond to connectors in the underlying compute engine (e.g. Flink Connectors), but they may include custom control plane logic. For example, an adapter may create a cache or a CDC stream as part of a pipeline. This enables a single pipeline to span multiple "hops" across different systems (as opposed to, say, a single Flink job).
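For the Flink case, "configuring a connector" amounts to emitting table options like the following. This is standard Flink SQL with illustrative values; Hoptimator generates the equivalent automatically rather than requiring you to write it by hand:

```sql
-- Illustrative Flink connector configuration (values are hypothetical).
CREATE TABLE topic1 (
  `key` STRING,
  `value` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'topic1',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);
```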
Hoptimator's pipelines tend to have the following general shape:

(diagram omitted)

The three data sources on the left correspond to three different adapters:
1. `topic1` can be read directly from a Flink job, so the first adapter simply configures a Flink connector.
2. `table2` is inefficient for bulk access, so the second adapter creates a CDC stream (`topic2`) and configures a Flink connector to read from _that_.
3. `table3` is in cold storage, so the third adapter creates a reverse-ETL job to re-ingest the data into Kafka.
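Putting the three adapters together, the resulting pipeline has roughly this shape (a sketch reconstructed from the list above; the final sink is shown generically):

```
topic1 ────────────────────────────────┐
table2 ──(CDC)──▶ topic2 ──────────────┼──▶ Flink job ──▶ sink
table3 ──(reverse-ETL)──▶ Kafka topic ─┘
```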
To deploy such a pipeline, you only need to write one SQL query, called a _subscription_. Pipelines are constructed automatically based on subscriptions.
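As an illustrative sketch only (the table names and exact DDL here are hypothetical and may differ from Hoptimator's actual subscription syntax), a subscription is essentially a materialized SQL query over the sources above:

```sql
-- Hypothetical subscription: join a Kafka topic with a CDC'd table.
-- Hoptimator would plan and deploy the topics, connectors, and Flink job.
CREATE MATERIALIZED VIEW my_subscription AS
SELECT t1.`key`, t1.`value`, t2.enriched_value
FROM kafka.topic1 AS t1
JOIN db.table2 AS t2 ON t1.`key` = t2.`key`;
```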
## Quick Start
You can verify the job is running by inspecting the output:

```
> !q
```
## The Operator
Hoptimator-operator is a Kubernetes operator that orchestrates multi-hop data pipelines based on Subscriptions (a custom resource). When a Subscription is deployed, the operator:
1. creates a _plan_ based on the Subscription SQL. The plan includes a set of _resources_ that make up a _pipeline_.
2. deploys each resource in the pipeline. This may involve creating Kafka topics, Flink jobs, etc.
3. reports Subscription status, which depends on the status of each resource in the pipeline.
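Assuming field names purely for illustration (the actual CRD schema and API group may differ), a Subscription resource could look roughly like:

```yaml
# Hypothetical Subscription resource; apiVersion and field names are illustrative.
apiVersion: hoptimator.example.com/v1alpha1
kind: Subscription
metadata:
  name: my-subscription
spec:
  sql: >
    SELECT * FROM KAFKA.TOPIC1
```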
The operator is extensible via _adapters_. Among other responsibilities, adapters can implement custom control plane logic (see `ControllerProvider`), or they can depend on external operators. For example, the Kafka adapter actively manages Kafka topics using a custom controller. The Flink adapter defers to [flink-kubernetes-operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/) to manage Flink jobs.
## The CLI
Hoptimator includes a SQL CLI based on [sqlline](https://github.com/julianhyde/sqlline). This is primarily for testing and debugging purposes, but it can also be useful for running ad-hoc queries. The CLI leverages the same adapters as the operator, but it doesn't deploy anything. Instead, queries run as local, in-process Flink jobs.
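A session might look like the following (the query and output are illustrative; `!q` is sqlline's quit command):

```
> SELECT * FROM KAFKA.TOPIC1 LIMIT 5;  -- illustrative ad-hoc query
...
> !q
```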