Commit f748776

Update README.md (#26)
Merged without review.
1 parent 9569c43 commit f748776


README.md

Lines changed: 30 additions & 8 deletions
@@ -6,17 +6,27 @@ Multi-hop declarative data pipelines

Hoptimator is an SQL-based control plane for complex data pipelines.

-Hoptimator turns high-level SQL _subscriptions_ into low-level SQL
-_pipelines_. Pipelines may involve an auto-generated Flink job (or
-similar) and any arbitrary resources required for the job to run.
+Hoptimator turns high-level SQL _subscriptions_ into multi-hop data pipelines. Pipelines may involve an auto-generated Flink job (or similar) and any arbitrary resources required for the job to run.

## How does it work?

-Hoptimator has a pluggable _adapter_ framework, which lets you wire up
-arbitrary data sources. Adapters loosely correspond to "connectors"
-in the underlying compute engine (e.g. Flink Connectors), but they may
-bring along additional _baggage_. For example, an adapter may bring
-along a cache or a CDC stream as part of the resulting pipeline.
+Hoptimator has a pluggable _adapter_ framework, which lets you wire up arbitrary data sources. Adapters loosely correspond to connectors in the underlying compute engine (e.g. Flink Connectors), but they may include custom control plane logic. For example, an adapter may create a cache or a CDC stream as part of a pipeline. This enables a single pipeline to span multiple "hops" across different systems (as opposed to, say, a single Flink job).
+
+Hoptimator's pipelines tend to have the following general shape:
+
+                                _________
+topic1 ----------------------> |         |
+table2 --> CDC ---> topic2 --> | SQL job | --> topic4
+table3 --> rETL --> topic3 --> |_________|
+
+
+The three data sources on the left correspond to three different adapters:
+
+1. `topic1` can be read directly from a Flink job, so the first adapter simply configures a Flink connector.
+2. `table2` is inefficient for bulk access, so the second adapter creates a CDC stream (`topic2`) and configures a Flink connector to read from _that_.
+3. `table3` is in cold storage, so the third adapter creates a reverse-ETL job to re-ingest the data into Kafka.
+
+In order to deploy such a pipeline, you only need to write one SQL query, called a _subscription_. Pipelines are constructed automatically based on subscriptions.
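
For illustration, a subscription for the pipeline sketched above could be a single query over the three sources. This is only a sketch: the catalog names (`KAFKA`, `DB`), the columns, and the join keys are assumptions, not Hoptimator's actual schema or dialect.

```sql
-- Hypothetical subscription SQL for the pipeline above.
-- Catalog names, columns, and join keys are illustrative assumptions.
SELECT t1."eventId", t2."accountName", t3."archivedTotal"
FROM KAFKA."topic1" AS t1
JOIN DB."table2" AS t2 ON t1."accountId" = t2."accountId"
JOIN DB."table3" AS t3 ON t1."accountId" = t3."accountId"
```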

## Quick Start

@@ -54,4 +64,16 @@ You can verify the job is running by inspecting the output:
> !q
```

+## The Operator
+
+Hoptimator-operator is a Kubernetes operator that orchestrates multi-hop data pipelines based on Subscriptions (a custom resource). When a Subscription is deployed, the operator:
+
+1. creates a _plan_ based on the Subscription SQL. The plan includes a set of _resources_ that make up a _pipeline_.
+2. deploys each resource in the pipeline. This may involve creating Kafka topics, Flink jobs, etc.
+3. reports Subscription status, which depends on the status of each resource in the pipeline.
+
+The operator is extensible via _adapters_. Among other responsibilities, adapters can implement custom control plane logic (see `ControllerProvider`), or they can depend on external operators. For example, the Kafka adapter actively manages Kafka topics using a custom controller. The Flink adapter defers to [flink-kubernetes-operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/) to manage Flink jobs.
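
To make step 2 above more concrete, the auto-generated Flink job in a pipeline ultimately amounts to Flink SQL that reads from and writes to Kafka topics. The block below is a hand-written sketch of that idea in standard Flink SQL; the schemas, topic names, and connector options are assumptions, not what Hoptimator actually generates.

```sql
-- Hand-written sketch of the kind of Flink SQL a pipeline job might run.
-- Schemas, topic names, and connector options are illustrative assumptions.
CREATE TABLE `topic1` (
  `key` STRING,
  `payload` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'topic1',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

CREATE TABLE `topic4` (
  `key` STRING,
  `payload` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'topic4',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

INSERT INTO `topic4`
SELECT `key`, `payload` FROM `topic1`;
```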
+
+## The CLI

+Hoptimator includes a SQL CLI based on [sqlline](https://github.com/julianhyde/sqlline). This is primarily for testing and debugging purposes, but it can also be useful for running ad-hoc queries. The CLI leverages the same adapters as the operator, but it doesn't deploy anything. Instead, queries run as local, in-process Flink jobs.
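
As a rough illustration, an ad-hoc CLI session could look like the following. The launcher path is hypothetical (substitute the project's actual entry point), the `KAFKA` catalog name is an assumption, and `!tables` / `!q` are standard sqlline commands.

```
$ ./bin/hoptimator-cli    # hypothetical launcher path
> !tables
> SELECT * FROM KAFKA."topic1" LIMIT 5;
> !q
```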

0 commit comments
