# Hoptimator

## Intro

Hoptimator gives you a SQL interface to a Kubernetes cluster. You can install databases, query tables, create views, and deploy data pipelines using just SQL.

To install a database, use `kubectl`:

```
  $ kubectl apply -f my-database.yaml
```

(`create database` is coming soon!)
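
Once applied, the new database's tables should be queryable via SQL as described below. As a quick check, and assuming the Database CRD exposes a `databases` resource analogous to the `views` and `pipelines` resources shown below (an assumption, not something stated here), you can list installed databases with standard `kubectl`:

```
  $ kubectl get databases
```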

Then use Hoptimator DDL to create a materialized view:

```
  > create materialized view my.foo as select * from ads.page_views;
```

Views created via DDL show up in Kubernetes as `views`:

```
  $ kubectl get views
  NAME     SCHEMA   VIEW   SQL
  my-foo   MY       FOO    SELECT *...
```

Materialized views result in `pipelines`:

```
  $ kubectl get pipelines
  NAME     SQL              STATUS
  my-foo   INSERT INTO...   Ready.
```

## Quickstart

Hoptimator requires a Kubernetes cluster. To connect from outside the cluster, make sure your `kubectl` is properly configured.

```
  $ make install               # build and install SQL CLI
  $ make deploy deploy-demo    # install CRDs and K8s objects
  $ ./hoptimator
  > !intro
```

## The SQL CLI

The `./hoptimator` script launches the [sqlline](https://github.com/julianhyde/sqlline) SQL CLI, pre-configured to connect to `jdbc:hoptimator://`. The CLI includes some additional commands; see `!intro`.
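
For example, a short session might look like the following (`!tables` and `!quit` are standard sqlline commands; `ads.page_views` is the demo table used above):

```
  $ ./hoptimator
  > !tables
  > select * from ads.page_views limit 5;
  > !quit
```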

## The JDBC Driver

To use Hoptimator from Java code, or from anything that supports JDBC, use the `jdbc:hoptimator://` JDBC driver.
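
As a minimal sketch (assuming the Hoptimator JDBC driver jar is on the classpath and the demo tables from the Quickstart are deployed), a plain JDBC query might look like this:

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HoptimatorExample {
  public static void main(String[] args) throws Exception {
    // Open a connection via the Hoptimator JDBC driver.
    try (Connection conn = DriverManager.getConnection("jdbc:hoptimator://");
         Statement stmt = conn.createStatement();
         // ads.page_views is the demo table used elsewhere in this README.
         ResultSet rs = stmt.executeQuery("select * from ads.page_views limit 5")) {
      while (rs.next()) {
        // Print the first column of each row.
        System.out.println(rs.getString(1));
      }
    }
  }
}
```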

## The Operator

`hoptimator-operator` turns materialized views into real data pipelines.
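
For example, after creating a materialized view, you can watch the operator bring the corresponding pipeline to a ready state using standard `kubectl` (the resource and status shown here mirror the examples above):

```
  $ kubectl get pipelines -w
  NAME     SQL              STATUS
  my-foo   INSERT INTO...   Ready.
```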

## Extending Hoptimator

Hoptimator can be extended via `TableTemplates`:

```
  $ kubectl apply -f my-table-template.yaml
```
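
Assuming the TableTemplate CRD registers a `tabletemplates` resource like the `views` and `pipelines` resources above (an assumption; check the CRDs installed by `make deploy`), installed templates can be listed the same way:

```
  $ kubectl get tabletemplates
```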
0 commit comments