
Commit 9043563

[DOP-28075] Add HttpTransport examples of integrations
1 parent 4c858f0 commit 9043563

9 files changed: +587 -249 lines

docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@
     "sphinx_favicon",
     "sphinx_last_updated_by_git",
     "sphinxarg.ext",
+    "sphinx_tabs.tabs",
 ]

 numpydoc_show_class_members = True

docs/integrations/airflow/index.rst

Lines changed: 76 additions & 24 deletions
@@ -9,8 +9,10 @@ Requirements
 ------------

 * `Apache Airflow <https://airflow.apache.org/>`_ 2.x or 3.x
-* OpenLineage 1.19.0 or higher, recommended 1.34.0+
+* OpenLineage 1.19.0 or higher, recommended 1.37.0+
 * OpenLineage integration for Airflow (see below)
+* Running :ref:`message-broker`
+* (Optional) :ref:`http2kafka`

 Entity mapping
 --------------
@@ -25,15 +27,27 @@ Install

 * For Airflow 2.7 or higher, use `apache-airflow-providers-openlineage <https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html>`_ 1.9.0 or higher:

-  .. code:: console
-
-    $ pip install "apache-airflow-providers-openlineage>=2.3.0" "openlineage-python[kafka]>=1.34.0" zstd
+  .. tabs::
+
+    .. code-tab:: console KafkaTransport
+
+      $ pip install "apache-airflow-providers-openlineage>=2.6.1" "openlineage-python[kafka]>=1.37.0" zstd
+
+    .. code-tab:: console HttpTransport (requires HTTP2Kafka)
+
+      $ pip install "apache-airflow-providers-openlineage>=2.6.1"

 * For Airflow 2.1.x-2.6.x, use `OpenLineage integration for Airflow <https://openlineage.io/docs/integrations/airflow/>`_ 1.19.0 or higher

-  .. code:: console
-
-    $ pip install "openlineage-airflow>=1.34.0" "openlineage-python[kafka]>=1.34.0" zstd
+  .. tabs::
+
+    .. code-tab:: console KafkaTransport
+
+      $ pip install "openlineage-airflow>=1.37.0" "openlineage-python[kafka]>=1.37.0" zstd
+
+    .. code-tab:: console HttpTransport (requires HTTP2Kafka)
+
+      $ pip install "openlineage-airflow>=1.37.0"

 Setup
 -----
@@ -43,19 +57,36 @@ Via OpenLineage config file

 * Create ``openlineage.yml`` file with content like:

-  .. code:: yaml
-
-    transport:
-      type: kafka
-      topic: input.runs
-      config:
-        bootstrap.servers: localhost:9093
-        security.protocol: SASL_PLAINTEXT
-        sasl.mechanism: SCRAM-SHA-256
-        sasl.username: data_rentgen
-        sasl.password: changeme
-        compression.type: zstd
-        acks: all
+  .. tabs::
+
+    .. code-tab:: yaml KafkaTransport
+
+      transport:
+        type: kafka
+        topic: input.runs
+        config:
+          # should be accessible from Airflow scheduler
+          bootstrap.servers: localhost:9093
+          security.protocol: SASL_PLAINTEXT
+          sasl.mechanism: SCRAM-SHA-256
+          # Kafka auth credentials
+          sasl.username: data_rentgen
+          sasl.password: changeme
+          compression.type: zstd
+          acks: all
+
+    .. code-tab:: yaml HttpTransport (requires HTTP2Kafka)
+
+      transport:
+        type: http
+        # http2kafka URL, should be accessible from Airflow scheduler
+        url: http://localhost:8002
+        endpoint: /v1/openlineage
+        compression: gzip
+        auth:
+          type: api_key
+          # create a PersonalToken, and pass it here
+          apiKey: personal_token_AAAAAAAAAAAA.BBBBBBBBBBBBBBBBBBBBBBB.CCCCCCCCCCCCCCCCCCCCC

 * Pass path to config file via ``AIRFLOW__OPENLINEAGE__CONFIG_PATH`` environment variable:

@@ -69,24 +100,45 @@ Via Airflow config file

 Setup OpenLineage integration using ``airflow.cfg`` config file:

-.. code:: ini
+.. tabs::
+
+  .. code-tab:: ini KafkaTransport

     [openlineage]
     # set here address of Airflow Web UI
     namespace = http://airflow.hostname.fqdn:8080
-    # set here Kafka connection address & credentials
+    # set here Kafka broker address & auth credentials
     transport = {"type": "kafka", "config": {"bootstrap.servers": "localhost:9093", "security.protocol": "SASL_PLAINTEXT", "sasl.mechanism": "SCRAM-SHA-256", "sasl.username": "data_rentgen", "sasl.password": "changeme", "compression.type": "zstd", "acks": "all"}, "topic": "input.runs", "flush": true}

+  .. code-tab:: ini HttpTransport (requires HTTP2Kafka)
+
+    [openlineage]
+    # set here address of Airflow Web UI
+    namespace = http://airflow.hostname.fqdn:8080
+    # set here HTTP2Kafka url & create PersonalToken
+    transport = {"type": "http", "url": "http://localhost:8002", "endpoint": "/v1/openlineage", "compression": "gzip", "auth": {"type": "api_key", "apiKey": "personal_token_AAAAAAAAAAAA.BBBBBBBBBBBBBBBBBBBBBBB.CCCCCCCCCCCCCCCCCCCCC"}}

 Via Airflow environment variables
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Set environment variables for all Airflow components (e.g. via ``docker-compose.yml``)
+Set environment variables for all Airflow components (e.g. via ``docker-compose.yml``). Depending on your shell, you may need to remove the single quotes.

-.. code:: ini
-
-    AIRFLOW__OPENLINEAGE__NAMESPACE=http://airflow.hostname.fqdn:8080
-    AIRFLOW__OPENLINEAGE__TRANSPORT={"type": "kafka", "config": {"bootstrap.servers": "localhost:9093", "security.protocol": "SASL_PLAINTEXT", "sasl.mechanism": "SCRAM-SHA-256", "sasl.username": "data_rentgen", "sasl.password": "changeme", "compression.type": "zstd", "acks": "all"}, "topic": "input.runs", "flush": true}
+.. tabs::
+
+  .. code-tab:: bash KafkaTransport
+
+    # set here address of Airflow Web UI
+    AIRFLOW__OPENLINEAGE__NAMESPACE='http://airflow.hostname.fqdn:8080'
+    # set here Kafka broker address & auth credentials
+    AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "kafka", "config": {"bootstrap.servers": "localhost:9093", "security.protocol": "SASL_PLAINTEXT", "sasl.mechanism": "SCRAM-SHA-256", "sasl.username": "data_rentgen", "sasl.password": "changeme", "compression.type": "zstd", "acks": "all"}, "topic": "input.runs", "flush": true}'
+
+  .. code-tab:: bash HttpTransport (requires HTTP2Kafka)
+
+    # set here address of Airflow Web UI
+    AIRFLOW__OPENLINEAGE__NAMESPACE='http://airflow.hostname.fqdn:8080'
+    # set here HTTP2Kafka url & create PersonalToken
+    AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "http://localhost:8002", "endpoint": "/v1/openlineage", "compression": "gzip", "auth": {"type": "api_key", "apiKey": "personal_token_AAAAAAAAAAAA.BBBBBBBBBBBBBBBBBBBBBBB.CCCCCCCCCCCCCCCCCCCCC"}}'

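The one-line JSON in ``AIRFLOW__OPENLINEAGE__TRANSPORT`` is easy to break by hand (quoting, lowercase ``true``). As a sketch that is not part of the committed docs, the value can be generated from a Python dict with the standard ``json`` module, using the placeholder broker settings shown above:

```python
import json

# Placeholder broker address and credentials copied from the KafkaTransport
# example above; substitute your real values.
transport = {
    "type": "kafka",
    "config": {
        "bootstrap.servers": "localhost:9093",
        "security.protocol": "SASL_PLAINTEXT",
        "sasl.mechanism": "SCRAM-SHA-256",
        "sasl.username": "data_rentgen",
        "sasl.password": "changeme",
        "compression.type": "zstd",
        "acks": "all",
    },
    "topic": "input.runs",
    "flush": True,  # json.dumps serializes this as lowercase `true`
}

# json.dumps guarantees valid JSON (double quotes, lowercase booleans),
# ready to paste into docker-compose.yml or an env file.
print(f"AIRFLOW__OPENLINEAGE__TRANSPORT='{json.dumps(transport)}'")
```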
 Airflow 2.1.x and 2.2.x
 ~~~~~~~~~~~~~~~~~~~~~~~

docs/integrations/dbt/index.rst

Lines changed: 44 additions & 18 deletions
@@ -9,7 +9,9 @@ Requirements
 ------------

 * `dbt <https://www.getdbt.com/>`_ 1.3 or higher
-* OpenLineage 1.19.0 or higher, recommended 1.34.0+
+* OpenLineage 1.19.0 or higher, recommended 1.37.0+
+* Running :ref:`message-broker`
+* (Optional) :ref:`http2kafka`

 Entity mapping
 --------------
@@ -21,28 +23,52 @@ Entity mapping
 Install
 -------

-.. code:: console
-
-    $ pip install "openlineage-dbt>=1.34.0" "openlineage-python[kafka]>=1.34.0" zstd
+.. tabs::
+
+  .. code-tab:: console KafkaTransport
+
+    $ pip install "openlineage-dbt>=1.37.0" "openlineage-python[kafka]>=1.37.0" zstd
+
+  .. code-tab:: console HttpTransport (requires HTTP2Kafka)
+
+    $ pip install "openlineage-dbt>=1.37.0"

 Setup
 -----

-* Create ``openlineage.yml`` file with content below`:
-
-  .. code:: yaml
-
-    transport:
-      type: kafka
-      topic: input.runs
-      config:
-        bootstrap.servers: localhost:9093
-        security.protocol: SASL_PLAINTEXT
-        sasl.mechanism: SCRAM-SHA-256
-        sasl.username: data_rentgen
-        sasl.password: changeme
-        compression.type: zstd
-        acks: all
+* Create ``openlineage.yml`` file with content like:
+
+  .. tabs::
+
+    .. code-tab:: yaml KafkaTransport
+
+      transport:
+        type: kafka
+        topic: input.runs
+        config:
+          # should be accessible from host
+          bootstrap.servers: localhost:9093
+          security.protocol: SASL_PLAINTEXT
+          sasl.mechanism: SCRAM-SHA-256
+          # Kafka auth credentials
+          sasl.username: data_rentgen
+          sasl.password: changeme
+          compression.type: zstd
+          acks: all
+
+    .. code-tab:: yaml HttpTransport (requires HTTP2Kafka)
+
+      transport:
+        # "type: http" for OpenLineage below 1.35.0
+        type: async_http
+        # http2kafka URL, should be accessible from host
+        url: http://localhost:8002
+        endpoint: /v1/openlineage
+        compression: gzip
+        auth:
+          type: api_key
+          # create a PersonalToken, and pass it here
+          apiKey: personal_token_AAAAAAAAAAAA.BBBBBBBBBBBBBBBBBBBBBBB.CCCCCCCCCCCCCCCCCCCCC

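Since JSON is a subset of YAML, the ``openlineage.yml`` file above can also be generated programmatically without extra dependencies. A sketch outside the committed docs; the file name and the broker settings mirror the KafkaTransport tab and are all placeholders:

```python
import json
from pathlib import Path

# Placeholder settings mirroring the KafkaTransport example above.
config = {
    "transport": {
        "type": "kafka",
        "topic": "input.runs",
        "config": {
            "bootstrap.servers": "localhost:9093",
            "security.protocol": "SASL_PLAINTEXT",
            "sasl.mechanism": "SCRAM-SHA-256",
            "sasl.username": "data_rentgen",
            "sasl.password": "changeme",
            "compression.type": "zstd",
            "acks": "all",
        },
    }
}

# JSON is valid YAML, so a YAML-reading client can load this file as-is.
Path("openlineage.yml").write_text(json.dumps(config, indent=2))
```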
 * Set environment variables:

docs/integrations/flink1/index.rst

Lines changed: 64 additions & 29 deletions
@@ -9,7 +9,9 @@ Requirements
 ------------

 * `Apache Flink <https://flink.apache.org/>`_ 1.x
-* OpenLineage 1.31.0 or higher, recommended 1.34.0+
+* OpenLineage 1.31.0 or higher, recommended 1.37.0+
+* Running :ref:`message-broker`
+* (Optional) :ref:`http2kafka`

 Limitations
 -----------
@@ -30,7 +32,8 @@ Installation
 .. code-block:: groovy
    :caption: build.gradle

-   implementation "io.openlineage:openlineage-flink:1.34.0"
+   implementation "io.openlineage:openlineage-flink:1.37.0"
+   // For KafkaTransport only
    implementation "org.apache.kafka:kafka-clients:3.9.0"

 * Register ``OpenLineageFlinkJobListener`` in the code of your Flink job:
@@ -59,36 +62,68 @@ Setup

 * Create ``openlineage.yml`` file with content like:

-  .. code-block:: yaml
-     :caption: openlineage.yml
-
-     job:
-       namespace: http://some.host.name:18081 # set namespace to match Flink address
-       name: flink_examples_stateful # set job name
-
-     # Send RUNNING event every 1 hour.
-     # Using default interval (1 minute) just floods Kafka with useless RUNNING events.
-     trackingIntervalInSeconds: 3600
-
-     transport:
-       type: kafka
-       topicName: input.runs
-       properties:
-         bootstrap.servers: broker:9092 # not using localhost in docker
-         security.protocol: SASL_PLAINTEXT
-         sasl.mechanism: SCRAM-SHA-256
-         sasl.jaas.config: |
-           org.apache.kafka.common.security.scram.ScramLoginModule required
-           username="data_rentgen"
-           password="changeme";
-         key.serializer: org.apache.kafka.common.serialization.StringSerializer
-         value.serializer: org.apache.kafka.common.serialization.StringSerializer
-         compression.type: zstd
-         acks: all
+  .. tabs::
+
+    .. code-tab:: yaml KafkaTransport
+       :caption: openlineage.yaml
+
+       job:
+         # set namespace to match Flink address
+         namespace: http://some.host.name:18081
+         # set job name
+         name: flink_examples_stateful
+
+       # Send RUNNING event every 1 hour.
+       # Using default interval (1 minute) just floods Kafka with useless RUNNING events.
+       trackingIntervalInSeconds: 3600
+
+       transport:
+         type: kafka
+         topicName: input.runs
+         properties:
+           # should be accessible inside jobmanager container
+           # not using localhost in docker!
+           bootstrap.servers: broker:9092
+           security.protocol: SASL_PLAINTEXT
+           sasl.mechanism: SCRAM-SHA-256
+           # Kafka auth credentials
+           sasl.jaas.config: |
+             org.apache.kafka.common.security.scram.ScramLoginModule required
+             username="data_rentgen"
+             password="changeme";
+           key.serializer: org.apache.kafka.common.serialization.StringSerializer
+           value.serializer: org.apache.kafka.common.serialization.StringSerializer
+           compression.type: zstd
+           acks: all
+
+    .. code-tab:: yaml HttpTransport (requires HTTP2Kafka)
+       :caption: openlineage.yaml
+
+       job:
+         # set namespace to match Flink address
+         namespace: http://some.host.name:18081
+         # set job name
+         name: flink_examples_stateful
+
+       # Send RUNNING event every 1 hour.
+       # Using default interval (1 minute) just floods Kafka with useless RUNNING events.
+       trackingIntervalInSeconds: 3600
+
+       transport:
+         type: http
+         # should be accessible inside jobmanager container
+         # not using localhost in docker!
+         url: http://http2kafka:8000
+         endpoint: /v1/openlineage
+         compression: gzip
+         auth:
+           type: api_key
+           # create a PersonalToken, and pass it here
+           apiKey: personal_token_AAAAAAAAAAAA.BBBBBBBBBBBBBBBBBBBBBBB.CCCCCCCCCCCCCCCCCCCCC

 * Pass path to config file via ``OPENLINEAGE_CONFIG`` environment variable of ``jobmanager``:

-  .. code:: ini
+  .. code:: bash

     OPENLINEAGE_CONFIG=/path/to/openlineage.yml

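To sanity-check the HttpTransport settings before wiring them into a job, the request the transport would issue can be assembled by hand. A standard-library-only sketch, not from the committed docs: the URL, endpoint, and token are the placeholder values from the tabs above, and sending ``api_key`` auth as a ``Bearer`` header is an assumption to verify against your setup.

```python
import gzip
import json
import urllib.request

# Placeholder values from the HttpTransport config above; replace with real ones.
URL = "http://localhost:8002/v1/openlineage"
API_KEY = "personal_token_AAAAAAAAAAAA.BBBBBBBBBBBBBBBBBBBBBBB.CCCCCCCCCCCCCCCCCCCCC"

# A minimal JSON payload, gzip-compressed as `compression: gzip` implies.
payload = gzip.compress(json.dumps({"eventType": "START"}).encode("utf-8"))

request = urllib.request.Request(
    URL,
    data=payload,
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        # assumption: api_key auth is sent as a Bearer token
        "Authorization": f"Bearer {API_KEY}",
    },
)

# Actually sending is left commented out so the sketch runs without a live HTTP2Kafka:
# urllib.request.urlopen(request)
```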