Commit 4c272d2 (parent 27e8436)

Release 0.24.2 (#100)

* FIX #100 - Bump version and CHANGELOG for release 0.24.2
* Updated changelog

Co-authored-by: Pawel Jurkiewicz <[email protected]>

File tree: 6 files changed, +165 −5 lines

CHANGELOG.md (11 additions, 1 deletion)

@@ -2,8 +2,16 @@
 
 ## [Unreleased]
 
+## [0.24.2] - 2023-04-14
+
+- Added Airbyte integration documentation
+
 ## [0.24.1] - 2023-03-15
 
+### Fixed
+
+- `dp` commands failing when BI config was missing.
+
 ## [0.24.0] - 2022-12-16
 
 - Airbyte integration

@@ -235,7 +243,9 @@
 - Draft of `dp init`, `dp create`, `dp template new`, `dp template list` and `dp dbt`
 - Draft of `dp compile` and `dp deploy`
 
-[Unreleased]: https://github.com/getindata/data-pipelines-cli/compare/0.24.1...HEAD
+[Unreleased]: https://github.com/getindata/data-pipelines-cli/compare/0.24.2...HEAD
+
+[0.24.2]: https://github.com/getindata/data-pipelines-cli/compare/0.24.1...0.24.2
 
 [0.24.1]: https://github.com/getindata/data-pipelines-cli/compare/0.24.0...0.24.1

data_pipelines_cli/__init__.py (1 addition, 1 deletion)

@@ -5,4 +5,4 @@
 pipelines.
 """
 
-version = "0.24.1"
+version = "0.24.2"

docs/configuration.rst (150 additions, 0 deletions)

The commit inserts a new ``Data ingestion configuration`` section after the ``Data governance configuration`` section, whose closing context reads: "**dp** can send **dbt** metadata to DataHub. All related configuration is stored in the ``config/<ENV>/datahub.yml`` file. More information about it can be found `here <https://datahubproject.io/docs/metadata-ingestion#recipes>`_ and `here <https://datahubproject.io/docs/generated/ingestion/sources/dbt>`_."

The added section (hunk ``@@ -166,6 +166,143 @@``):

Data ingestion configuration
++++++++++++++++++++++++++++++

Ingestion configuration is divided into two levels:

- General: ``config/<ENV>/ingestion.yml``
- Ingestion tool related: e.g. ``config/<ENV>/airbyte.yml``

``config/<ENV>/ingestion.yml`` contains the basic ingestion configuration:

.. list-table::
   :widths: 25 20 55
   :header-rows: 1

   * - Parameter
     - Data type
     - Description
   * - enable
     - bool
     - Flag to enable/disable the ingestion option in **dp**.
   * - engine
     - enum string
     - Ingestion tool you would like to integrate with (currently the only supported value is ``airbyte``).
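For illustration, a minimal ``config/<ENV>/ingestion.yml`` enabling the Airbyte engine could look like this (a sketch based on the two parameters described above, not a file taken from the commit):

```yaml
# config/<ENV>/ingestion.yml (illustrative sketch)
enable: true      # turn the ingestion option on in dp
engine: airbyte   # currently the only supported engine
```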
``config/<ENV>/airbyte.yml`` must be present if the engine of your choice is ``airbyte``. It consists of two parts:

1. The first part is required by `dbt-airflow-factory <https://github.com/getindata/dbt-airflow-factory>`_
   and must be present in order to create the ingestion tasks preceding the dbt rebuild in Airflow. When you choose to manage
   Airbyte connections with the `dp` tool, ``connectionId`` is unknown at the time of coding; however, the `dp` tool is ready to
   handle this case. For details, refer to the example ``airbyte.yml`` at the end of this section.

.. list-table::
   :widths: 25 20 55
   :header-rows: 1

   * - Parameter
     - Data type
     - Description
   * - airbyte_connection_id
     - string
     - Name of the Airbyte connection in Airflow
   * - tasks
     - array<*task*>
     - Configurations of Airflow tasks used by `dbt-airflow-factory <https://github.com/getindata/dbt-airflow-factory>`_.
       Allowed *task* options are documented `here <https://dbt-airflow-factory.readthedocs.io/en/latest/configuration.html#id3>`_.

2. The second part is used directly by the `dp` tool to manage (insert or update) connections in Airbyte. It is **not** required
   unless you would like to manage Airbyte connections with the `dp` tool.

.. list-table::
   :widths: 25 20 55
   :header-rows: 1

   * - Parameter
     - Data type
     - Description
   * - airbyte_url
     - string
     - HTTPS address of the Airbyte deployment, used to connect to the Airbyte API
   * - connections
     - array<*connection*>
     - Configurations of Airbyte connections that should be upserted during CI/CD. The minimal connection schema is documented below.
       These configurations are passed directly to the Airbyte API's ``connections/create`` or ``connections/update`` endpoint.
       Please refer to the
       `Airbyte API reference <https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/connections/create>`_
       for more detailed configuration.
.. code-block:: text

   YOUR_CONNECTION_NAME: string
     name: string                  Optional name of the connection
     sourceId: uuid                UUID of the Airbyte source used for this connection
     destinationId: uuid           UUID of the Airbyte destination used for this connection
     namespaceDefinition: enum     Method used for computing the final namespace in the destination
     namespaceFormat: string       Used when namespaceDefinition is 'customformat'
     status: enum                  'active' means data is flowing through the connection; 'inactive' means it is not
     syncCatalog: object           Describes the available schema (catalog)
       streams: array
         - stream: object
             name: string              Stream's name
             jsonSchema: object        Stream schema using JSON Schema specs
           config:
             syncMode: enum            Allowed: full_refresh | incremental
             destinationSyncMode: enum Allowed: append | overwrite | append_dedup
             aliasName: string         Alias name of the stream to be used in the destination
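The upsert behaviour described above (create a connection if it does not exist yet in Airbyte, update it otherwise) can be sketched in Python. This is an illustration of the idea, not the actual `dp` implementation; the endpoint paths follow the Airbyte API endpoints named above, and keying updates on a map of already-existing connections is an assumption:

```python
def build_upsert_request(airbyte_url: str, name: str, config: dict,
                         existing: dict) -> tuple[str, dict]:
    """Pick connections/create or connections/update for one configured connection.

    `existing` maps connection names to UUIDs of connections already present
    in Airbyte (how dp discovers them is not shown in this sketch).
    """
    payload = dict(config)
    payload.setdefault("name", name)
    if name in existing:
        # Connection was created in a previous CI/CD run: update it in place.
        payload["connectionId"] = existing[name]
        return f"{airbyte_url}/api/v1/connections/update", payload
    # First deployment of this connection: create it.
    return f"{airbyte_url}/api/v1/connections/create", payload
```

The returned endpoint and payload would then be POSTed to the Airbyte API, e.g. with `requests.post(endpoint, json=payload)`.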
An example ``airbyte.yml`` might look like the following. Notice (highlighted lines) how the connection name in the ``connections``
map matches the environment variable in the ``tasks[0].connection_id`` attribute. During CI/CD, after the
connection is created in Airbyte, the ``${POSTGRES_BQ_CONNECTION}`` variable is substituted with the received Airbyte
connection UUID and passed in the config to the dbt-airflow-factory tool.

.. code-block:: yaml
   :linenos:
   :emphasize-lines: 6,13

   # dbt-airflow-factory configuration properties:
   airbyte_connection_id: airbyte_connection_id
   tasks:
     - api_version: v1
       asyncronous: false
       connection_id: ${POSTGRES_BQ_CONNECTION}
       task_id: postgres_bq_connection_sync_task
       timeout: 600
       wait_seconds: 3
   # Airbyte connection managing properties:
   airbyte_url: https://airbyte-dev.company.com
   connections:
     POSTGRES_BQ_CONNECTION:
       name: postgres_bq_connection
       sourceId: c3aa49f0-90dd-4c8e-9641-505a2f6cb65c
       destinationId: 3f47dbf1-11f3-41b0-945f-9463c82f711b
       namespaceDefinition: customformat
       namespaceFormat: ingestion_pg
       status: active
       syncCatalog:
         streams:
           - stream:
               name: raw_orders
               jsonSchema:
                 properties:
                   id:
                     airbyte_type: integer
                     type: number
                   order_date:
                     format: date
                     type: string
                   status:
                     type: string
                   user_id:
                     airbyte_type: integer
                     type: number
                 type: object
             config:
               syncMode: full_refresh
               destinationSyncMode: append
               aliasName: raw_orders
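The substitution step described above, in which ``${POSTGRES_BQ_CONNECTION}`` is replaced with the UUID returned by Airbyte, can be sketched like this (an illustration of the mechanism, not the actual `dp` code):

```python
import re

# ${NAME}-style placeholders, as used in the airbyte.yml example above
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_connection_ids(config_text: str, connection_ids: dict) -> str:
    """Replace ${NAME} placeholders with the connection UUIDs received from Airbyte.

    Unknown placeholders are left untouched so later tooling can resolve them.
    """
    def _replace(match: re.Match) -> str:
        return connection_ids.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(_replace, config_text)
```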
The pre-existing ``Business Intelligence configuration`` section follows unchanged. At the end of that section (hunk ``@@ -226,3 +363,16 @@``, after the ``looker_instance_url`` table row, "URL for your Looker instance"), the commit appends an example:

Example ``looker.yml`` file might look like this:

.. code-block:: yaml
   :linenos:

   looker_repository: [email protected]:company/looker/pipeline-example-looker.git
   looker_repository_username: "{{ env_var('LOOKER_REPO_USERNAME') }}"
   looker_repository_email: [email protected]
   looker_project_id: my_looker_project
   looker_webhook_secret: "{{ env_var('LOOKER_WEBHOOK_SECRET') }}"
   looker_repository_branch: main
   looker_instance_url: https://looker.company.com/

docs/integration.rst (1 addition, 1 deletion)

@@ -64,7 +64,7 @@ integration of the created project with the `VS Code` plugin for **dbt** management
 Airbyte
 ++++++++++++++++++++++++++++++++++++++++++++++
 
-Under development
+`Data Pipelines CLI` can manage Airbyte connections and execute their syncs in Airflow tasks preceding the dbt build.
 
 Looker
 ++++++++++++++++++++++++++++++++++++++++++++++

setup.cfg (1 addition, 1 deletion)

@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.24.1
+current_version = 0.24.2
 
 [bumpversion:file:setup.py]
 

setup.py (1 addition, 1 deletion)

@@ -67,7 +67,7 @@
 
 setup(
     name="data_pipelines_cli",
-    version="0.24.1",
+    version="0.24.2",
     description="CLI for data platform",
     long_description=README,
     long_description_content_type="text/markdown",
