Commit 58227d2

[DOP-23708] Update README
1 parent 6c10679 commit 58227d2

1 file changed: +21 -9 lines changed

README.rst

Lines changed: 21 additions & 9 deletions
@@ -38,20 +38,32 @@ Data.Rentgen is a Data Motion Lineage service, compatible with `OpenLineage <htt
 Goals
 -----
 
-* Collect lineage events produced by OpenLineage clients & integrations (Spark, Airflow).
-* Support consuming large amounts of lineage events, by using Kafka as event buffer and storing data in tables partitioned by event timestamp.
-* Store operation-grained events (instead of job grained `Marquez <https://marquezproject.ai/>`_), for better detalization.
-* Provide API for fetching run ↔ dataset lineage.
-* Allow building lineage graph with specific time boundaries (unlike Marquez there lineage is build only for last job run).
-* Allow building lineage graph with different granularity. e.g. merge all individual Spark operations into Spark applicationId or Spark applicationName.
-* Include column-level lineage into lineage graph.
+* Collect lineage events produced by OpenLineage clients & integrations.
+* Store operation-grained events for better detalization (instead of job grained `Marquez <https://marquezproject.ai/>`_).
+* Provide API for fetching job/run ↔ dataset lineage, not dataset ↔ dataset lineage (like `Datahub <https://datahubproject.io/>`_ and `OpenMetadata <https://open-metadata.org/>`_).
+
+Features
+--------
+
+* Support consuming large amounts of lineage events, use Apache Kafka as event buffer.
+* Store data in tables partitioned by event timestamp, to speed up lineage graph resolution.
+* Lineage graph is build with user-specified time boundaries (unlike Marquez where lineage is build only for last job run).
+* Lineage graph can be build with different granularity. e.g. merge all individual Spark operations into Spark applicationId or Spark applicationName.
+* Column-level lineage support.
+* Authentication support.
 
 Non-goals
 ---------
 
-* This is **not** a Data Catalog. Use `Datahub <https://datahubproject.io/>`_ or `OpenMetadata <https://open-metadata.org/>`_ instead.
+* This is **not** a Data Catalog, DataRentgen doesn't track dataset schema change, owner and so on. Use Datahub or OpenMetadata instead.
 * Static Data Lineage like view → table is not supported.
-* Job/run/operation are always a part of lineage graph. Hiding them to produce dataset → dataset lineage is not supported for now.
+
+Limitations
+-----------
+
+* For now, only Apache Spark and Apache Airflow are supported as lineage event sources.
+OpenLineage also supports Apache Flink, DBT, Trino and others. DataRentgen support may be added later.
+* Unlike Marquez, DataRentgen parses only limited set of facets send by OpenLineage, and doesn't store custom facets. This can be changed in future.
 
 .. documentation
 