Releases: MobileTeleSystems/data-rentgen
0.4.8 (2025-01-26)
Fixed updating Location's external_id field: the server returned response code 200 but ignored the input value.
0.4.7 (2025-01-20)
Dependency-only updates.
0.4.6 (2025-01-12)
Dependency-only updates.
0.4.5 (2025-12-24)
Improvements
Allow disabling SessionMiddleware, as it is only required by KeycloakAuthProvider.
0.4.4 (2025-11-21)
Bug Fixes
- Fix inputs with 0 bytes statistics, which were broken in the 0.4.3 release.
0.4.3 (2025-11-21)
Features
- Disable server.session.enabled by default. It is required only by KeycloakAuthProvider, which is not used by default.
Bug Fixes
- Escape unprintable ASCII symbols in SQL queries before storing them in Postgres. Previously, saving queries containing the \x00 symbol led to exceptions.
- Kafka topic with malformed messages doesn't have to use the same number of partitions as input topics.
- Prevent OpenLineage from reporting events which claim to read 8 Exabytes of data; this is actually a Spark quirk.
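A minimal sketch of the first and last fixes, with hypothetical helper names. Two assumptions are made here: escaping uses a visible \xNN form (the release note does not specify the format), and the bogus "8 Exabytes" value is Java's Long.MAX_VALUE (2^63 - 1 bytes, about 8 EiB). PostgreSQL text columns cannot store the NUL character at all, which is why \x00 caused exceptions.

```python
import re
from typing import Optional

# ASCII control characters except tab (\x09), newline (\x0a) and carriage return (\x0d), plus DEL (\x7f).
_UNPRINTABLE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

# Assumption: the bogus value is Java's Long.MAX_VALUE, 2**63 - 1 bytes (about 8 EiB).
_JAVA_LONG_MAX = 2**63 - 1


def escape_unprintable(query: str) -> str:
    """Replace unprintable ASCII characters with a visible backslash-x escape.

    PostgreSQL text columns cannot store the NUL character at all.
    """
    return _UNPRINTABLE.sub(lambda m: f"\\x{ord(m.group()):02x}", query)


def sanitize_bytes(value: Optional[int]) -> Optional[int]:
    """Treat byte counts of Long.MAX_VALUE or more as unknown; keep 0 as a valid value."""
    if value is None or value >= _JAVA_LONG_MAX:
        return None
    return value


print(escape_unprintable("SELECT 'a\x00b'"))  # SELECT 'a\x00b'
print(sanitize_bytes(2**63 - 1))  # None
```

Tabs and newlines are deliberately left untouched, since multi-line SQL is common and escaping them would make stored queries unreadable.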
0.4.2 (2025-10-29)
Bug fixes
- Fix search query filter on UI Run list page.
- Fix passing multiple filters to GET /v1/runs.
Doc-only Changes
- Document DATA_RENTGEN__UI__AUTH_PROVIDER config variable.
0.4.1 (2025-10-08)
Features
- Add new GET /v1/locations/types endpoint returning list of all known location types. (#328)
- Add new filter to GET /v1/jobs (#328):
  - location_type: list[str]
- Add new filter to GET /v1/datasets (#328):
  - location_type: list[str]
- Allow passing multiple location_type filters to GET /v1/locations. (#328)
- Allow passing multiple values to GET endpoints with filters like job_id, parent_run_id, and so on. (#329)
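For illustration only: a sketch of building such a query string in Python, assuming multiple values are sent as repeated query parameters (?location_type=a&location_type=b), a common convention for list-valued query params. The base URL and the location type values are placeholders.

```python
from urllib.parse import urlencode

# Placeholder address of a Data.Rentgen server; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_url(path: str, **filters: object) -> str:
    """Build a GET URL, expanding list values into repeated query parameters."""
    # doseq=True turns {"k": ["a", "b"]} into "k=a&k=b" instead of encoding the list's repr.
    return f"{BASE_URL}{path}?{urlencode(filters, doseq=True)}"


print(build_url("/v1/jobs", location_type=["oracle", "postgres"]))
# http://localhost:8000/v1/jobs?location_type=oracle&location_type=postgres
```

The same helper works for other multi-value filters, e.g. build_url("/v1/runs", job_id=[1, 2]).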
0.4.0 (2025-10-03)
Features
- Introduce new http2kafka component. (#281) It allows using DataRentgen with OpenLineage HttpTransport. Authentication is done using personal tokens.
- Add REST API endpoints for managing personal tokens. (#276) List of endpoints:
  - GET /personal-tokens - get personal tokens for current user.
  - POST /personal-tokens - create new personal token for current user.
  - PATCH /personal-tokens/:id - refresh personal token (revoke token and create new one).
  - DELETE /personal-tokens/:id - revoke personal token.
- Add new entities Tag and TagValue. #268 Tags can be used as additional properties for other entities. This feature is still under construction.
- Added endpoint GET /v1/tags. #289 Tag names and values can be paginated, searched, or fetched by IDs.
  Response example:
  [
    {
      "id": 1,
      "name": "env",
      "values": [
        {"id": 1, "value": "dev"},
        {"id": 2, "value": "prod"}
      ]
    }
  ]
- Updated GET /v1/datasets to include tags: [...] in response. #289
  Dataset response examples.
  Before:
  {
    "id": "8400",
    "location": {...},
    "name": "dataset_name",
    "schema": {}
  }
  After:
  {
    "id": "25896",
    "location": {...},
    "name": "dataset_name",
    "schema": {...},
    "tags": [  # <---
      {
        "id": "1",
        "name": "environment",
        "values": [{"id": "2", "value": "production"}]
      },
      {
        "id": "2",
        "name": "team",
        "values": [{"id": "4", "value": "my_awesome_team"}]
      }
    ]
  }
- Added new filters to GET /v1/datasets endpoint. (#294, #289) Query params:
  - location_id: int
  - tag_value_id: list[int] - if multiple values are passed, the dataset must have all of them.
- Added new filters for GET /v1/jobs endpoint. #319 Query params:
  - location_id: int
  - job_type: list[str]
- Added new filters to GET /v1/runs endpoint. (#322, #323) Query params:
  - job_type: list[str]
  - status: list[RunStatus]
  - started_since: datetime | None
  - started_until: datetime | None
  - ended_since: datetime | None
  - ended_until: datetime | None
  - job_location_id: int | None
  - started_by_user: list[str] | None
- Added new endpoint GET /v1/jobs/types. #319
- Add custom dataRentgen_run and dataRentgen_operation facets. #265 These facets allow:
  - passing custom external_id, persistent_log_url and other fields of Run;
  - passing custom name, description, group, position fields of Operation;
  - marking an event as containing only Operation data or both Run + Operation data.
- Set output.type based on the executed SQL query, e.g. INSERT, UPDATE, DELETE, and so on. #310
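A minimal sketch of what a client sending an OpenLineage event to http2kafka might look like, using only the standard library. The URL, the /api/v1/lineage path and the "Authorization: Bearer <token>" header scheme are assumptions (the last two match the defaults of the OpenLineage Python HttpTransport when an API key is configured); the event payload is trimmed and is not a complete RunEvent. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical http2kafka address and personal token; both are placeholders.
HTTP2KAFKA_URL = "http://http2kafka.example.com/api/v1/lineage"
PERSONAL_TOKEN = "<personal-token>"


def build_lineage_request(event: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one OpenLineage event."""
    return urllib.request.Request(
        HTTP2KAFKA_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: the personal token is passed as a Bearer token.
            "Authorization": f"Bearer {PERSONAL_TOKEN}",
        },
        method="POST",
    )


# Trimmed placeholder event; a real RunEvent carries more fields.
request = build_lineage_request(
    {"eventType": "START", "run": {"runId": "01908224-8410-7e5e-9f5c-1d0d5c6f6b2a"}}
)
# urllib.request.urlopen(request) would actually send it.
```

In practice the OpenLineage integration itself sends these events once HttpTransport is pointed at http2kafka; the sketch only shows the shape of the request.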
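To make the tag_value_id semantics concrete, here is a client-side sketch that assumes the dataset response shape from the example above. It flattens tags into a name to values mapping and checks the "must have all of them" rule. Data.Rentgen applies this filter on the server; the helper names are hypothetical.

```python
# Hypothetical client-side helpers; Data.Rentgen applies the tag filter on the server.


def tags_by_name(dataset: dict) -> dict[str, list[str]]:
    """Flatten a dataset's "tags" list into {tag name: [tag values]}."""
    return {
        tag["name"]: [value["value"] for value in tag["values"]]
        for tag in dataset.get("tags", [])
    }


def has_all_tag_values(dataset: dict, tag_value_ids: list[int]) -> bool:
    """Mirror the tag_value_id filter: the dataset must carry every requested value."""
    present = {
        int(value["id"]) for tag in dataset.get("tags", []) for value in tag["values"]
    }
    return set(tag_value_ids) <= present


dataset = {
    "id": "25896",
    "name": "dataset_name",
    "tags": [
        {"id": "1", "name": "environment", "values": [{"id": "2", "value": "production"}]},
        {"id": "2", "name": "team", "values": [{"id": "4", "value": "my_awesome_team"}]},
    ],
}
print(tags_by_name(dataset))  # {'environment': ['production'], 'team': ['my_awesome_team']}
print(has_all_tag_values(dataset, [2, 4]))  # True
print(has_all_tag_values(dataset, [2, 5]))  # False
```

IDs arrive as strings in the dataset response but are passed as integers in tag_value_id, hence the int() conversion.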
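The release note does not describe how the statement type is detected. Purely as an illustration, a naive approach takes the first SQL keyword after skipping leading whitespace and comments. The function name and the set of recognized keywords are hypothetical, and a real implementation would need a proper SQL parser (for example, WITH ... INSERT statements defeat this approach).

```python
import re
from typing import Optional

# Leading whitespace, "-- line" comments and "/* block */" comments.
_LEADING_NOISE = re.compile(r"^(?:\s+|--[^\n]*\n?|/\*.*?\*/)*", re.DOTALL)
# Hypothetical set of statement types worth reporting.
_KNOWN_TYPES = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "ALTER", "DROP", "TRUNCATE"}


def guess_output_type(query: str) -> Optional[str]:
    """Naively guess a statement type from the first keyword of a SQL query."""
    body = _LEADING_NOISE.sub("", query, count=1)
    match = re.match(r"[A-Za-z]+", body)
    if match is None:
        return None
    keyword = match.group().upper()
    return keyword if keyword in _KNOWN_TYPES else None


print(guess_output_type("  -- load\nINSERT INTO t SELECT 1"))  # INSERT
print(guess_output_type("/* x */ delete from t"))  # DELETE
print(guess_output_type("SELECT 1"))  # None
```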
Improvements
- Improve consumer performance by reducing DB load on reading operations. #314
- Add workaround for OpenLineage emitting Spark application events with job.name=unknown. #263 This requires installing an OpenLineage version with this fix merged: OpenLineage/OpenLineage#3848.
- Dataset symlinks with no inputs/outputs are no longer removed from the lineage graph. #269
- Make matching for addresses and locations more deterministic by converting them to lowercase. #313 Items oracle://host:1521 and ORACLE://HOST:1521 are now the same item, oracle://host:1521.
- Make matching for datasets, jobs, tags and user names case-insensitive by using unique indexes on the lower(name) expression. #313 Items database.schema.table and DATABASE.SCHEMA.TABLE are now the same item. As the dataset canonical name depends on the database naming convention (UPPERCASE for Oracle, lowercase for Postgres), we can't convert names into one specific case (upper or lower). Instead, we use the first received value as the canonical one.
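A small in-memory sketch of the two matching rules described above, with hypothetical names: location addresses are lowercased outright, while dataset names are matched on their lowercased form but keep the first received spelling as canonical. In Data.Rentgen itself this is done in the database via unique indexes on lower(name); the dictionary here only stands in for that index.

```python
def normalize_address(address: str) -> str:
    """Addresses and locations are compared (and stored) in lowercase."""
    return address.lower()


class CanonicalNames:
    """Case-insensitive registry that keeps the first spelling it sees."""

    def __init__(self) -> None:
        # Plays the role of a unique index on lower(name).
        self._by_lower: dict[str, str] = {}

    def resolve(self, name: str) -> str:
        # setdefault stores the name only if no spelling is known yet.
        return self._by_lower.setdefault(name.lower(), name)


names = CanonicalNames()
print(names.resolve("DATABASE.SCHEMA.TABLE"))  # DATABASE.SCHEMA.TABLE
print(names.resolve("database.schema.table"))  # DATABASE.SCHEMA.TABLE
print(normalize_address("ORACLE://HOST:1521"))  # oracle://host:1521
```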
Bug Fixes
- For lineage with granularity=DATASET, return the real lineage graph. #264 Previous versions resolved lineage by run_id, which may produce wrong lineage; v0.4.x now resolves lineage by operation_id.
- Exclude self-referencing lineage edges when granularity=DATASET. #261 If some run uses the same table as both input and output (e.g. merging duplicates or performing some checks before writing), DataRentgen excludes dataset1 -> dataset1 relations from lineage. This doesn't affect chains like dataset1 -> job1 -> dataset1 or dataset1 -> dataset2 -> dataset1.
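A toy sketch of both fixes, using hypothetical data structures rather than Data.Rentgen's schema. Suppose one run has two operations: op1 reads A and writes B, op2 reads C and writes D. Grouping inputs and outputs by run_id also yields the false edges A -> D and C -> B, while grouping by operation_id does not. Self-referencing edges such as E -> E are dropped in both cases.

```python
from collections import defaultdict
from typing import NamedTuple


class Io(NamedTuple):
    """One input or output of an operation (hypothetical structure)."""

    run_id: str
    operation_id: str
    dataset: str
    direction: str  # "input" or "output"


def dataset_edges(ios: list[Io], key: str) -> set[tuple[str, str]]:
    """Connect every input to every output sharing the same run_id or operation_id."""
    inputs: dict[str, set[str]] = defaultdict(set)
    outputs: dict[str, set[str]] = defaultdict(set)
    for io in ios:
        group = getattr(io, key)
        (inputs if io.direction == "input" else outputs)[group].add(io.dataset)
    return {
        (source, target)
        for group, sources in inputs.items()
        for source in sources
        for target in outputs.get(group, set())
        if source != target  # drop dataset1 -> dataset1 self-references
    }


ios = [
    Io("run1", "op1", "A", "input"),
    Io("run1", "op1", "B", "output"),
    Io("run1", "op2", "C", "input"),
    Io("run1", "op2", "D", "output"),
    Io("run2", "op3", "E", "input"),
    Io("run2", "op3", "E", "output"),
]
print(sorted(dataset_edges(ios, "run_id")))  # [('A', 'B'), ('A', 'D'), ('C', 'B'), ('C', 'D')]
print(sorted(dataset_edges(ios, "operation_id")))  # [('A', 'B'), ('C', 'D')]
```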
0.3.1 (2025-07-04)
Breaking changes
- Drop Dataset.format field.
Improvements
- Added syntax highlighting for SQL queries.