Skip to content

feat: defaults should be None to not rewrite already present topic sc…#25967

Open
schulzrol wants to merge 1 commit intoopen-metadata:mainfrom
schulzrol:rs-defaults-register-topic
Open

feat: defaults should be None to not rewrite already present topic sc…#25967
schulzrol wants to merge 1 commit intoopen-metadata:mainfrom
schulzrol:rs-defaults-register-topic

Conversation

@schulzrol
Copy link

Describe your changes:

I worked on software that enriches schemas for Topics metadata already present by/after the provided OpenMetadata Kafka messaging service connector

The issue i was having was with the fact that the versioning for such registered-then-enriched topics didn't correspond to changes to the schema or topic, before setting topic schema to empty strings/lists, it should left already present metadata alone and only supply what it knows (could be configurable to always rewrite with empty or just patch with Kafka metadata without schema)

Type of change:

  • Improvement

Checklist:

  • I have read the CONTRIBUTING document.

  • My PR title is Fixes <issue-number>: <short explanation>

  • I have commented on my code, particularly in hard-to-understand areas.

  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

  • I have added tests around the new logic.

  • For connector/ingestion changes: I updated the documentation.

@github-actions
Copy link
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link

gitar-bot bot commented Feb 18, 2026

🔍 CI failure analysis for 2abc658: All CI failures are due to a missing 'safe to test' label on the PR. This is a security gate that blocks CI execution before any code is built or tested.

Issue

All CI jobs are failing at the label verification step before any actual testing or building occurs.

Root Cause

The OpenMetadata CI pipeline requires PRs to have a "safe to test" label as a security measure to prevent running untrusted code. All failing jobs show:

Error! This pull request does not contain any of the valid labels: ['safe to test']
Exiting with an error code

Details

The label check runs before any code execution:

  • Python jobs: unit tests, integration tests (all shards), checkstyle, build tests
  • Java jobs: build-and-scan, java-checkstyle
  • E2E jobs: all 6 Playwright test shards
  • Build verification jobs: all variants

None of these jobs proceeded to actual code validation because the label gate blocked them.

Relationship to Code Changes

These failures are completely unrelated to the PR changes. The PR modifies one line in ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py (changing Topic(schemaText="", schemaType=SchemaType.Other, schemaFields=[]) to Topic()). This code has not been tested yet because the CI never reached the testing phase.

Code Review ⚠️ Changes requested 0 resolved / 1 findings

The change from Topic(schemaText="", schemaType=SchemaType.Other, schemaFields=[]) to Topic() doesn't fully achieve the stated goal of preserving existing metadata. The server-side updater still processes non-null messageSchema objects, so defaults from Topic() will still overwrite existing schemas. The else block should be removed entirely (or messageSchema left unset) so the server skips schema updates when no schema is found.

⚠️ Bug: Topic() still overwrites existing schema; doesn't achieve PR goal

📄 ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py:149

The PR's stated goal is to preserve already-present topic schema metadata when the connector doesn't find a schema in the registry. However, topic.messageSchema = Topic() still creates a non-null Topic object with default values (schemaText=None, schemaType="None", schemaFields=[]).

Looking at the server-side TopicUpdater.entitySpecificUpdate() in TopicRepository.java (line 607), the update logic checks if (updated.getMessageSchema() != null) — and since Topic() is not null, the server will proceed to compare and record changes:

  1. schemaText: null vs existing → records a change, clears existing schema text
  2. schemaType: "None" vs existing (e.g., "Avro") → records a change, overwrites type
  3. schemaFields: [] vs existing fields → records a change, clears existing fields

This means existing enriched schema metadata will still be overwritten on subsequent connector runs, just with slightly different default values than before.

To truly preserve existing metadata when no schema is found, messageSchema should not be set at all (leaving it as None on the CreateTopicRequest), so that the server-side null check skips the schema update entirely:

            else:
                pass  # No schema found; leave messageSchema unset to preserve existing metadata

Or equivalently, simply remove the entire else block so topic.messageSchema remains at its default None value from the CreateTopicRequest initialization.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

topic.messageSchema = Topic(
schemaText="", schemaType=SchemaType.Other, schemaFields=[]
)
topic.messageSchema = Topic()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Topic() still overwrites existing schema; doesn't achieve PR goal

The PR's stated goal is to preserve already-present topic schema metadata when the connector doesn't find a schema in the registry. However, topic.messageSchema = Topic() still creates a non-null Topic object with default values (schemaText=None, schemaType="None", schemaFields=[]).

Looking at the server-side TopicUpdater.entitySpecificUpdate() in TopicRepository.java (line 607), the update logic checks if (updated.getMessageSchema() != null) — and since Topic() is not null, the server will proceed to compare and record changes:

  1. schemaText: null vs existing → records a change, clears existing schema text
  2. schemaType: "None" vs existing (e.g., "Avro") → records a change, overwrites type
  3. schemaFields: [] vs existing fields → records a change, clears existing fields

This means existing enriched schema metadata will still be overwritten on subsequent connector runs, just with slightly different default values than before.

To truly preserve existing metadata when no schema is found, messageSchema should not be set at all (leaving it as None on the CreateTopicRequest), so that the server-side null check skips the schema update entirely:

            else:
                pass  # No schema found; leave messageSchema unset to preserve existing metadata

Or equivalently, simply remove the entire else block so topic.messageSchema remains at its default None value from the CreateTopicRequest initialization.

Was this helpful? React with 👍 / 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments