Fixes 25437: For kafka message consuming, switch to using poll() instead of consume() #25838
Conversation
`DeserializingConsumer.consume()` is not implemented, so switched to `.poll()` instead.
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as `safe to test`, the CI checks will run. Let us know if you need any help!
    - messages = self.consumer_client.consume(num_messages=10, timeout=10)
    + # DeserializingConsumer does not implement consume(), use poll() in a loop instead.
    + messages = []
    + n_poll = 10
💡 Quality: Magic numbers for poll count and timeout should be named constants
The values `n_poll = 10` and `total_timeout = 10` are local variables, but they mirror the original `consume(num_messages=10, timeout=10)` behavior and are configuration-like values. Consider extracting them as class-level or module-level constants for better discoverability and consistency, e.g., `_SAMPLE_DATA_MAX_MESSAGES = 10` and `_SAMPLE_DATA_POLL_TIMEOUT_SECS = 10`. This is a minor style point; the current implementation is functional and correct.
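A minimal sketch of that refactor, assuming the constant names suggested above (the values simply mirror the original `consume(num_messages=10, timeout=10)` arguments):

```python
# Hypothetical module-level constants with the names suggested above.
_SAMPLE_DATA_MAX_MESSAGES = 10
_SAMPLE_DATA_POLL_TIMEOUT_SECS = 10

# Inside yield_topic_sample_data, the locals would then read:
messages = []
n_poll = _SAMPLE_DATA_MAX_MESSAGES
total_timeout = _SAMPLE_DATA_POLL_TIMEOUT_SECS
```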
🔍 CI failure analysis for 8ba77f7: CI failures have two root causes: (1) missing 'safe to test' label blocking most jobs, and (2) infrastructure timeout in `py-run-build-tests` waiting for the labeler check.

Issue
CI failures are showing two distinct failure patterns affecting 20 jobs total.

Root Causes
Root Cause 1: Missing 'safe to test' Label (Primary)
Most CI jobs include a "Verify PR labels" step that validates the PR has the `safe to test` label.

Root Cause 2: Infrastructure Timeout (Secondary)
The `py-run-build-tests` job waits for the label verification check to complete before proceeding. Due to network connectivity issues with the GitHub API, it timed out after multiple retry attempts.

Details
PR Status:
Failed Jobs Breakdown (20 total failures):
- Blocked at label verification (primary failures):
- Infrastructure timeout (secondary failure):
- Dependent job failures:
No actual code validation has occurred: all test execution, builds, linting, and security scans are blocked.

Context
This PR has been:
The infrastructure timeout in `py-run-build-tests` occurred while waiting on the label verification check, as described above.

Code Review: 👍 Approved with suggestions (0 resolved / 1 finding)
Solid bug fix correctly replacing the unimplemented `consume()` call.

💡 Quality: Magic numbers for poll count and timeout should be named constants
📄 ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py:299
The values `n_poll = 10` and `total_timeout = 10` mirror the original `consume(num_messages=10, timeout=10)` behavior; consider extracting them as named constants (see the comment above).

Tip: Auto-apply is off → Gitar will not commit updates to this branch. Comment with these commands to change:
Describe your changes:
Fixes #25437: For Kafka message consumption, switch to using `poll()` instead of `consume()`.
`DeserializingConsumer` does not implement `consume()`; it raises a `NotImplementedError`. Instead of using it, switch to `poll()` in `yield_topic_sample_data` in `CommonBrokerSource`. See here for the documentation about `DeserializingConsumer.consume`.
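A minimal sketch of what that swap looks like, written as a standalone helper for illustration (the function name, per-poll timeout, and break-on-empty behavior are assumptions, not the PR's exact code):

```python
from confluent_kafka import DeserializingConsumer


def sample_messages(consumer: DeserializingConsumer, max_messages: int = 10) -> list:
    """Collect up to max_messages by calling poll() once per message.

    DeserializingConsumer.consume() raises NotImplementedError, so messages
    have to be fetched one at a time with poll().
    """
    messages = []
    for _ in range(max_messages):
        message = consumer.poll(timeout=1.0)  # assumed per-poll timeout in seconds
        if message is None:  # nothing arrived before the timeout
            break
        messages.append(message)
    return messages
```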
Type of change:
- Bug fix

Checklist:
- Fixes <issue-number>: <short explanation>
I need help with this. I'm not too familiar with the codebase.
Summary by Gitar
- Replaces the unimplemented `consume()` API call with a polling loop that calls `poll()` up to 10 times, collecting messages individually while respecting a 10-second total timeout using `time.monotonic()` deadline tracking
- Handles `ConsumeError`, `KeyDeserializationError`, and `ValueDeserializationError` exceptions, logging warnings but continuing to collect additional valid messages instead of failing completely
- Fixes the bug in `CommonBrokerSource.yield_topic_sample_data()` in `ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py:294`, where `confluent_kafka.DeserializingConsumer.consume()` raises `NotImplementedError` because the method is not implemented in the library
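A sketch of the loop this summary describes, assuming the exception classes exported by `confluent_kafka.error` and a `time.monotonic()` deadline; the helper name and logger wiring are illustrative rather than the PR's actual code:

```python
import logging
import time

from confluent_kafka import DeserializingConsumer
from confluent_kafka.error import (
    ConsumeError,
    KeyDeserializationError,
    ValueDeserializationError,
)

logger = logging.getLogger(__name__)


def poll_sample_messages(consumer: DeserializingConsumer) -> list:
    """Poll up to 10 messages within a 10-second overall deadline."""
    messages = []
    deadline = time.monotonic() + 10  # total timeout, matching the old consume() call
    for _ in range(10):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # overall deadline exhausted
        try:
            message = consumer.poll(timeout=remaining)
        except (ConsumeError, KeyDeserializationError, ValueDeserializationError) as exc:
            # Log and keep polling so one bad record does not drop the whole sample.
            logger.warning("Failed to poll message: %s", exc)
            continue
        if message is not None:
            messages.append(message)
    return messages
```

Catching the deserialization errors per iteration is what lets a single malformed record be skipped while the remaining sample messages are still collected.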