feat: consumer schema validation for avro#59
Merged
Conversation
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
KeranYang
reviewed
Feb 18, 2026
src/main/java/io/numaproj/pulsar/config/consumer/PulsarConsumerProperties.java
Outdated
Show resolved
Hide resolved
src/main/java/io/numaproj/pulsar/config/consumer/PulsarConsumerProperties.java
Show resolved
Hide resolved
src/main/java/io/numaproj/pulsar/consumer/GenericRecordToBytes.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
KeranYang
reviewed
Feb 19, 2026
src/main/java/io/numaproj/pulsar/consumer/PulsarConsumerManager.java
Outdated
Show resolved
Hide resolved
src/main/java/io/numaproj/pulsar/consumer/PulsarConsumerManager.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
KeranYang
reviewed
Feb 23, 2026
| bytesConsumer = pulsarClient.newConsumer(Schema.BYTES) | ||
| .loadConf(pulsarConsumerProperties.getConsumerConfig()) | ||
| .batchReceivePolicy(batchPolicy) | ||
| .subscriptionType(SubscriptionType.Shared) |
Contributor
There was a problem hiding this comment.
Suggested change
| .subscriptionType(SubscriptionType.Shared) | |
| .subscriptionType(SubscriptionType.Shared) // Must be shared to support multiple pods |
| .loadConf(pulsarConsumerProperties.getConsumerConfig()) | ||
| .batchReceivePolicy(batchPolicy) | ||
| .subscriptionType(SubscriptionType.Shared) // Must be shared to support multiple pods | ||
| .subscriptionType(SubscriptionType.Shared) |
Contributor
There was a problem hiding this comment.
Suggested change
| .subscriptionType(SubscriptionType.Shared) | |
| .subscriptionType(SubscriptionType.Shared) // Must be shared to support multiple pods |
| * Returns the consumer used for receiving and acknowledgment, creating it if necessary. | ||
| * Matches useAutoConsumeSchema. Never null. | ||
| */ | ||
| public Consumer<?> getConsumer() throws PulsarClientException { |
Contributor
There was a problem hiding this comment.
There are two methods caller can call to get a byte array consumer. Please keep one. See if we can mark getOrCreateGenericRecordConsumer and getOrCreateGenericRecordConsumer as private.
Comment on lines
+146
to
+148
| if (t instanceof SchemaSerializationException | ||
| || t instanceof IOException | ||
| || t instanceof UnsupportedOperationException) { |
Contributor
There was a problem hiding this comment.
Please document more details about each of these exceptions. When will it be thrown under a schema validation case.
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
KeranYang
reviewed
Feb 25, 2026
| .maxNumMessages((int) count) | ||
| .timeout((int) timeoutMillis, TimeUnit.MILLISECONDS) // We do not expect user to specify a number larger | ||
| // than 2^63 - 1 which will cause an overflow | ||
| .timeout((int) timeoutMillis, TimeUnit.MILLISECONDS) |
Contributor
There was a problem hiding this comment.
Please keep the original comments.
| consumerManagerMock = mock(PulsarConsumerManager.class); | ||
| consumerMock = mock(Consumer.class); | ||
| // Inject the mocked PulsarConsumerManager into pulsarSource using | ||
| // Inject the mocked PulsarConsumerManager into pulsarSource using |
Contributor
There was a problem hiding this comment.
Suggested change
| // Inject the mocked PulsarConsumerManager into pulsarSource using | |
| // Inject the mocked PulsarConsumerManager into pulsarSource using |
Contributor
There was a problem hiding this comment.
Consider configuring Format on Save.
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
KeranYang
approved these changes
Feb 25, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
when consumer receives messages from topic that is registered with an avro schema, will perform validation to ensure that correct fields are defined in message payload, and that correct format is used (not json etc). uses apache pulsar client consumer AUTO_CONSUME function.
Testing notes:
Figure 1: application fails after pipeline consumes methods that don't match schema registered to topic

Tested happy path where messages matched topic schema as well, verified that correct message logs were being consumed, also sink shows malformed json because it cannot deserialize avro bytes
