Set up type hinting: add fixes in schema-registry module (mostly already typed) #2107

fangnx · 2025-10-17T21:05:26Z

What

Follow-up PR to #2041: adds missing type hints and corrects existing ones in the schema-registry module, based on the findings of the mypy static type checker.

What's left are functions that might require refactoring and more thorough investigation to get the types right:

common/avro.py, json_schema.py, protobuf.py: schemas are defined as unions, and we need to add guards to verify types accordingly during transformations
schema_registry_client: handling None return types for several functions
rules/cel: adding types for CEL operations
rules/encryption: handling None values propagated through functions. Need to understand the logic better

Checklist

Contains customer facing changes? Including API/behavior changes
Did you add sufficient unit test and/or integration test coverage for this PR?
- If not, please explain why it is not required

References

JIRA: https://confluentinc.atlassian.net/browse/DGS-22076

Test & Review

Open questions / Follow-ups

confluent-cla-assistant · 2025-10-17T21:05:38Z

🎉 All Contributor License Agreements have been signed. Ready to merge.
(Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.)

fangnx · 2025-10-21T19:01:23Z

src/confluent_kafka/schema_registry/rules/encryption/hcvault/hcvault_client.py


    def __init__(
-        self, key_uri: Optional[str], token: Optional[str], ns: Optional[str] = None,
+        self, key_uri: str, token: Optional[str], ns: Optional[str] = None,


I removed Optional here because we already check key_url for emptiness in https://github.com/confluentinc/confluent-kafka-python/blob/master/src/confluent_kafka/schema_registry/rules/encryption/hcvault/hcvault_driver.py#L53, and according to the function doc, HcVaultKmsClient is only supposed to be created with a specified key_uri.

Not sure whether this counts as a breaking change (if customers initialize HcVaultKmsClient directly in their code). @rayokota would love to hear your thoughts on this.

fangnx · 2025-10-21T19:02:02Z

src/confluent_kafka/schema_registry/rules/encryption/azurekms/azure_client.py


    def __init__(
-        self, key_uri: Optional[str], credentials: TokenCredential
+        self, key_uri: str, credentials: TokenCredential


Same change as the one in hcvault_client.py.

fangnx · 2025-10-21T20:52:38Z

src/confluent_kafka/schema_registry/_sync/avro.py

            return parsed_schema

        named_schemas = _resolve_named_schema(schema, self._registry)
+        if schema.schema_str is None:


In practice the schema_str field should never be empty, and even if it is, I think it makes sense to raise an error here to fail early.
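A minimal sketch of the fail-early guard, assuming a simplified stand-in for the Schema class (SchemaSketch and parse_schema_str are hypothetical names, not the library's API). After the explicit None check, a static checker such as mypy narrows schema_str from Optional[str] to str, so the json.loads() call needs no type: ignore.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SchemaSketch:
    # Hypothetical stand-in: schema_str is typed as optional, as in the PR.
    schema_str: Optional[str]


def parse_schema_str(schema: SchemaSketch) -> Any:
    """Parse the schema JSON, failing early if the string is missing."""
    if schema.schema_str is None:
        # Raising here narrows schema.schema_str to str for the code below.
        raise ValueError("schema_str must be set")
    return json.loads(schema.schema_str)
```

The trade-off is a loud error instead of a confusing failure deeper in parsing, which matches the fail-early intent described above.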

MSeal

A couple of questions on the PR. We are introducing a lot of type: ignore comments in places like avro that feel off, but they may not be worth tackling in this pass anyway.

MSeal · 2025-10-21T23:19:10Z

src/confluent_kafka/schema_registry/_async/avro.py

-            referenced_schema = await schema_registry_client.get_version(ref.subject, ref.version, True)
-            ref_named_schemas = await _resolve_named_schema(referenced_schema.schema, schema_registry_client)
+            # References in registered schemas are validated by server to be complete
+            referenced_schema = await schema_registry_client.get_version(ref.subject, ref.version, True)  # type: ignore[arg-type]


Why is there a need for type ignoring here? Can we set the ref/referenced_schema type to avoid this?

In practice, subject and version of SchemaReference are never None, but probably for historical reasons (or tech debt) they have been typed as Optional. I think updating the types would be a breaking change.

Here, get_version() requires non-empty subject and version, which is why I added the type ignore there. Alternatively, we can do:

    if ref.subject is None or ref.version is None:
        # maybe log something
        continue
    referenced_schema = await schema_registry_client.get_version(ref.subject, ref.version, True)
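The guard-and-continue alternative can be sketched in a self-contained, synchronous form. RefSketch and resolve_refs are hypothetical, and a plain callback stands in for schema_registry_client.get_version(); the point is that the continue guard narrows both fields to non-optional types without a type: ignore, at the cost of silently skipping (here, logging) incomplete references.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class RefSketch:
    # Hypothetical stand-in mirroring SchemaReference's optional fields.
    name: Optional[str]
    subject: Optional[str]
    version: Optional[int]


def resolve_refs(refs: List[RefSketch],
                 get_version: Callable[[str, int], str]) -> List[str]:
    """Fetch each complete reference; log and skip incomplete ones."""
    resolved: List[str] = []
    for ref in refs:
        if ref.subject is None or ref.version is None:
            logger.warning("Skipping incomplete schema reference %r", ref)
            continue
        # Both fields are narrowed to str / int here.
        resolved.append(get_version(ref.subject, ref.version))
    return resolved
```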

MSeal · 2025-10-21T23:21:26Z

src/confluent_kafka/schema_registry/_async/avro.py

-                schema_name = parsed_schema.get("name", schema_dict.get("type"))
+                if schema.schema_str is not None:
+                    schema_dict = json.loads(schema.schema_str)
+                    schema_name = parsed_schema.get("name", schema_dict.get("type"))  # type: ignore[union-attr]


Maybe add a comment that 'type' in the dict refers to the schema type, not Python type hinting? Without that context, it could be confusing right next to a type: ignore comment.

MSeal · 2025-10-21T23:24:04Z

src/confluent_kafka/schema_registry/_async/avro.py

            def field_transformer(rule_ctx, field_transform, msg): return (  # noqa: E731
                transform(rule_ctx, parsed_schema, msg, field_transform))
-            value = self._execute_rules(ctx, subject, RuleMode.WRITE, None,
+            value = self._execute_rules(ctx, subject, RuleMode.WRITE, None,  # type: ignore[arg-type]


I think we should spread this function's args out one per line if we're adding a type-ignore comment to one of them. It reads oddly like this, and I think it's prone to bugs when refactoring.

MSeal · 2025-10-21T23:24:34Z

src/confluent_kafka/schema_registry/_async/avro.py

            buffer = fo.getvalue()

-            if latest_schema is not None:
+            if latest_schema is not None and ctx is not None and subject is not None:


This changed the logic here. Are we certain it's a correct / tested change?

_execute_rules_with_phase() already requires SerializationContext and subject to build RuleContext (see the definition of _execute_rules_with_phase() at line 70 of confluent-kafka-python/src/confluent_kafka/schema_registry/_async/serde.py, commit f026e72).

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

sonarqube-confluent · 2025-10-23T04:43:23Z

75.50% Coverage on New Code (is less than 80.00%)

Analysis Details

42 Issues

0 Bugs
0 Vulnerabilities
42 Code Smells

Coverage and Duplications

75.50% Coverage (67.00% Estimated after merge)
No duplication information (5.00% Estimated after merge)

Project ID: confluent-kafka-python

fangnx · 2025-10-23T20:26:30Z

src/confluent_kafka/schema_registry/__init__.py

+        ValueError: If ctx is None.
+
    """
+    if ctx is None:


I refactored the name_strategy functions a bit: they must all share the same function signature, but only some of them require a SerializationContext.
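A minimal sketch of the shared-signature idea, using hypothetical stand-ins (CtxSketch, topic_strategy, record_strategy are illustrative, not the library's code). Every strategy accepts an optional context so callers can swap them freely, but only the topic-based one needs it and raises ValueError when it is missing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CtxSketch:
    # Hypothetical stand-in for SerializationContext.
    topic: str
    field: str  # "key" or "value"


# Every strategy shares this signature so they are interchangeable.
Strategy = Callable[[Optional[CtxSketch], str], str]


def topic_strategy(ctx: Optional[CtxSketch], record_name: str) -> str:
    if ctx is None:
        raise ValueError("SerializationContext is required for topic_subject_name_strategy.")
    return ctx.topic + "-" + ctx.field


def record_strategy(ctx: Optional[CtxSketch], record_name: str) -> str:
    # The context is accepted for signature compatibility but not used.
    return record_name


STRATEGIES: Dict[str, Strategy] = {"topic": topic_strategy, "record": record_strategy}
```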

fangnx · 2025-10-24T16:57:18Z

src/confluent_kafka/schema_registry/_async/schema_registry_client.py

-                self.bearer_field_provider = _StaticFieldProvider(static_token, logical_cluster, identity_pool)
-                if not isinstance(static_token, string_type):
-                    raise TypeError("bearer.auth.token must be a str, not " + str(type(static_token)))
+                if self.bearer_auth_credentials_source == 'OAUTHBEARER':


The indentation change addresses the type issue with logical_cluster and identity_pool when building _AsyncOAuthClient: they must be non-empty, which we already check on lines 280 and 284.

This doesn't affect the code logic: the outer block checks self.bearer_auth_credentials_source in {'OAUTHBEARER', 'STATIC_TOKEN'}, and the inner if-else branches handle OAUTHBEARER and STATIC_TOKEN respectively.
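A minimal sketch of why the nesting helps, assuming simplified inputs (build_bearer_provider and the tuples it returns are hypothetical, and the placement of the None checks is illustrative rather than the client's actual lines 280 and 284). Once the outer check restricts the source to the two supported values, the inner else can only be STATIC_TOKEN, and the None checks narrow logical_cluster and identity_pool to str for both branches.

```python
from typing import Optional, Tuple


def build_bearer_provider(source: str,
                          logical_cluster: Optional[str],
                          identity_pool: Optional[str],
                          static_token: Optional[str]) -> Tuple[str, ...]:
    if source in {'OAUTHBEARER', 'STATIC_TOKEN'}:
        if logical_cluster is None or identity_pool is None:
            raise ValueError("logical cluster and identity pool are required")
        # From here on, both values are narrowed to str.
        if source == 'OAUTHBEARER':
            return ('oauth', logical_cluster, identity_pool)
        else:  # STATIC_TOKEN: the outer check rules out anything else
            if not isinstance(static_token, str):
                raise TypeError("bearer.auth.token must be a str, not " + str(type(static_token)))
            return ('static', static_token, logical_cluster, identity_pool)
    raise ValueError("unsupported bearer.auth.credentials.source: " + source)
```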

fangnx · 2025-10-24T19:20:14Z

src/confluent_kafka/schema_registry/common/json_schema.py



 def get_type(schema: JsonSchema) -> FieldType:
-    if isinstance(schema, list):


The JsonSchema union type is either bool or dict. This was likely copied from avro.py.
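A minimal sketch of the bool/dict guard, assuming JsonSchema is Union[bool, Dict[str, Any]] as described above. FieldTypeSketch and its mapping are hypothetical simplifications, not the module's real FieldType logic. After the bool branch returns, the checker narrows schema to the dict type, so schema.get() type-checks and no list or string branch is needed.

```python
from enum import Enum
from typing import Any, Dict, Union

JsonSchemaSketch = Union[bool, Dict[str, Any]]


class FieldTypeSketch(Enum):
    # Hypothetical subset of field types for illustration.
    COMBINED = "COMBINED"
    STRING = "STRING"
    OBJECT = "OBJECT"


def get_type_sketch(schema: JsonSchemaSketch) -> FieldTypeSketch:
    if isinstance(schema, bool):
        # true/false schemas accept every instance or none; return early.
        return FieldTypeSketch.COMBINED
    # schema is now narrowed to Dict[str, Any].
    schema_type = schema.get("type")
    if schema_type == "string":
        return FieldTypeSketch.STRING
    if schema_type == "object":
        return FieldTypeSketch.OBJECT
    return FieldTypeSketch.COMBINED
```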

Copilot

Pull Request Overview

Copilot reviewed 42 out of 42 changed files in this pull request and generated 7 comments.

Copilot · 2025-10-24T19:36:17Z

src/confluent_kafka/schema_registry/_async/schema_registry_client.py

            await self.generate_access_token()
-
+        if self.token is None:
+            raise ValueError("Token is not set after the at")


Corrected incomplete error message from 'after the at' to 'after the attempt to generate it'.

Suggested change

-            raise ValueError("Token is not set after the at")
+            raise ValueError("Token is not set after the attempt to generate it")
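A minimal synchronous sketch of the pattern, with a hypothetical TokenCacheSketch class standing in for the OAuth client. generate_access_token() assigns the token only as a side effect, which a static checker cannot see, so the follow-up None check is what lets the getter return str instead of Optional[str], and it turns an unexpected failure into a clear error.

```python
from typing import Callable, Optional


class TokenCacheSketch:
    def __init__(self, fetch: Callable[[], Optional[str]]) -> None:
        self._fetch = fetch  # hypothetical token source
        self.token: Optional[str] = None

    def generate_access_token(self) -> None:
        # Side effect only: the return type says nothing about self.token.
        self.token = self._fetch()

    def get_access_token(self) -> str:
        if self.token is None:
            self.generate_access_token()
        if self.token is None:
            raise ValueError("Token is not set after the attempt to generate it")
        return self.token
```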

Copilot · 2025-10-24T19:36:18Z

src/confluent_kafka/schema_registry/_sync/schema_registry_client.py

+                        self.token_endpoint, logical_cluster, identity_pool,
+                        self.max_retries, self.retries_wait_ms,
+                        self.retries_max_wait_ms)
+                else: # STATIC_TOKEN


The elif condition check has been replaced with else without verifying all possible values. This assumes only 'OAUTHBEARER' and 'STATIC_TOKEN' are valid values for bearer_auth_credentials_source. Consider adding an explicit elif condition for 'STATIC_TOKEN' to make the code more maintainable and guard against unexpected values.

This is already inside the outer block if self.bearer_auth_credentials_source in {'OAUTHBEARER', 'STATIC_TOKEN'}:, so the else branch can only be STATIC_TOKEN.

Copilot · 2025-10-24T19:36:18Z

src/confluent_kafka/schema_registry/_async/schema_registry_client.py

+                        self.token_endpoint, logical_cluster, identity_pool,
+                        self.max_retries, self.retries_wait_ms,
+                        self.retries_max_wait_ms)
+                else: # STATIC_TOKEN


The elif condition check has been replaced with else without verifying all possible values. This assumes only 'OAUTHBEARER' and 'STATIC_TOKEN' are valid values for bearer_auth_credentials_source. Consider adding an explicit elif condition for 'STATIC_TOKEN' to make the code more maintainable and guard against unexpected values.

This is already inside the outer block if self.bearer_auth_credentials_source in {'OAUTHBEARER', 'STATIC_TOKEN'}: (line 273), so the else branch can only be STATIC_TOKEN.

Copilot · 2025-10-24T19:36:18Z

src/confluent_kafka/schema_registry/_sync/schema_registry_client.py

-        if subject_name is not None:
-            query['subject'] = subject_name
+        query: dict[str, Any] = {'offset': offset, 'limit': limit}
+        if subject_name is not None:            query['subject'] = subject_name


The indentation is incorrect on line 867. The 'query['subject'] = subject_name' statement should be on its own line with proper indentation.

Suggested change

-        if subject_name is not None:            query['subject'] = subject_name
+        if subject_name is not None:
+            query['subject'] = subject_name

Copilot · 2025-10-24T19:36:19Z

src/confluent_kafka/schema_registry/_async/schema_registry_client.py

-        if subject_name is not None:
-            query['subject'] = subject_name
+        query: dict[str, Any] = {'offset': offset, 'limit': limit}
+        if subject_name is not None:            query['subject'] = subject_name


The indentation is incorrect on line 868. The 'query['subject'] = subject_name' statement should be on its own line with proper indentation.

Suggested change

-        if subject_name is not None:            query['subject'] = subject_name
+        if subject_name is not None:
+            query['subject'] = subject_name
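A short sketch of why the explicit annotation on query matters (build_query is a hypothetical helper). Without it, a checker such as mypy infers dict[str, int] from the initial literal and rejects the later string assignment. The if body also sits on its own line here, which fixes the indentation problem flagged above.

```python
from typing import Any, Dict, Optional


def build_query(offset: int, limit: int,
                subject_name: Optional[str] = None) -> Dict[str, Any]:
    # Annotated so the dict may also hold str values, not just ints.
    query: Dict[str, Any] = {'offset': offset, 'limit': limit}
    if subject_name is not None:
        query['subject'] = subject_name
    return query
```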

Copilot · 2025-10-24T19:36:19Z

src/confluent_kafka/admin/_group.py

+    def __init__(self, topic_partitions: List[TopicPartition] = []) -> None:
        self.topic_partitions = topic_partitions or []


Using a mutable default argument (empty list) is dangerous as it will be shared across all instances. Use None as default and initialize inside the function instead.

Suggested change

-    def __init__(self, topic_partitions: List[TopicPartition] = []) -> None:
-        self.topic_partitions = topic_partitions or []
+    def __init__(self, topic_partitions: Optional[List[TopicPartition]] = None) -> None:
+        self.topic_partitions = topic_partitions if topic_partitions is not None else []
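A small sketch of the pitfall and the suggested idiom, using hypothetical functions and classes rather than the admin code. A mutable default is created once at definition time and is shared by every call that mutates it directly. In the reviewed line, the "or []" fallback happens to swap an empty default for a fresh list, so instances do not actually share it, but it also replaces an empty list the caller passed in; the None-default form avoids both surprises and keeps linters quiet.

```python
from typing import List, Optional


def append_shared(item: int, items: List[int] = []) -> List[int]:
    # Anti-pattern: the default list is created once and reused.
    items.append(item)
    return items


class OrFallback:
    def __init__(self, items: List[int] = []) -> None:
        # Any empty list (including the default) is swapped for a fresh one.
        self.items = items or []


class NoneDefault:
    def __init__(self, items: Optional[List[int]] = None) -> None:
        # Suggested idiom: new list per instance, caller's list kept as-is.
        self.items = items if items is not None else []
```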

Copilot · 2025-10-24T19:36:19Z

src/confluent_kafka/schema_registry/_async/avro.py

+            inline_tags = get_inline_tags(reader_schema) if reader_schema is not None else None
+            obj_dict = self._execute_rules(ctx, subject, RuleMode.READ, None,
+                                           reader_schema_raw, obj_dict,
+                                           inline_tags,field_transformer)


Missing space after comma between 'inline_tags' and 'field_transformer' arguments.

Suggested change

-                                           inline_tags,field_transformer)
+                                           inline_tags, field_transformer)

Copilot

Pull Request Overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 5 comments.

Copilot · 2025-10-30T17:17:02Z

src/confluent_kafka/schema_registry/rules/encryption/encrypt_executor.py

+            if dek is None or dek.version is None:
+                new_version = 1
+            else:
+                new_version = dek.version + 1 if is_expired else 1


The logic for calculating new_version is confusing. When is_expired is False, new_version should be 1, but when is_expired is True, it should be dek.version + 1. The current conditional expression has this backwards - it increments when is_expired is True but sets to 1 when False. Consider: new_version = (dek.version + 1) if is_expired else 1
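Because a conditional expression binds more loosely than +, dek.version + 1 if is_expired else 1 already parses as (dek.version + 1) if is_expired else 1, the same form suggested in the comment. A minimal sketch with a hypothetical DekSketch class makes the behavior explicit: the version is bumped only when the existing key is expired, and a missing key or version starts at 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DekSketch:
    # Hypothetical stand-in for a DEK whose version may be unset.
    version: Optional[int]


def next_dek_version(dek: Optional[DekSketch], is_expired: bool) -> int:
    if dek is None or dek.version is None:
        return 1
    # Same as: (dek.version + 1) if is_expired else 1
    return dek.version + 1 if is_expired else 1
```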

Copilot · 2025-10-30T17:17:03Z

src/confluent_kafka/schema_registry/common/json_schema.py

+    if isinstance(schema, bool):
        return FieldType.COMBINED
-    elif isinstance(schema, dict):
-        schema_type = schema.get("type")
-    else:
-        # string schemas; this could be either a named schema or a primitive type
-        schema_type = schema

+    schema_type = schema.get("type")


Calling .get() on a boolean value will raise AttributeError. After checking isinstance(schema, bool), the function should return immediately before attempting to call schema.get(). The early return is missing on line 213.

Copilot · 2025-10-30T17:17:03Z

src/confluent_kafka/schema_registry/__init__.py

+    if ctx is None:
+        raise ValueError(
+            "SerializationContext is required for topic_subject_name_strategy. "
+            "Either provide a SerializationContext or use record_subject_name_strategy."
+        )


Adding a ValueError when ctx is None changes the API behavior. Previously, these functions accepted Optional[SerializationContext], and None would result in an AttributeError at line 94. The new explicit check is clearer, but this is a breaking change that could affect existing code that catches AttributeError. Consider documenting this as a breaking change.
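A minimal sketch of the compatibility concern, using hypothetical before/after functions rather than the real strategies. ValueError is not a subclass of AttributeError, so a caller that only catches AttributeError will no longer catch the new error, while catching both keeps working across versions.

```python
from typing import Any, Callable, Optional


def subject_before(ctx: Any) -> str:
    # Hypothetical old behavior: None.topic raises AttributeError.
    return ctx.topic + "-value"


def subject_after(ctx: Optional[Any]) -> str:
    # Hypothetical new behavior: explicit check with a descriptive ValueError.
    if ctx is None:
        raise ValueError("SerializationContext is required")
    return ctx.topic + "-value"


def subject_or_default(strategy: Callable[[Optional[Any]], str],
                       ctx: Optional[Any], default: str) -> str:
    # Caller-side handling that works with both behaviors.
    try:
        return strategy(ctx)
    except (AttributeError, ValueError):
        return default
```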

Copilot · 2025-10-30T17:17:03Z

src/confluent_kafka/schema_registry/_sync/schema_registry_client.py

            if not isinstance(cache_capacity, (int, float)):
                raise TypeError("cache.capacity must be a number, not " + str(type(cache_capacity)))
-            self.cache_capacity = cache_capacity
+            self.cache_capacity = int(cache_capacity)


[nitpick] Converting float to int with int() truncates the decimal. If cache_capacity is 100.9, it becomes 100. Consider using round() instead for more intuitive behavior, or document that fractional values are truncated.

Suggested change

-            self.cache_capacity = int(cache_capacity)
+            self.cache_capacity = round(cache_capacity)
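A quick illustration of the difference in plain Python: int() truncates toward zero, while round() with no ndigits argument returns an int using round-half-to-even, so exact .5 values do not always round up.

```python
def capacity_int(value: float) -> int:
    # Truncates toward zero: 100.9 -> 100, -1.5 -> -1.
    return int(value)


def capacity_round(value: float) -> int:
    # Rounds to nearest, ties to even: 100.9 -> 101, 2.5 -> 2, 3.5 -> 4.
    return round(value)
```

Either choice is defensible for a cache size; the main thing is that the chosen behavior is documented.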

Copilot · 2025-10-30T17:17:04Z

src/confluent_kafka/experimental/aio/producer/_kafka_batch_executor.py

+    def _handle_partial_failures(
+        self,
+        batch_messages: List[Dict[str, Any]]
+    ) -> None:


[nitpick] This appears to be a pure formatting change (moving parameters to separate lines) unrelated to type hinting. Such formatting changes should typically be in a separate commit to keep type hinting changes focused.

Suggested change

-    def _handle_partial_failures(
-        self,
-        batch_messages: List[Dict[str, Any]]
-    ) -> None:
+    def _handle_partial_failures(self, batch_messages: List[Dict[str, Any]]) -> None:

fangnx changed the title from "WIP: add type-related fixes in schema-registry module" to "Set up type hinting: add fixes in schema-registry module (mostly already typed)" Oct 20, 2025

fangnx marked this pull request as ready for review October 20, 2025 22:15

fangnx requested review from a team and MSeal as code owners October 20, 2025 22:15

airlock-confluentinc bot force-pushed the typehinting-sr-fix branch from 54fab2a to a5ff8aa October 21, 2025 18:55

fangnx commented Oct 21, 2025

fangnx commented Oct 21, 2025

MSeal requested changes Oct 21, 2025

Base automatically changed from typehinting-kafka to master October 22, 2025 22:06

fangnx added 14 commits October 22, 2025 23:35

update

c50656f

remove py.typed for now

343dbfe

update

1e7f2ac

fix cimply and add types to serde producer/consumer

69ca6b8

admin

d789244

address feedback

0653fe0

add warning to stub and c files; admin typing more

77c4965

add accidentally removed md files

cfa723f

fix merge conflicts in md files, add types to admin and serialization entrypoint init files

1abe216

finish admin init

8f789a0

add types for AIO module

9c99020

linter fix

a387ea7

address mypy complaints

c1e2f91

revert some accidental doc change

8bdb0af

fangnx added 6 commits October 22, 2025 23:46

fix some suggestions by copilot

26694e6

linter

c7865d8

fix

791c4ad

resolve conflict

ffb118e

encryption clients

d536a71

fix

cdbd203

Copilot AI review requested due to automatic review settings October 23, 2025 04:15

airlock-confluentinc bot force-pushed the typehinting-sr-fix branch from a5ff8aa to cdbd203 October 23, 2025 04:15

Copilot AI reviewed Oct 23, 2025

revert incorrect merge conflict changes

6fa8730

fangnx added 2 commits October 23, 2025 16:15

fix many things

485532f

more fixes in non sr modules

6262a73

fangnx commented Oct 23, 2025

fangnx added 3 commits October 23, 2025 19:27

type encrypt_executor.py

5ffe301

more typeignore removals

0f51247

update

b4bf42c

fangnx commented Oct 24, 2025

handle union types in schemas

1451647

fangnx commented Oct 24, 2025

a bit more

5e718d6

fangnx requested review from MSeal and Copilot October 24, 2025 19:34

Copilot AI reviewed Oct 24, 2025

revert some bad changes during merge, address copilot comments

0ee5103

fangnx requested a review from Copilot October 30, 2025 17:15

minor

4026889

Copilot AI reviewed Oct 30, 2025
