jackgene
Contributor

Changes

Adds type stubs to classes that clients will interact with, and py.typed so that type checkers know type information is available.

Fixes #980 (kind of)

Per providing type annotations, there are several ways to distribute type information:

  • inline type annotations (preferred)
  • type stub files included in the package
  • a separate companion type stub package
  • type stubs in the typeshed repository

This is the second option, and does not touch any of the .py files. If you prefer to go with the first option (preferred, but makes changes to .py files), please have a look at #1074 instead. That PR also includes additional details on how type information was added.
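As a rough illustration of the second option, the sketch below builds a throwaway package layout and checks it for the `py.typed` marker and for `.pyi` stub files. The package name `examplepkg` and both helper functions are hypothetical and are not part of this PR.

```python
import tempfile
from pathlib import Path
from typing import List


def ships_type_info(package_dir: Path) -> bool:
    """Return True if the package directory contains a `py.typed` marker file."""
    return (package_dir / "py.typed").is_file()


def stub_files(package_dir: Path) -> List[str]:
    """List the .pyi stub files under the package, relative to its root."""
    return sorted(
        path.relative_to(package_dir).as_posix()
        for path in package_dir.rglob("*.pyi")
    )


# Build a throwaway layout that mimics stubs shipped next to the .py files.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "examplepkg"  # hypothetical package name
    (pkg / "producer").mkdir(parents=True)
    (pkg / "py.typed").touch()
    (pkg / "__init__.pyi").write_text("...\n")
    (pkg / "producer" / "producer.pyi").write_text("...\n")

    has_marker = ships_type_info(pkg)
    stubs = stub_files(pkg)
```

The marker file is empty; its presence is what tells a type checker that the installed package carries its own type information.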

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> (e.g. 588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the PR
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: Fix issue with non-ascii contents in doctest text files.

@jackgene
Contributor Author

These type stubs are being used in a couple of my projects:

@jackgene jackgene force-pushed the feature/stub-typing-for-clients branch from edf33a5 to f2fe1eb Compare April 15, 2025 04:36
@shuckc

shuckc commented Apr 15, 2025

I tried these stubs against Python 3.9.21 by copying *.pyi to my stubs directory. However, even with from __future__ import annotations present, I get the following errors from mypy:

% MYPYPATH=stubs mypy --strict feedhandler
...
stubs/aiokafka/producer/producer.pyi:29: error: Incompatible default for argument "key_serializer" (default has type "Callable[[bytes], bytes]", argument has type "Callable[[KT_contra], bytes]")  [assignment]
stubs/aiokafka/producer/producer.pyi:30: error: Incompatible default for argument "value_serializer" (default has type "Callable[[bytes], bytes]", argument has type "Callable[[VT_contra], bytes]")  [assignment] 

I've tried different Python and mypy versions, and the cause seems to be that passing --strict to mypy also checks the stubs in strict mode.

As a workaround, this seems to work:

KT_contra = TypeVar("KT_contra", contravariant=True)
VT_contra = TypeVar("VT_contra", contravariant=True)
ET = TypeVar("ET", bound=BaseException)

def _identity_kt(data: KT_contra) -> bytes: ...
def _identity_vt(data: VT_contra) -> bytes: ...

class AIOKafkaProducer(Generic[KT_contra, VT_contra]):
    def __init__(
        self,
        ...
        key_serializer: Callable[[KT_contra], bytes] = _identity_kt,
        value_serializer: Callable[[VT_contra], bytes] = _identity_vt,
        compression_type: (
        ...


__version__ = ...
__all__ = [
"AIOKafkaConsumer",

The top-level __init__.pyi should import and expose AIOKafkaProducer in __all__.

Contributor Author

Good catch. I think I started by typing the consumer and left this out initially, and forgot to put it back in. Fixed.

@jackgene
Contributor Author

jackgene commented Apr 16, 2025

Thanks for identifying this. I use PyRight as my type checker, as MyPy doesn't seem to understand types under certain circumstances. But I understand that MyPy is the main type checker people use, and your workaround looks like a great solution.

I'm going to just make sure it doesn't upset PyRight (I don't think it will), and update it shortly.

@jackgene
Contributor Author

Unfortunately, PyRight did not like this workaround. PyRight was smart enough to realize the type variable wasn't actually needed in the identity functions, and rejected it:

/Users/jack/Developer/3rdParty/aiokafka/aiokafka/producer/producer.pyi
  /Users/jack/Developer/3rdParty/aiokafka/aiokafka/producer/producer.pyi:22:24 - error: TypeVar "KT_contra" appears only once in generic function signature
    Use "object" instead (reportInvalidTypeVarUse)
  /Users/jack/Developer/3rdParty/aiokafka/aiokafka/producer/producer.pyi:23:24 - error: TypeVar "VT_contra" appears only once in generic function signature
    Use "object" instead (reportInvalidTypeVarUse)
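To make the PyRight complaint concrete, here is a minimal sketch of the pattern it rejects. A type variable that appears only once in a function signature links nothing to anything, so `object` expresses the same contract. The function names are hypothetical and are not taken from the stubs.

```python
from typing import TypeVar

T = TypeVar("T")


def to_bytes_generic(data: T) -> bytes:
    # T appears only once in this signature, so it constrains nothing.
    # This is the shape that the reportInvalidTypeVarUse message above describes.
    return repr(data).encode("utf-8")


def to_bytes_object(data: object) -> bytes:
    # Same runtime behaviour, with the parameter typed as `object` instead.
    return repr(data).encode("utf-8")


def echo(data: T) -> T:
    # Here T appears twice and ties the return type to the argument type,
    # which is the intended use of a function-scoped type variable.
    return data
```

Note that switching the workaround's helpers to `object` would satisfy this particular check, but it would not by itself give the type checker anything to infer `bytes` from.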

I also tried casting, but that caused PyRight to not be able to infer the type of AIOKafkaProducer's generic parameters when no key_serializer/value_serializer is provided.

You can probably tell, but what I was trying to do here is to have the default producer be an AIOKafkaProducer[bytes, bytes]:

# Default `key_serializer`'s type is `Callable[[bytes], bytes]`, which means `KT_contra` is `bytes`
# Default `value_serializer`'s type is `Callable[[bytes], bytes]`, which means `VT_contra` is `bytes`
kafka_producer: AIOKafkaProducer[bytes, bytes] = AIOKafkaProducer(
    bootstrap_servers=["..."],
)

You can also explicitly override the type by providing the appropriate serializer functions. I have an application with an AIOKafkaProducer[str, dict[str, str]] for instance.
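A minimal sketch of that explicit override, using a hypothetical stand-in class rather than aiokafka itself so that it runs without a broker. `StandInProducer`, `encode`, and both serializer functions are assumptions made for illustration only.

```python
import json
from typing import Callable, Dict, Generic, Tuple, TypeVar

KT_contra = TypeVar("KT_contra", contravariant=True)
VT_contra = TypeVar("VT_contra", contravariant=True)


class StandInProducer(Generic[KT_contra, VT_contra]):
    """Hypothetical stand-in for a generic producer; not the aiokafka class."""

    def __init__(
        self,
        key_serializer: Callable[[KT_contra], bytes],
        value_serializer: Callable[[VT_contra], bytes],
    ) -> None:
        self._key_serializer = key_serializer
        self._value_serializer = value_serializer

    def encode(self, key: KT_contra, value: VT_contra) -> Tuple[bytes, bytes]:
        # Apply the serializers the way a producer would before sending.
        return self._key_serializer(key), self._value_serializer(value)


def serialize_key(key: str) -> bytes:
    return key.encode("utf-8")


def serialize_value(value: Dict[str, str]) -> bytes:
    return json.dumps(value, sort_keys=True).encode("utf-8")


# The serializer signatures determine the type parameters: [str, Dict[str, str]].
producer: StandInProducer[str, Dict[str, str]] = StandInProducer(
    key_serializer=serialize_key,
    value_serializer=serialize_value,
)
encoded = producer.encode("user-1", {"event": "login"})
```

Because both serializers are passed explicitly, neither type checker has to infer anything from default arguments in this case.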

I'm going to try and spend some time this weekend to see if I can come up with a solution that is sound (and therefore type-checks in PyRight), while also conforming to MyPy's sensibilities.

@jackgene
Contributor Author

jackgene commented Apr 20, 2025

For the problem with AIOKafkaConsumer and AIOKafkaProducer generic type parameters, I can't come up with a good solution other than to have MyPy ignore type checking for those. As a last-ditch effort, I tried to see if AI could come up with a clever solution, but there simply wasn't a solution (other than perhaps a PR to fix MyPy, but I don't use MyPy):

Prompt:

Why do these lines fail to type-check in mypy, when the types are correct, and they type-check just fine in Pyright?

AI Response:

I'll help you understand why these lines might be failing type-checking in mypy but passing in Pyright. Let me first examine the code more closely to understand the context.

Read file: typings/aiokafka/producer/producer.pyi

Now I understand the issue. The problem is related to the type compatibility between the _identity function and the expected type for the serializer parameters.

The issue is that mypy and Pyright handle type compatibility differently, particularly with contravariant type variables. Let me explain:

  1. The _identity function is defined as:

    def _identity(data: bytes) -> bytes: ...

  2. But the serializer parameters are defined as:

    key_serializer: Callable[[KT_contra], bytes] = _identity,
    value_serializer: Callable[[VT_contra], bytes] = _identity,

  3. Where KT_contra and VT_contra are contravariant type variables:

    KT_contra = TypeVar("KT_contra", contravariant=True)
    VT_contra = TypeVar("VT_contra", contravariant=True)

The key difference between mypy and Pyright is in how they handle this specific case:

  • Mypy is stricter about contravariant type variables. It sees that _identity expects a bytes parameter, but the type annotation requires a function that can accept any type that is a supertype of KT_contra or VT_contra. Since bytes is not a supertype of all possible types that could be assigned to these type variables, mypy flags this as an error.

  • Pyright is more lenient in this case. It recognizes that _identity is a reasonable default implementation that works with bytes, which is a common use case, and doesn't flag this as an error.

To fix this in mypy, you have a few options:

  1. Use a more generic identity function:

    def _identity(data: Any) -> bytes: ...

    This would make the function accept any type, which satisfies the contravariant constraint.

  2. Use a type cast:

    key_serializer: Callable[[KT_contra], bytes] = cast(Callable[[KT_contra], bytes], _identity),
    value_serializer: Callable[[VT_contra], bytes] = cast(Callable[[VT_contra], bytes], _identity),

    This tells mypy to trust that the types are compatible.

  3. Use a more specific type for the type variables:
    If you know that KT_contra and VT_contra will always be bytes or a subtype of bytes, you could constrain them:

    KT_contra = TypeVar("KT_contra", bound=bytes, contravariant=True)
    VT_contra = TypeVar("VT_contra", bound=bytes, contravariant=True)

The most common solution in type stubs is to use option 2 with cast(), as it's the most explicit about the intention while maintaining the flexibility of the type variables.

Would you like me to implement one of these solutions for you?

Prompt:

No, because that (and as I understand it, all other solutions you provided) would break Pyright's ability to infer the type of KT_contra and VT_contra unless a key_serializer and value_serializer is explicitly passed in.

AI Response:

You're absolutely right, and I apologize for not considering that important aspect. Let me reconsider the problem more carefully.

The key insight here is that you want to maintain Pyright's ability to infer the types of KT_contra and VT_contra from the default serializers, which is a valuable feature. The solutions I proposed would indeed break this inference capability.

Let's analyze why this is happening:

  1. In Pyright, when you use _identity as the default value for the serializers, it can infer that KT_contra and VT_contra should be bytes because _identity accepts and returns bytes.

  2. In mypy, the contravariant type variables are causing issues because mypy is being stricter about type compatibility with contravariant type variables.

This is a classic case of different type checkers having different interpretations of the type system, particularly around variance.

A better approach might be to use a type ignore comment specifically for mypy, while keeping the code that works well with Pyright:

key_serializer: Callable[[KT_contra], bytes] = _identity,  # type: ignore
value_serializer: Callable[[VT_contra], bytes] = _identity,  # type: ignore

This would allow Pyright to continue inferring the types correctly while silencing the mypy errors.

Alternatively, if you want to avoid type ignore comments, you could consider using a more specific approach that satisfies both type checkers:

# Define a more generic identity function that satisfies both type checkers
def _generic_identity(data: Any) -> bytes:
    return data if isinstance(data, bytes) else bytes(data)

# Then use it in the class
key_serializer: Callable[[KT_contra], bytes] = _generic_identity,
value_serializer: Callable[[VT_contra], bytes] = _generic_identity,

However, this might still affect Pyright's ability to infer the types correctly.

Given your requirement to maintain Pyright's type inference, the type ignore comment might be the most pragmatic solution. It allows you to keep the code that works well with Pyright while silencing the mypy errors.

Would you like me to suggest any other approaches that might work for both type checkers without breaking Pyright's inference capabilities?
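If the ignore-comment route is taken, it can be narrowed to the one error code mypy printed in the report earlier in this thread, `[assignment]`, so that unrelated problems on the same line are still reported. The sketch below uses a hypothetical single-parameter stand-in class, not the real stub, and how other type checkers treat the bracketed code is not verified here.

```python
from typing import Callable, Generic, TypeVar

KT_contra = TypeVar("KT_contra", contravariant=True)


def _identity(data: bytes) -> bytes:
    return data


class StandInProducer(Generic[KT_contra]):
    """Hypothetical stand-in showing where a scoped ignore would sit."""

    def __init__(
        self,
        # Silences only the "Incompatible default for argument" report.
        key_serializer: Callable[[KT_contra], bytes] = _identity,  # type: ignore[assignment]
    ) -> None:
        self._key_serializer = key_serializer

    def serialize_key(self, key: KT_contra) -> bytes:
        return self._key_serializer(key)


# With no serializer passed, the default identity function handles bytes keys.
default_producer: "StandInProducer[bytes]" = StandInProducer()
serialized = default_producer.serialize_key(b"k")
```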

@jackgene
Contributor Author

Finally, to ensure the use of contravariance is appropriate here:

Prompt:

Is the use of contravariant type parameters in the AIOKafkaProducer correct? Or should the type parameters be invariant? Why?

AI Response:

That's an excellent question about the variance of the type parameters in AIOKafkaProducer. Let me analyze whether contravariant type parameters are appropriate here.

In the current implementation, AIOKafkaProducer is defined with contravariant type parameters:

KT_contra = TypeVar("KT_contra", contravariant=True)
VT_contra = TypeVar("VT_contra", contravariant=True)

class AIOKafkaProducer(Generic[KT_contra, VT_contra]):
    # ...

Let's examine whether this is correct by understanding variance in the context of this class:

Understanding Variance in Type Parameters

  1. Contravariant (-): If A is a subtype of B, then Producer[B] is a subtype of Producer[A]. This is appropriate when the type parameter is used only in input positions (parameters).

  2. Covariant (+): If A is a subtype of B, then Producer[A] is a subtype of Producer[B]. This is appropriate when the type parameter is used only in output positions (return values).

  3. Invariant: No variance relationship exists. This is the default and is appropriate when the type parameter is used in both input and output positions.

Analysis of AIOKafkaProducer

Let's look at how KT_contra and VT_contra are used in the class:

  1. They are used in the send method parameters:

    async def send(
        self,
        topic: str,
        value: VT_contra | None = ...,
        key: KT_contra | None = ...,
        # ...
    )

  2. They are used in the serializer type annotations:

    key_serializer: Callable[[KT_contra], bytes] = _identity,
    value_serializer: Callable[[VT_contra], bytes] = _identity,

  3. They are used in the send_and_wait method parameters (similar to send).

  4. They are used in the send_batch method.

The key insight is that these type parameters are used in input positions - they represent the types of data that can be sent to Kafka. The producer accepts these types as input but doesn't return them.

Is Contravariance Correct?

Yes, contravariance is correct for this use case. Here's why:

  1. If we have a producer that can handle str keys and values (AIOKafkaProducer[str, str]), it should be able to handle any supertype of str (like object). This means AIOKafkaProducer[object, object] should be assignable to AIOKafkaProducer[str, str].

  2. This matches the contravariant relationship: if str is a subtype of object, then AIOKafkaProducer[object, object] is a subtype of AIOKafkaProducer[str, str].

  3. The type parameters are only used in input positions (parameters to methods and callbacks), which is exactly when contravariance is appropriate.

Why Not Invariant?

Using invariant type parameters would be too restrictive. It would mean that a producer that can handle str keys and values could only handle exactly str keys and values, not any supertype. This would make the API less flexible and harder to use.

Conclusion

The use of contravariant type parameters in AIOKafkaProducer is correct and well-designed. It allows for the flexibility needed in a Kafka producer where you want to be able to use a more general producer in place of a more specific one.

This is a good example of proper use of variance in generic types, and it's why the type checker issues you're experiencing are likely due to differences in how mypy and Pyright handle contravariant type variables, rather than a design issue with the type parameters themselves.
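The direction of the subtyping relationship is easy to get backwards, so here is a minimal sketch with a hypothetical `Sink` class (not the aiokafka producer). With a contravariant parameter, something that accepts any `object` can stand in where something that accepts `str` is expected, but not the other way around.

```python
from typing import Generic, List, TypeVar

T_contra = TypeVar("T_contra", contravariant=True)


class Sink(Generic[T_contra]):
    """Hypothetical type whose parameter appears only in input positions."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def send(self, item: T_contra) -> None:
        # Record a printable form of whatever was sent.
        self.received.append(repr(item))


def publish_greeting(sink: "Sink[str]") -> None:
    # This function only ever sends str values.
    sink.send("hello")


# A sink that accepts any object is safe to use where a str sink is expected,
# so Sink[object] is a subtype of Sink[str] under contravariance.
general_sink: "Sink[object]" = Sink()
publish_greeting(general_sink)
```

The reverse substitution, passing a `Sink[str]` to code that may send arbitrary objects, is the one a type checker should reject.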

@shuckc

shuckc commented Apr 24, 2025

Could we make key_serializer and value_serializer Optional, defaulting to None, and build the default bytes/bytes serialiser inside AIOKafkaProducer, when the type bindings are known? I.e., be explicit rather than infer KT_contra?
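A minimal sketch of what that shape could look like, using a hypothetical stand-in class rather than the real producer. With `None` defaults there is nothing for a type checker to infer the parameters from, so the caller annotates the variable explicitly, and the `cast` marks the spot where the default has to be taken on trust.

```python
from typing import Callable, Generic, Optional, Tuple, TypeVar, cast

KT = TypeVar("KT")
VT = TypeVar("VT")


def _identity(data: bytes) -> bytes:
    return data


class StandInProducer(Generic[KT, VT]):
    """Hypothetical stand-in; serializers default to None and are filled in later."""

    def __init__(
        self,
        key_serializer: Optional[Callable[[KT], bytes]] = None,
        value_serializer: Optional[Callable[[VT], bytes]] = None,
    ) -> None:
        # The casts are unchecked: nothing stops KT or VT from being non-bytes
        # while the identity default is still in use.
        self._key_serializer = (
            key_serializer
            if key_serializer is not None
            else cast(Callable[[KT], bytes], _identity)
        )
        self._value_serializer = (
            value_serializer
            if value_serializer is not None
            else cast(Callable[[VT], bytes], _identity)
        )

    def encode(self, key: KT, value: VT) -> Tuple[bytes, bytes]:
        return self._key_serializer(key), self._value_serializer(value)


# The caller states the type parameters instead of relying on inference.
producer: StandInProducer[bytes, bytes] = StandInProducer()
encoded = producer.encode(b"k", b"v")
```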

@jackgene
Contributor Author

That's how the serializers were originally implemented, so it's definitely possible at the cost of type safety.

But the purpose of this PR was to improve type safety, and really it has. The problem is that there is a bug in mypy when dealing with generic types and default arguments.

Having said that, I'm not opposed to "being explicit rather than inferring the type", and if we wish to do that, we can simply omit the default arguments, and require mypy users to explicitly provide them. However, that would be a breaking change (which was something else I was hoping to prevent).

Successfully merging this pull request may close these issues.

Proposal to Add Type Hints
2 participants