
Memory leak in Inngest long-running workers #347

@vaibhav-46

Description

PydanticSerializer creates new TypeAdapter on every serialize/deserialize call, causing memory leak in long-running workers

Summary

PydanticSerializer in serializer_lib.py creates a new pydantic.TypeAdapter instance on every call to serialize() and deserialize(). Each TypeAdapter instantiation triggers Pydantic's core schema generation, which registers type metadata in Pydantic's internal global registry that is never freed. In long-running connect workers processing many functions, this causes a steady memory leak (~22 MB/hour in our production environment).

Environment

  • inngest version: 0.5.18 (with inngest[connect])
  • pydantic version: 2.11.7
  • pydantic-core version: 2.33.2
  • Python version: 3.11.15
  • Deployment: 5 connect workers running on Kubernetes, processing background jobs continuously

The problem

In inngest/_internal/serializer_lib.py:

class PydanticSerializer(Serializer):
    def serialize(self, obj: object, typ: object) -> object:
        adapter = pydantic.TypeAdapter(object)        # new instance every call
        return adapter.dump_python(obj, mode="json")

    def deserialize(self, obj: object, typ: object) -> object:
        adapter = pydantic.TypeAdapter[object](typ)   # new instance every call
        return adapter.validate_python(obj)

TypeAdapter.__init__ is not a lightweight operation. It triggers Pydantic's full core schema generation pipeline, which:

  1. Builds a core schema representation for the type
  2. Creates SchemaValidator and SchemaSerializer objects (Rust-backed via pydantic-core)
  3. Registers type metadata (FieldInfo, ModelMetaclass, MockValSer, etc.) in Pydantic's internal type registry
  4. Creates associated Python objects: type metaclasses, dicts, functions, set/frozensets, ReferenceType weakrefs

The key issue is that Pydantic's internal type registry holds strong references to the generated schema objects. Even though the adapter local variable goes out of scope, the validators, serializers, and type metadata created during schema generation are retained for the lifetime of the process. They are never garbage collected.

This is called from the step execution path — every step.run(), step.invoke(), and function return triggers serialize/deserialize, so a single function with N steps creates ~2N TypeAdapter instances.
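To put a rough number on the per-call cost, here is a self-contained tracemalloc sketch. Exact figures vary by Pydantic version and platform, and tracemalloc only sees Python-side allocations (pydantic-core's Rust-backed SchemaValidator/SchemaSerializer memory is invisible to it), so treat the result as a lower bound:

```python
import tracemalloc

import pydantic

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Build 100 adapters for the same type, as the serializer does once per call
adapters = [pydantic.TypeAdapter(object) for _ in range(100)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Sum the Python-heap growth attributable to the loop above
total = sum(stat.size_diff for stat in after.compare_to(before, "lineno"))
print(f"~{total / 100 / 1024:.1f} KiB of Python-side allocations per TypeAdapter")
```

Keeping the `adapters` list alive mirrors what Pydantic's registry retention does in the real code: the memory is reachable and therefore never reclaimed by gc.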

Reproduction

Minimal script demonstrating the leak:

import gc
import os
import pydantic

def get_rss_mb():
    """Get current RSS in MB from /proc (Linux) or resource module."""
    try:
        with open(f"/proc/{os.getpid()}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024
    except FileNotFoundError:
        # macOS fallback: no /proc. Note ru_maxrss is peak (not current) RSS,
        # reported in bytes on macOS (KiB on Linux), so convert from bytes.
        import resource
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024 * 1024)

gc.collect()
baseline_rss = get_rss_mb()
baseline_objects = len(gc.get_objects())

# Simulate what PydanticSerializer does on every call
for i in range(10_000):
    adapter = pydantic.TypeAdapter(object)
    adapter.dump_python({"key": "value"}, mode="json")

gc.collect()
final_rss = get_rss_mb()
final_objects = len(gc.get_objects())

print(f"RSS:        {baseline_rss:.1f} MB -> {final_rss:.1f} MB (+ {final_rss - baseline_rss:.1f} MB)")
print(f"GC objects: {baseline_objects:,} -> {final_objects:,} (+ {final_objects - baseline_objects:,})")

Production data

We profiled a live production connect worker (5,529 execution requests in 47 minutes on one pod) by injecting GC audit scripts via gdb and comparing two snapshots:

GC object growth in 47 minutes (single pod):

Object type      Delta    What it is
type             +156     New validator/serializer type objects
ModelMetaclass   +82      Pydantic model metaclasses
FieldInfo        +270     Pydantic field descriptors
MockValSer       +158     Pydantic mock validators/serializers
ReferenceType    +3,860   Weakrefs to all of the above
dict             +1,094   Module/class __dict__s
function         +906     Methods on the new classes

Total GC-tracked growth: 1.89 MB in 47 minutes (2.4 MB/hour)
Total RSS growth: ~22 MB/hour (the gap is CPython arena fragmentation from the constant alloc/free churn)

Over 72 hours without a deploy, individual pods climb from ~700 MB to ~1.5 GB RSS. Pods have a 2 GiB memory limit. Currently relying on periodic deploys as an accidental memory pressure relief valve.

Suggested fix

Cache TypeAdapter instances since they are deterministic and reusable for a given type:

from functools import lru_cache

import pydantic


@lru_cache(maxsize=256)
def _get_type_adapter(typ: type) -> pydantic.TypeAdapter:  # type: ignore[type-arg]
    return pydantic.TypeAdapter(typ)


class PydanticSerializer(Serializer):
    def serialize(self, obj: object, typ: object) -> object:
        adapter = _get_type_adapter(object)
        return adapter.dump_python(obj, mode="json")

    def deserialize(self, obj: object, typ: object) -> object:
        adapter = _get_type_adapter(typ)
        return adapter.validate_python(obj)

This is safe because:

  • TypeAdapter is stateless after construction — dump_python() and validate_python() are pure functions of their input
  • Python types are hashable, so lru_cache works directly
  • serialize() always passes object as the type, so there's exactly 1 cached adapter for it
  • deserialize() passes function output types, which are a small fixed set per application
  • maxsize=256 is more than sufficient and bounds memory usage

An alternative approach would be to instantiate the adapters once in __init__ for the common case (object type), and lazily cache others.
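A sketch of that alternative, with a stand-in `Serializer` base class so the snippet is self-contained (the real base class lives in serializer_lib.py). The hot-path `object` adapter is built once in `__init__`, and `deserialize()` lazily caches one adapter per target type on the instance:

```python
import pydantic


class Serializer:  # stand-in for inngest's Serializer base class
    pass


class PydanticSerializer(Serializer):
    def __init__(self) -> None:
        # serialize() always uses the generic `object` adapter, so build it once
        self._object_adapter = pydantic.TypeAdapter(object)
        # Per-instance lazy cache for deserialize() target types
        self._adapters: dict[object, pydantic.TypeAdapter] = {}  # type: ignore[type-arg]

    def serialize(self, obj: object, typ: object) -> object:
        return self._object_adapter.dump_python(obj, mode="json")

    def deserialize(self, obj: object, typ: object) -> object:
        adapter = self._adapters.get(typ)
        if adapter is None:
            adapter = pydantic.TypeAdapter[object](typ)
            self._adapters[typ] = adapter
        return adapter.validate_python(obj)


s = PydanticSerializer()
s.serialize({"count": 3}, object)
s.deserialize("5", int)
s.deserialize("6", int)  # reuses the cached int adapter
```

Unlike a module-level `lru_cache`, this ties adapter lifetime to the serializer instance, though for a worker that lives as long as the process the two approaches are equivalent in practice.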
