`PydanticSerializer` creates a new `TypeAdapter` on every serialize/deserialize call, causing a memory leak in long-running workers
## Summary

`PydanticSerializer` in `serializer_lib.py` creates a new `pydantic.TypeAdapter` instance on every call to `serialize()` and `deserialize()`. Each `TypeAdapter` instantiation triggers Pydantic's core schema generation, which registers type metadata in Pydantic's internal global registry that is never freed. In long-running connect workers processing many functions, this causes a steady memory leak (~22 MB/hour in our production environment).
## Environment

- `inngest` version: 0.5.18 (with `inngest[connect]`)
- `pydantic` version: 2.11.7
- `pydantic-core` version: 2.33.2
- Python version: 3.11.15
- Deployment: 5 connect workers running on Kubernetes, processing background jobs continuously
## The problem

In `inngest/_internal/serializer_lib.py`:

```python
class PydanticSerializer(Serializer):
    def serialize(self, obj: object, typ: object) -> object:
        adapter = pydantic.TypeAdapter(object)  # new instance every call
        return adapter.dump_python(obj, mode="json")

    def deserialize(self, obj: object, typ: object) -> object:
        adapter = pydantic.TypeAdapter[object](typ)  # new instance every call
        return adapter.validate_python(obj)
```
`TypeAdapter.__init__` is not a lightweight operation. It triggers Pydantic's full core schema generation pipeline, which:

- Builds a core schema representation for the type
- Creates `SchemaValidator` and `SchemaSerializer` objects (Rust-backed via pydantic-core)
- Registers type metadata (`FieldInfo`, `ModelMetaclass`, `MockValSer`, etc.) in Pydantic's internal type registry
- Creates associated Python objects: `type` metaclasses, dicts, functions, sets/frozensets, `ReferenceType` weakrefs
The key issue is that Pydantic's internal type registry holds strong references to the generated schema objects. Even though the `adapter` local variable goes out of scope, the validators, serializers, and type metadata created during schema generation are retained for the lifetime of the process. They are never garbage collected.
This is called from the step execution path — every `step.run()`, `step.invoke()`, and function return triggers serialize/deserialize, so a single function with N steps creates ~2N `TypeAdapter` instances.
## Reproduction
Minimal script demonstrating the leak:
```python
import gc
import os

import pydantic


def get_rss_mb():
    """Get current RSS in MB from /proc (Linux) or the resource module."""
    try:
        with open(f"/proc/{os.getpid()}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024
    except FileNotFoundError:
        import resource
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


gc.collect()
baseline_rss = get_rss_mb()
baseline_objects = len(gc.get_objects())

# Simulate what PydanticSerializer does on every call
for i in range(10_000):
    adapter = pydantic.TypeAdapter(object)
    adapter.dump_python({"key": "value"}, mode="json")

gc.collect()
final_rss = get_rss_mb()
final_objects = len(gc.get_objects())

print(f"RSS: {baseline_rss:.1f} MB -> {final_rss:.1f} MB (+ {final_rss - baseline_rss:.1f} MB)")
print(f"GC objects: {baseline_objects:,} -> {final_objects:,} (+ {final_objects - baseline_objects:,})")
```
## Production data

We profiled a live production connect worker (5,529 execution requests in 47 minutes on one pod) by injecting GC audit scripts via `gdb` and comparing two snapshots.

GC object growth in 47 minutes (single pod):

| Object type | Delta | What it is |
|---|---|---|
| `type` | +156 | New validator/serializer type objects |
| `ModelMetaclass` | +82 | Pydantic model metaclasses |
| `FieldInfo` | +270 | Pydantic field descriptors |
| `MockValSer` | +158 | Pydantic mock validators/serializers |
| `ReferenceType` | +3,860 | Weakrefs to all the above |
| `dict` | +1,094 | Module/class `__dict__`s |
| `function` | +906 | Methods on new classes |

Total GC-tracked growth: 1.89 MB in 47 minutes (2.4 MB/hour)
Total RSS growth: ~22 MB/hour (the gap is CPython arena fragmentation from the constant alloc/free churn)
Over 72 hours without a deploy, individual pods climb from ~700 MB to ~1.5 GB RSS. Pods have a 2 GiB memory limit. Currently relying on periodic deploys as an accidental memory pressure relief valve.
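For anyone reproducing this, the Python-heap portion of the growth can also be cross-checked from inside the process with the stdlib `tracemalloc` module (it traces Python allocations only, so the gap between its numbers and RSS is the allocator-level churn described above). The workload below is a trivial placeholder for the `TypeAdapter` loop from the reproduction script:

```python
import tracemalloc

tracemalloc.start()

# Placeholder workload; substitute the TypeAdapter construction loop
# from the reproduction script to measure the real case.
junk = [{"key": i} for i in range(10_000)]

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Python-heap: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```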
## Suggested fix

Cache `TypeAdapter` instances, since they are deterministic and reusable for a given type:

```python
from functools import lru_cache

import pydantic


@lru_cache(maxsize=256)
def _get_type_adapter(typ: type) -> pydantic.TypeAdapter:  # type: ignore[type-arg]
    return pydantic.TypeAdapter(typ)


class PydanticSerializer(Serializer):
    def serialize(self, obj: object, typ: object) -> object:
        adapter = _get_type_adapter(object)
        return adapter.dump_python(obj, mode="json")

    def deserialize(self, obj: object, typ: object) -> object:
        adapter = _get_type_adapter(typ)
        return adapter.validate_python(obj)
```
This is safe because:

- `TypeAdapter` is stateless after construction — `dump_python()` and `validate_python()` are pure functions of their input
- Python types are hashable, so `lru_cache` works directly
- `serialize()` always passes `object` as the type, so there's exactly 1 cached adapter for it
- `deserialize()` passes function output types, which are a small fixed set per application
- `maxsize=256` is more than sufficient and bounds memory usage
An alternative approach would be to instantiate the adapters once in `__init__` for the common case (the `object` type), and lazily cache others.
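A minimal sketch of that alternative, again with a hypothetical injected constructor standing in for `pydantic.TypeAdapter` (the real patch would call it directly and keep the existing `Serializer` interface):

```python
from typing import Callable


class CachingPydanticSerializer:
    """Sketch only: eager adapter for the common `object` case,
    lazy per-type dict cache for everything else."""

    def __init__(self, make_adapter: Callable[[object], object]) -> None:
        self._make_adapter = make_adapter
        self._object_adapter = make_adapter(object)  # built once, up front
        self._adapters: dict[object, object] = {}

    def _adapter_for(self, typ: object) -> object:
        if typ is object:
            return self._object_adapter
        adapter = self._adapters.get(typ)
        if adapter is None:
            adapter = self._adapters[typ] = self._make_adapter(typ)
        return adapter


# Count constructions with a fake constructor to verify caching.
calls: list[object] = []


def fake_make(typ: object) -> object:
    calls.append(typ)
    return object()


s = CachingPydanticSerializer(fake_make)
assert s._adapter_for(object) is s._adapter_for(object)
assert s._adapter_for(int) is s._adapter_for(int)
assert len(calls) == 2  # one for object (eager), one for int (lazy)
```

Unlike `lru_cache`, the per-instance dict is unbounded, which is acceptable here only because the set of output types per application is small and fixed.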