Scaling AIOKafkaProducer with FastAPI: High Response Times at 20+ RPS & Connection Management #1120

slava-poddubsky · 2025-08-01T08:08:59Z

slava-poddubsky
Aug 1, 2025

Problem Description

We are operating a FastAPI web server that acts as a proxy, receiving incoming HTTP requests and forwarding the request body (along with some metadata) to a Kafka topic using AIOKafkaProducer.

Our current setup involves:

FastAPI (asynchronous web framework)
Uvicorn (ASGI server)
aiokafka (for Kafka interaction)

We observe that when the incoming request rate exceeds approximately 20 requests per second (RPS), the average response time from our FastAPI server begins to increase significantly, impacting overall throughput and latency.

Our Hypothesis
We suspect that the AIOKafkaProducer might be a bottleneck. Our understanding is that a single instance of AIOKafkaProducer maintains only one TCP connection per Kafka broker. Given this, we are questioning whether this single connection is sufficient to handle a sustained high volume of messages (20+ RPS) efficiently, potentially leading to queuing or backpressure within the producer itself.

Our Question
Given this scenario, we have two primary questions regarding scaling:

Connection Pooling: Should we implement a custom connection pooling mechanism on top of AIOKafkaProducer to manage multiple connections to the same Kafka broker from within a single FastAPI application instance? Is this a recommended pattern for high-throughput scenarios, or would it conflict with aiokafka's internal connection management?

Uvicorn Workers: Is the recommended approach simply to scale horizontally by increasing the number of Uvicorn workers (e.g., using gunicorn --workers N ...)? Our understanding is that each Uvicorn worker would run its own independent FastAPI application instance, and thus its own AIOKafkaProducer instance, which would then establish its own set of connections to Kafka brokers. Would this be the most effective way to achieve higher throughput and utilize multiple CPU cores?

Additional Context

The Kafka interaction itself is primarily I/O-bound (sending messages) with minimal in-flight CPU-intensive processing before sending.

We're particularly interested if the linger_ms=0 default plays a significant role in this bottleneck, given that each message is effectively sent immediately.

Any guidance or best practices on how to effectively scale AIOKafkaProducer for high-RPS asynchronous web services would be greatly appreciated. Thank you!

vmaurin · 2025-08-01T12:24:34Z

vmaurin
Aug 1, 2025

@slava-poddubsky Do you have some code snippets to share ?

In my company, we have a similar setup and we never noticed a slowdown on high RPS.
AIOKafkaProducer is an abstraction on top of a kafka cluster (so many brokers). When you "send" a message to a topic, it will go in a buffer, per topic/parititon and sent to the broker leader on the topic partition in a separated coroutine. Is linger_ms = 0, the coroutine will try to send it asap, so increasing the linger is more a way to try to "batch" more messages for a given topic partition.

In term of asyncio, the overall idea is to have a single thread, runnning on a single CPU, to scale using concurrency instead of parallelism: I/O are treated in an async manner, so they don't "block" a thread, while you try to use as much as CPU possible. Keep in mind that it is concurrent and not in parallel. Once you are reaching a limit on your process (basically using a CPU almost completely), you just spawn multiple processes that you load balance too (but it should not be a python thing, more provided by your infrastructure)

1 reply

slava-poddubsky Aug 7, 2025
Author

Thank you for the answer!

The code looks like:

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.publisher = AIOKafkaProducer(bootstrap_servers=KAFKA_HOSTS)
    await app.publisher.start()
    yield
    await app.publisher.stop()

APP = FastAPI(lifespan=lifespan)


if __name__ == "__main__":
    uvicorn.run("main:APP", host="0.0.0.0", port=8070, reload=True)

async def register_event_endpoint(
    request: Request,
):
    event_body = await request.body()

    kafka_message = {
        "event_body": event_body.decode("utf8"),
        "event_receive_date": datetime.now(UTC).isoformat(),
    }

    await request.app.publisher.send("test_topic_name", json.dumps(kafka_message).encode())
    return {"status": "ok"}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Scaling AIOKafkaProducer with FastAPI: High Response Times at 20+ RPS & Connection Management #1120

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Scaling AIOKafkaProducer with FastAPI: High Response Times at 20+ RPS & Connection Management #1120

Uh oh!

slava-poddubsky Aug 1, 2025

Replies: 1 comment · 1 reply

Uh oh!

vmaurin Aug 1, 2025

Uh oh!

slava-poddubsky Aug 7, 2025 Author

slava-poddubsky
Aug 1, 2025

Replies: 1 comment 1 reply

vmaurin
Aug 1, 2025

slava-poddubsky Aug 7, 2025
Author