Scaling AIOKafkaProducer with FastAPI: High Response Times at 20+ RPS & Connection Management #1120
Replies: 1 comment 1 reply
@slava-poddubsky Do you have some code snippets to share? In my company, we have a similar setup and have never noticed a slowdown at high RPS. In terms of asyncio, the overall idea is to have a single thread, running on a single CPU, that scales using concurrency instead of parallelism: I/O is handled asynchronously, so it doesn't "block" the thread, while you try to use as much of the CPU as possible. Keep in mind that this is concurrent, not parallel. Once you reach a limit on your process (basically saturating one CPU), you just spawn multiple processes and load balance across them too (but that shouldn't be handled in Python; it's better provided by your infrastructure).
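As a quick sanity check that the producer itself is not the bottleneck, you could run a micro-benchmark along these lines (broker address, topic, and message count are placeholders) and see how many concurrent sends a single producer sustains over its one connection per broker:

```python
import asyncio
import time

from aiokafka import AIOKafkaProducer

async def main():
    # Placeholder broker address -- adjust for your environment.
    producer = AIOKafkaProducer(bootstrap_servers="localhost:9092")
    await producer.start()
    try:
        start = time.perf_counter()
        # 1000 concurrent sends share the producer's single connection
        # per broker; the event loop interleaves them instead of
        # processing them one request at a time.
        await asyncio.gather(
            *(producer.send_and_wait("test-topic", b"payload") for _ in range(1000))
        )
        print(f"1000 messages in {time.perf_counter() - start:.3f}s")
    finally:
        await producer.stop()

asyncio.run(main())
```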
Problem Description
We are operating a FastAPI web server that acts as a proxy, receiving incoming HTTP requests and forwarding the request body (along with some metadata) to a Kafka topic using AIOKafkaProducer.
Our current setup involves a single FastAPI application served by Uvicorn, with one shared AIOKafkaProducer instance created at startup and used with its default settings (including linger_ms=0).
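A minimal sketch of this setup (identifiers are illustrative; the per-request send_and_wait shown here is one common way to implement such a proxy and is an assumption, not necessarily our exact code):

```python
from contextlib import asynccontextmanager

from aiokafka import AIOKafkaProducer
from fastapi import FastAPI, Request

@asynccontextmanager
async def lifespan(app: FastAPI):
    # One producer for the whole application, created at startup.
    producer = AIOKafkaProducer(bootstrap_servers="localhost:9092")
    await producer.start()
    app.state.producer = producer
    try:
        yield
    finally:
        await producer.stop()

app = FastAPI(lifespan=lifespan)

@app.post("/proxy")
async def proxy(request: Request):
    body = await request.body()
    # Assumed pattern: block the response on the broker acknowledgement,
    # so each request's latency includes a full Kafka round trip.
    await request.app.state.producer.send_and_wait(
        "events", body, headers=[("source", b"proxy")]
    )
    return {"status": "ok"}
```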
We observe that when the incoming request rate exceeds approximately 20 requests per second (RPS), the average response time from our FastAPI server begins to increase significantly, impacting overall throughput and latency.
Our Hypothesis
We suspect that the AIOKafkaProducer might be a bottleneck. Our understanding is that a single instance of AIOKafkaProducer maintains only one TCP connection per Kafka broker. Given this, we are questioning whether this single connection is sufficient to handle a sustained high volume of messages (20+ RPS) efficiently, potentially leading to queuing or backpressure within the producer itself.
Our Question
Given this scenario, we have two primary questions regarding scaling:
Connection Pooling: Should we implement a custom connection pooling mechanism on top of AIOKafkaProducer to manage multiple connections to the same Kafka broker from within a single FastAPI application instance? Is this a recommended pattern for high-throughput scenarios, or would it conflict with aiokafka's internal connection management?
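To make this concrete, a pool might look roughly like the following (purely hypothetical sketch; ProducerPool is not an aiokafka API):

```python
import asyncio
from itertools import cycle

from aiokafka import AIOKafkaProducer

class ProducerPool:
    """Hypothetical round-robin pool of independent producers."""

    def __init__(self, bootstrap_servers: str, size: int = 4):
        # Each producer opens its own connection per broker, so a
        # pool of N producers means N connections per broker.
        self._producers = [
            AIOKafkaProducer(bootstrap_servers=bootstrap_servers)
            for _ in range(size)
        ]
        self._next = cycle(self._producers)

    async def start(self) -> None:
        await asyncio.gather(*(p.start() for p in self._producers))

    async def stop(self) -> None:
        await asyncio.gather(*(p.stop() for p in self._producers))

    async def send_and_wait(self, topic: str, value: bytes):
        # Round-robin dispatch. Note: messages with the same key may
        # go through different producers, which weakens the ordering
        # guarantees a single producer would give.
        return await next(self._next).send_and_wait(topic, value)
```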
Uvicorn Workers: Is the recommended approach simply to scale horizontally by increasing the number of Uvicorn workers (e.g., using gunicorn --workers N ...)? Our understanding is that each Uvicorn worker would run its own independent FastAPI application instance, and thus its own AIOKafkaProducer instance, which would then establish its own set of connections to Kafka brokers. Would this be the most effective way to achieve higher throughput and utilize multiple CPU cores?
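Our mental model of that approach is sketched below (the gunicorn invocation in the comment is illustrative); each worker process would run this lifespan independently:

```python
import os
from contextlib import asynccontextmanager

from aiokafka import AIOKafkaProducer
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Launched with e.g. `gunicorn -k uvicorn.workers.UvicornWorker -w 4 app:app`,
    # this runs once per worker process: each worker gets its own event
    # loop, its own producer, and its own connection per broker.
    producer = AIOKafkaProducer(bootstrap_servers="localhost:9092")
    await producer.start()
    print(f"producer started in worker pid={os.getpid()}")
    app.state.producer = producer
    try:
        yield
    finally:
        await producer.stop()

app = FastAPI(lifespan=lifespan)
```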
Additional Context
The Kafka interaction itself is primarily I/O-bound (sending messages) with minimal in-flight CPU-intensive processing before sending.
We're particularly interested in whether the linger_ms=0 default plays a significant role in this bottleneck, given that each message is effectively sent immediately.
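For example, would raising it along these lines help (values are illustrative)?

```python
import asyncio

from aiokafka import AIOKafkaProducer

async def main():
    # With linger_ms=0 (the default), every send triggers an immediate
    # produce request. A small linger window trades a few milliseconds
    # of latency for larger batches and fewer round trips.
    producer = AIOKafkaProducer(
        bootstrap_servers="localhost:9092",
        linger_ms=5,            # wait up to 5 ms to fill a batch
        max_batch_size=65536,   # bytes per batch (default is 16384)
    )
    await producer.start()
    try:
        await producer.send_and_wait("test-topic", b"payload")
    finally:
        await producer.stop()

asyncio.run(main())
```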
Any guidance or best practices on how to effectively scale AIOKafkaProducer for high-RPS asynchronous web services would be greatly appreciated. Thank you!