Performance problems with Django app #782

langesven · 2026-01-07T12:38:23Z

langesven
Jan 7, 2026

Hi! Since you reached out, figured I'd take you up on this 😄

So basically we're right now running on https://github.com/nginx/unit but that project has been archived and even prior to archiving has been a bit unmaintained so there are a few things on master that never made it to releases. Other than that we're pretty happy with the performance we've been getting out of it.
We have a Django 4 stack running on unit and are using it in ASGI mode with 4 workers per pod (~1 CPU, 4GB RAM). We'd like to upgrade to Django 5 and for that we need to move to a different webserver due to some issues in unit. Naturally Granian came up as a candidate so we've spun up some tests around it.

We started with 4 workers simply because that's what we had before. Then based on reading in your docs ended up going down to 1 worker and instead scaling horizontally, so x4 our pods. We've experimented with runtime threads a bit, ultimately landed on 1 or 2 threads. Also probably read most of the discussions in this repo that revolve around Django, performance, etc 😅

1 worker @ 4 threads bumped the p95 by a lot so we reverted that

We have disabled keep-alive to get a more even distribution of requests from the reverse proxy, this has helped, but did not bring down p99 by a large amount.

Right now we have it running in production on a 90%/10% traffic split between old and new for one application. The issue we're running into is that we have a slightly elevated p95 and quite elevated p99 latency with the traffic that's hitting Granian compared to unit.

unit, 90% traffic, 7 days

Granian, 10% traffic, 7 days

And that's basically where we are sort of stuck right now. We've tried a few settings to see if we could get this down but it hasn't made "click" yet what we might be missing here.

We experimented with backlog/backpressure values (128, 1) and that made it slightly better, but not by a huge amount.
We've since worked out that part of the visual difference in latency is definitely also due to metric bucketing in the histogram. Based on logs we know it's not as bad as it looks, but Granian is definitely a bit slower for this app we're experimenting with and we're trying to figure out if we can do something about it or have to accept that it is what it is.

We think Djangos sync views are probably our main limiting factor (sync_to_async(thread_sensitive=True))

If you have any suggestions for what we can try or what else you'd need to know that'd be much appreciated 🙏

gi0baro · 2026-01-07T14:03:06Z

gi0baro
Jan 7, 2026
Maintainer

Hey!

Based on your description, here are some "thoughts":

if you don't use uvloop and you're willing to use it, it generally helps a lot Granian on async protocols (compared to other servers, Granian needs to schedule work on Python event loop from other threads, and the stdlib implementation is quite slow in scheduling such calls)
I don't think disabling keep-alive on the reverse proxy is a good idea, that generally increase response time in a sensible way
if you're trying to reduce concurrency and better balance connections across workers, I would suggest to use backpressure instead
on async protocols such as ASGI I wouldn't go that low though, but stay more in the 8-16 range. With these values a backlog in the 128-256 range should be fine
runtime threads appears to be quite application related; with Django I'd use 1 or 2, these are the values that showed best results in Sentry
I don't know which reverse proxy you use on top, but if it let you (like Nginx does), setting a max number of keepalived connections slightly lower than the granian backpressure * workers should also help in balancing connections

Also, if you have room for that, trying with 2x the workers and everything else unchanged might provide better results (we have a couple of Django services in Sentry where we use 8 workers per pod). That's what I can think of at the moment, let me know if you have further details to add.

0 replies

langesven · 2026-01-07T16:56:10Z

langesven
Jan 7, 2026
Author

Thanks, let me dig into this a bit and try some stuff!

We are using uvloop as the loop for Granian - this is our full call currently

granian \
        --interface asgi \
        posthog.asgi:application \
        --workers 1 \
        --runtime-threads 2 \
        --no-http1-keep-alive \
        --loop uvloop \
        --host 0.0.0.0 \
        --port 8000 \
        --log-level warning \
        --access-log \
        --respawn-failed-workers

Would that satisfy the uvloop approach or do we also need to "install" it as the loop policy/handler to use within the code? I found the bit in the uvloop wiki that says for gunicorn if you tell it to use uvloop it takes care of making sure it's the loop being used, is Granian doing the same or are we making some wrong assumptions?

I switched to 2 workers in one pod, kept settings at 2 threads, and re-enabled the keep-alive (--workers 2 --runtime-threads 2 --http1-keep-alive) and yeah this is already looking like it's developing in the right direction! We had tried 4/4 before and that wasn't great, so definitely feels like 2 threads might be the sweet spot here.

Went down quite a bit, went back up, but is staying below where it was before 😄

Backlog 128 / Backpressure 8 = not a good idea as it turns out (with 1024/512 being the default right now)

Will continue trying some things tomorrow!

2 replies

gi0baro Jan 8, 2026
Maintainer

Would that satisfy the uvloop approach or do we also need to "install" it as the loop policy/handler to use within the code? I found the bit in the uvloop wiki that says for gunicorn if you tell it to use uvloop it takes care of making sure it's the loop being used, is Granian doing the same or are we making some wrong assumptions?

No, that's correct. If uvloop is installed then Granian will use it.

I switched to 2 workers in one pod, kept settings at 2 threads, and re-enabled the keep-alive (--workers 2 --runtime-threads 2 --http1-keep-alive) and yeah this is already looking like it's developing in the right direction! We had tried 4/4 before and that wasn't great, so definitely feels like 2 threads might be the sweet spot here.

Ah, I thought you were already set to 4 workers.
If you had sized things with 4 interpreters before, then I would keep 4 workers. All the scaling should be pretty similar, there's no particular reason to change all of that.
So, I would definitely go 4/2 or 4/1 with workers and runtime threads.
Also, with Django --runtime-mode mt is usually better.

Backlog 128 / Backpressure 8 = not a good idea as it turns out (with 1024/512 being the default right now)

My bad, again I thought you were still at 128/1. With 4 workers, I would just set the backlog in the 256-512 range, and leave Granian to configure backpressure accordingly.

langesven Jan 9, 2026
Author

Gotcha, thanks for the write-up 😄 Yeah sorry we've been trying a lot of things over the past few weeks and going through different configs, I should've stated more clearly what settings we're currently on!

As of today we're using 4 workers, 2 threads, runtime-mode mt, keep-alive enabled with no explicit backlog/backpressure configuration and it's looking pretty good.

So compared to before that dropped the p99 difference between both servers from about 400ms down to less than 100ms on average. So I'm already super happy with that, thank you! 🙇‍♂️

I will experiment with backlog values on Monday to see if that makes a difference and maybe for the fun of it see where we land if we make it beefier and running 8 workers in one pod

langesven · 2026-01-15T16:39:30Z

langesven
Jan 15, 2026
Author

I tried this on a larger scale today and unfortunately the latency thing is still an issue and I don't feel like I really know what will fix it.

I think it's possible to guess when the config was active and when I rolled it back 😅

Comparison of roughly the same timeframe yesterday on nginx unit

Some services had a surprisingly low amount of connections from the envoy ingress, which strikes me as a bit odd.

I've experimented with GRANIAN_BACKLOG values down to 256 and it did sort of feel like it sometimes made a difference, but then it would spike up again with no obvious reason, so that might've really just been random luck. I did not have the impression that what I did had an immediate effect that I could correlate to the changing of the backlog value, if that makes sense.
The top of the peaks are usually rollouts with either new versions or config changes so they are sort of "expected" here, but the overall jumpiness of the p99 latency and in general highly elevated p95/p99 is not expected and I'd love to get this into a more stable state.

Pods have the same resource configuration as they had on nginx-unit, they have plenty of room to breathe I'd say (using about 30% of requested CPU and 50% of requested memory). Both webservers are running 4 Python interpreters, so this is pretty similar as well.

So I guess what I'm trying to ask is, does any of this stand out to you in any way? I can keep randomly trying things, but I'd love to have at least a glimpse of an idea what the issue might be 😄 does this scream "scale out horizontally to a crazy amount of pods" (low connection count from the ingress pods??), add more workers to each pod (sort of similar to scaling out at the end), fiddle with other config options?

1 reply

gi0baro Jan 15, 2026
Maintainer

I suspect backpressure is preventing envoy to open all the required connections.
Would you be able to set backpressure to a very high value (eg: 10k) and see if that changes anything?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance problems with Django app #782

Uh oh!

{{title}}

Uh oh!

Replies: 3 comments 3 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

Performance problems with Django app #782

Uh oh!

langesven Jan 7, 2026

Replies: 3 comments · 3 replies

Uh oh!

gi0baro Jan 7, 2026 Maintainer

Uh oh!

langesven Jan 7, 2026 Author

Uh oh!

gi0baro Jan 8, 2026 Maintainer

Uh oh!

langesven Jan 9, 2026 Author

Uh oh!

langesven Jan 15, 2026 Author

Uh oh!

gi0baro Jan 15, 2026 Maintainer

langesven
Jan 7, 2026

Replies: 3 comments 3 replies

gi0baro
Jan 7, 2026
Maintainer

langesven
Jan 7, 2026
Author

gi0baro Jan 8, 2026
Maintainer

langesven Jan 9, 2026
Author

langesven
Jan 15, 2026
Author

gi0baro Jan 15, 2026
Maintainer