15,000 user architecture #8470
Replies: 3 comments 7 replies
-
A large org has just successfully deployed LibreChat. I've asked if they would share some of their findings to build more confidence about scalability, as I would like to write a blog post about some of these details. I'm optimistic they will share soon!
-
We're looking to deploy LC into a cluster of backend servers right now, in anticipation of future features that might overwhelm our current single-node setup. A horizontally scaled setup like that is also better for availability. We have thousands of users daily and don't want to impact the user experience after the multi-server deployment, so we're proceeding with caution and studying the LC codebase for any potential issues in this area. Maybe this is a good place for discussion :-)

MongoDB and Redis can be scaled independently and aren't really LC's concern. The caching mechanism within the codebase is our focus for making LC ready for horizontal scaling and zero-downtime deployment.

✅ Preventing cross-deployment contamination of the Redis cache

In our zero-downtime deployment scheme, two different deployments can coexist for a short period of time: once the new deployment is ready, traffic is drained from the old deployment and routed to the new one. The two deployments are hooked to the same MongoDB and, in the near future, to the same Redis cluster. Having the new deployment use the old cache from the previous deployment is dangerous, but we can't just wipe everything in the shared Redis cluster either, due to the zero-downtime constraint. We solved this by using the deployment ID as the global prefix for Redis via REDIS_KEY_PREFIX_VAR (see the sketch below).

❌ Ensuring consistent behavior through a shared cache

❌ Static but frequently accessed cache should be in memory
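To picture the prefixing idea from the ✅ item, here is a minimal sketch of the effect such a per-deployment prefix has, assuming ioredis. This is not LibreChat's implementation; `DEPLOYMENT_ID`, the Redis URI, and the cache key name are made up for illustration.

```ts
// Minimal sketch (assuming ioredis) of a per-deployment key prefix: two
// deployments sharing one Redis cluster never see each other's cache entries.
// DEPLOYMENT_ID, REDIS_URI, and the key name are assumptions, not LibreChat names.
import Redis from 'ioredis';

const deploymentId = process.env.DEPLOYMENT_ID ?? 'local-dev';

// ioredis can transparently prepend a prefix to every key it touches.
const redis = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379', {
  keyPrefix: `${deploymentId}::`,
});

async function main() {
  // Deployment A writes "A::modelsCache", deployment B writes "B::modelsCache",
  // so a rolling deployment can never read a stale cache from the other side.
  await redis.set('modelsCache', JSON.stringify({ builtAt: Date.now() }), 'EX', 300);
  console.log(await redis.get('modelsCache'));
  redis.disconnect();
}

main();
```

In LibreChat itself the prefix comes from REDIS_KEY_PREFIX_VAR as mentioned above; the sketch only demonstrates why two deployments can then safely share one cluster.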
-
Hey Theo,

Congrats on deploying LibreChat to such a large user base, that's really cool! 👍

I work with enterprises on user adoption strategies, as AI can be quite technical for some non-technical users. I would love to hear about how you're approaching user training for all those users. We've gathered some interesting patterns from other deployments that might be helpful, and I'd be happy to share these insights.

Happy to take this to DMs if you'd prefer a more detailed discussion.

Cheers,
Lucas
On Tue, Jul 29, 2025 at 1:07 AM Theo N. Truong wrote:

To give you a bit of context on the scale at which we're operating with AI, including LibreChat: we have one or two MCP servers being added to librechat.yaml every week. The tools lists on these servers are also constantly changing. We also have an MCP discovery service that we hook LibreChat to (with our own custom code).

So, having initializeMCPs called only during startup is a bit of an issue for us. We have to redeploy to refresh the tools list when asked (thankfully it's a zero-downtime deployment). I'm taking that into account when solving the inconsistent MCP connection pool problem (2nd bullet point in the long comment above), too. Here's how I imagine the MCP connections will be managed in the context of a cluster of backend servers (a rough sketch follows the list):

1. When the servers boot up, one is selected as the leader.
2. This leader is responsible for periodically updating the GLOBAL tools list and the MCP servers list, both of which are stored on Redis.
3. All servers, including the leader, establish connections to the MCP servers on demand, in a lazy-loading fashion: the connection doesn't exist until the first request for that MCP server arrives on that LC backend server. Each server keeps its own list of live connections in memory (this cannot be shared). These connections can expire when not used by anyone for a while.
4. We will turn on sticky sessions on our cloud infra for the LC deployment to make sure users are not bounced from one server to another, which reduces the number of duplicate connections across servers, especially user-specific connections.
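To make the scheme a bit more concrete, here is a rough TypeScript sketch of steps 1-3, assuming ioredis; `MCPConnection`, `connectToMCPServer`, the key names, and the TTLs are hypothetical placeholders rather than LibreChat APIs.

```ts
// Rough sketch of the scheme above: a Redis-based leader lease plus a lazy,
// per-instance MCP connection pool with idle expiry. Not LibreChat code --
// the helper names, key names, and TTLs are hypothetical.
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URI ?? 'redis://localhost:6379');
const INSTANCE_ID = `${process.env.HOSTNAME ?? 'local'}-${process.pid}`;
const LEADER_KEY = 'mcp:leader';          // assumed key name
const LEADER_TTL_MS = 30_000;             // leader lease, renewed while alive
const CONNECTION_IDLE_MS = 10 * 60_000;   // drop MCP connections idle for 10 min

// Steps 1-2: whichever instance holds the lease refreshes the global tools /
// MCP server lists in Redis; the others only read them.
async function tryBecomeOrStayLeader(): Promise<boolean> {
  const acquired = await redis.set(LEADER_KEY, INSTANCE_ID, 'PX', LEADER_TTL_MS, 'NX');
  if (acquired === 'OK') return true;
  if ((await redis.get(LEADER_KEY)) === INSTANCE_ID) {
    await redis.pexpire(LEADER_KEY, LEADER_TTL_MS); // renew our own lease
    return true;
  }
  return false;
}

setInterval(async () => {
  if (await tryBecomeOrStayLeader()) {
    // e.g. re-read librechat.yaml / the discovery service and write the
    // resulting tools list to Redis (hypothetical helper, omitted here).
  }
}, LEADER_TTL_MS / 2);

// Step 3: live connections cannot be shared across servers, so each instance
// keeps its own lazy pool in memory.
interface MCPConnection { close(): Promise<void>; }

// Hypothetical helper -- a real one would open the MCP transport.
async function connectToMCPServer(name: string): Promise<MCPConnection> {
  return { close: async () => console.log(`closed ${name}`) };
}

const connections = new Map<string, { conn: MCPConnection; lastUsed: number }>();

async function getMCPConnection(serverName: string): Promise<MCPConnection> {
  const existing = connections.get(serverName);
  if (existing) {
    existing.lastUsed = Date.now();
    return existing.conn;
  }
  const conn = await connectToMCPServer(serverName);
  connections.set(serverName, { conn, lastUsed: Date.now() });
  return conn;
}

// Evict connections nobody has used for a while.
setInterval(async () => {
  const now = Date.now();
  for (const [name, entry] of connections) {
    if (now - entry.lastUsed > CONNECTION_IDLE_MS) {
      connections.delete(name);
      await entry.conn.close();
    }
  }
}, 60_000);
```

Step 4 (sticky sessions) would live in the load balancer / ingress configuration rather than in application code.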
-
This is a loaded question, but what high-level deployment architecture with high availability would you consider for a solution serving 15,000 users? Of course, not all users will be active simultaneously, but let's assume 15% peak concurrency. Has anyone tried this kind of load yet? We would need MongoDB and VectorDB replicas. The API service would also need to be redundant and fronted by load balancers, the RAG service should be deployed in clusters, and the GPUs should also sit behind load balancers. Any directions on the required architecture would be helpful.
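For a rough sense of what 15% peak concurrency means in replica counts, here is a small back-of-envelope sketch; every per-replica capacity figure is an assumption for illustration, not a measured LibreChat number.

```ts
// Back-of-envelope sizing for the question above. All capacity figures are
// assumptions -- substitute measured numbers from your own load tests.
const totalUsers = 15_000;
const peakConcurrency = 0.15;                          // 15% of users active at peak
const concurrentUsers = totalUsers * peakConcurrency;  // 2,250

// Assumed per-replica capacities (hypothetical).
const usersPerApiReplica = 300;   // LibreChat API node behind the load balancer
const usersPerRagReplica = 500;   // RAG API worker

const replicas = (capacity: number) => Math.ceil(concurrentUsers / capacity);

console.log(`Concurrent users at peak: ${concurrentUsers}`);           // 2250
console.log(`API replicas needed:      ${replicas(usersPerApiReplica)}`); // 8
console.log(`RAG replicas needed:      ${replicas(usersPerRagReplica)}`); // 5
```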