Orchestrate Prosody using a separate container #236

RemiBardon · Maintainer · 2025-04-22T14:22:46Z

Context

Architecture of a Prose Pod

Currently, Prose Pods are made of three parts:

Prose Pod Server, the XMPP server, running Prosody
Prose Pod API, an HTTP REST API, used to configure prose-pod-server
Prose Pod Dashboard, a web app, used to make calls to prose-pod-api

Prosody is configured using a file on disk, and its data is persisted in different formats as files in a directory.

For security reasons, Prosody data files are only accessible to the Prose Pod Server, and the Prose Pod API has no way to access them. The only part of a Prose Pod Server that the Prose Pod API has access to is the directory containing configuration files (/etc/prosody), which could already be considered too much since it contains SSL certificates. For reasons detailed in ADR: Interact with Prosody using a REST API, the Prose Pod API uses a Prosody module exposing an HTTP REST API to perform actions in Prosody (update team members, reload the configuration file…).

Problems encountered with the current architecture

Cumbersome bootstrapping (which is planned to be redesigned)

This architecture has posed a lot of challenges since the very beginning, and keeps posing more and more of them. The bootstrapping process¹ already shows limitations² which will be cumbersome for users in production environments, forcing a redesign in the near future. We haven’t thought about a new design yet, but if we keep the current architecture we would probably end up with a very bad one.

Factory reset that can leak data

As we introduce higher-level operations like performing a factory reset (#130), we really hit technical limitations:

Factory resets, the way we had implemented them, don’t work with SQL backends in Prosody (which we wanted to switch to because our benchmarks showed a significant improvement), because Prosody doesn’t recreate the database and its connection on prosodyctl reload
A prosodyctl reload could leak data that’s kept in memory (e.g. caches)
A restart of Prosody could leak data even after a full wipe because Prosody might write data from memory to disk when stopping
We had to implement a LifecycleManager to allow the API to reload itself at runtime, which adds a ton of unnecessary complexity
When the API reloads at runtime, caches stored in static variables (e.g. MemberService caches) are not emptied, which could leak data

Backup & restore which would need to be implemented in Lua

Because the API doesn’t have access to Prosody data, backup (#131) and restore (#132) features (which still need to be designed) would need to be implemented in Prosody, in Lua. We are not very experienced in Lua, and creating a tarball/zip in Lua then streaming it to the API would likely be easy to get wrong.

Another consequence of writing it in Prosody is that we’d have to make a new release of the Server every time we work on the backup or restore features in the API. The Prose Pod Server can be used as a standalone XMPP server, so it seems wrong to release it just for work on Prose Pod API features.

Prosody storage migration which would add complexity in the API code

The Prose Pod API already supports creating a Prose Pod Server with different storage backends in Prosody. However, once the choice has been made, one cannot change this or it would break their Prose Pod. Prosody has a migration tool called Prosody Migrator, but adding support for it would add a ton of logic in the API code, therefore adding complexity in our day-to-day work. This logic could at least be hidden away.

Proposal: a new container used for orchestration

We create a new Docker image called prose-pod-orchestrator or something like that (though IMO we should avoid prose-pod-ctl in case we create a CLI one day), which would act as an orchestrator for all other parts of the Prose Pod. It wouldn’t expose any port externally, and would be called exclusively by the Prose Pod API. Its role would be to perform lifecycle operations (start, restart, migrate DB…) on other parts of the Prose Pod, and synchronize all of it.

Note: This proposal was sparked by MattJ saying:

> For Snikket hosting there is an instance management API which manages stuff at the container level (starting, stopping, migrating) etc.
>
> It's entirely separate to any APIs provided by the instance itself
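To make the isolation rules of this proposal concrete, here is a minimal Rust sketch that encodes them as checks: the orchestrator exposes no port and is called only by the API, and (as for backups below) Prosody's data directory is mounted only into the Server and the orchestrator, never into the API. The `Component` type, the container names and the `/var/lib/prosody` path are hypothetical, chosen for illustration.

```rust
/// Hypothetical description of one container in a Prose Pod.
#[derive(Debug)]
pub struct Component {
    pub name: &'static str,
    /// Ports published outside the Pod's private network.
    pub exposed_ports: Vec<u16>,
    /// Paths mounted into the container.
    pub mounts: Vec<&'static str>,
    /// Components allowed to send requests to this one.
    pub callers: Vec<&'static str>,
}

// Hypothetical names; `/var/lib/prosody` is assumed to be Prosody's data directory.
pub const SERVER: &str = "prose-pod-server";
pub const API: &str = "prose-pod-api";
pub const ORCHESTRATOR: &str = "prose-pod-orchestrator";
pub const PROSODY_DATA: &str = "/var/lib/prosody";

/// Checks the isolation rules of the proposal and returns the first violation.
pub fn check_topology(components: &[Component]) -> Result<(), String> {
    for c in components {
        if c.name == ORCHESTRATOR {
            // Not reachable from outside the Pod.
            if !c.exposed_ports.is_empty() {
                return Err(format!("{} must not expose any port", ORCHESTRATOR));
            }
            // Called exclusively by the Prose Pod API.
            if c.callers != [API] {
                return Err(format!("only {} may call {}", API, ORCHESTRATOR));
            }
        }
        // Only the Server and the orchestrator may see Prosody's data.
        let may_see_data = c.name == SERVER || c.name == ORCHESTRATOR;
        if c.mounts.contains(&PROSODY_DATA) && !may_see_data {
            return Err(format!("{} must not mount {}", c.name, PROSODY_DATA));
        }
    }
    Ok(())
}
```

Such a check could run as a test over the deployment description, so that an accidental mount of the data directory into the API fails loudly instead of silently widening the API's access.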

Cumbersome bootstrapping: fixed

It could, at startup, generate the bootstrapping configuration for Prosody, then start both Prosody and the API. This would greatly simplify the administration of a Prose Pod.
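A minimal sketch of that startup step, assuming the orchestrator renders the bootstrap configuration itself before starting Prosody and then the API. `BootstrapSettings`, the module list and the generated header are hypothetical; a real implementation would more likely reuse the prosody-config crate than format Lua by hand, and a full Prosody configuration needs much more than this fragment (`modules_enabled` and `http_ports` are standard Prosody options).

```rust
use std::fmt::Write as _;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical inputs the orchestrator would read from its own configuration.
pub struct BootstrapSettings {
    /// Prosody modules the API relies on (e.g. `admin_rest`, `rest`, `http_oauth2`).
    pub modules: Vec<String>,
    pub http_port: u16,
}

/// Renders a minimal Prosody configuration fragment (Lua syntax).
pub fn render_bootstrap_config(s: &BootstrapSettings) -> String {
    // Module names are written as Lua string literals.
    let modules: Vec<String> = s.modules.iter().map(|m| format!("\"{m}\"")).collect();
    let mut out = String::from("-- Generated by the orchestrator at startup; do not edit.\n");
    // Writing into a `String` cannot fail, hence the `unwrap`s.
    writeln!(out, "modules_enabled = {{ {} }}", modules.join(", ")).unwrap();
    writeln!(out, "http_ports = {{ {} }}", s.http_port).unwrap();
    out
}

/// Rewrites the configuration file before Prosody starts, so a change to the
/// required modules ships with a new orchestrator image instead of requiring
/// users to edit a mounted file by hand.
pub fn write_bootstrap_config(path: &Path, s: &BootstrapSettings) -> io::Result<()> {
    fs::write(path, render_bootstrap_config(s))
}
```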

Data leaks on factory resets: fixed

By completely killing the Server and API containers, it would ensure no in-memory data can leak, while simplifying the code in both projects.

This would also allow us to use SQLite as Prosody's default backend, improving performance significantly for some operations.
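The ordering is what matters here: MattJ's advice in the archive below was to stop Prosody, then wipe its data, then start it again. A minimal sketch of that sequence, assuming a hypothetical `ContainerRuntime` trait standing in for whatever the orchestrator would use to control containers (Docker, Podman, systemd…); the container and volume names are also hypothetical. The fake runtime refuses to wipe a volume while any container is still running, which is exactly the situation that can leak data.

```rust
use std::collections::HashSet;

/// Hypothetical abstraction over the container runtime.
pub trait ContainerRuntime {
    fn stop(&mut self, container: &str) -> Result<(), String>;
    fn start(&mut self, container: &str) -> Result<(), String>;
    fn wipe_volume(&mut self, volume: &str) -> Result<(), String>;
}

// Hypothetical container and volume names.
pub const SERVER: &str = "prose-pod-server";
pub const API: &str = "prose-pod-api";
pub const SERVER_DATA: &str = "prosody-data";
pub const API_DATA: &str = "api-data";

/// Stop everything, wipe, then start again: no process is alive to flush
/// in-memory state into the freshly wiped volumes.
pub fn factory_reset<R: ContainerRuntime>(rt: &mut R) -> Result<(), String> {
    // Stop the API first so it cannot send new requests to the Server.
    rt.stop(API)?;
    rt.stop(SERVER)?;
    rt.wipe_volume(SERVER_DATA)?;
    rt.wipe_volume(API_DATA)?;
    // Start the Server before the API, which talks to it at startup.
    rt.start(SERVER)?;
    rt.start(API)?;
    Ok(())
}

/// In-memory runtime used to check the ordering.
#[derive(Default)]
pub struct FakeRuntime {
    pub running: HashSet<String>,
    pub log: Vec<String>,
}

impl ContainerRuntime for FakeRuntime {
    fn stop(&mut self, c: &str) -> Result<(), String> {
        self.running.remove(c);
        self.log.push(format!("stop {c}"));
        Ok(())
    }
    fn start(&mut self, c: &str) -> Result<(), String> {
        self.running.insert(c.to_string());
        self.log.push(format!("start {c}"));
        Ok(())
    }
    fn wipe_volume(&mut self, v: &str) -> Result<(), String> {
        // A live container could write back into the volume after the wipe.
        if !self.running.is_empty() {
            return Err(format!("cannot wipe {v}: containers still running"));
        }
        self.log.push(format!("wipe {v}"));
        Ok(())
    }
}
```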

Backups & restore: easy

Since it wouldn’t be accessible from the outside world like the Prose Pod API is, and its feature set would be very limited, we could mount Prosody’s data directory into it and let it handle backup and restore features. Because it would be written in a safer language than Lua (very likely Rust), those operations would be a lot easier to get right and to maintain. We could even easily write tests to ensure backups can be restored³!
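As an illustration of such a test, here is a minimal round-trip sketch using only Rust's standard library. The `Snapshot` format (a sorted, in-memory list of relative paths and file contents) is a hypothetical simplification; a real backup would stream an archive, and this sketch does not preserve empty directories, permissions or symlinks.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical snapshot: every file under the data directory, keyed by its
/// path relative to that directory.
pub type Snapshot = Vec<(PathBuf, Vec<u8>)>;

/// Walks `root` iteratively and reads every non-directory entry.
pub fn backup(root: &Path) -> io::Result<Snapshot> {
    let mut files = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else {
                let rel = path.strip_prefix(root).expect("path under root").to_path_buf();
                files.push((rel, fs::read(&path)?));
            }
        }
    }
    // Sort so that snapshots of identical trees compare equal.
    files.sort();
    Ok(files)
}

/// Restores into an emptied `root`, so no stale file survives the restore.
pub fn restore(snapshot: &Snapshot, root: &Path) -> io::Result<()> {
    if root.exists() {
        fs::remove_dir_all(root)?;
    }
    fs::create_dir_all(root)?;
    for (rel, bytes) in snapshot {
        let dest = root.join(rel);
        if let Some(parent) = dest.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(dest, bytes)?;
    }
    Ok(())
}
```

A round-trip test then only has to back up a sample data directory, restore it elsewhere, and check that backing up the restored copy yields the same snapshot.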

Prosody storage migrations: easy

We could wrap all migration logic in the orchestrator, allowing it to stop the API while performing migrations on the Server, or even to run migrations on the API’s database, which aren’t currently supported.
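A minimal sketch of such a migration, assuming a hypothetical `Effects` trait for container control and for running the migration tool itself (for Prosody, presumably Prosody Migrator), and assuming the tool copies data rather than moving it. Prosody's configuration is switched to the new backend only if the migration succeeded; either way, both containers are restarted, the Server before the API. The `Storage` variants and container names are illustrative.

```rust
/// Hypothetical storage choices (`Sqlite` standing for Prosody's `sql`
/// storage with the SQLite3 driver).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Storage {
    Internal,
    Sqlite,
}

/// Hypothetical side effects the orchestrator needs.
pub trait Effects {
    fn stop(&mut self, container: &str);
    fn start(&mut self, container: &str);
    fn migrate(&mut self, from: Storage, to: Storage) -> Result<(), String>;
    fn set_storage(&mut self, storage: Storage);
}

pub fn migrate_storage<E: Effects>(fx: &mut E, from: Storage, to: Storage) -> Result<(), String> {
    if from == to {
        return Ok(());
    }
    // Stop the API first so no request reaches a half-migrated Server.
    fx.stop("prose-pod-api");
    fx.stop("prose-pod-server");
    let result = fx.migrate(from, to);
    // Assuming the tool copies data, the old backend is still intact on
    // failure, so only switch the configuration on success.
    if result.is_ok() {
        fx.set_storage(to);
    }
    fx.start("prose-pod-server");
    fx.start("prose-pod-api");
    result
}

/// Recording fake used to check the sequence.
#[derive(Debug)]
pub struct RecordingEffects {
    pub log: Vec<String>,
    pub storage: Storage,
    pub fail_migration: bool,
}

impl Effects for RecordingEffects {
    fn stop(&mut self, c: &str) {
        self.log.push(format!("stop {c}"));
    }
    fn start(&mut self, c: &str) {
        self.log.push(format!("start {c}"));
    }
    fn migrate(&mut self, from: Storage, to: Storage) -> Result<(), String> {
        self.log.push(format!("migrate {from:?} -> {to:?}"));
        if self.fail_migration { Err("migration failed".into()) } else { Ok(()) }
    }
    fn set_storage(&mut self, s: Storage) {
        self.storage = s;
    }
}
```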

Prosody-specific code in the API: still there partly, but won’t grow too much

Since it would need to generate bootstrapping configurations, and would very likely be written in Rust, we could reuse the prosody-config crate in the orchestrator. However, the API would still have a ton of Prosody-related code, because it needs to send stanzas via mod_rest and interact with mod_admin_rest and mod_http_oauth2. It wouldn’t make sense to use the orchestrator for all of this so we’ll have to keep it in the API unfortunately.

Consequences

TODO

Notes

Since Prosŏdy IM Chatroom messages aren’t persisted, here is an archive of the conversation which sparked this idea (bold style added by me for readability):

Prosŏdy IM Chatroom archive

Rémi (me):

Hello! In a module, what is the idiomatic way to run some code "asynchronously"?

I have a function that deletes all data on the server, then reloads Prosody (via prosodyctl.reload()) and returns a HTTP response. Unfortunately, I'm now using SQLite as the default storage and the storagemanager doesn’t recreate the database on reload. I would like to replace prosodyctl.reload() by prosodyctl.stop() and let Docker (/systemd) restart Prosody, but then the HTTP response might not have enough time to be sent.

Should I emit a custom event, and listen to this event to fire a prosodyctl.stop()? Is there a better way?

MattJ:

Is the HTTP response supposed to be sent before Prosody is stopped or after Prosody is stopped? :)

Rémi (me):

Well, since it’s Prosody that sends it… I don’t really have a choice. It must be before

MattJ:

Then send it and stop Prosody?

Zash:

Feels like a bit of layering violation to have Prosody manage itself like that?

MattJ:

There is stuff that Prosody would write to storage as part of a shutdown, for example

Rémi (me):

> Then send it and stop Prosody?

I thought mod_http needed the response to be the handle function’s result, but it doesn’t have to; I had overlooked that. My code is based on a 1300-line, unmaintained module from 2012 which I didn’t spend time rewriting, so sometimes I miss things like this 😬

> Feels like a bit of layering violation to have Prosody manage itself like that?

Yes it does, but I have to manage it from another Docker container and I can’t access prosodyctl from there + I don’t want to mount the whole filesystem so I just expose a "Factory reset" HTTP endpoint.

> There is stuff that Prosody would write to storage as part of a shutdown, for example

I saw mod_graceful_shutdown which could help but in my case I really don’t care and want to restart fresh. (Potentially breaking connections too)

MattJ:

Sure, but I'm saying that if you wipe the data directory while Prosody is running, there is the chance it may not be empty after Prosody stops

You would have to stop Prosody, then wipe it, then start Prosody

Rémi (me):

I was hoping I got you wrong 😬

Then can I kill -9? 😅

MattJ:

*shrug*

I agree with Zash that this API design feels a bit wrong, but if it's a hard requirement for you then you do what you can do

Rémi (me):

Well… I don’t really see any other option

I had searched and tested other ideas but none worked or they would have been very unsafe (like mounting all of Prosody’s data dir in the other container)

("had searched" = a year ago so my memory isn’t very fresh)

MattJ:

Since it's a container, and you want a clean slate, why not just destroy the container?

Rémi (me):

(Thankfully I had documented the architecture decision in https://github.com/prose-im/prose-pod-api/blob/71d7b3c1c9d7c533b9aec8090b2e7268da4940a8/docs/ADRs/2024-04-04-a-prosody-rest-api.md)

MattJ:

I don't know your architecture, but anything less than that runs some risk of persisting something you didn't mean to persist

This is one of the reasons that containers exist, so you can throw them away and start again when you want to

Rémi (me):

Yes exactly… I went for the simplest yet effective-enough solution at the time, but with a SQL backend it doesn’t work anymore

Jonas’:

Rémi, I skimmed the document and there's nothing in there about factory resets?

Rémi (me):

> destroy the container

I'll try to find how I can do that, from one container to the other

Jonas: Yes, the feature was only planned at the time, I implemented the factory reset recently. The ADR was more about the interaction between the two containers

MattJ:

Anyway, if you continue down the current path, I propose you do some cleanup before shutdown, such as deactivating all the hosts (just run a loop over prosody.hosts and hostmanager.deactivate() each one)

Jonas’:

to me, the safest would seem to rm -rf the storage directory, then kill -9 prosody.

all while blocking prosody

*the safest that can be done from within prosody

the best is obviously to do this from *outside* the container.

hm, or maybe you can do something to use a different data directory on the next startup

MattJ:

But this is not support of the design, just a way to reduce one aspect of its ugliness (basically performing a partial shutdown before the data wipe)

Zash:

stop container, delete data volume, ???

Jonas’:

like changing data_path based on some environment variable or something

Rémi (me):

> if you continue down the current path

I will try the cleaner route, but if it’s too complicated I'll go the kill -9 route (with prior cleanup)

Zash: Well, that data volume is still a bit of a problem in the "destroy the container" route:

If I kill the container and let it restart, the data is still there

If I delete the data via a Prosody module then kill Prosody, Prosody might still write into it

The only working solution I see is: delete the data via a Prosody module then kill the container and let it restart

MattJ:

For Snikket hosting there is an instance management API which manages stuff at the container level (starting, stopping, migrating) etc.

It's entirely separate to any APIs provided by the instance itself

Rémi (me):

That’s an idea, yes!

Well it would require a lot of architectural changes but that could do it (having 3 containers: Prosody, my HTTP API and an orchestrator)

Rémi (me):

Thank you for all your feedback! I will keep you informed about what path I go for (hopefully the cleanest and safest one 🤞🏻)

MattJ:

Good luck :)

Footnotes

¹ Starting Prosody alongside the API with shared credentials and configuration, to allow the first interaction between the API and the Server at startup.

² The bootstrapping configuration file is mounted in both the API and the Server, but the API cannot update it (to add the modules it requires) before Prosody starts, so if the file is incorrect the Prose Pod is unusable. We will change this configuration over time, to add new administration modules or change how they are configured before the first startup, and every such change will break existing deployments and force their owners to update the file manually. This is really bad UX and should be avoided.

³ Would it be more complicated otherwise? I’m not entirely sure, but it would be with the current architecture.

RemiBardon · Maintainer, author · 2025-09-11T03:21:46Z

@valeriansaliou I'm wrapping up a few things, but once I'm done I'll move on to this. While it might seem a bit overkill just to make and restore backups, it really is a far cleaner solution. Not doing this would probably cause as much work in the long run (maybe even in the short term; I'm sure it's a can of worms).

Conceptually, are you okay with it?


valeriansaliou · Maintainer · 2025-09-11T05:12:30Z

Sounds good
