Prevent certificate tree corruption #280

AlexVelezLl · 2026-01-07T22:54:36Z

Summary

When many sync sessions ran concurrently, two certificates could occasionally end up with the same lft and rght values. This was caused by concurrency problems in the internals of MPTT. Wrapping the Certificate creations in a transaction that locks the parent certificate row, both on the remote server and on the local server (after the remote server's response), solved the problem.
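To illustrate how two certificates can end up with identical lft and rght values, here is a minimal pure-Python sketch of one interleaving that produces it in a nested-set tree. This is not MPTT's code: `Node`, `plan_child`, `make_space` and `insert` are hypothetical helpers, and the two "sessions" are interleaved by hand rather than run in threads. I have not confirmed this is the exact sequence that occurs inside MPTT.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    lft: int
    rght: int


def plan_child(parent):
    """Read step: pick lft/rght for a new last child from the parent's rght."""
    return parent.rght, parent.rght + 1


def make_space(tree, lft):
    """Update step: shift every boundary at or after lft by two to open a gap."""
    for node in tree:
        if node.lft >= lft:
            node.lft += 2
        if node.rght >= lft:
            node.rght += 2


def insert(tree, name, lft, rght):
    """Insert step: store the new node with the boundaries planned earlier."""
    tree.append(Node(name, lft, rght))


def racy_inserts():
    """Two sessions read the same parent state, then both shift, then both insert."""
    root = Node("root", 1, 2)
    tree = [root]
    plan_a = plan_child(root)
    plan_b = plan_child(root)  # same snapshot as session A
    make_space(tree, plan_a[0])
    make_space(tree, plan_b[0])
    insert(tree, "cert_a", *plan_a)
    insert(tree, "cert_b", *plan_b)
    return tree


def serialized_inserts():
    """Each session finishes read, shift and insert before the next one starts."""
    root = Node("root", 1, 2)
    tree = [root]
    for name in ("cert_a", "cert_b"):
        lft, rght = plan_child(root)
        make_space(tree, lft)
        insert(tree, name, lft, rght)
    return tree
```

In `racy_inserts` both children are stored with the same boundaries, while `serialized_inserts` gives each child its own interval. Any lock that forces the read, shift and insert steps of one session to complete before another begins removes the bad interleaving.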

TODO

Have tests been written for the new code?
Has documentation been written/updated?
New dependencies (if any) added to requirements file

Reviewer guidance

Try running multiple concurrent sync requests and check that the tree structure is not corrupted.

Issues addressed

Needed to fix learningequality/kolibri#13821

Documentation

If the PR has documentation, link the file here (either .rst in your repo or if built on Read The Docs)

bjester

This is a fairly targeted fix that should address the specific observed issue well. However, Certificate is vulnerable to this issue in a few more ways and areas. I'm suggesting a few revisions to what you've done so far, and also a way we can address this holistically for more complex scenarios.

We should also address this within push_signed_client_certificate_chain, which is very similar to certificate_signing_request.

bjester · 2026-01-07T23:10:32Z

morango/api/viewsets.py

+            with transaction.atomic():
+                # lock the parent certificate row for update
+                parent = Certificate.objects.select_for_update().get(pk=serialized_cert.validated_data['parent'].pk)
+                serialized_cert.validated_data['parent'] = parent


As in the other case, where you wrapped a smaller amount of code, the same can be done here. The validation and certificate signing take a non-negligible amount of time, since they perform cryptographic operations, so limiting what's wrapped in this transaction will reduce the time it actually blocks.

It should be safe to move your additions and the transaction so that they wrap only the certificate.save() call, where MPTT should be involved.

AlexVelezLl · 2026-01-08

I also tried that, but for some reason I was still getting the concurrency errors when I wrapped only the save method in the transaction. I will give it another try and see whether something else was happening around there!

bjester · 2026-01-07T23:55:58Z

morango/sync/syncsession.py

        if not Certificate.objects.filter(id=parent_cert.id).exists():
            cert_chain_response = self._get_certificate_chain(
                params={"ancestors_of": parent_cert.id}
            )

            # upon receiving cert chain from server, we attempt to save the chain into our records
            Certificate.save_certificate_chain(
                cert_chain_response.json(), expected_last_id=parent_cert.id
            )


This also needs similar protection, although it deals with a situation where the ancestors of the certificate may not exist. Since Kolibri doesn't currently use more than 2 levels of certificates, this shouldn't encounter the same MPTT issue. Nevertheless, with 3 or more levels, it could.

The difference here, though, is that we don't have those records to lock via select_for_update. For SQLite, we can trust that the DB is configured to help us enforce this simply by using a transaction (what Richard recently fixed in Kolibri, although we should document this expectation for Morango). For PostgreSQL, we can use advisory locks to block a transaction using something we know to be common between parallel executions. This is something we used on Studio. The advisory lock pattern is therefore a bit more flexible and lets us create a reusable pattern for this particular situation.
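The comment does not spell out which SQLite configuration is meant. As a rough illustration, and assuming it refers to write transactions that take the database's write lock when they begin (BEGIN IMMEDIATE), the sketch below uses only the standard library's sqlite3 module to show that a second writer cannot start its own write transaction while the first one is open. The table name and helper names are made up for the example.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path):
    """Return True if a second connection cannot begin a write transaction
    while the first connection holds an open BEGIN IMMEDIATE."""
    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    # timeout=0: fail immediately instead of waiting for the lock.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS cert (id TEXT PRIMARY KEY)")
        first.execute("BEGIN IMMEDIATE")  # takes the write lock up front
        try:
            second.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError:
            return True  # SQLite reported that the database is locked
        second.execute("ROLLBACK")
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        return second_writer_is_blocked(os.path.join(tmp, "demo.sqlite3"))
```

With a non-zero timeout the second writer would wait for the first to finish instead of failing, which is the serializing behaviour the tree updates need.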

We should be able to use the same lock_partitions function that is used for sync locking. We'll just need to lock based off the root certificate.

So this would become something like:

        if not Certificate.objects.filter(id=parent_cert.id).exists():
            cert_chain_response = self._get_certificate_chain(
                params={"ancestors_of": parent_cert.id}
            )
            cert_chain = cert_chain_response.json()

            with transaction.atomic():
                # lock based off the root certificate
                lock_partitions(backend, sync_filter=cert_chain[0]["id"])
                # check again, now that we have a lock
                if not Certificate.objects.filter(id=parent_cert.id).exists():
                    Certificate.save_certificate_chain(
                        cert_chain, expected_last_id=parent_cert.id
                    )
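The "check again, now that we have a lock" step matters because several sessions can pass the first existence check before any of them saves. A small threaded sketch of that check, lock, re-check sequence follows. `ChainStore` is a hypothetical in-memory stand-in for the Certificate table, and a `threading.Lock` stands in for the database-level lock; none of this is Morango code.

```python
import threading


class ChainStore:
    """Hypothetical in-memory stand-in for the Certificate table plus its lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.ids = set()
        self.save_calls = 0

    def exists(self, cert_id):
        return cert_id in self.ids

    def save_chain(self, chain):
        self.save_calls += 1
        self.ids.update(chain)

    def ensure_chain(self, chain):
        parent_id = chain[-1]
        if self.exists(parent_id):  # cheap check without the lock
            return
        with self._lock:  # only one writer at a time past this point
            if not self.exists(parent_id):  # check again under the lock
                self.save_chain(chain)


def run_concurrently(workers=8):
    store = ChainStore()
    start = threading.Barrier(workers)  # release all workers together

    def worker():
        start.wait()
        store.ensure_chain(["root", "parent"])

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store
```

However the threads are scheduled, the chain is saved once: any worker that loses the race for the lock sees the parent on its second check and skips the save.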

We could use this advisory lock pattern everywhere, too, instead of select_for_update, even where we actually have records. It also has the advantage of protecting more complex certificate structures with 3+ levels, which open up more surface area where the select_for_update lock may not protect the integrity of the tree during concurrent writes across it. Like I said, though, Kolibri doesn't leverage that currently.
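For reference, a minimal sketch of what a PostgreSQL advisory lock keyed on the root certificate could look like. This is not how lock_partitions is implemented (its internals are not shown in this thread). `advisory_lock_key` and `lock_certificate_tree` are hypothetical names, and the hashing scheme is an assumption. The only PostgreSQL feature relied on is `pg_advisory_xact_lock(bigint)`, which blocks until the lock is free and releases it when the transaction ends.

```python
import hashlib
import struct


def advisory_lock_key(root_cert_id):
    """Map a certificate ID to a signed 64-bit integer, the key type
    that PostgreSQL advisory locks accept."""
    digest = hashlib.sha256(root_cert_id.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


def lock_certificate_tree(cursor, root_cert_id):
    """Take a transaction-scoped advisory lock for one certificate tree.

    `cursor` is any DB-API cursor connected to PostgreSQL, and the call
    must happen inside an open transaction.
    """
    key = advisory_lock_key(root_cert_id)
    cursor.execute("SELECT pg_advisory_xact_lock(%s)", [key])
    return key
```

Because every session working on the same tree derives the same key from the root certificate ID, they queue behind one another even when the rows they are about to create do not exist yet, which is the case select_for_update cannot cover.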

AlexVelezLl · 2026-01-08T17:25:11Z

Thanks @bjester! I have moved to use lock_partitions in all cases instead of select_for_update. How does it look? The only other thing I am not entirely sure about is whether we should lock all Certificate saves? Or should we be calling this lock_mptt context manager outside the Model class?

bjester · 2026-01-08T18:48:02Z

> How does it look? The only other thing I am not entirely sure about is whether we should lock all Certificate saves? Or should we be calling this lock_mptt context manager outside the Model class?

It looks pretty great! I think what you did, wrapping all saves, is fine. I think it's fine primarily because certificates are rarely created or updated. Having the lock managed outside of the model class is a little more portable and consistent, but I don't think it's necessary to change. I'm going to try running the morango-integration tests on Kolibri with this change. If it all looks good, I will approve and merge. Thanks!

bjester

Thanks @AlexVelezLl

AlexVelezLl marked this pull request as ready for review January 7, 2026 22:55

bjester requested changes Jan 8, 2026


AlexVelezLl force-pushed the prevent-certificate-tree-corruption branch from bb4d936 to 8523ad3 Compare January 8, 2026 17:21

Prevent certificate tree corruption

a9bf52c

AlexVelezLl force-pushed the prevent-certificate-tree-corruption branch from 8523ad3 to a9bf52c Compare January 8, 2026 17:28

Bump version and log changes

5981336

bjester approved these changes Jan 8, 2026


bjester merged commit 92278a5 into learningequality:release-v0.8.x Jan 8, 2026
22 checks passed
