-
Notifications
You must be signed in to change notification settings - Fork 23
Prevent certificate tree corruption #280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent certificate tree corruption #280
Conversation
bjester
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly targeted fix that should address the specific observed issue well. Although, there are a few more ways and areas that Certificate is vulnerable to this issue. I'm suggesting a few revisions on what you've done so far and also suggest a way we can address this holistically for more complex scenarios.
We should also address this within push_signed_client_certificate_chain which is very similar to certificate_signing_request
morango/api/viewsets.py
Outdated
| with transaction.atomic(): | ||
| # lock the parent certificate row for update | ||
| parent = Certificate.objects.select_for_update().get(pk=serialized_cert.validated_data['parent'].pk) | ||
| serialized_cert.validated_data['parent'] = parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like you did in the other case, where you wrapped a smaller amount of code, the same can be done here. The validation and certificate signing takes a non-negligible amount of time, since it performs cryptographic operations. So limiting what's wrapped in this transaction will reduce the time that this is actually blocking.
It should be safe to move your additions and the transaction to wrap only around the certificate.save() where MPTT should be involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also tried that, but for some reason, I was still getting the concurrency errors if I just wrapped the save method in the transaction. Will give it another try and see if there was something else happening around there!
morango/sync/syncsession.py
Outdated
| if not Certificate.objects.filter(id=parent_cert.id).exists(): | ||
| cert_chain_response = self._get_certificate_chain( | ||
| params={"ancestors_of": parent_cert.id} | ||
| ) | ||
|
|
||
| # upon receiving cert chain from server, we attempt to save the chain into our records | ||
| Certificate.save_certificate_chain( | ||
| cert_chain_response.json(), expected_last_id=parent_cert.id | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs similar protection. Although, this is dealing with a situation where the ancestors of the certificate may not exist. Since Kolibri doesn't currently use more than 2 levels of certificates, this shouldn't encounter the same MPTT issue. Nevertheless, with 3 or more, it has potential for it.
The difference here though is that we don't have those records to lock via select_for_update. For SQLite, we can trust that the DB is configured to help us enforce this simply by using a transaction (what Richard recently fixed in Kolibri, although we should document this expectation for Morango). For PostgreSQL, we can use advisory locks to block a transaction using something we'll know to be common between parallel executions. This is something we used on studio. The advisory lock pattern is therefore a bit more flexible and allows us to create a reusable pattern for this particular situation.
We should be able to use the same lock_partitions function that is used for sync locking. We'll just need to lock based off the root certificate.
So this would become something like:
if not Certificate.objects.filter(id=parent_cert.id).exists():
cert_chain_response = self._get_certificate_chain(
params={"ancestors_of": parent_cert.id}
)
with transaction.atomic():
lock_partitions(backend, sync_filter=cert_chain_response[0]["id"])
# check again, now that we have a lock
if not Certificate.objects.filter(id=parent_cert.id).exists():
Certificate.save_certificate_chain(
cert_chain_response.json(), expected_last_id=parent_cert.id
)We could use this advisory lock pattern everywhere, too, instead of select_for_update in the case we actually have records. It also has the advantage of protecting more complex certificate structures with 3+ levels, since that opens up more surface area for where the select_for_update lock may not protect the integrity of the tree during concurrent writes across it, but like I said, Kolibri doesn't leverage that currently.
bb4d936 to
8523ad3
Compare
|
Thanks @bjester! I have moved to use |
8523ad3 to
a9bf52c
Compare
It looks pretty great! I think what you did, wrapping all saves, is fine. I think it's fine primarily because certificates are rarely created or updated. Having the lock managed outside of the model class is a little more portable and consistent, but I don't think it's necessary to change. I'm going to try running the morango-integration tests on Kolibri with this change. If it all looks good, I will approve and merge. Thanks! |
bjester
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @AlexVelezLl
Summary
When many sync sessions were happening concurrently, there were occasions where two certificates could have the same
lftandrght. This was due to some concurrency problems in the internals of MTTP. Wrapping these Certificate creations on the remote server and on the local server (after the remote server's response) solved the problem.TODO
Reviewer guidance
Issues addressed
Needed to fix learningequality/kolibri#13821
Documentation
If the PR has documentation, link the file here (either .rst in your repo or if built on Read The Docs)