fix(swap): Monero wallet thread safety #281
|
I need to think about this a bit more, but how long would you like to keep the lock? Over the duration of which process? |
|
This race condition is the cause of the refund failures of #274. If we take a look at these logs: Then the asb is trying to sync 2 monero wallets at the same time, which is not possible with monero-wallet-rpc. We cannot release the lock until we finish the complete sequence (opening wallet, sync, sweep), or else monero-wallet-rpc will be in an unexpected (wrong) state and cause refund issues etc. |
Before, monero::Wallet wrapped a Mutex<Client>, and locked the mutex on each operation. This meant releasing the lock in between operations, even though we rely on the operations being executed in order. To remedy this race condition, we wrap monero::Wallet itself in a mutex, requiring any caller to hold the lock for the duration of the operation, including any suboperations.
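To illustrate the commit message above, here is a minimal sketch of the caller-held lock. The Wallet type, its open/sync/sweep methods and the refund function are hypothetical stand-ins (the real monero::Wallet talks to monero-wallet-rpc and is async); the sketch only shows that a sequence run under one lock acquisition cannot be interleaved by another thread.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for monero::Wallet: it records every call so the
/// order in which operations reached the wallet can be inspected afterwards.
#[derive(Default)]
pub struct Wallet {
    pub log: Vec<String>,
}

impl Wallet {
    pub fn open(&mut self, name: &str) {
        self.log.push(format!("open {}", name));
    }
    pub fn sync(&mut self, name: &str) {
        self.log.push(format!("sync {}", name));
    }
    pub fn sweep(&mut self, name: &str) {
        self.log.push(format!("sweep {}", name));
    }
}

/// Runs open -> sync -> sweep under a single lock acquisition.
/// The caller-side guard is held for the whole sequence.
pub fn refund(wallet: &Mutex<Wallet>, name: &str) {
    let mut guard = wallet.lock().unwrap();
    guard.open(name);
    guard.sync(name);
    guard.sweep(name);
} // guard is dropped here, only after the full sequence has finished

/// Starts two refunds concurrently and returns the combined call log.
pub fn run_two_refunds() -> Vec<String> {
    let wallet = Arc::new(Mutex::new(Wallet::default()));
    let handles: Vec<_> = vec!["swap-a", "swap-b"]
        .into_iter()
        .map(|name| {
            let wallet = Arc::clone(&wallet);
            thread::spawn(move || refund(&wallet, name))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let log = wallet.lock().unwrap().log.clone();
    log
}
```

Whichever thread wins the lock first, the log returned by run_two_refunds always contains two uninterrupted open/sync/sweep triples.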
You are totally correct here. I have encountered this myself on my dev server. |
|
This also means we cannot provide quotes while we are syncing a refund wallet, which is ok, I guess, but is probably not expected behaviour. |
|
Combined with #288 this would probably mostly resolve the issue, though? We could also return the cached quote while working on creating another one. |
I'd stay with the current policy of only returning a quote if the TTL is accurate. We don't want to return the same outdated quote over a long period (e.g. if the syncing takes a long time). A user cannot initiate a swap while we're syncing anyway because we need to sync our main This will be fixed once #244 is merged. No need to prevent to have concurrency, if we don't support it. We should still merge this PR though (if it wasn't clear) |
|
We can add a timeout for the acquisition of the Mutex lock for some operations. If we want to serve a quote and can't lock the Monero wallet for >60s, it might not make sense to even bother with responding anymore. |
Due to the newly introduced thread safety, we are currently holding the lock on the monero wallet while waiting for confirmations -- since this takes a lot of time, it starves all other tasks that do anything with the monero wallet. In this commit I start implementing a change that enables us to release the lock on the wallet while waiting for confirmations and only acquire it when necessary. This breaks with the current system of passing just a generic client which implements the MoneroWalletRpc trait (which we use to pass a dummy client for testing). This commit is the first step towards a small refactor to that system.
By always passing Arc<Mutex<Wallet>> instead of MoneroWalletRpc clients directly we can allow the wait_for_confirmations functions to lock the Mutex and access the client when they need to, while releasing the lock when waiting for the next tick. This stops the current starving of other tasks waiting for the lock. Since we use a dummy client for testing, this required adding a generic parameter to the Wallet. However, since we specify a default type, this doesn't actually require generic handling anywhere.
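A rough sketch of what this commit describes, under stated assumptions: the MoneroWalletRpc trait here is reduced to a single hypothetical confirmations method, RpcClient and DummyClient are placeholders, and the loop is blocking (std only) while the real code is presumably async. It shows the lock being taken only for each poll and released before sleeping, plus the default type parameter that lets existing code keep writing plain Wallet.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical, heavily simplified version of the MoneroWalletRpc trait.
pub trait MoneroWalletRpc {
    fn confirmations(&mut self) -> u64;
}

/// Placeholder for the real RPC client type (assumption, not the real API).
pub struct RpcClient;

impl MoneroWalletRpc for RpcClient {
    fn confirmations(&mut self) -> u64 {
        0
    }
}

/// Dummy client for tests: reports one more confirmation on every poll.
#[derive(Default)]
pub struct DummyClient {
    pub polls: u64,
}

impl MoneroWalletRpc for DummyClient {
    fn confirmations(&mut self) -> u64 {
        self.polls += 1;
        self.polls
    }
}

/// The default type parameter means `Wallet` still names the production type,
/// so only test code has to spell out `Wallet<DummyClient>`.
pub struct Wallet<C = RpcClient> {
    pub client: C,
}

/// Polls until `target` confirmations are seen, locking the wallet only
/// while a single poll is in flight.
pub fn wait_for_confirmations<C: MoneroWalletRpc>(
    wallet: &Arc<Mutex<Wallet<C>>>,
    target: u64,
    tick: Duration,
) -> u64 {
    loop {
        let seen = {
            let mut guard = wallet.lock().unwrap();
            guard.client.confirmations()
        }; // guard dropped here, so other tasks can lock during the sleep
        if seen >= target {
            return seen;
        }
        thread::sleep(tick);
    }
}
```

Because the guard lives only inside the inner block, a quote request or another swap can take the wallet lock between ticks instead of waiting for the whole confirmation period.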
I think we just waited for the lock (without timeout) before this change too. But you're right, it totally makes sense to implement this here. |
This commit adds a timeout after 60 seconds when trying to acquire the lock on the monero wallet while making a quote. Should a timeout occur, we return an error. This makes sure that we get _some_ return value and that starvation is noticed.
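The timeout described above could look roughly like the following. This is a std-only sketch, not the code from the commit: std::sync::Mutex has no timed lock, so the sketch polls try_lock until a deadline, whereas async code would more likely wrap the lock future in a runtime-provided timeout. lock_with_timeout, LockTimeout and make_quote are hypothetical names, and the wallet is reduced to a plain balance.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};
use std::thread;
use std::time::{Duration, Instant};

/// Error returned when the wallet lock could not be acquired in time.
#[derive(Debug, PartialEq)]
pub struct LockTimeout;

/// Tries to lock `mutex`, giving up once `timeout` has elapsed.
pub fn lock_with_timeout<T>(
    mutex: &Mutex<T>,
    timeout: Duration,
) -> Result<MutexGuard<'_, T>, LockTimeout> {
    let deadline = Instant::now() + timeout;
    loop {
        match mutex.try_lock() {
            Ok(guard) => return Ok(guard),
            // A poisoned lock is still a held lock; this sketch simply
            // recovers the guard instead of propagating the poison.
            Err(TryLockError::Poisoned(poisoned)) => return Ok(poisoned.into_inner()),
            Err(TryLockError::WouldBlock) => {
                if Instant::now() >= deadline {
                    return Err(LockTimeout);
                }
                thread::sleep(Duration::from_millis(5));
            }
        }
    }
}

/// Hypothetical quote: here just the balance stored behind the lock.
pub fn make_quote(wallet: &Mutex<u64>, timeout: Duration) -> Result<u64, LockTimeout> {
    let balance = lock_with_timeout(wallet, timeout)?;
    Ok(*balance)
}
```

With a 60 second timeout, a quote request either gets the wallet or fails with an explicit error, which makes starvation visible instead of leaving the request hanging.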
|
Currently, there still seems to be an instance of overly long lock holding and subsequent starvation of other tasks in wait_for_confirmations. Why this happens currently eludes me. |
|
Tests are passing now, could you review this @binarybaron @delta1? Edit: there is one test (reopens_wallet_in_case_not_available) which is probably failing |
I'll review this asap. Our testnet asb is running this branch.
|
@Einliterflasche Have you added the changes from #260? |
|
Not yet |
binarybaron
left a comment
some comments about Mutex / RwLock
|
LGTM but let's do some manual testing before merging this. |
The test runs fine on my machine, but timed out in the action. I'll rerun the action, but it should be fine. |
|
I keep getting this error when trying to do a testnet swap: This isn't related to your changes but it's preventing me from testing this PR. Can you try doing a swap? |
|
I'm adding some changes from #260 into this PR. |
When we fail to create a monero wallet from keys, we will now try to open it instead. I also renamed the method to be more consistent with Wallet::open_or_create. These changes are mostly taken from #260.
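The fallback described in this commit can be sketched as below. FakeRpc is a hypothetical in-memory stand-in for monero-wallet-rpc, and create_from_keys_or_open is an assumed name (the commit message does not state the new method name); the real method also has to deal with keys, restore heights and RPC errors.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum WalletError {
    AlreadyExists,
    NotFound,
}

/// Hypothetical in-memory stand-in for monero-wallet-rpc.
#[derive(Default)]
pub struct FakeRpc {
    existing: HashSet<String>,
    pub opened: Option<String>,
}

impl FakeRpc {
    /// Creating a wallet fails if a wallet file with that name already exists.
    pub fn create_from_keys(&mut self, name: &str) -> Result<(), WalletError> {
        if !self.existing.insert(name.to_string()) {
            return Err(WalletError::AlreadyExists);
        }
        self.opened = Some(name.to_string());
        Ok(())
    }

    /// Opening a wallet fails if no wallet file with that name exists.
    pub fn open(&mut self, name: &str) -> Result<(), WalletError> {
        if !self.existing.contains(name) {
            return Err(WalletError::NotFound);
        }
        self.opened = Some(name.to_string());
        Ok(())
    }
}

/// Tries to create the wallet from keys; if that fails, opens it instead.
pub fn create_from_keys_or_open(rpc: &mut FakeRpc, name: &str) -> Result<(), WalletError> {
    match rpc.create_from_keys(name) {
        Ok(()) => Ok(()),
        Err(_) => rpc.open(name),
    }
}
```

This makes the operation safe to retry: a second attempt after an interrupted refund finds the wallet already on disk and simply opens it.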
eda2268 to b57dd3b
b57dd3b to 66ce0c5
This commit adds the create_from_keys_and_sweep_to method and makes create_from_keys_and_sweep a wrapper around it. It also deduplicates logic by using create_from_keys_and_sweep_to in bob's redeem_xmr.
|
There was a deadlock possibility in |
Awesome. LGTM. Is this good to merge? |
|
I think so. Since you agree, I'll go ahead and merge this. |
|
Please also add a changelog entry. Awesome that we finally fixed this 💪🚀 |
Our current approach to thread safety regarding monero::Wallet, which we pass around as Arc<monero::Wallet>, is that it contains a Mutex<monero_rpc::wallet::Client> and acquires that before using the monero wallet rpc process. Since it does that in each function, calling e.g. create_from_and_load from create_from_keys_and_sweep results in the lock being acquired, released and then acquired again, even though we expect an atomic operation. This results in a race condition where another thread acquires the lock after it is first released, even though the function isn't finished yet.
We need to migrate to an API where we can guarantee that multiple functions can be called without releasing the lock. The best way to do this would be to always pass Arc<Mutex<monero::Wallet>> and have the caller lock the mutex. Any call to the monero::Wallet would then be guaranteed to happen within a single lock acquisition. However, this would introduce a lot of (small) changes, since we use Arc<monero::Wallet> quite a lot.
@binarybaron can you think of another way to fix this with less friction, that is similarly robust?
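For reference, the race described in this PR description can be reproduced deterministically with a small model. Client and Wallet below are hypothetical stand-ins, and the two "tasks" are interleaved by hand in a single thread rather than by a scheduler; the point is only that a mutex locked inside each method does not protect a multi-step sequence.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the RPC client: like monero-wallet-rpc,
/// it has at most one wallet open at a time.
#[derive(Default)]
pub struct Client {
    pub open_wallet: Option<String>,
    pub swept: Vec<String>,
}

/// Old design: the wallet owns the mutex and locks it inside every method.
#[derive(Default)]
pub struct Wallet {
    client: Mutex<Client>,
}

impl Wallet {
    pub fn open(&self, name: &str) {
        self.client.lock().unwrap().open_wallet = Some(name.to_string());
    } // lock released as soon as this single operation returns

    /// Sweeps whichever wallet is currently open and returns its name.
    pub fn sweep(&self) -> String {
        let mut client = self.client.lock().unwrap();
        let name = client.open_wallet.clone().expect("no wallet open");
        client.swept.push(name.clone());
        name
    }
}

/// Replays one possible interleaving of two tasks that each want to
/// open their own wallet and then sweep it.
pub fn interleaved() -> (String, String) {
    let wallet = Wallet::default();
    wallet.open("swap-a"); // task A, step 1
    wallet.open("swap-b"); // task B runs between A's two steps
    let swept_by_a = wallet.sweep(); // task A, step 2: sweeps B's wallet
    let swept_by_b = wallet.sweep(); // task B, step 2
    (swept_by_a, swept_by_b)
}
```

Every individual call is properly locked, yet task A ends up sweeping the wallet that task B opened, which is why the lock has to be held by the caller across the whole sequence.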