
Commit 949573d

Prevent IPC server crash if disconnected during IPC call
Antoine Poinsot <[email protected]> reported a bug with details in #182 (comment) where an IPC client rapidly connecting to and disconnecting from the server in a loop could cause problems. One problem this uncovered was hangs on shutdown, fixed by the previous commit. Another problem it uncovered is that if a disconnect happened while the IPC server was executing an IPC call, and the onDisconnect() handling code ran before the IPC call finished, memory corruption would occur, usually leading to a hang or crash.

This happened because the ~ProxyServerBase() destructor was written with the assumption that it would always be called either from the ~Connection() destructor, or before that point, so its m_context.connection pointer would always be valid. Typically this was the case, but not if the connection was closed during an active IPC call. A unit test reproducing this error is added in the subsequent commit.

The fix for the bug here is actually a code simplification. Previously the ~ProxyServerBase() destructor would append a std::function callback destroying the ProxyServerBase::m_impl object to m_context.connection->m_async_cleanup_fns, so the object destructor could be called asynchronously without blocking the event loop. Now it just appends the same callback function to m_context.loop->m_async_fns without going through the Connection object.

The new code is an improvement over the previous code in two other ways:

- It is a code simplification, since the previous code was needlessly complicated.
- In the case where there is no disconnect, and the remote ProxyClient is destroyed, and no "destroy" method is declared, this change causes the ProxyServer::m_impl object to be freed shortly after the client is freed, instead of being delayed until the connection is closed. (If a "destroy" method is declared, this change has no effect, because in that case destroying the ProxyClient calls destroy and destroys ProxyServer::m_impl synchronously.)

The previous code was trying to make ProxyServer cleanup with Connection::m_async_cleanup_fns mirror ProxyClient cleanup with Connection::m_sync_cleanup_fns, but it was just fragile and unnecessarily complicated. Some comments about the ProxyClient cleanup code are updated for clarity here, but no changes are made to it.
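For illustration, here is a minimal sketch of the pattern the fix adopts, using hypothetical stand-in names (Loop for EventLoop, Server for ProxyServerBase) rather than actual libmultiprocess code: the destructor hands its cleanup callback to a loop object guaranteed to outlive it, instead of reaching through a connection pointer that may already be dangling.

```cpp
// Minimal sketch of the fix pattern, with simplified stand-ins for the real
// classes (Loop ~ EventLoop, Server ~ ProxyServerBase). Not library code.
#include <condition_variable>
#include <functional>
#include <list>
#include <memory>
#include <mutex>
#include <thread>

class Loop {
public:
    ~Loop()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_all();
        if (m_worker.joinable()) m_worker.join(); // drain remaining cleanups
    }

    //! Analog of EventLoop::addAsyncCleanup: queue the callback and make sure
    //! the worker thread is running. No Connection object is involved, so the
    //! caller may safely be a destructor running after a disconnect.
    void addAsyncCleanup(std::function<void()> fn)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_async_fns.emplace_back(std::move(fn));
            startWorker();
        }
        m_cv.notify_all();
    }

private:
    //! Lazily start the worker thread (caller holds m_mutex).
    void startWorker()
    {
        if (m_worker.joinable()) return;
        m_worker = std::thread([this] {
            std::unique_lock<std::mutex> lock(m_mutex);
            while (!m_done || !m_async_fns.empty()) {
                if (m_async_fns.empty()) {
                    m_cv.wait(lock);
                    continue;
                }
                auto fn = std::move(m_async_fns.front());
                m_async_fns.pop_front();
                lock.unlock();
                fn(); // run cleanup without holding the lock
                lock.lock();
            }
        });
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::list<std::function<void()>> m_async_fns; // analog of EventLoop::m_async_fns
    std::thread m_worker;
    bool m_done = false;
};

struct Connection; // may be destroyed while IPC calls are still in flight

struct Server {
    Loop* m_loop;             // always outlives the Server
    Connection* m_connection; // may dangle after a disconnect -- do not touch here
    std::shared_ptr<int> m_impl;

    ~Server()
    {
        // The fix: schedule m_impl destruction through the loop, not through
        // m_connection->addAsyncCleanup() as the old code did.
        m_loop->addAsyncCleanup([impl = std::move(m_impl)]() mutable { impl.reset(); });
    }
};

int main()
{
    Loop loop;
    { Server server{&loop, nullptr, std::make_shared<int>(42)}; }
    // ~Loop drains the queue, destroying the impl on the worker thread.
}
```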
1 parent ea38392 commit 949573d

File tree

- include/mp/proxy-io.h
- src/mp/proxy.cpp

2 files changed (+29 -23 lines)

include/mp/proxy-io.h

Lines changed: 18 additions & 9 deletions
```diff
@@ -178,6 +178,10 @@ class EventLoop
         post(std::forward<Callable>(callable));
     }
 
+    //! Register cleanup function to run on asynchronous worker thread without
+    //! blocking the event loop thread.
+    void addAsyncCleanup(std::function<void()> fn);
+
     //! Start asynchronous worker thread if necessary. This is only done if
     //! there are ProxyServerBase::m_impl objects that need to be destroyed
     //! asynchronously, without tying up the event loop thread. This can happen
@@ -329,10 +333,6 @@ class Connection
     CleanupIt addSyncCleanup(std::function<void()> fn);
     void removeSyncCleanup(CleanupIt it);
 
-    //! Register asynchronous cleanup function to run on worker thread when
-    //! disconnect() is called.
-    void addAsyncCleanup(std::function<void()> fn);
-
     //! Add disconnect handler.
     template <typename F>
     void onDisconnect(F&& f)
@@ -361,11 +361,10 @@ class Connection
     //! ThreadMap.makeThread) used to service requests to clients.
     ::capnp::CapabilityServerSet<Thread> m_threads;
 
-    //! Cleanup functions to run if connection is broken unexpectedly.
-    //! Lists will be empty if all ProxyClient and ProxyServer objects are
-    //! destroyed cleanly before the connection is destroyed.
+    //! Cleanup functions to run if connection is broken unexpectedly. List
+    //! will be empty if all ProxyClient are destroyed cleanly before the
+    //! connection is destroyed.
     CleanupList m_sync_cleanup_fns;
-    CleanupList m_async_cleanup_fns;
 };
 
 //! Vat id for server side of connection. Required argument to RpcSystem::bootStrap()
@@ -454,6 +453,16 @@ ProxyServerBase<Interface, Impl>::ProxyServerBase(std::shared_ptr<Impl> impl, Co
 
 //! ProxyServer destructor, called from the EventLoop thread by Cap'n Proto
 //! garbage collection code after there are no more references to this object.
+//! This will typically happen when the corresponding ProxyClient object on the
+//! other side of the connection is destroyed. It can also happen earlier if the
+//! connection is broken or destroyed. In the latter case this destructor will
+//! typically be called inside m_rpc_system.reset() call in the ~Connection
+//! destructor while the Connection object still exists. However, because
+//! ProxyServer objects are refcounted, and the Connection object could be
+//! destroyed while asynchronous IPC calls are still in-flight, it's possible
+//! for this destructor to be called after the Connection object no longer
+//! exists, so it is NOT valid to dereference the m_context.connection pointer
+//! from this function.
 template <typename Interface, typename Impl>
 ProxyServerBase<Interface, Impl>::~ProxyServerBase()
 {
@@ -477,7 +486,7 @@ ProxyServerBase<Interface, Impl>::~ProxyServerBase()
     // connection is broken). Probably some refactoring of the destructor
     // and invokeDestroy function is possible to make this cleaner and more
     // consistent.
-    m_context.connection->addAsyncCleanup([impl=std::move(m_impl), fns=std::move(m_context.cleanup_fns)]() mutable {
+    m_context.loop->addAsyncCleanup([impl=std::move(m_impl), fns=std::move(m_context.cleanup_fns)]() mutable {
         impl.reset();
         CleanupRun(fns);
     });
```
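The lifetime caveat spelled out in the new destructor comment above is the standard refcounting hazard: a reference-counted object can outlive the object whose raw pointer it captured at construction. A minimal illustration with plain std::shared_ptr and hypothetical names, not library code:

```cpp
#include <memory>

struct Conn {};           // stand-in for mp::Connection

struct RefcountedServer { // stand-in for a Cap'n Proto-managed ProxyServer
    Conn* m_connection;   // raw back-pointer captured at construction
};

int main()
{
    std::shared_ptr<RefcountedServer> held_by_inflight_call;
    {
        auto conn = std::make_unique<Conn>();
        held_by_inflight_call =
            std::make_shared<RefcountedServer>(RefcountedServer{conn.get()});
    } // conn destroyed here, but the server is still referenced...

    // held_by_inflight_call->m_connection now dangles. Dereferencing it, as
    // the old ~ProxyServerBase() effectively did via m_context.connection, is
    // undefined behavior -- which is why the destructor now goes through
    // m_context.loop instead.
}
```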

src/mp/proxy.cpp

Lines changed: 11 additions & 14 deletions
```diff
@@ -85,10 +85,13 @@ ProxyContext::ProxyContext(Connection* connection) : connection(connection), loo
 
 Connection::~Connection()
 {
-    // Shut down RPC system first, since this will garbage collect Server
-    // objects that were not freed before the connection was closed, some of
-    // which may call addAsyncCleanup and add more cleanup callbacks which can
-    // run below.
+    // Shut down RPC system first, since this will garbage collect any
+    // ProxyServer objects that were not freed before the connection was closed.
+    // Typically all ProxyServer objects associated with this connection will be
+    // freed before this call returns. However that will not be the case if
+    // there are asynchronous IPC calls over this connection still currently
+    // executing. In that case, Cap'n Proto will destroy the ProxyServer objects
+    // after the calls finish.
     m_rpc_system.reset();
 
     // ProxyClient cleanup handlers are in sync list, and ProxyServer cleanup
@@ -137,13 +140,6 @@ Connection::~Connection()
         m_sync_cleanup_fns.front()();
         m_sync_cleanup_fns.pop_front();
     }
-    while (!m_async_cleanup_fns.empty()) {
-        const Lock lock(m_loop->m_mutex);
-        m_loop->m_async_fns->emplace_back(std::move(m_async_cleanup_fns.front()));
-        m_async_cleanup_fns.pop_front();
-    }
-    Lock lock(m_loop->m_mutex);
-    m_loop->startAsyncThread();
 }
 
 CleanupIt Connection::addSyncCleanup(std::function<void()> fn)
@@ -166,9 +162,9 @@ void Connection::removeSyncCleanup(CleanupIt it)
     m_sync_cleanup_fns.erase(it);
 }
 
-void Connection::addAsyncCleanup(std::function<void()> fn)
+void EventLoop::addAsyncCleanup(std::function<void()> fn)
 {
-    const Lock lock(m_loop->m_mutex);
+    const Lock lock(m_mutex);
     // Add async cleanup callbacks to the back of the list. Unlike the sync
     // cleanup list, this list order is more significant because it determines
     // the order server objects are destroyed when there is a sudden disconnect,
@@ -185,7 +181,8 @@ void Connection::addAsyncCleanup(std::function<void()> fn)
     // process, otherwise shared pointer counts of the CWallet objects (which
     // inherit from Chain::Notification) will not be 1 when WalletLoader
     // destructor runs and it will wait forever for them to be released.
-    m_async_cleanup_fns.emplace(m_async_cleanup_fns.end(), std::move(fn));
+    m_async_fns->emplace_back(std::move(fn));
+    startAsyncThread();
 }
 
 EventLoop::EventLoop(const char* exe_name, LogFn log_fn, void* context)
```
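The ordering comment preserved in EventLoop::addAsyncCleanup can be exercised with the simplified Loop class from the earlier sketch (assumed to be in scope here): callbacks are appended with emplace_back and run front-to-back, so a server destroyed earlier (say, a notification handler holding a CWallet reference) has its cleanup run before a later one (the WalletLoader) starts waiting on it. A hypothetical usage:

```cpp
// FIFO ordering demo, reusing the simplified Loop class from the earlier
// sketch. Callbacks run on the worker thread in the order they were queued.
#include <cstdio>

int main()
{
    Loop loop;
    loop.addAsyncCleanup([] { std::puts("1: destroy notification handler server"); });
    loop.addAsyncCleanup([] { std::puts("2: destroy WalletLoader server"); });
    // ~Loop joins the worker only after both callbacks have run, in order.
}
```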
