7 changes: 7 additions & 0 deletions include/mp/proxy-types.h
@@ -726,6 +726,13 @@ kj::Promise<void> serverInvoke(Server& server, CallContext& call_context, Fn fn)
MP_LOG(*server.m_context.loop, Log::Debug) << "IPC server send response #" << req << " " << TypeName<Results>();
MP_LOG(*server.m_context.loop, Log::Trace) << "response data: "
<< LogEscape(call_context.getResults().toString(), server.m_context.loop->m_log_opts.max_chars);
}, [&server, req](::kj::Exception&& e) {
Contributor:

Isn't the local server_capture being captured by reference problematic if invoke is called asynchronously? Is it possible that's smashing the stack here?

(Maybe I'm way off; these tangled callbacks are breaking my brain.)

But I think I'd expect this to be return ReplaceVoid([server_context = std::move(server_context)]()...

Collaborator Author:

I should double-check all the lifetimes here, but I think the problem in the gnu32 job is weirder than a lifetime issue. In gdb I can inspect the kj::Exception&& e variable here, and it is a valid object containing the "thread busy" error and full context information. But two calls deeper in the call stack, the stringify function, which is supposed to print the exception, somehow receives a garbage reference to a kj::Exception at a completely different address. So it seems like this change exposes an ABI issue or a compiler bug, but I am not sure.

On server vs. server_context: the server object is actually the one with the longer lifetime. When you declare an interface like interface MyInterface { myMethod @0 () -> (); }, Cap'n Proto generates a C++ class like:

class MyInterface::Server {
  virtual Promise<void> myMethod(MyMethodContext) = 0;
};

which you inherit from to implement all the methods. The server variable is a reference to the object implementing these methods, and there is also a MyInterface::Client class generated to call the methods. The lifetime of the server object is completely controlled by Cap'n Proto. It will keep the server around until nothing is referencing it, so it will be alive as long as any client on a live connection exists, and as long as any IPC call is in progress, even if the client is released or disconnected.

By contrast, server_context is just a local variable in the serverInvoke function, so you would need to copy or move it, as you suggested, to use it in a promise.then() handler. But it is fine to use it by reference in ReplaceVoid lambdas because of the way ReplaceVoid is defined: it just calls them right away, before returning. I try to follow the convention of using [&] and implicit captures for any lambda that will be called right away, before the statement returns, and explicit captures for lambdas that are delayed callbacks.

Contributor:

> So it seems like there is an ABI issue or compiler bug exposed by this change, but I am not sure.

I haven't looked at anything here, but I presume capnp is compiled with clang in the CI and libmultiprocess with GCC, so it just reminds me of bitcoin/bitcoin#31772. But I haven't checked any of this.

// Call failed for some reason. Cap'n Proto will try to send
// this error to the client as well, but it is good to log the
// failure early here and include the request number.
MP_LOG(*server.m_context.loop, Log::Error) << "IPC server error request #" << req << " " << TypeName<Results>()
<< " " << kj::str("kj::Exception: ", e).cStr();
return kj::mv(e);
});
} catch (const std::exception& e) {
MP_LOG(*server.m_context.loop, Log::Error) << "IPC server unhandled exception: " << e.what();
34 changes: 34 additions & 0 deletions include/mp/type-context.h
@@ -163,6 +163,40 @@ auto PassField(Priority<1>, TypeList<>, ServerContext& server_context, const Fn&
<< "IPC server error request #" << req << ", missing thread to execute request";
throw std::runtime_error("invalid thread handle");
}
}, [&server, req](::kj::Exception&& e) {
// If you see the error "(remote):0: failed: remote exception:
// Called null capability" here, it probably means your Init class
// is missing a declaration like:
//
// construct @0 (threadMap: Proxy.ThreadMap) -> (threadMap :Proxy.ThreadMap);
//
// which passes a ThreadMap reference from the client to the server,
// allowing the server to create threads to run IPC calls on the
// client, and also returns a ThreadMap reference from the server to
// the client, allowing the client to create threads on the server.
// (Typically the latter ThreadMap is used more often because there
// are more client-to-server calls.)
//
// If the other side of the connection did not previously get a
// ThreadMap reference from this side of the connection, when the
// other side calls `m_thread_map.makeThreadRequest()` in
// `BuildField` above, `m_thread_map` will be null, but that call
// will not fail immediately due to Cap'n Proto's request pipelining
// and delayed execution. Instead that call will return an invalid
// Thread reference, and when that reference is passed to this side
// of the connection as `thread_client` above, the
// `getLocalServer(thread_client)` call there will be the first
// thing to overtly fail, leading to an error here.
//
// Potentially there are also other things that could cause errors
// here, but this is the most likely cause.
//
// The log statement here is not strictly necessary since the same
// exception will also be logged in serverInvoke, but this logging
// may provide extra context that could be helpful for debugging.
MP_LOG(*server.m_context.loop, Log::Info)
<< "IPC server error request #" << req << " CapabilityServerSet<Thread>::getLocalServer call failed, did you forget to provide a ThreadMap to the client prior to this IPC call?";
return kj::mv(e);
})
// Wait for the invocation to finish before returning to the caller.
.then([invoke_wait = kj::mv(future.promise)]() mutable { return kj::mv(invoke_wait); });