-
Notifications
You must be signed in to change notification settings - Fork 431
Add rpc/serialization support for TraceItem #4595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cc: @Ankcorn |
definitely needs a compat flag, should be experimental as well until it is fully rolled out You can look abortSignalRpc. |
], | ||
bindings = [ | ||
( name = "SERVICE", service = "http-test" ), | ||
( name = "CACHE_ENABLED", json = "false" ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused?
} | ||
} | ||
} | ||
auto words = capnp::messageToFlatArray(traceMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's sort of ugly to be embedding a capnp messages into v8-serialized JS objects. The v8-serialized object itself is just going to be embedded into another capnp message.
I think ideally we'd actually serialize this as a straight JS object. That sort of makes sense here, as TraceItem is essentially just a plain JS object (and trace workers commonly reserialize it as JSON, even). However, I guess there isn't a convenient way to actually achieve such serialization from here, because we don't actually have that JS object, we have a C++ wrapped object...
The other thing worth exploring is whether we should use an "external" here. See JsValue::External
in worker-interface.capnp
. You could extend that with traceItem :Trace
. This would mean the serialization code here would only work with Workers RPC, though. You'd start the serialize() function with code like:
auto& handler = JSG_REQUIRE_NONNULL(serializer.getExternalHandler(), DOMDataCloneError,
"Traces can only be serialized for RPC.");
auto externalHandler = dynamic_cast<RpcSerializerExternalHandler*>(&handler);
JSG_REQUIRE(externalHandler != nullptr, DOMDataCloneError,
"Traces can only be serialized for RPC.");
But I dunno, maybe the turducken approach is fine and easiest to implement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL about turducken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have ALWAYS really disliked the fact that these were defined as jsg::Objects from the beginning. They should have been regular objects from the start and the direction for streaming tail workers, unless it's changed, is that those would be regular objects. I think maybe, tho, we can still serialize these as ordinary objects within the custom serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely going to be easier/better for us to just construct an internal plain-ole-javascript-object representation here and serialize that instead. Or we might just be able to get by with JSON stringifying the object, serializing that, and parsing it on the deserialize side, using the result to reconstruct the TraceItem
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, @Ankcorn is already doing the JSON stringifying thing, and it's taking a lot of cpu time. He's eager about this change especially to see the CPU usage go down. I went the "turducken" approach since it makes the deserialization quite trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI - I tried to change the order or JS vs C++ and also tried them individually to check if some connection re-use/other RPC optimizations behind the scenes didn't cause the performance diff - but still saw roughly ~1.7x speedup in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 1.7x actually that big an improvement? It's not a paradigm shift, and moving this responsibility to C++ adds a fair amount of inflexibility and maintenance burden.
I wonder if there are ways we could optimize the JSON serialization to get 70% faster. For example, currently the V8 JSON serializer is being forced to walk these wrapped objects that are fairly slow to access, since each property access is going to go through JSG. If it were walking native JS objects, would it be faster? Probably this API should have been native JS objects all along, and I wonder if we could change it (with a compat flag?). Alternatively, what if these types had native toJSON
methods, so that V8 doesn't have to walk them -- we build the JSON in pure C++ code. How much faster would that be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This calculation unfortunately involves the time taken for rest of the RPC lifecycle as well - I tried to sample times more accurate by just calling structuredClone
instead and noticed ~3x speed up in that case. But yeah - better but not a huge improvement.
Also - wouldn't implementing a toJSON
method be quite similar to implementing and maintaining a serialize()
function? As for implementing this natively in JS behind a compat flag, I can certainly take a look but I'm not sure if it impacts/overlaps with the work @fhanau has been doing lately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - wouldn't implementing a toJSON method be quite similar to implementing and maintaining a serialize() function?
The difference is:
- It would immediately speed up existing use cases that are JSON-serializing these objects, with no changes on their end.
- It does not introduce any new protocols that we need to maintain forever.
I think the first point may make it worth doing regardless. JSON-serializing these objects is pretty common in tail workers, and in some cases cannot be replaced with RPC since the JSON is being sent to things that aren't workers.
Maybe once we know how fast that is we can decide if it's worth introducing a new RPC format as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to reiterate, I would expect the reason JSON serialization of these objects is so slow today is not so much because of JSON, but because V8 has to walk the tree of wrapped objects, calling getter callbacks for every property. Implementing toJSON() would avoid that.
I ideally wanted to make all subtypes (eg:
FetchEventInfo
,TraceLog
, etc) serializable as well, and then have a methodtoCapnp()
in each of them which I could reuse in serializingTraceItem
. But the constructors of most of these types need aTrace&
which makes it a bit challenging from the point of view of deserializing.Is there any particular reason passing
Trace&
in the constructor for these types, given it's not used at all? (Is it because we don't want to allow construction of these types unless it's part of aTraceItem
?)Do I need a compat flag for this? If yes - how exactly do I feature-flag
JSG_SERIALIZABLE
?