
Conversation

ketanhwr (Member)

Ideally, I wanted to make all subtypes (e.g. FetchEventInfo, TraceLog, etc.) serializable as well, and then have a toCapnp() method in each of them which I could reuse when serializing TraceItem. But the constructors of most of these types need a Trace&, which makes deserialization a bit challenging.

Is there any particular reason for passing Trace& in the constructor of these types, given it's not used at all? (Is it because we don't want to allow construction of these types unless they're part of a TraceItem?)

Do I need a compat flag for this? If yes, how exactly do I feature-flag JSG_SERIALIZABLE?
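
For reference, the reuse I had in mind would look roughly like this (a sketch only, not existing workerd code; the nested rpc::Trace::FetchEventInfo builder and its fields are assumed purely for illustration):

  // Sketch: each subtype writes itself into the capnp message...
  class FetchEventInfo: public jsg::Object {
  public:
    void toCapnp(rpc::Trace::FetchEventInfo::Builder builder) {
      builder.setUrl(url);  // illustrative field
      // ... remaining fields copied the same way ...
    }

  private:
    kj::String url;
  };

  // ...so TraceItem::serialize() would simply delegate:
  //   eventInfo->toCapnp(traceBuilder.initFetchEventInfo());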

@ketanhwr requested review from jasnell, npaun and fhanau on July 23, 2025
@ketanhwr requested review from a team as code owners on July 23, 2025
@edmundhung (Member)

cc: @Ankcorn

@danlapid (Collaborator) commented Jul 23, 2025

This definitely needs a compat flag, and it should be experimental as well until it is fully rolled out.

You can look at abortSignalRpc as an example.
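
A minimal sketch of what the gating could look like, assuming workerd's FeatureFlags helper; the getTraceItemRpc() getter and flag name are hypothetical, and the real flag would be declared in compatibility-date.capnp and marked experimental, mirroring abortSignalRpc:

  #include <workerd/io/features.h>  // assumed location of FeatureFlags

  void TraceItem::serialize(jsg::Lock& js, jsg::Serializer& serializer) {
    // Hypothetical compat-flag getter; refuse to serialize when disabled.
    JSG_REQUIRE(FeatureFlags::get(js).getTraceItemRpc(), DOMDataCloneError,
        "Serializing TraceItem over RPC requires the corresponding compatibility flag.");

    // ... existing capnp/v8 serialization continues below ...
  }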

],
bindings = [
( name = "SERVICE", service = "http-test" ),
( name = "CACHE_ENABLED", json = "false" ),
Collaborator

unused?

}
}
}
auto words = capnp::messageToFlatArray(traceMessage);
@kentonv (Member) commented Jul 23, 2025

Hmm, it's sort of ugly to be embedding a capnp message into v8-serialized JS objects. The v8-serialized object itself is just going to be embedded into another capnp message.

I think ideally we'd actually serialize this as a straight JS object. That sort of makes sense here, as TraceItem is essentially just a plain JS object (and trace workers commonly reserialize it as JSON, even). However, I guess there isn't a convenient way to actually achieve such serialization from here, because we don't actually have that JS object, we have a C++ wrapped object...

The other thing worth exploring is whether we should use an "external" here. See JsValue::External in worker-interface.capnp. You could extend that with traceItem :Trace. This would mean the serialization code here would only work with Workers RPC, though. You'd start the serialize() function with code like:

  auto& handler = JSG_REQUIRE_NONNULL(serializer.getExternalHandler(), DOMDataCloneError,
      "Traces can only be serialized for RPC.");
  auto externalHandler = dynamic_cast<RpcSerializerExternalHandler*>(&handler);
  JSG_REQUIRE(externalHandler != nullptr, DOMDataCloneError,
      "Traces can only be serialized for RPC.");

But I dunno, maybe the turducken approach is fine and easiest to implement?

Member Author

TIL about turducken.

Collaborator

I have ALWAYS really disliked the fact that these were defined as jsg::Objects from the beginning. They should have been regular objects from the start, and the direction for streaming tail workers, unless it's changed, is that those would be regular objects. I think maybe, though, we can still serialize these as ordinary objects within the custom serialization.

Collaborator

It's likely going to be easier/better for us to just construct an internal plain-ole-javascript-object representation here and serialize that instead. Or we might just be able to get by with JSON stringifying the object, serializing that, and parsing it on the deserialize side, using the result to reconstruct the TraceItem.

Member Author

AFAIK, @Ankcorn is already doing the JSON stringifying thing, and it's taking a lot of CPU time. He's especially eager for this change to see the CPU usage go down. I went with the "turducken" approach since it makes the deserialization quite trivial.
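
To illustrate why it's trivial, the flat-array round trip is essentially this (a generic Cap'n Proto sketch; rpc::Trace is assumed to be the struct from workerd's trace.capnp):

  #include <capnp/message.h>
  #include <capnp/serialize.h>

  void roundTrip() {
    // Serialize: flatten the message into a contiguous array of capnp words,
    // which is what gets embedded into the v8-serialized payload.
    capnp::MallocMessageBuilder traceMessage;
    auto root = traceMessage.initRoot<rpc::Trace>();
    // ... populate `root` from the C++ Trace ...
    kj::Array<capnp::word> words = capnp::messageToFlatArray(traceMessage);

    // Deserialize: no re-parsing or field-by-field copying; just point a
    // reader at the stored words and read the struct back out.
    capnp::FlatArrayMessageReader reader(words.asPtr());
    rpc::Trace::Reader readBack = reader.getRoot<rpc::Trace>();
    (void)readBack;
  }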

Member Author

FYI, I tried changing the order of JS vs C++ and also tried them individually, to check whether connection re-use or other behind-the-scenes RPC optimizations were causing the performance diff, but I still saw roughly a ~1.7x speedup in all cases.

Member

Is 1.7x actually that big an improvement? It's not a paradigm shift, and moving this responsibility to C++ adds a fair amount of inflexibility and maintenance burden.

I wonder if there are ways we could optimize the JSON serialization to get 70% faster. For example, currently the V8 JSON serializer is being forced to walk these wrapped objects that are fairly slow to access, since each property access is going to go through JSG. If it were walking native JS objects, would it be faster? Probably this API should have been native JS objects all along, and I wonder if we could change it (with a compat flag?). Alternatively, what if these types had native toJSON methods, so that V8 doesn't have to walk them -- we build the JSON in pure C++ code. How much faster would that be?

Member Author

This calculation unfortunately includes the time taken for the rest of the RPC lifecycle as well. I tried to sample times more accurately by just calling structuredClone instead and saw a ~3x speedup in that case. But yeah, better, though not a huge improvement.

Also, wouldn't implementing a toJSON method be quite similar to implementing and maintaining a serialize() function? As for implementing this natively in JS behind a compat flag, I can certainly take a look, but I'm not sure whether it impacts/overlaps with the work @fhanau has been doing lately.

Member

> Also, wouldn't implementing a toJSON method be quite similar to implementing and maintaining a serialize() function?

The difference is:

  • It would immediately speed up existing use cases that are JSON-serializing these objects, with no changes on their end.
  • It does not introduce any new protocols that we need to maintain forever.

I think the first point may make it worth doing regardless. JSON-serializing these objects is pretty common in tail workers, and in some cases cannot be replaced with RPC since the JSON is being sent to things that aren't workers.

Maybe once we know how fast that is we can decide if it's worth introducing a new RPC format as well.

Member

And to reiterate, I would expect the reason JSON serialization of these objects is so slow today is not so much because of JSON, but because V8 has to walk the tree of wrapped objects, calling getter callbacks for every property. Implementing toJSON() would avoid that.

@jasnell added the o11y Observability label on Jul 31, 2025