(WIP) Improve Binary Encode Performance #964
Elias Kassell seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
timostamm left a comment
Thanks for the PR, Elias!
Unfortunately, resizable ArrayBuffers are relatively new. We'll have to wait until they are more widely available before we can use them.
I left a couple of informative comments.
// NodeJS strings are by default UTF-8, so we can assume the byte length as the length of
// the string.
const valueBytesLength = value.length;
Unfortunately this won't work. Strings are UTF-16, and length returns the number of code units.
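To make the difference concrete, here is a minimal sketch that counts UTF-8 bytes by hand and compares the result with `length`. The helper `utf8ByteLength` is hypothetical and written only for illustration; it is not part of this PR or of the library.

```typescript
// Hypothetical helper: number of bytes a string occupies once encoded as UTF-8.
// String.prototype.length counts UTF-16 code units, which is a different quantity.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2; // U+0080 to U+07FF
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair: one code point, two code units
        i++;
      } else {
        bytes += 3; // lone surrogate, encoded as U+FFFD by typical encoders
      }
    } else {
      bytes += 3; // rest of the Basic Multilingual Plane
    }
  }
  return bytes;
}

// Code units versus UTF-8 bytes for a few inputs.
const lengthSamples = ["abc", "\u00e9", "\u20ac", "\ud83d\ude00"].map((s) => ({
  codeUnits: s.length,
  utf8Bytes: utf8ByteLength(s),
}));
```

Only the pure ASCII sample has matching counts; for the other three, using `value.length` as the byte length would write a wrong length prefix.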
  if (opts.writeUnknownFields) {
    for (const { no, wireType, data } of msg.getUnknown() ?? []) {
-     writer.tag(no, wireType).raw(data);
+     writer.tag(no, wireType).bytes(data);
The bytes method prefixes the length. The change will corrupt data, and needs to be reverted.
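A rough sketch of why the extra length prefix breaks the output. It assumes that `data` holds the already-encoded value of the unknown field. The functions `appendRaw` and `appendLengthPrefixed` are hypothetical stand-ins for the writer's `raw()` and `bytes()` methods, not the library's code.

```typescript
// Encode a non-negative 32-bit integer as a protobuf varint (7 bits per byte, low bits first).
function varintBytes(value: number): number[] {
  const out: number[] = [];
  let v = value >>> 0;
  while (v > 0x7f) {
    out.push((v & 0x7f) | 0x80);
    v >>>= 7;
  }
  out.push(v);
  return out;
}

// Stand-in for raw(): append the bytes unchanged.
function appendRaw(out: number[], data: Uint8Array): void {
  for (let i = 0; i < data.length; i++) out.push(data[i]);
}

// Stand-in for bytes(): append a varint length prefix, then the bytes.
function appendLengthPrefixed(out: number[], data: Uint8Array): void {
  const prefix = varintBytes(data.length);
  for (let i = 0; i < prefix.length; i++) out.push(prefix[i]);
  appendRaw(out, data);
}

// Unknown field number 1 with wire type 0 (varint) and value 150, encoded as 0x96 0x01.
const unknownTag = varintBytes((1 << 3) | 0);
const unknownData = new Uint8Array([0x96, 0x01]);

// Original code path: tag followed by the untouched value.
const viaRaw: number[] = unknownTag.slice();
appendRaw(viaRaw, unknownData); // [0x08, 0x96, 0x01]

// Changed code path: a length byte is inserted between tag and value.
const viaBytes: number[] = unknownTag.slice();
appendLengthPrefixed(viaBytes, unknownData); // [0x08, 0x02, 0x96, 0x01]
```

A decoder reading `viaBytes` would take `0x02` as the varint value of field 1 and then try to parse `0x96 0x01` as the next tag, so the unknown field no longer round-trips.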
writeScalarValue(writer, scalarType, item as ScalarValue);
}
writer.join();
writer.tag(field.number, WireType.LengthDelimited);
fork() and join() will take everything between, and write it length-prefixed. The tag needs to come first, then the length prefix, then the data.
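The ordering can be illustrated with a toy writer. `SketchWriter` below is a hypothetical, simplified model of the fork/join behaviour described above, written for this illustration and not taken from the library. It returns plain number arrays to keep the sketch short.

```typescript
// Toy writer: fork() starts a nested chunk, join() appends that chunk to the
// parent with a varint length prefix in front of it.
class SketchWriter {
  private chunk: number[] = [];
  private stack: number[][] = [];

  varint(value: number): this {
    let v = value >>> 0;
    while (v > 0x7f) {
      this.chunk.push((v & 0x7f) | 0x80);
      v >>>= 7;
    }
    this.chunk.push(v);
    return this;
  }

  tag(fieldNo: number, wireType: number): this {
    return this.varint((fieldNo << 3) | wireType);
  }

  fork(): this {
    this.stack.push(this.chunk);
    this.chunk = [];
    return this;
  }

  join(): this {
    const payload = this.chunk;
    const parent = this.stack.pop();
    if (!parent) throw new Error("join() called without fork()");
    this.chunk = parent;
    this.varint(payload.length); // length prefix first
    for (let i = 0; i < payload.length; i++) this.chunk.push(payload[i]);
    return this;
  }

  finish(): number[] {
    return this.chunk.slice();
  }
}

const LENGTH_DELIMITED = 2;

// Tag, then length prefix, then data: a valid packed field 4 with values 3, 270, 86942.
function encodePackedTagFirst(): number[] {
  return new SketchWriter()
    .tag(4, LENGTH_DELIMITED)
    .fork().varint(3).varint(270).varint(86942).join()
    .finish();
}

// Order used in the diff: the tag ends up after the payload.
function encodePackedTagLast(): number[] {
  return new SketchWriter()
    .fork().varint(3).varint(270).varint(86942).join()
    .tag(4, LENGTH_DELIMITED)
    .finish();
}
```

`encodePackedTagFirst()` yields `22 06 03 8e 02 9e a7 05`, which a decoder reads as field 4 with a 6-byte payload. `encodePackedTagLast()` yields `06 03 8e 02 9e a7 05 22`, where the length byte sits in the position a decoder expects a tag.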
break;
}
writer.join();
writer.tag(field.number, WireType.LengthDelimited);
// TODO(ekrekr): this is really slow, because it has to allocate a whole new array buffer.
// Instead we should be writing the message directly to the original arraybuffer, then inserting
// the length beforehand.
Yes. Unfortunately, varints have variable length, so this isn't straightforward.
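As a sketch of the difficulty: the number of bytes the length prefix needs depends on the payload size, which is only known after the payload has been written. One possible workaround, shown here purely as an assumption and not as what the library does, is to reserve a single byte and shift the payload when the prefix turns out to be longer. `varintSize` and `patchLengthPrefix` are hypothetical names.

```typescript
// Number of bytes a non-negative 32-bit integer occupies as a varint.
function varintSize(value: number): number {
  let v = value >>> 0;
  let size = 1;
  while (v > 0x7f) {
    v >>>= 7;
    size++;
  }
  return size;
}

// Assumes one byte was reserved at `start` and the payload of `len` bytes was
// written at start + 1. Shifts the payload if the prefix needs more room,
// writes the prefix, and returns the offset just past the payload.
// The caller must ensure the buffer has enough spare capacity for the shift.
function patchLengthPrefix(buf: Uint8Array, start: number, len: number): number {
  const extra = varintSize(len) - 1;
  if (extra > 0) {
    // Copy backwards so overlapping bytes are not overwritten before they move.
    for (let i = len - 1; i >= 0; i--) {
      buf[start + 1 + extra + i] = buf[start + 1 + i];
    }
  }
  let v = len >>> 0;
  let pos = start;
  while (v > 0x7f) {
    buf[pos++] = (v & 0x7f) | 0x80;
    v >>>= 7;
  }
  buf[pos++] = v;
  return pos + len;
}
```

Payloads up to 127 bytes fit the reserved byte. Anything larger forces a copy of the payload within the buffer, and nested messages would repeat that shift at every level, so the saving over a separate chunk is not guaranteed.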
/**
 * Encode UTF-8 text to an existing binary.
 */
encodeInto: (
It's a breaking change to add a mandatory property to the interface, so we'll have to find other means.
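One possible non-breaking shape, offered only as a sketch: declare the new member as optional and fall back to the existing method when it is absent. The interface `TextCodecSketch`, the function `writeTextSketch`, and the ASCII-only codec are all hypothetical and do not reflect the library's actual types.

```typescript
// Hypothetical interface: encodeInto is optional, so existing implementations
// that only provide encode() still satisfy it.
interface TextCodecSketch {
  encode(text: string): Uint8Array;
  // Writes into target and returns the number of bytes written.
  encodeInto?(text: string, target: Uint8Array): number;
}

// Writes text at the given offset and returns the offset after the last byte.
function writeTextSketch(
  codec: TextCodecSketch,
  text: string,
  target: Uint8Array,
  offset: number,
): number {
  if (codec.encodeInto) {
    // Fast path: no intermediate array.
    return offset + codec.encodeInto(text, target.subarray(offset));
  }
  // Fallback for implementations that predate the optional member.
  const bytes = codec.encode(text);
  target.set(bytes, offset);
  return offset + bytes.length;
}

// ASCII-only codec without encodeInto, standing in for an existing implementation.
const asciiOnlyCodec: TextCodecSketch = {
  encode(text: string): Uint8Array {
    const out = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) out[i] = text.charCodeAt(i) & 0x7f;
    return out;
  },
};
```

The trade-off is that callers must branch on the member's presence, and the fast path only helps users whose implementation supplies it.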
Work in progress - I haven't yet come up with a way to efficiently fork a new message directly within the contiguous memory space for length-delimited records.
This PR aims to improve encode performance by avoiding copies of array buffers: data is written directly to a single expanding array buffer, so the encoded output stays in one contiguous memory space that grows as needed.
TODO: add performance tests and flamegraphs once working.