-
Notifications
You must be signed in to change notification settings - Fork 924
GODRIVER-3707: Consolidate duplicate binary utility functions from bsoncore and wiremessage packages #2274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GODRIVER-3707: Consolidate duplicate binary utility functions from bsoncore and wiremessage packages #2274
Changes from all commits
dc131bd
56af566
f241f11
1801ea0
3f4a212
78477dd
a71e8ad
6e6903d
8f3d1c5
d2b6ff7
6e8c2c0
000b91b
cd7bea2
672e8b7
815508e
ceadabd
58a4d9a
93fcc89
8d747d3
ddb6a09
8cc3586
0ccd9c1
299cb36
b2ae3df
36f3eae
3ce86a1
18366cc
1e6f7fe
f238fb7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,135 @@ | ||
| // Copyright (C) MongoDB, Inc. 2025-present. | ||
| // | ||
| // Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
| // not use this file except in compliance with the License. You may obtain | ||
| // a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| // Package binaryutil provides utility functions for working with binary data. | ||
| package binaryutil | ||
|
|
||
| import ( | ||
| "bytes" | ||
| "encoding/binary" | ||
| ) | ||
|
|
||
| // Append32 appends a uint32 or int32 value to dst in little-endian byte order. | ||
| // Byte shifting is done directly to prevent overflow security errors, in | ||
| // compliance with gosec G115. | ||
| // | ||
| // See: https://cs.opensource.google/go/go/+/refs/tags/go1.19:src/encoding/binary/binary.go;l=92 | ||
| func Append32[T ~uint32 | ~int32](dst []byte, v T) []byte { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [blocking] There is a significant performance cost of this function compared to the standard library (0.5335 ns/op v 0.1760 ns/op): Suggest copying the standard library directly: func Append32[T ~uint32 | ~int32](dst []byte, v T) []byte {
return append(dst,
byte(v),
byte(v>>8),
byte(v>>16),
byte(v>>24),
)
}This problem is also present in append64.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correct me if this is an incorrect way to do this I got the commands from AI, but when I run the benchmark my results are very close between the new implementation and standard library. Here is my output for averages over 100 runs from the append 32 functions: For append 32 those numbers are nearly the same over 100 runs. For append 64 I do see a very drastic difference: To keep things uniform I will apply your suggestion to both functions, thanks for catching that
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm seeing something similar using benchstat. Regardless, it would be ideal to keep the code as close to the source as possible. |
||
| return append(dst, | ||
| byte(v), | ||
| byte(v>>8), | ||
| byte(v>>16), | ||
| byte(v>>24), | ||
| ) | ||
| } | ||
|
|
||
| // Append64 appends a uint64 or int64 value to dst in little-endian byte order. | ||
| // Byte shifting is done directly to prevent overflow security errors, in | ||
| // compliance with gosec G115. | ||
| // | ||
| // See: https://cs.opensource.google/go/go/+/refs/tags/go1.19:src/encoding/binary/binary.go;l=119 | ||
| func Append64[T ~uint64 | ~int64](dst []byte, v T) []byte { | ||
| return append(dst, | ||
| byte(v), | ||
| byte(v>>8), | ||
| byte(v>>16), | ||
| byte(v>>24), | ||
| byte(v>>32), | ||
| byte(v>>40), | ||
| byte(v>>48), | ||
| byte(v>>56), | ||
| ) | ||
| } | ||
|
|
||
| // ReadU32 reads a uint32 from src in little-endian byte order. ReadU32 and | ||
| // ReadI32 are separate functions to avoid unsafe casting between unsigned and | ||
| // signed integers. | ||
| func ReadU32(src []byte) (uint32, []byte, bool) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [blocking] This function should wrap the standard library function since doing so doesn't require a type conversion: func ReadU32(src []byte) (uint32, []byte, bool) {
if len(src) < 4 {
return 0, src, false
}
return binary.LittleEndian.Uint32(src), src[4:], true
} |
||
| if len(src) < 4 { | ||
| return 0, src, false | ||
| } | ||
|
|
||
| return binary.LittleEndian.Uint32(src), src[4:], true | ||
| } | ||
|
|
||
| // ReadI32 reads an int32 from src in little-endian byte order. | ||
| // Byte shifting is done directly to prevent overflow security errors, in | ||
| // compliance with gosec G115. ReadU32 and ReadI32 are separate functions to | ||
| // avoid unsafe casting between unsigned and signed integers. | ||
| // | ||
| // See: https://cs.opensource.google/go/go/+/refs/tags/go1.19:src/encoding/binary/binary.go;l=79 | ||
| func ReadI32(src []byte) (int32, []byte, bool) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @RafaelCenzano Like we discussed in office hours, I can't tell why this isn't benchmarking the same as Just noting here since it sounds like we should accept this solution but are seeing weirdness from the benchmarks. CC: @matthewdale
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gave ai a stab at the question, it's answer stated it may be related to the inlining. This may be wrong but it stated because the code in ReadU is inline already things are easy to compile and run while ReadI it inlines the standard library call, and then it may have issues properly optimizing the checks/hints like Also as a note ReadU64 and ReadI64 have the same difference in benchmark
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @RafaelCenzano The Go compiler would inline binary.LittleEndian.Uint32, which is evident in the results from the go-playground example.
Could you elaborate on what you mean by this?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When I ran the benchmark on those two functions I saw the same issue with performance differences. We discussed in our meeting today. I'm going to spend a few hours looking into this before we merge to try to find a root cause if possible
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @RafaelCenzano Somehow the problem is this block: if len(src) < 4 {
return 0, src, false
}The assembly ends up being [effectively] the same for either case, though. This is bizarre. I think we should just do type casting, despite the gosec issues. We should not introduce a regression. func ReadI32(src []byte) (int32, []byte, bool) {
if len(src) < 4 {
return 0, src, false
}
//nolint:gosec // G115: Safe bit-pattern reinterpretation
return int32(binary.LittleEndian.Uint32(src)), src[4:], true
} |
||
| if len(src) < 4 { | ||
| return 0, src, false | ||
| } | ||
|
|
||
| _ = src[3] // bounds check hint to compiler | ||
|
|
||
| value := int32(src[0]) | | ||
| int32(src[1])<<8 | | ||
| int32(src[2])<<16 | | ||
| int32(src[3])<<24 | ||
|
|
||
| return value, src[4:], true | ||
| } | ||
|
|
||
| // ReadU64 reads a uint64 from src in little-endian byte order. ReadU64 and | ||
| // ReadI64 are separate functions to avoid unsafe casting between unsigned and | ||
| // signed integers. | ||
| func ReadU64(src []byte) (uint64, []byte, bool) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [blocking] This function should wrap the standard library function since doing so doesn't require a type conversion. |
||
| if len(src) < 8 { | ||
| return 0, src, false | ||
| } | ||
|
|
||
| return binary.LittleEndian.Uint64(src), src[8:], true | ||
| } | ||
|
|
||
| // ReadI64 reads an int64 from src in little-endian byte order. | ||
| // Byte shifting is done directly to prevent overflow security errors, in | ||
| // compliance with gosec G115. ReadU64 and ReadI64 are separate functions to | ||
| // avoid unsafe casting between unsigned and signed integers. | ||
| // | ||
| // See: https://cs.opensource.google/go/go/+/refs/tags/go1.19:src/encoding/binary/binary.go;l=101 | ||
| func ReadI64(src []byte) (int64, []byte, bool) { | ||
| if len(src) < 8 { | ||
| return 0, src, false | ||
| } | ||
|
|
||
| _ = src[7] // bounds check hint to compiler | ||
|
|
||
| value := int64(src[0]) | | ||
| int64(src[1])<<8 | | ||
| int64(src[2])<<16 | | ||
| int64(src[3])<<24 | | ||
| int64(src[4])<<32 | | ||
| int64(src[5])<<40 | | ||
| int64(src[6])<<48 | | ||
| int64(src[7])<<56 | ||
|
|
||
| return value, src[8:], true | ||
| } | ||
|
|
||
| // ReadCStringBytes reads a null-terminated C string from src as a byte slice. | ||
| // This is the base implementation used by ReadCString to ensure a single source | ||
| // of truth for C string parsing logic. | ||
| func ReadCStringBytes(src []byte) ([]byte, []byte, bool) { | ||
| idx := bytes.IndexByte(src, 0x00) | ||
| if idx < 0 { | ||
| return nil, src, false | ||
| } | ||
| return src[:idx], src[idx+1:], true | ||
| } | ||
|
|
||
| // ReadCString reads a null-terminated C string from src as a string. | ||
| // It delegates to ReadCStringBytes to maintain a single source of truth for | ||
| // C string parsing logic. | ||
| func ReadCString(src []byte) (string, []byte, bool) { | ||
| cstr, rem, ok := ReadCStringBytes(src) | ||
| if !ok { | ||
| return "", src, false | ||
| } | ||
| return string(cstr), rem, true | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[blocking] Each function should have a description. In the description we should note why we do byte shifting directly, instead of using the binary package. We should also note that the order is little-endian:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments added