---
title: "Schemas and Coders and Benchmarks"
date: 2025-01-22T06:24:24-08:00
tags:
- beam
- go
- hobby sdk
- dev
categories:
- Dev
---

This weekend I got nerd sniped into working on a part of my hobby SDK that
I didn't have much motivation to do: Beam Schema Row coders.

But they need to be done eventually, so I implemented them.

In particular, what I needed was a way to take the Beam Schema Proto and turn
it into a coder that could produce a dynamic row value. The existing Go SDK could
in principle do it, but it's not easy due to a choice I made years and years
ago. I'm not convinced it's the wrong choice, though. Besides, adding the new
handling to that API would take longer than adding it to the hobby SDK, or building
what I needed from scratch.

Perhaps that's not entirely true. I'm being paranoid about continuing to broaden
the API surface of the existing SDK. It's already quite large and complex, and
the interactions are already quite subtle.

Anyway, I wrote a quick naive implementation of the code, added tests, made them all
pass, and [wrote a benchmark](https://github.com/lostluck/beam-go/blob/9429632fa47a6752671c4a4d6ad0325742485599/internal/schema/schema_test.go#L143).

```go
func BenchmarkRoundtrip(b *testing.B) {
	for _, test := range suite {
		b.Run(test.name, func(b *testing.B) {
			c := ToCoder(test.schema)
			b.ReportAllocs()
			b.ResetTimer()
			for range b.N {
				r := coders.Decode(c, test.data)
				if got, want := coders.Encode(c, r), test.data; !cmp.Equal(got, want) {
					b.Errorf("round trip decode-encode not equal: want %v, got %v", want, got)
				}
			}
		})
	}
}
```

It's a pretty straightforward setup. I took the implementation straight from the
existing tests, set the benchmark to report allocations and reset the benchmark
timer, and then got to iterating.

This gives bad results though. The problem is here:

```go
if got, want := coders.Encode(c, r), test.data; !cmp.Equal(got, want) {
	b.Errorf("round trip decode-encode not equal: want %v, got %v", want, got)
}
```

I kept the comparison in to validate that everything continues to work as
desired through the thousands of runs the coders would be put through. Or maybe
I was just lazy about it. `cmp` is not intended to be high performance; it's
intended to be convenient and correct.

Had I used `bytes.Equal` instead of the general `cmp.Equal`,
I'd have spent less CPU, and probably wouldn't have noticed.

For this task, I didn't much care about CPU. I cared about allocations, which
directly affect CPU and memory usage. And `cmp` is allocation heavy compared
to a straight byte slice equality check.

That's not all though. I had used my convenience functions for the benchmark too,
to trivially get through the decode and encode cycle.
`coders.Decode` and `coders.Encode` are just small wrappers to simplify quick
one-off encodings, such as for testing.

But like `cmp`, they are convenient, not high performance.

The benchmark body [now looks like this](https://github.com/lostluck/beam-go/blob/53c2c2b073dce1eebbd4090b2d642362f277c895/internal/schema/schema_test.go#L225):

```go
c := ToCoder(test.schema)
// Mild shenanigans to prevent unnecessary allocations.
enc := coders.NewEncoder()
dec := *coders.NewDecoder(test.data)
n := len(test.data)
want := test.data

b.ReportAllocs()
b.ResetTimer()
for range b.N {
	enc.Reset(n)
	dec = *coders.NewDecoder(test.data)
	r := c.Decode(&dec)
	c.Encode(enc, r)
	if got := enc.Data(); !bytes.Equal(got, want) {
		b.Errorf("encoding not equal: want %v, got %v", want, got)
	}
}
```

Moving the `Encoder` and `Decoder` allocations out of the hot loop removed them
from the profile graph. There's a bit of "fun" with the `Decoder` to avoid it
being heap allocated anew on each iteration when the test data is reset.
Keeping the comparison in adds a few nanoseconds per run, but helps keep the
code robust.

All of this was a problem largely because I wanted to collect nice and clean
"before" and "after" versions of the metrics as I cleaned up and changed the
implementation. I like seeing performance improvements. But now the results
aren't apples to apples, so I've cleared them away.

I am happy with where this has ended up though. I incorporated the `unsafe` tricks
protocol buffers use to minimize allocations and space in their [protoreflect
package](https://github.com/protocolbuffers/protobuf-go/blob/master/reflect/protoreflect/value_unsafe_go121.go).

The idea is the same, really: be able to refer to and mutate values and fields
efficiently, against a known schema. I'd use their implementation directly if I
were able to, but they have it quite locked down, for the same reasons I've
put this stuff in an internal package for the time being: I want it to be able
to change.

This is basically half of a useful article, aside from the lesson about
being certain you know what you're measuring in your benchmarks. We'll see if
I turn back the clock and collect clean measurements of the implementations.