| 1 | +--- |
| 2 | +title: "Split It Out" |
| 3 | +date: 2025-01-01T21:00:00-07:00 |
| 4 | +tags: |
| 5 | + - generics |
| 6 | + - go |
| 7 | + - beam |
| 8 | +categories: |
| 9 | + - Dev |
| 10 | + - Talks |
| 11 | + - Hobby SDK |
| 12 | +--- |
| 13 | + |
| 14 | +Since [my last post](/2024/10/my-hobby-beam-go-sdk) I've been quite busy with |
| 15 | +travel and recovery from same, and thus I haven't poked around trying to work on |
| 16 | +my Hobby SDK. |
| 17 | + |
| 18 | +Which also means I haven't spoken about it more widely either. Just after I |
| 19 | +rebuilt the site to use GitHub Actions too, to make it easier to write more. |
| 20 | +Ah well. Still need to sort out a few CSS things though, since line wrapping is |
| 21 | +weird... I want to have the markdown look good, but not have that affect the |
| 22 | +rendered output beyond formatting... |
| 23 | + |
| 24 | +That's for a later post. This is about how to deal with Stateful DoFns! |
| 25 | + |
| 26 | +----- |
| 27 | + |
| 28 | +The next thing to do for the Hobby SDK is add State and Timers, and ultimately, |
| 29 | +stateful transforms. I was stuck on this for a little bit. Specifically how |
| 30 | +to add these to the graph construction mechanisms, while also avoiding writing |
| 31 | +too much code to replicate the type system. |
| 32 | + |
| 33 | +The trick though is that Stateful DoFns, that is, those that make use of State and |
| 34 | +Timers in any capacity, have a hard restriction: they must take in KVs as an |
| 35 | +input. This is how the Beam model enables stateful stream processing, without |
| 36 | +sacrificing parallelism. |
| 37 | + |
| 38 | +The question then is: how do I have the Hobby SDK enforce this? |
| 39 | + |
| 40 | +You see, in most Beam SDKs, there is really only a single `ParDo` call as a part |
| 41 | +of graph construction, and it will accept `DoFns` of any kind: from the simple |
| 42 | +lightweight ones, to those with Side Inputs, to Splittable DoFns, and of course, |
| 43 | +stateful DoFns. But then there's a later step that does the enforcement of these |
| 44 | +compatibility requirements. |
| 45 | + |
| 46 | +And that's just a lot of code. Unlike in Java, Go Generics don't erase types, |
| 47 | +nor can I have overloaded implementations to selectively validate different |
| 48 | +constraints. It could be possible to make some way of "detecting" the KV, using |
| 49 | +reflection to validate that the outer type is a KV, but that goes |
| 50 | +against the ethos of this Hobby SDK. |
| 51 | + |
| 52 | +I want to have the Go Compiler do the enforcement, as much as possible. |
| 53 | + |
| 54 | +This means that I'll end up with a separate `StatefulParDo` top level graph |
| 55 | +construction function. There's a benefit to this: It can be clearly documented |
| 56 | +and provides a natural place to put *all* the documentation for stateful DoFns, |
| 57 | +instead of overburdening the standard `ParDo` function. |
| 58 | + |
| 59 | +This `StatefulParDo` will use the same mechanism that `GroupByKey` requires: |
| 60 | +A DoFn that has a KV generic. This way, when folks are writing a pipeline, the |
| 61 | +Go Compiler's type checker will let them know of the problem. |
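
As a rough sketch of how that compile-time enforcement could look: everything named below (`KV`, `PCollection`, `StatefulDoFn`, `StatefulParDo`'s signature, `countPerKey`) is a hypothetical stand-in for illustration, not the Hobby SDK's actual API. The point is only that the input parameter's type spells out `KV[K, V]`, so an unkeyed input is rejected by the type checker rather than by a later validation pass.

```go
package main

import "fmt"

// KV is a hypothetical key-value pair type, standing in for the SDK's own.
type KV[K, V any] struct {
	Key   K
	Value V
}

// PCollection is a hypothetical handle to a collection of elements of type E.
type PCollection[E any] struct {
	Name string
}

// StatefulDoFn is a hypothetical interface for DoFns that consume keyed input.
type StatefulDoFn[K, V, O any] interface {
	ProcessKeyed(key K, value V) O
}

// StatefulParDo only accepts inputs whose element type is KV[K, V].
// Passing a PCollection[int] here is a compile error, not a runtime one.
func StatefulParDo[K, V, O any](in PCollection[KV[K, V]], fn StatefulDoFn[K, V, O]) PCollection[O] {
	// A real implementation would record fn in the pipeline graph;
	// this sketch only derives an output handle.
	return PCollection[O]{Name: in.Name + "/stateful"}
}

// countPerKey is a toy DoFn over string keys and int values.
type countPerKey struct{}

func (countPerKey) ProcessKeyed(key string, value int) int { return value }

func main() {
	keyed := PCollection[KV[string, int]]{Name: "keyed"}
	out := StatefulParDo[string, int, int](keyed, countPerKey{})
	fmt.Println(out.Name) // prints "keyed/stateful"

	// The following would not compile, because PCollection[int] is not
	// a PCollection[KV[K, V]] for any K and V:
	//
	//   unkeyed := PCollection[int]{Name: "unkeyed"}
	//   StatefulParDo[string, int, int](unkeyed, countPerKey{})
}
```

The type arguments are written out explicitly in the call so the sketch does not depend on how much a given Go version can infer from the interface argument.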
| 62 | + |
| 63 | +I'll still need to detect stateful DoFns, since those features will still be |
| 64 | +modeled onto Exported fields and field types. That should be fine though. |
| 65 | +The two methods also enable a cleaner split for documenting the incompatibility |
| 66 | +of SplittableDoFns and Stateful DoFns. Sadly, there's no good way to represent this |
| 67 | +constraint as a type. Fields don't meaningfully participate in the Go type |
| 68 | +system. |
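
For the field-based detection, one possible shape is a small reflection pass over the DoFn struct. This is a minimal sketch under assumptions: `ValueState`, the `stateMarker` interface, and `statefulFields` are hypothetical names invented here, and the real SDK may model state fields quite differently.

```go
package main

import (
	"fmt"
	"reflect"
)

// stateMarker is a hypothetical interface that every state field type implements.
type stateMarker interface{ isState() }

// ValueState is a hypothetical state cell type a DoFn would declare as a field.
type ValueState[T any] struct{}

func (ValueState[T]) isState() {}

var stateMarkerType = reflect.TypeOf((*stateMarker)(nil)).Elem()

// statefulFields returns the names of exported struct fields whose type
// implements stateMarker. It accepts a struct or a pointer to a struct.
func statefulFields(fn any) []string {
	t := reflect.TypeOf(fn)
	if t == nil {
		return nil
	}
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.IsExported() && f.Type.Implements(stateMarkerType) {
			names = append(names, f.Name)
		}
	}
	return names
}

// plainFn has no state fields.
type plainFn struct{ Prefix string }

// countingFn declares one state field.
type countingFn struct {
	Count ValueState[int]
	Label string
}

func main() {
	fmt.Println(statefulFields(plainFn{}))     // prints "[]"
	fmt.Println(statefulFields(&countingFn{})) // prints "[Count]"
}
```

A check like this can only run at pipeline construction time, which is exactly the limitation described above: the compiler cannot see it.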
| 69 | + |
| 70 | +We need this because a SplittableDoFn can't have State. |
| 71 | +It becomes very difficult to reason about that way due to other transforms that |
| 72 | +happen to SplittableDoFns. After all, the key can change while splitting elements. |
| 73 | + |
| 74 | +Sometimes, you just have to have two things that do jobs well separately, instead |
| 75 | +of trying to have a single approach for everything. |
| 76 | + |
| 77 | +-------- |
| 78 | + |
| 79 | +Writing the post isn't the same as actually finishing the work, but sometimes |
| 80 | +the best thing to do is to split them out into separate posts and work. But |
| 81 | +the work does take longer. |
| 82 | + |
| 83 | +Just gotta do it one bit at a time. |