Commit 48fa51c

Split It out.
1 parent c482181 commit 48fa51c

1 file changed

+83
-0
lines changed

@@ -0,0 +1,83 @@
1+
---
2+
title: "Split It Out"
3+
date: 2025-01-01T21:00:00-07:00
4+
tags:
5+
- generics
6+
- go
7+
- beam
8+
categories:
9+
- Dev
10+
- Talks
11+
- Hobby SDK
12+
---
13+
14+
Since [my last post](/2024/10/my-hobby-beam-go-sdk) I've been quite busy with
15+
travel and recovery from same, and thus I haven't poked around trying to work on
16+
my Hobby SDK.
17+
18+
Which also means I haven't spoken about it more widely either. Just after I
19+
re-built the site to use GitHub Actions too, to make it easier to write more.
20+
Ah well. Still need to sort out a few CSS things though, since line wrapping is
21+
weird... I want to have the markdown look good, but not have that affect the
22+
rendered output beyond formatting...
23+
24+
That's for a later post. This is about how to deal with Stateful DoFns!
25+
26+
-----
27+
28+
The next thing to do for the Hobby SDK is add State and Timers, and ultimately,
29+
stateful transforms. I was stuck on this for a little bit. Specifically how
30+
to add these to the graph construction mechanisms, while also avoiding writing
31+
too much code to replicate the type system.
32+
33+
The trick though is that Stateful DoFns, that is, those that make use of State and
34+
Timers in any capacity, have a hard restriction: they must take in KVs as an
35+
input. This is how the Beam model enables stateful stream processing, without
36+
sacrificing parallelism.
37+
38+
The question then is: how do I have the Hobby SDK enforce this?
39+
40+
You see, in most Beam SDKs, there is really only a single `ParDo` call as a part
41+
of graph construction, and it will accept `DoFns` of any kind: from the simple
42+
lightweight ones, to those with Side Inputs, to Splittable DoFns, and of course,
43+
stateful DoFns. But then there's a later step that does the enforcement of these
44+
compatibility requirements.
45+
46+
And that's just a lot of code. Unlike in Java, Go Generics don't erase types,
47+
nor can I have overloaded implementations to selectively validate different
48+
constraints. It could be possible to make some way of "detecting" the KV, using
49+
reflection to enforce validating that the outer type is a KV, but that goes
50+
against the ethos of this Hobby SDK.
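
For contrast, here is a minimal sketch of what that run-time detection could look like. Everything in it is hypothetical: the `KV` type, the `IsKV` marker method, and `checkStatefulInput` are names made up for illustration, not the Hobby SDK's API. The point is only that the mistake surfaces as an error value while the pipeline is being built, rather than as a compile error.

```go
package main

import (
	"fmt"
	"reflect"
)

// KV is a hypothetical key-value pair type; the real SDK's shape may differ.
type KV[K, V any] struct {
	Key   K
	Value V
}

// IsKV is a marker method so any KV instantiation can be recognized at run time.
func (KV[K, V]) IsKV() {}

// kvMarker is implemented only by KV instantiations.
type kvMarker interface{ IsKV() }

var kvMarkerType = reflect.TypeOf((*kvMarker)(nil)).Elem()

// checkStatefulInput uses reflection to verify that the element type E is a KV.
// The check runs when the pipeline is constructed, not when it is compiled.
func checkStatefulInput[E any]() error {
	t := reflect.TypeOf((*E)(nil)).Elem()
	if !t.Implements(kvMarkerType) {
		return fmt.Errorf("stateful DoFn input must be a KV, got %v", t)
	}
	return nil
}

func main() {
	fmt.Println(checkStatefulInput[KV[string, int]]())
	fmt.Println(checkStatefulInput[string]())
}
```

This works, but every caller has to remember to run the check and handle the error, which is exactly the extra code the compiler could take care of instead.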
51+
52+
I want to have the Go Compiler do the enforcement, as much as possible.
53+
54+
This means that I'll end up with a separate `StatefulParDo` top level graph
55+
construction function. There's a benefit to this: It can be clearly documented
56+
and provides a natural place to put *all* the documentation for stateful DoFns,
57+
instead of heavily burdening the standard `ParDo` function.
58+
59+
This `StatefulParDo` will use the same mechanism that `GroupByKey` requires:
60+
A DoFn that has a KV generic. This way, when folks are writing a pipeline, the
61+
Go Compiler's type checker will let them know of the problem.
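
A rough sketch of the idea, under stated assumptions: `KV`, `PCollection`, and both function signatures below are stand-ins invented for this example, and the DoFn is reduced to a plain function for brevity. The actual Hobby SDK types will look different; only the shape of the constraint matters here.

```go
package main

import "fmt"

// KV is a hypothetical key-value pair type.
type KV[K, V any] struct {
	Key   K
	Value V
}

// PCollection is a hypothetical handle to a collection of elements of type E.
type PCollection[E any] struct {
	name string
}

// ParDo accepts a collection of any element type.
func ParDo[I, O any](in PCollection[I], fn func(I) O) PCollection[O] {
	return PCollection[O]{name: in.name + "/ParDo"}
}

// StatefulParDo only accepts collections whose elements are KV[K, V].
// Passing anything else is a type error at compile time.
func StatefulParDo[K, V, O any](in PCollection[KV[K, V]], fn func(KV[K, V]) O) PCollection[O] {
	return PCollection[O]{name: in.name + "/StatefulParDo"}
}

func main() {
	words := PCollection[string]{name: "words"}
	keyed := ParDo(words, func(w string) KV[string, int] {
		return KV[string, int]{Key: w, Value: len(w)}
	})
	counts := StatefulParDo(keyed, func(kv KV[string, int]) int { return kv.Value })
	fmt.Println(counts.name)

	// The following line would not compile, because PCollection[string]
	// is not a PCollection[KV[K, V]] for any K and V:
	// StatefulParDo(words, func(w string) int { return len(w) })
}
```

No reflection and no separate validation pass are needed for the KV requirement; the function signature carries it.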
62+
63+
I'll still need to detect stateful DoFns, since those features will still be
64+
modeled onto Exported fields and field types. That should be fine though.
65+
The two methods also enable a cleaner split for documenting the incompatibility
66+
of SplittableDoFns and Stateful DoFns. Sadly, there's no good way to represent this
67+
constraint as a type. Fields don't meaningfully participate in the Go type
68+
system.
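
As a sketch of that field-based detection, assuming a hypothetical `ValueState` field type with an `IsState` marker method (neither is the SDK's real API), a construction-time check could walk the DoFn struct's exported fields like this:

```go
package main

import (
	"fmt"
	"reflect"
)

// ValueState is a hypothetical state field type for illustration only.
type ValueState[T any] struct{}

// IsState marks ValueState instantiations as state fields.
func (ValueState[T]) IsState() {}

// stateField is implemented by every hypothetical state field type.
type stateField interface{ IsState() }

var stateFieldType = reflect.TypeOf((*stateField)(nil)).Elem()

// isStateful reports whether fn, a struct or pointer to a struct,
// has at least one exported field whose type is a state field.
func isStateful(fn any) bool {
	t := reflect.TypeOf(fn)
	if t == nil {
		return false
	}
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return false
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.IsExported() && f.Type.Implements(stateFieldType) {
			return true
		}
	}
	return false
}

// countFn and plainFn are example DoFn structs.
type countFn struct {
	Count ValueState[int]
}

type plainFn struct {
	Prefix string
}

func main() {
	fmt.Println(isStateful(&countFn{}), isStateful(plainFn{}))
}
```

The same walk is where a check like "no state fields on a SplittableDoFn" would have to live, since the compiler can't see it.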
69+
70+
We need this because a SplittableDoFn can't have State.
71+
It becomes very difficult to reason about that way due to other transforms that
72+
happen to SplittableDoFns. After all, the key can change while splitting elements.
73+
74+
Sometimes, you just have to have two things that do jobs well separately, instead
75+
of trying to have a single approach for everything.
76+
77+
--------
78+
79+
Writing the post isn't the same as actually finishing the work, but sometimes
80+
the best thing to do is to split them out into separate posts and work. But
81+
the work does take longer.
82+
83+
Just gotta do it one bit at a time.