Async state serialisation #1207
Closed
Short Description
Use an async streaming function to serialise state objects at the end of each step in the runtime.
Fixes #1203
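For illustration, here's a hedged sketch of the call-site change this implies (stringifyAsync and completeStep are illustrative names, not the actual runtime API):

```ts
// Hypothetical shape of the new serialiser; see the sketch under
// Implementation Details for what it might look like inside.
declare function stringifyAsync(value: unknown): Promise<string>;

async function completeStep(state: object): Promise<string> {
  // Previously this was a blocking JSON.stringify(state) call;
  // awaiting a streaming serialiser lets the event loop breathe.
  return stringifyAsync(state);
}
```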
Implementation Details
We have seen a problem where huge - I mean seriously big - state objects can cause the worker to get OOMKilled by Kubernetes.
This is hard to reproduce, but I'm confident the cause is the blocking JSON.stringify/JSON.parse calls we run on state objects at the end of each step.
The solution here uses a non-blocking algorithm. It will probably be slower, but because it yields to the event loop, GC and allocation can run between slices of work, which means the worker thread should OOMKill itself before the supervisor process steps in.
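To make the idea concrete, here's a minimal sketch of a non-blocking serialiser of this kind, assuming a chunked recursive walk that yields via setImmediate. The names and chunk size are illustrative, not the actual implementation, and it only approximates JSON.stringify semantics (no toJSON support; undefined object values become null instead of being dropped):

```ts
// Yield to the event loop every CHUNK values so GC can run mid-serialisation
const CHUNK = 5_000;

const tick = () => new Promise<void>((resolve) => setImmediate(resolve));

export async function stringifyAsync(value: unknown): Promise<string> {
  const parts: string[] = [];
  let count = 0;

  const walk = async (v: unknown): Promise<void> => {
    // Periodically hand control back to the event loop
    if (++count % CHUNK === 0) await tick();

    if (v === null || typeof v !== 'object') {
      // Primitives go through the native stringifier;
      // undefined/functions fall back to null
      parts.push(JSON.stringify(v) ?? 'null');
      return;
    }
    if (Array.isArray(v)) {
      parts.push('[');
      for (let i = 0; i < v.length; i++) {
        if (i > 0) parts.push(',');
        await walk(v[i]);
      }
      parts.push(']');
      return;
    }
    const entries = Object.entries(v as Record<string, unknown>);
    parts.push('{');
    for (let i = 0; i < entries.length; i++) {
      if (i > 0) parts.push(',');
      parts.push(JSON.stringify(entries[i][0]), ':');
      await walk(entries[i][1]);
    }
    parts.push('}');
  };

  await walk(value);
  return parts.join('');
}
```

Because each slice of work is bounded, the event loop gets regular chances to run GC in between, at the cost of some raw serialisation speed.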
Note that the serialiser will now throw if an object is particularly large. Technically, the state-object limit at the end of each step should match the dataclip payload limit allowed for the run. My concern is that right now, some workflows create large state objects in the middle of the workflow and tidy up on the last step. With a 10mb limit, a middle step might create a 20mb state object, and that currently works fine because the large object never leaves the worker. But if we start strictly enforcing that limit, those workflows will fail.
So for now, I've set that limit crazily high, to 1gb. The idea is that any truly massive state object will cause an OOM failure rather than a runtime error, so the limit is a bit academic.
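As a sketch of how that guard might be wired in (illustrative names, not the actual code), the serialiser can count bytes as chunks are produced and throw as soon as the running total passes the limit, rather than building the full string first:

```ts
// Deliberately high limit, as described above
const PAYLOAD_LIMIT_BYTES = 1024 * 1024 * 1024; // 1gb

export function makeGuardedSink() {
  const parts: string[] = [];
  let bytes = 0;
  return {
    push(chunk: string) {
      bytes += Buffer.byteLength(chunk, 'utf8');
      if (bytes > PAYLOAD_LIMIT_BYTES) {
        // Fail fast mid-stream; the caller surfaces this as a step error
        throw new Error('Serialised state exceeds the payload limit');
      }
      parts.push(chunk);
    },
    result: () => parts.join(''),
  };
}
```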
AI Usage
Please disclose how you've used AI in this work (it's cool, we just want to know!):
You can read more details in our Responsible AI Policy