Build on Durable Promises? #48

@mitsuhiko

Durable promises keep sticking in my mind, and I'm increasingly convinced that a large part of what absurd needs could be modeled around this idea. The basic thought: checkpoints, events, and tasks all collapse into a single primitive (a durable promise) that lives in Postgres and carries the minimal state required for reliable, recoverable execution.

In this model, each durable promise has a unique ID, a type, a parent, and a payload describing its pending or completed resolution. The key difference from a typical JavaScript-style promise is that a durable promise must always be tied to the computation that will resolve it. It's never just "floating" and waiting indefinitely for someone to poke it. Instead, a promise is created together with the worker logic that can resolve it, and it inherits structured-concurrency semantics from its parent: cancellation, timeout propagation, ownership, and lifecycle boundaries.

Conceptually, the whole system becomes a durable task tree stored in Postgres. Each row is one promise. Each promise declares:

  • who created it (parent)
  • what should run when it’s picked up (its resolver)
  • whether it’s waiting, running, done, failed, or retrying
  • what it depends on (optional dependency edges)
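To make the row shape concrete, here is a minimal sketch in Python. It is for illustration only, since the real thing would be a Postgres table. The field names, the `kind` values, and the resolver names (`send_report`, `render_pdf`) are assumptions invented for the example, not anything absurd defines.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional
import uuid


class State(Enum):
    # The five states listed above.
    WAITING = "waiting"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"
    RETRYING = "retrying"


@dataclass
class DurablePromise:
    kind: str                          # hypothetical: "task", "checkpoint", "event"
    resolver: str                      # name of the code to run when picked up
    parent_id: Optional[str] = None    # who created it; None for a root task
    depends_on: list[str] = field(default_factory=list)  # optional dependency edges
    state: State = State.WAITING
    payload: Any = None                # pending input or completed resolution
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


# A root task and one child checkpoint form a two-node task tree.
root = DurablePromise(kind="task", resolver="send_report")
step = DurablePromise(kind="checkpoint", resolver="render_pdf", parent_id=root.id)
```

Storing the resolver as a name rather than a closure is what would let any worker pick the row up after a crash.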

Workers continuously scan for runnable promises. When they pick one up, they execute its resolver code and update the row to reflect success, failure, or a retry. If the worker crashes, Postgres still contains the entire tree of promises and their last known state, so another worker can take over without loss of progress. Checkpoints become trivial: a checkpoint is just a child promise that records partial progress. Events are the same, except they're resolved by external input instead of code. That conflicts with the structured-concurrency part though: an event has no resolver attached, so it is exactly the kind of "floating" promise the model is supposed to rule out.
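The scan, execute, update loop could look roughly like this. It is a sketch under heavy assumptions: a dict stands in for the Postgres table, the resolvers `double` and `boom` and the `MAX_ATTEMPTS` retry budget are made up, and the claim step is not atomic here. In Postgres the claim would have to be atomic, for example with `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
from typing import Callable, Optional

# In-memory stand-in for the promise table: id -> row.
table: dict[str, dict] = {
    "p1": {"resolver": "double", "payload": 21, "state": "waiting", "attempts": 0},
    "p2": {"resolver": "boom", "payload": None, "state": "waiting", "attempts": 0},
}


def boom(_payload):
    raise RuntimeError("resolver failed")


# Hypothetical resolver registry, keyed by the name stored on the row.
RESOLVERS: dict[str, Callable] = {"double": lambda x: x * 2, "boom": boom}
MAX_ATTEMPTS = 2  # assumed retry budget


def claim() -> Optional[str]:
    """Pick one runnable row and mark it running (not atomic in this sketch)."""
    for pid, row in table.items():
        if row["state"] in ("waiting", "retrying"):
            row["state"] = "running"
            return pid
    return None


def run_once() -> bool:
    """Claim one promise, run its resolver, and record the outcome on the row."""
    pid = claim()
    if pid is None:
        return False
    row = table[pid]
    row["attempts"] += 1
    try:
        row["payload"] = RESOLVERS[row["resolver"]](row["payload"])
        row["state"] = "done"
    except Exception as exc:
        row["state"] = "retrying" if row["attempts"] < MAX_ATTEMPTS else "failed"
        row["error"] = str(exc)
    return True


# Drain everything that is runnable.
while run_once():
    pass
```

Because the outcome is always written back to the row, a crashed worker leaves at worst a `running` row behind. Reclaiming those (for example via a lease timeout) is left out of the sketch.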

Dependencies are also durable: if a parent needs to wait for a child, it doesn't block a thread; it simply records the dependency and goes back to the queue. When the child resolves, the parent is rescheduled. This gives durable awaits, durable chaining, and durable retries with no in-memory state.
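A small sketch of that "durable await", again with plain Python structures standing in for Postgres. The helper names `await_child` and `resolve` and the `queue` list are invented for the example.

```python
# Stand-ins for the promise table and the run queue.
promises = {
    "parent": {"state": "waiting", "depends_on": [], "result": None},
    "child": {"state": "waiting", "depends_on": [], "result": None},
}
queue = ["parent"]  # ids that are currently runnable


def await_child(parent_id, child_id):
    """Record the dependency edge instead of blocking, and schedule the child."""
    promises[parent_id]["depends_on"].append(child_id)
    promises[parent_id]["state"] = "waiting"
    queue.append(child_id)


def resolve(pid, result):
    """Mark a promise done and reschedule waiters whose dependencies are all done."""
    promises[pid].update(state="done", result=result)
    for other_id, other in promises.items():
        if pid in other["depends_on"] and other["state"] == "waiting":
            if all(promises[d]["state"] == "done" for d in other["depends_on"]):
                queue.append(other_id)


def run_parent():
    p = promises["parent"]
    if not p["depends_on"]:
        await_child("parent", "child")
        return  # give the worker back; nothing is held in memory
    resolve("parent", promises["child"]["result"] + 1)


log = []  # order in which promises were picked up
while queue:
    pid = queue.pop(0)
    log.append(pid)
    if pid == "parent":
        run_parent()
    else:
        resolve(pid, 41)
```

The parent runs twice: once to register the dependency, and once after the child has resolved. Everything it needs for the second run is on the rows.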

Now the somewhat unclear part: "What happens when a worker runs a task promise, creates checkpoint promises (steps), and then crashes? Do checkpoint promises retry?"

One solution might be that we define one rule:

  • Every promise must be schedulable independently, even if the parent "intends" to run the child inline.

In practice, that means:

  • A parent worker may optimistically resolve child checkpoint promises inline, without handing them to the scheduler.
  • The child promises must still exist in Postgres as schedulable units with a declared resolver.
  • If the worker crashes before finishing the parent, Postgres sees unresolved child promises and can retry them independently.
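The three points above can be sketched as follows. This assumes the parent persists its child rows before resolving any of them inline. The resolver names and the `scheduler_sweep` helper are hypothetical, and a real implementation would need an atomic claim so that the inline path and the scheduler never run the same child at once.

```python
table = {}  # stand-in for the promise table

# Hypothetical resolvers, looked up by the name declared on the row.
RESOLVERS = {"fetch": lambda: [1, 2, 3], "count": lambda: 3}


def create_child(pid, resolver):
    # Persist the child first so it is schedulable on its own.
    table.setdefault(pid, {"resolver": resolver, "state": "waiting", "result": None})


def resolve(pid):
    """Run the declared resolver unless the promise is already done."""
    row = table[pid]
    if row["state"] != "done":
        row["result"] = RESOLVERS[row["resolver"]]()
        row["state"] = "done"
    return row["result"]


def parent_inline(crash_before_second):
    create_child("c1", "fetch")
    create_child("c2", "count")
    resolve("c1")  # inline optimization: no round trip through the scheduler
    if crash_before_second:
        raise RuntimeError("simulated worker crash")
    resolve("c2")


def scheduler_sweep():
    """Pick up any unresolved promise, independently of its parent."""
    picked = [pid for pid, row in table.items() if row["state"] != "done"]
    for pid in picked:
        resolve(pid)
    return picked


try:
    parent_inline(crash_before_second=True)
except RuntimeError:
    pass  # the parent's worker is gone; the rows remain

picked = scheduler_sweep()
```

After the simulated crash only `c2` is left for the scheduler, because `c1` was already resolved inline and recorded.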

When the parent promise retries, it simply checks which children are already complete and continues from the last resolved checkpoint.

This would mean that the scheduler remains the ultimate authority, and inline execution of steps/checkpoints is just an optimization, not a semantic requirement.

As a result, a task with checkpoints behaves like a durable reducer: replay the checkpoints up to the latest known good point, then continue the task from there.
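A minimal sketch of the reducer behaviour, assuming completed checkpoints are keyed by a step name. The `step` helper, the `store` dict, and the example task are all invented for illustration, and the dict again stands in for rows in Postgres.

```python
store: dict[str, object] = {}   # completed checkpoint promises, keyed by step name
executions: list[str] = []      # which resolvers actually ran, for inspection


def step(name, fn):
    """Return the recorded result if the checkpoint resolved; otherwise run and record."""
    if name in store:
        return store[name]
    executions.append(name)
    result = fn()
    store[name] = result
    return result


def task(crash_after_fetch: bool):
    rows = step("fetch", lambda: [1, 2, 3])
    if crash_after_fetch:
        raise RuntimeError("simulated worker crash")
    total = step("sum", lambda: sum(rows))
    return step("format", lambda: f"total={total}")


# First attempt dies after the first checkpoint.
try:
    task(crash_after_fetch=True)
except RuntimeError:
    pass

# The retry replays "fetch" from the store and continues with the remaining steps.
result = task(crash_after_fetch=False)
```

Across both attempts each resolver runs exactly once. The retry re-executes the task body, but every step that already resolved just returns its stored result.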

This also aligns perfectly with absurd's goal of staying extremely simple in the data model: one primitive, one table, one scheduling rule.

Would this work? I don't know yet.

Labels: enhancement (New feature or request)
