feat: add rfc for checkpoints by vitalii-dynamiq · Pull Request #513 · dynamiq-ai/dynamiq

vitalii-dynamiq · 2026-01-07T12:28:46Z

Note

Adds a complete RFC suite for opt-in checkpoint/resume across Dynamiq workflows with strong backward compatibility and HITL support.

- New top-level docs/RFC-001-CHECKPOINT-RESUME.md with overview, alternatives, and timeline
- Sub-docs in docs/rfc-001-checkpoint-resume/ covering: industry research, runtime integration, node analysis, data models, storage backends (File/SQLite/Redis/PostgreSQL), flow integration, testing/migration, and UI/chat integration
- Defines Pydantic-based checkpoint models (FlowCheckpoint, NodeCheckpointState, CheckpointConfig) and statuses, plus a protocol (CheckpointableNode) and mixin
- Specifies backend interfaces and implementations, API endpoints for resume/list, streaming events, and HITL PENDING_INPUT behavior
- Outlines performance targets, risk/edge cases, success metrics, and phased rollout

Written by Cursor Bugbot for commit d912a41. This will update automatically on new commits.

cursor · 2026-01-07T12:34:24Z

docs/rfc-001-checkpoint-resume/06-STORAGE-BACKENDS.md

+                    cutoff = datetime.utcnow()
+                    cutoff = cutoff.replace(
+                        day=cutoff.day - older_than_days
+                    )


Incorrect date arithmetic causes ValueError in cleanup

Medium Severity

The SQLite backend's cleanup method uses cutoff.replace(day=cutoff.day - older_than_days) to calculate a date cutoff, but datetime.replace() only accepts a day that is valid for the current month (1-31 at most) and never rolls over into the previous month. If the current day minus older_than_days is zero or negative (e.g., January 5th with older_than_days=10 yields -5), Python raises ValueError: day is out of range for month. The correct approach is to use datetime.utcnow() - timedelta(days=older_than_days) for date arithmetic.
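A minimal sketch of the suggested fix, assuming the cutoff is computed from a naive UTC timestamp as in the diff. The function names `cleanup_cutoff` and `buggy_cutoff` are hypothetical and do not come from the RFC; the current time is passed in so that the example is deterministic.

```python
from datetime import datetime, timedelta


def cleanup_cutoff(now: datetime, older_than_days: int) -> datetime:
    """Return the timestamp before which checkpoints count as expired."""
    # timedelta subtraction rolls over month and year boundaries correctly.
    return now - timedelta(days=older_than_days)


def buggy_cutoff(now: datetime, older_than_days: int) -> datetime:
    """The pattern from the diff, kept only to demonstrate the failure."""
    return now.replace(day=now.day - older_than_days)


now = datetime(2026, 1, 5, 12, 0, 0)
print(cleanup_cutoff(now, 10))  # 2025-12-26 12:00:00

try:
    buggy_cutoff(now, 10)
except ValueError as exc:
    print(f"replace() failed: {exc}")
```

In the backend itself, `now` would presumably be `datetime.utcnow()` as the review suggests. Note that `datetime.utcnow()` is deprecated since Python 3.12, so `datetime.now(timezone.utc)` may be preferable if the stored timestamps are timezone-aware.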

cursor · 2026-01-07T12:34:24Z

docs/rfc-001-checkpoint-resume/04-NODE-ANALYSIS.md

+                            "storage_id": file_id,
+                        }
+                else:
+                    result["files"][fname] = value


Wrong variable assigned in file serialization loop

Medium Severity

In _serialize_tool_output, when iterating over value.items() to get fname, fdata pairs, the else branch incorrectly assigns value (the entire files dictionary) instead of fdata (the individual file's data). This would cause each file entry to contain the entire files dictionary rather than its own content, corrupting the serialized output.
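The corrected loop might look like the sketch below. Only the `fname, fdata` iteration, the `"storage_id"` key, and the `result["files"][fname]` assignment come from the diff; the function name `serialize_files`, the size threshold, and the way `file_id` is produced are assumptions made so the example runs on its own.

```python
def serialize_files(files: dict, max_inline_bytes: int = 1024) -> dict:
    """Serialize a mapping of file name -> file data for a checkpoint."""
    result = {"files": {}}
    for fname, fdata in files.items():
        if isinstance(fdata, bytes) and len(fdata) > max_inline_bytes:
            # Hypothetical: a real backend would upload the bytes and
            # return an identifier; here the id is derived from the name.
            file_id = f"file-{fname}"
            result["files"][fname] = {"storage_id": file_id}
        else:
            # Store this file's own data (fdata), not the whole mapping.
            result["files"][fname] = fdata
    return result


output = serialize_files({"notes.txt": "hello", "blob.bin": b"x" * 2048})
print(output["files"]["notes.txt"])  # hello
print(output["files"]["blob.bin"])   # {'storage_id': 'file-blob.bin'}
```

With the original assignment of `value`, every small file entry would hold the full `files` mapping, so the serialized size would grow with the square of the file count in addition to being wrong.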

cursor · 2026-01-14T09:47:06Z

docs/rfc-001-checkpoint-resume/04-NODE-ANALYSIS.md

+
+        # === CRITICAL: Set resume loop ===
+        # Agent._run_agent will start from this loop instead of 1
+        self._resume_from_loop = state.get("current_loop", 0)


Agent resume starts from completed loop instead of next

Medium Severity

The RFC example code has an off-by-one error in agent loop resume logic. _resume_from_loop is set to state.get("current_loop", 0), but checkpoints are saved AFTER each loop iteration completes. When resuming, range(start_loop, max_loops + 1) re-executes the already-completed loop. The conversation history already contains that loop's messages (restored from checkpoint), so this would cause duplicate LLM calls and potential message duplication. The fix is state.get("current_loop", 0) + 1 to skip the completed loop.

Additional Locations (1)

docs/rfc-001-checkpoint-resume/04-NODE-ANALYSIS.md#L362-L365
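The off-by-one can be illustrated in isolation. This sketch assumes, as the comment states, that loops are numbered from 1, that `current_loop` records the last completed loop, and that the agent iterates over `range(start_loop, max_loops + 1)`. The helper names are hypothetical.

```python
def resume_start_loop(state: dict) -> int:
    """Return the first loop that still needs to run after a restore."""
    # current_loop is the last COMPLETED loop, so resume at the next one.
    # With no checkpoint state this yields 1, the normal starting loop.
    return state.get("current_loop", 0) + 1


def loops_to_run(state: dict, max_loops: int) -> list:
    """List the loop numbers the agent would execute on resume."""
    return list(range(resume_start_loop(state), max_loops + 1))


print(loops_to_run({"current_loop": 3}, max_loops=5))  # [4, 5]
print(loops_to_run({}, max_loops=3))                   # [1, 2, 3]
```

Without the `+ 1`, the first call would return `[3, 4, 5]` and loop 3 would run a second time against a history that already contains its messages. A checkpoint taken after the final loop yields an empty list, so the caller should treat that case as already finished.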

cursor · 2026-01-14T09:47:06Z

docs/rfc-001-checkpoint-resume/04-NODE-ANALYSIS.md

+            except (TypeError, ValueError):
+                # Large or non-serializable results: store truncated
+                if isinstance(value, str) and len(value) > 10000:
+                    serialized[cache_key] = value[:10000] + "...[truncated]"


Non-serializable tool cache entries silently dropped during checkpoint

Low Severity

In _serialize_tool_cache, when a tool result value is not JSON-serializable and is not a large string, the entry is silently dropped with no fallback or warning. On resume, these missing cache entries would cause the corresponding tools to be re-executed unnecessarily, contradicting the RFC's goal of using the tool cache to "skip re-executing identical tool calls on resume." Non-serializable results (e.g., custom objects) are simply not added to serialized.
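One possible way to avoid the silent drop is sketched below: every entry is kept, non-serializable values fall back to a truncated `repr()` string, and a warning is logged. This is an illustration under assumptions, not the RFC's method. The function name, the `repr()` fallback, and the log message are hypothetical; only the 10000-character limit and the `"...[truncated]"` suffix are taken from the diff.

```python
import json
import logging

logger = logging.getLogger(__name__)
MAX_CACHED_CHARS = 10_000  # mirrors the 10000-character limit in the diff


def serialize_tool_cache(cache: dict) -> dict:
    """Return a JSON-safe copy of the tool cache without dropping entries."""
    serialized = {}
    for cache_key, value in cache.items():
        try:
            json.dumps(value)  # probe only; raises if not serializable
            serialized[cache_key] = value
        except (TypeError, ValueError):
            # Assumed fallback: keep a lossy string form instead of nothing.
            text = value if isinstance(value, str) else repr(value)
            if len(text) > MAX_CACHED_CHARS:
                text = text[:MAX_CACHED_CHARS] + "...[truncated]"
            logger.warning(
                "Tool cache entry %r is not JSON-serializable; "
                "storing a string fallback",
                cache_key,
            )
            serialized[cache_key] = text
    return serialized


class Handle:
    def __repr__(self) -> str:
        return "<Handle>"


print(serialize_tool_cache({"search": {"hits": 2}, "conn": Handle()}))
# {'search': {'hits': 2}, 'conn': '<Handle>'}
```

The string fallback is lossy, so a resumed agent sees text rather than the original object. If that is not acceptable for a given tool, an alternative is to skip the entry deliberately and log it, so the re-execution on resume is at least visible.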


github-actions · 2026-01-14T09:50:40Z

Coverage Report

File	Stmts	Miss	Cover	Missing
TOTAL	22217	7076	68%

report-only-changed-files is enabled. No files were changed during this commit :)

Tests	Skipped	Failures	Errors	Time
1118	34 💤	0 ❌	0 🔥	8m 27s ⏱️

vitalii-dynamiq requested a review from a team as a code owner January 7, 2026 12:28

cursor bot reviewed Jan 7, 2026

cursor bot reviewed Jan 14, 2026

feat: add rfc for checkpoints

d912a41

acoola force-pushed the add-checkpoints branch from ec64e12 to d912a41 on January 20, 2026 18:50

vitalii-dynamiq wants to merge 1 commit into main from add-checkpoints
