feat: add rfc for checkpoints #513

Open
vitalii-dynamiq wants to merge 1 commit into main from add-checkpoints

@vitalii-dynamiq vitalii-dynamiq commented Jan 7, 2026

Note

Adds a complete RFC suite for opt-in checkpoint/resume across Dynamiq workflows with strong backward compatibility and HITL support.

  • New top-level docs/RFC-001-CHECKPOINT-RESUME.md with overview, alternatives, and timeline
  • Sub-docs in docs/rfc-001-checkpoint-resume/ covering: industry research, runtime integration, node analysis, data models, storage backends (File/SQLite/Redis/PostgreSQL), flow integration, testing/migration, and UI/chat integration
  • Defines Pydantic-based checkpoint models (FlowCheckpoint, NodeCheckpointState, CheckpointConfig) and statuses, plus protocol (CheckpointableNode) and mixin
  • Specifies backend interfaces and implementations, API endpoints for resume/list, streaming events, and HITL PENDING_INPUT behavior
  • Outlines performance targets, risk/edge cases, success metrics, and phased rollout

Written by Cursor Bugbot for commit d912a41. This will update automatically on new commits.

@vitalii-dynamiq vitalii-dynamiq requested a review from a team as a code owner January 7, 2026 12:28
cutoff = datetime.utcnow()
cutoff = cutoff.replace(
    day=cutoff.day - older_than_days
)

Incorrect date arithmetic causes ValueError in cleanup

Medium Severity

The SQLite backend's cleanup method uses cutoff.replace(day=cutoff.day - older_than_days) to calculate a date cutoff, but datetime.replace() expects a valid day value (1-31). If the current day minus older_than_days results in zero or negative (e.g., January 5th with older_than_days=10 yields -5), Python raises ValueError: day is out of range for month. The correct approach is to use datetime.utcnow() - timedelta(days=older_than_days) for date arithmetic.
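A minimal sketch of the suggested fix, assuming a standalone helper named `cleanup_cutoff` (a hypothetical name, not taken from the RFC) and timezone-aware UTC timestamps. If the backend stores naive timestamps, the `tzinfo` would need to be dropped before comparing.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cleanup_cutoff(older_than_days: int, now: Optional[datetime] = None) -> datetime:
    """Return the timestamp before which checkpoints count as expired.

    Hypothetical helper for illustration; the RFC's actual method may differ.
    """
    # `now` is injectable so the calculation can be tested deterministically.
    current = now if now is not None else datetime.now(timezone.utc)
    # timedelta subtraction rolls over month and year boundaries correctly,
    # unlike replace(day=...), which only accepts a valid day of the month.
    return current - timedelta(days=older_than_days)


jan_5 = datetime(2026, 1, 5, tzinfo=timezone.utc)
print(cleanup_cutoff(10, now=jan_5).date())  # 2025-12-26
```

Subtracting a `timedelta` also avoids the quieter failure mode where the day stays in range but the intended cutoff lies in the previous month.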

"storage_id": file_id,
}
else:
result["files"][fname] = value

Wrong variable assigned in file serialization loop

Medium Severity

In _serialize_tool_output, when iterating over value.items() to get fname, fdata pairs, the else branch incorrectly assigns value (the entire files dictionary) instead of fdata (the individual file's data). This would cause each file entry to contain the entire files dictionary rather than its own content, corrupting the serialized output.
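The corrected loop might look like the sketch below. Only the `fname, fdata` iteration, the `"storage_id"` key, and the `result["files"]` structure come from the excerpt; the function name `serialize_files`, the `store_bytes` callback, and the bytes check are assumptions made so the example is self-contained.

```python
from typing import Any, Callable, Dict


def serialize_files(
    files: Dict[str, Any],
    store_bytes: Callable[[bytes], str],
) -> Dict[str, Any]:
    """Serialize a files mapping, offloading binary payloads to storage.

    Hypothetical stand-in for the files branch of _serialize_tool_output.
    """
    result: Dict[str, Any] = {"files": {}}
    for fname, fdata in files.items():
        if isinstance(fdata, (bytes, bytearray)):
            # Binary content is stored externally; keep only a reference.
            file_id = store_bytes(bytes(fdata))
            result["files"][fname] = {"storage_id": file_id}
        else:
            # Assign this file's own data (fdata), not the whole mapping.
            result["files"][fname] = fdata
    return result


out = serialize_files(
    {"notes.txt": "hello", "blob.bin": b"\x00\x01"},
    store_bytes=lambda data: "file-1",  # fake storage for the example
)
print(out)
```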


# === CRITICAL: Set resume loop ===
# Agent._run_agent will start from this loop instead of 1
self._resume_from_loop = state.get("current_loop", 0)

Agent resume starts from completed loop instead of next

Medium Severity

The RFC example code has an off-by-one error in agent loop resume logic. _resume_from_loop is set to state.get("current_loop", 0), but checkpoints are saved AFTER each loop iteration completes. When resuming, range(start_loop, max_loops + 1) re-executes the already-completed loop. The conversation history already contains that loop's messages (restored from checkpoint), so this would cause duplicate LLM calls and potential message duplication. The fix is state.get("current_loop", 0) + 1 to skip the completed loop.
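The off-by-one can be illustrated in isolation. This sketch assumes, as the comment states, that `current_loop` records the last completed loop and that loops are numbered from 1; `loops_to_run` is a hypothetical helper, not part of the RFC.

```python
from typing import Any, Dict


def loops_to_run(state: Dict[str, Any], max_loops: int) -> range:
    """Return the loop numbers still to execute after restoring a checkpoint."""
    # The checkpoint is written after a loop finishes, so current_loop
    # is the last completed loop; resume from the one after it.
    last_completed = state.get("current_loop", 0)
    return range(last_completed + 1, max_loops + 1)


print(list(loops_to_run({"current_loop": 3}, max_loops=5)))  # [4, 5]
print(list(loops_to_run({}, max_loops=3)))  # [1, 2, 3]
```

With no checkpoint state the default of 0 yields a start at loop 1, which matches a fresh run.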

Additional Locations (1)

except (TypeError, ValueError):
    # Large or non-serializable results: store truncated
    if isinstance(value, str) and len(value) > 10000:
        serialized[cache_key] = value[:10000] + "...[truncated]"

Non-serializable tool cache entries silently dropped during checkpoint

Low Severity

In _serialize_tool_cache, when a tool result value is not JSON-serializable and is not a large string, the entry is silently dropped with no fallback or warning. On resume, these missing cache entries would cause the corresponding tools to be re-executed unnecessarily, contradicting the RFC's goal of using the tool cache to "skip re-executing identical tool calls on resume." Non-serializable results (e.g., custom objects) are simply not added to serialized.
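One possible way to avoid the silent drop is to log a warning and store a marked placeholder, so the resume path can decide whether to re-run the tool. This is a sketch under assumptions: string cache keys, a `json.dumps` probe for serializability, and a placeholder shape (`__unserializable__`, `repr`) invented for illustration. Whether a placeholder is an acceptable cache value depends on how the cache is consumed on resume.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)
MAX_CHARS = 10_000  # mirrors the truncation limit in the excerpt


def serialize_tool_cache(cache: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize tool results without silently dropping any entry."""
    serialized: Dict[str, Any] = {}
    for cache_key, value in cache.items():
        try:
            json.dumps(value)  # probe: raises if not JSON-serializable
            serialized[cache_key] = value
        except (TypeError, ValueError):
            # Make the loss visible instead of skipping the entry.
            logger.warning(
                "Tool cache entry %r is not JSON-serializable; storing a placeholder",
                cache_key,
            )
            serialized[cache_key] = {
                "__unserializable__": True,  # hypothetical marker key
                "repr": repr(value)[:MAX_CHARS],
            }
    return serialized


snapshot = serialize_tool_cache({"search:q1": {"hits": 3}, "calc:x": {1, 2}})
print(json.dumps(snapshot))
```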

github-actions bot commented Jan 14, 2026

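Coverage Report

| File  | Stmts | Miss | Cover | Missing |
|-------|-------|------|-------|---------|
| TOTAL | 22217 | 7076 | 68%   |         |

report-only-changed-files is enabled. No files were changed during this commit :)

| Tests | Skipped | Failures | Errors | Time      |
|-------|---------|----------|--------|-----------|
| 1118  | 34 💤   | 0 ❌     | 0 🔥   | 8m 27s ⏱️ |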
