@anthony-murphy
Contributor

This pull request introduces granular checkpointing and rollback capabilities to the container runtime's staging mode, allowing developers to mark, track, and selectively revert sets of staged changes. It also adds comprehensive tests and updates the staging mode interface and stress test infrastructure to support these new features.

New checkpointing and rollback functionality in staging mode:

  • Added checkpointing to StageControlsAlpha and its implementations, allowing users to create, count, and roll back to checkpoints within a staging session. Checkpoints are managed as a stack, so staged changes are rolled back in LIFO order. Empty and duplicate checkpoints are not recorded, and all checkpoints are cleared when staging mode is exited.
  • Implemented the checkpoint stack and rollback logic in ContainerRuntime, including flushing outbox messages to the PendingStateManager and rolling back only the changes made after the most recent checkpoint.
  • Added popStagedMessagesUpToCount to PendingStateManager to efficiently revert only the staged messages added since a checkpoint.
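
The stack behavior described in the bullets above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual ContainerRuntime or PendingStateManager code: the class name `StagedChangeLog`, its `stage` method, and the exact rules for skipping empty and duplicate checkpoints are assumptions made for illustration. Only the names `checkpoint`, `rollbackCheckpoint`, `checkpointCount`, and `popStagedMessagesUpToCount` come from this PR.

```typescript
// Hypothetical, simplified model of count-based checkpoints.
// It is NOT the real ContainerRuntime / PendingStateManager implementation.
class StagedChangeLog<T> {
  private readonly staged: T[] = [];
  // Each checkpoint stores how many messages were staged when it was taken.
  private readonly checkpoints: number[] = [];

  public get stagedCount(): number {
    return this.staged.length;
  }

  public get checkpointCount(): number {
    return this.checkpoints.length;
  }

  public stage(message: T): void {
    this.staged.push(message);
  }

  public checkpoint(): void {
    const count = this.staged.length;
    const previous = this.checkpoints[this.checkpoints.length - 1] ?? 0;
    // Assumption: a checkpoint with nothing staged since the previous one
    // (or since staging began) is treated as empty/duplicate and skipped.
    if (count === previous) {
      return;
    }
    this.checkpoints.push(count);
  }

  // Reverts everything staged after the most recent checkpoint (LIFO).
  // Assumption: the checkpoint is consumed by the rollback.
  public rollbackCheckpoint(): T[] {
    const target = this.checkpoints.pop();
    if (target === undefined) {
      throw new Error("No checkpoint to roll back to");
    }
    return this.popStagedMessagesUpToCount(target);
  }

  // Pops staged messages, newest first, until only `count` remain.
  private popStagedMessagesUpToCount(count: number): T[] {
    const popped: T[] = [];
    while (this.staged.length > count) {
      popped.push(this.staged.pop() as T);
    }
    return popped;
  }

  // Mirrors "all checkpoints are cleared when staging mode is exited".
  public clearCheckpoints(): void {
    this.checkpoints.length = 0;
  }
}
```

Storing only a message count per checkpoint keeps each checkpoint cheap, and a rollback only touches the messages staged after that count.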

Testing and validation:

  • Added a comprehensive suite of unit tests for checkpoint creation, rollback behavior, stack management, and edge cases in containerRuntime.spec.ts. These tests ensure correct checkpointing, rollback order, stack size, and robustness to no-ops or empty checkpoints.

Stress test infrastructure updates:

  • Extended the stress test model and data objects to support checkpoint creation and rollback operations, including random operation generation and state tracking for checkpoints. This ensures the new features are exercised under stress scenarios.
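
One way random operation generation with preconditions might look is sketched below. Every name here (`StressOp`, `ClientState`, `nextOp`, `applyOp`, `makeRandom`) and every weight is hypothetical and chosen for illustration; the stress test package's real types and generators are not shown in this PR description.

```typescript
// Hypothetical sketch; none of these names are the stress-test package's API.
type StressOp =
  | { type: "edit" }
  | { type: "checkpoint" }
  | { type: "rollbackCheckpoint" };

interface ClientState {
  inStagingMode: boolean;
  stagedCount: number;
  checkpoints: number[]; // staged-message count recorded at each checkpoint
}

interface Candidate {
  op: StressOp;
  weight: number;
  isAllowed: (state: ClientState) => boolean;
}

const candidates: Candidate[] = [
  { op: { type: "edit" }, weight: 6, isAllowed: () => true },
  // Precondition: checkpoints only make sense while staging.
  { op: { type: "checkpoint" }, weight: 2, isAllowed: (s) => s.inStagingMode },
  // Precondition: there must be a checkpoint to roll back to.
  {
    op: { type: "rollbackCheckpoint" },
    weight: 1,
    isAllowed: (s) => s.inStagingMode && s.checkpoints.length > 0,
  },
];

// Small seeded linear congruential generator so runs are reproducible.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Weighted pick among the operations whose preconditions currently hold.
function nextOp(state: ClientState, random: () => number): StressOp {
  const allowed = candidates.filter((c) => c.isAllowed(state));
  const total = allowed.reduce((sum, c) => sum + c.weight, 0);
  let pick = random() * total;
  for (const candidate of allowed) {
    pick -= candidate.weight;
    if (pick < 0) {
      return candidate.op;
    }
  }
  return { type: "edit" }; // floating-point guard; "edit" is always allowed
}

// Updates the tracked state so later preconditions stay accurate.
function applyOp(state: ClientState, op: StressOp): void {
  switch (op.type) {
    case "edit":
      if (state.inStagingMode) {
        state.stagedCount++;
      }
      break;
    case "checkpoint":
      state.checkpoints.push(state.stagedCount);
      break;
    case "rollbackCheckpoint": {
      const target = state.checkpoints.pop();
      if (target === undefined) {
        throw new Error("Generator violated the rollback precondition");
      }
      state.stagedCount = target;
      break;
    }
  }
}
```

Filtering by precondition before the weighted pick means the generator never emits a rollback when no checkpoint exists, so any failure under stress points at the feature rather than at the test harness.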

API and type updates:

  • Updated the StageControlsAlpha and related types with new methods and detailed documentation for checkpointing and rollback, clarifying their intended use and behavior.

Copilot AI review requested due to automatic review settings November 5, 2025 20:02
@github-actions bot added the following labels on Nov 5, 2025:

  • area: runtime (Runtime related issues)
  • area: tests (Tests to add, test infrastructure improvements, etc)
  • public api change (Changes to a public API)
  • base: main (PRs targeted against main branch)

Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements checkpoint functionality within staging mode, enabling granular rollback control for staged changes. Checkpoints allow users to mark specific points in their staged changes and selectively roll back to the most recent checkpoint using a stack-based (LIFO) approach.

  • Adds checkpoint(), rollbackCheckpoint(), and checkpointCount to the StageControlsAlpha interface
  • Implements checkpoint management using a message count-based tracking system
  • Updates orderSequentially to use the new checkpoint-based rollback instead of the previous IBatchCheckpoint approach
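
To illustrate how count-based tracking can back an all-or-nothing callback such as orderSequentially, here is a minimal sketch. The real orderSequentially implementation is not shown in this conversation, so the `StagedMessages` class and the `orderSequentiallySketch` function below are hypothetical stand-ins, and recording the count in a local variable is an assumption rather than the PR's actual mechanism.

```typescript
// Hypothetical stand-in for the runtime's staged-message storage.
class StagedMessages {
  private readonly messages: string[] = [];

  public get count(): number {
    return this.messages.length;
  }

  public stage(message: string): void {
    this.messages.push(message);
  }

  // Pops messages, newest first, until only `count` remain.
  public popUpToCount(count: number): string[] {
    const popped: string[] = [];
    while (this.messages.length > count) {
      popped.push(this.messages.pop() as string);
    }
    return popped;
  }

  public snapshot(): readonly string[] {
    return [...this.messages];
  }
}

// Runs `callback`; if it throws, reverts everything it staged and rethrows.
// Assumption: remembering the message count beforehand is enough to know
// which messages belong to the failed callback.
function orderSequentiallySketch(staged: StagedMessages, callback: () => void): void {
  const countBefore = staged.count;
  try {
    callback();
  } catch (error) {
    staged.popUpToCount(countBefore);
    throw error;
  }
}
```

Because only the messages added after `countBefore` are popped, work staged before the callback started is left untouched when the callback fails.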

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/runtime/runtime-definitions/src/stagingMode.ts | Adds comprehensive API documentation for checkpoint methods on StageControlsAlpha interface |
| packages/runtime/runtime-definitions/api-report/runtime-definitions.legacy.alpha.api.md | Updates API surface to include new checkpoint properties and methods |
| packages/runtime/container-runtime/src/containerRuntime.ts | Implements checkpoint/rollback logic using a stack-based message count tracking system |
| packages/runtime/container-runtime/src/pendingStateManager.ts | Adds popStagedMessagesUpToCount method to support partial rollback of staged messages |
| packages/runtime/container-runtime/src/test/containerRuntime.spec.ts | Adds comprehensive unit tests for checkpoint creation, tracking, and rollback scenarios |
| packages/test/local-server-tests/src/test/stagingMode.spec.ts | Adds integration tests covering checkpoint usage with DDS operations, remote changes, and disconnection scenarios |
| packages/test/local-server-stress-tests/src/stressDataObject.ts | Adds checkpoint operation interfaces and methods to support stress testing |
| packages/test/local-server-stress-tests/src/baseModel.ts | Integrates checkpoint operations into the stress test model with appropriate preconditions |

@github-actions
Contributor

github-actions bot commented Nov 7, 2025

🔗 Found some broken links! 💔

Run a link check locally to find them. See
https://github.com/microsoft/FluidFramework/wiki/Checking-for-broken-links-in-the-documentation for more information.

linkcheck output


> [email protected] ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> [email protected] serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> [email protected] check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

 ELIFECYCLE  Command failed with exit code 1.

@anthony-murphy anthony-murphy marked this pull request as draft November 7, 2025 19:23