feat: Added realtime chunking to smolvla #2281

Basiljamal1 · 2025-10-21T15:51:56Z

Add Real-Time Action Chunking (RTC) for SmolVLA Policies

Overview

This PR introduces a new Real-Time Action Chunking (RTC) module for SmolVLA policies, as described in the paper "Real-Time Execution of Action Chunking Flow Policies". RTC is a training-free, inference-time wrapper that enables flow-matching VLA policies to execute actions with minimal latency and seamless temporal consistency across chunk boundaries.

Motivation

Traditional action-chunking policies suffer from:

Inference Latency: Delays between chunk requests can cause jerky motion or discontinuities.
Temporal Inconsistency: Chunk boundaries may introduce non-smooth transitions.

RTC addresses these issues by:

Asynchronously generating the next chunk while executing the current one.
Using guided inpainting to ensure the new chunk matches already-committed actions.
Applying a soft, differentiable mask for smooth transitions.
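As a rough illustration of the soft mask idea, the sketch below builds per-step guidance weights for one chunk. The function name `soft_prefix_mask`, its parameters, and the exponential decay schedule are assumptions made for this example; the mask used in this PR may be shaped differently.

```python
import math


def soft_prefix_mask(horizon: int, delay: int, tail: int) -> list[float]:
    """Per-step guidance weights for a chunk of `horizon` actions.

    Assumed layout (hypothetical, for illustration only):
      - the first `delay` steps are already committed -> weight 1.0
      - the last `tail` steps are freshly generated   -> weight 0.0
      - steps in between decay smoothly toward 0
    """
    if not (0 <= delay and 0 <= tail and delay + tail <= horizon):
        raise ValueError("need 0 <= delay, 0 <= tail, delay + tail <= horizon")

    end = horizon - tail  # first index that receives no guidance
    weights = []
    for i in range(horizon):
        if i < delay:
            weights.append(1.0)
        elif i < end:
            c = (end - i) / (end - delay + 1)  # in (0, 1), shrinks as i grows
            weights.append(c * (math.exp(c) - 1.0) / (math.e - 1.0))
        else:
            weights.append(0.0)
    return weights


mask = soft_prefix_mask(horizon=10, delay=2, tail=3)
print([round(w, 3) for w in mask])
```

The weights stay at 1 over the committed prefix, fall monotonically through the middle of the chunk, and reach 0 for the final steps, so guidance fades out rather than cutting off at a hard boundary.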

Key Features

Asynchronous Inference: Decouples action execution from chunk generation to minimize latency.
Guided Inpainting: Ensures continuity by constraining the next chunk to match the executed portion of the current chunk.
Soft Masking: Smoothly transitions guidance across the action horizon using a differentiable mask.
Efficient Implementation: Leverages PyTorch's VJP for guided denoising, following the algorithm in the referenced paper.
Worked Example: README includes a detailed example and technical details for clarity.
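To make the interplay of asynchronous inference and the committed prefix concrete, here is a small deterministic simulation. It uses discrete control ticks instead of real threads, and every name in it (`Chunk`, `toy_generate`, `run`, `delay`, `execute`) is hypothetical; it is a sketch of the scheduling idea under these assumptions, not the code in this PR.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: int      # control tick that the chunk's first action belongs to
    actions: list   # one (chunk_id, tick) label per control tick


def toy_generate(chunk_id: int, start: int, horizon: int) -> Chunk:
    """Stand-in for the policy: label each action with its chunk and tick."""
    return Chunk(start, [(chunk_id, start + k) for k in range(horizon)])


def run(total_ticks: int, horizon: int, delay: int, execute: int) -> list:
    """Execute actions while chunk generation takes `delay` ticks to finish.

    `execute` is the index within a chunk at which the next chunk is requested.
    """
    if not (0 < delay <= execute and execute + delay <= horizon):
        raise ValueError("need 0 < delay <= execute and execute + delay <= horizon")

    current = toy_generate(0, 0, horizon)
    pending = None  # (ready_tick, chunk) while a request is in flight
    next_id = 1
    executed = []
    for tick in range(total_ticks):
        # Switch chunks once the simulated inference has finished.
        if pending is not None and tick >= pending[0]:
            current, pending = pending[1], None
        index = tick - current.start
        if pending is None and index == execute:
            new = toy_generate(next_id, tick, horizon)
            # Inpainting stand-in: the first `delay` actions of the new chunk
            # are pinned to what the old chunk will execute in the meantime.
            new.actions[:delay] = current.actions[index:index + delay]
            pending = (tick + delay, new)
            next_id += 1
        executed.append(current.actions[index])
    return executed


timeline = run(total_ticks=12, horizon=8, delay=2, execute=4)
print(timeline)
```

In this toy run the robot never waits: while a request is in flight it keeps executing the old chunk, and the new chunk takes over at the first tick that was not already committed.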

Files Added/Changed

No files were removed. The RTC implementation for SmolVLA lives in a separate directory under policies/.

Architecture and Alternatives

RTC is specific to flow-matching diffusion models; it sits on top of the model and guides it at inference time. In the PR here, RTCSmolVLA inherits from SmolVLAPolicy and adds RTC support by overriding the select_action method.
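The inheritance pattern described above can be sketched with stand-in classes. `ToyChunkPolicy` and `ToyRTCPolicy` are hypothetical and do not reproduce the real `SmolVLAPolicy` or `RTCSmolVLA`; the `refill_at` threshold and the `prefix` argument are assumptions, and generation is synchronous here to keep the example short.

```python
from collections import deque


class ToyChunkPolicy:
    """Stand-in for a chunking policy; NOT the real SmolVLAPolicy."""

    def __init__(self, horizon: int):
        self.horizon = horizon
        self.queue = deque()
        self.calls = 0  # number of chunk generations, for inspection

    def generate_chunk(self, observation, prefix=()):
        # Toy model: keep the prefix verbatim, then fill with new values.
        self.calls += 1
        fresh = [observation + k for k in range(len(prefix), self.horizon)]
        return list(prefix) + fresh

    def select_action(self, observation):
        # Baseline behaviour: regenerate only when the queue is empty.
        if not self.queue:
            self.queue.extend(self.generate_chunk(observation))
        return self.queue.popleft()


class ToyRTCPolicy(ToyChunkPolicy):
    """Overrides select_action so a refill happens before the queue runs dry."""

    def __init__(self, horizon: int, refill_at: int):
        super().__init__(horizon)
        self.refill_at = refill_at  # hypothetical threshold on remaining actions

    def select_action(self, observation):
        if len(self.queue) <= self.refill_at:
            committed = tuple(self.queue)  # actions still waiting to be executed
            self.queue = deque(self.generate_chunk(observation, prefix=committed))
        return self.queue.popleft()


policy = ToyRTCPolicy(horizon=4, refill_at=2)
print([policy.select_action(obs) for obs in (0, 10, 10, 10, 20)])
```

Only `select_action` changes in the subclass, which mirrors why the inheritance route keeps the diff small but ties the RTC logic to one base policy.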

The flow model RTCVLAFlowMatching inherits from the VLAFlowMatching model that SmolVLAPolicy uses, and only extends its methods to support the queue management logic required by RTC.

In this architecture, the RTC implementation is coupled to SmolVLAPolicy, so enabling RTC on other models such as Pi0 and Pi05 would require some refactoring. An alternative approach is presented in the MR here. It uses dependency injection and composition to operate on the flow and inference models, as long as they expose the same calls.
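For comparison, a composition-based wrapper might look like the sketch below. The `FlowModel` protocol, the `sample_chunk` method, and the toy models are all hypothetical names chosen for this example; they are not taken from the alternative MR or from any existing policy class.

```python
from typing import Protocol, Sequence


class FlowModel(Protocol):
    """Hypothetical minimal interface the wrapper relies on."""

    def sample_chunk(self, observation: float, prefix: Sequence[float]) -> list[float]: ...


class RTCWrapper:
    """Holds any model exposing `sample_chunk`; no inheritance required."""

    def __init__(self, model: FlowModel, refill_at: int):
        self.model = model
        self.refill_at = refill_at
        self.queue: list[float] = []

    def select_action(self, observation: float) -> float:
        # Refill early and pass the still-queued actions as the prefix to keep.
        if len(self.queue) <= self.refill_at:
            self.queue = self.model.sample_chunk(observation, prefix=self.queue)
        return self.queue.pop(0)


class ToyModelA:
    def sample_chunk(self, observation, prefix):
        return list(prefix) + [observation] * (4 - len(prefix))


class ToyModelB:
    def sample_chunk(self, observation, prefix):
        return list(prefix) + [-observation] * (6 - len(prefix))


wrapped_a = RTCWrapper(ToyModelA(), refill_at=1)
wrapped_b = RTCWrapper(ToyModelB(), refill_at=1)
print(wrapped_a.select_action(1.0), wrapped_b.select_action(2.0))
```

The wrapper depends only on the shared call signature, so a second model can be plugged in without touching its class hierarchy. The cost is an extra interface that every supported model has to satisfy.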

I would love to have this merged here to enable RTC for SmolVLA since the changes are minimal. The alternative architecture is more code and requires a broader discussion.

References

Real-Time Execution of Action Chunking Flow Policies (arXiv:2506.07339)

Basiljamal1 added 2 commits October 21, 2025 11:37

feat: Added realtime chunking to smolvla via inheritance

59c11d3

fix: Fixed naming of the flow model

47110bc

Basiljamal1 marked this pull request as draft October 21, 2025 15:53

fix: updated RTVSmolVLAConfig naming

7f96e85

Basiljamal1 marked this pull request as ready for review October 21, 2025 16:25
