@Basiljamal1 Basiljamal1 commented Oct 21, 2025

Add Real-Time Action Chunking (RTC) for SmolVLA Policies

Overview

This PR introduces a new Real-Time Action Chunking (RTC) module for SmolVLA policies, as described in the paper "Real-Time Execution of Action Chunking Flow Policies". RTC is a training-free, inference-time wrapper that enables flow-matching VLA policies to execute actions with minimal latency and seamless temporal consistency across chunk boundaries.

Motivation

Traditional action-chunking policies suffer from:

  • Inference Latency: Delays between chunk requests can cause jerky motion or discontinuities.
  • Temporal Inconsistency: Chunk boundaries may introduce non-smooth transitions.

RTC addresses these issues by:

  • Asynchronously generating the next chunk while executing the current one.
  • Using guided inpainting to ensure the new chunk matches already-committed actions.
  • Applying a soft, differentiable mask for smooth transitions.
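As a rough illustration of the soft mask, the sketch below builds per-step guidance weights over one chunk: full weight on steps that will already have executed by the time the new chunk is ready, an exponentially decaying weight over the overlap, and zero weight on the freshly generated tail. The function name, the parameter names (`horizon`, `delay`, `exec_horizon`) and the exact decay formula are assumptions based on one reading of the paper and may differ from the code in this PR.

```python
import math


def soft_mask(horizon: int, delay: int, exec_horizon: int) -> list:
    """Illustrative guidance weights for one action chunk (hypothetical helper).

    horizon:      number of actions in a chunk
    delay:        steps expected to elapse while the next chunk is computed
    exec_horizon: steps executed from a chunk before switching to the next
    """
    overlap_end = horizon - exec_horizon
    if not 0 <= delay <= overlap_end:
        raise ValueError("expected 0 <= delay <= horizon - exec_horizon")

    weights = []
    for i in range(horizon):
        if i < delay:
            # These actions are executed before the new chunk arrives: match them fully.
            weights.append(1.0)
        elif i < overlap_end:
            # Overlap with the previous chunk: guidance decays smoothly toward zero.
            c = (overlap_end - i) / (overlap_end - delay + 1)
            weights.append(c * (math.exp(c) - 1.0) / (math.e - 1.0))
        else:
            # No previous actions exist for these steps: leave them unconstrained.
            weights.append(0.0)
    return weights
```

With `soft_mask(10, 2, 4)`, the first two weights are 1, the next four decrease monotonically, and the last four are 0.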

Key Features

  • Asynchronous Inference: Decouples action execution from chunk generation to minimize latency.
  • Guided Inpainting: Ensures continuity by constraining the next chunk to match the executed portion of the current chunk.
  • Soft Masking: Smoothly transitions guidance across the action horizon using a differentiable mask.
  • Efficient Implementation: Leverages PyTorch's VJP for guided denoising, following the algorithm in the referenced paper.
  • Worked Example: README includes a detailed example and technical details for clarity.
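To make the VJP-based guidance more concrete, here is a minimal sketch of one guided velocity evaluation. It is not the code from this PR: `guided_velocity`, `velocity_fn`, `beta` and the clipping formula are assumptions based on one reading of the paper, and the sketch assumes the convention that `tau = 0` is pure noise and `tau = 1` is the clean chunk, which may be the reverse of what a given codebase uses.

```python
import torch


def guided_velocity(velocity_fn, a_tau, tau, target, weights, beta=5.0):
    """One guided velocity evaluation (illustrative sketch, hypothetical names).

    velocity_fn: callable (a_tau, tau) -> velocity, same shape as a_tau
    a_tau:       (H, D) partially denoised action chunk
    tau:         flow time in [0, 1)
    target:      (H, D) leftover actions from the previous chunk
    weights:     (H,) soft mask, 1 = must match, 0 = unconstrained
    """

    def denoise(a):
        v = velocity_fn(a, tau)
        # One-step estimate of the clean chunk; the velocity is returned as aux.
        return a + (1.0 - tau) * v, v

    # A single forward pass yields the estimate, the velocity and a pullback.
    a1_hat, vjp_fn, v = torch.func.vjp(denoise, a_tau, has_aux=True)

    # Masked error between the committed actions and the current estimate.
    error = (target - a1_hat) * weights[:, None]

    # Vector-Jacobian product: pull the error back onto a_tau.
    (correction,) = vjp_fn(error)

    # Guidance strength, clipped at beta (assumed schedule).
    r2 = (1.0 - tau) ** 2 / (tau ** 2 + (1.0 - tau) ** 2)
    scale = min(beta, (1.0 - tau) / (tau * r2)) if tau > 0 else beta
    return v + scale * correction
```

Rows of the chunk with zero mask weight receive the unmodified velocity, so only the overlapping portion is steered toward the previous chunk.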

Files Added/Changed

  • No files were removed. The RTC implementation for SmolVLA lives in a separate directory under policies/.

Architecture and Alternatives

RTC is specific to flow-matching and diffusion models. It sits on top of the model and guides it at inference time. In this PR, RTCSmolVLA inherits from SmolVLAPolicy and adds RTC support by overriding the select_action method.

The flow model, RTCVLAFlowMatching, inherits from the VLAFlowMatching model that SmolVLAPolicy uses and only extends its methods to support the queue-management logic that RTC requires.
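The inheritance pattern described above can be sketched with toy stand-ins. The classes below are hypothetical and are not the real SmolVLAPolicy or RTCSmolVLA; `generate_chunk`, `exec_horizon` and the hard copy of the leftover prefix are assumptions made to keep the example short (the actual method guides the new chunk softly instead of copying).

```python
from collections import deque


class ToyChunkPolicy:
    """Toy stand-in for a chunking policy (NOT the real SmolVLAPolicy)."""

    chunk_size = 4

    def __init__(self):
        self._queue = deque()
        self._calls = 0

    def generate_chunk(self, observation, prefix=()):
        # A real policy would run flow-matching denoising here.
        self._calls += 1
        fresh = [self._calls * 10 + i for i in range(self.chunk_size)]
        # Keep the committed prefix and fill the rest with fresh actions.
        return list(prefix) + fresh[len(prefix):]

    def select_action(self, observation):
        # Baseline behaviour: only replan once the queue is empty.
        if not self._queue:
            self._queue.extend(self.generate_chunk(observation))
        return self._queue.popleft()


class ToyRTCPolicy(ToyChunkPolicy):
    """Subclass that overrides select_action to replan before the queue drains."""

    exec_horizon = 2

    def select_action(self, observation):
        # Replan after exec_horizon actions of the current chunk were consumed,
        # handing the leftover actions to the generator as a prefix to match.
        if len(self._queue) <= self.chunk_size - self.exec_horizon:
            leftover = tuple(self._queue)
            self._queue = deque(self.generate_chunk(observation, prefix=leftover))
        return self._queue.popleft()
```

Calling `select_action` six times on a `ToyRTCPolicy` yields `10, 11, 12, 13, 22, 23`: the leftover actions `12, 13` of the first chunk survive the first replan, so the chunk boundary does not introduce a jump.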

In this architecture, the RTC implementation is coupled to SmolVLAPolicy and would require some refactoring to enable RTC on other models such as Pi0 and Pi05. An alternative approach is presented in the MR here. It uses dependency injection and composition to operate on the flow and inference models, as long as they expose the same calls.
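For comparison, a composition-based design might look like the sketch below. It is only an illustration of the general idea, not the code from the alternative MR: `ChunkModel`, `RTCWrapper`, `ConstantModel` and `EchoModel` are hypothetical names, and the wrapper assumes every injected model exposes the same `generate_chunk` call.

```python
from collections import deque
from typing import List, Protocol, Sequence


class ChunkModel(Protocol):
    """Interface the wrapper relies on (hypothetical)."""

    def generate_chunk(self, observation: float, prefix: Sequence[float] = ()) -> List[float]:
        ...


class RTCWrapper:
    """Queue management lives here; the model is injected, not inherited from."""

    def __init__(self, model: ChunkModel, chunk_size: int, exec_horizon: int):
        self.model = model
        self.chunk_size = chunk_size
        self.exec_horizon = exec_horizon
        self._queue = deque()

    def select_action(self, observation: float) -> float:
        # Replan once exec_horizon actions of the current chunk were consumed.
        if len(self._queue) <= self.chunk_size - self.exec_horizon:
            leftover = tuple(self._queue)
            self._queue = deque(self.model.generate_chunk(observation, prefix=leftover))
        return self._queue.popleft()


class ConstantModel:
    """Toy model that always proposes the same action value."""

    def __init__(self, value: float, chunk_size: int):
        self.value = value
        self.chunk_size = chunk_size

    def generate_chunk(self, observation, prefix=()):
        fresh = [self.value] * self.chunk_size
        return list(prefix) + fresh[len(prefix):]


class EchoModel:
    """Toy model that proposes the current observation as every action."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size

    def generate_chunk(self, observation, prefix=()):
        fresh = [observation] * self.chunk_size
        return list(prefix) + fresh[len(prefix):]
```

Both toy models work with the same wrapper unchanged, which is the property that would let one RTC implementation serve several policies. The cost is an extra interface that every model has to satisfy.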

I would love to have this merged here to enable RTC for SmolVLA, since the changes are minimal. The alternative architecture involves more code and requires a broader discussion.

References

@Basiljamal1 Basiljamal1 marked this pull request as draft October 21, 2025 15:53
@Basiljamal1 Basiljamal1 marked this pull request as ready for review October 21, 2025 16:25