feat: Added realtime chunking to smolvla #2281
Open
+731
−1
Add Real-Time Action Chunking (RTC) for SmolVLA Policies
Overview
This PR introduces a new Real-Time Action Chunking (RTC) module for SmolVLA policies, as described in the paper "Real-Time Execution of Action Chunking Flow Policies". RTC is a training-free, inference-time wrapper that enables flow-matching VLA policies to execute actions with minimal latency and seamless temporal consistency across chunk boundaries.
Motivation
Traditional action-chunking policies suffer from:
RTC addresses these issues by:
Key Features
Files Added/Changed
Architecture and Alternatives
RTC is specific to flow-matching diffusion models. It sits on top of the model and guides it at inference time. In the PR here, RTCSmolVLA inherits from SmolVLAPolicy and extends it to support RTC by overriding the select_action method. The flow model RTCVLAFlowMatching inherits from the VLAFlowMatching model that SmolVLAPolicy uses, and only extends its methods to support the queue management logic required by RTC.
In this architecture, the RTC implementation is coupled to SmolVLAPolicy, and enabling RTC on other models such as Pi0 and Pi05 would require some refactoring. An alternative approach is presented in the MR here. It uses dependency injection and composition to operate on the flow and inference models, as long as they expose the same calls.
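To make the trade-off concrete, here is a minimal, torch-free sketch of the two designs. It is not the code in this PR. The base classes are toy stand-ins that only borrow the names SmolVLAPolicy and VLAFlowMatching. The sample_actions method, the overlap parameter, the RTCWrapper class, and the "copy the queued actions into the new chunk" step are all assumptions made for illustration. Real RTC guides the flow-matching denoising process rather than overwriting actions.

```python
from collections import deque


class VLAFlowMatching:
    """Toy stand-in for the flow model: returns a dummy action chunk."""

    def sample_actions(self, observation, chunk_size=4):
        return [observation + i for i in range(chunk_size)]


class SmolVLAPolicy:
    """Toy stand-in for the policy: drains one chunk, then samples another."""

    def __init__(self):
        self.model = VLAFlowMatching()
        self.queue = deque()

    def select_action(self, observation):
        if not self.queue:
            self.queue.extend(self.model.sample_actions(observation))
        return self.queue.popleft()


# Design 1 (inheritance, the shape described in this PR).
class RTCVLAFlowMatching(VLAFlowMatching):
    def sample_actions(self, observation, chunk_size=4, prefix=()):
        chunk = super().sample_actions(observation, chunk_size)
        # Hypothetical placeholder for RTC guidance: keep the actions that
        # are already committed so the new chunk stays consistent with them.
        chunk[: len(prefix)] = prefix
        return chunk


class RTCSmolVLA(SmolVLAPolicy):
    def __init__(self, overlap=2):
        super().__init__()
        self.model = RTCVLAFlowMatching()
        self.overlap = overlap  # hypothetical: queued actions kept at resample

    def select_action(self, observation):
        # Resample before the queue runs dry, carrying the leftovers over.
        if len(self.queue) <= self.overlap:
            prefix = list(self.queue)
            self.queue = deque(self.model.sample_actions(observation, prefix=prefix))
        return self.queue.popleft()


# Design 2 (composition): wraps any object exposing sample_actions.
class RTCWrapper:
    def __init__(self, flow_model, overlap=2):
        self.flow_model = flow_model  # injected dependency
        self.overlap = overlap
        self.queue = deque()

    def select_action(self, observation):
        if len(self.queue) <= self.overlap:
            prefix = list(self.queue)
            chunk = self.flow_model.sample_actions(observation)
            chunk[: len(prefix)] = prefix
            self.queue = deque(chunk)
        return self.queue.popleft()


inherited = RTCSmolVLA(overlap=2)
composed = RTCWrapper(VLAFlowMatching(), overlap=2)
```

In this sketch the inheritance version needs one RTC subclass per policy and per flow model, while the wrapper works with any model that exposes the same call. That matches the coupling concern raised above for Pi0 and Pi05.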
I would love to have this merged here to enable RTC for SmolVLA since the changes are minimal. The alternative architecture is more code and requires a broader discussion.
References