Robotics research demonstrating reliability and robustness in the real world (continuously updated)

philfung/awesome-reliable-robotics


Awesome Reliable Robotics 🤖

License: MIT

A curated collection of robotics papers focused on real-world reliability and robustness. Originally a personal reference, this list is shared in the hope it helps others.

Prerequisite: every entry must include real-world results.

Contributions are welcome!


| Name | Date | Categories | Real-World Success Rate | Code | Paper | Project | Organization(s) | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models | 02/2026 | Representation Learning | Average zero-shot success on Picking, Opening, and Closing tasks across 4 robot arms (see CAP results figure). | Code | Paper | Project | NYU, UC Berkeley, UCLA, Hello Robot, Ai2, University of Waterloo | Replaces language conditioning with physical contact points. Uses VQ-BeT architecture with contact anchors. Trained on handheld gripper data; generalizes zero-shot to multiple robot embodiments. |
| TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation | 02/2026 | Online RL, VLA, Digital Twin | 100% success across four tasks (Pick-and-Place, Insert-Hexagon-Block, Insert-Triple-Column-Block, Erase-Whiteboard). Converges in ~20 minutes with a 30% speedup over prior methods (ConRFT: 77.2%, HiL-SERL: 71.25%). | Code (unofficial, by Yurong Jiang) | Paper | Project | Peking University, Simplexity Robotics, Tsinghua University, HKUST | Digital twin–real-world collaborative RL framework that expands the exploration space via synthetic trajectories and uses sim-to-real guided exploration to accelerate online RL for VLA models. |
| LingBot-VA: Causal World Modeling for Robot Control | 01/2026 | World Models | Real-world Success Rate (SR) / Progress Score (PS): Make Breakfast 75% SR / 97% PS, Pick Screws 70% SR / 82.5% PS, Fold Clothes 35% SR / 48.8% PS, Unpack Delivery 65% SR / 84.5% PS, Insert Tubes 40% SR / 85.8% PS, Fold Pants 70% SR / 76.7% PS. Achieves >20% improvement over π0.5 on challenging tasks with only 50 demos. | Code | Paper | Project | Ant Group/Alibaba | Autoregressive diffusion framework (5.3B params) that predicts video and actions jointly via a Mixture-of-Transformers architecture. Plans ahead efficiently, learns quickly from data, and handles new situations well; strong on complex long-horizon tasks. |
| Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning | 01/2026 | World Models | 93.6% average success on challenging real-world ALOHA bimanual manipulation tasks; model-based planning adds 12.5% to the task completion rate on challenging real-world tasks. | Code | Paper | Project | NVIDIA, Stanford University | Single-stage fine-tuning of a pretrained video model (Cosmos-Predict2-2B) that generates robot actions, future states, and values as latent frames. Uses model-based planning with best-of-N sampling to achieve higher success rates; can learn from policy rollout data to refine the world model and value function. |
| Does learning from experience benefit small AI robotics models? | 12/2025 | Imitation Learning | 4/5 when training a simple ACT on imitation + corrections only. | | Article | | | Replicates the RL loop behind Physical Intelligence's π*0.6 foundation model without VLAs or diffusion. |
| π*0.6: a VLA That Learns From Experience | 11/2025 | VLA | Ran 13 hours straight making espresso drinks and over two hours folding novel laundry items without interruption. Success rates: Laundry (t-shirts & shorts) ~95%, Laundry (diverse hardest items) ~70%, Make Espresso ~90%, Box Assembly ~90%. | | Paper | Project | Physical Intelligence | RECAP is an iterated offline RL framework that improves a Vision-Language-Action (VLA) model (π*0.6) by conditioning it on advantage estimates derived from a value function, allowing the model to learn and self-correct from real-world data such as demonstrations, autonomous experience, and human interventions. |
| RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning | 10/2025 | Online RL | 100% success across 7 tasks; 92.5% average zero-shot success on 3 tasks (without any retraining or fine-tuning); 86.7% average few-shot success on 3 tasks. | Code (unofficial, by Yanjie Ze) | Paper | Project | Shanghai Qizi, Shanghai Jiao Tong, HKU, UNC Chapel Hill | Official code to be released "after paper is accepted". |
| APO: Human-assisted Robotic Policy Refinement via Action Preference Optimization | 10/2025 | Human-in-the-loop | Improves success rates over DAgger, TPO, etc. in-distribution, as well as under position, background, or texture disruptions. | Code | Paper | Project | ByteDance | |
| HI-ORS: Human-in-the-loop Online Rejection Sampling for Robotic Manipulation | 10/2025 | Human-in-the-loop | Improved real-world success rates vs. vanilla BC, HIL-SERL, and Q-Chunking. | Code | Paper | Project | Tencent | |
| ARMADA/FLOAT: Autonomous Online Failure Detection and Human Shared Control Empower Scalable Real-world Deployment and Adaptation | 10/2025 | Failure Detection | FLOAT achieves nearly 95% accuracy on average, surpassing prior SOTA failure-detection approaches by >20%. | Code | Paper | Project | Shanghai Jiao Tong University | |
| SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation | 09/2025 | Rewards | 83% success on folding T-shirts (flattened), 67% on folding T-shirts (crumpled); surpasses vanilla BC (8% and 0%). | Code (lerobot) | Paper | Project | Stanford, UC Berkeley, xdof.ai | Video-based reward modeling framework that jointly predicts high-level task stages and fine-grained progress. |
| Dual-Actor Fine-Tuning of VLA Models: A Talk-and-Tweak Human-in-the-Loop Approach | 09/2025 | VLA | 100% success across three tasks within 101 minutes of online fine-tuning; for long-horizon tasks, sustains a 50% success rate over 12 consecutive operations. | | Paper | Project | Zhejiang & others | No code released. |
| WSRL: Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data | 07/2025 | Online RL | 100% success on the Franka peg insertion task in 18 minutes; SERL fails (0/20) even after 50 minutes. | Code | Paper | Project | UC Berkeley | Overall idea: no data retention during fine-tuning; warmup phase with small rollouts from the pre-trained policy. Unfortunately, only one real-world experiment; all others in sim. |
| Dyna Robotics (unknown model) | 07/2025 | | 99.9% success folding towels for 8 hours/day over 3 days (dropped 1 towel on day 2). No intervention. | | | Project | Dyna Robotics | |
| Figure (Helix) | 06/2025 | | ~95% accuracy at correctly orienting barcodes; 4.05 seconds per package. | | | Project | Figure | Adds memory for more robust long-term tasks and force feedback for improved grip. |
| RSS 2025 Workshop: Human-in-the-Loop Robot Learning: Teaching, Correcting, and Adapting | 06/2025 | | Various results. | | | Project | Various universities | |
| Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections | 06/2025 | Human-in-the-loop | Book-flipping success rate of 100% (60% improvement); belt assembly success of 70% (50% improvement). | Code | Paper | Project | Stanford | |
| ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations | 05/2025 | Rewards | An hour of real-world RL improves success rate from 12% to 68%, vs. 8% to 10% with VLC. | Code | Paper | Project | U Wash | |
| Dyna Robotics DYNA-1 Model | 04/2025 | | 99.4% success folding napkins over 24 hours. No intervention. | | | Project | Dyna Robotics | |
| ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | 02/2025 | VLA | 96.3% average success rate across tasks, vs. 31.9% with HIL-SERL. | Code | Paper | | Chinese Academy of Sciences | Online and offline fine-tuning. |
| HIL-SERL: Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning | 10/2024 | Online RL | 100% success rate on a variety of tasks. | Code (official; sim, lerobot) | Paper | Project | UC Berkeley | Online fine-tuning; human intervention allowed. Implementation available in LeRobot. |
| RLIF: Interactive Imitation Learning as Reinforcement Learning | 03/2024 | Imitation Learning | 95% success rate in cloth unfolding within 7 rounds; 100% success rate in peg insertion within 6 rounds. | Code | Paper | Project | UC Berkeley | |
| SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning | 01/2024 | Online RL | 100% success on PCB insertion, cable routing, object relocation. | Code | Paper | Project | UC Berkeley | |
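A recurring ingredient in several entries above (Cosmos Policy, π*0.6/RECAP) is value-guided best-of-N action selection: sample several candidate action chunks from the policy, score each with a learned value function, and execute the highest-scoring one. A minimal sketch of that selection loop, where `propose_actions` and `value_estimate` are hypothetical stand-ins for a real policy head and critic:

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_actions(observation, n=8, horizon=4, action_dim=2):
    """Stub policy: sample n candidate action chunks.
    (Stand-in for a diffusion/VLA policy head.)"""
    return rng.normal(size=(n, horizon, action_dim))

def value_estimate(observation, action_chunk):
    """Stub critic: here it simply prefers small-magnitude actions.
    (A real system would use a learned value function.)"""
    return -np.sum(action_chunk ** 2)

def best_of_n(observation, n=8):
    """Value-guided best-of-N: score every candidate chunk with the
    critic and return the highest-value one for execution."""
    candidates = propose_actions(observation, n=n)
    scores = [value_estimate(observation, c) for c in candidates]
    return candidates[int(np.argmax(scores))]

obs = np.zeros(3)  # placeholder observation
chunk = best_of_n(obs, n=8)
print(chunk.shape)  # (4, 2)
```

The same skeleton also fits the rejection-sampling flavor of HI-ORS if the scoring step is replaced by an accept/reject test against a threshold rather than an argmax.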