Commit 4ccf284

Add act documentation (huggingface#2139)
* Add act documentation
* remove citation as we link the paper
* simplify docs
* fix pre commit
1 parent 9a49e57 commit 4ccf284

File tree

2 files changed (+94, -0 lines)

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -27,6 +27,8 @@
       title: Porting Large Datasets
   title: "Datasets"
 - sections:
+  - local: act
+    title: ACT
   - local: smolvla
     title: SmolVLA
   - local: pi0
```

docs/source/act.mdx

Lines changed: 92 additions & 0 deletions (new file)
# ACT (Action Chunking with Transformers)

ACT is a **lightweight and efficient policy for imitation learning**, especially well-suited for fine-grained manipulation tasks. It's the **first model we recommend when you're starting out** with LeRobot due to its fast training time, low computational requirements, and strong performance.

<div class="video-container">
  <iframe
    width="100%"
    height="415"
    src="https://www.youtube.com/embed/ft73x0LfGpM"
    title="LeRobot ACT Tutorial"
    frameborder="0"
    allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
    allowfullscreen
  ></iframe>
</div>

_Watch this tutorial from the LeRobot team to learn how ACT works: [LeRobot ACT Tutorial](https://www.youtube.com/watch?v=ft73x0LfGpM)_
## Model Overview

Action Chunking with Transformers (ACT) was introduced in the paper [Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware](https://arxiv.org/abs/2304.13705) by Zhao et al. The policy was designed to enable precise, contact-rich manipulation tasks using affordable hardware and minimal demonstration data.

### Why ACT is Great for Beginners

ACT stands out as an excellent starting point for several reasons:

- **Fast Training**: Trains in a few hours on a single GPU
- **Lightweight**: Only ~80M parameters, making it efficient and easy to work with
- **Data Efficient**: Often achieves high success rates with just 50 demonstrations

### Architecture

ACT uses a transformer-based architecture with three main components:

1. **Vision Backbone**: A ResNet-18 processes images from multiple camera viewpoints
2. **Transformer Encoder**: Synthesizes information from camera features, joint positions, and a learned latent variable
3. **Transformer Decoder**: Generates coherent action sequences using cross-attention

The policy takes as input:

- Multiple RGB images (e.g., from wrist cameras and front/top cameras)
- Current robot joint positions
- A latent style variable `z` (learned during training, set to zero during inference)

It outputs a chunk of the next `k` actions rather than a single action per step.
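At inference time, the paper additionally proposes *temporal ensembling*: the policy is queried at every timestep, and the overlapping chunk predictions for the current step are averaged with exponentially decaying weights. Below is a minimal NumPy sketch of that averaging step, based on our reading of the paper rather than LeRobot's actual implementation; the function name and the default decay `m` are illustrative.

```python
import numpy as np

def temporal_ensemble(predictions, m=0.01):
    """Average overlapping chunk predictions for the same timestep.

    predictions: list of action vectors predicted for the *current* timestep,
        ordered oldest first (each comes from a different, earlier chunk).
    m: decay rate; weights w_i = exp(-m * i), so older predictions
        receive slightly more weight, as described in the ACT paper.
    """
    preds = np.stack(predictions)                 # shape (n, action_dim)
    weights = np.exp(-m * np.arange(len(preds)))  # exponential decay
    weights /= weights.sum()                      # normalize to sum to 1
    return weights @ preds                        # shape (action_dim,)

# Toy example: three chunks each predicted an action for this timestep
smoothed = temporal_ensemble([np.array([1.0, 0.0]),
                              np.array([0.9, 0.1]),
                              np.array([1.1, -0.1])])
```

With `m=0` this reduces to a plain mean; larger `m` trusts older predictions more, which the paper reports smooths the executed motion.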
## Installation Requirements

1. Install LeRobot by following our [Installation Guide](./installation).
2. ACT is included in the base LeRobot installation, so no additional dependencies are needed!

## Training ACT

ACT works seamlessly with the standard LeRobot training pipeline. Here's a complete example of training ACT on your dataset:

```bash
lerobot-train \
  --dataset.repo_id=${HF_USER}/your_dataset \
  --policy.type=act \
  --output_dir=outputs/train/act_your_dataset \
  --job_name=act_your_dataset \
  --policy.device=cuda \
  --wandb.enable=true \
  --policy.repo_id=${HF_USER}/act_policy
```
### Training Tips

1. **Start with defaults**: ACT's default hyperparameters work well for most tasks
2. **Training duration**: Expect a few hours for 100k training steps on a single GPU
3. **Batch size**: Start with batch size 8 and adjust based on your GPU memory
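For context on what training optimizes: the paper trains ACT as a conditional VAE, combining an L1 reconstruction loss over the predicted action chunk with a KL term pulling the latent `z` toward a standard normal. The schematic NumPy sketch below is illustrative only; the function name is made up, the real objective lives in LeRobot's PyTorch code, and `kl_weight=10.0` follows the paper's reported β.

```python
import numpy as np

def act_style_loss(pred_actions, target_actions, mu, logvar, kl_weight=10.0):
    """Schematic CVAE-style objective in the spirit of the ACT paper.

    pred_actions, target_actions: action chunks, shape (k, action_dim).
    mu, logvar: parameters of the encoder's Gaussian posterior over z.
    """
    # L1 reconstruction loss over the action chunk
    l1 = np.abs(pred_actions - target_actions).mean()
    # KL divergence of N(mu, exp(logvar)) from N(0, I), averaged over dims
    kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))
    return l1 + kl_weight * kl
```

At inference no sampling is needed, which is why `z` is simply set to zero as noted above.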
### Train using Google Colab

If your local computer doesn't have a powerful GPU, you can use Google Colab to train your model by following the [ACT training notebook](./notebooks#training-act).

## Evaluating ACT

Once training is complete, you can evaluate your ACT policy using the `lerobot-record` command with your trained policy. This will run inference and record evaluation episodes:

```bash
lerobot-record \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.id=my_robot \
  --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
  --display_data=true \
  --dataset.repo_id=${HF_USER}/eval_act_your_dataset \
  --dataset.num_episodes=10 \
  --dataset.single_task="Your task description" \
  --policy.path=${HF_USER}/act_policy
```
