<html lang="en">
<head>
<script src="https://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript">google.load("jquery", "1.3.2");</script>
<link href="https://fonts.googleapis.com/css2?family=Open+Sans&display=swap"
rel="stylesheet">
<link rel="stylesheet" type="text/css" href="./resources/style.css" media="screen"/>
<title>ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback</title>
<!-- Facebook automatically scrapes this. Go to https://developers.facebook.com/tools/debug/
if you update and want to force Facebook to re-scrape. -->
<meta property="og:image" content="./resources/teaser.png"/>
<meta property="og:title" content="ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback" />
<meta property="og:description" content="ARCap is a portable data collection system that uses augmented reality visual feedback and haptic warnings to guide users in collecting high-quality, robot-executable demonstrations." />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Add your Google Analytics tag here -->
<!--
<script async
src="https://www.googletagmanager.com/gtag/js?id=UA-97476543-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'UA-97476543-1');
</script>
-->
</head>
<body>
<div class="container">
<div class="title">
ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback
</div>
<br>
<br>
<div class="author">
<a href="https://ericcsr.github.io/">Sirui Chen*</a><sup>1</sup>
</div>
<div class="author">
<a href="https://www.chenwangjeremy.net/">Chen Wang*</a><sup>1</sup>
</div>
<div class="author">
<a href="">Kaden Nguyen</a><sup>1</sup>
</div>
<div class="author">
<a href="https://scholar.google.com/citations?user=rDfyQnIAAAAJ&hl=en">Li Fei-Fei</a><sup>1</sup>
</div>
<div class="author">
<a href="https://tml.stanford.edu/people/karen-liu">C. Karen Liu</a><sup>1</sup>
</div>
<br>
<br>
<div class="affiliation"><sup>1 </sup>Stanford University</div>
<p>
<i>
* Equal contribution
</i>
</p>
<br>
<br>
<div class="links"><a href="https://arxiv.org/abs/2410.08464">[Paper]</a></div>
<div class="links"><a href="https://github.com/Ericcsr/ARCap.git">[Code]</a></div>
<div class="links"><a href="https://huggingface.co/Ericcsr/ARCap_DP">[Model]</a></div>
<div class="links"><a href="https://huggingface.co/datasets/Ericcsr/ARCap">[Data]</a></div>
<br>
<br>
<div class="video-container">
<iframe src="./resources/teaser.mp4" frameBorder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen></iframe>
</div>
<br><br>
<hr>
<h1>Abstract</h1>
<p>
Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for physical robot hardware. However, due to the absence of on-robot feedback during data collection, the data quality depends heavily on user expertise, and many devices are limited to specific robot embodiments. We propose ARCap, a portable data collection system that provides visual feedback through augmented reality (AR) and haptic warnings to guide users in collecting high-quality demonstrations. Through extensive user studies, we show that ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scenes. With data collected from ARCap, robots can perform challenging tasks, such as manipulation in cluttered environments and long-horizon cross-embodiment manipulation. ARCap is fully open-source and easy to calibrate; all components are built from off-the-shelf products.
</p>
<br><br><hr>
<h1>AR Feedback</h1>
<div class="teaser">
<img src="./resources/feedback.gif" alt="Feedback"/>
<br>
<i>
ARCap sends visual and haptic feedback when the virtual robot violates constraints.
</i>
</div>
<br><br><hr>
<h1>Portable System Design</h1>
<div class="teaser">
<img src="./resources/system.png" alt="System"/>
<br>
<i>
ARCap is portable and can be carried in a single backpack.
</i>
</div>
<br><br><hr>
<h1>Test-time Calibration</h1>
<div class="teaser">
<img src="./resources/calibration.gif" alt="calibration"/>
<br>
<i>
With the ARCap Unity app, test-time hand-eye calibration reduces to aligning the virtual robot with the actual robot.
</i>
</div>
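<div class="paper-info">
<p>
The alignment above amounts to reading off one rigid transform: once the virtual robot base is overlaid on the real base, the headset's world-frame pose of the virtual base can be taken as the robot-base pose, and every tracked hand pose can then be re-expressed in the robot frame. A minimal sketch in plain Python with 4x4 homogeneous matrices (the frame names and poses are hypothetical illustrations, not ARCap's actual interface):
</p>
<pre><code>def mat_mul(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def inv_rigid(T):
    """Invert a rigid transform [R | t]: the inverse is [R^T | -R^T t]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]  # transposed rotation
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[0] + [t_inv[0]],
            R_t[1] + [t_inv[1]],
            R_t[2] + [t_inv[2]],
            [0.0, 0.0, 0.0, 1.0]]

def translation(x, y, z):
    """Pure-translation homogeneous transform."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

# Hypothetical calibration result: virtual base aligned 1 m ahead in world frame.
T_world_robot = translation(1.0, 0.0, 0.0)
# Hypothetical tracked wrist pose reported in the world frame.
T_world_wrist = translation(1.5, 0.2, 0.3)
# Re-express the wrist in the robot base frame for recording/retargeting.
T_robot_wrist = mat_mul(inv_rigid(T_world_robot), T_world_wrist)
print([row[3] for row in T_robot_wrist[:3]])  # [0.5, 0.2, 0.3]
</code></pre>
</div>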
<br><br><hr>
<h1>Diffusion Policies Trained on Collected Data</h1>
<h2>Manipulation in a cluttered scene</h2>
<div class="teaser">
<img src="./resources/tennis.gif" alt="tennis"/>
<br>
<i>
Trained on 30 minutes of data collected with ARCap, with no teleoperation data.
</i>
</div>
<h2>Long-horizon manipulation with a different embodiment</h2>
<div class="teaser">
<img src="./resources/lego_new.gif" alt="lego"/>
<br>
<i>
Trained on 60 minutes of data collected with ARCap, with no teleoperation data.
</i>
</div>
<h2>Cross-embodiment bimanual manipulation</h2>
<div class="teaser">
<img src="./resources/bimanual.gif" alt="bimanual"/>
<br>
<i>
Trained on 60 minutes of data collected with ARCap, with no teleoperation data.
</i>
</div>
<br><br><hr>
<h1>User Study</h1>
<div class="teaser">
<img src="./resources/demographic.png" alt="UserStudy"/>
<br>
<i>
We invited 20 users with varying levels of familiarity with robot learning and AR/VR to collect data with both ARCap and DexCap.
</i>
</div>
<div class="teaser">
<img src="./resources/user_survey_result.png" alt="UserStudy2"/>
<br>
<i>
Most users found ARCap's visual and haptic feedback helpful.
</i>
</div>
<div class="teaser">
<img src="./resources/user_policy.gif" alt="UserStudy3"/>
<br>
<i>
Combining data from all 20 users, we can train an autonomous policy.
</i>
</div>
<br><br><hr>
<h1>Citation</h1>
<div class="paper-info">
<pre><code>@article{chen2024arcap,
title={ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback},
author={Chen, Sirui and Wang, Chen and Nguyen, Kaden and Fei-Fei, Li and Liu, C Karen},
journal={arXiv preprint arXiv:2410.08464},
year={2024}
}</code></pre>
</div>
<br><br><hr><br>
<h1>Acknowledgements</h1>
<p>
This template was originally made by <a href="http://web.mit.edu/phillipi/">Phillip Isola</a>
and <a href="http://richzhang.github.io/">Richard Zhang</a> for a
<a href="http://richzhang.github.io/colorization/">colorful</a> ECCV project. It was
adapted to be mobile responsive by <a href="https://jasonyzhang.com/">Jason Zhang</a>
for <a href="https://jasonyzhang.com/phosa/">PHOSA</a>. The code can be found
<a href="https://github.com/jasonyzhang/webpage-template">here</a>.
</p>
<br><br>
</div>
</body>
</html>