
Commit d21d61e

Merge pull request #12 from TheotimeLH/main

Add GFP, quick page

2 parents 227ac17 + f106efd

19 files changed: +3304 −0 lines

index.html

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
 <li><a href="publications/consensus-to-pl"> Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control.</a></li>
 <li><a href="publications/raisim-revisited"> Reconciling RaiSim with the Maximum Dissipation Principle. (IEEE T-RO 2024)</a></li>
 <li><a href="publications/proxddp-tro-2025"> ProxDDP: Proximal Constrained Trajectory Optimization. (IEEE T-RO 2025)</a></li>
+<li><a href="publications/guided-flow-policy/"> Guided Flow Policy: Learning from High-Value Actions in Offline RL.</a></li>
 </ul>
 </body>
 </html>
Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="description" content="Guided flow policy">
  <meta name="keywords"
    content="Reinforcement learning, Offline RL, Flow Matching, Diffusion, Behavior cloning, Behavior regularized actor critic, weighted BC">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Guided Flow Policy</title>

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">
  <link rel="icon" href="./static/images/favicon.svg">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>

<body>

  <nav class="navbar" role="navigation" aria-label="main navigation">
    <div class="navbar-brand">
      <a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
        <span aria-hidden="true"></span>
        <span aria-hidden="true"></span>
        <span aria-hidden="true"></span>
      </a>
    </div>
    <div class="navbar-menu">
      <div class="navbar-start" style="flex-grow: 1; justify-content: center;">
        <a class="navbar-item" href="https://simple-robotics.github.io">
          <span class="icon">
            <i class="fas fa-home"></i>
          </span>
        </a>

        <div class="navbar-item has-dropdown is-hoverable">
          <a class="navbar-link">
            More Research
          </a>
          <div class="navbar-dropdown">
            <a class="navbar-item" href="https://simple-robotics.github.io/publications/simple-contact-solver/">
              Simple simulator
            </a>
            <a class="navbar-item" href="https://inria.hal.science/hal-05179357v1">
              Sobolev diffusion policy
            </a>
          </div>
        </div>
      </div>
    </div>
  </nav>

  <section class="hero">
    <div class="hero-body">
      <div class="container is-max-desktop">
        <div class="columns is-centered">
          <div class="column has-text-centered">
            <h1 class="title is-1 publication-title">Guided Flow Policy: Learning from High-Value Actions in Offline RL</h1>
            <div class="is-size-5 publication-authors">
              <span class="author-block">
                <a href="https://www.linkedin.com/in/frankinguimatsia/">Franki Nguimatsia Tiofack</a><sup>1,*</sup>,</span>
              <span class="author-block">
                <a href="https://www.linkedin.com/in/theotime-le-hellard-a3a066249/">Théotime Le Hellard</a><sup>1,*</sup>,</span>
              <span class="author-block">
                <a href="https://fabinsch.github.io/">Fabian Schramm</a><sup>1,*</sup>,</span>
            </div>

            <div class="is-size-5 publication-authors">
              <span class="author-block">
                <a href="https://www.isir.upmc.fr/personnel/perrin/?lang=en">Nicolas Perrin-Gilbert</a><sup>2</sup>,</span>
              <span class="author-block">
                <a href="https://jcarpent.github.io">Justin Carpentier</a><sup>1</sup></span>
            </div>

            <div class="is-size-5 publication-authors">
              <span class="author-block"><sup>*</sup>Equal contribution.&emsp;&emsp;<sup>1</sup>Willow, Inria - ENS.&emsp;&emsp;<sup>2</sup>ISIR.</span>
            </div>

            <div class="column has-text-centered">
              <div class="publication-links">
                <!-- PDF Link. -->
                <span class="link-block">
                  <a href="./static/paper/guided-flow-policy-2025.pdf"
                    class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="link-block">
                  <a href="https://github.com/Simple-Robotics/guided-flow-policy"
                    class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fab fa-github"></i>
                    </span>
                    <span>Code</span>
                  </a>
                </span>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </section>

  <section class="section">
    <div class="container is-max-desktop">
      <!-- Abstract. -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-four-fifths">
          <h2 class="title is-3">Abstract</h2>
          <div class="content has-text-justified">
            Offline reinforcement learning often relies on behavior regularization that enforces policies to remain
            close to the dataset distribution.
            However, such approaches fail to distinguish between high-value and low-value actions in their
            regularization components.
            We introduce Guided Flow Policy (GFP), which couples a multi-step flow-matching policy with a distilled
            one-step actor.
            The actor directs the flow policy through weighted behavior cloning to focus on cloning high-value actions
            from the dataset rather than indiscriminately imitating all state-action pairs.
            In turn, the flow policy constrains the actor to remain aligned with the dataset's best transitions while
            maximizing the critic.
            This mutual guidance enables GFP to achieve state-of-the-art performance across 144 state- and pixel-based
            tasks from the OGBench, Minari, and D4RL benchmarks, with substantial gains on suboptimal datasets and
            challenging tasks.
          </div>
        </div>
      </div>
      <!--/ Abstract. -->

      <!-- Offline RL. -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-max-desktop">
          <h2 class="title is-3">Offline Reinforcement Learning</h2>
          <div class="publication-image">
            <img src="./static/images/figure-gfp-related-works.png" alt="List of offline RL approaches"
              style="width:60%">
          </div>
        </div>
      </div>
      <!--/ Offline RL. -->

      <!-- Overview. -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-max-desktop">
          <h2 class="title is-3">Overview of GFP</h2>
          <div class="publication-image">
            <img src="./static/images/figure-gfp-overview.png" alt="GFP Overview" style="width:60%">
          </div>
          <div class="content has-text-justified">
            GFP consists of three main components: (i) in yellow, VaBC, a multi-step flow policy trained via weighted
            BC using the guidance term; (ii) in green, a one-step actor distilled from the flow policy; and (iii) in
            gray, a critic guiding action evaluation. VaBC regularizes the actor toward high-value actions from the
            dataset; in turn, the actor shapes the flow and optimizes the critic following the actor-critic approach.
            The components of the figure are introduced in the paper. Each drawing represents a policy's action
            distribution in the current state s, except for the gray ones, which show the critic's action values in
            state s.
          </div>
        </div>
      </div>
      <!--/ Overview. -->

      <!-- Results -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-max-desktop">
          <h2 class="title is-3">Experiments across 144 tasks</h2>
          <div class="publication-image">
            <img src="./static/images/plots-gfp-perf-profiles.png"
              alt="Performance profiles comparing GFP and prior works" style="width:60%">
          </div>
          <div class="content has-text-justified">
            (a) Performance profiles over 50 tasks comparing GFP against a wide range of prior works, showing the
            fraction of tasks on which each algorithm achieves a score above a threshold &tau;. (b) Performance
            profiles over 105 tasks, including more challenging ones, with carefully reevaluated prior methods.
            (c) Performance profiles restricted to 30 noisy and explore tasks.
          </div>
        </div>
      </div>
      <!--/ Results -->

    </div>
  </section>

</body>

</html>
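The value-weighted behavior cloning described in the abstract and overview (regress the flow policy toward high-value dataset actions rather than imitating all of them) can be sketched in a few lines of NumPy. This is a hypothetical illustration only: the function names, the exponentiated-advantage weighting, and the clipping constant are assumptions for the sketch, not GFP's actual guidance term, which is defined in the paper.

```python
# Hypothetical sketch of value-weighted behavior cloning for a flow policy.
# Assumption: weights are an exponentiated, clipped advantage (q - baseline),
# so high-value dataset actions dominate the flow-matching regression loss.
import numpy as np


def vabc_weights(q_values, value_baseline, beta=1.0, w_max=20.0):
    """Per-sample weights: exp(beta * advantage), clipped for stability."""
    advantage = q_values - value_baseline
    return np.clip(np.exp(beta * advantage), 0.0, w_max)


def weighted_flow_matching_loss(pred_velocity, target_velocity, weights):
    """Squared flow-matching error per sample, reweighted toward
    high-value actions and averaged over the batch."""
    per_sample = np.mean((pred_velocity - target_velocity) ** 2, axis=-1)
    return float(np.mean(weights * per_sample))


# Toy batch: 4 dataset actions with critic values, 2-D velocity targets.
rng = np.random.default_rng(0)
q = np.array([1.0, 0.2, -0.5, 2.0])
w = vabc_weights(q, value_baseline=q.mean())
pred = rng.normal(size=(4, 2))
target = rng.normal(size=(4, 2))
loss = weighted_flow_matching_loss(pred, target, w)
print(w.argmax(), loss >= 0.0)  # the highest-Q action receives the largest weight
```

Advantage-exponential weights of this general shape appear in several offline RL methods (e.g. advantage-weighted regression); the sketch only conveys the "clone high-value actions" intuition, not GFP's mutual guidance between the multi-step flow policy and the distilled one-step actor.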

publications/guided-flow-policy/static/css/bulma-carousel.min.css

Lines changed: 1 addition & 0 deletions