<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="utf-8">
  <meta name="description" content="Guided flow policy">
  <meta name="keywords"
    content="Reinforcement learning, Offline RL, Flow Matching, Diffusion, Behavior cloning, Behavior regularized actor critic, weighted BC">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Guided Flow Policy</title>

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">
  <link rel="icon" href="./static/images/favicon.svg">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>

<body>

  <nav class="navbar" role="navigation" aria-label="main navigation">
    <div class="navbar-brand">
      <a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
        <span aria-hidden="true"></span>
        <span aria-hidden="true"></span>
        <span aria-hidden="true"></span>
      </a>
    </div>
    <div class="navbar-menu">
      <div class="navbar-start" style="flex-grow: 1; justify-content: center;">
        <a class="navbar-item" href="https://simple-robotics.github.io">
          <span class="icon">
            <i class="fas fa-home"></i>
          </span>
        </a>

        <div class="navbar-item has-dropdown is-hoverable">
          <a class="navbar-link">
            More Research
          </a>
          <div class="navbar-dropdown">
            <a class="navbar-item" href="https://simple-robotics.github.io/publications/simple-contact-solver/">
              Simple simulator
            </a>
            <a class="navbar-item" href="https://inria.hal.science/hal-05179357v1">
              Sobolev diffusion policy
            </a>
          </div>
        </div>
      </div>
    </div>
  </nav>

  <section class="hero">
    <div class="hero-body">
      <div class="container is-max-desktop">
        <div class="columns is-centered">
          <div class="column has-text-centered">
            <h1 class="title is-1 publication-title">Guided Flow Policy: Learning from High-Value Actions in Offline RL
            </h1>
            <div class="is-size-5 publication-authors">
              <span class="author-block">
                <a href="https://www.linkedin.com/in/frankinguimatsia/">Franki Nguimatsia
                  Tiofack</a><sup>1,*</sup>,</span>
              <span class="author-block">
                <a href="https://www.linkedin.com/in/theotime-le-hellard-a3a066249/">Théotime Le
                  Hellard</a><sup>1,*</sup>,</span>
              <span class="author-block">
                <a href="https://fabinsch.github.io/">Fabian Schramm</a><sup>1,*</sup>,</span>
            </div>

            <div class="is-size-5 publication-authors">
              <span class="author-block">
                <a href="https://www.isir.upmc.fr/personnel/perrin/?lang=en">Nicolas
                  Perrin-Gilbert</a><sup>2</sup>,</span>
              <span class="author-block">
                <a href="https://jcarpent.github.io">Justin Carpentier</a><sup>1</sup></span>
            </div>

            <div class="is-size-5 publication-authors">
              <span class="author-block"><sup>*</sup>Equal contribution.  <sup>1</sup>Willow, Inria -
                ENS.
                <sup>2</sup>ISIR.</span>
            </div>

            <div class="column has-text-centered">
              <div class="publication-links">
                <!-- PDF Link. -->
                <span class="link-block">
                  <a href="./static/paper/guided-flow-policy-2025.pdf"
                    class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="link-block">
                  <a href="https://github.com/Simple-Robotics/guided-flow-policy"
                    class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fab fa-github"></i>
                    </span>
                    <span>Code</span>
                  </a>
                </span>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </section>

  <section class="section">
    <div class="container is-max-desktop">
      <!-- Abstract. -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-four-fifths">
          <h2 class="title is-3">Abstract</h2>
          <div class="content has-text-justified">
            Offline reinforcement learning often relies on behavior regularization, which constrains the learned
            policy to stay close to the dataset distribution.
            However, such approaches fail to distinguish between high-value and low-value actions in their
            regularization term.
            We introduce Guided Flow Policy (GFP), which couples a multi-step flow-matching policy with a distilled
            one-step actor.
            The actor directs the flow policy through weighted behavior cloning to focus on cloning high-value
            actions from the dataset rather than indiscriminately imitating all state-action pairs.
            In turn, the flow policy constrains the actor to remain aligned with the dataset's best transitions
            while maximizing the critic.
            This mutual guidance enables GFP to achieve state-of-the-art performance across 144 state-based and
            pixel-based tasks from the OGBench, Minari, and D4RL benchmarks, with substantial gains on suboptimal
            datasets and challenging tasks.
          </div>
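          <div class="content has-text-justified">
            <p>To make the weighted behavior cloning idea concrete, here is a minimal PyTorch sketch of a
              value-weighted flow-matching loss. It is an illustrative approximation, not the paper's exact
              objective: the names <code>flow_net</code>, <code>critic</code>, and <code>value_net</code>, and the
              exponential-advantage weighting, are assumptions made for the example.</p>
            <pre><code>import torch

def weighted_flow_bc_loss(flow_net, critic, value_net, s, a, beta=1.0):
    """Flow-matching BC loss where each (s, a) pair is reweighted by an
    advantage-style guidance term, so high-value dataset actions dominate.
    Hypothetical sketch; see the paper for the exact objective."""
    t = torch.rand(a.shape[0], 1, device=a.device)  # random interpolation times
    noise = torch.randn_like(a)             # x_0 sampled from N(0, I)
    x_t = (1 - t) * noise + t * a           # linear interpolation path
    target_v = a - noise                    # conditional velocity target
    pred_v = flow_net(s, x_t, t)            # predicted velocity field
    with torch.no_grad():
        adv = critic(s, a) - value_net(s)   # guidance: advantage of the dataset action
        w = torch.exp(beta * adv).clamp(max=100.0)  # exponential weights, clipped
    per_sample = ((pred_v - target_v) ** 2).mean(dim=-1)
    return (w.squeeze(-1) * per_sample).mean()</code></pre>
            <p>A plain flow-matching BC loss corresponds to uniform weights; the weighting is what steers the flow
              policy toward high-value actions instead of the whole dataset distribution.</p>
          </div>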
        </div>
      </div>
      <!--/ Abstract. -->

      <!-- Offline RL. -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-max-desktop">
          <h2 class="title is-3">Offline Reinforcement Learning</h2>
          <div class="publication-image">
            <img src="./static/images/figure-gfp-related-works.png" alt="List of offline RL approaches"
              style="width:60%">
          </div>
        </div>
      </div>
      <!--/ Offline RL. -->

      <!-- Overview. -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-max-desktop">
          <h2 class="title is-3">Overview of GFP</h2>
          <div class="publication-image">
            <img src="./static/images/figure-gfp-overview.png" alt="GFP Overview" style="width:60%">
          </div>
          <div class="content has-text-justified">
            GFP consists of three main components: (i) in yellow, VaBC, a multi-step flow policy trained via
            weighted BC using the guidance term; (ii) in green, a one-step actor distilled from the flow policy;
            and (iii) in gray, a critic that evaluates actions. VaBC regularizes the actor toward high-value
            actions from the dataset; in turn, the actor shapes the flow and maximizes the critic, following the
            actor-critic approach.
            The different components of the figure are introduced in the paper.
            Each drawing depicts a policy's action distribution in the current state s, except for the gray ones,
            which depict the critic's action values in state s.
          </div>
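          <div class="content has-text-justified">
            <p>As a rough sketch of the mutual guidance between the two policies (an assumed form, not the paper's
              exact losses; <code>sample_flow</code>, <code>alpha</code>, and the squared-error distillation term
              are hypothetical), the one-step actor can be updated as follows:</p>
            <pre><code>import torch

def actor_loss(actor, sample_flow, critic, s, alpha=1.0):
    """Distill the multi-step flow policy into a one-step actor while
    maximizing the critic; alpha trades off the two terms."""
    a_pi = actor(s)                  # one-step action proposal
    with torch.no_grad():
        a_flow = sample_flow(s)      # action sampled from the multi-step flow policy
    distill = ((a_pi - a_flow) ** 2).sum(dim=-1).mean()  # stay close to the flow policy
    q_val = critic(s, a_pi).mean()   # critic maximization
    return alpha * distill - q_val</code></pre>
            <p>The same actor then supplies the advantage-based weights that shape the flow policy's weighted BC
              loss, closing the loop between the two components.</p>
          </div>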
        </div>
      </div>
      <!--/ Overview. -->

      <!-- Results. -->
      <div class="columns is-centered has-text-centered">
        <div class="column is-max-desktop">
          <h2 class="title is-3">Experiments across 144 tasks</h2>
          <div class="publication-image">
            <img src="./static/images/plots-gfp-perf-profiles.png"
              alt="Performance profiles comparing GFP and prior works" style="width:60%">
          </div>
          <div class="content has-text-justified">
            (a) Performance profiles over 50 tasks comparing GFP against a wide range of prior works, showing the
            fraction of tasks on which each algorithm achieves a score above a threshold τ. (b) Performance
            profiles over 105 tasks, including more challenging ones, with prior methods carefully reevaluated.
            (c) Performance profiles restricted to the 30 noisy and explore tasks.
          </div>
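          <div class="content has-text-justified">
            <p>For reference, a performance profile reports, for each threshold τ, the fraction of tasks on which
              an algorithm's normalized score reaches at least τ. A minimal NumPy sketch of the plotted quantity
              (with hypothetical per-task scores in the usage example):</p>
            <pre><code>import numpy as np

def performance_profile(scores, taus):
    """Fraction of tasks whose normalized score is at least tau,
    evaluated at each threshold tau (the y-axis of the plots)."""
    scores = np.asarray(scores, dtype=float)
    return np.array([(scores >= tau).mean() for tau in taus])

# Example: profile of one algorithm over 5 tasks at thresholds 0.0 .. 1.0
profile = performance_profile([0.9, 0.4, 0.7, 1.0, 0.2], np.linspace(0.0, 1.0, 11))</code></pre>
          </div>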
        </div>
      </div>
      <!--/ Results. -->

    </div>
  </section>

</body>

</html>