Commit d378199

Adding ICML paper
1 parent 86038b8 commit d378199

6 files changed, with 67 additions and 0 deletions

_data/people.yml

Lines changed: 8 additions & 0 deletions
@@ -172,6 +172,14 @@ namansaxena:
   #image: /img/people/pramod.jpg
   bio: CSA, IISc
 
+subho:
+  display_name: "Subhojyoti Khastagir"
+  webpage: "https://www.linkedin.com/in/subhojyoti-khastagir-2a4716152/"
+  role: alum
+  #image: /img/people/pramod.jpg
+  bio: CSA, IISc
+
+
 abhishekranjan:
   display_name: "Abhishek Ranjan"
   #webpage: "https://www.linkedin.com/in/lauraleeane/"
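
Note: the `subho` key added above is the same key that appears in the `people:` list of `_projects/AverageRewardRL.md` later in this commit. Assuming the site resolves project members by looking up these keys in `_data/people.yml` (an assumption, not confirmed by this diff), a minimal sketch of a consistency check could look like the following. The helper `missing_people` and the inline dictionary are hypothetical stand-ins; a real check would parse the YAML files instead.

```python
def missing_people(project_people, people_data):
    """Return the project member keys that have no entry in the people data."""
    return [key for key in project_people if key not in people_data]


# Hypothetical stand-in for the parsed _data/people.yml.
# It holds only the keys visible in this diff hunk.
people_data = {
    "namansaxena": {"bio": "CSA, IISc"},
    "subho": {"display_name": "Subhojyoti Khastagir", "role": "alum"},
    "abhishekranjan": {"display_name": "Abhishek Ranjan"},
}

# The people list from the front matter of _projects/AverageRewardRL.md.
project_people = ["namansaxena", "subho", "shishir", "shalabh"]

print(missing_people(project_people, people_data))
```

Here `shishir` and `shalabh` are reported only because the stand-in dictionary is limited to the keys shown in this hunk; they are presumably defined elsewhere in the full file.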

_data/pubs.yml

Lines changed: 8 additions & 0 deletions
@@ -159,3 +159,11 @@
   publisher: "IEEE International Conference on Robotics and Automation (ICRA) 2023, London, UK"
   pdf: force_lp_ICRA_2023.pdf
   projects: [ quadruped ]
+
+- title: "Off-Policy Average Reward Actor-Critic with Deterministic Policy Search"
+  authors: [Naman Saxena, Subhojyoti Khastagir, Shishir Kolathaya, Shalabh Bhatnagar]
+  date: 2023-07-25
+  pub-type: conference
+  publisher: "International Conference on Machine Learning (ICML) 2023, Hawaii, US"
+  pdf: Average_Reward_ICML_2023.pdf
+  projects: [ learning ]

_projects/AverageRewardRL.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+---
+title: Off-Policy Average Reward Actor-Critic with Deterministic Policy Search
+
+description: |
+  A framework for utilizing experience for generating predictive simulations and learning from them.
+people:
+  - namansaxena
+  - subho
+  - shishir
+  - shalabh
+
+layout: project
+image: "/img/AverageRL/flow_diagram.png"
+last-updated: 2023-08-05
+---
+
+<br>
+#### Abstract
+The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) Algorithm. We first show asymptotic convergence analysis using the ODE-based method. Subsequently, we provide a finite time analysis of the resulting stochastic approximation scheme with linear function approximator and obtain an $\epsilon$-optimal stationary policy with a sample complexity of $\Omega(\epsilon^{-2.5})$. We compare the average reward performance of our proposed ARO-DDPG algorithm and observe better empirical performance compared to state-of-the-art on-policy average reward actor-critic algorithms over MuJoCo-based environments.
+
+For more details, refer to the paper at [proceedings.mlr.press/v202/saxena23a/saxena23a.pdf](https://proceedings.mlr.press/v202/saxena23a/saxena23a.pdf) and the code at [github.com/namansaxena9/ARO-DDPG](https://github.com/namansaxena9/ARO-DDPG).
+
+<br>
+## Block Diagram of the algorithm
+<div style="text-align:center">
+<img src="{{site.base}}/img/DeMoRL/methodology.jpg" alt="drawing"/>
+</div>
+<br>
+
+## Simulation Results
+
+<p align="center">
+<img width="60%" src="{{site.base}}/img/AverageRL/empresults.png">
+</p>
+<br>
+
+<br/>
+## Citations ##
+```
+@inproceedings{saxena2023off,
+  title={Off-Policy Average Reward Actor-Critic with Deterministic Policy Search},
+  author={Saxena, Naman and Khastagir, Subhojyoti and Kolathaya, Shishir and Bhatnagar, Shalabh},
+  booktitle={International Conference on Machine Learning},
+  pages={30130--30203},
+  year={2023},
+  organization={PMLR}
+}
+```
+<br>
+<br/>
+
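
Note: the abstract added in this file contrasts the average reward criterion with the discounted one. As a rough illustration of what that criterion optimizes, here is a minimal sketch of tabular differential TD(0), a standard textbook method for estimating the average reward of a fixed policy. It is not the ARO-DDPG algorithm from the paper. The function name, the two-state toy chain, and the step sizes are all assumptions chosen for illustration.

```python
def differential_td0(dynamics, start_state, steps, alpha=0.1, beta=0.1):
    """Estimate the average reward and differential values of a fixed policy.

    dynamics maps each state to (next_state, reward) under that policy.
    Deterministic transitions are assumed here to keep the sketch short.
    """
    values = {state: 0.0 for state in dynamics}
    avg_reward = 0.0
    state = start_state
    for _ in range(steps):
        next_state, reward = dynamics[state]
        # Average-reward TD error: no discount factor; the running
        # average-reward estimate is subtracted from the reward instead.
        td_error = reward - avg_reward + values[next_state] - values[state]
        values[state] += alpha * td_error
        avg_reward += beta * td_error
        state = next_state
    return avg_reward, values


# Hypothetical toy chain: state 0 -> 1 with reward 0, state 1 -> 0 with reward 2.
# The long-run average reward of this cycle is (0 + 2) / 2 = 1.
toy_chain = {0: (1, 0.0), 1: (0, 2.0)}
avg_reward, values = differential_td0(toy_chain, start_state=0, steps=2000)
print(round(avg_reward, 6), round(values[1] - values[0], 6))
```

The sketch returns differential values, which are defined only up to an additive constant, so the difference `values[1] - values[0]` is the meaningful quantity rather than either value on its own.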

img/AverageRL/empresults.png

3.73 MB

img/AverageRL/flow_diagram.png

404 KB
3.9 MB
Binary file not shown.
