Commit 1cbce72

Merge pull request #15 from namansaxena9/master

Adding ICML paper

2 parents 5849895 + 1906da5

File tree

6 files changed: +69, -0 lines changed

_data/people.yml

Lines changed: 8 additions & 0 deletions

```diff
@@ -177,6 +177,14 @@ namansaxena:
   #image: /img/people/pramod.jpg
   bio: CSA, IISc
 
+subho:
+  display_name: "Subhojyoti Khastagir"
+  webpage: "https://www.linkedin.com/in/subhojyoti-khastagir-2a4716152/"
+  role: alum
+  #image: /img/people/pramod.jpg
+  bio: CSA, IISc
+
+
 abhishekranjan:
   display_name: "Abhishek Ranjan"
   #webpage: "https://www.linkedin.com/in/lauraleeane/"
```

_data/pubs.yml

Lines changed: 8 additions & 0 deletions

```diff
@@ -159,3 +159,11 @@
   publisher: "IEEE International Conference on Robotics and Automation (ICRA) 2023, London, UK"
   pdf: force_lp_ICRA_2023.pdf
   projects: [ quadruped ]
+
+- title: "Off-Policy Average Reward Actor-Critic with Deterministic Policy Search"
+  authors: [Naman Saxena, Subhojyoti Khastagir, Shishir Kolathaya, Shalabh Bhatnagar]
+  date: 2023-07-25
+  pub-type: conference
+  publisher: "International Conference on Machine Learning (ICML) 2023, Hawaii, US"
+  pdf: Average_Reward_ICML_2023.pdf
+  projects: [ learning ]
```

_projects/AverageRewardRL.md

Lines changed: 53 additions & 0 deletions
````markdown
---
title: Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

description: |
  Policy gradient theorems for the average reward criterion with deterministic policies.
people:
  - namansaxena
  - subho
  - shishir
  - shalabh

layout: project
image: "/img/AverageRL/flow_diagram.png"
last-updated: 2023-08-05
---

<br>
#### Abstract
The average reward criterion is relatively less studied, as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are only a few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) algorithm. We first show asymptotic convergence analysis using the ODE-based method. Subsequently, we provide a finite-time analysis of the resulting stochastic approximation scheme with a linear function approximator and obtain an $\epsilon$-optimal stationary policy with a sample complexity of $\Omega(\epsilon^{-2.5})$. We compare the average reward performance of our proposed ARO-DDPG algorithm and observe better empirical performance compared to state-of-the-art on-policy average reward actor-critic algorithms over MuJoCo-based environments.

For more details, refer to the paper at [proceedings.mlr.press/v202/saxena23a/saxena23a.pdf](https://proceedings.mlr.press/v202/saxena23a/saxena23a.pdf) and the code at [github.com/namansaxena9/ARO-DDPG](https://github.com/namansaxena9/ARO-DDPG).

<br>

## Block Diagram of the Algorithm
<div style="text-align:center">
<img src="{{site.base}}/img/AverageRL/flow_diagram.png" alt="Block diagram of the ARO-DDPG algorithm"/>
</div>
<br>

## Simulation Results

<p align="center">
<img width="60%" src="{{site.base}}/img/AverageRL/empresults.png">
</p>
<br>

<br/>

## Citations
```
@inproceedings{saxena2023off,
  title={Off-Policy Average Reward Actor-Critic with Deterministic Policy Search},
  author={Saxena, Naman and Khastagir, Subhojyoti and Kolathaya, Shishir and Bhatnagar, Shalabh},
  booktitle={International Conference on Machine Learning},
  pages={30130--30203},
  year={2023},
  organization={PMLR}
}
```
<br>
<br/>
````
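To make the abstract above concrete: ARO-DDPG combines an average-reward critic (which centers rewards by a learned average-reward estimate instead of discounting) with a deterministic actor updated through the chain rule of the deterministic policy gradient. Below is a minimal illustrative sketch of one such update with linear function approximation, matching the setting of the paper's finite-time analysis. This is not the authors' implementation from github.com/namansaxena9/ARO-DDPG; the function names, feature maps, Jacobian interfaces, and step sizes are all assumptions for illustration.

```python
import numpy as np

def aro_step(theta, w, rho, s, a, r, s_next,
             phi, dphi_da, mu, dmu_dtheta,
             alpha=1e-3, beta=1e-4, eta=1e-2):
    """One illustrative average-reward deterministic actor-critic update.

    theta : actor parameters; the deterministic policy is a = mu(s, theta)
    w     : critic parameters; Q(s, a) is approximated by w @ phi(s, a)
    rho   : scalar estimate of the average reward

    Assumed (hypothetical) interfaces:
    phi(s, a)            -> feature vector, shape (d,)
    dphi_da(s, a)        -> Jacobian of phi w.r.t. a, shape (dim_a, d)
    mu(s, theta)         -> action, shape (dim_a,)
    dmu_dtheta(s, theta) -> Jacobian of mu w.r.t. theta, shape (dim_theta, dim_a)
    """
    a_next = mu(s_next, theta)

    # Average-reward TD error: the reward is centered by rho,
    # with no discount factor on the next state-action value.
    delta = r - rho + w @ phi(s_next, a_next) - w @ phi(s, a)

    # Critic: semi-gradient TD update on the linear Q approximation.
    w = w + alpha * delta * phi(s, a)

    # Average-reward estimate is nudged by the TD error
    # (a standard tracker in average-reward TD methods).
    rho = rho + eta * delta

    # Actor: deterministic policy gradient via the chain rule,
    # grad_theta J ~ grad_theta mu(s) @ grad_a Q(s, a) at a = mu(s).
    grad_a_Q = dphi_da(s, mu(s, theta)) @ w
    theta = theta + beta * dmu_dtheta(s, theta) @ grad_a_Q

    return theta, w, rho
```

In the deep variant the project page describes, the linear critic and actor above would be replaced by neural networks trained from an off-policy replay buffer, in the spirit of DDPG.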

img/AverageRL/empresults.png
3.73 MB

img/AverageRL/flow_diagram.png
404 KB

(3.9 MB binary file not shown)

0 commit comments