StochLab
diff --git a/_data/people.yml b/_data/people.yml
Lines changed: 8 additions & 0 deletions
diff --git a/_data/pubs.yml b/_data/pubs.yml
Lines changed: 8 additions & 0 deletions
diff --git a/_projects/AverageRewardRL.md b/_projects/AverageRewardRL.md
Lines changed: 51 additions & 0 deletions
diff --git a/img/AverageRL/empresults.png b/img/AverageRL/empresults.png
3.73 MB
diff --git a/img/AverageRL/flow_diagram.png b/img/AverageRL/flow_diagram.png
404 KB
diff --git a/papers/Average_Reward_ICML_2023.pdf b/papers/Average_Reward_ICML_2023.pdf
3.9 MB
@@ -172,6 +172,14 @@ namansaxena:
     #image: /img/people/pramod.jpg
     bio: CSA, IISc
 
+subho:
+    display_name: "Subhojyoti Khastagir"
+    webpage: "https://www.linkedin.com/in/subhojyoti-khastagir-2a4716152/"
+    role: alum
+    #image: /img/people/pramod.jpg
+    bio: CSA, IISc
+
+
 abhishekranjan:
     display_name: "Abhishek Ranjan"
     #webpage: "https://www.linkedin.com/in/lauraleeane/"
 
@@ -159,3 +159,11 @@
   publisher: "IEEE International Conference on Robotics and Automation (ICRA) 2023, London, UK"
   pdf: force_lp_ICRA_2023.pdf
   projects: [ quadruped ]
+
+- title: "Off-Policy Average Reward Actor-Critic with Deterministic Policy Search"
+  authors: [Naman Saxena, Subhojyoti Khastagir, Shishir Kolathaya, Shalabh Bhatnagar]
+  date: 2023-07-25
+  pub-type: conference
+  publisher: "International Conference on Machine Learning (ICML) 2023, Hawaii, US"
+  pdf: Average_Reward_ICML_2023.pdf
+  projects: [ learning ]
@@ -0,0 +1,51 @@
+---
+title: Off-Policy Average Reward Actor-Critic with Deterministic Policy Search
+
+description: |
+  On-policy and off-policy deterministic policy gradient theorems for the average reward criterion, and the resulting ARO-DDPG algorithm.
+people:
+  - namansaxena
+  - subho
+  - shishir
+  - shalabh
+
+layout: project
+image: "/img/AverageRL/flow_diagram.png"
+last-updated: 2023-08-05
+---
+
+<br>
+#### Abstract
+The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) Algorithm. We first show asymptotic convergence analysis using the ODE-based method. Subsequently, we provide a finite time analysis of the resulting stochastic approximation scheme with linear function approximator and obtain an $\epsilon$-optimal stationary policy with a sample complexity of $\Omega(\epsilon^{-2.5})$. We compare the average reward performance of our proposed ARO-DDPG algorithm and observe better empirical performance compared to state-of-the-art on-policy average reward actor-critic algorithms over MuJoCo-based environments.
+
+For more details, refer to the paper at [proceedings.mlr.press/v202/saxena23a/saxena23a.pdf](https://proceedings.mlr.press/v202/saxena23a/saxena23a.pdf) and the code at [github.com/namansaxena9/ARO-DDPG](https://github.com/namansaxena9/ARO-DDPG).
+
+<br>
+## Block Diagram of the algorithm
+<div style="text-align:center">
+<img src="{{site.base}}/img/AverageRL/flow_diagram.png" alt="Block diagram of the ARO-DDPG algorithm"/>
+</div>
+<br>
+
+## Simulation Results
+
+<p align="center">
+  <img width="60%" src="{{site.base}}/img/AverageRL/empresults.png" alt="Empirical results of ARO-DDPG on MuJoCo-based environments">
+</p>
+<br>
+
+<br/>
+## Citation
+```
+@inproceedings{saxena2023off,
+  title={Off-Policy Average Reward Actor-Critic with Deterministic Policy Search},
+  author={Saxena, Naman and Khastagir, Subhojyoti and Kolathaya, Shishir and Bhatnagar, Shalabh},
+  booktitle={International Conference on Machine Learning},
+  pages={30130--30203},
+  year={2023},
+  organization={PMLR}
+}
+```
+<br>
+<br/>
+