| TD3 | Twin Delayed Deep Deterministic Policy Gradient model | https://github.com/reiniscimurs/DRL-Robot-Navigation-ROS2 |
| SAC | Soft Actor-Critic model | https://github.com/denisyarats/pytorch_sac |
| PPO | Proximal Policy Optimization model | https://github.com/nikhilbarhate99/PPO-PyTorch |
| DDPG | Deep Deterministic Policy Gradient model | Updated from TD3 |
| CNNTD3 | TD3 model with 1D CNN encoding of the laser state | - |
| RCPG | Recurrent Convolution Policy Gradient - CNNTD3 model extended with recurrence layers (LSTM/GRU/RNN); see the sketch below | - |

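As a rough illustration of the CNNTD3 and RCPG encoder idea described in the table above, here is a minimal PyTorch sketch of a 1D CNN laser encoder with an optional recurrence layer. The class name, layer sizes, and exact wiring are assumptions for this example, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class LaserEncoder(nn.Module):
    """Hypothetical sketch: a 1D CNN over the laser scan, optionally
    followed by a recurrence layer as in the RCPG variants."""

    def __init__(self, n_beams=180, hidden=64, recurrence=None):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened CNN output size with a dummy forward pass
        conv_out = self.conv(torch.zeros(1, 1, n_beams)).shape[-1]
        self.proj = nn.Linear(conv_out, hidden)
        # recurrence: None, "lstm", "gru", or "rnn"
        rnn_cls = {"lstm": nn.LSTM, "gru": nn.GRU, "rnn": nn.RNN}.get(recurrence)
        self.rnn = rnn_cls(hidden, hidden, batch_first=True) if rnn_cls else None

    def forward(self, scans):
        # scans: (batch, seq_len, n_beams) history of laser readings
        b, t, n = scans.shape
        x = self.conv(scans.reshape(b * t, 1, n))
        x = self.proj(x).reshape(b, t, -1)
        if self.rnn is not None:
            x, _ = self.rnn(x)
        return x[:, -1]  # feature vector for the most recent step

# Example: a GRU variant over a batch of 4 sequences of 8 scans
encoder = LaserEncoder(recurrence="gru")
features = encoder(torch.randn(4, 8, 180))
print(features.shape)  # torch.Size([4, 64])
```
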
**Max Upper Bound Models**

Models that support an additional loss term based on Q-values exceeding the maximal possible Q-value in the episode. Q-values that exceed this upper bound incur an extra loss for the model, which helps control the overestimation of Q-values in off-policy actor-critic networks.

To enable the max upper bound loss, set `use_max_bound = True` when initializing a model.
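For illustration, here is a minimal sketch of how such a penalty could be computed. The function name, the geometric-sum form of the bound, and the squared penalty are assumptions for this example, not necessarily the repository's exact implementation.

```python
import torch

def max_upper_bound_loss(q_values, max_step_reward, discount, steps_remaining):
    # Assumed bound: with at most `steps_remaining` steps left and a maximum
    # per-step reward r_max, the discounted return cannot exceed the
    # geometric sum Q_max = r_max * (1 - gamma^n) / (1 - gamma).
    q_max = max_step_reward * (1 - discount ** steps_remaining) / (1 - discount)

    # Only the portion of each Q estimate above the bound is penalized;
    # estimates at or below the bound contribute zero loss.
    excess = torch.clamp(q_values - q_max, min=0.0)
    return (excess ** 2).mean()

# Example: three critic estimates, only the second exceeding its bound
q_values = torch.tensor([45.0, 120.0, 40.0])
steps_remaining = torch.tensor([100.0, 80.0, 60.0])
loss = max_upper_bound_loss(q_values, max_step_reward=1.0,
                            discount=0.99, steps_remaining=steps_remaining)
print(loss)
```

In a TD3-style update, a penalty of this kind would typically be added to the regular critic loss with a weighting coefficient.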