Hi Faraz,
I am studying the paper and your implementation is very helpful! I have a question though. It seems that the critic network in the paper takes in the history (which in this case is the hidden state of the actor's LSTM, I presume) rather than the observed state of the environment.
https://github.com/fshamshirdar/pytorch-rdpg/blob/master/rdpg.py#L139-L141
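To make the question concrete, here is a minimal sketch of a critic that takes a history summary instead of the raw observation. It assumes the history is represented by the actor's LSTM hidden state, which is only my reading of the paper. The names `HistoryCritic`, `actor_lstm`, `actor_head` and all dimensions are hypothetical and do not come from the repo.

```python
import torch
import torch.nn as nn


class HistoryCritic(nn.Module):
    """Hypothetical critic Q(h_t, a_t): scores an action given a history summary h_t."""

    def __init__(self, hidden_dim, action_dim, fc_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim + action_dim, fc_dim),
            nn.ReLU(),
            nn.Linear(fc_dim, 1),
        )

    def forward(self, history, action):
        # Concatenate the history summary and the action along the feature axis.
        return self.net(torch.cat([history, action], dim=-1))


# Illustrative sizes only (assumptions, not taken from the repo).
obs_dim, hidden_dim, action_dim, batch = 3, 16, 2, 4

actor_lstm = nn.LSTMCell(obs_dim, hidden_dim)   # recurrent part of the actor
actor_head = nn.Linear(hidden_dim, action_dim)  # maps hidden state to an action
critic = HistoryCritic(hidden_dim, action_dim)

obs = torch.randn(batch, obs_dim)
h, c = actor_lstm(obs)               # initial state defaults to zeros when omitted
action = torch.tanh(actor_head(h))   # bounded action computed from the hidden state
q = critic(h, action)                # critic sees h (the history), not obs
```

In this sketch the critic never sees `obs` directly; it only receives whatever the LSTM has accumulated in `h`.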