-
Hi, I wanted to start a discussion on how to properly use the Prioritized Experience Replay class. You provide a basic DQN example in this repository, which I have some questions about. To update the priorities you seem to use the mean squared error, but I think in the paper they use abs(target - Q_pred) to update the priorities.
My next question is about the importance sampling weights: does somebody have experience with the hyperparameters? This is roughly how I'm using the PER (illustrative sketch below). Am I doing something conceptually wrong?
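(The following is only a sketch of the pattern I mean; the network, gamma, optimizer, and env_dict layout are placeholders rather than my actual code.)

```python
# Illustrative sketch only -- q_net, gamma, optimizer etc. are placeholders.
import numpy as np
import torch
import torch.nn as nn
from cpprb import PrioritizedReplayBuffer

obs_dim, n_act, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_act))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

rb = PrioritizedReplayBuffer(
    2 ** 16,
    {"obs": {"shape": obs_dim}, "act": {"dtype": np.int64}, "rew": {},
     "next_obs": {"shape": obs_dim}, "done": {}},
    alpha=0.6,  # priority exponent
)

# Fill the buffer with a few dummy transitions so train_step() can run.
for _ in range(256):
    rb.add(obs=np.random.rand(obs_dim), act=np.random.randint(n_act),
           rew=np.random.rand(), next_obs=np.random.rand(obs_dim), done=0.0)

def train_step(batch_size=32, beta=0.4):  # beta would normally be annealed towards 1
    batch = rb.sample(batch_size, beta)
    obs = torch.as_tensor(batch["obs"], dtype=torch.float32)
    act = torch.as_tensor(batch["act"], dtype=torch.int64)
    rew = torch.as_tensor(batch["rew"], dtype=torch.float32).squeeze(1)
    next_obs = torch.as_tensor(batch["next_obs"], dtype=torch.float32)
    done = torch.as_tensor(batch["done"], dtype=torch.float32).squeeze(1)
    weights = torch.as_tensor(batch["weights"], dtype=torch.float32)

    q_pred = q_net(obs).gather(1, act).squeeze(1)
    with torch.no_grad():
        # (A separate target network is omitted for brevity.)
        target = rew + gamma * (1.0 - done) * q_net(next_obs).max(1).values

    td_error = target - q_pred
    # Importance-sampling weights applied per sample before averaging.
    loss = (weights * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # New priorities from abs(target - Q_pred), as I understand the PER paper.
    rb.update_priorities(batch["indexes"], np.abs(td_error.detach().numpy()))
```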
Thanks a lot.
-
Hi @Kait0, I'm sorry for my incorrect example. https://github.com/ymd-h/cpprb/blob/444a510282bb8bcba57ec21f6e9050ea2e181de0/example/dqn.py I assume you are referring to the example code above, right?
You are right. The priority should be the absolute value of the TD error.
Yes.
No, that is a bug. (To be honest, I hadn't understood it correctly when I wrote that example code.)
Recently, there has been a paper which theoretically explains the contributions of alpha and beta in PER. According to that paper, PER is somewhat distorted when combined with an MSE loss and beta is not 1.
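Roughly, my paraphrase of that result is this: with sampling probability p_i proportional to |delta_i|^alpha and importance weight w_i = (N p_i)^(-beta), the expected gradient of the weighted squared TD error is

$$
\mathbb{E}_{i \sim p}\left[ w_i \, \nabla_\theta \delta_i^2 \right]
= \sum_i p_i \, (N p_i)^{-\beta} \, \nabla_\theta \delta_i^2
\;\propto\; \sum_i |\delta_i|^{\alpha (1-\beta)} \, \nabla_\theta \delta_i^2 ,
$$

so only beta = 1 recovers the plain uniform MSE gradient; for beta < 1 the large-error samples remain over-weighted in the loss itself.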
I am still implementing LAP support (hopefully it will come soon); however, you can already implement it yourself on top of the current PrioritizedReplayBuffer, roughly along the lines of the sketch below.
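(My reading of LAP: sample proportionally to the clipped priority max(|TD error|, 1), drop the importance-sampling weights, and train with a Huber loss. The snippet is a toy sketch with placeholder data, not a finalized API.)

```python
# Sketch of a LAP-style update on top of the current PrioritizedReplayBuffer.
# Toy data and a tiny regression "network" stand in for a real agent.
import numpy as np
import torch
import torch.nn.functional as F
from cpprb import PrioritizedReplayBuffer

rb = PrioritizedReplayBuffer(32, {"obs": {"shape": 4}, "target": {}}, alpha=0.4)
for _ in range(32):
    rb.add(obs=np.random.rand(4), target=np.random.rand())

net = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

batch = rb.sample(16)                   # batch["weights"] is computed but intentionally unused
obs = torch.as_tensor(batch["obs"], dtype=torch.float32)
target = torch.as_tensor(batch["target"], dtype=torch.float32)

pred = net(obs)
loss = F.smooth_l1_loss(pred, target)   # Huber loss instead of an IS-weighted MSE
opt.zero_grad()
loss.backward()
opt.step()

td_error = (target - pred).detach().numpy().ravel()
# LAP-style priority: clipped below at 1, so small-error samples are drawn uniformly.
rb.update_priorities(batch["indexes"], np.maximum(np.abs(td_error), 1.0))
```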
(Internally, it is a bit inefficient because of the unused weight calculation.) If you still have questions, please feel free to ask me.
-
Hi,
thanks for your reply.
I got the PER to work on my problem now.
In case somebody comes across the same problems, the first one was a technical issue:
in my code all the Q value / TD error tensors were of matrix shape [batch_size, 1],
while the importance sampling weights from the library are of vector shape [batch_size].
When I multiplied them with the per-sample loss tensor, PyTorch broadcast the result to a [batch_size, batch_size] matrix, which I didn't notice because mean() also works on a matrix and still returns a single number.
To fix this I simply had to unsqueeze the weights after reading them:
replay_weights = torch.unsqueeze(torch.from_numpy(batch1['weights']).to(device), 1)
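A tiny shape-only reproduction of the pitfall (no replay buffer needed; the sizes are made up):

```python
import torch

weights = torch.ones(32)        # [batch_size]     -- what the library returns
td_error = torch.ones(32, 1)    # [batch_size, 1]  -- what my Q/TD tensors looked like

bad = weights * td_error.pow(2)                  # silently broadcasts to [32, 32]
good = weights.unsqueeze(1) * td_error.pow(2)    # [32, 1]: one loss term per sample

print(bad.shape, good.shape)    # torch.Size([32, 32]) torch.Size([32, 1])
print(bad.mean(), good.mean())  # both still produce a single number, which hid the bug
```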
The…