Skip to content

Error in projection_distribution (Distributional DQN) ? #8

@pclucas14

Description

@pclucas14

Hi,

I have a question regarding the projection_distribution method. It seems that when you are projecting back on the support/bins, at lines :

proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist * (u.float() - b)).view(-1)) 
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist * (b - l.float()) ).view(-1))

the distribution next_dist is scaled by the support from the line
next_dist = target_model(next_state).data.cpu() * support
It seems like this should not be the case. This results in the final projected distribution not summing up to one. It seems one should do something like

next_dist_raw = target_model(next_state).data.cpu()
next_dist = next_dist_raw * support
next_action = next_dist.sum(2).max(1)[1]
next_action = next_action.unsqueeze(1).unsqueeze(1).expand(next_dist.size(0), 1, next_dist.size(2))
next_dist = next_dist.gather(1, next_action).squeeze(1)
next_dist_raw = next_dist_raw.gather(1, next_action).squeeze(1)
proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist_raw * (u.float() - b)).view(-1))
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist_raw * (b - l.float()) ).view(-1))

This results in a distribution that contains the same amount of mass as the original one.

Thank you,
Lucas

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions