Error in projection_distribution (Distributional DQN)? #8

Hi, 

I have a question regarding the `projection_distribution` method. It seems that when you project back onto the support/bins, at these lines:

```
proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist * (u.float() - b)).view(-1)) 
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist * (b - l.float()) ).view(-1))
```
the distribution `next_dist` has already been scaled by the support, from the line `next_dist = target_model(next_state).data.cpu() * support`.
I don't think this should be the case: the projected mass is then weighted by the atom values, so the final projected distribution no longer sums to one. It seems one should keep the unscaled probabilities around and do something like
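```
# Keep the unscaled probabilities alongside the support-weighted values.
next_dist_raw = target_model(next_state).data.cpu()
next_dist = next_dist_raw * support
# Greedy next action by expected value (sum over atoms of p * z).
next_action = next_dist.sum(2).max(1)[1]
next_action = next_action.unsqueeze(1).unsqueeze(1).expand(next_dist.size(0), 1, next_dist.size(2))
next_dist = next_dist.gather(1, next_action).squeeze(1)
# Select the raw probabilities of the same action; these are what should be projected.
next_dist_raw = next_dist_raw.gather(1, next_action).squeeze(1)
```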

and then use `next_dist_raw` in the two projection lines:

```
proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist_raw * (u.float() - b)).view(-1))
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist_raw * (b - l.float())).view(-1))
```

This results in a distribution that contains the same amount of mass as the original one. 
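
To illustrate the point, here is a minimal pure-Python sketch of the projection for a single distribution. It is not the repository's code: the `project` helper, the 5-atom support, and the example probabilities, reward, and discount are all made up for illustration. It compares the total mass after projecting the raw probabilities with the total mass after projecting the support-scaled values.

```python
import math


def project(weights, support, reward, gamma, v_min, v_max):
    """Project the shifted atoms (reward + gamma * z) back onto `support`.

    `weights` holds whatever is attached to each atom: the raw probabilities,
    or (as in the behaviour questioned above) the probabilities times the support.
    """
    n = len(support)
    delta_z = (v_max - v_min) / (n - 1)
    proj = [0.0] * n
    for w, z in zip(weights, support):
        tz = min(max(reward + gamma * z, v_min), v_max)
        # Fractional bin index, clamped to guard against rounding error.
        b = min(max((tz - v_min) / delta_z, 0.0), float(n - 1))
        l, u = math.floor(b), math.ceil(b)
        if l == u:
            # b lands exactly on an atom: both interpolation weights would be
            # zero, so this sketch assigns the whole weight to that atom.
            proj[l] += w
        else:
            proj[l] += w * (u - b)
            proj[u] += w * (b - l)
    return proj


# Hypothetical example values, chosen only for this illustration.
support = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [0.1, 0.2, 0.3, 0.3, 0.1]  # sums to one
scaled = [p * z for p, z in zip(probs, support)]

raw_mass = sum(project(probs, support, 0.5, 0.9, -2.0, 2.0))
scaled_mass = sum(project(scaled, support, 0.5, 0.9, -2.0, 2.0))

print("mass when projecting raw probabilities:", raw_mass)
print("mass when projecting probs * support:  ", scaled_mass)
```

In this sketch each atom's weight is split between its two neighbouring bins with coefficients that add up to one, so the projection preserves whatever total it is given. Projecting the raw probabilities therefore keeps a total mass of one, while projecting `probs * support` yields the expected value of the distribution instead.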


Thank you, 
Lucas
