Commit cb5aea7

fix
1 parent 8429fb0 commit cb5aea7

File tree

3 files changed: +6 -6 lines changed

ppo_continuous.py

Lines changed: 4 additions & 4 deletions

@@ -151,13 +151,13 @@ def forward(self, state):
     def get_action(self, state, deterministic=False):
         state = torch.FloatTensor(state).unsqueeze(0).to(device)
         mean, log_std = self.forward(state)
-        std = log_std.exp()
-        normal = Normal(0, 1)
-        z = normal.sample()
+
         if deterministic:
             action = mean
         else:
-            action = mean+std*z
+            std = log_std.exp()
+            normal = Normal(mean, std)
+            action = normal.sample()
         action = torch.clamp(action, -self.action_range, self.action_range)
         return action.squeeze(0)
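Both sampling routes draw from the same diagonal Gaussian: mean + std * z with z ~ Normal(0, 1) is the reparameterized form of Normal(mean, std). The practical effect of the change is that std is computed and the distribution sampled only in the stochastic branch, so deterministic evaluation skips the sampling work. A minimal standalone sketch of the patched logic, with made-up tensors standing in for the network outputs and the action_range attribute:

import torch
from torch.distributions import Normal

# Stand-ins for the policy network outputs and the action_range attribute
# used in ppo_continuous.py; the values here are illustrative only.
mean = torch.tensor([[0.3, -0.1]])
log_std = torch.tensor([[-0.5, -0.5]])
action_range = 1.0
deterministic = False

if deterministic:
    action = mean                    # evaluation: take the distribution mean
else:
    std = log_std.exp()              # std is only needed when sampling
    normal = Normal(mean, std)       # diagonal Gaussian over actions
    action = normal.sample()         # same distribution as mean + std * N(0, 1)
action = torch.clamp(action, -action_range, action_range)
print(action.squeeze(0))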

ppo_continuous3.py

Lines changed: 2 additions & 1 deletion

@@ -8,7 +8,8 @@
 * It merge the losses of critic and actor into one update manner, using a single optimizer
 instead of one for actor and one for critic.
 * It uses the min of clipping value loss and non-clipping value loss.
-* It additionally has a policy entropy bonus in loss (line 145)
+* It additionally has a policy entropy bonus in loss (line 146).
+* It uses MultivariateNormal for policy distribution instead of Normal.


 To run
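As the updated docstring notes, ppo_continuous3.py builds its policy on MultivariateNormal rather than per-dimension Normal distributions. With a diagonal covariance the two sample the same values, but MultivariateNormal returns a single joint log-probability and entropy per action vector, which is convenient for the entropy bonus mentioned above. A hedged sketch of that pattern (tensor values and variable names are illustrative, not taken from the file):

import torch
from torch.distributions import MultivariateNormal

# Illustrative policy-head outputs; not values from ppo_continuous3.py.
mean = torch.tensor([[0.2, -0.4]])        # batch of action means
std = torch.tensor([[0.5, 0.5]])          # per-dimension standard deviations
cov = torch.diag_embed(std ** 2)          # diagonal covariance matrix

dist = MultivariateNormal(mean, cov)
action = dist.sample()                    # one joint action sample per batch row
log_prob = dist.log_prob(action)          # joint log-density (summed over action dims)
entropy = dist.entropy()                  # term usable as a policy entropy bonus
print(action, log_prob, entropy)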

ppo_gae_continuous.py

Lines changed: 0 additions & 1 deletion

@@ -123,7 +123,6 @@ def train_net(self):
         for i in range(K_epoch):
             td_target = r + gamma * self.v(s_prime) * done_mask
             delta = td_target - self.v(s)
-            advantage = delta
             delta = delta.detach().numpy()

             advantage_lst = []
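The deleted assignment was dead code: advantage is rebuilt by the GAE pass that fills advantage_lst from the delta array just below this hunk. For context, a minimal sketch of that standard backward GAE recursion, assuming the usual gamma/lambda form; the hyperparameter values and delta array here are made up, not taken from the repository:

import numpy as np

# Illustrative inputs; gamma, lmbda and delta are not values from this repo.
gamma, lmbda = 0.99, 0.95
delta = np.array([[0.5], [0.2], [-0.1], [0.3]])   # per-step TD errors

advantage_lst = []
advantage = 0.0
for delta_t in delta[::-1]:                       # walk backwards through time
    advantage = gamma * lmbda * advantage + delta_t[0]
    advantage_lst.append([advantage])
advantage_lst.reverse()
print(advantage_lst)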
