
Commit 52bf738: update documentation

1 parent d8b9a75

6 files changed (+24, -25 lines)

docs/index.md

Lines changed: 0 additions & 2 deletions
```diff
@@ -5,8 +5,6 @@
 Deep Reinforcement Learning algorithm implementation for simulated robot navigation in IR-SIM. Using 2D laser sensor data
 and information about the goal point a robot learns to navigate to a specified point in the environment.
 
-![Example](https://github.com/reiniscimurs/DRL-robot-navigation-IR-SIM/blob/master/out.gif)
-
 **Installation**
 
 * Package versioning is managed with poetry \
```

mkdocs.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -8,7 +8,7 @@ nav:
   Models:
     - DDPG: api/models/DDPG.md
     - TD3: api/models/TD3.md
-    - CNNTD3: api/models/train.md
+    - CNNTD3: api/models/cnntd3.md
     - RCPG: api/models/RCPG.md
    - HCM: api/models/HCM.md
     - PPO: api/models/PPO.md
```

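The one-line fix above points the CNNTD3 nav entry at its own page (api/models/cnntd3.md) instead of the training page. As a sketch of how such stale nav targets can be caught automatically, the hypothetical helper below (not part of this commit) walks the nav tree and reports entries whose target file is missing; it assumes PyYAML is available and that mkdocs.yml contains no custom YAML tags that yaml.safe_load would reject.

```python
# Hypothetical helper (not part of this commit): list nav entries whose target
# file does not exist under docs_dir. A check like this would have flagged the
# stale api/models/train.md reference that this commit replaces with cnntd3.md.
from pathlib import Path

import yaml  # PyYAML; safe_load will reject configs that use custom YAML tags


def nav_pages(node):
    """Recursively yield every page path referenced in an mkdocs nav tree."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from nav_pages(value)
    elif isinstance(node, list):
        for item in node:
            yield from nav_pages(item)


config = yaml.safe_load(Path("mkdocs.yml").read_text())
docs_dir = Path(config.get("docs_dir", "docs"))
for page in nav_pages(config.get("nav", [])):
    if not (docs_dir / page).exists():
        print(f"missing nav target: {page}")
```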
robot_nav/models/CNNTD3/CNNTD3.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -252,7 +252,7 @@ def get_action(self, obs, add_noise):
             add_noise (bool): Whether to add exploration noise to the action.
 
         Returns:
-            np.ndarray: The selected action.
+            (np.ndarray): The selected action.
         """
         if add_noise:
             return (
@@ -269,7 +269,7 @@ def act(self, state):
             state (np.ndarray): Input state.
 
         Returns:
-            np.ndarray: Action predicted by the actor network.
+            (np.ndarray): Action predicted by the actor network.
         """
         # Function to get the action from the actor
         state = torch.Tensor(state).to(self.device)
@@ -472,7 +472,7 @@ def prepare_state(self, latest_scan, distance, cos, sin, collision, goal, action
             action (list or np.ndarray): Last action taken [lin_vel, ang_vel].
 
         Returns:
-            tuple:
+            (tuple):
                 - state (list): Normalized and concatenated state vector.
                 - terminal (int): Terminal flag (1 if collision or goal, else 0).
         """
```

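All three CNNTD3 hunks make the same change: the return type in each Google-style docstring gets wrapped in parentheses, presumably so the generated API docs parse it as a type rather than as description text. A minimal sketch of the resulting convention (the function below is illustrative, not from the repository):

```python
import numpy as np


def scale_action(action: np.ndarray, max_action: float) -> np.ndarray:
    """Scale a normalized action to the robot's velocity limits.

    Args:
        action (np.ndarray): Action in the range [-1, 1].
        max_action (float): The maximum possible action value.

    Returns:
        (np.ndarray): The scaled action, written in the parenthesized
            return-type style used throughout this commit.
    """
    return max_action * action
```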
robot_nav/models/DDPG/DDPG.py

Lines changed: 6 additions & 6 deletions
```diff
@@ -46,7 +46,7 @@ def forward(self, s):
             s (torch.Tensor): Input state tensor of shape (batch_size, state_dim).
 
         Returns:
-            torch.Tensor: Output action tensor of shape (batch_size, action_dim), scaled to [-1, 1].
+            (torch.Tensor): Output action tensor of shape (batch_size, action_dim), scaled to [-1, 1].
         """
         s = F.leaky_relu(self.layer_1(s))
         s = F.leaky_relu(self.layer_2(s))
@@ -95,7 +95,7 @@ def forward(self, s, a):
             a (torch.Tensor): Action tensor of shape (batch_size, action_dim).
 
         Returns:
-            torch.Tensor: Q-value tensor of shape (batch_size, 1).
+            (torch.Tensor): Q-value tensor of shape (batch_size, 1).
         """
         s1 = F.leaky_relu(self.layer_1(s))
         self.layer_2_s(s1)
@@ -182,7 +182,7 @@ def get_action(self, obs, add_noise):
             add_noise (bool): Whether to add exploration noise to the action.
 
         Returns:
-            np.array: Action selected by the actor network.
+            (np.array): Action selected by the actor network.
         """
         if add_noise:
             return (
@@ -199,7 +199,7 @@ def act(self, state):
             state (np.array): Environment state.
 
         Returns:
-            np.array: Action values as output by the actor network.
+            (np.array): Action values as output by the actor network.
         """
         state = torch.Tensor(state).to(self.device)
         return self.actor(state).cpu().data.numpy().flatten()
@@ -225,7 +225,7 @@ def train(
         Trains the actor and critic networks using a replay buffer and soft target updates.
 
         Args:
-            replay_buffer (object): Replay buffer object with a sample_batch method.
+            replay_buffer (ReplayBuffer): Replay buffer object with a sample_batch method.
             iterations (int): Number of training iterations.
             batch_size (int): Size of each training batch.
             discount (float): Discount factor for future rewards.
@@ -397,7 +397,7 @@ def prepare_state(self, latest_scan, distance, cos, sin, collision, goal, action
             action (list or np.array): The action taken in the previous step.
 
         Returns:
-            tuple: (state vector, terminal flag)
+            (tuple): (state vector, terminal flag)
         """
         latest_scan = np.array(latest_scan)
 
```

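The DDPG hunks repeat the parenthesized return-type change and also tighten the `replay_buffer` argument annotation from `object` to `ReplayBuffer`. The docstring only promises a `sample_batch` method; a minimal stand-in consistent with that description could look like the sketch below (the field names and returned layout are assumptions, not the repository's actual class):

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Minimal stand-in for a buffer exposing sample_batch, as the docstring assumes."""

    def __init__(self, max_size=100_000):
        self.storage = deque(maxlen=max_size)

    def add(self, state, action, reward, terminal, next_state):
        # Store one transition as a flat tuple.
        self.storage.append((state, action, reward, terminal, next_state))

    def sample_batch(self, batch_size):
        # Assumes the buffer is non-empty; returns arrays column-wise.
        batch = random.sample(self.storage, min(batch_size, len(self.storage)))
        states, actions, rewards, terminals, next_states = map(np.array, zip(*batch))
        return states, actions, rewards, terminals, next_states
```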
robot_nav/models/HCM/hardcoded_model.py

Lines changed: 5 additions & 4 deletions
```diff
@@ -8,10 +8,11 @@
 
 class HCM(object):
     """
-    A class representing a hybrid control model (HCM) for a robot's navigation system.
+    A class representing a Hard-Coded model (HCM) for a robot's navigation system.
 
     This class contains methods for generating actions based on the robot's state, preparing state
     representations, training (placeholder method), saving/loading models, and logging experiences.
+    The method is suboptimal in order to collect collisions for pre-training of DRL models.
 
     Attributes:
         max_action (float): The maximum possible action value.
@@ -59,7 +60,7 @@ def get_action(self, state, add_noise):
             add_noise (bool): Whether to add noise to the action for exploration.
 
         Returns:
-            list: The computed action [linear velocity, angular velocity].
+            (list): The computed action [linear velocity, angular velocity].
         """
         sin = state[-3]
         cos = state[-4]
@@ -99,7 +100,7 @@ def train(
         Placeholder method for training the hybrid control model.
 
         Args:
-            replay_buffer (object): The replay buffer containing past experiences.
+            replay_buffer (object): The replay buffer containing experiences.
             iterations (int): The number of training iterations.
             batch_size (int): The batch size for training.
             discount (float): The discount factor for future rewards.
@@ -153,7 +154,7 @@ def prepare_state(self, latest_scan, distance, cos, sin, collision, goal, action
             action (list): The action taken by the robot, [linear velocity, angular velocity].
 
         Returns:
-            tuple: A tuple containing the prepared state and a terminal flag (1 if terminal state, 0 otherwise).
+            (tuple): A tuple containing the prepared state and a terminal flag (1 if terminal state, 0 otherwise).
         """
         latest_scan = np.array(latest_scan)
 
```

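Beyond the formatting changes, the HCM class docstring now describes a Hard-Coded model that is deliberately suboptimal, so that collisions get collected for pre-training DRL models. The hunk only shows that the policy reads the goal heading as sin/cos from the tail of the state vector; a hypothetical rule in that spirit (not the repository's actual logic) might be:

```python
import numpy as np


def hardcoded_action(state, max_action=1.0):
    """Steer toward the goal using the heading sin/cos stored at the end of the state.

    Purely heading-based: it ignores the laser scan, so it will drive into
    obstacles, which is what makes it useful for collecting collisions.
    """
    sin, cos = state[-3], state[-4]        # goal heading components, per the diff above
    angle_to_goal = np.arctan2(sin, cos)   # signed heading error in radians
    linear = max_action * 0.5              # constant forward speed (assumed value)
    angular = np.clip(angle_to_goal, -max_action, max_action)
    return [linear, angular]
```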
robot_nav/models/PPO/PPO.py

Lines changed: 9 additions & 9 deletions
```diff
@@ -47,11 +47,11 @@ def add(self, state, action, reward, terminal, next_state):
         Add a transition to the buffer. (Partial implementation.)
 
         Args:
-            state: The current observed state.
-            action: The action taken.
-            reward: The reward received after taking the action.
+            state (list or np.array): The current observed state.
+            action (list or np.array): The action taken.
+            reward (float): The reward received after taking the action.
             terminal (bool): Whether the episode terminated.
-            next_state: The resulting state after taking the action.
+            next_state (list or np.array): The resulting state after taking the action.
         """
         self.states.append(state)
         self.rewards.append(reward)
@@ -137,7 +137,7 @@ def act(self, state, sample):
             sample (bool): Whether to sample from the action distribution or use mean.
 
         Returns:
-            Tuple[Tensor, Tensor, Tensor]: Sampled (or mean) action, log probability, and state value.
+            (Tuple[Tensor, Tensor, Tensor]): Sampled (or mean) action, log probability, and state value.
         """
         action_mean = self.actor(state)
         cov_mat = torch.diag(self.action_var).unsqueeze(dim=0)
@@ -163,7 +163,7 @@ def evaluate(self, state, action):
             action (Tensor): Batch of actions.
 
         Returns:
-            Tuple[Tensor, Tensor, Tensor]: Action log probabilities, state values, and distribution entropy.
+            (Tuple[Tensor, Tensor, Tensor]): Action log probabilities, state values, and distribution entropy.
         """
         action_mean = self.actor(state)
 
@@ -306,7 +306,7 @@ def get_action(self, state, add_noise):
             add_noise (bool): Whether to sample from the distribution (True) or use the deterministic mean (False).
 
         Returns:
-            np.ndarray: Sampled action.
+            (np.ndarray): Sampled action.
         """
 
         with torch.no_grad():
@@ -326,7 +326,7 @@ def train(self, replay_buffer, iterations, batch_size):
         Train the policy and value function using PPO loss based on the stored rollout buffer.
 
         Args:
-            replay_buffer: Placeholder for compatibility (not used).
+            replay_buffer (object): Placeholder for compatibility (not used).
             iterations (int): Number of epochs to optimize the policy per update.
             batch_size (int): Batch size (not used; training uses the whole buffer).
         """
@@ -434,7 +434,7 @@ def prepare_state(self, latest_scan, distance, cos, sin, collision, goal, action
             action (tuple[float, float]): Last action taken (linear and angular velocities).
 
         Returns:
-            tuple[list[float], int]: Processed state vector and terminal flag (1 if terminal, else 0).
+            (tuple[list[float], int]): Processed state vector and terminal flag (1 if terminal, else 0).
         """
         latest_scan = np.array(latest_scan)
 
```

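The PPO hunks add concrete types to the rollout buffer's `add()` arguments and apply the same parenthesized return-type style to `act`, `evaluate`, `get_action`, `train`, and `prepare_state`. As a sketch of the shared `(state, terminal)` convention that the `prepare_state` docstrings in this commit describe (the helper name and normalization constants are assumptions, not the repository's values):

```python
import numpy as np


def prepare_state_sketch(latest_scan, distance, cos, sin, collision, goal, action,
                         max_scan_range=7.0):
    """Illustrative only: build the (state, terminal) pair the docstrings describe.

    The scan is normalized to [0, 1], goal information and the last action are
    appended, and terminal is 1 on collision or goal, else 0.
    """
    scan = np.clip(np.array(latest_scan) / max_scan_range, 0.0, 1.0)
    state = scan.tolist() + [distance, cos, sin] + list(action)
    terminal = 1 if (collision or goal) else 0
    return state, terminal
```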