Total: 2 A100 GPUs
per_device_train_batch_size=16
local_rollout_batch_size=10
Is it necessary that local_rollout_batch_size multiplied by the number of GPUs minus 1 (the GPUs used for the Actor) exactly equals the number of tasks?
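If that is indeed the rule, it amounts to a simple divisibility check. The sketch below is purely illustrative: the function name, `num_tasks`, and the assumption that exactly one GPU is reserved away from the Actor are mine, not names or guarantees from the codebase.

```python
# Hypothetical sketch of the constraint asked about above:
# local_rollout_batch_size * (num_gpus - 1) == num_tasks,
# assuming one GPU is reserved for something other than Actor workers.
def check_rollout_config(local_rollout_batch_size: int,
                         num_gpus: int,
                         num_tasks: int) -> bool:
    actor_gpus = num_gpus - 1  # assumption: one non-Actor GPU
    return local_rollout_batch_size * actor_gpus == num_tasks

# With 2 GPUs and local_rollout_batch_size=10, the rule holds for 10 tasks:
print(check_rollout_config(10, 2, 10))  # True
# With 8 GPUs, local_rollout_batch_size=2 would require 14 tasks:
print(check_rollout_config(2, 8, 14))  # True
```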
With 8 GPUs, if I set local_rollout_batch_size to anything other than 2, I encounter this error:
```
local_token_obs["input_ids"][:, :processed_obs["input_ids"].shape[1]] = processed_obs["input_ids"]
RuntimeError: The expanded size of the tensor (4) must match the existing size (2) at non-singleton dimension 0. Target sizes: [4, 34]. Tensor sizes: [2, 34]
Actor ObjectRef(2751d69548dba9565d1370028d345bffd68a1c6b0100000001000000) died
```
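The RuntimeError is a plain batch-dimension mismatch: the left-hand slice of `local_token_obs["input_ids"]` expects 4 rows on this worker, while `processed_obs["input_ids"]` only carries 2. A minimal standalone reproduction of the failing assignment pattern, using NumPy in place of PyTorch (the shapes are taken from the traceback; NumPy raises a ValueError where PyTorch raises the RuntimeError shown above):

```python
import numpy as np

# Shapes from the traceback: the per-worker buffer expects 4 rows,
# but the processed observations only contain 2 rows of 34 tokens.
local_input_ids = np.zeros((4, 50), dtype=np.int64)      # stand-in buffer
processed_input_ids = np.ones((2, 34), dtype=np.int64)   # stand-in rollout output

try:
    # Same assignment pattern as the failing line in the issue:
    local_input_ids[:, :processed_input_ids.shape[1]] = processed_input_ids
except ValueError as e:
    # NumPy raises ValueError here; PyTorch raises the equivalent RuntimeError.
    print("shape mismatch:", e)
```

This is consistent with the batch sizes not dividing evenly across the Actor GPUs, which is what the question above is getting at.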