Labels: bug (Something isn't working)

Description
Search before asking
- I have searched the MuZero issues and found no similar bug report.
🐛 Describe the bug
This is the official pseudocode for updating weights:
def update_weights(optimizer: tf.train.Optimizer, network: Network, batch,
                   weight_decay: float):
  loss = 0
  for image, actions, targets in batch:
    # Initial step, from the real observation.
    value, reward, policy_logits, hidden_state = network.initial_inference(
        image)
    predictions = [(1.0, value, reward, policy_logits)]

    # Recurrent steps, from action and previous hidden state.
    for action in actions:
      value, reward, policy_logits, hidden_state = network.recurrent_inference(
          hidden_state, action)
      predictions.append((1.0 / len(actions), value, reward, policy_logits))

      hidden_state = scale_gradient(hidden_state, 0.5)

    for prediction, target in zip(predictions, targets):
      gradient_scale, value, reward, policy_logits = prediction
      target_value, target_reward, target_policy = target
      l = (
          scalar_loss(value, target_value) +
          scalar_loss(reward, target_reward) +
          tf.nn.softmax_cross_entropy_with_logits(
              logits=policy_logits, labels=target_policy))
      loss += scale_gradient(l, gradient_scale)

  for weights in network.get_weights():
    loss += weight_decay * tf.nn.l2_loss(weights)

  optimizer.minimize(loss)
It only trains on actions that actually happened in the game history, excluding anything past the end of a game, but muzero_general does train on actions past the end of games:
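A side note on why the pseudocode above stops at the game end: the sampled action history is a slice of the real game, so when the game ends before the unroll length, `predictions` is shorter than `targets`, and Python's `zip` truncates to the shorter sequence. The extra past-end targets are simply never paired with a prediction. A minimal illustration (the values are placeholders, not real network outputs):

```python
# Predictions from the initial step plus 2 real actions, but 5 targets
# were built for a 4-step unroll:
predictions = ["p0", "p1", "p2"]
targets = ["t0", "t1", "t2", "t3", "t4"]

# zip stops at the shorter list, so t3 and t4 (past the end of
# the game) contribute nothing to the loss.
paired = list(zip(predictions, targets))
```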
muzero-general/replay_buffer.py, line 291 (commit 0c4c335):

    # States past the end of games are treated as absorbing states
Add an example

As mentioned above.
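To make the reported behavior concrete, here is a minimal, hypothetical sketch of absorbing-state target construction. The function and argument names are illustrative only, not taken from muzero-general or the official pseudocode; the point is that every unroll step past the end of the game still yields a (zero-value, zero-reward, empty-policy) target that can enter the loss:

```python
def make_targets(root_values, rewards, policies, state_index, num_unroll_steps):
    """Build (value, reward, policy) targets for each unroll step.

    Steps that fall past the end of the game are padded as absorbing
    states: zero value, zero reward, and an empty policy.
    """
    targets = []
    for current_index in range(state_index, state_index + num_unroll_steps + 1):
        if current_index < len(root_values):
            # Inside the game: use the stored search statistics.
            targets.append((root_values[current_index],
                            rewards[current_index],
                            policies[current_index]))
        else:
            # Past the end of the game: absorbing state.
            targets.append((0.0, 0.0, []))
    return targets

# Example: a 3-step game unrolled for 4 steps starting from index 1
# produces 5 targets, 3 of which are absorbing states.
targets = make_targets(root_values=[1.0, 0.5, -0.5],
                       rewards=[0.0, 1.0, 0.0],
                       policies=[[0.7, 0.3]] * 3,
                       state_index=1,
                       num_unroll_steps=4)
```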
Environment
No response
Minimal Reproducible Example
No response
Additional
No response