Replies: 1 comment
The least intrusive way is to inherit from `MMDistributedDataParallel` and override `train_step`:

```python
from typing import Dict, Union

import torch

from mmengine.model import MMDistributedDataParallel
from mmengine.model.utils import detect_anomalous_params
from mmengine.optim import OptimWrapper
from mmengine.registry import MODEL_WRAPPERS


@MODEL_WRAPPERS.register_module()
class CustomDistributedDataParallel(MMDistributedDataParallel):

    def train_step(self, data: Union[dict, tuple, list],
                   optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]:
        # Enable automatic mixed precision training context.
        with optim_wrapper.optim_context(self):
            data = self.module.data_preprocessor(data, training=True)
            losses = self._run_forward(data, mode='loss')
        parsed_loss, log_vars = self.module.parse_losses(losses)
        loss = optim_wrapper.scale_loss(parsed_loss)
        optim_wrapper.backward(loss)
        # modify gradients here
        if optim_wrapper.should_update():
            optim_wrapper.step()
            optim_wrapper.zero_grad()
        if self.detect_anomalous_params:
            detect_anomalous_params(parsed_loss, model=self)
        return log_vars
```

And use the model wrapper through `model_wrapper_cfg` (see https://github.com/open-mmlab/mmengine/blob/237aee386669f0d69a1caf4724bdc1e826178d7d/mmengine/runner/runner.py#L825):

```python
model_wrapper_cfg = dict(
    type='CustomDistributedDataParallel',
    # other configs
)
```
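For the `# modify gradients here` part, the wrapped model is reachable as `self.module`, so the tweak could look roughly like the sketch below (`'backbone'` is just a placeholder for whatever layer name or type you want to match):

```python
# inside the custom train_step, right after optim_wrapper.backward(loss)
for name, param in self.module.named_parameters():
    if 'backbone' in name and param.grad is not None:
        param.grad.add_(1e-4)  # slightly adjust the gradients of matched layers
```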
After loss.backward(), I want to modify the gradients of some layers, e.g., add a small value to them.
However, the update_params() call sits deep inside the MMEngine framework, in the OptimWrapper class. That part could be handled by creating a custom OptimWrapper and overriding update_params(), but how can I modify the gradients by layer name or type? The OptimWrapper class does not seem to have access to the model.
Usually, I can implement such logic with plain PyTorch:
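(Roughly something like the sketch below; the toy model, the layer-name check and the `1e-4` tweak are only placeholders to show the pattern I mean.)

```python
import torch
import torch.nn as nn

# toy model and optimizer, just to illustrate the pattern
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(4, 8)
targets = torch.randint(0, 2, (4,))

loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()

# between backward() and step() I can pick gradients by layer name (or module type)
for name, param in model.named_parameters():
    if name.startswith('0.') and param.grad is not None:  # e.g. only the first Linear
        param.grad.add_(1e-4)  # slightly add something to the gradient

optimizer.step()
optimizer.zero_grad()
```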
It is easy to access the model and modify the gradients that way, but how can I implement the code above with MMEngine? The only approach I can think of is using a Hook, but there is no mount point between loss.backward() and optimizer.step(), since they are wrapped deeper than the train loop.
The loss.backward() and optimizer.step() logic implemented by MMEngine is:
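(Paraphrased from `OptimWrapper.update_params()` in mmengine/optim/optimizer/optimizer_wrapper.py; the exact code may differ slightly between MMEngine versions.)

```python
# simplified view of mmengine's OptimWrapper.update_params()
def update_params(self, loss: torch.Tensor) -> None:
    loss = self.scale_loss(loss)
    self.backward(loss)   # loss.backward() happens inside here
    # parameters are only updated every `accumulative_counts` iterations
    if self.should_update():
        self.step()       # optimizer.step()
        self.zero_grad()  # optimizer.zero_grad()
```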
This function belongs to the OptimWrapper class, which does not have access to the model. Its optimizer is of type torch.optim.Optimizer, and the model parameters passed to the optimizer are plain tensors, so there is no way to tell which layer they belong to.
Really hope someone can help!!