Suppose I have a convolutional neural network like LeNet, and I reparametrize it by introducing mask variables for the convolution channels: I multiply each output channel of the convolution layers by a scalar mask variable. The original parameters of the model (before reparametrization) are treated as constants, so the only parameters of the reparametrized model are the mask parameters.
My question is: which of the two cases would you expect to consume less memory?
Case A: on the original model, take the Jacobian of the model output with respect to the model parameters.
Case B: on the reparametrized model, take the Jacobian of the model output with respect to the mask parameters.
Intuitively, there are far fewer channels (about 70 in LeNet) than parameters (about 657K in LeNet) in the original model, so the Jacobian in case B should require far less memory than in case A. However, I found empirically that the maximum batch size allowed in both cases (before an out-of-memory error occurs) is the same, around 73 samples. The GPU I used has about 11 GB of memory, and the input data has shape (32, 32, 3).
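For scale, the Jacobian tensor itself would be roughly this big in each case (a back-of-the-envelope sketch assuming float32 and 10 output logits, which is my guess for LeNet on a 32x32x3 input; the parameter counts are the ones quoted above):

```python
# Rough size of the full Jacobian (output w.r.t. parameters) stored in float32.
# 10 output logits is an assumption; 657K params and 70 masks are from the model above.
batch, n_out = 73, 10
n_params, n_masks = 657_000, 70
bytes_per_float = 4

case_a = batch * n_out * n_params * bytes_per_float  # ~1.8 GiB
case_b = batch * n_out * n_masks * bytes_per_float   # ~0.2 MiB
print(f"case A: {case_a / 2**30:.2f} GiB, case B: {case_b / 2**20:.2f} MiB")
```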
Below is the code that shows how I reparametrize the model with `explicit_reparametrized`. The class `ChannelMaskLayer` is a custom layer that multiplies the channel outputs by the mask parameters; `lenet_5_caffe` is a model whose parameters include both the original model's parameters and the mask parameters; `explicit_reparametrized` acts like `functools.partial`: it treats the original parameters as constants, and the function it returns is the reparametrized model of case B.
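Roughly, the structure is like the simplified sketch below (plain JAX, with a single convolution standing in for `lenet_5_caffe`; names and shapes here are illustrative only, not the actual listing):

```python
import functools
import jax
import jax.numpy as jnp

def channel_mask(masks, features):
    # Scale each output channel (last axis, NHWC) by its scalar mask,
    # analogous to what ChannelMaskLayer does.
    return features * masks

def toy_model(conv_params, masks, x):
    # Toy stand-in for lenet_5_caffe: one convolution followed by channel masking.
    y = jax.lax.conv_general_dilated(
        x, conv_params,
        window_strides=(1, 1), padding="SAME",
        dimension_numbers=("NHWC", "HWIO", "NHWC"))
    return channel_mask(masks, y)

def explicit_reparametrized(model_fn, frozen_params):
    # Like functools.partial: fix the original parameters so the returned
    # function takes only the mask parameters (the model of case B).
    return functools.partial(model_fn, frozen_params)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 32, 32, 3))          # small NHWC batch
conv_params = jax.random.normal(key, (5, 5, 3, 6))  # HWIO kernel, frozen in case B
masks = jnp.ones(6)                                 # one scalar mask per output channel

reparam_model = explicit_reparametrized(toy_model, conv_params)

# Case A: Jacobian w.r.t. the original parameters (masks are all ones here,
# so this matches the original, unmasked model).
jac_a = jax.jacrev(toy_model, argnums=0)(conv_params, masks, x)
# Case B: Jacobian w.r.t. the mask parameters only.
jac_b = jax.jacrev(reparam_model, argnums=0)(masks, x)
print(jac_a.shape)  # (8, 32, 32, 6, 5, 5, 3, 6)
print(jac_b.shape)  # (8, 32, 32, 6, 6)
```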
Any help would be really appreciated!