Suppose I have a convolutional neural network like LeNet, and I reparametrize it by introducing mask variables for the convolution channels: I multiply each output channel of the convolution layers by a scalar mask variable. The original parameters of the model (before reparametrization) are treated as constants, so the only parameters of the reparametrized model are the mask parameters.
My question is: which of the two cases would you expect to consume less memory?
Case A: on the original model, take the Jacobian of the model output with respect to the model parameters.
Case B: on the reparametrized model, take the Jacobian of the model output with respect to the mask parameters.
Intuitively, there are far fewer channels (about 70 in LeNet) than parameters (about 657K in LeNet) in the original model, so the Jacobian in case B should require far less memory than in case A. However, I found empirically that the maximum batch size allowed in both cases (before an out-of-memory error occurs) is the same, around 73 samples. The GPU I used has about 11 GB of memory, and the input data has shape (32, 32, 3).
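For scale, the Jacobian tensor itself would be roughly this big in each case (a back-of-the-envelope sketch assuming float32 and 10 output logits, which is my guess for LeNet on a 32x32x3 input; the parameter counts are the ones quoted above):

```python
# Rough size of the full Jacobian (output w.r.t. parameters) stored in float32.
# 10 output logits is an assumption; 657K params and 70 masks are from the model above.
batch, n_out = 73, 10
n_params, n_masks = 657_000, 70
bytes_per_float = 4

case_a = batch * n_out * n_params * bytes_per_float  # ~1.8 GiB
case_b = batch * n_out * n_masks * bytes_per_float   # ~0.2 MiB
print(f"case A: {case_a / 2**30:.2f} GiB, case B: {case_b / 2**20:.2f} MiB")
```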
Below is the code that shows how I reparametrize the model with `explicit_reparametrized`. The class `ChannelMaskLayer` is a custom layer that multiplies the channel outputs by the mask parameters; `lenet_5_caffe` is a model whose parameters include both the original model's parameters and the mask parameters; `explicit_reparametrized` acts like `functools.partial`: it treats the original parameters as constants, and the function it returns is the reparametrized model of case B.
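Roughly, the structure is like the simplified sketch below (plain JAX, with a single convolution standing in for `lenet_5_caffe`; names and shapes here are illustrative only, not the actual listing):

```python
import functools
import jax
import jax.numpy as jnp

def channel_mask(masks, features):
    # Scale each output channel (last axis, NHWC) by its scalar mask,
    # analogous to what ChannelMaskLayer does.
    return features * masks

def toy_model(conv_params, masks, x):
    # Toy stand-in for lenet_5_caffe: one convolution followed by channel masking.
    y = jax.lax.conv_general_dilated(
        x, conv_params,
        window_strides=(1, 1), padding="SAME",
        dimension_numbers=("NHWC", "HWIO", "NHWC"))
    return channel_mask(masks, y)

def explicit_reparametrized(model_fn, frozen_params):
    # Like functools.partial: fix the original parameters so the returned
    # function takes only the mask parameters (the model of case B).
    return functools.partial(model_fn, frozen_params)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 32, 32, 3))          # small NHWC batch
conv_params = jax.random.normal(key, (5, 5, 3, 6))  # HWIO kernel, frozen in case B
masks = jnp.ones(6)                                 # one scalar mask per output channel

reparam_model = explicit_reparametrized(toy_model, conv_params)

# Case A: Jacobian w.r.t. the original parameters (masks are all ones here,
# so this matches the original, unmasked model).
jac_a = jax.jacrev(toy_model, argnums=0)(conv_params, masks, x)
# Case B: Jacobian w.r.t. the mask parameters only.
jac_b = jax.jacrev(reparam_model, argnums=0)(masks, x)
print(jac_a.shape)  # (8, 32, 32, 6, 5, 5, 3, 6)
print(jac_b.shape)  # (8, 32, 32, 6, 6)
```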
Any help would be really appreciated!