This provides both a standalone class and a callback for registering and automatically deregistering PyTorch hooks, along with some pre-defined hooks. Hooks can be attached to any `nn.Module`, for either the forward or the backward pass.

We'll start by looking at the pre-defined hook `ActivationStats`, then we'll see how to create our own.
`ActivationStats` saves the layer activations in `self.stats` for all modules passed to it. By default it will save activations for all modules. For instance:

```python
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
#learn = cnn_learner(data, models.resnet18, callback_fns=ActivationStats)
learn = Learner(data, simple_cnn((3,16,16,2)), callback_fns=ActivationStats)
learn.fit(1)
```
`stats` is a `FloatTensor` of shape `(2, num_modules, num_batches)`. The first axis indexes the statistic (`0`: mean, `1`: standard deviation):

```
torch.Size([2, 3, 193])
```

So this shows the standard deviation (`axis0==1`) of the second-to-last layer (`axis1==-2`) for each batch (`axis2`).
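As a concrete illustration of that layout, here is a small sketch that indexes a stand-in random tensor with the shape reported above (not real training statistics):

```python
import torch

# Stand-in for the stats tensor: (statistic, module, batch)
stats = torch.randn(2, 3, 193)

means = stats[0]              # axis0==0: means, one row per module -> shape (3, 193)
stds_2nd_last = stats[1][-2]  # axis0==1: stddevs; axis1==-2: second-to-last module
print(stds_2nd_last.shape)    # one standard deviation per batch
```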
Registers and manually deregisters a PyTorch hook. Your `hook_func` will be called automatically when the forward (or backward, depending on `is_forward`) pass for your module `m` is run, and the result of that function is placed in `self.stored`.
Deregister the hook, if not called already.
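Under the hood this wraps PyTorch's own hook registration. A minimal plain-PyTorch sketch of the same idea (the names `hook_func` and `stored` here are illustrative, not fastai's implementation):

```python
import torch
import torch.nn as nn

m = nn.Linear(4, 2)
stored = {}

def hook_func(module, inputs, output):
    # Like Hook, keep the result of the hook function around for later inspection
    stored['out'] = output.detach()

handle = m.register_forward_hook(hook_func)  # register
m(torch.randn(3, 4))                         # the forward pass triggers hook_func
handle.remove()                              # deregister, cf. Hook.remove
```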
Acts as a collection (supporting `len(hooks)` and `hooks[i]`) and an iterator (`for hook in hooks`) over a group of hooks, one for each module in `ms`, with the ability to remove all as a group. Use `stored` to get all hook results.
`is_forward` behavior is the same as for `Hook`. See the source code for `HookCallback` for a simple example.
Deregister all hooks created by this class, if not previously called.
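The same group behavior can be sketched in plain PyTorch: register one forward hook per module in a list, then remove them all together (a sketch of the idea, not fastai's implementation):

```python
import torch
import torch.nn as nn

ms = [nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2)]
model = nn.Sequential(*ms)

stored = []
handles = [m.register_forward_hook(lambda mod, inp, out: stored.append(out.detach()))
           for m in ms]

model(torch.randn(5, 4))   # every module's output is captured, in order
for h in handles:
    h.remove()             # remove all as a group, cf. Hooks.remove
```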
Tests found for `model_summary`:

```
pytest -sv tests/test_basic_train.py::test_export_load_learner
pytest -sv tests/test_callbacks_hooks.py::test_model_summary_collab
pytest -sv tests/test_callbacks_hooks.py::test_model_summary_tabular
pytest -sv tests/test_callbacks_hooks.py::test_model_summary_text
pytest -sv tests/test_callbacks_hooks.py::test_model_summary_vision
```
To run tests please refer to this guide.
Print a summary of `m` using an output text width of `n` characters.
This method only works on a `Learner` object with `train_ds` in it. If it was created as a result of `load_learner`, there is no `data` to run through the model and therefore it's not possible to create such a summary.
A `summary` looks like:

```
======================================================================
Layer (type)         Output Shape         Param #    Trainable
======================================================================
Conv2d               [64, 176, 176]       9,408      False
______________________________________________________________________
BatchNorm2d          [64, 176, 176]       128        True
______________________________________________________________________
ReLU                 [64, 176, 176]       0          False
______________________________________________________________________
MaxPool2d            [64, 88, 88]         0          False
______________________________________________________________________
Conv2d               [64, 88, 88]         36,864     False
...
```
`Layer (type)` is the name of the corresponding `nn.Module` class.

`Output Shape` is the shape of the output of the corresponding layer (minus the batch dimension, which is always the same and has no impact on the model params).

`Param #` is the number of weights (and optionally bias), and it will vary for each layer.
The number of params is calculated differently for each layer type. Here is how it's calculated for some of the most common layer types:

- `Linear`: `(n_in+bias) * n_out`
- `BatchNorm`: `2 * n_out`
- `Embedding`: `n_embed * emb_sz`
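These formulas can be checked against PyTorch's own parameter counts (a quick sanity check, not fastai code; the helper `n_params` is ours):

```python
import torch.nn as nn

def n_params(m):
    # Total number of elements across all parameter tensors of a module
    return sum(p.numel() for p in m.parameters())

assert n_params(nn.Linear(512, 37, bias=True)) == (512 + 1) * 37  # 18,981
assert n_params(nn.BatchNorm2d(64)) == 2 * 64                     # 128
assert n_params(nn.Embedding(100, 8)) == 100 * 8                  # 800
```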
`Trainable` indicates whether a layer is trainable or not.

- Layers with `0` parameters are always Untrainable (e.g., `ReLU` and `MaxPool2d`).
- Other layers are either Trainable or not, usually depending on whether they are frozen. See Discriminative layer training.
To better understand this summary it helps to also execute `learn.model` and correlate the two outputs.
Let's feed a `Learner` a dataset of 3-channel images of size 352x352 and look at the model and its summary:

```python
data.train_ds.data.shape
learn = cnn_learner(data, models.resnet34, ...)
print(learn.model)
print(learn.summary())
```
Here are the outputs, with everything but the lines relevant to the example removed:

```
torch.Size([3, 352, 352])
```

```
[...]
(0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
[...]
(3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
[...]
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
[...]
(8): Linear(in_features=512, out_features=37, bias=True)
```

```
======================================================================
Layer (type)         Output Shape         Param #    Trainable
======================================================================
Conv2d               [64, 176, 176]       9,408      False
______________________________________________________________________
BatchNorm2d          [64, 176, 176]       128        True
______________________________________________________________________
[...]
MaxPool2d            [64, 88, 88]         0          False
______________________________________________________________________
Conv2d               [64, 88, 88]         36,864     False
[...]
______________________________________________________________________
Linear               [37]                 18,981     True
```
So let's calculate some params:

For the `Conv2d` layers, multiply the first 4 numbers from the corresponding layer definition:

```
Conv2d(3, 64, kernel_size=(7, 7), ...)
3*64*7*7 = 9,408

Conv2d(64, 64, kernel_size=(3, 3), ...)
64*64*3*3 = 36,864
```

For the `BatchNorm2d` layer, multiply the first number by 2:

```
BatchNorm2d(64, ...)
64*2 = 128
```

For `Linear`, we multiply the first 2 numbers and include the bias if it's `True`:

```
Linear(in_features=512, out_features=37, bias=True)
(512+1)*37 = 18,981
```
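The `Conv2d` arithmetic above can likewise be verified directly against PyTorch (a sanity check mirroring the layers in the example, not part of the original text):

```python
import torch.nn as nn

# Same definitions as the two Conv2d layers shown in the model printout
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False)

assert sum(p.numel() for p in conv1.parameters()) == 3 * 64 * 7 * 7   # 9,408
assert sum(p.numel() for p in conv2.parameters()) == 64 * 64 * 3 * 3  # 36,864
```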
Now let's calculate some output shapes:

We started with a 3x352x352 image and ran it through this layer:

```
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
```

How did we get `[64, 176, 176]`?

The number of output channels is `64`; that's the first dimension in the shape above. And then our image of `352x352` got convolved into `176x176` because of the stride of 2 (352/2 = 176).
Then we had:

```
MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
```

which reduced `[64, 176, 176]` to `[64, 88, 88]`, again because of stride 2.
And so on, finishing with:

```
Linear(in_features=512, out_features=37, bias=True)
```

which reduced everything to just `[37]`.
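The shape changes above follow the standard convolution output-size formula, `floor((w + 2*padding - kernel_size) / stride) + 1`. A quick check (the helper name is ours, for illustration):

```python
def conv_out(w, k, s, p):
    # Output width of a convolution/pooling layer along one spatial dimension
    return (w + 2 * p - k) // s + 1

assert conv_out(352, k=7, s=2, p=3) == 176  # Conv2d(kernel_size=(7,7), stride=(2,2), padding=(3,3))
assert conv_out(176, k=3, s=2, p=1) == 88   # MaxPool2d(kernel_size=3, stride=2, padding=1)
```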
It can be useful to get the size of each layer of a model (e.g. for printing a summary, or for generating cross-connections for a `DynamicUnet`); however, the sizes depend on the size of the input. This function calculates the layer sizes by passing in a minimal tensor of `size`.
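A minimal sketch of that idea in plain PyTorch: run a dummy batch through the model and record each top-level child's output shape via forward hooks (the function name and defaults here are ours, not fastai's exact implementation):

```python
import torch
import torch.nn as nn

def layer_sizes(model, size=(64, 64), ch_in=3):
    """Return the output shape of each top-level child, using a minimal dummy batch."""
    sizes, handles = [], []
    def record(module, inputs, output):
        sizes.append(tuple(output.shape))
    for child in model.children():
        handles.append(child.register_forward_hook(record))
    with torch.no_grad():
        model(torch.zeros(1, ch_in, *size))  # minimal tensor of the requested size
    for h in handles:
        h.remove()
    return sizes
```

For example, on `nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU())` this reports the shrinking spatial dimensions layer by layer.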
Callback that can be used to register hooks on `modules`. Implement the corresponding function in `self.hook`.

For all `modules`, it uses a callback to automatically register a method `self.hook` (that you must define in an inherited class) as a hook. This method must have the signature:

```python
def hook(self, m:Model, input:Tensors, output:Tensors)
```

If `do_remove`, then the hook is automatically deregistered at the end of training. See `ActivationStats` for a simple example of inheriting from this class.
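To make that hook signature concrete, here is a small plain-PyTorch class in the same spirit (a sketch only; fastai's real `ActivationStats` inherits from `HookCallback` and integrates with the training loop):

```python
import torch
import torch.nn as nn

class SimpleActivationStats:
    """Record (mean, std) of each hooked module's output on every forward pass."""
    def __init__(self, modules):
        self.stats = {m: [] for m in modules}
        self.handles = [m.register_forward_hook(self.hook) for m in modules]

    def hook(self, m, input, output):
        # Same signature idea as HookCallback's hook: (module, input, output)
        self.stats[m].append((output.mean().item(), output.std().item()))

    def remove(self):
        # Deregister everything, like HookCallback's automatic cleanup
        for h in self.handles:
            h.remove()
```

Attaching it to the children of a small `nn.Sequential` and running one batch yields one (mean, std) pair per module.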