`Learner` support for computer vision

Computer Vision Learner

vision.learner is the module that defines the cnn_learner method, which makes it easy to get a model suitable for transfer learning.

Transfer learning

Transfer learning is a technique where you take a model trained on a very large dataset (usually ImageNet in computer vision) and adapt it to your own dataset. The idea is that the model has learned to recognize many features from all of this data, and that you will benefit from this knowledge, especially if your dataset is small, compared to starting from a randomly initialized model. This article has shown, on a wide range of tasks, that transfer learning nearly always gives better results.

In practice, you need to change the last part of your model so that it is adapted to your own number of classes. Most convolutional models end with a few linear layers (a part we will call the head). The last convolutional layer will have analyzed features in the image that went through the model, and the job of the head is to convert those into predictions for each of our classes. In transfer learning we keep all the convolutional layers (called the body or the backbone of the model) with their weights pretrained on ImageNet, but define a new head that is initialized randomly.

Then we will train the model we obtain in two phases: first we freeze the body weights and only train the head (to convert those analyzed features into predictions for our own data), then we unfreeze the layers of the backbone (gradually if necessary) and fine-tune the whole model (possibly using differential learning rates).
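As an illustration of differential learning rates, the idea is to give each layer group its own rate, small for the earliest backbone layers and larger for the head. The following is a minimal sketch in plain Python, not the library's implementation; the helper name spread_lrs and the choice of geometric spacing between the two bounds are assumptions made for this example.

```python
def spread_lrs(lowest, highest, n_groups):
    """Return n_groups learning rates spaced geometrically from lowest to highest."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [highest]
    # Constant multiplicative step between consecutive layer groups.
    ratio = (highest / lowest) ** (1 / (n_groups - 1))
    return [lowest * ratio ** i for i in range(n_groups)]

# Three hypothetical layer groups: early backbone, late backbone, head.
lrs = spread_lrs(1e-5, 1e-3, 3)
```

With three groups and bounds of 1e-5 and 1e-3, the middle group ends up near 1e-4, so the pretrained early layers move much less than the freshly initialized head.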

The cnn_learner factory method helps you to automatically get a pretrained model from a given architecture with a custom head that is suitable for your data.


cnn_learner(data:DataBunch, base_arch:Callable, cut:Union[int, Callable]=None, pretrained:bool=True, lin_ftrs:Optional[Collection[int]]=None, ps:Floats=0.5, custom_head:Optional[Module]=None, split_on:Union[Callable, Collection[ModuleList], NoneType]=None, bn_final:bool=False, init='kaiming_normal_', concat_pool:bool=True, **kwargs:Any) → Learner

Build convnet style learner.

This method creates a Learner object from the data object and a model built on the backbone given in base_arch. Specifically, it cuts the model defined by base_arch (randomly initialized if pretrained is False) at the last convolutional layer by default (or as defined in cut, see below) and adds:

  • an AdaptiveConcatPool2d layer,
  • a Flatten layer,
  • blocks of [nn.BatchNorm1d, nn.Dropout, nn.Linear, nn.ReLU] layers.

The blocks are defined by the lin_ftrs and ps arguments. The first block has a number of inputs inferred from the backbone base_arch, the last one has a number of outputs equal to data.c (the number of classes in the data), and the intermediate blocks have input and output sizes determined by lin_ftrs (each block's number of inputs equals the number of outputs of the previous block). The default is a single intermediate hidden size of 512, which gives two blocks: model_activation -> 512 -> n_classes. If you pass a float as ps, the final dropout layer uses the value ps and the remaining ones use ps/2. If you pass a list, its values are used directly as the dropout probabilities.
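To make the sizing rules concrete, here is a minimal sketch in plain Python that only computes the shape of each block; it builds no real layers and is not the library's code. The function name head_plan, the tuple layout, the error raised on a mismatched list, and the value nf=1024 are all assumptions chosen for illustration.

```python
def head_plan(nf, nc, lin_ftrs=None, ps=0.5):
    """Describe each head block as (n_in, n_out, dropout_p, has_relu)."""
    hidden = [512] if lin_ftrs is None else list(lin_ftrs)
    sizes = [nf] + hidden + [nc]
    n_blocks = len(sizes) - 1
    if isinstance(ps, (int, float)):
        # A single float: the last block gets ps, earlier blocks get ps/2.
        drops = [ps / 2] * (n_blocks - 1) + [ps]
    else:
        drops = list(ps)
        if len(drops) != n_blocks:
            # Assumption of this sketch: require one value per block.
            raise ValueError("need one dropout probability per block")
    # Every block except the last is followed by a ReLU.
    relus = [True] * (n_blocks - 1) + [False]
    return list(zip(sizes[:-1], sizes[1:], drops, relus))

# Hypothetical backbone producing 1024 features, two target classes.
plan = head_plan(nf=1024, nc=2)
```

With the defaults this describes two blocks, 1024 -> 512 with dropout 0.25 and 512 -> 2 with dropout 0.5, the second one without a ReLU.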

Note that the very last block doesn't have a nn.ReLU activation, to allow you to use any final activation you want (generally included in the loss function in pytorch). Also, the backbone will be frozen if you choose pretrained=True (so only the head will train if you call fit) so that you can immediately start phase one of training as described above.

Alternatively, you can define your own custom_head to put on top of the backbone. If you want to specify where to split base_arch, you should do so in the argument cut, which can be either the index of a specific layer (the result will not include that layer) or a function that, when passed the model, returns the backbone you want.

The final model obtained by stacking the backbone and the head (custom or defined as we saw) is then separated into groups for gradual unfreezing or differential learning rates. You can specify how to split the backbone into groups with the optional argument split_on (it should be a function that returns those groups when given the backbone).

The kwargs will be passed on to Learner, so you can put here anything that Learner will accept (metrics, loss_func, opt_func...)

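from fastai.vision import *

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learner = cnn_learner(data, models.resnet18, metrics=[accuracy])
# Train the head for one epoch (the backbone is frozen by default).
learner.fit_one_cycle(1, 1e-3)
epoch train_loss valid_loss accuracy time
0 0.132899 0.069354 0.978901 00:06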


unet_learner(data:DataBunch, arch:Callable, pretrained:bool=True, blur_final:bool=True, norm_type:Optional[NormType]=None, split_on:Union[Callable, Collection[ModuleList], NoneType]=None, blur:bool=False, self_attention:bool=False, y_range:OptRange=None, last_cross:bool=True, bottle:bool=False, cut:Union[int, Callable]=None, **learn_kwargs:Any) → Learner

Build Unet learner from data and arch.

This time the model will be a DynamicUnet with an encoder based on arch (pretrained or not, depending on pretrained) that is cut as specified by cut and split into layer groups according to split_on. blur_final, norm_type, blur, self_attention, y_range, last_cross and bottle are passed to the unet constructor; learn_kwargs are passed to the initialization of the Learner.

Get predictions

Once you've actually trained your model, you may want to use it on a single image. This is done by using the following method.


predict(item:ItemBase, return_x:bool=False, batch_first:bool=True, with_dropout:bool=False, **kwargs)

Return predicted class, label and probabilities for item.

img = learner.data.train_ds[0][0]
learner.predict(img)
(Category 3, tensor(0), tensor([0.6472, 0.3528]))

Here the predicted class for our image is '3', which corresponds to a label of 0. The probabilities the model found for each class are 0.65 and 0.35 respectively, so it favors '3' but with only moderate confidence.
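The relationship between the three returned values can be sketched in plain Python: the label is the index of the largest probability, and the class is the name stored at that index. This is an illustration only, not how the library computes it; the helper name decode_prediction and the class names '3' and '7' are assumptions.

```python
def decode_prediction(probs, classes):
    """Return (class_name, label_index, probs) for the most probable class."""
    label = max(range(len(probs)), key=lambda i: probs[i])
    return classes[label], label, probs

# Probabilities from the output above; class names assumed to be '3' and '7'.
name, label, probs = decode_prediction([0.6472, 0.3528], ["3", "7"])
```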

Note that if you want to load your trained model and use it in inference mode with the previous function, you should first export your Learner:

learner.export()

You can then load it, with an empty data object that has the same internal state, like this:

learn = load_learner(path)

Customize your model

You can customize cnn_learner for your own model's default cut and split_on functions by adding them to the dictionary model_meta. The key should be your model and the value should be a dictionary with the keys cut and split_on (see the source code for examples). The constructor will call create_body and create_head for you based on cut; you can also call them yourself, which is particularly useful for testing.


create_body(arch:Callable, pretrained:bool=True, cut:Union[int, Callable, NoneType]=None)

Cut off the body of a typically pretrained model at cut (int) or cut the model as specified by cut(model) (function).
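The two forms of cut can be illustrated on a toy list of layer names standing in for a real model. This is a sketch under that simplification rather than the library's code; the helper name cut_layers and the layer names are hypothetical.

```python
def cut_layers(layers, cut):
    """Keep the backbone part of a list of layers.

    An int keeps layers[:cut] (the layer at index cut is excluded);
    a callable receives the full list and returns the part to keep.
    """
    if callable(cut):
        return cut(layers)
    if isinstance(cut, int):
        return layers[:cut]
    raise TypeError("cut must be an int or a callable")

# Toy stand-in for a model: names instead of real modules.
toy_model = ["conv1", "conv2", "conv3", "avgpool", "fc"]
by_index = cut_layers(toy_model, 3)
by_function = cut_layers(
    toy_model, lambda m: [name for name in m if name.startswith("conv")]
)
```

Both calls keep the three convolutional entries and drop the pooling and fully connected ones; a negative index such as -2 gives the same result here.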


create_head(nf:int, nc:int, lin_ftrs:Optional[Collection[int]]=None, ps:Floats=0.5, concat_pool:bool=True, bn_final:bool=False)

Model head that takes nf features, runs through lin_ftrs, and ends with nc classes. ps is the probability of the dropouts, as documented above in cnn_learner.

class ClassificationInterpretation

ClassificationInterpretation(learn:Learner, preds:Tensor, y_true:Tensor, losses:Tensor, ds_type:DatasetType=<DatasetType.Valid: 2>) :: Interpretation

Interpretation methods for classification models.

This provides a confusion matrix and a visualization of the most incorrect images. Pass in your learner, the calculated preds, the actual y, and your losses, and then use the methods below to view the model interpretation results. For instance:

learn = cnn_learner(data, models.resnet18)
preds,y,losses = learn.get_preds(with_loss=True)
interp = ClassificationInterpretation(learn, preds, y, losses)
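For readers unfamiliar with the confusion matrix mentioned above, here is a minimal plain-Python sketch of what it counts. It is not the library's implementation; the function name, the convention that rows are actual classes, and the sample labels are assumptions for illustration.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count samples per (actual, predicted) pair; rows are actual classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical labels for six validation images with two classes.
actual = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 0]
cm = confusion_matrix(actual, predicted, n_classes=2)
```

Entries on the diagonal are correct predictions; each off-diagonal entry counts one kind of mistake.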

The following factory method gives a more convenient way to create an instance of this class:


from_learner(learn:Learner, ds_type:DatasetType=<DatasetType.Valid: 2>, activ:Module=None, tta=False)

Create an instance of ClassificationInterpretation. tta indicates if we want to use Test Time Augmentation.

You can also use a shortcut learn.interpret() to do the same.


interpret(learn:Learner, ds_type:DatasetType=<DatasetType.Valid: 2>, tta=False)

Create a ClassificationInterpretation object from learner on ds_type with tta.

Note that this shortcut is a method of the Learner object and can be called as learn.interpret().


plot_top_losses(k, largest=True, figsize=(12, 12), heatmap:bool=False, heatmap_thresh:int=16, alpha:float=0.6, cmap:str='magma', show_text:bool=True, return_fig:bool=None) → Optional[Figure]

Show images in top_losses along with their prediction, actual, loss, and probability of actual class.

The k items are arranged as a square, so it will look best if k is a square number (4, 9, 16, etc.). The title of each image shows: prediction, actual, loss, probability of actual class. When heatmap is True (by default it's False), Grad-CAM heatmaps (http://openaccess.thecvf.com/content_ICCV_2017/papers/Selvaraju_Grad-CAM_Visual_Explanations_ICCV_2017_paper.pdf) are overlaid on each image. plot_top_losses should be used with single-labeled datasets. See plot_multi_top_losses below for a version capable of handling multi-labeled datasets.

interp.plot_top_losses(9, figsize=(7,7))
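The selection and layout described above can be sketched in plain Python: pick the k samples with the largest losses, then fit them into a near-square grid. This is an illustration under stated assumptions, not the library's code; the helper names top_losses and grid_shape and the loss values are hypothetical.

```python
import math

def top_losses(losses, k, largest=True):
    """Return (values, indices) of the k largest (or smallest) losses."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=largest)[:k]
    return [losses[i] for i in order], order

def grid_shape(k):
    """Rows and columns of a near-square grid that holds k images."""
    cols = math.ceil(math.sqrt(k))
    rows = math.ceil(k / cols)
    return rows, cols

# Hypothetical per-image losses.
values, indices = top_losses([0.1, 2.3, 0.7, 1.5, 0.05], k=3)
```

A square k such as 9 fills a 3 by 3 grid exactly, while k=10 would need 3 rows of 4 and leave two cells empty, which is why square values of k look best.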


plot_multi_top_losses(samples:int=3, figsize:Tuple[int, int]=(8, 8), save_misclassified:bool=False)

Show images in top_losses along with their prediction, actual, loss, and probability of predicted class in a multilabeled dataset.

Similar to plot_top_losses() but aimed at multi-labeled datasets. It plots misclassified samples sorted by their respective loss. Since a single sample can have multiple labels, they could easily overlap in a grid plot, so it plots just one sample per row.
Note that you can pass save_misclassified=True (by default it's False). In that case, the method returns a list containing the misclassified images, which you can use to debug your model and/or tune its hyperparameters.