Computer vision data¶
This module contains the classes that define datasets handling Image objects and their transformations. As usual, we'll start with a quick overview before we get into the detailed API docs.
Before any work can be done, a dataset needs to be converted into a DataBunch object and, in the case of computer vision data, specifically into an ImageDataBunch subclass.
This is done with the help of the data block API and the ImageList class and its subclasses.
However, ImageDataBunch also provides a group of shortcut methods that reduce the multiple stages of the data block API to a single wrapper method. These shortcut methods work really well for:
- Imagenet-style datasets (ImageDataBunch.from_folder)
- A pandas DataFrame with a column of filenames and a column of labels, which can be strings for classification, strings separated by a label_delim for multi-classification, or floats for a regression problem (ImageDataBunch.from_df)
- A csv file with the same format as above (ImageDataBunch.from_csv)
- A list of filenames and a list of targets (ImageDataBunch.from_lists)
- A list of filenames and a function to get the target from the filename (ImageDataBunch.from_name_func)
- A list of filenames and a regex pattern to get the target from the filename (ImageDataBunch.from_name_re)
The last five factory methods perform a random split between the training and validation sets. The first one can either perform a random split or use separate training and validation folders.
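To picture what a random train/validation split does, here is a minimal, dependency-free sketch. It is not fastai's implementation: the helper name `random_split` and its `seed` argument are assumptions made for illustration, while `valid_pct` simply mirrors the parameter name the factory methods use.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    "Hypothetical helper: shuffle `items` and hold out a `valid_pct` fraction for validation."
    rng = random.Random(seed)          # a local generator keeps the split reproducible
    shuffled = list(items)             # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

# Illustrative filenames, not taken from any real dataset
fnames = [f"img_{i}.png" for i in range(10)]
train, valid = random_split(fnames, valid_pct=0.2, seed=42)
print(len(train), len(valid))
```

Every file ends up in exactly one of the two lists, and fixing the seed makes the split repeatable between runs.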
If you're just starting out you may choose to experiment with these shortcut methods, as they are also used in the first lessons of the fastai deep learning course. However, you can completely skip them and start building your code using the data block API from the very beginning. Internally, these shortcuts use this API anyway.
The first part of this document is dedicated to the shortcut ImageDataBunch
factory methods. Then all the other computer vision data-specific methods that are used with the data block API are presented.
Quickly get your data ready for training¶
To get you started as easily as possible, fastai provides two helper functions to create a DataBunch object that you can directly use for training a classifier. To demonstrate them, you'll first need to download and untar the sample data by executing the following cell. This will create a data folder containing an MNIST subset in data/mnist_sample.
path = untar_data(URLs.MNIST_SAMPLE); path
There are a number of ways to create an ImageDataBunch. One common approach is to use Imagenet-style folders (described in detail further down this page) with ImageDataBunch.from_folder:
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=24)
Here the datasets are created automatically from the Imagenet-style folder structure. The parameters specified are:
- the transforms to apply to the images in ds_tfms (here with do_flip=False because we don't want to flip numbers),
- the target size of our pictures (here 24).
As with all DataBunch usage, a train_dl and a valid_dl are created, both of which are PyTorch DataLoader objects.
If you want to have a look at a few images inside a batch, you can use DataBunch.show_batch
. The rows
argument is the number of rows and columns to display.
data.show_batch(rows=3, figsize=(5,5))
The second way to define the data for a classifier requires a structure like this:
path\
train\
test\
labels.csv
where the labels.csv file defines the label(s) of each image in the training set. This is the format you will need to use when each image can have multiple labels. It also works with single labels:
pd.read_csv(path/'labels.csv').head()
You can then use ImageDataBunch.from_csv
:
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)
data.show_batch(rows=3, figsize=(5,5))
An example of multiclassification can be downloaded with the following cell. It's a sample of the planet dataset.
planet = untar_data(URLs.PLANET_SAMPLE)
If we open the labels file, we see that each image has one or more tags, separated by a space.
df = pd.read_csv(planet/'labels.csv')
df.head()
data = ImageDataBunch.from_csv(planet, folder='train', size=128, suffix='.jpg', label_delim=' ',
ds_tfms=get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.))
The show_batch
method will then print all the labels that correspond to each image.
data.show_batch(rows=3, figsize=(10,8), ds_type=DatasetType.Valid)
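To make the role of label_delim more concrete, here is a small sketch of how space-separated tags could be turned into multi-label targets. The label strings are made up for illustration (they are not rows from the planet file), and the helpers `split_tags` and `multi_hot` are hypothetical names, not fastai functions.

```python
def split_tags(label, label_delim=" "):
    "Split one raw label string into its individual tags."
    return label.split(label_delim)

def multi_hot(tags, classes):
    "Encode a list of tags as a 0/1 vector aligned with `classes`."
    return [1 if c in tags else 0 for c in classes]

# Made-up label strings in the 'one or more tags separated by a space' format
raw_labels = ["haze primary", "clear primary water", "clear agriculture primary"]

tag_lists = [split_tags(label) for label in raw_labels]
# The class vocabulary is the sorted set of every tag seen in the data
classes = sorted({tag for tags in tag_lists for tag in tags})
targets = [multi_hot(tags, classes) for tags in tag_lists]

for label, target in zip(raw_labels, targets):
    print(label, target)
```

Each image gets one slot per class, so an image can be positive for several classes at once, which is what distinguishes this setup from single-label classification.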
You can find more ways to build an ImageDataBunch
without the factory methods in data_block
.
This is the same initialization as a regular DataBunch
so you probably don't want to use this directly, but one of the factory methods instead.
Factory methods¶
If you want to quickly get an ImageDataBunch and train a model, you should process your data to have it in one of the formats the following functions handle.
Refer to create_from_ll
to see all the **kwargs
arguments.
"Imagenet-style" datasets look something like this (note that the test folder is optional):
path\
train\
class1\
class2\
...
valid\
class1\
class2\
...
test\
For example:
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=24)
Note that this method (and all factory methods in this section) passes any kwargs to DataBunch.create.
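The idea behind the Imagenet-style layout is that the name of the folder containing an image is its label. The following sketch illustrates that idea with the standard library only. The helper `labels_from_folders` is a hypothetical name, and the tiny directory tree is created in a temporary folder purely for the demonstration.

```python
import tempfile
from pathlib import Path

def labels_from_folders(root, split="train"):
    "Hypothetical helper: map each file in root/split/<class>/ to the name of its class folder."
    files = sorted((Path(root) / split).glob("*/*"))
    return {p.name: p.parent.name for p in files if p.is_file()}

with tempfile.TemporaryDirectory() as tmp:
    # Build a miniature train/<class>/<file> tree with empty placeholder files
    for cls, fname in [("3", "a.png"), ("3", "b.png"), ("7", "c.png")]:
        folder = Path(tmp) / "train" / cls
        folder.mkdir(parents=True, exist_ok=True)
        (folder / fname).touch()
    labels = labels_from_folders(tmp)

print(labels)
```

The same function could be pointed at a valid folder by passing split="valid", which is how a folder-based train/validation separation differs from a random split.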
Refer to create_from_ll
to see all the **kwargs
arguments.
Create an ImageDataBunch from path by splitting the data in folder, labelled in the file csv_labels, between a training and a validation set. Use valid_pct to indicate the percentage of the total images to use as the validation set. An optional test folder contains unlabelled data, and suffix contains an optional suffix to add to the filenames in csv_labels (such as '.jpg'). fn_col is the index (or the name) of the column containing the filenames, and label_col is the index (indices) or the name(s) of the column(s) containing the labels. Use header to specify the format of the csv header, and delimiter to specify a non-standard csv field separator. If your csv has no header, column parameters can only be specified as indices. If label_delim is passed, the content of the label column is split according to that separator.
For example:
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=24);
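To see how fn_col, label_col, suffix and label_delim interact, here is a rough sketch that parses a labels csv with the standard library. It is only an illustration under stated assumptions: `read_labels` is a hypothetical helper, the csv content is made up, and the sketch assumes the first row is always a header.

```python
import csv
import io

def read_labels(csv_text, fn_col=0, label_col=1, suffix="", label_delim=None, delimiter=","):
    "Hypothetical helper: return (filename, label) pairs from csv text that has a header row."
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, body = rows[0], rows[1:]

    def resolve(col):
        # A column may be given by name (looked up in the header) or by index
        return header.index(col) if isinstance(col, str) else col

    fn_idx, label_idx = resolve(fn_col), resolve(label_col)
    pairs = []
    for row in body:
        label = row[label_idx]
        if label_delim is not None:
            label = label.split(label_delim)   # multi-label: one string becomes a list of tags
        pairs.append((row[fn_idx] + suffix, label))
    return pairs

# Made-up csv content for illustration
csv_text = "image_name,tags\ntrain_1,haze primary\ntrain_2,clear water\n"
rows = read_labels(csv_text, fn_col="image_name", label_col="tags",
                   suffix=".jpg", label_delim=" ")
print(rows)
```

Without a label_delim each label stays a single string, which corresponds to the single-label case.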
Refer to create_from_ll
to see all the **kwargs
arguments.
Same as ImageDataBunch.from_csv, but passing in a DataFrame instead of a csv file. For example:
df = pd.read_csv(path/'labels.csv', header='infer')
df.head()
data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
Different datasets are labeled in many different ways. The following methods can help extract the labels from the dataset in a wide variety of situations. The way they are built in fastai is constructive: there are methods which do a lot for you but apply in specific circumstances and there are methods which do less for you but give you more flexibility.
In this case the hierarchy is:
- ImageDataBunch.from_name_re: Gets the labels from the filenames using a regular expression
- ImageDataBunch.from_name_func: Gets the labels from the filenames using any function
- ImageDataBunch.from_lists: Labels need to be provided as an input in a list
Refer to create_from_ll
to see all the **kwargs
arguments.
Creates an ImageDataBunch from fnames, applying a regular expression (containing one re group) to the file names to get the labels, and putting aside valid_pct for the validation set. As with ImageDataBunch.from_csv, an optional test folder contains unlabelled data.
Our previously created dataframe contains the labels in the filenames so we can leverage it to test this new method. ImageDataBunch.from_name_re
needs the exact path of each file so we will append the data path to each filename before creating our ImageDataBunch
object.
fn_paths = [path/name for name in df['name']]; fn_paths[:2]
pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)
data.classes
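If you want to check what the pattern captures before building anything, you can try it with the re module alone. The sketch below uses the same pattern as above on two illustrative paths (the exact filenames are assumptions, not taken from the dataset listing), and `label_from_path` is a hypothetical helper.

```python
import re

# Same pattern as above: one capture group for the single-digit class folder
pat = re.compile(r"/(\d)/\d+\.png$")

def label_from_path(path):
    "Hypothetical helper: return the captured label, or None if the pattern does not match."
    match = pat.search(str(path))
    return match.group(1) if match else None

# Illustrative paths using forward slashes
paths = ["data/mnist_sample/train/3/7463.png", "data/mnist_sample/train/7/21102.png"]
labels = [label_from_path(p) for p in paths]
print(labels)
```

Note that the pattern relies on forward slashes, so paths using a different separator would need to be converted first.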
Refer to create_from_ll
to see all the **kwargs
arguments.
Works in the same way as ImageDataBunch.from_name_re
, but instead of a regular expression it expects a function that will determine how to extract the labels from the filenames. (Note that from_name_re
uses this function in its implementation).
To test it we could build a function with our previous regex. Let's try another, similar approach to show that the labels can be obtained in a different way.
def get_labels(file_path): return '3' if '/3/' in str(file_path) else '7'
data = ImageDataBunch.from_name_func(path, fn_paths, label_func=get_labels, ds_tfms=tfms, size=24)
data.classes
Refer to create_from_ll
to see all the **kwargs
arguments.
The most flexible factory function; pass in a list of labels
that correspond to each of the filenames in fnames
.
To show an example we have to build the labels list outside our ImageDataBunch
object and give it as an argument when we call from_lists
. Let's use our previously created function to create our labels list.
labels_ls = list(map(get_labels, fn_paths))
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels_ls, ds_tfms=tfms, size=24)
data.classes
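The three labelling approaches can be thought of as layers, each one a special case of the next. Here is a plain-Python sketch of that layering. The function names are hypothetical, and the way the layers are chained here is an illustrative choice, not a description of the library's internals.

```python
import re

def labels_from_lists(fnames, labels):
    "Most flexible layer: the caller supplies exactly one label per filename."
    if len(fnames) != len(labels):
        raise ValueError("need exactly one label per filename")
    return list(zip(fnames, labels))

def labels_from_func(fnames, label_func):
    "Middle layer: compute each label by calling `label_func` on the filename."
    return labels_from_lists(fnames, [label_func(f) for f in fnames])

def labels_from_re(fnames, pat):
    "Most specific layer: build a labelling function from a regex with one group."
    regex = re.compile(pat)
    def _label(fname):
        match = regex.search(str(fname))
        if match is None:
            raise ValueError(f"pattern did not match {fname!r}")
        return match.group(1)
    return labels_from_func(fnames, _label)

# Illustrative filenames
fnames = ["train/3/7463.png", "train/7/21102.png"]
by_re = labels_from_re(fnames, r"/(\d)/\d+\.png$")
by_func = labels_from_func(fnames, lambda f: "3" if "/3/" in f else "7")
by_lists = labels_from_lists(fnames, ["3", "7"])
print(by_re == by_func == by_lists)
```

All three calls produce the same pairs here; the difference is only in how much work the caller has to do up front.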
Use bs
, num_workers
, collate_fn
and a potential test
folder. ds_tfms
is a tuple of two lists of transforms to be applied to the training and the validation (plus test optionally) set. tfms
are the transforms to apply to the DataLoader
. The size
and the kwargs
are passed to the transforms for data augmentation.
Other methods¶
In the next few methods we will use another dataset, CIFAR. This is because the second method will get the statistics for our dataset and we want to be able to show different statistics per channel. If we were to use MNIST, these statistics would be the same for every channel. White pixels are [255,255,255] and black pixels are [0,0,0] (or in normalized form [1,1,1] and [0,0,0]) so there is no variance between channels.
path = untar_data(URLs.CIFAR); path
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, valid='test', size=24)
def channel_view(x:Tensor)->Tensor:
"Make channel the first axis of `x` and flatten remaining axes"
return x.transpose(0,1).contiguous().view(x.shape[1],-1)
This function takes a tensor and flattens all dimensions except the channels, which it keeps as the first axis. This function is used to feed ImageDataBunch.batch_stats
so that it can get the pixel statistics of a whole batch.
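If you would like to see the reshaping without tensors, here is a dependency-free sketch that does the same thing on nested lists and then computes per-channel statistics. The names `channel_view_lists` and `batch_stats_lists` are hypothetical stand-ins, the tiny batch is made up, and the use of the population standard deviation is an assumption chosen for simplicity (a tensor library may use the sample standard deviation instead).

```python
from statistics import mean, pstdev

def channel_view_lists(batch):
    "Flatten batch[n][c][h][w] into one flat list of pixel values per channel."
    n_channels = len(batch[0])
    return [[px for img in batch for row in img[c] for px in row]
            for c in range(n_channels)]

def batch_stats_lists(batch):
    "Per-channel mean and population standard deviation over the whole batch."
    flat = channel_view_lists(batch)
    return [mean(ch) for ch in flat], [pstdev(ch) for ch in flat]

# A made-up image with 3 channels of 2x2 pixels, repeated twice to form a batch
image = [
    [[0.0, 1.0], [0.0, 1.0]],   # channel 0 varies between 0 and 1
    [[0.5, 0.5], [0.5, 0.5]],   # channel 1 is constant
    [[1.0, 1.0], [1.0, 1.0]],   # channel 2 is constant
]
batch = [image, image]

flat = channel_view_lists(batch)
means, stds = batch_stats_lists(batch)
print(len(flat), len(flat[0]), means, stds)
```

Each channel ends up with batch size times height times width values, which is exactly what is needed to compute one mean and one standard deviation per channel.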
Let's take as an example a batch with dimensions 128, 3, 24, 24.
t = torch.Tensor(128, 3, 24, 24)
t.size()
tensor = channel_view(t)
tensor.size()
data.batch_stats()
In the fastai library we have imagenet_stats, cifar_stats and mnist_stats, so we can add normalization easily with any of these datasets. Let's see an example with our dataset of choice: CIFAR.
data.normalize(cifar_stats)
data.batch_stats()
Data normalization¶
You may also want to normalize your data, which can be done by using the following functions.
On MNIST the mean and std are 0.1307 and 0.3081 respectively (looked up on Google). If you're using a pretrained model, you'll need to use the normalization that was used to train the model. The imagenet norm and denorm functions are stored as constants inside the library named imagenet_norm
and imagenet_denorm
. If you're training a model on CIFAR-10, you can also use cifar_norm
and cifar_denorm
.
You may sometimes see warnings about clipping input data when plotting normalized data. That's because, even though the data is automatically denormalized when plotting, floating point errors may sometimes push some values slightly out of the correct range. You can safely ignore these warnings in this case.
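To make the arithmetic explicit, here is a minimal sketch of per-channel normalization and its inverse, using the MNIST mean and std quoted above. The `normalize` and `denormalize` functions below are plain-Python stand-ins written for this example, not the library's functions, and the pixel values are made up.

```python
def normalize(pixel, mean, std):
    "Subtract the channel mean and divide by the channel std."
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]

def denormalize(pixel, mean, std):
    "Inverse of normalize: multiply by the std and add the mean back."
    return [v * s + m for v, m, s in zip(pixel, mean, std)]

# MNIST statistics from the text, repeated for three channels
mean, std = [0.1307] * 3, [0.3081] * 3

pixel = [0.0, 0.5, 1.0]                  # one made-up value per channel, in [0, 1]
normed = normalize(pixel, mean, std)
restored = denormalize(normed, mean, std)

# The round trip is only guaranteed up to floating point error,
# so compare with a tolerance rather than with ==
print(normed)
print(max(abs(a - b) for a, b in zip(pixel, restored)))
```

Because the round trip is not guaranteed to be bit-exact, a restored value can land a hair outside [0, 1], which is the kind of situation that triggers the clipping warnings mentioned above.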
data = ImageDataBunch.from_folder(untar_data(URLs.MNIST_SAMPLE),
ds_tfms=tfms, size=24)
data.normalize()
data.show_batch(rows=3, figsize=(6,6))
To use this dataset and collate samples into batches, you'll need the following function:
ItemList specific to vision¶
The vision application adds a few subclasses of ItemList
specific to images.
ImageList inherits from ItemList and overrides ItemList.get to call open_image in order to turn an image file, given as a Path object, into an Image object. label_cls can be specified for the labels, xtra contains any extra information (usually in the form of a dataframe) and processor is applied to the ItemList after splitting and labelling.
How does ImageList.__init__ override ItemList.__init__?
ImageList.__init__ creates additional attributes such as convert_mode, after_open, c and sizes on top of ItemList.__init__; convert_mode and sizes in particular are needed by ImageList.get (which also overrides ItemList.get) and ImageList.open.
How does ImageList.from_folder override ItemList.from_folder?
ImageList.from_folder adds some constraints on extensions on top of ItemList.from_folder, to work with image files specifically, and can take additional input arguments such as convert_mode and after_open, which are not available to ItemList.
Let's get a feel for how open is used with the following example.
from fastai.vision import *
path_data = untar_data(URLs.PLANET_TINY); path_data.ls()
imagelistRGB = ImageList.from_folder(path_data/'train'); imagelistRGB
open takes a single input, fn, which is the filename given as a Path or a string.
imagelistRGB.items[10]
imagelistRGB.open(imagelistRGB.items[10])
imagelistRGB[10]
print(imagelistRGB[10])
The reason imagelistRGB[10] prints out an image is that, behind the scenes, ImageList.get calls ImageList.open, which calls open_image, which uses PIL.Image.open(fn).convert(convert_mode) to open the image file (this is how the image gets displayed) and finally turns it into an Image object with shape (3, 128, 128).
Internally, ImageList.open passes ImageList.convert_mode and ImageList.after_open to open_image to adjust the appearance of the Image object. For example, setting convert_mode to L converts images to grayscale.
imagelistRGB.convert_mode = 'L'
imagelistRGB.open(imagelistRGB.items[10])
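For intuition about what the L mode does to each pixel, here is a small sketch of the ITU-R 601-2 luma transform that Pillow documents for converting a color image to mode L. The function `rgb_to_luma` is a hypothetical name, and the rounding used here is an assumption, so results may differ by one from what the imaging library itself produces.

```python
def rgb_to_luma(r, g, b):
    "Weighted sum L = R*299/1000 + G*587/1000 + B*114/1000, rounded to an integer."
    return round((299 * r + 587 * g + 114 * b) / 1000)

# A few 8-bit RGB pixels chosen for illustration
for name, rgb in [("white", (255, 255, 255)), ("black", (0, 0, 0)),
                  ("red", (255, 0, 0)), ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(name, rgb_to_luma(*rgb))
```

Green contributes the most to the result and blue the least, which is why a pure green pixel comes out much brighter than a pure blue one after conversion.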
Building your own dataset¶
This module also contains a few helper functions to allow you to build your own dataset for image classification.
It checks whether every image in this folder can be opened and has n_channels. If n_channels is 3, it'll try to convert the image to RGB. If delete=True, the image is removed if this fails. If resume is set, it will skip images that already exist in dest. If max_size is specified, the image is resized, keeping the same ratio, so that both sides are less than max_size, using interp. The result is stored in dest, ext forces an extension type, and img_format and kwargs are passed to PIL.Image.save. Use max_workers CPUs.
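The max_size behaviour comes down to a bit of arithmetic on the image dimensions. Here is a rough sketch of that calculation. `fit_within` is a hypothetical helper, and both the choice to allow sides equal to max_size and the use of integer truncation are assumptions, so the library's exact output sizes may differ.

```python
def fit_within(width, height, max_size):
    "Hypothetical helper: scale (width, height) so the longer side is at most max_size."
    longest = max(width, height)
    if longest <= max_size:
        return width, height                     # already small enough, leave untouched
    # Integer arithmetic avoids floating point surprises and keeps the aspect ratio
    new_width = max(1, width * max_size // longest)
    new_height = max(1, height * max_size // longest)
    return new_width, new_height

print(fit_within(1000, 500, 500))
print(fit_within(300, 200, 500))
```

Scaling both sides by the same factor is what keeps the ratio unchanged, while the max(1, ...) guard prevents a very thin image from collapsing to a zero-pixel side.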