A beginner's tutorial that explains how to quickly look at your data or model predictions.

Viewing inputs and outputs

In this tutorial, we'll see how the same API lets you look at the inputs and outputs of your model, whether in vision, text or tabular applications. We'll go over a number of different tasks and, each time, grab some data in a DataBunch with the data block API, take a look at a few inputs with the show_batch method, train an appropriate Learner, then use the show_results method to see what the outputs of our model actually look like.

Vision

To quickly get access to all the vision functions inside fastai, we use the usual import statements.

from fastai.vision import *

A classification problem

Let's begin with our sample of the MNIST dataset.

mnist = untar_data(URLs.MNIST_TINY)
tfms = get_transforms(do_flip=False)

It's set up with an ImageNet-style structure, so we use the folder hierarchy to split our training and validation sets, then label the images, transform them, convert everything into an ImageDataBunch and finally normalize the data.

data = (ImageList.from_folder(mnist)
        .split_by_folder()          
        .label_from_folder()
        .transform(tfms, size=32)
        .databunch()
        .normalize(imagenet_stats))
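
As an aside, the same result can be obtained without the data block API through the ImageDataBunch.from_folder factory method; a minimal sketch that should be equivalent to the pipeline above:

data = ImageDataBunch.from_folder(mnist, ds_tfms=tfms, size=32).normalize(imagenet_stats)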

Once your data is properly set up in a DataBunch, we can call data.show_batch() to see what a sample of a batch looks like.

data.show_batch()

Note that the images were automatically de-normalized before being shown with their labels (inferred from the folder names). We can specify a number of rows if the default of 5 is too big, and we can also limit the size of the figure.

data.show_batch(rows=3, figsize=(4,4))
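
If you want to check programmatically what show_batch is displaying, the DataBunch also exposes the labels it inferred; a quick sketch (for this MNIST sample, the class names come from the folder names '3' and '7'):

print(data.classes)  # classes inferred from the folder names
print(data.c)        # number of classes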

Now let's create a Learner object to train a classifier.

learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(1,1e-2)
learn.save('mini_train')
epoch train_loss valid_loss accuracy time
0 0.779994 0.744115 0.779685 00:01

Our model has quickly reached around 78% accuracy; now let's see its predictions on a sample of the validation set. For this, we use the show_results method.

learn.show_results()

Since the validation set is usually sorted, we only get images belonging to the same class here. We can again specify a number of rows and a figure size, as well as the dataset on which we want to make predictions.

learn.show_results(ds_type=DatasetType.Train, rows=4, figsize=(8,10))
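
show_results works on a batch; for a single image, learn.predict returns the predicted category along with its index and the class probabilities. A minimal sketch:

img = data.valid_ds[0][0]  # grab one image from the validation set
pred_class, pred_idx, probs = learn.predict(img)
print(pred_class, probs[pred_idx])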

A multilabel problem

Now let's try these methods on the planet dataset, which is a little different in the sense that each image can have multiple tags (and not just one label).

planet = untar_data(URLs.PLANET_TINY)
planet_tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)

Here each image is labelled in a file named 'labels.csv'. We have to add 'train' as a prefix to the filenames and '.jpg' as a suffix, and indicate that the labels are separated by spaces.

data = (ImageList.from_csv(planet, 'labels.csv', folder='train', suffix='.jpg')
        .split_by_rand_pct()
        .label_from_df(label_delim=' ')
        .transform(planet_tfms, size=128)
        .databunch()
        .normalize(imagenet_stats))

And we can have a look at our data with data.show_batch.

data.show_batch(rows=2, figsize=(9,7))

Then we can create a Learner object pretty easily and train it for a little bit.

learn = cnn_learner(data, models.resnet18)
learn.fit_one_cycle(5,1e-2)
learn.save('mini_train')
epoch train_loss valid_loss time
0 1.024820 1.014537 00:01
1 0.948518 1.114616 00:01
2 0.887977 1.109744 00:01
3 0.839809 0.983482 00:01
4 0.794769 0.869911 00:01
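
Note that we created this Learner without metrics: plain accuracy doesn't directly apply to a multilabel problem, where each class gets its own sigmoid output. If you want a metric during training, here is a sketch using fastai's accuracy_thresh (the 0.2 threshold is an arbitrary choice for illustration):

acc_02 = partial(accuracy_thresh, thresh=0.2)  # accuracy over all classes, thresholded at 0.2
learn = cnn_learner(data, models.resnet18, metrics=acc_02)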

And to see actual predictions, we just have to run learn.show_results().

learn.show_results(rows=3, figsize=(12,15))

A regression example

For the next example, we are going to use the BIWI head pose dataset. In pictures of people, we have to find the center of their face. For the fastai docs, we have built a small subsample of the dataset (200 images) and prepared a dictionary mapping filename to center.

biwi = untar_data(URLs.BIWI_SAMPLE)
fn2ctr = pickle.load(open(biwi/'centers.pkl', 'rb'))
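
Each key of fn2ctr is an image filename and each value the coordinates of the corresponding face center; a quick way to peek at one entry:

fname, ctr = next(iter(fn2ctr.items()))
print(fname, ctr)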

To grab our data, we use this dictionary to label our items. We also use the PointsItemList class so that the targets are of type ImagePoints (which will make sure the data augmentation is properly applied to them). When calling transform, we make sure to set tfm_y=True.

data = (PointsItemList.from_folder(biwi)
        .split_by_rand_pct(seed=42)
        .label_from_func(lambda o:fn2ctr[o.name])
        .transform(get_transforms(), tfm_y=True, size=(120,160))
        .databunch()
        .normalize(imagenet_stats))

Then we can have a first look at our data with data.show_batch().

data.show_batch(rows=3, figsize=(9,6))

We train our model for a little bit before using learn.show_results().

learn = cnn_learner(data, models.resnet18, lin_ftrs=[100], ps=0.05)
learn.fit_one_cycle(5, 5e-2)
learn.save('mini_train')
epoch train_loss valid_loss time
0 2.439939 161.106430 00:01
1 4.106437 64.897110 00:01
2 3.450002 10.024299 00:01
3 2.684047 20.760201 00:01
4 2.189258 11.589193 00:01
learn.show_results(rows=3)
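
The single-item predict API works here too; the first element returned should be an ImagePoints object holding the predicted coordinates:

img = data.valid_ds[0][0]
pred_points = learn.predict(img)[0]  # an ImagePoints object
print(pred_points.data)              # predicted point, in fastai's scaled [-1, 1] space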

A segmentation example

Now we are going to look at the camvid dataset (at least a small sample of it), where we have to predict the class of each pixel in an image. Each image in the 'images' subfolder has an equivalent in 'labels' that is its segmentation mask.

camvid = untar_data(URLs.CAMVID_TINY)
path_lbl = camvid/'labels'
path_img = camvid/'images'

We read the classes from 'codes.txt' and define a function that maps each image filename to its corresponding mask filename.

codes = np.loadtxt(camvid/'codes.txt', dtype=str)
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
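
To check that get_y_fn does what we expect, we can open one image and its mask with fastai's get_image_files, open_image and open_mask helpers:

img_f = get_image_files(path_img)[0]
open_image(img_f).show(figsize=(5,5))
open_mask(get_y_fn(img_f)).show(figsize=(5,5), alpha=1)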

The data block API allows us to quickly get everything into a DataBunch, and then we can have a look with show_batch.

data = (SegmentationItemList.from_folder(path_img)
        .split_by_rand_pct()
        .label_from_func(get_y_fn, classes=codes)
        .transform(get_transforms(), tfm_y=True, size=128)
        .databunch(bs=16, path=camvid)
        .normalize(imagenet_stats))
data.show_batch(rows=2, figsize=(7,5))

Then we train a Unet for a few epochs.

learn = unet_learner(data, models.resnet18)
learn.fit_one_cycle(3,1e-2)
learn.save('mini_train')
epoch train_loss valid_loss time
0 17.764713 3.646572 00:04
1 9.990233 2.052990 00:01
2 6.974250 1.872651 00:01
learn.show_results()

Text

The next application is text, so let's start by importing everything we'll need.

from fastai.text import *

Language modelling

First we'll fine-tune a pretrained language model on our subset of IMDB.

imdb = untar_data(URLs.IMDB_SAMPLE)
data_lm = (TextList.from_csv(imdb, 'texts.csv', cols='text')
                   .split_by_rand_pct()
                   .label_for_lm()
                   .databunch())
data_lm.save()

data_lm.show_batch() will work here as well. For a language model, it shows us the beginning of each sequence of text along the batch dimension (the target being to guess the next word).

data_lm.show_batch()
idx text
0 ! ! ! xxmaj finally this was directed by the guy who did xxmaj big xxmaj xxunk ? xxmaj must be a xxunk of xxmaj jonestown - hollywood style . xxmaj xxunk ! xxbos xxmaj every once in a long while a movie will come along that will be so awful that i feel compelled to warn people . xxmaj if i labor all my days and i can save
1 a grand voyage for the audience as well as the two principals . xxmaj the imagery throughout is impressive , especially the final scenes in xxmaj xxunk . xxmaj it xxunk for me once again how much different the world can be , but also at the same time , how similar . xxmaj the same was true for the father and son in this film . \n \n
2 xxunk between the xxunk -- resulting in a xxup we vs. xxup they mentality . xxmaj later , an explosion causes a huge xxunk in the xxmaj french and the xxmaj xxunk refuse to sit back and do nothing . xxmaj xxunk their own lives , they prove that there is true xxunk between miners and men in general . \n \n xxmaj the film is a strong criticism
3 put the camera man on roller xxunk and pushed him along . xxmaj the story ( if it can be called that ) is so full of holes it 's almost funny , xxmaj it never really explains why the hell he survived in the first place , or needs human flesh in order to survive . xxmaj the script is poorly written and the dialogue xxunk on just plane
4 them and insults them because they play woods and blah blah blah xxmaj the phantom helps these xxunk kids out and trains them and all this crap , he gets them to play airball and basically xxunk all the xxunk including the " xxunk " . \n \n xxmaj so what exactly is wrong with the movie ? xxmaj well the budget is a huge thing , a paintball

Now let's define a language model learner.

learn = language_model_learner(data_lm, AWD_LSTM)
learn.fit_one_cycle(2, 1e-2)
learn.save('mini_train_lm')
learn.save_encoder('mini_train_encoder')
epoch train_loss valid_loss accuracy time
0 4.353577 3.759297 0.292604 00:04
1 4.070464 3.740016 0.294851 00:04
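
Before looking at show_results, a quick sanity check for a language model is to make it generate some text with learn.predict; with such a short training run, expect mostly gibberish:

learn.predict("This movie is", n_words=10)  # continue the prompt with 10 generated words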

Then we can have a look at the results. It shows a certain number of words (20 by default), then the next 20 target words and the ones that were predicted.

learn.show_results()
text target pred
xxbos xxmaj this is one of those movies that 's difficult to review without giving away the plot . xxmaj xxunk to say there are weird things and unexpected twists going on , beyond the initial xxunk " xxmaj tom it is the that 's a things about n't xxunk and on , but the xxunk xxunk of . xxunk
we are going to green light ! ! " xxmaj and whoever that person is , should have his or her head examined for actual brain xxunk . xxmaj because whoever is responsible for actually xxunk out money to have her xxunk xxunk . the xxunk xxunk . xxmaj the of 's xxunk for the xxunk the the , make
a society which is supposedly gone and yet somehow is still with us . xxbos xxmaj for those who like their murder xxunk busy , this is definitely the one to see , as it is xxunk full of interesting this xxunk , , , xxmaj is a a best that watch the and xxmaj is a . of xxunk
awhile but not all of them get the treatment they deserve . xxmaj the nice supporting cast includes xxmaj xxunk xxmaj xxunk , at his best in a xxunk comic performance as a xxunk xxunk , xxmaj xxunk xxmaj xxunk xxmaj xxunk , xxmaj least best , the xxunk , book , xxmaj xxunk , , and xxunk xxmaj xxunk
\n \n xxmaj as such , when i first heard about the xxunk of a prequel series some months got a sick feeling in my xxunk . i was afraid that the formula that made xxmaj xxunk so successful was a lot xxunk of the xxunk . xxmaj was not of i movie was i me xxunk xxmaj xxunk

Classification

Now let's see a classification example. We have to use the same vocabulary as for the language model if we want to be able to use the encoder we saved.

data_clas = (TextList.from_csv(imdb, 'texts.csv', cols='text', vocab=data_lm.vocab)
                   .split_from_df(col='is_valid')
                   .label_from_df(cols='label')
                   .databunch(bs=42))
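
Since we passed vocab=data_lm.vocab, both DataBunch objects should share the same token-to-index mapping, which is what makes the saved encoder reusable; a quick check:

assert data_clas.vocab.itos == data_lm.vocab.itos  # identical index-to-token mappings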

Here show_batch shows the beginning of each review with its target.

data_clas.show_batch()
text target
xxbos xxmaj xxunk xxmaj victor xxmaj xxunk : a xxmaj review \n \n xxmaj you know , xxmaj xxunk xxmaj victor xxmaj xxunk is like sticking your hands into a big , xxunk xxunk of xxunk . xxmaj it 's warm and xxunk , but you 're not sure if it feels right . xxmaj try as i might , no matter how warm and xxunk xxmaj xxunk xxmaj negative
xxbos xxup the xxup shop xxup around xxup the xxup xxunk is one of the xxunk and most feel - good romantic comedies ever made . xxmaj there 's just no getting around that , and it 's hard to actually put one 's feeling for this film into words . xxmaj it 's not one of those films that tries too hard , nor does it come up with positive
xxbos xxmaj now that xxmaj che(2008 ) has finished its relatively short xxmaj australian cinema run ( extremely limited xxunk screen in xxmaj xxunk , after xxunk ) , i can xxunk join both xxunk of " xxmaj at xxmaj the xxmaj movies " in taking xxmaj steven xxmaj soderbergh to task . \n \n xxmaj it 's usually satisfying to watch a film director change his style / negative
xxbos xxmaj this film sat on my xxmaj xxunk for weeks before i watched it . i xxunk a self - indulgent xxunk flick about relationships gone bad . i was wrong ; this was an xxunk xxunk into the xxunk - up xxunk of xxmaj new xxmaj xxunk . \n \n xxmaj the format is the same as xxmaj max xxmaj xxunk ' " xxmaj la xxmaj xxunk positive
xxbos xxmaj many xxunk that this is n't just a classic due to the fact that it 's the first xxup 3d game , or even the first xxunk - up . xxmaj it 's also one of the first xxunk games , one of the xxunk definitely the first ) truly claustrophobic games , and just a pretty well - xxunk gaming experience in general . xxmaj with graphics positive

And we can train a classifier that uses our previous encoder.

learn = text_classifier_learner(data_clas, AWD_LSTM)
learn.load_encoder('mini_train_encoder')
learn.fit_one_cycle(2, slice(1e-3,1e-2))
learn.save('mini_train_clas')
epoch train_loss valid_loss accuracy time
0 0.673165 0.649376 0.670000 00:04
1 0.622713 0.607453 0.700000 00:04
learn.show_results()
text target prediction
xxbos \n \n i 'm sure things did n't exactly go the same way in the real life of xxmaj homer xxmaj hickam as they did in the film adaptation of his book , xxmaj rocket xxmaj boys , but the movie " xxmaj october xxmaj sky " ( an xxunk of the book 's title ) is good enough to stand alone . i have not read xxmaj positive positive
xxbos xxmaj to review this movie , i without any doubt would have to quote that memorable scene in xxmaj tarantino 's " xxmaj pulp xxmaj fiction " ( xxunk ) when xxmaj jules and xxmaj vincent are talking about xxmaj mia xxmaj wallace and what she does for a living . xxmaj jules tells xxmaj vincent that the " xxmaj only thing she did worthwhile was pilot " . negative positive
xxbos xxmaj how viewers react to this new " adaption " of xxmaj shirley xxmaj jackson 's book , which was xxunk as xxup not being a remake of the original 1963 movie ( true enough ) , will be based , i suspect , on the following : those who were big fans of either the book or original movie are not going to think much of this one negative negative
xxbos xxmaj the trouble with the book , " xxmaj memoirs of a xxmaj geisha " is that it had xxmaj japanese xxunk but underneath the xxunk it was all an xxmaj american man 's way of thinking . xxmaj reading the book is like watching a magnificent ballet with great music , sets , and costumes yet performed by xxunk animals dressed in those xxunk far from xxmaj japanese negative negative
xxbos xxmaj bonanza had a great cast of wonderful actors . xxmaj xxunk xxmaj xxunk , xxmaj pernell xxmaj whitaker , xxmaj michael xxmaj xxunk , xxmaj dan xxmaj blocker , and even xxmaj guy xxmaj williams ( as the cousin who was brought in for several episodes during 1964 to replace xxmaj adam when he was leaving the series ) . xxmaj the cast had chemistry , and they positive positive
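
As in vision, learn.predict works on a single item; here it takes a raw string and returns the predicted category, its index and the class probabilities:

pred_class, pred_idx, probs = learn.predict("I really loved this movie!")
print(pred_class, probs)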

Tabular

The last application brings us to tabular data. First let's import everything we'll need.

from fastai.tabular import *

We'll use a sample of the adult dataset here. Once we read the CSV file, we need to specify the dependent variable, the categorical variables, the continuous variables and the processors we want to use.

adult = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(adult/'adult.csv')
dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
cont_names = ['education-num', 'hours-per-week', 'age', 'capital-loss', 'fnlwgt', 'capital-gain']
procs = [FillMissing, Categorify, Normalize]

Then we can use the data block API to put everything together before calling data.show_batch().

data = (TabularList.from_df(df, path=adult, cat_names=cat_names, cont_names=cont_names, procs=procs)
                           .split_by_idx(valid_idx=range(800,1000))
                           .label_from_df(cols=dep_var)
                           .databunch())
data.show_batch()
workclass education marital-status occupation relationship race sex native-country education-num_na education-num hours-per-week age capital-loss fnlwgt capital-gain target
Private Some-college Married-spouse-absent Adm-clerical Not-in-family White Female United-States False -0.0312 -1.9796 0.9831 -0.2164 0.2733 1.7560 >=50k
Private 11th Never-married Handlers-cleaners Own-child White Male United-States False -1.2046 -0.8456 -1.5823 -0.2164 0.3955 -0.1459 <50k
Private HS-grad Never-married Other-service Not-in-family Black Male ? False -0.4224 -0.0356 -0.6294 -0.2164 -0.4278 -0.1459 <50k
Local-gov HS-grad Married-civ-spouse Craft-repair Husband White Male United-States False -0.4224 -0.4406 1.0564 -0.2164 -0.6222 0.5311 <50k
Private Bachelors Separated Exec-managerial Not-in-family Black Female United-States False 1.1422 0.3694 -0.4095 -0.2164 1.4279 -0.1459 <50k

Here we grab a tabular_learner that we train for a little bit.

learn = tabular_learner(data, layers=[200,100], metrics=accuracy)
learn.fit(5, 1e-2)
learn.save('mini_train')
epoch train_loss valid_loss accuracy time
0 0.321381 0.343558 0.845000 00:06
1 0.339366 0.338962 0.845000 00:05
2 0.331168 0.342357 0.840000 00:05
3 0.323553 0.343808 0.850000 00:05
4 0.327218 0.351525 0.835000 00:05

And we can use learn.show_results().

learn.show_results()
workclass education marital-status occupation relationship race sex native-country education-num_na education-num hours-per-week age capital-loss fnlwgt capital-gain target prediction
Private Some-college Divorced Handlers-cleaners Unmarried White Female United-States True -0.0312 -0.0356 0.4701 -0.2164 -0.8793 -0.1459 <50k <50k
Self-emp-inc Prof-school Married-civ-spouse Prof-specialty Husband White Male United-States True -0.0312 1.5843 0.5434 -0.2164 0.0290 1.8829 >=50k >=50k
Private Assoc-voc Divorced #na# Not-in-family White Male United-States True -0.0312 -0.1976 -0.1896 -0.2164 1.7704 -0.1459 <50k <50k
Federal-gov Bachelors Never-married Tech-support Not-in-family White Male United-States True -0.0312 0.3694 -0.9959 -0.2164 -1.3242 -0.1459 <50k <50k
Private Bachelors Married-civ-spouse #na# Husband White Male United-States True -0.0312 -0.0356 -0.1163 -0.2164 -0.2389 -0.1459 <50k <50k
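
Finally, learn.predict also accepts a single row of the original DataFrame, applying the same preprocessing (FillMissing, Categorify, Normalize) before inference:

row = df.iloc[0]
pred_class, pred_idx, probs = learn.predict(row)
print(pred_class, probs)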