`Learner` support for computer vision

Computer Vision Interpret

vision.interpret is the module that implements custom Interpretation classes for different vision tasks, each inheriting from Interpretation.
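
Interpretation gathers the predictions, targets and losses from a Learner via from_learner; a subclass only adds task-specific analysis on top. As a minimal sketch (the class and method names below are illustrative assumptions, not part of the library):

from fastai.vision import *

class MyInterpretation(Interpretation):
    # Hypothetical subclass: reuses the preds, y_true and losses
    # collected by Interpretation.from_learner.
    def k_worst_losses(self, k:int=5):
        return self.losses.topk(k, largest=True)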

class SegmentationInterpretation[source][test]

SegmentationInterpretation(learn:Learner, preds:Tensor, y_true:Tensor, losses:Tensor, ds_type:DatasetType=<DatasetType.Valid: 2>) :: Interpretation

Interpretation methods for segmentation models.

top_losses[source][test]

top_losses(sizes:Tuple, k:int=None, largest=True)

Reduce the flattened losses to a single loss value for each image
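
That is, the flattened per-pixel losses are reshaped to one row per image and averaged. A minimal sketch of the reduction (the shapes are illustrative assumptions):

import torch

# 20 images of 128x128, with per-pixel losses flattened into one vector,
# as FlattenedLoss produces them (illustrative shapes).
flat_losses = torch.rand(20 * 128 * 128)
per_image = flat_losses.view(20, -1).mean(dim=1)        # one loss per image
top_losses, top_idxs = per_image.topk(20, largest=True) # sorted, with indices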

_interp_show[source][test]

_interp_show(ims:ImageSegment, classes:Collection[T_co]=None, sz:int=20, cmap='tab20', title_suffix:str=None)

Show an ImageSegment with color-mapped labels

show_xyz[source][test]

show_xyz(i, classes:list=None, sz=10)

Show the image, ground truth and prediction for item i from self.ds with color mappings

_generate_confusion[source][test]

_generate_confusion()

Average and per-image confusion: for each true label, the fraction of its pixels that intersect with each predicted label, so that each true-label row sums to 1

_plot_intersect_cm[source][test]

_plot_intersect_cm(cm, title='Intersection with Predict given True')

Plot confusion matrices (self.mean_cm or self.single_img_cm) generated by _generate_confusion

Let's show how SegmentationInterpretation can be used once we train a segmentation model.

train

camvid = untar_data(URLs.CAMVID_TINY)
path_lbl = camvid/'labels'
path_img = camvid/'images'
codes = np.loadtxt(camvid/'codes.txt', dtype=str)
# label masks are stored alongside the images as '<stem>_P<suffix>'
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
data = (SegmentationItemList.from_folder(path_img)
        .split_by_rand_pct()
        .label_from_func(get_y_fn, classes=codes)
        .transform(get_transforms(), tfm_y=True, size=128)
        .databunch(bs=16, path=camvid)
        .normalize(imagenet_stats))
data.show_batch(rows=2, figsize=(7,5))
learn = unet_learner(data, models.resnet18)
learn.fit_one_cycle(3, 1e-2)
learn.save('mini_train')
epoch train_loss valid_loss time
0 10.024513 3.442348 00:15
1 6.325253 2.343699 00:03
2 4.759998 2.108100 00:02

interpret

interp = SegmentationInterpretation.from_learner(learn)

Since the FlattenedLoss of CrossEntropyLoss() is used, we reshape the losses and then take the mean of the pixel losses per image. In order to do so we need to pass sizes:Tuple to top_losses().

top_losses, top_idxs = interp.top_losses(sizes=(128,128))
top_losses, top_idxs
(tensor([3.3195, 3.1692, 2.6574, 2.5976, 2.4910, 2.3759, 2.3710, 2.2064, 2.0871,
         2.0834, 2.0479, 1.8645, 1.8412, 1.7956, 1.7013, 1.6126, 1.6015, 1.5470,
         1.4495, 1.3423]),
 tensor([12,  4, 17, 13, 19, 18,  7,  8, 10,  1, 15,  0,  2,  9, 16, 11, 14,  5,
          6,  3]))

Next, we can generate a confusion matrix similar to the one we usually have for classification. Two confusion matrices are generated: mean_cm, which represents the global label performance, and single_img_cm, which represents the same thing for each individual image in the dataset.

Values in the matrix are calculated as:

\begin{align}
CM_{ij} &= \frac{\left|\,\{\text{Predicted} = j\} \cap \{\text{True} = i\}\,\right|}{\left|\,\{\text{True} = i\}\,\right|}
\end{align}

Or in plain English: the fraction of pixels with true label i that are predicted as label j.
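
As a toy sketch of this computation on a single image (the values and shapes are illustrative; the real work happens inside _generate_confusion):

import numpy as np

# Toy 2x2 masks with 3 classes.
true = np.array([[0, 0], [1, 2]])
pred = np.array([[0, 1], [1, 2]])
n_classes = 3

cm = np.zeros((n_classes, n_classes))
for i in range(n_classes):      # true label
    for j in range(n_classes):  # predicted label
        inter = ((true == i) & (pred == j)).sum()
        cm[i, j] = inter / max((true == i).sum(), 1)

print(cm)  # each row for a label present in `true` sums to 1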

learn.data.classes
array(['Animal', 'Archway', 'Bicyclist', 'Bridge', 'Building', 'Car', 'CartLuggagePram', 'Child', 'Column_Pole',
       'Fence', 'LaneMkgsDriv', 'LaneMkgsNonDriv', 'Misc_Text', 'MotorcycleScooter', 'OtherMoving', 'ParkingBlock',
       'Pedestrian', 'Road', 'RoadShoulder', 'Sidewalk', 'SignSymbol', 'Sky', 'SUVPickupTruck', 'TrafficCone',
       'TrafficLight', 'Train', 'Tree', 'Truck_Bus', 'Tunnel', 'VegetationMisc', 'Void', 'Wall'], dtype='<U17')
mean_cm, single_img_cm = interp._generate_confusion()
mean_cm.shape, single_img_cm.shape
((32, 32), (20, 32, 32))

_plot_intersect_cm first displays a dataframe showing the per-class score using the IoU-style definition above. These are the diagonal values of the confusion matrix, which is displayed afterwards.
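
A rough sketch of where those per-class scores come from (assuming mean_cm converts to a NumPy array; the presentation is illustrative):

import numpy as np
import pandas as pd

# Per-class score = diagonal of the mean confusion matrix.
scores = pd.Series(np.asarray(mean_cm).diagonal(), index=learn.data.classes)
scores.sort_values(ascending=False)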

NaN values indicate that these labels were not present in the dataset, in this case the validation set. As you can imagine, this can also help you construct a more representative validation set.

df = interp._plot_intersect_cm(mean_cm, "Mean of Ratio of Intersection given True Label")
label score
Sky 0.851616
Road 0.793361
Building 0.274023
Tree 0.00469498
Void 6.70092e-05
Animal 0
Pedestrian 0
VegetationMisc 0
Truck_Bus 0
TrafficLight 0
SUVPickupTruck 0
SignSymbol 0
Sidewalk 0
ParkingBlock 0
Archway 0
OtherMoving 0
Misc_Text 0
LaneMkgsDriv 0
Fence 0
Column_Pole 0
Child 0
CartLuggagePram 0
Car 0
Bicyclist 0
Wall 0
Bridge NaN
LaneMkgsNonDriv NaN
MotorcycleScooter NaN
RoadShoulder NaN
TrafficCone NaN
Train NaN
Tunnel NaN

Next, let's look at the single worst prediction in our dataset. It looks like this dummy model just predicts everything as Road :)

i = top_idxs[0]
df = interp._plot_intersect_cm(single_img_cm[i], f"Ratio of Intersection given True Label, Image:{i}")
label score
Road 0.999367
Sky 0.405882
Building 0.0479275
Tree 0.00365813
Bicyclist 0
Void 0
TrafficLight 0
SUVPickupTruck 0
Sidewalk 0
Pedestrian 0
OtherMoving 0
Misc_Text 0
LaneMkgsDriv 0
Column_Pole 0
CartLuggagePram 0
Car 0
Wall 0
Animal NaN
Archway NaN
Bridge NaN
Child NaN
Fence NaN
LaneMkgsNonDriv NaN
MotorcycleScooter NaN
ParkingBlock NaN
RoadShoulder NaN
SignSymbol NaN
TrafficCone NaN
Train NaN
Truck_Bus NaN
Tunnel NaN
VegetationMisc NaN

Finally, we will visually inspect this single prediction.

interp.show_xyz(i, sz=15)
{'Animal': 0,
 'Archway': 1,
 'Bicyclist': 2,
 'Bridge': 3,
 'Building': 4,
 'Car': 5,
 'CartLuggagePram': 6,
 'Child': 7,
 'Column_Pole': 8,
 'Fence': 9,
 'LaneMkgsDriv': 10,
 'LaneMkgsNonDriv': 11,
 'Misc_Text': 12,
 'MotorcycleScooter': 13,
 'OtherMoving': 14,
 'ParkingBlock': 15,
 'Pedestrian': 16,
 'Road': 17,
 'RoadShoulder': 18,
 'Sidewalk': 19,
 'SignSymbol': 20,
 'Sky': 21,
 'SUVPickupTruck': 22,
 'TrafficCone': 23,
 'TrafficLight': 24,
 'Train': 25,
 'Tree': 26,
 'Truck_Bus': 27,
 'Tunnel': 28,
 'VegetationMisc': 29,
 'Void': 30,
 'Wall': 31}

class ObjectDetectionInterpretation[source][test]

ObjectDetectionInterpretation(learn:Learner, preds:Tensor, y_true:Tensor, losses:Tensor, ds_type:DatasetType=<DatasetType.Valid: 2>) :: Interpretation

Interpretation methods for object detection models.