Basic helper functions for the fastai library

Basic core

This module contains all the basic functions we need in other modules of the fastai library (split from torch_core, which contains the ones requiring PyTorch). Its documentation can easily be skipped on a first read, unless you want to know what a given function does.

Global constants

default_cpus = min(16, num_cpus())

Check functions

has_arg[source][test]

has_arg(func, arg) → bool

No tests found for has_arg. To contribute a test please refer to this guide and this discussion.

Check if func accepts arg.

Examples with two fastai.core functions, download_url and index_row (their signatures appear later on this page for reference):

has_arg(download_url,'url')
True
has_arg(index_row,'x')
False
has_arg(index_row,'a')
True

ifnone[source][test]

ifnone(a:Any, b:Any) → Any

Tests found for ifnone:

  • pytest -sv tests/test_core.py::test_ifnone [source]

To run tests please refer to this guide.

a if a is not None, otherwise b.

param,alt_param = None,5
ifnone(param,alt_param)
5
param,alt_param = None,[1,2,3]
ifnone(param,alt_param)
[1, 2, 3]

is1d[source][test]

is1d(a:Collection[T_co]) → bool

Tests found for is1d:

  • pytest -sv tests/test_core.py::test_is1d [source]

To run tests please refer to this guide.

Return True if a is one-dimensional.

two_d_array = np.arange(12).reshape(6,2)
print( two_d_array )
print( is1d(two_d_array) )
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]]
False
is1d(two_d_array.flatten())
True

is_listy[source][test]

is_listy(x:Any) → bool

Tests found for is_listy:

  • pytest -sv tests/test_core.py::test_listy [source]

To run tests please refer to this guide.

Check if x is list-like; a tuple or a list qualifies.

some_data = [1,2,3]
is_listy(some_data)
True
some_data = (1,2,3)
is_listy(some_data)
True
some_data = 1024
print( is_listy(some_data) )
False
print( is_listy( [some_data] ) )
True
some_data = dict([('a',1),('b',2),('c',3)])
print( some_data )
print( some_data.keys() )
{'a': 1, 'b': 2, 'c': 3}
dict_keys(['a', 'b', 'c'])
print( is_listy(some_data) )
print( is_listy(some_data.keys()) )
False
False
print( is_listy(list(some_data.keys())) )
True

is_tuple[source][test]

is_tuple(x:Any) → bool

Tests found for is_tuple:

  • pytest -sv tests/test_core.py::test_tuple [source]

To run tests please refer to this guide.

Check if x is a tuple.

print( is_tuple( [1,2,3] ) )
False
print( is_tuple( (1,2,3) ) )
True

arange_of[source][test]

arange_of(x)

No tests found for arange_of. To contribute a test please refer to this guide and this discussion.

Same as range_of but returns an array.

arange_of([5,6,7])
array([0, 1, 2])
type(arange_of([5,6,7]))
numpy.ndarray

array[source][test]

array(a, dtype:type=None, **kwargs) → ndarray

Tests found for array:

Some other tests where array is used:

  • pytest -sv tests/test_core.py::test_arrays_split [source]
  • pytest -sv tests/test_core.py::test_even_mults [source]
  • pytest -sv tests/test_core.py::test_idx_dict [source]
  • pytest -sv tests/test_core.py::test_is1d [source]
  • pytest -sv tests/test_core.py::test_itembase_eq [source]
  • pytest -sv tests/test_core.py::test_itembase_hash [source]
  • pytest -sv tests/test_core.py::test_one_hot [source]
  • pytest -sv tests/test_torch_core.py::test_model_type [source]
  • pytest -sv tests/test_torch_core.py::test_tensor_array_monkey_patch [source]
  • pytest -sv tests/test_torch_core.py::test_tensor_with_ndarray [source]
  • pytest -sv tests/test_torch_core.py::test_to_detach [source]

To run tests please refer to this guide.

Same as np.array but also handles generators. kwargs are passed to np.array with dtype.

array([1,2,3])
array([1, 2, 3])

Note that the generator is not reset after being partially consumed, so this array call has five fewer entries than it would have had if we had started from the beginning of the generator.

def data_gen():
    i = 100.01
    while i<200:
        yield i
        i += 1.

ex_data_gen = data_gen()
for _ in range(5):
    print(next(ex_data_gen))
100.01
101.01
102.01
103.01
104.01
array(ex_data_gen)
array([105.01, 106.01, 107.01, 108.01, ..., 196.01, 197.01, 198.01, 199.01])
ex_data_gen_int = data_gen()

array(ex_data_gen_int,dtype=int)  #Cast output to int array
array([100, 101, 102, 103, ..., 196, 197, 198, 199])

arrays_split[source][test]

arrays_split(mask:ndarray, *arrs:NPArrayableList) → SplitArrayList

Tests found for arrays_split:

  • pytest -sv tests/test_core.py::test_arrays_split [source]

To run tests please refer to this guide.

Given arrs is [a,b,...] and a boolean mask, return [(a[mask], a[~mask]), (b[mask], b[~mask]), ...].

data_a = np.arange(15)
data_b = np.arange(15)[::-1]

mask_a = (data_a > 10)
print(data_a)
print(data_b)
print(mask_a)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[14 13 12 11 10  9  8  7  6  5  4  3  2  1  0]
[False False False False False False False False False False False  True  True  True  True]
arrays_split(mask_a,data_a)
[(array([11, 12, 13, 14]),),
 (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),)]
np.vstack([data_a,data_b]).transpose().shape
(15, 2)
arrays_split(mask_a,np.vstack([data_a,data_b]).transpose()) #must match on dimension 0
[(array([[11,  3],
         [12,  2],
         [13,  1],
         [14,  0]]),), (array([[ 0, 14],
         [ 1, 13],
         [ 2, 12],
         [ 3, 11],
         [ 4, 10],
         [ 5,  9],
         [ 6,  8],
         [ 7,  7],
         [ 8,  6],
         [ 9,  5],
         [10,  4]]),)]

chunks[source][test]

chunks(l:Collection[T_co], n:int) → Iterable

Tests found for chunks:

  • pytest -sv tests/test_core.py::test_chunks [source]

To run tests please refer to this guide.

Yield successive n-sized chunks from l.

You can transform a Collection into an Iterable of n-sized chunks by calling chunks:

data = [0,1,2,3,4,5,6,7,8,9]
for chunk in chunks(data, 2):
    print(chunk)
[0, 1]
[2, 3]
[4, 5]
[6, 7]
[8, 9]
for chunk in chunks(data, 3):
    print(chunk)
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

df_names_to_idx[source][test]

df_names_to_idx(names:IntsOrStrs, df:DataFrame)

Tests found for df_names_to_idx:

  • pytest -sv tests/test_core.py::test_df_names_to_idx [source]

To run tests please refer to this guide.

Return the column indexes of names in df.

ex_df = pd.DataFrame.from_dict({"a":[1,1,1],"b":[2,2,2]})
print(ex_df)
   a  b
0  1  2
1  1  2
2  1  2
df_names_to_idx('b',ex_df)
[1]

extract_kwargs[source][test]

extract_kwargs(names:StrList, kwargs:KWArgs)

No tests found for extract_kwargs. To contribute a test please refer to this guide and this discussion.

Extract the keys in names from the kwargs.

key_word_args = {"a":2,"some_list":[1,2,3],"param":'mean'}
key_word_args
{'a': 2, 'some_list': [1, 2, 3], 'param': 'mean'}
(extracted_val,remainder) = extract_kwargs(['param'],key_word_args)
print( extracted_val,remainder )
{'param': 'mean'} {'a': 2, 'some_list': [1, 2, 3]}

idx_dict[source][test]

idx_dict(a)

Tests found for idx_dict:

  • pytest -sv tests/test_core.py::test_idx_dict [source]

To run tests please refer to this guide.

Create a dictionary mapping each value in a to its index.

idx_dict(['a','b','c'])
{'a': 0, 'b': 1, 'c': 2}

index_row[source][test]

index_row(a:Union[Collection[T_co], DataFrame, Series], idxs:Collection[int]) → Any

No tests found for index_row. To contribute a test please refer to this guide and this discussion.

Return the slice of a corresponding to idxs.

a can be anything you can index into, such as a DataFrame, an array, or a list.

data = [0,1,2,3,4,5,6,7,8,9]
index_row(data,4)
4
index_row(pd.Series(data),7)
7
data_df = pd.DataFrame([data[::-1],data]).transpose()
data_df
   0  1
0  9  0
1  8  1
2  7  2
3  6  3
4  5  4
5  4  5
6  3  6
7  2  7
8  1  8
9  0  9
index_row(data_df,7)
0    2
1    7
Name: 7, dtype: int64

listify[source][test]

listify(p:OptListOrItem=None, q:OptListOrItem=None)

Tests found for listify:

  • pytest -sv tests/test_core.py::test_listify [source]

To run tests please refer to this guide.

Make p listy and the same length as q.

to_match = np.arange(12)
listify('a',to_match)
['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']
listify('a',5)
['a', 'a', 'a', 'a', 'a']
listify(77.1,3)
[77.1, 77.1, 77.1]
listify( (1,2,3) )
[1, 2, 3]
listify((1,2,3),('a','b','c'))
[1, 2, 3]

random_split[source][test]

random_split(valid_pct:float, *arrs:NPArrayableList) → SplitArrayList

Tests found for random_split:

  • pytest -sv tests/test_core.py::test_random_split [source]

To run tests please refer to this guide.

Randomly split arrs with valid_pct ratio. Good for creating a validation set.

Splitting is done here with random.uniform(), so you may not get the exact split percentage for small datasets.

data = np.arange(20).reshape(10,2)
data.tolist()
[[0, 1],
 [2, 3],
 [4, 5],
 [6, 7],
 [8, 9],
 [10, 11],
 [12, 13],
 [14, 15],
 [16, 17],
 [18, 19]]
random_split(0.20,data.tolist())
[(array([[ 0,  1],
         [ 2,  3],
         [ 4,  5],
         [ 6,  7],
         [ 8,  9],
         [10, 11],
         [12, 13],
         [14, 15],
         [16, 17],
         [18, 19]]),), (array([], shape=(0, 2), dtype=int64),)]
random_split(0.20,pd.DataFrame(data))
[(array([[ 0,  1],
         [ 4,  5],
         [ 8,  9],
         [10, 11],
         [16, 17],
         [18, 19]]),), (array([[ 2,  3],
         [ 6,  7],
         [12, 13],
         [14, 15]]),)]

range_of[source][test]

range_of(x)

No tests found for range_of. To contribute a test please refer to this guide and this discussion.

Create a range from 0 to len(x).

range_of([5,4,3])
[0, 1, 2]
range_of(np.arange(10)[::-1])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

series2cat[source][test]

series2cat(df:DataFrame, *col_names)

Tests found for series2cat:

  • pytest -sv tests/test_core.py::test_series2cat [source]

To run tests please refer to this guide.

Categorifies the columns col_names in df.

data_df = pd.DataFrame.from_dict({"a":[1,1,1,2,2,2],"b":['f','e','f','g','g','g']})
data_df
   a  b
0  1  f
1  1  e
2  1  f
3  2  g
4  2  g
5  2  g
data_df['b']
0    f
1    e
2    f
3    g
4    g
5    g
Name: b, dtype: object
series2cat(data_df,'b')
data_df['b']
0    f
1    e
2    f
3    g
4    g
5    g
Name: b, dtype: category
Categories (3, object): [e < f < g]
series2cat(data_df,'a')
data_df['a']
0    1
1    1
2    1
3    2
4    2
5    2
Name: a, dtype: category
Categories (2, int64): [1 < 2]

split_kwargs_by_func[source][test]

split_kwargs_by_func(kwargs, func)

No tests found for split_kwargs_by_func. To contribute a test please refer to this guide and this discussion.

Split kwargs between those expected by func and the others.

key_word_args = {'url':'http://fast.ai','dest':'./','new_var':[1,2,3],'testvalue':42}
split_kwargs_by_func(key_word_args,download_url)
({'url': 'http://fast.ai', 'dest': './'},
 {'new_var': [1, 2, 3], 'testvalue': 42})

to_int[source][test]

to_int(b:Any) → Union[int, List[int]]

Tests found for to_int:

  • pytest -sv tests/test_core.py::test_to_int [source]

To run tests please refer to this guide.

Recursively convert b to an int or list/dict of ints; raises exception if not convertible.

to_int(3.1415)
3
data = [1.2,3.4,7.25]
to_int(data)
[1, 3, 7]

uniqueify[source][test]

uniqueify(x:Series, sort:bool=False) → List[T]

Tests found for uniqueify:

  • pytest -sv tests/test_core.py::test_uniqueify [source]

To run tests please refer to this guide.

Return the unique values of x (sorted if sort=True).

uniqueify( pd.Series(data=['a','a','b','b','f','g']) )
['a', 'b', 'f', 'g']

Metaclasses

show_doc(PrePostInitMeta)

class PrePostInitMeta[source][test]

PrePostInitMeta(name, bases, dct) :: type

No tests found for PrePostInitMeta. To contribute a test please refer to this guide and this discussion.

A metaclass that calls optional __pre_init__ and __post_init__ methods.

class _T(metaclass=PrePostInitMeta):
    def __pre_init__(self):  self.a  = 0; assert self.a==0
    def __init__(self):      self.a += 1; assert self.a==1
    def __post_init__(self): self.a += 1; assert self.a==2

t = _T()
t.a
2

Files management and downloads

download_url[source][test]

download_url(url:str, dest:str, overwrite:bool=False, pbar:ProgressBar=None, show_progress=True, chunk_size=1048576, timeout=4, retries=5)

Tests found for download_url:

  • pytest -sv tests/test_core.py::test_download_url [source]

To run tests please refer to this guide.

Download url to dest unless the file already exists and overwrite is False.
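
A minimal usage sketch; the URL and destination below are made up for illustration and assume the file is reachable:

download_url('https://example.com/some_file.csv', './some_file.csv')
# a second call does nothing because the file now exists, unless overwrite=True
download_url('https://example.com/some_file.csv', './some_file.csv', overwrite=True)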

find_classes[source][test]

find_classes(folder:Path) → FilePathList

Tests found for find_classes:

  • pytest -sv tests/test_core.py::test_find_classes [source]

To run tests please refer to this guide.

Return the list of label subdirectories in an ImageNet-style folder.
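
A sketch of the expected layout; the folder and class names below are made up for illustration:

# hypothetical ImageNet-style layout:
#   data/train/cat/001.jpg
#   data/train/dog/002.jpg
find_classes(Path('data/train'))
# -> [PosixPath('data/train/cat'), PosixPath('data/train/dog')]  (one Path per label subdirectory)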

join_path[source][test]

join_path(fname:PathOrStr, path:PathOrStr='.') → Path

Tests found for join_path:

  • pytest -sv tests/test_core.py::test_join_paths [source]

To run tests please refer to this guide.

Return Path(path)/Path(fname), path defaults to current dir.
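
For example (file name chosen for illustration, output shown for a POSIX system):

join_path('some_file.txt', '/tmp')
PosixPath('/tmp/some_file.txt')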

join_paths[source][test]

join_paths(fnames:FilePathList, path:PathOrStr='.') → FilePathList

Tests found for join_paths:

Some other tests where join_paths is used:

  • pytest -sv tests/test_core.py::test_join_paths [source]

To run tests please refer to this guide.

Join path to every file name in fnames.
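
Similarly, with a list of file names (again chosen for illustration):

join_paths(['a.txt', 'b.txt'], '/tmp')
[PosixPath('/tmp/a.txt'), PosixPath('/tmp/b.txt')]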

loadtxt_str[source][test]

loadtxt_str(path:PathOrStr) → ndarray

No tests found for loadtxt_str. To contribute a test please refer to this guide and this discussion.

Return an ndarray of str containing the lines of text from path.

save_texts[source][test]

save_texts(fname:PathOrStr, texts:StrList)

No tests found for save_texts. To contribute a test please refer to this guide and this discussion.

Save the content of texts to the file fname.
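
A small round trip using save_texts together with loadtxt_str above; the file name is arbitrary:

lines = ['first line', 'second line', 'third line']
save_texts('tmp_texts.txt', lines)
loadtxt_str('tmp_texts.txt')   # expected: an ndarray with one str per line of the file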

Multiprocessing

num_cpus[source][test]

num_cpus() → int

Tests found for num_cpus:

  • pytest -sv tests/test_core.py::test_cpus [source]

To run tests please refer to this guide.

Get the number of CPUs available.

parallel[source][test]

parallel(func, arr:Collection[T_co], max_workers:int=None, leave=False)

No tests found for parallel. To contribute a test please refer to this guide and this discussion.

Call func on every element of arr in parallel using max_workers.

func must accept both the value and index of each arr element.

def my_func(value, index):
    print("Index: {}, Value: {}".format(index, value))
 
my_array = [i*2 for i in range(5)]
parallel(my_func, my_array, max_workers=3)
100.00% [5/5 00:00<00:00]
Index: 0, Value: 0
Index: 1, Value: 2
Index: 2, Value: 4
Index: 4, Value: 8
Index: 3, Value: 6

partition[source][test]

partition(a:Collection[T_co], sz:int) → List[Collection[T_co]]

Tests found for partition:

  • pytest -sv tests/test_core.py::test_partition_functionality [source]

Some other tests where partition is used:

  • pytest -sv tests/test_core.py::test_partition [source]

To run tests please refer to this guide.

Split the iterable a into equal parts of size sz.
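
For example (the last part is smaller when len(a) is not a multiple of sz):

partition([0,1,2,3,4,5,6], 3)
[[0, 1, 2], [3, 4, 5], [6]]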

partition_by_cores[source][test]

partition_by_cores(a:Collection[T_co], n_cpus:int) → List[Collection[T_co]]

No tests found for partition_by_cores. To contribute a test please refer to this guide and this discussion.

Split the data in a equally among n_cpus cores.
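
A quick sketch; the exact chunk sizes depend on the chunking rule, so the comment below is only indicative:

partition_by_cores(list(range(10)), 3)
# -> a few roughly equal chunks, e.g. [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]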

Data block API

class ItemBase[source][test]

ItemBase(data:Any)

Tests found for ItemBase:

Some other tests where ItemBase is used:

  • pytest -sv tests/test_core.py::test_itembase_eq [source]
  • pytest -sv tests/test_core.py::test_itembase_hash [source]

To run tests please refer to this guide.

Base item type in the fastai library.

All items used in fastai should subclass this. They must have a data field that will be used when collating items into mini-batches.
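
A minimal sketch of a custom item type (the class below is made up for illustration); the only requirement stated above is a data field used when collating:

class MyItem(ItemBase):
    "Toy item wrapping a single float."
    def __init__(self, obj):
        self.obj = obj
        self.data = obj   # `data` is what gets collated into mini-batches
    def __str__(self): return f'MyItem({self.obj})'

MyItem(3.5).data
3.5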

apply_tfms[source][test]

apply_tfms(tfms:Collection[T_co], **kwargs)

No tests found for apply_tfms. To contribute a test please refer to this guide and this discussion.

Override this method in your subclass if you want to apply data augmentation with tfms to this ItemBase.

show[source][test]

show(ax:Axes, **kwargs)

No tests found for show. To contribute a test please refer to this guide and this discussion.

Override this method in your subclass if you want to customize the way this ItemBase is shown on ax.

The default behavior is to set the string representation of this object as the title of ax.

class Category[source][test]

Category(data, obj) :: ItemBase

Tests found for Category:

  • pytest -sv tests/test_core.py::test_itembase_eq [source]

Some other tests where Category is used:

  • pytest -sv tests/test_core.py::test_itembase_hash [source]

To run tests please refer to this guide.

Basic class for single classification labels.

Create a Category with an obj (the label) whose index in the classes list is data.
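
For instance (classes and label made up for illustration):

classes = ['cat', 'dog']
c = Category(1, 'dog')   # data=1 is the index of 'dog' in classes
c.data, c.obj
(1, 'dog')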

class EmptyLabel[source][test]

EmptyLabel() :: ItemBase

No tests found for EmptyLabel. To contribute a test please refer to this guide and this discussion.

Should be used for a dummy label.

class MultiCategory[source][test]

MultiCategory(data, obj, raw) :: ItemBase

Tests found for MultiCategory:

  • pytest -sv tests/test_core.py::test_itembase_eq [source]

Some other tests where MultiCategory is used:

  • pytest -sv tests/test_core.py::test_itembase_hash [source]

To run tests please refer to this guide.

Basic class for multi-classification labels.

Create a MultiCategory with an obj that is a collection of labels. data corresponds to the one-hot encoded labels and raw is a list of the associated strings.

class FloatItem[source][test]

FloatItem(obj) :: ItemBase

Tests found for FloatItem:

  • pytest -sv tests/test_core.py::test_itembase_eq [source]

Some other tests where FloatItem is used:

  • pytest -sv tests/test_core.py::test_itembase_hash [source]

To run tests please refer to this guide.

Basic class for float items.

Others

camel2snake[source][test]

camel2snake(name:str) → str

Tests found for camel2snake:

  • pytest -sv tests/test_core.py::test_camel2snake [source]

To run tests please refer to this guide.

Change name from camel to snake style.

camel2snake('DeviceDataLoader')
'device_data_loader'

even_mults[source][test]

even_mults(start:float, stop:float, n:int) → ndarray

Tests found for even_mults:

  • pytest -sv tests/test_core.py::test_even_mults [source]

To run tests please refer to this guide.

Build log-stepped array from start to stop in n steps.

In linear scales each element is equidistant from its neighbors:

# from 1 to 10 in 5 steps
t = np.linspace(1, 10, 5)
t
array([ 1.  ,  3.25,  5.5 ,  7.75, 10.  ])
for i in range(len(t) - 1):
    print(t[i+1] - t[i])
2.25
2.25
2.25
2.25

In logarithmic scales, each element is a multiple of the previous entry:

t = even_mults(1, 10, 5)
t
array([ 1.      ,  1.778279,  3.162278,  5.623413, 10.      ])
# notice how each number is a multiple of its predecessor
for i in range(len(t) - 1):
    print(t[i+1] / t[i])
1.7782794100389228
1.7782794100389228
1.7782794100389228
1.7782794100389228

func_args[source][test]

func_args(func) → bool

No tests found for func_args. To contribute a test please refer to this guide and this discussion.

Return the arguments of func.

func_args(download_url)
('url',
 'dest',
 'overwrite',
 'pbar',
 'show_progress',
 'chunk_size',
 'timeout',
 'retries')

Additionally, func_args can be used with functions that do not belong to the fastai library:

func_args(np.linspace)
('start', 'stop', 'num', 'endpoint', 'retstep', 'dtype')

noop[source][test]

noop(x)

Tests found for noop:

  • pytest -sv tests/test_core.py::test_noop [source]

To run tests please refer to this guide.

Return x.

# object is returned as-is
noop([1,2,3])
[1, 2, 3]

one_hot[source][test]

one_hot(x:Collection[int], c:int)

Tests found for one_hot:

  • pytest -sv tests/test_core.py::test_one_hot [source]

To run tests please refer to this guide.

One-hot encode x with c classes.

One-hot encoding is a standard machine learning technique. Assume we are dealing with a 10-class classification problem and we are supplied a list of labels:

y = [1, 4, 4, 5, 7, 9, 2, 4, 0]
Note: y is zero-indexed, therefore its first element (1) belongs to class 2, its second element (4) to class 5, and so on.
len(y)
9

y can equivalently be expressed as a matrix of 9 rows and 10 columns, where each row represents one element of the original y.

for label in y:
    print(one_hot(label, 10))
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

show_some[source][test]

show_some(items:Collection[T_co], n_max:int=5, sep:str=',')

No tests found for show_some. To contribute a test please refer to this guide and this discussion.

Return the representation of the first n_max elements in items.

# select 3 elements from a list
some_data = show_some([10, 20, 30, 40, 50], 3) 
some_data
'10,20,30...'
type(some_data) 
str
# the separator can be changed
some_data = show_some([10, 20, 30, 40, 50], 3, sep = '---') 
some_data
'10---20---30...'
some_data[:-3]
'10---20---30'

show_some can take as input any object of a class that implements __len__ and __getitem__:

class Any(object):
    def __init__(self, data):
        self.data = data
    def __len__(self):
        return len(self.data)
    def __getitem__(self,i):
        return self.data[i]
 
some_other_data = Any('nice')
show_some(some_other_data, 2)
'n,i...'

subplots[source][test]

subplots(rows:int, cols:int, imgsize:int=4, figsize:Optional[Tuple[int, int]]=None, title=None, **kwargs)

Tests found for subplots:

  • pytest -sv tests/test_core.py::test_subplots_multi_row_cols [source]
  • pytest -sv tests/test_core.py::test_subplots_single [source]

To run tests please refer to this guide.

Like plt.subplots, but with a consistent axs shape; kwargs are passed to fig.suptitle along with title.
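
A sketch, assuming (as the docstring suggests) that the call returns the axes as an array with a consistent shape; the grid size and title are arbitrary:

axs = subplots(2, 3, imgsize=3, title='demo grid')
axs.shape   # expected to be (2, 3) under that assumption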

text2html_table[source][test]

text2html_table(items:Tokens) → str

No tests found for text2html_table. To contribute a test please refer to this guide and this discussion.

Put the texts in items in an HTML table; widths are the widths of the columns in %.
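
A minimal sketch; the rows below are made up, and each inner list is assumed to be one row of the table:

from IPython.display import display, HTML

rows = [['text', 'label'], ['hello world', 'positive'], ['goodbye', 'negative']]
display(HTML(text2html_table(rows)))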