datasets
This module contains the functions needed to download several useful datasets for use in your models.
This contains all the dataset and model URLs, and some classmethods to help use them - you don't create objects of this class. The supported datasets are (with their calling names): S3_NLP, S3_COCO, MNIST_SAMPLE, MNIST_TINY, IMDB_SAMPLE, ADULT_SAMPLE, ML_SAMPLE, PLANET_SAMPLE, CIFAR, PETS and MNIST. For details on the datasets, see the fast.ai datasets webpage. Datasets with SAMPLE in their name are subsets of the original datasets. In the case of MNIST, we also have a TINY dataset, which is even smaller than MNIST_SAMPLE.
Models are currently limited to WT103, but you can expect more in the future!
URLs.MNIST_SAMPLE
Downloading Data
For the rest of the datasets you will need to download them with untar_data or download_data. untar_data will download the data file and decompress it, while download_data will just download and save the compressed file in .tgz format.
The locations where the data and models are downloaded are set in config.yml, which by default is located in ~/.fastai. This directory can be changed via the optional environment variable FASTAI_HOME (e.g. FASTAI_HOME=/home/.fastai).
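The directory resolution described above can be sketched in a few lines. Note that fastai_home is an illustrative helper name for this sketch, not a function the library exposes:

```python
import os
from pathlib import Path

def fastai_home() -> Path:
    # Resolve the fastai config directory: $FASTAI_HOME if set,
    # otherwise the default ~/.fastai.
    return Path(os.environ.get("FASTAI_HOME", "~/.fastai")).expanduser()

# The config file lives directly inside that directory.
config_file = fastai_home() / "config.yml"
```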
If no config.yml is present in the specified directory, a default one will be created with data_archive_path, data_path and models_path entries. The data_path and models_path entries point respectively to the data folder and the models folder in the same directory as config.yml. The data_archive_path entry lets you set a separate folder in which to save compressed datasets for archiving purposes; it defaults to the same directory as data_path.
Configure those download locations by editing data_archive_path, data_path and models_path in config.yml.
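Based on the defaults described above, a freshly created config.yml might look like the following (the exact paths depend on your home directory and any FASTAI_HOME override):

```yaml
data_archive_path: ~/.fastai/data
data_path: ~/.fastai/data
models_path: ~/.fastai/models
```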
In general, untar_data uses a url to download a tgz file under fname, and then un-tgzs fname into a folder under dest.
If you have run untar_data before, then running untar_data(URLs.something) again will just return dest without downloading again. If you run untar_data again with force_download=True, or the tgz file under fname is somehow corrupted, the existing fname and dest will be removed and the download will start over.
If you have run untar_data before but dest does not exist (the folder could have been removed or renamed), then running untar_data(URLs.something) again will execute download_data. In that case, if the tgz file under fname still exists, no actual downloading takes place; fname is simply un-tgzed into dest. If fname does not exist, the tgz file is downloaded first.
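The caching behaviour described in the preceding paragraphs can be summarised as a small decision tree. The sketch below is illustrative only (untar_data_sketch is not fastai's actual implementation, and the real function derives fname and dest from the url rather than taking them as arguments):

```python
import shutil
import tarfile
from pathlib import Path
from urllib.request import urlretrieve

def untar_data_sketch(url: str, fname: Path, dest: Path,
                      force_download: bool = False) -> Path:
    # Illustrative sketch of untar_data's caching logic.
    if force_download:
        # Remove both the archive and the extracted folder, then start over.
        if fname.exists():
            fname.unlink()
        if dest.exists():
            shutil.rmtree(dest)
    if dest.exists():
        # Already extracted: return dest without touching the network.
        return dest
    if not fname.exists():
        # No archive on disk: download it first.
        urlretrieve(url, fname)
    # Archive present (just downloaded or left over): un-tgz it into
    # the folder containing dest.
    with tarfile.open(fname, "r:gz") as tgz:
        tgz.extractall(dest.parent)
    return dest
```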
Note: the url you feed to untar_data must be one of URLs.something.
untar_data(URLs.PLANET_SAMPLE)
Note: If the data file already exists in a data directory inside the notebook, that data file will be used instead of the one present in the folder specified in config.yml. config.yml is located in the directory specified in the optional environment variable FASTAI_HOME (defaults to ~/.fastai/). Paths are resolved by calling the function datapath4file, which checks whether the data exists locally (data/) first, before downloading to the folder specified in config.yml.
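The lookup order just described - local data/ directory first, configured path second - can be sketched as follows. datapath4file_sketch is a hypothetical stand-in for illustration; the real datapath4file reads the configured path from config.yml itself rather than taking it as a parameter:

```python
from pathlib import Path

def datapath4file_sketch(filename: str, config_data_path: Path) -> Path:
    # Prefer a data/ directory next to the calling notebook/script.
    local = Path("data") / filename
    if local.exists():
        return local
    # Otherwise fall back to the data_path configured in config.yml.
    return config_data_path / filename
```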
Example:
download_data(URLs.PLANET_SAMPLE)
All the downloading functions use this to decide where to put the tgz file and the expanded folder. If filename already exists in a data directory in the same place as the calling notebook/script, that is used as the parent directory; otherwise, config.yml is read to see what path to use, which defaults to ~/.fastai/data. To override this default, simply modify the value in your config.yml:
data_archive_path: ~/.fastai/data
data_path: ~/.fastai/data
You probably won't need to use this yourself - it's used by URLs.datapath4file.
Get the key corresponding to path in the Config.
Get the Path where the data is stored.
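The two accessors above can be pictured with a minimal stand-in. ConfigSketch is hypothetical (the real Config loads its entries from config.yml, which this sketch replaces with an in-memory dict); it only illustrates the key-vs-Path distinction:

```python
from pathlib import Path

class ConfigSketch:
    # Default entries written to a fresh config.yml, per the docs above.
    DEFAULTS = {
        "data_path": "~/.fastai/data",
        "data_archive_path": "~/.fastai/data",
        "models_path": "~/.fastai/models",
    }

    def __init__(self, entries=None):
        # In the real library these entries are read from config.yml.
        self.entries = {**self.DEFAULTS, **(entries or {})}

    def get_key(self, key: str) -> str:
        # Return the raw string stored for `key`, falling back to the default.
        return self.entries.get(key, self.DEFAULTS.get(key))

    def get_path(self, key: str) -> Path:
        # Return the value as an expanded filesystem Path.
        return Path(self.get_key(key)).expanduser()
```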