
Commit 48e27c4

Merge branch 'master' of github.com:FluxML/FastAI.jl

2 parents: bfa9bf8 + 975d5f9


46 files changed: +1730 / -397 lines

Project.toml

Lines changed: 5 additions & 3 deletions
@@ -5,7 +5,6 @@ version = "0.4.3"
 
 [deps]
 Animations = "27a7e980-b3e6-11e9-2bcd-0b925532e340"
-Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
 CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
 ColorVectorSpace = "c3611d14-8923-5661-9e6a-0046d554d3a4"
@@ -14,14 +13,15 @@ DataAugmentation = "88a5189c-e7ff-4f85-ac6b-e6158070f02e"
 DataDeps = "124859b0-ceae-595e-8997-d05f6a7a8dfe"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 DataLoaders = "2e981812-ef13-4a9c-bfa0-ab13047b12a9"
+FeatureRegistries = "c6aefb4f-3ac3-4095-8805-528476b02c02"
 FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
 FilePathsBase = "48062228-2e41-5def-b9a4-89aafe57970f"
 FixedPointNumbers = "53c48c17-4a7d-5ca2-90c5-79b7896eea93"
 Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
 FluxTraining = "7bf95e4d-ca32-48da-9824-f0dc5310474f"
 Glob = "c27321d9-0574-5035-807b-f59d2c89b15c"
-ImageInTerminal = "d8c32880-2388-543b-8c61-d9f865259254"
 ImageIO = "82e4d734-157c-48bb-816b-45c225c6df19"
+ImageInTerminal = "d8c32880-2388-543b-8c61-d9f865259254"
 IndirectArrays = "9b13fd28-a010-5f03-acff-a1bbcff69959"
 InlineTest = "bd334432-b1e7-49c7-a2dc-dd9149e4ebd6"
 JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
@@ -32,6 +32,7 @@ MosaicViews = "e94cdb99-869f-56ef-bcf0-1ae2bcbe0389"
 Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
 PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
 ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
 Requires = "ae029012-a4dd-5104-9daa-d747884805df"
 Setfield = "efcf1570-3423-57d1-acb7-fd33fddbac46"
@@ -53,14 +54,15 @@ DataAugmentation = "0.2.4"
 DataDeps = "0.7"
 DataFrames = "1"
 DataLoaders = "0.1"
+FeatureRegistries = "0.1"
 FileIO = "1.7"
 FilePathsBase = "0.9"
 FixedPointNumbers = "0.8"
 Flux = "0.12, 0.13"
 FluxTraining = "0.2, 0.3"
 Glob = "1"
-ImageInTerminal = "0.4"
 ImageIO = "0.6"
+ImageInTerminal = "0.4"
 IndirectArrays = "0.5, 1"
 InlineTest = "0.2"
 JLD2 = "0.4"
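The dependency change above backs the documentation updates in the rest of this commit: FeatureRegistries becomes a direct dependency, and Random moves into alphabetical order. A minimal sketch of picking the change up in a user environment, assuming the registry functions `datasets`, `datarecipes`, and `learningtasks` are exported by FastAI as the doc diffs below suggest:

```julia
# Sketch only: update an existing environment to a FastAI.jl release containing
# this commit; FeatureRegistries is resolved automatically as a dependency.
using Pkg
Pkg.update("FastAI")

using FastAI
datasets()        # registry of downloadable datasets
datarecipes()     # registry of recipes that build data containers
learningtasks()   # registry of learning task helpers
```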

README.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ As an example, here is how to train an image classification model:
 
 ```julia
 using FastAI
-data, blocks = loaddataset("imagenette2-160", (Image, Label))
+data, blocks = load(datarecipes()["imagenette2-160"])
 task = ImageClassificationSingle(blocks)
 learner = tasklearner(task, data, callbacks=[ToGPU()])
 fitonecycle!(learner, 10)
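The new registry-based call above returns the same `(data, blocks)` pair that `loaddataset` used to. A short sketch of working with the result, assuming the `getobs`/`nobs` container interface used elsewhere in these docs:

```julia
using FastAI

# Look up the dataset recipe by ID and load it (the new API from the diff above).
entry = datarecipes()["imagenette2-160"]
data, blocks = load(entry)     # data container plus concrete block instances

# Observations are (input, target) pairs; here an image and its class label.
image, label = getobs(data, 1)
nobs(data)                     # number of observations in the container
```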

docs/api.md

Lines changed: 0 additions & 63 deletions
This file was deleted.

docs/background/blocksencodings.md

Lines changed: 1 addition & 7 deletions
@@ -101,25 +101,22 @@ task = BlockTask(
 
 Now `encode` expects a sample and just runs the encodings over that, giving us an encoded input `x` and an encoded target `y`.
 
-{cell=main}
 ```julia
-data = loadfolderdata(joinpath(datasetpath("dogscats"), "train"), filterfn=isimagefile, loadfn=(loadfile, parentname))
+data = loadfolderdata(joinpath(load(datasets()["dogscats"]), "train"), filterfn=isimagefile, loadfn=(loadfile, parentname))
 sample = getobs(data, 1)
 x, y = encodesample(task, Training(), sample)
 summary(x), summary(y)
 ```
 
 This is equivalent to:
 
-{cell=main}
 ```julia
 x, y = encode(task.encodings, Training(), FastAI.getblocks(task).sample, sample)
 summary(x), summary(y)
 ```
 
 Image segmentation looks almost the same except we use a `Mask` block as target. We're also using `OneHot` here, because it also has an `encode` task for `Mask`s. For this task, `ProjectiveTransforms` will be applied to both the `Image` and the `Mask`, using the same random state for cropping and augmentation.
 
-{cell=main}
 ```julia
 task = BlockTask(
     (Image{2}(), Mask{2}(1:10)),
@@ -133,19 +130,16 @@ task = BlockTask(
 
 The easiest way to understand how encodings are applied to each block is to use [`describetask`](#) and [`describeencodings`](#) which print a table of how each encoding is applied successively to each block. Rows where a block is **bolded** indicate that the data was transformed by that encoding.
 
-{cell=main}
 ```julia
 describetask(task)
 ```
 
 The above tables make it clear what happens during training ("encoding a sample") and inference (encoding an input and "decoding an output"). The more general form [`describeencodings`](#) takes in encodings and blocks directly and can be useful for building an understanding of how encodings apply to some blocks.
 
-{cell=main}
 ```julia
 FastAI.describeencodings(task.encodings, (Image{2}(),))
 ```
 
-{cell=main}
 ```julia
 FastAI.describeencodings((OneHot(),), (Label(1:10), Mask{2}(1:10), Image{2}()))
 ```
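The only API change in this file is swapping `datasetpath("dogscats")` for `load(datasets()["dogscats"])` to obtain the dataset folder. A small sketch of that pattern in isolation, assuming the `"dogscats"` entry exists in the `datasets()` registry as the diff implies:

```julia
using FastAI

dir = load(datasets()["dogscats"])      # download/unpack if needed, returns the dataset path
data = loadfolderdata(
    joinpath(dir, "train"),
    filterfn = isimagefile,             # keep only image files
    loadfn = (loadfile, parentname),    # load the image, use the parent folder as the label
)
sample = getobs(data, 1)                # (image, class name) pair
```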

docs/background/datapipelines.md

Lines changed: 8 additions & 6 deletions
@@ -26,8 +26,8 @@ using DataLoaders: batchviewcollated
 using FastAI
 using FastAI.Datasets
 
-data = loadtaskdata(datasetpath("imagenette2-320"), ImageClassification)
-task = ImageClassification(Datasets.getclassesclassification("imagenette2-320"), (224, 224))
+data, blocks = load(datarecipes()["imagenette2-320"])
+task = ImageClassificationSingle(blocks, size=(224, 224))
 
 # maps data processing over `data`
 taskdata = taskdataset(data, task, Training())
@@ -68,7 +68,8 @@ using FastAI
 using FastAI.Datasets
 using FluxTraining: step!
 
-data = loaddataset("imagenette2-320", (Image, Label))
+
+data, blocks = load(datarecipes()["imagenette2-320"])
 task = ImageClassificationSingle(blocks)
 learner = tasklearner(task, data)
 
@@ -130,13 +131,14 @@ If the data loading is still slowing down training, you'll probably have to spee
 For many computer vision tasks, you will resize and crop images to a specific size during training for GPU performance reasons. If the images themselves are large, loading them from disk itself can take some time. If your dataset consists of 1920x1080 resolution images but you're resizing them to 256x256 during training, you're wasting a lot of time loading the large images. *Presizing* means saving resized versions of each image to disk once, and then loading these smaller versions during training. We can see the performance difference using ImageNette since it comes in 3 sizes: original, 360px and 180px.
 
 ```julia
-data_orig, _ = loaddataset("imagenette2", (Image, Label))
+
+data_orig = load(datarecipes()["imagenette2"])
 @time eachobsparallel(data_orig, buffered = false)
 
-data_320px, _ = loaddataset("imagenette2-320", (Image, Label))
+data_320px = load(datarecipes()["imagenette2-320"])
 @time eachobsparallel(data_320px, buffered = false)
 
-data_160px, _ = loaddataset("imagenette2-160", (Image, Label))
+data_160px = load(datarecipes()["imagenette2-160"])
 @time eachobsparallel(data_160px, buffered = false)
 ```
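Putting the updated calls from this file together, the pipeline setup now reads roughly as below. This is a sketch under the assumption that `taskdataset` and DataLoaders' `batchviewcollated` keep the signatures used in the surrounding context:

```julia
using FastAI
using FastAI.Datasets
using DataLoaders: batchviewcollated

data, blocks = load(datarecipes()["imagenette2-320"])
task = ImageClassificationSingle(blocks, size = (224, 224))

# Lazily maps the task's encodings over the raw observations...
taskdata = taskdataset(data, task, Training())
# ...and groups them into collated batches for the data loader.
batches = batchviewcollated(taskdata, 16)
```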

docs/data_containers.md

Lines changed: 11 additions & 8 deletions
@@ -13,8 +13,7 @@ ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
 {cell=main, output=false}
 ```julia
 using FastAI
-import FastAI: Image
-data, _ = loaddataset("imagenette2-160", (Image, Label))
+data, _ = load(findfirst(datarecipes(datasetid="imagenette2-160")))
 ```
 
 A data container is any type that holds observations of data and allows us to load them with `getobs` and query the number of observations with `nobs`. In this case, each observation is a tuple of an image and the corresponding class; after all, we want to use it for image classification.
@@ -31,15 +30,15 @@ image
 nobs(data)
 ```
 
-[`loaddataset`](#) makes it easy to a load a data container that is compatible with some block types, but to get a better feel for what it does, let's look under the hood by creating the same data container using some mid-level APIs.
+`load(`[`datasets`](#)`[id])` makes it easy to a load a data container that is compatible with some block types, but to get a better feel for what it does, let's look under the hood by creating the same data container using some mid-level APIs.
 
 ## Creating data containers from files
 
-Before we recreate the data container, [`datasetpath`](#) downloads a dataset and returns the path to the extracted files.
+Before we recreate the data container, we'll download the dataset and get the path where the files are saved to:
 
 {cell=main}
 ```julia
-dir = datasetpath("imagenette2-160")
+dir = load(datasets()["imagenette2-160"])
 ```
 
 Now we'll start with [`FileDataset`](#) which creates a data container (here a `Vector`) of files given a path. We'll use the path of the downloaded dataset:
@@ -127,12 +126,16 @@ Using this official split, it will be easier to compare the performance of your
 
 ## Dataset recipes
 
-We saw above how different image classification datasets can be loaded with the same logic as long as they are in a common format. To encapsulate the logic for loading common dataset formats, FastAI.jl has `DatasetRecipe`s. When we used [`finddatasets`](#) in the [discovery tutorial](discovery.md), it returned pairs of a dataset name and a `DatasetRecipe`. For example, `"imagenette2-160"` has an associated [`ImageFolders`](#) recipe and we can load it using [`loadrecipe`] and the path to the downloaded dataset:
+We saw above how different image classification datasets can be loaded with the same logic as long as they are in a common format. To encapsulate the logic for loading common dataset formats, FastAI.jl has [`DatasetRecipe`](#)s. When we used [`datarecipes`](#) in the [discovery tutorial](discovery.md), it showed us such recipes that allow loading a dataset for a specific task. For example, `"imagenette2-160"` has an associated [`ImageFolders`](#) recipe which we can load by getting the entry and calling `load` on it:
 
 {cell=main}
 ```julia
-name, recipe = finddatasets(blocks=(Image, Label), name="imagenette2-160")[1]
-data, blocks = loadrecipe(recipe, datasetpath(name))
+entry = datarecipes()["imagenette2-160"]
+```
+
+{cell=main}
+```julia
+data, blocks = load(entry)
 ```
 
 These recipes also take care of loading the data block information for the dataset. Read the [discovery tutorial](discovery.md) to find out more about that.
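The recipe registry used above can be indexed directly by ID or queried with filters and then searched; both styles appear in this file's diff. A compact sketch, assuming the `datasetid` keyword filter behaves as in the snippet at the top of the file:

```julia
using FastAI

# Direct lookup by registry ID:
entry = datarecipes()["imagenette2-160"]
data, blocks = load(entry)

# Filtered lookup, taking the first matching entry:
entry2 = findfirst(datarecipes(datasetid = "imagenette2-160"))
data2, _ = load(entry2)
```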

docs/discovery.md

Lines changed: 10 additions & 10 deletions
@@ -6,16 +6,16 @@ For finding both, we can make use of `Block`s. A `Block` represents a kind of da
 
 ## Finding a dataset
 
-To find a dataset with compatible samples, we can pass the types of these blocks to [`finddatasets`](#) which will return a list of dataset names and recipes to load them in a suitable way.
+To find a dataset with compatible samples, we can pass the types of these blocks as a filter to [`datasets`](#) which will show us only dataset recipes for loading those blocks.
 
 {cell=main}
 ```julia
 using FastAI
 import FastAI: Image
-finddatasets(blocks=(Image, Mask))
+datarecipes(blocks=(Image, Mask))
 ```
 
-We can see that the `"camvid_tiny"` dataset can be loaded so that each sample is a pair of an image and a segmentation mask. Let's use [`loaddataset`](#) to load a [data container](data_containers.md) and concrete blocks.
+We can see that the `"camvid_tiny"` dataset can be loaded so that each sample is a pair of an image and a segmentation mask. Let's use a data recipe to load a [data container](data_containers.md) and concrete blocks.
 
 {cell=main, result=false, output=false style="display:none;"}
 ```julia
@@ -24,7 +24,7 @@ ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
 
 {cell=main, output=false}
 ```julia
-data, blocks = loaddataset("camvid_tiny", (Image, Mask))
+data, blocks = load(findfirst(datarecipes(id="camvid_tiny", blocks=(Image, Mask))))
 ```
 
 As with every data container, we can load a sample using `getobs` which gives us a tuple of an image and a segmentation mask.
@@ -35,7 +35,7 @@ image, mask = sample = getobs(data, 1)
 size.(sample), eltype.(sample)
 ```
 
-`loaddataset` also returned `blocks` which are the concrete `Block` instances for the dataset. We passed in _types_ of blocks (`(Image, Mask)`) and get back _instances_ since the specifics of some blocks depend on the dataset. For example, the returned target block carries the labels for every class that a pixel can belong to.
+Loading the dataset recipe also returned `blocks`, which are the concrete [`Block`] instances for the dataset. We passed in _types_ of blocks (`(Image, Mask)`) and get back _instances_ since the specifics of some blocks depend on the dataset. For example, the returned target block carries the labels for every class that a pixel can belong to.
 
 {cell=main}
 ```julia
@@ -55,8 +55,8 @@ checkblock((inputblock, targetblock), (image, mask))
 In short, if you have a learning task in mind and want to load a dataset for that task, then
 
 1. define the types of input and target block, e.g. `blocktypes = (Image, Label)`,
-2. use [`finddatasets`](#)`(blocks=blocktypes)` to find compatbile datasets; and
-3. run [`loaddataset`](#)`(datasetname, blocktypes)` to load a data container and the concrete blocks
+2. use `filter(`[`datarecipes`](#)`(), blocks=blocktypes)` to find compatbile dataset recipes; and
+3. run `load(`[`datarecipes`](#)`()[id])` to load a data container and the concrete blocks
 
 ### Exercises
 
@@ -66,14 +66,14 @@ In short, if you have a learning task in mind and want to load a dataset for tha
 
 ## Finding a learning task
 
-Armed with a dataset, we can go to the next step: creating a learning task. Since we already have blocks defined, this amounts to defining the encodings that are applied to the data before it is used in training. Here, FastAI.jl already defines some convenient constructors for learning tasks and you can find them with [`findlearningtasks`](#). Here we can pass in either block types as above or the block instances we got from `loaddataset`.
+Armed with a dataset, we can go to the next step: creating a learning task. Since we already have blocks defined, this amounts to defining the encodings that are applied to the data before it is used in training. Here, FastAI.jl already defines some convenient constructors for learning tasks and you can find them with [`learningtasks`](#). Here we can pass in either block types as above or the block instances:
 
 {cell=main}
 ```julia
-findlearningtasks(blocks)
+learningtasks(blocks=blocks)
 ```
 
-Looks like we can use the [`ImageSegmentation`](#) function to create a learning task for our learning task. Every function returned can be called with `blocks` and, optionally, some keyword arguments for customization.
+Looks like we can use the [`ImageSegmentation`](#) function to create a learning task. Every function returned can be called with `blocks` and, optionally, some keyword arguments for customization.
 
 {cell=main}
 ```julia
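Taken together, the updated discovery workflow in this tutorial reads roughly as follows. The sketch assumes the `camvid_tiny` recipe and the `ImageSegmentation` helper referenced above:

```julia
using FastAI
import FastAI: Image

# 1. Find dataset recipes whose samples match the wanted block types.
datarecipes(blocks = (Image, Mask))

# 2. Load a matching recipe into a data container and concrete blocks.
data, blocks = load(findfirst(datarecipes(id = "camvid_tiny", blocks = (Image, Mask))))

# 3. Find compatible learning task helpers and construct one from the blocks.
learningtasks(blocks = blocks)
task = ImageSegmentation(blocks)
```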

docs/fastai_api_comparison.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ FastAI.jl is in many ways similar to the original Python [fastai](docs.fast.ai),
 
 FastAI.jl's own data block API makes it possible to derive every part of a high-level interface with a unified API across tasks. Instead it suffices to create a learning task and based on the blocks and encodings specified the proper model builder, loss function, and visualizations are implemented (see below). For a high-level API, a complete `Learner` can be constructed using [`tasklearner`](#) without much boilerplate. There are some helper functions for creating these learning tasks, for example [`ImageClassificationSingle`](#) and [`ImageSegmentation`](#).
 
-FastAI.jl additionally has a unified API for registering and discovering functionality across applications also based on the data block abstraction. `finddatasets` and `loaddataset` let you quickly load common datasets matching some data modality and `findlearningtask` lets you find learning task helpers for common tasks. See [the discovery tutorial](discovery.md) for more info.
+FastAI.jl additionally has a unified API for registering and discovering functionality across applications also based on the data block abstraction. [`datasets`](#) and [`datarecipes`](#) let you quickly load common datasets matching some data modality and [`learningtasks`] lets you find learning task helpers for common tasks. See [the discovery tutorial](discovery.md) for more info.
 
 ### Vision
 
@@ -122,7 +122,7 @@ Metrics are handled by the [`Metrics`](#) callback which takes in reducing metri
 
 ### fastai.data.external
 
-FastAI.jl makes all the same datasets available in `fastai.data.external` available. See `FastAI.Datasets.DATASETS` for a list of all datasets and use [`datasetpath`](#)`(name)` to download and extract a dataset.
+FastAI.jl makes all the same datasets available in `fastai.data.external` available. See [`datasets`](#) for a list of all datasets that can be downloaded.
 
 ### funcs_kwargs and DataLoader, fastai.data.core
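For comparison with fastai's dataset catalogue, the registries referenced above can also be filtered by block types, as the discovery tutorial diff shows. A small sketch assuming the `blocks` keyword filter works the same way for each registry:

```julia
using FastAI
import FastAI: Image

datarecipes(blocks = (Image, Label))     # recipes whose samples are (Image, Label) pairs
learningtasks(blocks = (Image, Label))   # learning task helpers for those blocks
```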

docs/howto/augmentvision.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ using FastAI
 import FastAI: Image
 import CairoMakie; CairoMakie.activate!(type="png")
 
-data, blocks = loaddataset("imagenette2-160", (Image, Label))
+data, blocks = load(datarecipes()["imagenette2-160"])
 task = BlockTask(
     blocks,
     (

docs/howto/findfunctionality.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# How to find functionality
+
+For some kinds of functionality, FastAI.jl provides feature registries that allow you to search for and use features. The following registries currently exist:
+
+- [`datasets`](#) to download and unpack datasets,
+- [`datarecipes`](#) to load datasets into [data containers](/documents/docs/data_containers.md) that are compatible with a learning task; and
+- [`learningtasks`](#) to find learning tasks that are compatible with a dataset
+
+To load functionality:
+
+1. Get an entry using its ID
+{cell}
+```julia
+using FastAI
+entry = datasets()["mnist_var_size_tiny"]
+```
+2. And load it
+{cell}
+```julia
+load(entry)
+```
+
+
+## Datasets
+
+{cell}
+```julia
+using FastAI
+datasets()
+```
+
+## Data recipes
+
+{cell}
+```julia
+using FastAI
+datarecipes()
+```
+
+## Learning tasks
+
+{cell}
+```julia
+using FastAI
+learningtasks()
+```
