CHANGELOG.md: 13 additions & 2 deletions
@@ -5,7 +5,18 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

- ## v0.4.3
+ ## v0.5 (unreleased)
+
+ ### Changed
+
+ - (BREAKING) Now uses [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl) to create and load datasets and data containers
+ - Replaces dependencies MLDataPattern.jl, LearnBase.jl, and DataLoaders.jl
+ - Data containers must now implement the `Base.getindex`/`MLUtils.getobs` and `Base.length`/`MLUtils.numobs` interfaces.
+ - Previously exported `MLDataPattern.datasubset` has been replaced by `MLUtils.ObsView`
+ - Documentation has been updated appropriately
+
+
+ ## v0.4.3 (2022/05/14)

  ### Added

@@ -17,7 +28,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - the old APIs for registries have been removed and functionality for accessing them (`finddatasets`, `loaddataset`) has been deprecated. See the updated docs for how to find functionality using the new feature registries.
- A data container is any type that holds observations of data and allows us to load them with `getobs` and query the number of observations with `nobs`. In this case, each observation is a tuple of an image and the corresponding class; after all, we want to use it for image classification.
+ A data container is any type that holds observations of data and allows us to load them with `getobs` and query the number of observations with `numobs`. In this case, each observation is a tuple of an image and the corresponding class; after all, we want to use it for image classification.
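
As a minimal sketch of the interface described above, assuming only MLUtils.jl is installed: a tuple of two equally long vectors already acts as a data container whose observations are tuples. The names `images`, `classes`, and `data` and the random 4×4 arrays are illustrative stand-ins, not the ImageNette data used in the tutorial.

```julia
using MLUtils

# Stand-in "images" (random 4x4 arrays) and class labels; purely illustrative.
images = [rand(Float32, 4, 4) for _ in 1:3]
classes = ["cat", "dog", "cat"]

# A tuple of containers with equal numbers of observations is itself a data container.
data = (images, classes)

n = numobs(data)                 # number of observations in the container
image, class = getobs(data, 2)   # second observation: an (image, class) tuple
```

Each observation comes back as a tuple because `getobs` is applied to every element of the tuple container with the same index.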

  {cell=main}
  ```julia
@@ -27,7 +27,7 @@ image

  {cell=main}
  ```julia
- nobs(data)
+ numobs(data)
  ```

  `load(`[`datasets`](#)`[id])` makes it easy to load a data container that is compatible with some block types, but to get a better feel for what it does, let's look under the hood by creating the same data container using some mid-level APIs.
@@ -41,11 +41,11 @@ Before we recreate the data container, we'll download the dataset and get the pa
  dir = load(datasets()["imagenette2-160"])
  ```

- Now we'll start with [`FileDataset`](#) which creates a data container (here a `Vector`) of files given a path. We'll use the path of the downloaded dataset:
+ Now we'll start with `loadfolderdata`, which creates a data container (here a `Vector`) of files given a path. We'll use the path of the downloaded dataset:

  {cell=main}
  ```julia
- files = FileDataset(dir)
+ files = loadfolderdata(dir)
  ```

  `files` is a data container where each observation is a path to a file. We'll confirm that using `getobs`:
docs/fastai_api_comparison.md: 10 additions & 10 deletions
@@ -1,6 +1,6 @@
  # fastai API comparison

- FastAI.jl is in many ways similar to the original Python [fastai](docs.fast.ai), but also has its differences. This reference goes through all the sections in the [fastai: A Layered API for Deep Learning](https://arxiv.org/abs/2002.04688) paper and comments what the interfaces for the same functionality in FastAI.jl are, and where they differ or functionality is still missing.
+ FastAI.jl is in many ways similar to the original Python [fastai](http://docs.fast.ai), but also has its differences. This reference goes through all the sections in the [fastai: A Layered API for Deep Learning](https://arxiv.org/abs/2002.04688) paper and comments on what the interfaces for the same functionality in FastAI.jl are, and where they differ or where functionality is still missing.

  ## Applications

@@ -10,15 +10,16 @@ FastAI.jl additionally has a unified API for registering and discovering functio

  ### Vision

- Computer vision is the most developed part of FastAI.jl with good support for different tasks and optimized data pipelines with N-dimensional images, masks and keypoints. See the tutorial section for many examples.
+ Computer vision is well supported in FastAI.jl, with different tasks and optimized data pipelines for N-dimensional images, masks and keypoints. See the tutorial section for many examples.

  ### Tabular

- Support for tabular data is merged into master but is lacking documentation which will come with the next release (0.2.0).
+ FastAI.jl also has support for tabular data.

  ### Deployment

- Through FastAI.jl's [`LearningTask` interface](./learning_tasks.md), the data processing logic is decoupled from the dataset creation and training and can be easily serialized and loaded to make predictions. See the tutorial on [saving and loading models](../notebooks/serialization.ipynb).
+ Through FastAI.jl's [`LearningTask`](#) interface, the data processing logic is decoupled from the dataset creation and training and can be easily serialized and loaded to make predictions. See the tutorial on [saving and loading models](../notebooks/serialization.ipynb).
+

  ---

@@ -76,8 +77,7 @@ res = lrfind(learner); plot(res) # Run learning rate finder and plot suggestio
  Since it is a Julia package, FastAI.jl is not written on top of PyTorch, but on top of a Julia library for deep learning: [Flux.jl](http://www.fluxml.ai). In any case, the point of this section is to note that the abstractions in fastai are decoupled and existing projects can easily be reused. This is also the case for FastAI.jl as it is built on top of several decoupled libraries. Many of these were built specifically for FastAI.jl, but they are unaware of each other and useful in their own right:

  - [Flux.jl](https://github.com/FluxML/Flux.jl) provides models, optimizers, and loss functions, fulfilling a similar role to PyTorch
- - [MLDataPattern.jl](https://github.com/JuliaML/MLDataPattern.jl) gives you tools for building and transforming data containers
- - [DataLoaders.jl](https://github.com/lorenzoh/DataLoaders.jl) takes care of efficient, parallelized iteration of data containers
+ - [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl) gives you tools for building and transforming data containers and takes care of efficient, parallelized iteration over them.
  - [DataAugmentation.jl](https://github.com/lorenzoh/DataAugmentation.jl) takes care of the lower levels of high-performance, composable data augmentations.
  - [FluxTraining.jl](https://github.com/lorenzoh/FluxTraining.jl) contributes a highly extensible training loop with 2-way callbacks

@@ -126,14 +126,14 @@ FastAI.jl makes all the same datasets available in `fastai.data.external` availa

  ### funcs_kwargs and DataLoader, fastai.data.core

- In FastAI.jl, you are not restricted to a specific type of data iterator and can pass any iterator over batches to `Learner`. In cases where performance is important [`DataLoader`](#) can speed up data iteration by loading and batching samples in parallel on background threads. All transformations of data happen through the data container interface which requires a type to implement `LearnBase.getobs` and `LearnBase.nobs`, similar to PyTorch's `torch.utils.data.Dataset`. Data containers are then transformed into other data containers. Some examples:
+ In FastAI.jl, you are not restricted to a specific type of data iterator and can pass any iterator over batches to `Learner`. In cases where performance is important, [`DataLoader`](#) can speed up data iteration by loading and batching samples in parallel on background threads. All transformations of data happen through the data container interface, which requires a type to implement `Base.getindex`/`MLUtils.getobs` and `Base.length`/`MLUtils.numobs`, similar to PyTorch's `torch.utils.data.Dataset`. Data containers are then transformed into other data containers. Some examples:

  - [`mapobs`](#)`(f, data)` lazily maps a function `f` over `data` such that `getobs(mapobs(f, data), idx) == f(getobs(data, idx))`. For example `mapobs(loadfile, files)` turns a vector of image files into a data container of images.
- - `DataLoader(data, batchsize)` is a wrapper around `batchviewcollated` which turns a data container of samples into one of collated batches and `eachobsparallel` which creates a parallel, buffered iterator over the observations (here batches) in the resulting container.
+ - `DataLoader(data; batchsize)` is a wrapper around [`BatchView`](#), which turns a data container of samples into one of collated batches, and `eachobsparallel`, which creates a parallel, buffered iterator over the observations (here batches) in the resulting container.
  - [`groupobs`](#)`(f, data)` splits a container into groups using a grouping function `f`. For example, `groupobs(grandparentname, files)` creates training splits for files where the grandparent folder indicates the split.
- - [`datasubset`](#)`(data, idxs)` lazily takes a subset of the observations in `data`.
+ - [`MLUtils.ObsView`](#)`(data, idxs)` lazily takes a subset of the observations in `data`.
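
The transformations listed above can be sketched as follows. This assumes MLUtils.jl is installed and uses a plain `Vector` as the data container; the variable names and the doubling function are illustrative only, and the sketch leaves out `groupobs` and parallel loading.

```julia
using MLUtils

# A plain Vector already satisfies the data container interface.
data = collect(1:10)

# Lazily map a function over the container; nothing is computed until getobs is called.
doubled = mapobs(x -> 2x, data)
third = getobs(doubled, 3)       # same as 2 * getobs(data, 3)

# Lazily take a subset of the observations by index.
subset = ObsView(data, 1:4)
subset_size = numobs(subset)

# Iterate over batches of observations (no shuffling by default).
loader = DataLoader(data; batchsize = 4)
nbatches = length(loader)        # the last batch is partial by default
first_batch = first(loader)
```

Because every step returns another data container (or an iterator over one), the steps compose freely, for example `DataLoader(mapobs(f, ObsView(data, idxs)); batchsize = 4)`.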

- For more information, see the [data container tutorial](data_containers.md) and the [MLDataPattern.jl docs](https://mldatapatternjl.readthedocs.io/en/latest/). At a higher level, there are also convenience functions like [`FileDataset`](#) to create data containers.
+ For more information, see the [data container tutorial](data_containers.md) and the [MLUtils.jl docs](https://juliaml.github.io/MLUtils.jl/dev/). At a higher level, there are also convenience functions like `loadfolderdata` to create data containers.
docs/glossary.md: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ Terms commonly used in *FastAI.jl*.

  In many docstrings, generic types are abbreviated with the following symbols. Many of these refer to a learning task; the context should make clear which task is meant.

- - `DC{T}`: A [data container](#data-container) of type T, meaning a type that implements the data container interface `getobs` and `nobs` where `getobs : (DC{T}, Int) -> Int`, that is, each observation is of type `T`.
+ - `DC{T}`: A [data container](data_containers.md) of type T, meaning a type that implements the data container interface `getindex`/`getobs` and `length`/`numobs` where `getobs : (DC{T}, Int) -> T`, that is, each observation is of type `T`.
  - `I`: Type of the unprocessed input in the context of a task.
  - `T`: Type of the target variable.
  - `X`: Type of the processed input. This is fed into a `model`, though it may be batched beforehand. `Xs` represents a batch of processed inputs.
@@ -23,7 +23,7 @@ Some examples of these in use:

  ### Data container

- A data structure that is used to load a number of data observations separately and lazily. It defines how many observations it holds with `nobs` and how to load a single observation with `getobs`.
+ A data structure that is used to load a number of data observations separately and lazily. It defines how many observations it holds with `numobs` and how to load a single observation with `getobs`.
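
A minimal sketch of a custom data container, assuming MLUtils.jl's generic `getobs` and `numobs` fall back to `Base.getindex` and `Base.length` for types that define them. The `SquaresContainer` type is hypothetical and exists only for illustration; it is not part of FastAI.jl or MLUtils.jl.

```julia
using MLUtils

# Hypothetical container: stores numbers and produces their squares on demand.
struct SquaresContainer
    values::Vector{Int}
end

# Number of observations the container holds.
Base.length(data::SquaresContainer) = length(data.values)

# Load a single observation lazily; the square is computed only when requested.
Base.getindex(data::SquaresContainer, idx::Int) = data.values[idx]^2

data = SquaresContainer([1, 2, 3, 4])
n = numobs(data)         # delegates to Base.length
obs = getobs(data, 3)    # delegates to Base.getindex
```

Defining the two `Base` methods is enough for this sketch; a container that needs different loading behavior could instead overload `MLUtils.getobs` and `MLUtils.numobs` directly.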
- This line downloads and loads the [ImageNette](https://github.com/fastai/imagenette) image classification dataset, a small subset of ImageNet with 10 different classes. `data` is a [data container](data_containers.md) that can be used to load individual observations, here of images and the corresponding labels. We can use `getobs(data, i)` to load the `i`-th observation and `nobs` to find out how many observations there are.
+ This line downloads and loads the [ImageNette](https://github.com/fastai/imagenette) image classification dataset, a small subset of ImageNet with 10 different classes. `data` is a [data container](data_containers.md) that can be used to load individual observations, here of images and the corresponding labels. We can use `getobs(data, i)` to load the `i`-th observation and `numobs` to find out how many observations there are.