Commit fb440cf

add documentation
1 parent da94645 commit fb440cf

13 files changed, +213 -377 lines changed

.travis.yml

Lines changed: 4 additions & 0 deletions
@@ -22,6 +22,10 @@ before_script:
 install:
 #- sudo pip install pymdown-extensions
 
+after_success:
+- julia -e 'Pkg.add("Documenter")'
+- julia -e 'cd(Pkg.dir("MLDatasets")); include(joinpath("docs", "make.jl"))'
+
 script:
 - if [[ -a .git/shallow ]]; then git fetch --unshallow; fi
 - julia -e 'Pkg.clone(pwd()); Pkg.build("MLDatasets"); Pkg.test("MLDatasets"; coverage=true)'

README.md

Lines changed: 71 additions & 45 deletions
@@ -1,66 +1,75 @@
 # MLDatasets.jl
 
+[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaML.github.io/MLDatasets.jl/stable)
 [![Build Status](https://travis-ci.org/JuliaML/MLDatasets.jl.svg?branch=master)](https://travis-ci.org/JuliaML/MLDatasets.jl)
 
-`MLDatasets` provides access to common machine learning datasets
-for [Julia](http://julialang.org/). Currently, julia 0.6 is
-supported.
+This package represents a community effort to provide a common
+interface for accessing common Machine Learning (ML) datasets. In
+contrast to other data-related Julia packages, the focus of
+`MLDatasets.jl` is specifically on downloading, unpacking, and
+accessing benchmark datasets. Functionality for the purpose of
+data processing or visualization is only provided to a degree
+that is special to some dataset.
 
-## Installation
-
-```julia
-julia> Pkg.clone("https://github.com/JuliaML/MLDatasets.jl.git")
-```
+This package is a part of the
+[`JuliaML`](https://github.com/JuliaML) ecosystem. Its
+functionality is built on top of the package
+[`DataDeps.jl`](https://github.com/oxinabox/DataDeps.jl).
 
 ## Basic Usage
 
+The way `MLDatasets.jl` is organized is that each dataset has its
+own dedicated sub-module. Where possible, those sub-modules share
+a common interface for interacting with the datasets. For example
+you can load the training set and the test set of the MNIST
+database of handwritten digits using the following commands:
+
 ```julia
 using MLDatasets
 
 train_x, train_y = MNIST.traindata()
-test_x, test_y = MNIST.testdata()
+test_x, test_y = MNIST.testdata()
 ```
 
-Use `traindata(<directory>)` and `testdata(<directory>)` to change the default directory.
+To load the data the package looks for the necessary files in
+various locations (see
+[`DataDeps.jl`](https://github.com/oxinabox/DataDeps.jl#configuration)
+for more information on how to configure such defaults). If the
+data can't be found in any of those locations, then the package
+will trigger a download dialog to `~/.julia/datadeps/MNIST`. To
+overwrite this on a case by case basis, it is possible to specify
+a data directory directly in `traindata(dir = <directory>)` and
+`testdata(dir = <directory>)`.
 
 ## Available Datasets
 
-### Image Classification
-
-#### CIFAR-10
+Check out the **[latest
+documentation](https://juliaml.github.io/MLDatasets.jl/latest)**
 
-The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
-dataset consists of 60000 32x32 RGB images in 10 classes.
+Additionally, you can make use of Julia's native docsystem.
+The following example shows how to get additional information
+on `MNIST.traintensor` within Julia's REPL:
 
-Take a look at the [sub-module](src/CIFAR10/README.md) for more
-information
-
-#### CIFAR-100
-
-The [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html)
-dataset consists of 60000 32x32 color images in 100 classes. The
-100 classes are grouped into 20 superclasses (fine and coarse
-labels).
-
-Take a look at the [sub-module](src/CIFAR100/README.md) for more
-information
-
-#### MNIST
-
-The [MNIST](http://yann.lecun.com/exdb/mnist/) dataset consists
-of 60000 28x28 images of handwritten digits.
+```julia
+?MNIST.traintensor
+```
 
-Take a look at the [sub-module](src/MNIST/README.md) for more
-information
+Each dataset has its own dedicated sub-module. As such, it makes
+sense to document their functionality similarly distributed. Find
+below a list of available datasets and links to their
+documentation.
 
-#### Fashion-MNIST
+### Image Classification
 
-The [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist)
-dataset consists of 60000 28x28 images of fashion products. It
-was designed to be a drop-in replacement for the MNIST dataset
+This package provides a variety of common benchmark datasets for
+the purpose of image classification.
 
-Take a look at the [sub-module](src/FashionMNIST/README.md) for more
-information
+Dataset | Classes | `traintensor` | `trainlabels` | `testtensor` | `testlabels`
+:------:|:-------:|:-------------:|:-------------:|:------------:|:------------:
+[**MNIST**](https://juliaml.github.io/MLDatasets.jl/datasets/MNIST/) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
+[**FashionMNIST**](https://juliaml.github.io/MLDatasets.jl/datasets/FashionMNIST/) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
+[**CIFAR-10**](https://juliaml.github.io/MLDatasets.jl/datasets/CIFAR10/) | 10 | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000
+[**CIFAR-100**](https://juliaml.github.io/MLDatasets.jl/datasets/CIFAR100/) | 100 (20) | 32x32x3x50000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2)
 
 ### Language Modeling
 
@@ -102,10 +111,27 @@ testdata = UD_English.devdata()
 
 ## Data Size
 | | Type | Train x | Train y | Test x | Test y |
-|:---:|:---:|:---:|:---:|:---:|:---:|
-| **CIFAR-10** | image | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000 |
-| **CIFAR-100** | image | 32x32x3x5000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2) |
-| **MNIST** | image | 28x28x60000 | 60000 | 28x28x10000 | 10000 |
-| **FashionMNIST** | image | 28x28x60000 | 60000 | 28x28x10000 | 10000 |
 | **PTBLM** | text | 42068 | 42068 | 3761 | 3761 |
 | **UD_English** | text | 12543 | - | 2077 | - |
+
+## Installation
+
+To install `MLDatasets.jl`, start up Julia and type the following
+code snippet into the REPL. It makes use of the native Julia
+package manager.
+
+```julia
+Pkg.add("MLDatasets")
+```
+
+Additionally, for example if you encounter any sudden issues, or
+in the case you would like to contribute to the package, you can
+manually choose to be on the latest (untagged) version.
+
+```julia
+Pkg.checkout("MLDatasets")
+```
+
+## License
+
+This code is free to use under the terms of the MIT license.
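
Note on the Image Classification table in the README diff above: the listed shapes put the observation index in the last array dimension (for example 28x28x60000 for MNIST and 32x32x3x50000 for CIFAR-10). As a minimal sketch, assuming only that layout, the snippet below uses small zero-filled stand-in arrays rather than the real data, which would come from calls such as `MNIST.traintensor()` and `MNIST.trainlabels()`. The variable names and the stand-in size `n` are illustrative and not part of the commit.

```julia
# Stand-in arrays with the same layout as the table (observations last).
# Assumption: n = 5 replaces the real 60000 / 50000 observations.
n = 5
mnist_like = zeros(Float32, 28, 28, n)     # like the MNIST `traintensor`: 28x28xN
cifar_like = zeros(Float32, 32, 32, 3, n)  # like the CIFAR-10 `traintensor`: 32x32x3xN
labels     = zeros(Int, n)                 # like `trainlabels`: one entry per observation

# Select the i-th observation by indexing the last dimension.
i = 2
digit = mnist_like[:, :, i]
image = cifar_like[:, :, :, i]

println(size(digit))
println(size(image))
println(length(labels) == size(mnist_like, 3))
```

The same indexing pattern should carry over to the real tensors, since only the size of the last dimension differs.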

docs/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+build/
+site/

docs/make.jl

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+using Documenter, MLDatasets
+
+makedocs(
+    modules = [MLDatasets],
+    clean = false,
+    format = :html,
+    assets = [
+        joinpath("assets", "favicon.ico"),
+    ],
+    sitename = "MLDatasets.jl",
+    authors = "Hiroyuki Shindo, Christof Stocker",
+    linkcheck = !("skiplinks" in ARGS),
+    pages = Any[
+        "Home" => "index.md",
+        "Available Datasets" => Any[
+            "Image Classification" => Any[
+                "MNIST handwritten digits" => "datasets/MNIST.md",
+                "Fashion MNIST" => "datasets/FashionMNIST.md",
+                "CIFAR-10" => "datasets/CIFAR10.md",
+                "CIFAR-100" => "datasets/CIFAR100.md",
+            ],
+        ],
+        hide("Indices" => "indices.md"),
+        "LICENSE.md",
+    ],
+    html_prettyurls = !("local" in ARGS),
+)
+
+deploydocs(
+    repo = "github.com/JuliaML/MLDatasets.jl.git",
+    target = "build",
+    julia = "0.6",
+    deps = nothing,
+    make = nothing,
+)
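
Note on `docs/make.jl`: two of the `makedocs` options depend on the command-line arguments, namely `linkcheck = !("skiplinks" in ARGS)` and `html_prettyurls = !("local" in ARGS)`. The sketch below isolates that logic in a hypothetical helper, `doc_flags`, which is not part of the commit; it only illustrates how the two booleans follow from the arguments passed to the script.

```julia
# Hypothetical helper (not part of the commit): mirrors the two
# ARGS-dependent options used in docs/make.jl.
function doc_flags(args)
    linkcheck  = !("skiplinks" in args)  # check links unless "skiplinks" is passed
    prettyurls = !("local" in args)      # pretty URLs unless "local" is passed
    return linkcheck, prettyurls
end

# No arguments, as in the Travis `after_success` step: both options are enabled.
println(doc_flags(String[]))
# A local build that also skips the link check: both options are disabled.
println(doc_flags(["local", "skiplinks"]))
```

Under this reading, a local preview could be built by running `julia make.jl local` from the `docs` directory, while the Travis step, which passes no arguments, keeps both the link check and pretty URLs enabled.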

docs/src/LICENSE.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# LICENSE
+
+```@eval
+Markdown.parse_file(joinpath(@__DIR__, "../LICENSE"))
+```

docs/src/assets/favicon.ico

4.19 KB
Binary file not shown.

docs/src/assets/logo.png

10.8 KB

docs/src/index.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# MLDatasets.jl's Documentation
+
+This package represents a community effort to provide a common
+interface for accessing common Machine Learning (ML) datasets. In
+contrast to other data-related Julia packages, the focus of
+`MLDatasets.jl` is specifically on downloading, unpacking, and
+accessing benchmark datasets. Functionality for the purpose of
+data processing or visualization is only provided to a degree
+that is special to some dataset.
+
+This package is a part of the
+[`JuliaML`](https://github.com/JuliaML) ecosystem. Its
+functionality is built on top of the package
+[`DataDeps.jl`](https://github.com/oxinabox/DataDeps.jl).
+
+## Installation
+
+To install `MLDatasets.jl`, start up Julia and type the following
+code snippet into the REPL. It makes use of the native Julia
+package manager.
+
+```julia
+Pkg.add("MLDatasets")
+```
+
+Additionally, for example if you encounter any sudden issues, or
+in the case you would like to contribute to the package, you can
+manually choose to be on the latest (untagged) version.
+
+```julia
+Pkg.checkout("MLDatasets")
+```
+
+## Basic Usage
+
+The way `MLDatasets.jl` is organized is that each dataset has its
+own dedicated sub-module. Where possible, those sub-modules share
+a common interface for interacting with the datasets. For example
+you can load the training set and the test set of the MNIST
+database of handwritten digits using the following commands:
+
+```julia
+using MLDatasets
+
+train_x, train_y = MNIST.traindata()
+test_x, test_y = MNIST.testdata()
+```
+
+To load the data the package looks for the necessary files in
+various locations (see
+[`DataDeps.jl`](https://github.com/oxinabox/DataDeps.jl#configuration)
+for more information on how to configure such defaults). If the
+data can't be found in any of those locations, then the package
+will trigger a download dialog to `~/.julia/datadeps/MNIST`. To
+overwrite this on a case by case basis, it is possible to specify
+a data directory directly in `traindata(dir = <directory>)` and
+`testdata(dir = <directory>)`.
+
+## Available Datasets
+
+Each dataset has its own dedicated sub-module. As such, it makes
+sense to document their functionality similarly distributed. Find
+below a list of available datasets and their documentation.
+
+### Image Classification
+
+This package provides a variety of common benchmark datasets for
+the purpose of image classification.
+
+Dataset | Classes | `traintensor` | `trainlabels` | `testtensor` | `testlabels`
+:------:|:-------:|:-------------:|:-------------:|:------------:|:------------:
+[**MNIST**](@ref MNIST) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
+[**FashionMNIST**](@ref FashionMNIST) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
+[**CIFAR-10**](@ref CIFAR10) | 10 | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000
+[**CIFAR-100**](@ref CIFAR100) | 100 (20) | 32x32x3x50000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2)
+
+### Language Modeling
+
+Work in progress
+
+## Index
+
+```@contents
+Pages = ["indices.md"]
+```

docs/src/indices.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+## Functions
+
+```@index
+Order = [:function]
+```
+
+## Types
+
+```@index
+Order = [:type]
+```

src/CIFAR10/README.md

Lines changed: 0 additions & 81 deletions
This file was deleted.
