JuliaML
diff --git a/.travis.yml b/.travis.yml
Lines changed: 4 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 71 additions & 45 deletions
diff --git a/docs/.gitignore b/docs/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/docs/make.jl b/docs/make.jl
Lines changed: 35 additions & 0 deletions
diff --git a/docs/src/LICENSE.md b/docs/src/LICENSE.md
Lines changed: 5 additions & 0 deletions
diff --git a/docs/src/assets/favicon.ico b/docs/src/assets/favicon.ico
4.19 KB
diff --git a/docs/src/assets/logo.png b/docs/src/assets/logo.png
10.8 KB
diff --git a/docs/src/index.md b/docs/src/index.md
Lines changed: 85 additions & 0 deletions
diff --git a/docs/src/indices.md b/docs/src/indices.md
Lines changed: 11 additions & 0 deletions
diff --git a/src/CIFAR10/README.md b/src/CIFAR10/README.md
Lines changed: 0 additions & 81 deletions
@@ -22,6 +22,10 @@ before_script:
 install:
   #- sudo pip install pymdown-extensions
 
+after_success:
+  - julia -e 'Pkg.add("Documenter")'
+  - julia -e 'cd(Pkg.dir("MLDatasets")); include(joinpath("docs", "make.jl"))'
+
 script:
   - if [[ -a .git/shallow ]]; then git fetch --unshallow; fi
   - julia -e 'Pkg.clone(pwd()); Pkg.build("MLDatasets"); Pkg.test("MLDatasets"; coverage=true)'
@@ -1,66 +1,75 @@
 # MLDatasets.jl
 
+[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaML.github.io/MLDatasets.jl/stable)
 [![Build Status](https://travis-ci.org/JuliaML/MLDatasets.jl.svg?branch=master)](https://travis-ci.org/JuliaML/MLDatasets.jl)
 
-`MLDatasets` provides access to common machine learning datasets
-for [Julia](http://julialang.org/). Currently, julia 0.6 is
-supported.
+This package represents a community effort to provide a common
+interface for accessing common Machine Learning (ML) datasets. In
+contrast to other data-related Julia packages, the focus of
+`MLDatasets.jl` is specifically on downloading, unpacking, and
+accessing benchmark datasets. Functionality for data processing
+or visualization is provided only to the degree that it is
+specific to a particular dataset.
 
-## Installation
-
-```julia
-julia> Pkg.clone("https://github.com/JuliaML/MLDatasets.jl.git")
-```
+This package is a part of the
+[`JuliaML`](https://github.com/JuliaML) ecosystem. Its
+functionality is built on top of the package
+[`DataDeps.jl`](https://github.com/oxinabox/DataDeps.jl).
 
 ## Basic Usage
 
+`MLDatasets.jl` is organized so that each dataset has its own
+dedicated sub-module. Where possible, those sub-modules share a
+common interface for interacting with the datasets. For example,
+you can load the training set and the test set of the MNIST
+database of handwritten digits using the following commands:
+
 ```julia
 using MLDatasets
 
 train_x, train_y = MNIST.traindata()
-test_x, test_y = MNIST.testdata()
+test_x,  test_y  = MNIST.testdata()
 ```
 
-Use `traindata(<directory>)` and `testdata(<directory>)` to change the default directory.
+To load the data, the package looks for the necessary files in
+various locations (see
+[`DataDeps.jl`](https://github.com/oxinabox/DataDeps.jl#configuration)
+for more information on how to configure such defaults). If the
+data can't be found in any of those locations, the package will
+prompt you to download it to `~/.julia/datadeps/MNIST`. To
+override this on a case-by-case basis, you can specify a data
+directory directly in `traindata(dir = <directory>)` and
+`testdata(dir = <directory>)`.
 
 ## Available Datasets
 
-### Image Classification
-
-#### CIFAR-10
+Check out the **[latest
+documentation](https://juliaml.github.io/MLDatasets.jl/latest)**
 
-The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
-dataset consists of 60000 32x32 RGB images in 10 classes.
+Additionally, you can make use of Julia's native docsystem.
+The following example shows how to get additional information
+on `MNIST.traintensor` within Julia's REPL:
 
-Take a look at the [sub-module](src/CIFAR10/README.md) for more
-information
-
-#### CIFAR-100
-
-The [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html)
-dataset consists of 60000 32x32 color images in 100 classes. The
-100 classes are grouped into 20 superclasses (fine and coarse
-labels).
-
-Take a look at the [sub-module](src/CIFAR100/README.md) for more
-information
-
-#### MNIST
-
-The [MNIST](http://yann.lecun.com/exdb/mnist/) dataset consists
-of 60000 28x28 images of handwritten digits.
+```julia
+?MNIST.traintensor
+```
 
-Take a look at the [sub-module](src/MNIST/README.md) for more
-information
+Each dataset has its own dedicated sub-module, so its
+functionality is documented per dataset as well. Below is a
+list of available datasets and links to their
+documentation.
 
-#### Fashion-MNIST
+### Image Classification
 
-The [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist)
-dataset consists of 60000 28x28 images of fashion products. It
-was designed to be a drop-in replacement for the MNIST dataset
+This package provides a variety of common benchmark datasets for
+the purpose of image classification.
 
-Take a look at the [sub-module](src/FashionMNIST/README.md) for more
-information
+Dataset | Classes | `traintensor` | `trainlabels` | `testtensor` | `testlabels`
+:------:|:-------:|:-------------:|:-------------:|:------------:|:------------:
+[**MNIST**](https://juliaml.github.io/MLDatasets.jl/datasets/MNIST/) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
+[**FashionMNIST**](https://juliaml.github.io/MLDatasets.jl/datasets/FashionMNIST/) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
+[**CIFAR-10**](https://juliaml.github.io/MLDatasets.jl/datasets/CIFAR10/) | 10 | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000
+[**CIFAR-100**](https://juliaml.github.io/MLDatasets.jl/datasets/CIFAR100/) | 100 (20) | 32x32x3x50000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2)
 
 ### Language Modeling
 
@@ -102,10 +111,27 @@ testdata = UD_English.devdata()
 
 ## Data Size
 | | Type | Train x | Train y | Test x | Test y |
-|:---:|:---:|:---:|:---:|:---:|:---:|
-| **CIFAR-10** | image | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000 |
-| **CIFAR-100** | image | 32x32x3x5000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2) |
-| **MNIST** | image | 28x28x60000 | 60000 | 28x28x10000 | 10000 |
-| **FashionMNIST** | image | 28x28x60000 | 60000 | 28x28x10000 | 10000 |
 | **PTBLM** | text | 42068 | 42068 | 3761 | 3761 |
 | **UD_English** | text | 12543 | - | 2077 | - |
+
+## Installation
+
+To install `MLDatasets.jl`, start up Julia and type the following
+code snippet into the REPL. It makes use of the native Julia
+package manager.
+
+```julia
+Pkg.add("MLDatasets")
+```
+
+If you encounter any sudden issues, or if you would like to
+contribute to the package, you can manually switch to the
+latest (untagged) version.
+
+```julia
+Pkg.checkout("MLDatasets")
+```
+
+## License
+
+This code is free to use under the terms of the MIT license.
@@ -0,0 +1,2 @@
+build/
+site/
@@ -0,0 +1,35 @@
+using Documenter, MLDatasets
+
+makedocs(
+    modules = [MLDatasets],
+    clean = false,
+    format = :html,
+    assets = [
+        joinpath("assets", "favicon.ico"),
+    ],
+    sitename = "MLDatasets.jl",
+    authors = "Hiroyuki Shindo, Christof Stocker",
+    linkcheck = !("skiplinks" in ARGS),
+    pages = Any[
+        "Home" => "index.md",
+        "Available Datasets" => Any[
+            "Image Classification" => Any[
+                "MNIST handwritten digits" => "datasets/MNIST.md",
+                "Fashion MNIST" => "datasets/FashionMNIST.md",
+                "CIFAR-10" => "datasets/CIFAR10.md",
+                "CIFAR-100" => "datasets/CIFAR100.md",
+            ],
+        ],
+        hide("Indices" => "indices.md"),
+        "LICENSE.md",
+    ],
+    html_prettyurls = !("local" in ARGS),
+)
+
+deploydocs(
+    repo = "github.com/JuliaML/MLDatasets.jl.git",
+    target = "build",
+    julia = "0.6",
+    deps = nothing,
+    make = nothing,
+)
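The two `ARGS` checks in `docs/make.jl` above act as opt-out switches for the build. The sketch below shows how they resolve, using a hypothetical `args` vector as a stand-in for the `ARGS` that Julia fills from the command line (for example `julia docs/make.jl local skiplinks`). The comments on what each option does reflect my understanding of Documenter and should be read as assumptions, not as a statement of the package's build procedure.

```julia
# Hypothetical stand-in for ARGS, as if the script had been started
# with the two extra arguments `local` and `skiplinks`.
args = ["local", "skiplinks"]

# Same expressions as in docs/make.jl, evaluated against the stand-in.
linkcheck       = !("skiplinks" in args)  # "skiplinks" present: link checking is off
html_prettyurls = !("local" in args)      # "local" present: pretty URLs are off

# With no extra arguments both switches stay on.
no_args = String[]
default_linkcheck  = !("skiplinks" in no_args)
default_prettyurls = !("local" in no_args)
```

Turning pretty URLs off is presumably meant for browsing the generated pages directly from disk, while skipping the link check avoids network access during a build.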
@@ -0,0 +1,5 @@
+# LICENSE
+
+```@eval
+Markdown.parse_file(joinpath(@__DIR__, "../LICENSE"))
+```
@@ -0,0 +1,85 @@
+# MLDatasets.jl's Documentation
+
+This package represents a community effort to provide a common
+interface for accessing common Machine Learning (ML) datasets. In
+contrast to other data-related Julia packages, the focus of
+`MLDatasets.jl` is specifically on downloading, unpacking, and
+accessing benchmark datasets. Functionality for data processing
+or visualization is provided only to the degree that it is
+specific to a particular dataset.
+
+This package is a part of the
+[`JuliaML`](https://github.com/JuliaML) ecosystem. Its
+functionality is built on top of the package
+[`DataDeps.jl`](https://github.com/oxinabox/DataDeps.jl).
+
+## Installation
+
+To install `MLDatasets.jl`, start up Julia and type the following
+code snippet into the REPL. It makes use of the native Julia
+package manager.
+
+```julia
+Pkg.add("MLDatasets")
+```
+
+If you encounter any sudden issues, or if you would like to
+contribute to the package, you can manually switch to the
+latest (untagged) version.
+
+```julia
+Pkg.checkout("MLDatasets")
+```
+
+## Basic Usage
+
+`MLDatasets.jl` is organized so that each dataset has its own
+dedicated sub-module. Where possible, those sub-modules share a
+common interface for interacting with the datasets. For example,
+you can load the training set and the test set of the MNIST
+database of handwritten digits using the following commands:
+
+```julia
+using MLDatasets
+
+train_x, train_y = MNIST.traindata()
+test_x,  test_y  = MNIST.testdata()
+```
+
+To load the data, the package looks for the necessary files in
+various locations (see
+[`DataDeps.jl`](https://github.com/oxinabox/DataDeps.jl#configuration)
+for more information on how to configure such defaults). If the
+data can't be found in any of those locations, the package will
+prompt you to download it to `~/.julia/datadeps/MNIST`. To
+override this on a case-by-case basis, you can specify a data
+directory directly in `traindata(dir = <directory>)` and
+`testdata(dir = <directory>)`.
+
+## Available Datasets
+
+Each dataset has its own dedicated sub-module, so its
+functionality is documented per dataset as well. Below is a
+list of available datasets and links to their documentation.
+
+### Image Classification
+
+This package provides a variety of common benchmark datasets for
+the purpose of image classification.
+
+Dataset | Classes | `traintensor` | `trainlabels` | `testtensor` | `testlabels`
+:------:|:-------:|:-------------:|:-------------:|:------------:|:------------:
+[**MNIST**](@ref MNIST) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
+[**FashionMNIST**](@ref FashionMNIST) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
+[**CIFAR-10**](@ref CIFAR10) | 10 | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000
+[**CIFAR-100**](@ref CIFAR100) | 100 (20) | 32x32x3x50000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2)
+
+### Language Modeling
+
+Work in progress
+
+## Index
+
+```@contents
+Pages = ["indices.md"]
+```
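The shape columns in the image classification table of `docs/src/index.md` place the observation index in the last dimension (for example `28x28x60000`), so a single image is a slice along that dimension. The sketch below illustrates this layout with a zero-filled stand-in array instead of the real data. The reduced observation count, the `Float32` element type, and all variable names are assumptions made for illustration; the element type actually returned by `MNIST.traintensor` is not stated in this diff.

```julia
# Stand-in for an MNIST-style tensor: 28x28 pixels, observations last.
# Only 100 observations are allocated to keep the sketch light; the
# table lists 60000 for the real training set.
tensor = zeros(Float32, 28, 28, 100)

# One image is a slice along the last dimension.
first_image = tensor[:, :, 1]

# The number of observations is the size of the last dimension.
n_obs = size(tensor, ndims(tensor))
```

The same indexing pattern carries over to the CIFAR rows, where an extra channel dimension sits before the observation dimension (`32x32x3x50000`).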
@@ -0,0 +1,11 @@
+## Functions
+
+```@index
+Order   = [:function]
+```
+
+## Types
+
+```@index
+Order   = [:type]
+```