diff --git a/‎.travis.yml
Lines changed: 7 additions & 1 deletion b/‎.travis.yml
diff --git a/‎README.md
Lines changed: 45 additions & 10 deletions b/‎README.md
diff --git a/‎REQUIRE
Lines changed: 1 addition & 1 deletion b/‎REQUIRE
diff --git a/‎src/CIFAR10.jl
Lines changed: 3 additions & 3 deletions b/‎src/CIFAR10.jl
diff --git a/‎src/CIFAR100.jl
Lines changed: 3 additions & 3 deletions b/‎src/CIFAR100.jl
diff --git a/‎src/FashionMNIST/FashionMNIST.jl
Lines changed: 74 additions & 0 deletions b/‎src/FashionMNIST/FashionMNIST.jl
diff --git a/‎src/FashionMNIST/README.md
Lines changed: 82 additions & 0 deletions b/‎src/FashionMNIST/README.md
@@ -5,7 +5,13 @@ os:
     - osx
 
 julia:
-    - 0.5
+    - 0.6
+    - nightly
+matrix:
+    allow_failures:
+        - julia: nightly
+git:
+    depth: 5000
 
 notifications:
     email: false
 
@@ -1,34 +1,46 @@
 # MLDatasets.jl
+
 [![Build Status](https://travis-ci.org/JuliaML/MLDatasets.jl.svg?branch=master)](https://travis-ci.org/JuliaML/MLDatasets.jl)
 
-`MLDatasets` provides an access to common machine learning datasets for [Julia](http://julialang.org/).
-Currently, julia 0.5 is supported.
+`MLDatasets` provides access to common machine learning
+datasets for [Julia](http://julialang.org/). Currently, Julia 0.6
+is supported.
 
-The datasets are automatically downloaded to the specified directory.
-The default directory is `MLDatasets/datasets`.
+The datasets are automatically downloaded to the specified
+directory. The default directory is `MLDatasets/datasets`.
 
 ## Installation
+
 ```julia
 julia> Pkg.clone("https://github.com/JuliaML/MLDatasets.jl.git")
 ```
 
 ## Basic Usage
+
 ```julia
 using MLDatasets
 
 train_x, train_y = MNIST.traindata()
 test_x, test_y = MNIST.testdata()
 ```
+
 Use `traindata(<directory>)` and `testdata(<directory>)` to change the default directory.
 
 ## Available Datasets
+
 ### Image Classification
+
 #### CIFAR-10
-The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset consists of 60000 32x32 color images in 10 classes.
+
+The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
+dataset consists of 60000 32x32 color images in 10 classes.
 
 #### CIFAR-100
-The [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) dataset consists of 600 32x32 color images in 100 classes.
-The 100 classes are grouped into 20 superclasses (fine and coarse labels).
+
+The [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html)
+dataset consists of 60000 32x32 color images in 100 classes (600
+per class). The 100 classes are grouped into 20 superclasses
+(fine and coarse labels).
 
 #### MNIST
 
@@ -38,12 +50,27 @@ of 60000 28x28 images of handwritten digits.
 Take a look at the [sub-module](src/MNIST/README.md) for more
 information
 
+#### Fashion-MNIST
+
+The [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist)
+dataset consists of 60000 28x28 images of fashion products. It
+was designed to be a drop-in replacement for the MNIST dataset.
+
+Take a look at the [sub-module](src/FashionMNIST/README.md) for more
+information.
+
 ### Language Modeling
+
 #### PTBLM
-The `PTBLM` dataset consists of Penn Treebank sentences for language modeling, available from [tomsercu/lstm](https://github.com/tomsercu/lstm).
-The unknown words are replaced with `<unk>` so that the total vocaburary size becomes 10000.
+
+The `PTBLM` dataset consists of Penn Treebank sentences for
+language modeling, available from
+[tomsercu/lstm](https://github.com/tomsercu/lstm). The unknown
+words are replaced with `<unk>` so that the total vocabulary size
+becomes 10000.
 
 This is the first sentence of the PTBLM dataset.
+
 ```julia
 x, y = PTBLM.traindata()
 
@@ -52,11 +79,18 @@ x[1]
 y[1]
 > ["it", "was", "n't", "black", "monday", "<eos>"]
 ```
+
 where `MLDataset` adds the special word: `<eos>` to the end of `y`.
 
 ### Text Analysis (POS-Tagging, Parsing)
+
 #### UD English
-The [UD_English](https://github.com/UniversalDependencies/UD_English) dataset is an annotated corpus of morphological features, POS-tags and syntactic trees. The dataset follows CoNLL-style format.
+
+The [UD_English](https://github.com/UniversalDependencies/UD_English)
+dataset is an annotated corpus of morphological features,
+POS-tags and syntactic trees. The dataset follows CoNLL-style
+format.
+
 ```julia
 traindata = UD_English.traindata()
 devdata = UD_English.devdata()
@@ -69,5 +103,6 @@ testdata = UD_English.devdata()
 | **CIFAR-10** | image | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000 |
 | **CIFAR-100** | image | 32x32x3x500 | 2x500 | 32x32x3x100 | 2x100 |
 | **MNIST** | image | 28x28x60000 | 60000 | 28x28x10000 | 10000 |
+| **FashionMNIST** | image | 28x28x60000 | 60000 | 28x28x10000 | 10000 |
 | **PTBLM** | text | 42068 | 42068 | 3761 | 3761 |
 | **UD_English** | text | 12543 | - | 2077 | - |
@@ -1,4 +1,4 @@
-julia 0.5
+julia 0.6
 ImageCore 0.1.2
 ColorTypes 0.4
 GZip
 
@@ -3,7 +3,7 @@ module CIFAR10
 
 using BinDeps
 
-const defdir = joinpath(Pkg.dir("MLDatasets"), "datasets/cifar10")
+const defdir = joinpath(Pkg.dir("MLDatasets"), "datasets", "cifar10")
 
 function getdata(dir)
     mkpath(dir)
@@ -25,7 +25,7 @@ function readdata(data::Vector{UInt8})
 end
 
 function traindata(dir=defdir)
-    files = ["$(dir)/cifar-10-batches-bin/data_batch_$(i).bin" for i=1:5]
+    files = [joinpath(dir,"cifar-10-batches-bin","data_batch_$i.bin") for i=1:5]
     all(isfile, files) || getdata(dir)
     data = UInt8[]
     for file in files
@@ -35,7 +35,7 @@ function traindata(dir=defdir)
 end
 
 function testdata(dir=defdir)
-    file = "$(dir)/cifar-10-batches-bin/test_batch.bin"
+    file = joinpath(dir,"cifar-10-batches-bin","test_batch.bin")
     isfile(file) || getdata(dir)
     readdata(open(read,file))
 end
 
@@ -3,7 +3,7 @@ module CIFAR100
 
 using BinDeps
 
-const defdir = joinpath(Pkg.dir("MLDatasets"), "datasets/cifar100")
+const defdir = joinpath(Pkg.dir("MLDatasets"), "datasets","cifar100")
 
 function getdata(dir)
     mkpath(dir)
@@ -25,13 +25,13 @@ function readdata(data::Vector{UInt8})
 end
 
 function traindata(dir=defdir)
-    file = joinpath(dir, "cifar-100-binary/train.bin")
+    file = joinpath(dir, "cifar-100-binary","train.bin")
     isfile(file) || getdata(dir)
     readdata(open(read,file))
 end
 
 function testdata(dir=defdir)
-    file = joinpath(dir, "cifar-100-binary/test.bin")
+    file = joinpath(dir, "cifar-100-binary","test.bin")
     isfile(file) || getdata(dir)
     readdata(open(read,file))
 end
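
Review note on the `joinpath` changes above: a minimal sketch of the pattern, assuming nothing beyond Julia's `Base`. The directory name below is a hypothetical stand-in for `defdir`; the point is only that passing each path component separately lets `joinpath` choose the platform's separator, instead of hard-coding `/` inside a string.

```julia
# Hypothetical base directory, used only for illustration (stands in for `defdir`).
dir = joinpath("datasets", "cifar10")

# Build the five training batch paths one component at a time,
# so the platform's path separator is used throughout.
files = [joinpath(dir, "cifar-10-batches-bin", "data_batch_$i.bin") for i in 1:5]

# `basename` recovers the final component regardless of the separator in use.
batch_names = [basename(f) for f in files]
```

The same reasoning applies to the CIFAR-100 paths, where `"cifar-100-binary/train.bin"` was split into two components.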
 
@@ -0,0 +1,74 @@
+export FashionMNIST
+module FashionMNIST
+    using ImageCore
+    using ColorTypes
+    import ..downloaded_file
+    import ..download_helper
+    import ..DownloadSettings
+    import ..MNIST.convert2image
+    import ..MNIST.convert2features
+    import ..MNIST.Reader
+
+    export
+
+        traintensor,
+        testtensor,
+
+        trainlabels,
+        testlabels,
+
+        traindata,
+        testdata,
+
+        convert2image,
+        convert2features,
+
+        download
+
+    const DEFAULT_DIR = abspath(joinpath(@__DIR__, "..", "..", "datasets", "fashion_mnist"))
+
+    const TRAINIMAGES = "train-images-idx3-ubyte.gz"
+    const TRAINLABELS = "train-labels-idx1-ubyte.gz"
+    const TESTIMAGES  = "t10k-images-idx3-ubyte.gz"
+    const TESTLABELS  = "t10k-labels-idx1-ubyte.gz"
+
+    const CLASSES = [
+        "T-Shirt",
+        "Trouser",
+        "Pullover",
+        "Dress",
+        "Coat",
+        "Sandal",
+        "Shirt",
+        "Sneaker",
+        "Bag",
+        "Ankle boot"
+    ]
+
+    const SETTINGS = DownloadSettings(
+        "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/",
+        """
+        Dataset: Fashion-MNIST
+        Authors: Han Xiao, Kashif Rasul, Roland Vollgraf
+        Website: https://github.com/zalandoresearch/fashion-mnist
+        License: MIT
+
+        [Han Xiao et al. 2017]
+            Han Xiao, Kashif Rasul, and Roland Vollgraf.
+            "Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms."
+            arXiv:1708.07747
+
+        The files are available for download at the official
+        website linked above. We can download these files for you
+        if you wish, but that doesn't free you from the burden of
+        using the data responsibly and respecting license and
+        authorship.
+        """,
+        [TRAINIMAGES, TRAINLABELS, TESTIMAGES, TESTLABELS]
+    )
+
+    download(dir = DEFAULT_DIR; kw...) =
+        download_helper(SETTINGS, dir; kw...)
+
+    include("interface.jl")
+end
@@ -0,0 +1,82 @@
+# Fashion-MNIST
+
+Description from the [official website](https://github.com/zalandoresearch/fashion-mnist)
+
+> Fashion-MNIST is a dataset of Zalando's article
+> images—consisting of a training set of 60,000 examples and a
+> test set of 10,000 examples. Each example is a 28x28 grayscale
+> image, associated with a label from 10 classes. We intend
+> Fashion-MNIST to serve as a direct drop-in replacement for the
+> original MNIST dataset for benchmarking machine learning
+> algorithms. It shares the same image size and structure of
+> training and testing splits.
+
+## Usage
+
+This sub-module provides a programmatic interface to download,
+load, and work with the Fashion-MNIST dataset of fashion product images.
+
+```julia
+using MLDatasets
+
+# download dataset
+FashionMNIST.download()
+
+# load full training set
+train_x, train_y = FashionMNIST.traindata()
+
+# load full test set
+test_x,  test_y  = FashionMNIST.testdata()
+```
+
+The provided functions also allow for optional arguments, such as
+the directory `dir` where the dataset is located, or the specific
+observation `indices` that one wants to work with. For more
+information on the interface take a look at the documentation
+(e.g. `?FashionMNIST.traindata`).
+
+Function | Description
+---------|-------------
+`download([dir])` | Trigger interactive download of the dataset
+`traintensor([indices]; [dir], [decimal=true])` | Load the training images as an array
+`trainlabels([indices]; [dir])` | Load the labels for the training images
+`testtensor([indices]; [dir], [decimal=true])` | Load the test images as an array
+`testlabels([indices]; [dir])` | Load the labels for the test images
+`traindata([indices]; [dir], [decimal=true])` | Load images and labels of the training data
+`testdata([indices]; [dir], [decimal=true])` | Load images and labels of the test data
+
+This module also provides utility functions to make working with
+the FashionMNIST dataset in Julia more convenient.
+
+You can use the function `convert2features` to convert the given
+FashionMNIST tensor to a feature matrix (or feature vector in the case
+of a single image). The purpose of this function is to drop the
+spatial dimensions such that traditional ML algorithms can
+process the dataset.
+
+```julia
+julia> FashionMNIST.convert2features(FashionMNIST.traintensor()) # full training data
+784×60000 Array{Float64,2}:
+[...]
+```
+
+To visualize an image or a prediction we provide the function
+`convert2image` to convert the given FashionMNIST horizontal-major
+tensor (or feature matrix) to a vertical-major `Colorant` array.
+The values are also color corrected according to the website's
+description, which means that the items are black on a white
+background.
+
+```julia
+julia> FashionMNIST.convert2image(FashionMNIST.traintensor(1)) # first training image
+28×28 Array{Gray{Float64},2}:
+[...]
+```
+
+## References
+
+- **Authors**: Han Xiao, Kashif Rasul, Roland Vollgraf
+
+- **Website**: https://github.com/zalandoresearch/fashion-mnist
+
+- **[Han Xiao et al. 2017]** Han Xiao, Kashif Rasul, and Roland Vollgraf. "Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms." arXiv:1708.07747
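
Review note on the `convert2features` description in the new README: a minimal sketch of what "dropping the spatial dimensions" means, using only `Base`. This is not necessarily how `convert2features` is implemented; the random `tensor` is a small stand-in for the real 28x28x60000 training tensor.

```julia
# Stand-in image tensor: 28x28 pixels, 5 observations (random values, not real data).
tensor = rand(28, 28, 5)

# Flatten the two spatial dimensions into a single feature dimension,
# giving one column of 784 features per observation.
features = reshape(tensor, 28 * 28, size(tensor, 3))

# A single image flattens to a feature vector in the same way.
single = vec(tensor[:, :, 1])
```

Because Julia arrays are column-major, each column of `features` holds the pixels of one image in the same order as `vec` of that image.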