# [The Street View House Numbers (SVHN) Dataset](@id SVHN2)

Description from the [official
website](http://ufldl.stanford.edu/housenumbers/):

> SVHN is a real-world image dataset for developing machine
> learning and object recognition algorithms with minimal
> requirement on data preprocessing and formatting. It can be
> seen as similar in flavor to MNIST (e.g., the images are of
> small cropped digits), but incorporates an order of magnitude
> more labeled data (over 600,000 digit images) and comes from a
> significantly harder, unsolved, real world problem (recognizing
> digits and numbers in natural scene images). SVHN is obtained
> from house numbers in Google Street View images.

About Format 2 (Cropped Digits):

> All digits have been resized to a fixed resolution of 32-by-32
> pixels. The original character bounding boxes are extended in
> the appropriate dimension to become square windows, so that
> resizing them to 32-by-32 pixels does not introduce aspect
> ratio distortions. Nevertheless this preprocessing introduces
> some distracting digits to the sides of the digit of interest.

!!! note

    For non-commercial use only.

## Contents

```@contents
Pages = ["SVHN2.md"]
Depth = 3
```

## Overview

The `MLDatasets.SVHN2` sub-module provides a programmatic
interface to download, load, and work with the SVHN (format 2)
dataset of cropped digits from house numbers in natural scene
images.

```julia
using MLDatasets

# load full training set
train_x, train_y = SVHN2.traindata()

# load full test set
test_x, test_y = SVHN2.testdata()

# load additional train set
extra_x, extra_y = SVHN2.extradata()
```

The provided functions also allow for optional arguments, such as
the directory `dir` where the dataset is located, or the specific
observation `indices` that one wants to work with. For more
information on the interface, take a look at the documentation
(e.g. `?SVHN2.traindata`).
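
The following sketch illustrates these optional arguments,
assuming the dataset has already been downloaded (the directory
`"/tmp/svhn"` is a hypothetical path used only for illustration):

```julia
using MLDatasets

# load the first 1000 training observations, converting the
# image data to Float64 (both arguments are optional)
train_x, train_y = SVHN2.traindata(Float64, 1:1000)

# load a single test observation from a custom dataset directory
test_x, test_y = SVHN2.testdata(1; dir = "/tmp/svhn")
```

Restricting to a subset of `indices` avoids materializing the
full data tensor when only a few observations are needed.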

Function | Description
---------|-------------
[`download([dir])`](@ref SVHN2.download) | Trigger interactive download of the dataset
[`classnames()`](@ref SVHN2.classnames) | Return the class names as a vector of strings
[`traindata([T], [indices]; [dir])`](@ref SVHN2.traindata) | Load images and labels of the training data
[`testdata([T], [indices]; [dir])`](@ref SVHN2.testdata) | Load images and labels of the test data
[`extradata([T], [indices]; [dir])`](@ref SVHN2.extradata) | Load images and labels of the extra training data
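
The first two helpers can be combined as follows (a sketch only;
`download` prompts interactively on first use, and the exact
label encoding is best checked via `?SVHN2.classnames`):

```julia
using MLDatasets

# trigger the interactive download if the files are not yet present
SVHN2.download()

# print each label value next to its human-readable class name
for (label, name) in enumerate(SVHN2.classnames())
    println(label, " => ", name)
end
```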

This module also provides utility functions to make working with
the SVHN (format 2) dataset in Julia more convenient.

Function | Description
---------|-------------
[`convert2features(array)`](@ref SVHN2.convert2features) | Convert the SVHN tensor to a flat feature matrix
[`convert2image(array)`](@ref SVHN2.convert2image) | Convert the SVHN tensor/matrix to a colorant array

You can use the function
[`convert2features`](@ref SVHN2.convert2features) to convert
the given SVHN tensor to a feature matrix (or feature vector
in the case of a single image). The purpose of this function is
to drop the spatial dimensions such that traditional ML
algorithms can process the dataset.

```julia
julia> SVHN2.convert2features(SVHN2.traindata()[1]) # full training data
3072×73257 Array{N0f8,2}:
[...]
```

To visualize an image or a prediction we provide the function
[`convert2image`](@ref SVHN2.convert2image) to convert the
given SVHN2 horizontal-major tensor (or feature matrix) to a
vertical-major `Colorant` array.

```julia
julia> SVHN2.convert2image(SVHN2.traindata(1)[1]) # first training image
32×32 Array{RGB{N0f8},2}:
[...]
```

## API Documentation

```@docs
SVHN2
```

### Training set

```@docs
SVHN2.traindata
```

### Test set

```@docs
SVHN2.testdata
```

### Extra set

```@docs
SVHN2.extradata
```

### Utilities

```@docs
SVHN2.download
SVHN2.classnames
SVHN2.convert2features
SVHN2.convert2image
```

## References

- **Authors**: Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng

- **Website**: http://ufldl.stanford.edu/housenumbers

- **[Netzer et al., 2011]** Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng. "Reading Digits in Natural Images with Unsupervised Feature Learning." NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.