
Commit 2c19fa1

add docs for SVHN
1 parent f6b418d commit 2c19fa1


7 files changed, 172 lines added, 13 lines removed


README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -70,7 +70,7 @@ Dataset | Classes | `traintensor` | `trainlabels` | `testtensor` | `testlabels`
 [**FashionMNIST**](https://juliaml.github.io/MLDatasets.jl/latest/datasets/FashionMNIST/) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
 [**CIFAR-10**](https://juliaml.github.io/MLDatasets.jl/latest/datasets/CIFAR10/) | 10 | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000
 [**CIFAR-100**](https://juliaml.github.io/MLDatasets.jl/latest/datasets/CIFAR100/) | 100 (20) | 32x32x3x50000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2)
-[**SVHN-2**](https://juliaml.github.io/MLDatasets.jl/latest/datasets/SVHN2/)(*) | 10 | 32x32x3x73257 | 73257 | 32x32x3x26032 | 26032
+[**SVHN-2**](https://juliaml.github.io/MLDatasets.jl/latest/datasets/SVHN2/) (*) | 10 | 32x32x3x73257 | 73257 | 32x32x3x26032 | 26032
 
 (*) Note that the SVHN-2 dataset provides an additional 531131 observations aside from the training and test sets.
 
````
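As a rough sizing aid, the observation counts in the table and note above can be turned into in-memory footprints. This sketch assumes each image is held as a dense 32×32×3 array with one byte per value (an 8-bit type such as the default `N0f8`) or four bytes per value (`Float32`); the helper `tensor_bytes` is hypothetical and not part of MLDatasets.

```julia
# Observation counts for SVHN format 2, as listed in the README table and note.
const SVHN2_COUNTS = (train = 73257, test = 26032, extra = 531131)

# Values per image: 32 × 32 pixels × 3 colour channels.
const VALUES_PER_IMAGE = 32 * 32 * 3

# Hypothetical helper (not part of MLDatasets): bytes for a dense tensor of
# `n` images whose elements take `bytes_per_value` bytes each
# (1 for an 8-bit type such as N0f8, 4 for Float32).
tensor_bytes(n::Integer; bytes_per_value::Integer = 1) = n * VALUES_PER_IMAGE * bytes_per_value

# Total observations across the train, test, and extra splits.
total = sum(SVHN2_COUNTS)

for (name, n) in pairs(SVHN2_COUNTS)
    n8 = round(tensor_bytes(n) / 1e6; digits = 1)
    f32 = round(tensor_bytes(n; bytes_per_value = 4) / 1e6; digits = 1)
    println(rpad(string(name), 6), lpad(string(n), 7), " images: ",
            n8, " MB as N0f8, ", f32, " MB as Float32")
end
println("total observations: ", total)
```

Under these assumptions the three splits add up to 630420 images (consistent with the "over 600,000" figure quoted from the SVHN website), and the extra split alone needs roughly 1.6 GB as `N0f8` and about 6.5 GB as `Float32`.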

docs/make.jl

Lines changed: 1 addition & 0 deletions
````diff
@@ -18,6 +18,7 @@ makedocs(
 "Fashion MNIST" => "datasets/FashionMNIST.md",
 "CIFAR-10" => "datasets/CIFAR10.md",
 "CIFAR-100" => "datasets/CIFAR100.md",
+"SVHN format 2" => "datasets/SVHN2.md",
 ],
 ],
 hide("Indices" => "indices.md"),
````

docs/src/datasets/SVHN2.md

Lines changed: 140 additions & 0 deletions
````diff
@@ -0,0 +1,140 @@
+# [The Street View House Numbers (SVHN) Dataset](@id SVHN2)
+
+Description from the [official
+website](http://ufldl.stanford.edu/housenumbers/):
+
+> SVHN is a real-world image dataset for developing machine
+> learning and object recognition algorithms with minimal
+> requirement on data preprocessing and formatting. It can be
+> seen as similar in flavor to MNIST (e.g., the images are of
+> small cropped digits), but incorporates an order of magnitude
+> more labeled data (over 600,000 digit images) and comes from a
+> significantly harder, unsolved, real world problem (recognizing
+> digits and numbers in natural scene images). SVHN is obtained
+> from house numbers in Google Street View images.
+
+About Format 2 (Cropped Digits):
+
+> All digits have been resized to a fixed resolution of 32-by-32
+> pixels. The original character bounding boxes are extended in
+> the appropriate dimension to become square windows, so that
+> resizing them to 32-by-32 pixels does not introduce aspect
+> ratio distortions. Nevertheless this preprocessing introduces
+> some distracting digits to the sides of the digit of interest.
+
+!!! note
+
+    For non-commercial use only.
+
+## Contents
+
+```@contents
+Pages = ["SVHN2.md"]
+Depth = 3
+```
+
+## Overview
+
+The `MLDatasets.SVHN2` sub-module provides a programmatic
+interface to download, load, and work with the SVHN2 dataset of
+cropped house-number digits.
+
+```julia
+using MLDatasets
+
+# load full training set
+train_x, train_y = SVHN2.traindata()
+
+# load full test set
+test_x, test_y = SVHN2.testdata()
+
+# load additional train set
+extra_x, extra_y = SVHN2.extradata()
+```
+
+The provided functions also allow for optional arguments, such as
+the directory `dir` where the dataset is located, or the specific
+observation `indices` that one wants to work with. For more
+information on the interface take a look at the documentation
+(e.g. `?SVHN2.traindata`).
+
+Function | Description
+---------|-------------
+[`download([dir])`](@ref SVHN2.download) | Trigger interactive download of the dataset
+[`classnames()`](@ref SVHN2.classnames) | Return the class names as a vector of strings
+[`traindata([T], [indices]; [dir])`](@ref SVHN2.traindata) | Load images and labels of the training data
+[`testdata([T], [indices]; [dir])`](@ref SVHN2.testdata) | Load images and labels of the test data
+[`extradata([T], [indices]; [dir])`](@ref SVHN2.extradata) | Load images and labels of the extra training data
+
+This module also provides utility functions to make working with
+the SVHN (format 2) dataset in Julia more convenient.
+
+Function | Description
+---------|-------------
+[`convert2features(array)`](@ref SVHN2.convert2features) | Convert the SVHN tensor to a flat feature matrix
+[`convert2image(array)`](@ref SVHN2.convert2image) | Convert the SVHN tensor/matrix to a colorant array
+
+You can use the function
+[`convert2features`](@ref SVHN2.convert2features) to convert
+the given SVHN tensor to a feature matrix (or feature vector
+in the case of a single image). The purpose of this function is
+to drop the spatial dimensions such that traditional ML
+algorithms can process the dataset.
+
+```julia
+julia> SVHN2.convert2features(SVHN2.traindata()[1]) # full training data
+3072×73257 Array{N0f8,2}:
+[...]
+```
+
+To visualize an image or a prediction, we provide the function
+[`convert2image`](@ref SVHN2.convert2image) to convert the
+given SVHN2 horizontal-major tensor (or feature matrix) to a
+vertical-major `Colorant` array.
+
+```julia
+julia> SVHN2.convert2image(SVHN2.traindata(1)[1]) # first training image
+32×32 Array{RGB{N0f8},2}:
+[...]
+```
+
+## API Documentation
+
+```@docs
+SVHN2
+```
+
+### Trainingset
+
+```@docs
+SVHN2.traindata
+```
+
+### Testset
+
+```@docs
+SVHN2.testdata
+```
+
+### Extraset
+
+```@docs
+SVHN2.extradata
+```
+
+### Utilities
+
+```@docs
+SVHN2.download
+SVHN2.classnames
+SVHN2.convert2features
+SVHN2.convert2image
+```
+
+## References
+
+- **Authors**: Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng
+
+- **Website**: http://ufldl.stanford.edu/housenumbers
+
+- **[Netzer et al., 2011]** Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng. "Reading Digits in Natural Images with Unsupervised Feature Learning" NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011
````
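The flattening that the new page describes for `convert2features` (dropping the spatial dimensions so each image becomes one feature column) can be sketched in plain Julia. `flatten_features` is a hypothetical stand-in, not the MLDatasets implementation; it assumes the image tensor is laid out as 32×32×3×N with one observation per trailing slice, as in the dataset table.

```julia
# Hypothetical stand-in for SVHN2.convert2features: drop the spatial and
# channel dimensions so each observation becomes one column of features.
# Assumes a W×H×C×N tensor (32×32×3×N for SVHN format 2).
function flatten_features(x::AbstractArray{T,4}) where {T}
    w, h, c, n = size(x)
    return reshape(x, w * h * c, n)   # (W*H*C)×N matrix sharing memory with `x`
end

# A single W×H×C image becomes a plain feature vector.
flatten_features(x::AbstractArray{T,3}) where {T} = vec(x)

# Stand-in data: 5 random "images" instead of the real 73257.
x = rand(Float32, 32, 32, 3, 5)
X = flatten_features(x)
size(X)                                # (3072, 5)
size(flatten_features(x[:, :, :, 1]))  # (3072,)
```

Because Julia arrays are column-major, column `j` of the result is exactly `vec` of observation `j`, so `reshape` can reuse the existing memory instead of copying or reordering the data.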

docs/src/index.md

Lines changed: 3 additions & 0 deletions
````diff
@@ -73,6 +73,9 @@ Dataset | Classes | `traintensor` | `trainlabels` | `testtensor` | `testlabels`
 [**FashionMNIST**](@ref FashionMNIST) | 10 | 28x28x60000 | 60000 | 28x28x10000 | 10000
 [**CIFAR-10**](@ref CIFAR10) | 10 | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000
 [**CIFAR-100**](@ref CIFAR100) | 100 (20) | 32x32x3x50000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2)
+[**SVHN-2**](@ref SVHN2) (*) | 10 | 32x32x3x73257 | 73257 | 32x32x3x26032 | 26032
+
+(*) Note that the SVHN-2 dataset provides an additional 531131 observations aside from the training and test sets.
 
 ### Language Modeling
 
````

src/SVHN2/SVHN2.jl

Lines changed: 22 additions & 7 deletions
````diff
@@ -3,8 +3,8 @@ export SVHN2
 """
 The Street View House Numbers (SVHN) Dataset
 
-Authors: Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng
-Website: http://ufldl.stanford.edu/housenumbers
+- Authors: Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng
+- Website: http://ufldl.stanford.edu/housenumbers
 
 SVHN was obtained from house numbers in Google Street View
 images. As such they are quite diverse in terms of orientation
@@ -17,14 +17,15 @@ additional to use as extra training data.
 
 ## Interface
 
-- [SVHN2.traindata](@ref)
-- [SVHN2.testdata](@ref)
-- [SVHN2.extradata](@ref)
+- [`SVHN2.traindata`](@ref)
+- [`SVHN2.testdata`](@ref)
+- [`SVHN2.extradata`](@ref)
 
 ## Utilities
 
-- [SVHN2.convert2features](@ref)
-- [SVHN2.convert2image](@ref)
+- [`SVHN2.classnames`](@ref)
+- [`SVHN2.convert2features`](@ref)
+- [`SVHN2.convert2image`](@ref)
 """
 module SVHN2
 using DataDeps
@@ -54,6 +55,19 @@ module SVHN2
 const EXTRADATA = "extra_32x32.mat"
 const CLASSES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
 
+"""
+download([dir]; [i_accept_the_terms_of_use])
+
+Trigger the (interactive) download of the full dataset into
+"<`dir`>/$DEPNAME". If no `dir` is provided, the dataset will
+be downloaded into "~/.julia/datadeps/$DEPNAME".
+
+This function will display an interactive dialog unless
+either the keyword parameter `i_accept_the_terms_of_use` or
+the environment variable `DATADEPS_ALWAYS_ACCEPT` is set to
+`true`. Note that using the data responsibly and respecting
+copyright/terms-of-use remains your responsibility.
+"""
 download(args...; kw...) = download_dep(DEPNAME, args...; kw...)
 
 include("interface.jl")
@@ -87,6 +101,7 @@ module SVHN2
 dataset.
 """,
 "http://ufldl.stanford.edu/housenumbers/" .* [TRAINDATA, TESTDATA, EXTRADATA],
+"2fa3b0b79baf39de36ed7579e6947760e6241f4c52b6b406cabc44d654c13a50"
 )
 end
 end
````
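The acceptance rule stated in the new `download` docstring (skip the interactive dialog when either the keyword argument or the environment variable is `true`) can be restated as a small predicate. `skip_download_prompt` is a hypothetical helper for illustration only, not how DataDeps implements the check; it assumes the environment variable holds a string that `parse(Bool, s)` accepts, such as `"true"` or `"false"`.

```julia
# Hypothetical restatement of the documented rule: the download dialog is
# skipped when the keyword argument is `true` or when the environment
# variable DATADEPS_ALWAYS_ACCEPT parses as `true`.
function skip_download_prompt(; i_accept_the_terms_of_use::Bool = false)
    env_accept = parse(Bool, get(ENV, "DATADEPS_ALWAYS_ACCEPT", "false"))
    return i_accept_the_terms_of_use || env_accept
end

# Temporarily set the variable, as a non-interactive session (e.g. CI) might.
withenv("DATADEPS_ALWAYS_ACCEPT" => "true") do
    println(skip_download_prompt())
end
```

Note the spelling `DATADEPS_ALWAYS_ACCEPT`: a misspelled variable name is simply never read, so the dialog would still appear.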

src/SVHN2/interface.jl

Lines changed: 3 additions & 3 deletions
````diff
@@ -35,7 +35,7 @@ train_x, train_y = SVHN2.traindata(2) # only second observation
 train_x, train_y = SVHN2.traindata(dir="./SVHN") # custom folder
 ```
 
-$(download_docstring("SVHN", DEPNAME))
+$(download_docstring("SVHN2", DEPNAME))
 """
 function traindata(args...; dir = nothing)
     traindata(N0f8, args...; dir = dir)
@@ -83,7 +83,7 @@ test_x, test_y = SVHN2.testdata(2) # only second observation
 test_x, test_y = SVHN2.testdata(dir="./SVHN") # custom folder
 ```
 
-$(download_docstring("SVHN", DEPNAME))
+$(download_docstring("SVHN2", DEPNAME))
 """
 function testdata(args...; dir = nothing)
     testdata(N0f8, args...; dir = dir)
@@ -131,7 +131,7 @@ extra_x, extra_y = SVHN2.extradata(2) # only second observation
 extra_x, extra_y = SVHN2.extradata(dir="./SVHN") # custom folder
 ```
 
-$(download_docstring("SVHN", DEPNAME))
+$(download_docstring("SVHN2", DEPNAME))
 """
 function extradata(args...; dir = nothing)
     extradata(N0f8, args...; dir = dir)
````

src/SVHN2/utils.jl

Lines changed: 2 additions & 2 deletions
````diff
@@ -11,7 +11,7 @@ julia> SVHN2.convert2features(SVHN2.traindata(Float32)[1]) # full training data
 3072×73257 Array{Float32,2}:
 [...]
 
-julia> SVHN2.convert2features(SVHN2.traindata(Float32)[1][:,:,:,1]) # first observation
+julia> SVHN2.convert2features(SVHN2.traindata(Float32,1)[1]) # first observation
 3072-element Array{Float32,1}:
 [...]
 ```
@@ -45,7 +45,7 @@ julia> SVHN2.convert2image(SVHN2.traindata()[1]) # full training dataset
 32×32×73257 Array{RGB{N0f8},3}:
 [...]
 
-julia> SVHN2.convert2image(SVHN2.traindata()[1][:,:,:,1]) # first training image
+julia> SVHN2.convert2image(SVHN2.traindata(1)[1]) # first training image
 32×32 Array{RGB{N0f8},2}:
 [...]
 ```
````
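The layout part of the conversion documented for `convert2image` (a horizontal-major tensor or feature matrix turned into a vertical-major array) can be sketched as a swap of the first two dimensions. This assumes "horizontal-major" means the first dimension indexes image columns (width) and the second indexes rows (height). `to_vertical_major` is a hypothetical name, and the sketch stops short of wrapping the channel values in `RGB` colorants.

```julia
# Hypothetical sketch (not the MLDatasets implementation): swap the two
# spatial dimensions so a horizontal-major W×H×C×N tensor becomes a
# vertical-major H×W×C×N tensor.
to_vertical_major(x::AbstractArray{T,4}) where {T} = permutedims(x, (2, 1, 3, 4))

# A single W×H×C image becomes H×W×C.
to_vertical_major(x::AbstractArray{T,3}) where {T} = permutedims(x, (2, 1, 3))

# A flat (W*H*C)×N feature matrix is reshaped back to W×H×C×N first.
# The defaults assume SVHN format 2 images (32×32 pixels, 3 channels).
function to_vertical_major(features::AbstractMatrix, w::Integer = 32, h::Integer = 32, c::Integer = 3)
    return to_vertical_major(reshape(features, w, h, c, size(features, 2)))
end

# Stand-in data: two random 32×32 RGB images stored as raw bytes.
x = rand(UInt8, 32, 32, 3, 2)
img = to_vertical_major(x[:, :, :, 1])   # 32×32×3 with rows and columns swapped
```

Unlike `reshape`, `permutedims` allocates a new array, because the element order in memory actually changes.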
