Commit 4ea6a04

rework CIFAR-100 for DataDeps
1 parent 211cf78 commit 4ea6a04

9 files changed: +900 -60 lines changed
README.md

Lines changed: 5 additions & 2 deletions
@@ -38,10 +38,13 @@ information
 #### CIFAR-100

 The [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html)
-dataset consists of 600 32x32 color images in 100 classes. The
+dataset consists of 60000 32x32 color images in 100 classes. The
 100 classes are grouped into 20 superclasses (fine and coarse
 labels).

+Take a look at the [sub-module](src/CIFAR100/README.md) for more
+information
+
 #### MNIST

 The [MNIST](http://yann.lecun.com/exdb/mnist/) dataset consists
@@ -101,7 +104,7 @@ testdata = UD_English.devdata()
 | | Type | Train x | Train y | Test x | Test y |
 |:---:|:---:|:---:|:---:|:---:|:---:|
 | **CIFAR-10** | image | 32x32x3x50000 | 50000 | 32x32x3x10000 | 10000 |
-| **CIFAR-100** | image | 32x32x3x500 | 2x500 | 32x32x3x100 | 2x100 |
+| **CIFAR-100** | image | 32x32x3x50000 | 50000 (x2) | 32x32x3x10000 | 10000 (x2) |
 | **MNIST** | image | 28x28x60000 | 60000 | 28x28x10000 | 10000 |
 | **FashionMNIST** | image | 28x28x60000 | 60000 | 28x28x10000 | 10000 |
 | **PTBLM** | text | 42068 | 42068 | 3761 | 3761 |

src/CIFAR100.jl

Lines changed: 0 additions & 39 deletions
This file was deleted.

src/CIFAR100/CIFAR100.jl

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+export CIFAR100
+module CIFAR100
+    using DataDeps
+    using BinDeps
+    using FixedPointNumbers
+    using ..bytes_to_type
+    using ..datafile
+    using ..download_dep
+    using ..download_docstring
+    import ..CIFAR10.convert2image
+    import ..CIFAR10.convert2features
+
+    export
+
+        classnames_coarse,
+        classnames_fine,
+
+        traintensor,
+        testtensor,
+
+        trainlabels,
+        testlabels,
+
+        traindata,
+        testdata,
+
+        convert2image,
+        convert2features,
+
+        download
+
+    const DEPNAME = "CIFAR100"
+    const TRAINSET_FILENAME = joinpath("cifar-100-binary", "train.bin")
+    const TESTSET_FILENAME = joinpath("cifar-100-binary", "test.bin")
+    const COARSE_FILENAME = joinpath("cifar-100-binary", "coarse_label_names.txt")
+    const FINE_FILENAME = joinpath("cifar-100-binary", "fine_label_names.txt")
+
+    const TRAINSET_SIZE = 50_000
+    const TESTSET_SIZE = 10_000
+
+    download(args...; kw...) = download_dep(DEPNAME, args...; kw...)
+
+    include(joinpath("Reader","Reader.jl"))
+    include("interface.jl")
+
+    function __init__()
+        RegisterDataDep(
+            DEPNAME,
+            """
+            Dataset: The CIFAR-100 dataset
+            Authors: Alex Krizhevsky, Vinod Nair, Geoffrey Hinton
+            Website: https://www.cs.toronto.edu/~kriz/cifar.html
+            Reference: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
+
+            [Krizhevsky, 2009]
+                Alex Krizhevsky.
+                "Learning Multiple Layers of Features from Tiny Images",
+                Tech Report, 2009.
+
+            The CIFAR-100 dataset is a labeled subset of the 80
+            million tiny images dataset. It consists of 60000
+            32x32 colour images in 100 classes. Specifically, it
+            has 100 classes containing 600 images each. There are
+            500 training images and 100 testing images per class.
+            The 100 classes in the CIFAR-100 are grouped into 20
+            superclasses. Each image comes with a "fine" label
+            (the class to which it belongs) and a "coarse" label
+            (the superclass to which it belongs).
+
+            The compressed archive file that contains the
+            complete dataset is available for download at the
+            official website linked above; specifically the binary
+            version for C programs. We can download and unpack
+            this archive for you if you wish, but that doesn't
+            free you from the burden of using the data
+            responsibly and respecting copyright. The authors of
+            CIFAR-100 aren't really explicit about any terms of
+            use, so please read the website to make sure you want
+            to download the dataset.
+            """,
+            "https://www.cs.toronto.edu/~kriz/cifar-100-binary.tar.gz",
+            "58a81ae192c23a4be8b1804d68e518ed807d710a4eb253b1f2a199162a40d8ec",
+            fetch_method = (src, dst) -> run(BinDeps.download_cmd(src, dst)),
+            post_fetch_method = file -> (run(BinDeps.unpack_cmd(file,dirname(file), ".gz", ".tar")); rm(file))
+        )
+    end
+end
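
The registration above passes a 64-character hex string next to the archive URL, which looks like a SHA-256 digest used to validate the download. As a rough illustration of that kind of check, and not the DataDeps implementation itself, the sketch below compares a file's digest with an expected value. It assumes Julia 1.x with the `SHA` standard library; `matches_checksum` and `EMPTY_SHA256` are hypothetical names introduced here.

```julia
using SHA   # standard library in Julia 1.x

# Hypothetical helper: compare a file's SHA-256 digest with an expected hex string.
function matches_checksum(path::AbstractString, expected_hex::AbstractString)
    actual = open(path, "r") do io
        bytes2hex(sha256(io))          # digest of the whole stream as lowercase hex
    end
    actual == lowercase(expected_hex)
end

# Demonstrate on a temporary empty file; the SHA-256 digest of empty input is well known.
path, io = mktemp()
close(io)
const EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
println(matches_checksum(path, EMPTY_SHA256))
rm(path)
```

For the real archive, the expected value would be the hash string given in the registration, assuming it is indeed a SHA-256 digest.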

src/CIFAR100/README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# CIFAR-100
+
+Description from the [original
+website](https://www.cs.toronto.edu/~kriz/cifar.html)
+
+> The CIFAR-10 and CIFAR-100 are labeled subsets of the
+> [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/)
+> dataset. They were collected by Alex Krizhevsky, Vinod Nair,
+> and Geoffrey Hinton.
+>
+> This dataset is just like the CIFAR-10, except it has 100
+> classes containing 600 images each. There are 500 training
+> images and 100 testing images per class. The 100 classes in the
+> CIFAR-100 are grouped into 20 superclasses. Each image comes
+> with a "fine" label (the class to which it belongs) and a
+> "coarse" label (the superclass to which it belongs).
+
+## Usage
+
+This sub-module provides a programmatic interface to download,
+load, and work with the CIFAR-100 dataset.
+
+```julia
+using MLDatasets
+
+# download dataset
+CIFAR100.download()
+
+# load full training set
+train_x, train_y_coarse, train_y_fine = CIFAR100.traindata()
+
+# load full test set
+test_x, test_y_coarse, test_y_fine = CIFAR100.testdata()
+```
+
+The provided functions also allow for optional arguments, such as
+the directory `dir` where the dataset is located, or the specific
+observation `indices` that one wants to work with. For more
+information on the interface take a look at the documentation
+(e.g. `?CIFAR100.traindata`).
+
+Function | Description
+---------|-------------
+`download([dir])` | Trigger interactive download of the dataset
+`classnames_coarse()` | Return the 20 super-class names as a vector of strings
+`classnames_fine()` | Return the 100 class names as a vector of strings
+`traintensor([T], [indices]; [dir])` | Load the training images as an array of eltype `T`
+`trainlabels([indices]; [dir])` | Load the labels for the training images
+`testtensor([T], [indices]; [dir])` | Load the test images as an array of eltype `T`
+`testlabels([indices]; [dir])` | Load the labels for the test images
+`traindata([T], [indices]; [dir])` | Load images and labels of the training data
+`testdata([T], [indices]; [dir])` | Load images and labels of the test data
+
+This module also provides utility functions to make working with
+the CIFAR100 dataset in Julia more convenient.
+
+You can use the function `convert2features` to convert the given
+CIFAR100 tensor to a feature matrix (or feature vector in the case
+of a single image). The purpose of this function is to drop the
+spatial dimensions such that traditional ML algorithms can
+process the dataset.
+
+```julia
+julia> CIFAR100.convert2features(CIFAR100.traintensor()) # full training data
+3072×50000 Array{N0f8,2}:
+[...]
+```
+
+To visualize an image or a prediction we provide the function
+`convert2image` to convert the given CIFAR100 horizontal-major
+tensor (or feature matrix) to a vertical-major `Colorant` array.
+
+```julia
+julia> CIFAR100.convert2image(CIFAR100.traintensor(1)) # first training image
+32×32 Array{RGB{N0f8},2}:
+[...]
+```
+
+## References
+
+- **Authors**: Alex Krizhevsky, Vinod Nair, Geoffrey Hinton
+
+- **Website**: https://www.cs.toronto.edu/~kriz/cifar.html
+
+- **[Krizhevsky, 2009]** Alex Krizhevsky. ["Learning Multiple Layers of Features from Tiny Images"](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf), Tech Report, 2009.
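
The sub-module README describes `convert2features` as dropping the spatial dimensions so that each image becomes one column of a feature matrix. The idea can be sketched with a plain `reshape`. This is only an illustration for Julia 1.x on synthetic data; `flatten_features` is a hypothetical helper and is not claimed to match the package's implementation.

```julia
# Hypothetical helper, not the package's convert2features:
# collapse height x width x channel into a single feature dimension.
flatten_features(x::AbstractArray{T,4}) where {T} = reshape(x, :, size(x, 4))  # many images -> matrix
flatten_features(x::AbstractArray{T,3}) where {T} = vec(x)                     # one image  -> vector

batch = rand(UInt8, 32, 32, 3, 5)          # five synthetic images in place of real data
features = flatten_features(batch)

println(size(features))                                  # (3072, 5)
println(size(flatten_features(batch[:, :, :, 1])))       # (3072,)
```

With 32 x 32 pixels and 3 channels, each column holds 3072 values, which matches the `3072×50000` shape shown for the full training set in the README.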

src/CIFAR100/Reader/Reader.jl

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+module Reader
+
+    export
+
+        readdata!,
+        readdata
+
+    const NROW = 32
+    const NCOL = 32
+    const NCHAN = 3
+    const NBYTE = NROW * NCOL * NCHAN + 2 # "+ 2" for label
+
+    function readnext!(buffer::Array{UInt8}, io::IO)
+        c = Int(read(io, UInt8))
+        f = Int(read(io, UInt8))
+        read!(io, buffer)
+        buffer, c, f
+    end
+
+    function readdata!(buffer::Array{UInt8}, io::IO, index::Integer)
+        seek(io, (index - 1) * NBYTE)
+        readnext!(buffer, io)
+    end
+
+    function readdata(io::IO, nobs::Int, index::Integer)
+        buffer = Array{UInt8}(NROW, NCOL, NCHAN)
+        readdata!(buffer, io, index)
+    end
+
+    function readdata(io::IO, nobs::Int)
+        X = Array{UInt8}(NROW, NCOL, NCHAN, nobs)
+        C = Array{Int}(nobs)
+        F = Array{Int}(nobs)
+        buffer = Array{UInt8}(NROW, NCOL, NCHAN)
+        @inbounds for index in 1:nobs
+            _, tc, tf = readnext!(buffer, io)
+            copy!(view(X,:,:,:,index), buffer)
+            C[index] = tc
+            F[index] = tf
+        end
+        X, C, F
+    end
+
+    function readdata(file::AbstractString, nobs::Int, index::Integer)
+        open(file, "r") do io
+            readdata(io, nobs, index)
+        end::Tuple{Array{UInt8,3},Int,Int}
+    end
+
+    function readdata(file::AbstractString, nobs::Int)
+        open(file, "r") do io
+            readdata(io, nobs)
+        end::Tuple{Array{UInt8,4},Vector{Int},Vector{Int}}
+    end
+
+end
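
The reader above relies on every record having the same size: one coarse-label byte, one fine-label byte, then 32 x 32 x 3 pixel bytes, so record `index` starts at byte offset `(index - 1) * NBYTE`. The sketch below exercises that layout on synthetic bytes. It is written for Julia 1.x (hence `Array{UInt8}(undef, ...)` rather than the older constructor used in the commit), and `fake_record`, `read_record`, `PIXEL_BYTES`, and `RECORD_BYTES` are hypothetical names introduced for illustration.

```julia
const NROW = 32
const NCOL = 32
const NCHAN = 3
const PIXEL_BYTES  = NROW * NCOL * NCHAN     # 3072 bytes of pixel data
const RECORD_BYTES = PIXEL_BYTES + 2         # plus one coarse and one fine label byte

# Build one synthetic record: coarse label, fine label, then constant pixel bytes.
fake_record(coarse, fine, fill_value) =
    vcat(UInt8[coarse, fine], fill(UInt8(fill_value), PIXEL_BYTES))

# Read the record at 1-based position `index` from a seekable stream.
function read_record(io::IO, index::Integer)
    seek(io, (index - 1) * RECORD_BYTES)     # fixed-size records make the offset a simple product
    coarse = Int(read(io, UInt8))
    fine   = Int(read(io, UInt8))
    pixels = Array{UInt8}(undef, NROW, NCOL, NCHAN)
    read!(io, pixels)
    pixels, coarse, fine
end

# Three synthetic records stand in for a real train.bin or test.bin file.
io = IOBuffer(vcat(fake_record(4, 30, 0),
                   fake_record(11, 72, 128),
                   fake_record(19, 99, 255)))

pixels, coarse, fine = read_record(io, 2)
println((size(pixels), coarse, fine, Int(pixels[1])))
```

Because the offset depends only on `index`, a single observation can be loaded without reading the records before it, which is what the indexed `readdata` methods in the diff take advantage of.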
