
Commit e3e08c8

Tutorial - Object Detection using HOG (#42)

* added docs
* changes
* Added scripts for downloading and preprocessing data
* added test and edited doc
* edited docstring

1 parent 90d1222 commit e3e08c8

File tree

11 files changed: +126 -1 lines changed

docs/make.jl

Lines changed: 1 addition & 0 deletions

```diff
@@ -10,6 +10,7 @@ makedocs(format = :html,
         "FREAK" => "tutorials/freak.md",
         "Gray level co-occurence matrix" => "tutorials/glcm.md",
         "Local binary patterns" => "tutorials/lbp.md",
+        "Object Detection using HOG" => "tutorials/object_detection.md"
     ],
     "Function reference" => "function_reference.md",
 ],
```

docs/src/img/boxes.jpg (81.9 KB)

docs/src/img/human1.png (5.52 KB)

docs/src/img/human2.png (7.3 KB)

docs/src/img/human3.png (6.54 KB)

docs/src/img/humans.jpg (81.1 KB)

docs/src/img/not-human1.jpg (5.25 KB)

docs/src/img/scores.png (5.15 KB)

docs/src/tutorials/object_detection.md

Lines changed: 120 additions & 0 deletions

# Object Detection using HOG

In this tutorial, we will train a linear SVM on Histogram of Oriented Gradients (HOG) feature descriptors to build a person detector. We will first create a person classifier and then use this classifier with a sliding window to identify and localize people in an image.

The key challenge in creating a classifier is that it needs to work despite variations in illumination, pose, and occlusion. To achieve this, we will train the classifier on an intermediate representation of the image instead of the pixel-based representation. An ideal representation (commonly called a feature vector) captures the information that is useful for classification while remaining invariant to small changes in illumination and occlusion. The HOG descriptor is a gradient-based representation that is invariant to local geometric and photometric changes (i.e. shape and illumination changes), which makes it a good choice for our problem; HOG descriptors are widely used for object detection.

Download the script to get the training data [here](https://drive.google.com/open?id=0B9V0KF3ZHWtWR1dBR2VZUDctUGc). Download tutorial.zip, decompress it, and run get_data.bash (change the variable `path_to_tutorial` in preprocess.jl and the path to the julia executable in get_data.bash). This script will download the required datasets. We will start by loading the data and computing the HOG features of all the images.

```julia
using Images, ImageFeatures

path_to_tutorial = ""
pos_examples = "$path_to_tutorial/tutorial/humans/"
neg_examples = "$path_to_tutorial/tutorial/not_humans/"

n_pos = length(readdir(pos_examples))   # number of positive training examples
n_neg = length(readdir(neg_examples))   # number of negative training examples
n = n_pos + n_neg                       # number of training examples
data = Array{Float64}(3780, n)          # array to store the HOG descriptor of each image; each 128x64 training image yields a 3780-length descriptor (see below)
labels = Vector{Int}(n)                 # vector to store the label (1 = human, 0 = not human) of each image

for (i, file) in enumerate([readdir(pos_examples); readdir(neg_examples)])
    filename = "$(i <= n_pos ? pos_examples : neg_examples)/$file"
    img = load(filename)
    data[:, i] = create_descriptor(img, HOG())
    labels[i] = (i <= n_pos ? 1 : 0)
end
```
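
Where does 3780 come from? Assuming ImageFeatures' default HOG parameters (9 orientations, 8x8-pixel cells, and 2x2-cell blocks with a one-cell stride; an assumption about the package defaults, so check your installed version), a 128x64 window has a 16x8 grid of cells and therefore a 15x7 grid of overlapping blocks, each contributing 2x2x9 = 36 values:

```julia
# Descriptor length for a 128x64 window under the assumed default HOG parameters.
cells = (128 ÷ 8, 64 ÷ 8)                               # 16 x 8 cells of 8x8 pixels
blocks = (cells[1] - 1, cells[2] - 1)                   # 15 x 7 overlapping 2x2-cell blocks
descriptor_length = blocks[1] * blocks[2] * 2 * 2 * 9   # 3780
```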

We now have an encoded version of the images in our training data. This encoding captures useful information but discards extraneous detail (illumination changes, pose variations, etc.). We will train a linear SVM on this data.

```julia
using LIBSVM

# Split the dataset into a train set (2500 images) and a test set (294 images).
random_perm = randperm(n)
train_ind = random_perm[1:2500]
test_ind = random_perm[2501:end]

# LIBSVM expects one column per example, which is why `data` is 3780 x n.
model = svmtrain(data[:, train_ind], labels[train_ind]);
```

Now let's test this classifier on some images.

```julia
img = load("$pos_examples/per00003.ppm")
descriptor = Array{Float64}(3780, 1)
descriptor[:, 1] = create_descriptor(img, HOG())

predicted_label, _ = svmpredict(model, descriptor);
print(predicted_label)   # 1 = human, 0 = not human

# Get the test accuracy of our model
predicted_labels, decision_values = svmpredict(model, data[:, test_ind]);
@printf "Accuracy: %.2f%%\n" mean((predicted_labels .== labels[test_ind]))*100   # test accuracy should be > 98%
```

Try testing our trained model on more images; you will see that it performs quite well. One convenient way to do this is a small wrapper around the load/describe/predict steps, sketched below.
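
A minimal sketch of such a wrapper (the name `is_human` is hypothetical; it reuses `model` and the paths from above and assumes a 128x64 input image):

```julia
# Hypothetical convenience wrapper: load one image file and classify it.
function is_human(filename, model)
    img = load(filename)
    descriptor = Array{Float64}(3780, 1)
    descriptor[:, 1] = create_descriptor(img, HOG())
    predicted_label, _ = svmpredict(model, descriptor)
    return predicted_label[1] == 1
end

is_human("$pos_examples/per00003.ppm", model)   # should return true
```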

| ![Original](../img/human1.png) | ![Original](../img/human2.png) |
|:------:|:---:|
| predicted_label = 1 | predicted_label = 1 |

| ![Original](../img/human3.png) | ![Original](../img/not-human1.jpg) |
|:------:|:---:|
| predicted_label = 1 | predicted_label = 0 |

Next we will use our trained classifier with a sliding window to localize people in an image.

![Original](../img/humans.jpg)

```julia
img = load("$path_to_tutorial/tutorial/humans.jpg")
rows, cols = size(img)

scores = Array{Float64}(22, 45)        # 22x45 grid of window positions for this particular image (see the note below)
descriptor = Array{Float64}(3780, 1)

# Apply the classifier with a sliding window (128x64 window, stride 10) and
# store the classification score for the not-human class at every location.
for j in 32:10:cols-32
    for i in 64:10:rows-64
        box = img[i-63:i+64, j-31:j+32]
        descriptor[:, 1] = create_descriptor(box, HOG())
        predicted_label, s = svmpredict(model, descriptor);
        scores[Int((i-64)/10)+1, Int((j-32)/10)+1] = s[1]
    end
end
```
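
The 22x45 shape of `scores` is specific to this image. Here is a sketch of deriving the grid size from the loop bounds above, so the same code works for other image sizes (the `grid_rows`/`grid_cols` names are illustrative):

```julia
# Derive the score-grid size from the sliding-window bounds used above.
grid_rows = length(64:10:rows-64)   # number of window positions down the image
grid_cols = length(32:10:cols-32)   # number of window positions across the image
scores = Array{Float64}(grid_rows, grid_cols)
```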

![Original](../img/scores.png)

You can see that the classifier gave a low score to the not-human class (i.e. a high score to the human class) at the positions corresponding to the people in the original image. Below we threshold the scores and suppress non-minimal values to get the human locations, then plot the bounding boxes using `ImageDraw`.

```julia
using ImageDraw, ImageView

scores[scores.>0] = 0                        # keep only windows that the SVM scored as human
object_locations = findlocalminima(scores)   # non-minimum suppression on the remaining scores

# Map each grid location back to pixel coordinates and build the four corners
# of its 128x64 bounding box.
rectangles = [[((i[2]-1)*10+1, (i[1]-1)*10+1), ((i[2]-1)*10+64, (i[1]-1)*10+1), ((i[2]-1)*10+64, (i[1]-1)*10+128), ((i[2]-1)*10+1, (i[1]-1)*10+128)] for i in object_locations];

for rec in rectangles
    draw!(img, Polygon(rec), RGB{N0f8}(0, 0, 1.0))
end
imshow(img)
```

![Original](../img/boxes.jpg)

In our example we were lucky that the people in the image were roughly the same size (128x64) as the examples in our training set. In general we would need to take bounding boxes across multiple scales (and multiple aspect ratios for some object classes), as sketched below.
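
A minimal sketch of the multi-scale idea, assuming `imresize` from Images (the scale set and the `scaled` variable are illustrative):

```julia
# Rerun the same 128x64 sliding window on rescaled copies of the image;
# detections are mapped back by dividing box coordinates by `scale`.
for scale in [0.75, 1.0, 1.25]
    scaled = imresize(img, (round(Int, rows*scale), round(Int, cols*scale)))
    # ... run the sliding-window loop above on `scaled`, then divide the
    # resulting rectangle coordinates by `scale`.
end
```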

src/hog.jl

Lines changed: 1 addition & 1 deletion

```diff
@@ -112,7 +112,7 @@ function create_hog_descriptor(mag::AbstractArray{T, 2}, phase::AbstractArray{T,
 end
 
 function trilinear_interpolate!(hist, w, θ, orientations, i, cell_size, cell_rows, cell_cols, rows, cols)
-    bin_θ1 = floor(Int, θ*orientations/180) + 1
+    bin_θ1 = min(floor(Int, θ*orientations/180) + 1, orientations)
     bin_θ2 = bin_θ1%orientations + 1
     b_θ = 180/orientations
```
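
(Reading this change: for an angle of exactly 180 degrees, `floor(Int, θ*orientations/180) + 1` evaluates to `orientations + 1`, one past the last histogram bin; the `min` clamps the index into range.)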
