# ---
# cover: assets/hog.gif
# title: Object Detection using HOG
# description: This demo shows how to use the HOG descriptor to build a person detector
# author: Anchit Navelkar, Ashwani Rathee
# date: 2021-07-12
# ---

# In this tutorial, we will use a linear SVM trained on Histogram of Oriented
# Gradients (HOG) feature descriptors to create a person detector. We will first
# create a person classifier and then use this classifier with a sliding window
# to identify and localize people in an image.

# The key challenge in creating a classifier is that it needs to work with
# variations in illumination, pose, and occlusion in the image. To achieve this,
# we will train the classifier on an intermediate representation of the image
# instead of the pixel-based representation. Our ideal representation (commonly
# called a feature vector) captures information which is useful for classification
# but is invariant to small changes in illumination and occlusion. The HOG descriptor
# is a gradient-based representation which is invariant to local geometric and
# photometric changes (i.e. shape and illumination changes) and so is a good
# choice for our problem. In fact, HOG descriptors are widely used for object detection.

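# As a quick sanity check of the dimensions used below (a sketch, using a random
# image as a stand-in for real data), the default `HOG()` parameters in
# ImageFeatures on a 128x64 image produce a 3780-element descriptor: 9 orientation
# bins, 8x8-pixel cells, and 2x2-cell blocks give 15x7 block positions, and
# 15\*7\*4\*9 = 3780 values.
#
# ```julia
# using Images, ImageFeatures
#
# sample = rand(Gray{Float64}, 128, 64)        # stand-in for a 128x64 training image
# descriptor = create_descriptor(sample, HOG())
# length(descriptor)                           # 3780
# ```
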
# Download the script to get the training data [here](https://drive.google.com/file/d/11G_9zh9N-0veQ2EL5WDGsnxRpihsqLX5/view?usp=sharing).
# Download `tutorial.zip`, decompress it and run `get_data.bash`. (Change the
# variable `path_to_tutorial` in `preprocess.jl` and the path to the julia
# executable in `get_data.bash`.) This script will download the required datasets.
# We will start by loading the data and computing HOG features of all the images.

# ```julia
# using Images, ImageFeatures
#
# path_to_tutorial = "" # specify this path
# pos_examples = "$path_to_tutorial/tutorial/humans/"
# neg_examples = "$path_to_tutorial/tutorial/not_humans/"
#
# n_pos = length(readdir(pos_examples)) # number of positive training examples
# n_neg = length(readdir(neg_examples)) # number of negative training examples
# n = n_pos + n_neg                     # number of training examples
# data = Array{Float64}(undef, 3780, n) # array to store the HOG descriptor of each image;
#                                       # each 128x64 training image yields a 3780-element descriptor
# labels = Vector{Int}(undef, n)        # vector to store the label (1=human, 0=not human) of each image
#
# for (i, file) in enumerate([readdir(pos_examples); readdir(neg_examples)])
#     filename = "$(i <= n_pos ? pos_examples : neg_examples)/$file"
#     img = load(filename)
#     data[:, i] = create_descriptor(img, HOG())
#     labels[i] = (i <= n_pos ? 1 : 0)
# end
# ```

# We now have an encoded version of the images in our training data. This
# encoding captures useful information but discards extraneous information
# (illumination changes, pose variations, etc.). We will train a linear SVM on this data.

# ```julia
# using LIBSVM, Random
#
# # Split the dataset into a train set (2500 images) and a test set (294 images).
# random_perm = randperm(n)
# train_ind = random_perm[1:2500]
# test_ind = random_perm[2501:end]
#
# model = svmtrain(data[:, train_ind], labels[train_ind]);
# ```

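# The split above is random, so exact numbers will vary from run to run. If you
# want a reproducible split (an optional convenience, not part of the original
# script), seed the global RNG before calling `randperm`:
#
# ```julia
# using Random
# Random.seed!(1234) # any fixed seed makes the train/test split repeatable
# random_perm = randperm(n)
# ```
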
# Now let's test this classifier on some images.

# ```julia
# using Printf, Statistics
#
# img = load("$pos_examples/per00003.ppm")
# descriptor = Array{Float64}(undef, 3780, 1)
# descriptor[:, 1] = create_descriptor(img, HOG())
#
# predicted_label, _ = svmpredict(model, descriptor);
# print(predicted_label) # 1=human, 0=not human
#
# # Get the test accuracy of our model
# predicted_labels, decision_values = svmpredict(model, data[:, test_ind]);
# @printf "Accuracy: %.2f%%\n" mean(predicted_labels .== labels[test_ind]) * 100 # test accuracy should be > 98%
# ```

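# Accuracy alone can hide which class the mistakes come from. A small
# confusion-matrix tally (a sketch reusing `predicted_labels` and `labels` from
# above) breaks the test errors down by class:
#
# ```julia
# # rows = true class (0, 1), columns = predicted class (0, 1)
# cm = zeros(Int, 2, 2)
# for (t, p) in zip(labels[test_ind], predicted_labels)
#     cm[t+1, p+1] += 1
# end
# @show cm # off-diagonal entries are the misclassified test images
# ```
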
# Try testing our trained model on more images. You can see that it performs quite well.

# |  |  |
# |:------:|:---:|
# | predicted_label = 1 | predicted_label = 1 |

# |  |  |
# |:------:|:---:|
# | predicted_label = 1 | predicted_label = 0 |

# Next we will use our trained classifier with a sliding window to localize persons in an image.

# 

# ```julia
# img = load("$path_to_tutorial/tutorial/humans.jpg")
# rows, cols = size(img)
#
# # One score per window position; the window slides in steps of 10 pixels,
# # so for this image the score grid is 22x45.
# scores = Array{Float64}(undef, length(64:10:rows-64), length(32:10:cols-32))
# descriptor = Array{Float64}(undef, 3780, 1)
#
# # Apply the classifier with a sliding window and store the classification
# # score for the not-human class at every window position.
# for j = 32:10:cols-32
#     for i = 64:10:rows-64
#         box = img[i-63:i+64, j-31:j+32]
#         descriptor[:, 1] = create_descriptor(box, HOG())
#         predicted_label, s = svmpredict(model, descriptor)
#         scores[Int((i - 64) / 10)+1, Int((j - 32) / 10)+1] = s[1]
#     end
# end
# ```

# 

# You can see that the classifier gave a low score to the not-human class
# (i.e. a high score to the human class) at positions corresponding to humans
# in the original image.
# Below we threshold the scores and suppress non-minimal values to get
# the human locations. We then plot the bounding boxes using `ImageDraw`.

# ```julia
# using ImageDraw, ImageView
#
# scores[scores .> 0] .= 0           # keep only windows that score toward "human"
# object_locations = findlocalminima(scores)
#
# rectangles = [
#     [
#         ((i[2] - 1) * 10 + 1, (i[1] - 1) * 10 + 1),
#         ((i[2] - 1) * 10 + 64, (i[1] - 1) * 10 + 1),
#         ((i[2] - 1) * 10 + 64, (i[1] - 1) * 10 + 128),
#         ((i[2] - 1) * 10 + 1, (i[1] - 1) * 10 + 128),
#     ] for i in object_locations
# ];
#
# for rec in rectangles
#     draw!(img, Polygon(rec), RGB{N0f8}(0, 0, 1.0))
# end
# imshow(img)
# ```

# 

# In our example we were lucky that the persons in our image had roughly
# the same size (128x64) as the examples in our training set. In general we
# would need to take bounding boxes across multiple scales (and multiple
# aspect ratios for some object classes).

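# One common way to handle this is an image pyramid: rescale the image, rerun the
# same sliding window at each scale, and map detections back through the scale
# factor. The sketch below assumes the `model` and the 128x64 window from above;
# the scale values are illustrative, not tuned.
#
# ```julia
# using Images # for imresize
#
# for scale in (0.6, 0.8, 1.0, 1.25)
#     scaled = imresize(img, ratio=scale)
#     srows, scols = size(scaled)
#     (srows < 128 || scols < 64) && continue # window no longer fits
#     for j = 32:10:scols-32, i = 64:10:srows-64
#         box = scaled[i-63:i+64, j-31:j+32]
#         _, s = svmpredict(model, reshape(create_descriptor(box, HOG()), :, 1))
#         # A window at (i, j) in the scaled image corresponds to a region of
#         # roughly 128/scale x 64/scale pixels centered near (i/scale, j/scale)
#         # in the original image.
#     end
# end
# ```
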
| 151 | +using FileIO #src |
| 152 | +img1 = load("assets/humans.jpg") #src |
| 153 | +img2 = load("assets/boxes.jpg") #src |
| 154 | +save("assets/hog.gif", cat(img1[1:342,1:342], img2[1:342,1:342]; dims=3); fps=2) #src |
| 155 | + |