# Lab 2: Augment Dataset & Train Model

Estimated Time: 40 minutes

## Task 1: Hyperparameters & Checkpoints

The most important part of training a model is choosing the right **hyperparameters**. In this section, I'll explain the parameters I usually use, and why these are recommended for this specific problem.

Then, once we have the hyperparameters set, we just need to launch the training process.

### Training Parameters

We're ready to make a couple of extra decisions regarding which parameters we'll use during training.

It's important to choose the right parameters, as a poor choice can produce terrible models. So, let's dive into what matters about each training parameter. The official documentation can be found [here](https://docs.ultralytics.com/config/).

* `--device`: specifies which device (a CUDA device or the CPU) we want to use. Since we're working with an OCI CPU Instance, we'll set this to "cpu", which performs training on the machine's CPU.
* `--epochs`: the total number of epochs we want to train the model for. I set this to 3000 epochs, although my model converged very precisely long before the 3000th epoch was reached.

> **Note**: YOLOv5 (and many other Neural Networks) implements a mechanism called **early stopping/patience**, which stops training before the specified number of epochs if it can't find a way to improve the mAP (Mean Average Precision) for any class. In YOLOv5, this is controlled by the `--patience` option.

* `--batch`: the batch size. I set this to either 16 or 32 images per batch. Setting a lower value (especially considering that my dataset already has 10,000 images) is usually a *bad practice* and can cause instability.
* `lr0` (learning rate): I keep the default initial learning rate of 0.01. Note that in YOLOv5 this lives in the hyperparameter YAML file (see `--hyp` below) rather than being a command-line flag.
* `--img` (image size): this parameter was probably the one that gave me the most trouble. I initially thought that all images -- if trained with a specific image size -- must always follow this size; however, you don't need to worry about this thanks to image subsampling and other techniques that are implemented to avoid the issue. This value should be the maximum of the height and width of the pictures, averaged across the dataset (see the sketch below).
* `--save-period`: specifies how often the model should save a copy of its state. For example, if I set this to 25, it will create a YOLOv5 checkpoint that I can use every 25 trained epochs.
* `--hyp`: specifies a custom YAML file that will contain the set of hyperparameters for our model. We will talk more specifically about this option in the next section.

> **Note**: if I have 1,000 images with an average width of 1920 and height of 1080, I'll probably create a model with image size = 640 and subsample my images. If I have issues with detections, I might create a model with a higher image size value, but training time will ramp up, and inference will also require more computing power.
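
To make the `--img` rule of thumb concrete, here is a minimal sketch that averages the larger dimension of every training image. It assumes ImageMagick's `identify` is installed, and the dataset path shown is only an example of a YOLOv5-style layout; adjust it to wherever your images live.

```
# average of max(width, height) across the training images
# (requires ImageMagick's `identify`; the path below is only an example)
identify -format "%w %h\n" ./datasets/y5_mask_model_v1/images/train/*.jpg \
  | awk '{ m = ($1 > $2) ? $1 : $2; sum += m; n++ } END { printf "average max dimension: %.0f\n", sum / n }'
```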
### YOLO Checkpoints - Which One to Choose?

The second and last decision we need to make is which YOLOv5 checkpoint we're going to start from. It's **highly recommended** that you start training from one of the available checkpoints:

![yolov5 checkpoints](./images/yolov5_performance.jpg)

> **Note**: you can also start training 100% from scratch, without any checkpoints. You should only do this if what you're trying to detect looks nothing like what these checkpoints have already seen, e.g. astrophotography. The upside of using a checkpoint is that YOLOv5 has already been trained up to a point, with real-world data. So, anything that resembles the real world can easily be trained from a checkpoint, which will help you reduce training time (and therefore expense).

Typically, the higher a checkpoint's average precision, the more parameters it contains. Here's a detailed comparison of all available pre-trained checkpoints:

| Model | size<br><sup>(pixels)</sup> | Mean Average Precision<sup>val<br>50-95</sup> | Mean Average Precision<sup>val<br>50</sup> | Speed<br><sup>CPU b1<br>(ms)</sup> | Speed<br><sup>V100 b1<br>(ms)</sup> | Speed<br><sup>V100 b32<br>(ms)</sup> | Number of parameters<br><sup>(M)</sup> | FLOPs<br><sup>@640 (B)</sup> |
| ----- | ------------ | ------------------------------ | --------------------------- | --------------- | ---------------- | ----------------- | ----------------------- | ------------- |
| [YOLOv5n](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5n.pt) | 640 | 28.0 | 45.7 | **45** | **6.3** | **0.6** | **1.9** | **4.5** |
| [YOLOv5s](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt) | 640 | 37.4 | 56.8 | 98 | 6.4 | 0.9 | 7.2 | 16.5 |
| [YOLOv5m](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5m.pt) | 640 | 45.4 | 64.1 | 224 | 8.2 | 1.7 | 21.2 | 49.0 |
| [YOLOv5l](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5l.pt) | 640 | 49.0 | 67.3 | 430 | 10.1 | 2.7 | 46.5 | 109.1 |
| [YOLOv5x](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5x.pt) | 640 | 50.7 | 68.9 | 766 | 12.1 | 4.8 | 86.7 | 205.7 |
| | | | | | | | | |
| [YOLOv5n6](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5n6.pt) | 1280 | 36.0 | 54.4 | 153 | 8.1 | 2.1 | 3.2 | 4.6 |
| [YOLOv5s6](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s6.pt) | 1280 | 44.8 | 63.7 | 385 | 8.2 | 3.6 | 12.6 | 16.8 |
| [YOLOv5m6](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5m6.pt) | 1280 | 51.3 | 69.3 | 887 | 11.1 | 6.8 | 35.7 | 50.0 |
| [YOLOv5l6](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5l6.pt) | 1280 | 53.7 | 71.3 | 1784 | 15.8 | 10.5 | 76.8 | 111.4 |
| [YOLOv5x6](https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5x6.pt)<br>+[TTA](https://github.com/ultralytics/yolov5/issues/303) | 1280<br>1536 | 55.0<br>**55.8** | 72.7<br>**72.7** | 3136<br>- | 26.2<br>- | 19.4<br>- | 140.7<br>- | 209.8<br>- |

> **Note**: all checkpoints have been trained for 300 epochs with the default settings (find all of them [in the official docs](https://docs.ultralytics.com/config/)). The nano and small versions use [these hyperparameters](https://github.com/ultralytics/yolov5/blob/master/data/hyps/hyp.scratch-low.yaml); all others use [these](https://github.com/ultralytics/yolov5/blob/master/data/hyps/hyp.scratch-high.yaml).

YOLOv8 also has checkpoints that follow a similar naming convention, so if you're using YOLOv8 instead of YOLOv5, you will still need to decide which checkpoint best fits your problem.

Also, note that if we want to create a model with an _`image size>640`_, we should select one of the YOLOv5 checkpoints whose name ends with the number `6`.

So, for this model, since I will use 640 pixels, we will just create a first version using **YOLOv5s** and another one with **YOLOv5x**. You only really need to train one, but if you have extra time, it's interesting to compare the differences between two (or more) models trained against the same dataset.
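
You can fetch both checkpoints ahead of time with the release URLs from the table above. This step is optional, since YOLOv5 normally downloads a missing checkpoint automatically the first time it is passed to `--weights`:

```
# optional: pre-download the two checkpoints used in this lab
cd /home/$USER/yolov5
wget https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt
wget https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5x.pt
```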
## Task 2: Augment Dataset

In this part, we're going to augment our dataset. Now that we have some knowledge of the available checkpoints and training parameters, I'm going to focus on a parameter that is **specifically created** for data augmentation: _`--hyp`_.

This option allows us to specify a custom YAML file that will hold the values for all hyperparameters of our Computer Vision model.

In our YOLOv5 repository, we go to the default YAML path:

```
<copy>
cd /home/$USER/yolov5/data/hyps/
</copy>
```

Now, we can copy one of these files and start modifying its hyperparameters at our convenience. For this specific problem, I'm not going to use every customization, since we already augmented our dataset quite a lot in the previous workshop. Instead, I will explain the augmentations that are usually used for a problem of this type.
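
For example, copying could look like the following minimal sketch; `hyp.mask-custom.yaml` is a hypothetical file name used only for illustration, and `hyp.scratch-low.yaml` is the low-augmentation default that ships with the repository:

```
# still inside /home/$USER/yolov5/data/hyps/
cp hyp.scratch-low.yaml hyp.mask-custom.yaml
# open the copy and tweak values such as lr0, hsv_*, degrees, fliplr, mosaic or mixup
nano hyp.mask-custom.yaml
```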
Here are all available augmentations:

![augmentation types](./images/initial_parameters.png)

The most notable ones are:
- _`lr0`_: the initial learning rate. If you want to use the SGD optimizer, set this option to `0.01`. If you want to use Adam, set it to `0.001`.
- _`hsv_h`_, _`hsv_s`_, _`hsv_v`_: allow us to control HSV modifications to the image. We can change the **H**ue, **S**aturation, or **V**alue of the image.
- _`degrees`_: rotates the image and lets the model learn how to detect objects at different camera orientations.
- _`translate`_: translates the image, displacing it to the right or to the left.
- _`scale`_: resizes selected images (by a larger or smaller percentage gain).
- _`shear`_: creates new images from a new viewing perspective. The changing axis is horizontal, but it works like opening a door in real life.
- _`flipud`_, _`fliplr`_: these simply take an image and flip it either "upside down" or "left to right", generating exact copies of the image in reverse. This teaches the model how to detect objects from different camera angles. Also notice that _`flipud`_ works in very limited scenarios, mostly with satellite imagery, while _`fliplr`_ is better suited for ground-level pictures of any sort (which covers the vast majority of Computer Vision models nowadays).
- _`mosaic`_: this takes four images from the dataset and creates a mosaic. This is particularly useful when we want to teach the model to detect smaller-than-usual objects, as each detection in the mosaic is "harder" for the model: each object we want to predict is represented by fewer pixels.
- _`mixup`_: I have found this augmentation method particularly useful when training **classification** models. It mixes two images, one with more transparency and one with less, and lets the model learn the differences between two _problematic_ classes.

Once we create a separate YAML file for our custom augmentation, we can use it in training by setting the _`--hyp`_ option. We'll see how to do that right below.

## Task 3: Train Model

Now that we have chosen our hyperparameters and checkpoint, we just need to run the following commands. To execute training, we first navigate to YOLOv5's cloned repository path:

```
<copy>
cd /home/$USER/yolov5
</copy>
```

And then, start training:

```
<copy>
~/anaconda3/bin/python train.py --img 640 --data <data.yaml path in dataset> --weights <yolo checkpoint path> --name <final_training_project_name> --save-period 25 --device cpu --batch 16 --epochs 3000
</copy>
```
> **Note**: if you don't specify a custom _`--hyp`_ file, augmentation will still happen in the background, but it won't be customizable. Refer to the YOLO checkpoint section above to see which default YAML file each checkpoint uses. If you want to specify custom augmentations, make sure to add this option to the command above.

```
<copy>
# for yolov5s
~/anaconda3/bin/python train.py --img 640 --data ./datasets/y5_mask_model_v1/data.yaml --weights yolov5s.pt --name markdown --save-period 25 --device cpu --batch 16 --epochs 3000

# for yolov5x
~/anaconda3/bin/python train.py --img 640 --data ./datasets/y5_mask_model_v1/data.yaml --weights yolov5x.pt --name y5_mask_detection --save-period 25 --device cpu --batch 16 --epochs 3000
</copy>
```
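
If you created a custom hyperparameter file in Task 2, the only change is the extra `--hyp` argument. A sketch, reusing the hypothetical `hyp.mask-custom.yaml` from earlier (the `--name` value is also just an example):

```
# same training command, but pointing --hyp at the custom augmentation file
~/anaconda3/bin/python train.py --img 640 --data ./datasets/y5_mask_model_v1/data.yaml --weights yolov5s.pt --name y5_mask_custom_hyp --save-period 25 --device cpu --batch 16 --epochs 3000 --hyp data/hyps/hyp.mask-custom.yaml
```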
And the model will start training. Depending on the size of the dataset, each epoch will take more or less time. In my case, with 10,000 images, each epoch took about 2 minutes to train and 20 seconds to validate.

![Training GIF](./images/training.gif)

For each epoch, we get broken-down information about epoch training time and the model's mAP, so we can see how our model progresses over time.
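
Each run is also logged under `runs/train/<name>`, so if you prefer interactive charts to the console output, here is a possible sketch, assuming TensorBoard is installed in the same Python environment:

```
# optional: browse the training curves of all runs in a web UI
~/anaconda3/bin/tensorboard --logdir runs/train
```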
## Task 4: Check Results

After the training is done, we can have a look at the results. Visualizations are provided automatically, and they are pretty similar to what we discovered in the previous workshop using RoboFlow Train.

Some images, visualizations, and statistics about training are saved in the destination folder. With these visualizations, we can improve our understanding of our data, the mean average precisions, and many other things that will help us improve the model in the next iteration.

For example, we can see how well each class in our dataset is represented:

![Number of instances per class](./images/num_instances.jpg)

> **Note**: this means that both the `incorrect` and `no mask` classes are underrepresented compared to the `mask` class. An idea for the future is to increase the number of examples for both of these underrepresented classes.
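
Since YOLO label files store the class index as the first field of every annotation line, you can verify the class balance yourself with a quick sketch (the `labels/train` path is an assumption about how the dataset was exported):

```
# count annotations per class index across the training labels
cat ./datasets/y5_mask_model_v1/labels/train/*.txt | awk '{ print $1 }' | sort | uniq -c
```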
The confusion matrix tells us how many predictions on images in the validation set were correct, and how many weren't:

![confusion matrix](./images/confusion_matrix.jpg)

As we specified earlier, our model autosaves its training progress every 25 epochs through the _`--save-period`_ option. This causes the resulting directory to be about 1 GB in size.

In the end, we only care about the best-performing model out of all the checkpoints, so let's keep _`best.pt`_ as the best model for the training we performed (the model with the highest mAP of all checkpoints) and delete all the others.
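
With `--save-period 25`, the run's `weights` folder fills up with periodic checkpoints next to `best.pt` and `last.pt`. A cleanup sketch, assuming the default `runs/train/<name>/weights` layout and the run name used above:

```
# keep only the best checkpoint of this run (path and run name are examples)
cd /home/$USER/yolov5/runs/train/y5_mask_detection/weights
find . -name "*.pt" ! -name "best.pt" -delete
```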
The model took **168** epochs to finish (early stopping kicked in, having found the best model at the 68th epoch), with an average of **10 minutes** per epoch.

Remember that training time can be significantly reduced if you train on a GPU. You can rent an OCI GPU at a fraction of the price of comparable GPUs from other Cloud vendors. For example, I originally trained this model with 2 OCI Compute NVIDIA V100s *for just **$2.50/hr***, and training time went from ~30 hours to about 6 hours.

This is a list of the mAPs, broken down by class:

![results](./images/results.jpg)

The model has a notable mAP of **70%**. This is awesome, but it can always be improved with a bigger dataset and by fine-tuning our augmentation and training hyperparameters. Keep in mind that real-world problems like this one will never achieve 100% accuracy due to the nature of the problem.
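
If you later want to regenerate these per-class metrics from the saved weights, YOLOv5's bundled validation script can do it; here is a sketch using the example paths from this lab:

```
# re-compute validation mAP for the best checkpoint
~/anaconda3/bin/python val.py --weights runs/train/y5_mask_detection/weights/best.pt --data ./datasets/y5_mask_model_v1/data.yaml --img 640
```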
## Acknowledgements

* **Author** - Nacho Martinez, Data Science Advocate @ Oracle DevRel
* **Last Updated By/Date** - March 6th, 2023