Estimated Time: 40 minutes

In this section, we're going to learn about the benefits of augmenting datasets, the different ways in which this can be achieved, and how to properly train a model using on-demand infrastructure on Oracle Cloud Infrastructure (OCI).
### Prerequisites
* It's highly recommended to have completed [the first workshop](../../workshops/mask_detection_labeling/index.html) before starting this one, as we'll use some files and datasets that come from our work in the first workshop.
* An [Oracle Free Tier, Paid or LiveLabs Cloud Account](https://signup.cloud.oracle.com/?language=en&sourceType=:ow:de:ce::::RC_WWMK220210P00063:LoL_handsonLab_introduction&intcmp=:ow:de:ce::::RC_WWMK220210P00063:LoL_handsonLab_introduction)
* An active Oracle Cloud Account with available credits to use for the Data Science service.
### Objectives
In this lab, you will complete the following steps:

It's important to choose the right parameters, as doing otherwise can cause terrible results.
* `--device`: specifies which CUDA device (or, by default, the CPU) we want to use. Since we're working with an OCI CPU instance, we'll set this to `cpu`, which performs training on the machine's CPU.

* `--epochs`: the total number of epochs we want to train the model for. If the model doesn't find an improvement during training, it can stop earlier (see the note below). I set this to 3000 epochs, although my model converged very precisely long before the 3000th epoch was done.

> **Note**: YOLOv5 (like many neural networks) implements a mechanism called **early stopping**, or **patience**, which stops training before the specified number of epochs if it can't find a way to improve the mAP (mean Average Precision) for any class.

* `--batch`: the batch size. I set this to either 16 or 32 images per batch. Setting a lower value (considering that my dataset already has 10,000 images) is usually a *bad practice* and can cause instability.
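
Putting these flags together, a full training invocation could look like the sketch below. The dataset YAML path is an illustrative placeholder — point `--data` at your own dataset file:

```bash
<copy>
python train.py --device cpu --epochs 3000 --batch 16 --img 640 \
    --data /home/$USER/datasets/data.yaml --weights yolov5s.pt --save-period 25
</copy>
```
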
The higher the average precision of each checkpoint, the more parameters it contains.

YOLOv8 also has checkpoints with the above naming convention, so if you're using YOLOv8 instead of YOLOv5, you will still need to decide which checkpoint is best for your problem.

Also note that, if we want to create a model with an *`image size > 640`*, we should select the YOLOv5 checkpoints that end with the number `6`.

So, for this model, since I will use 640 pixels, we will just create a first version using **YOLOv5s**, and another one with **YOLOv5x**. You only really need to train one, but if you have extra time, it's interesting to see the differences between two (or more) models trained against the same dataset.

Image augmentation is a process through which you create new images based on existing images.

To decide which augmentations to apply and how they should be configured, we should ask ourselves the following:

*What types of augmentations will generate data that is beneficial for our use case?*

For example, aerial images might be taken in the early morning when the sun is rising, during the day when the sky is clear, on a cloudy day, or in the early evening. At these times, there will be different levels of brightness in the sky, and thus in the images. Modifying the brightness of images can therefore be considered a **great** augmentation for this example.

If we see a decrease in performance from our model with this augmentation, we can always roll it back by reverting to an earlier version of our dataset.
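
As a standalone illustration (this is not YOLOv5's code — just a minimal NumPy sketch), a brightness augmentation amounts to scaling the V channel of an HSV image, which is what YOLOv5's *`hsv_v`* hyperparameter controls:

```python
import numpy as np

def augment_value(hsv_image: np.ndarray, gain: float) -> np.ndarray:
    """Scale the V (brightness) channel of an HSV image by (1 + gain),
    clipping the result to the valid 0-255 range."""
    out = hsv_image.astype(np.float32)
    out[..., 2] = np.clip(out[..., 2] * (1.0 + gain), 0, 255)
    return out.astype(np.uint8)

# A tiny 2x2 HSV "image" with V = 200: darken for dawn, brighten for midday
img = np.full((2, 2, 3), (30, 128, 200), dtype=np.uint8)
dawn = augment_value(img, -0.4)     # V: 200 -> 120
midday = augment_value(img, 0.4)    # V: 200 -> 255 (clipped from 280)
```
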
Now that we have some knowledge of the set of checkpoints and training parameters we can specify, I'm going to focus on a parameter that is **specifically created** for data augmentation: *`--hyp`*.

This option allows us to specify a custom YAML file that will hold the values for all hyperparameters of our Computer Vision model.
In our YOLOv5 repository, we go to the default YAML path:

```bash
<copy>
cd /home/$USER/yolov5/data/hyps/
</copy>
```
Here are all the available augmentations:

* *`lr0`*: the initial learning rate. If you want to use the SGD optimizer, set this option to `0.01`. If you want to use Adam, set it to `0.001`.

* *`hsv_h`*, *`hsv_s`*, *`hsv_v`*: control HSV modifications to the image. We can change the **H**ue, **S**aturation, or **V**alue of the image. You can effectively change the brightness of a picture by modifying the *`hsv_v`* parameter, since the value channel carries information about intensity.

* *`degrees`*: rotates the image, letting the model learn to detect objects at different camera orientations.

* *`translate`*: translates the image, displacing it to the right or to the left.

* *`scale`*: resizes selected images (by a greater or smaller percentage gain).

* *`shear`*: creates new images from a new viewing perspective by randomly distorting the image across its horizontal axis (it works like opening a door in real life). RoboFlow also supports vertical shear.

* *`flipud`*, *`fliplr`*: flip an image either "upside down" or "left to right", generating mirrored copies of the image. This teaches the model to detect objects from different camera angles. Note that *`flipud`* is useful in very limited scenarios (mostly satellite imagery), while *`fliplr`* is better suited for ground-level pictures of any sort (which covers the vast majority of Computer Vision models nowadays).

* *`mosaic`*: takes four images from the dataset and combines them into a mosaic. This is particularly useful when we want to teach the model to detect smaller-than-usual objects, as each detection in the mosaic is "harder" for the model: each object we want to predict is represented by fewer pixels.

* *`mixup`*: I have found this augmentation particularly useful when training **classification** models. It blends two images, one more transparent than the other, letting the model learn the differences between two *problematic* classes.

Once we create a separate YAML file for our custom augmentation, we can use it in training as a parameter by setting the *`--hyp`* option. We'll see how to do that right below.
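
For instance, a custom hyperparameter file could look like the sketch below. The values here mirror YOLOv5's low-augmentation defaults and are only a starting point; in practice, copy one of the default files from `data/hyps/` and edit the augmentation fields, since the training script expects the full set of hyperparameter keys to be present:

```yaml
# hyp.custom.yaml -- illustrative starting values
lr0: 0.01        # initial learning rate (SGD)
hsv_h: 0.015     # hue augmentation (fraction)
hsv_s: 0.7       # saturation augmentation (fraction)
hsv_v: 0.4       # value (brightness) augmentation (fraction)
degrees: 0.0     # rotation (+/- degrees)
translate: 0.1   # translation (+/- fraction)
scale: 0.5       # scale (+/- gain)
shear: 0.0       # shear (+/- degrees)
flipud: 0.0      # probability of a vertical flip
fliplr: 0.5      # probability of a horizontal flip
mosaic: 1.0      # probability of mosaic augmentation
mixup: 0.0       # probability of mixup augmentation
```
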
RoboFlow also supports more augmentations. Here's a figure with their available augmentations:

> **Note**: if you don't specify a custom *`--hyp`* file, augmentation will still happen in the background, but it won't be customizable. Refer to the YOLO checkpoint section above to see which default YAML file is used by each checkpoint. However, if you want to specify custom augmentations, make sure to add this option to the command above.

And the model will start training. Depending on the size of the dataset, each epoch will take more or less time to complete.

For each epoch, we will have broken-down information about epoch training time and mAP for the model, so we can see how our model progresses over time.
## Task 4: Check Results
After the training is done, we can have a look at the results. Visualizations are provided automatically, and they are pretty similar to what we discovered in the previous workshop using RoboFlow Train.

Some images, visualizations, and statistics about training are saved in the destination folder. With these visualizations, we can improve our understanding of our data, mean average precisions, and many other things which will help us improve the model upon the next iteration.

The confusion matrix tells us how many predictions from images in the validation set were correct for each class.

As we have previously specified, our model autosaves its training progress every 25 epochs with the *`--save-period`* option. This will cause the resulting directory to be about 1 GB.

In the end, we only care about the best-performing models out of all the checkpoints, so let us keep *`best.pt`* as the best model for the training we performed (the model with the highest mAP of all checkpoints) and delete all others.
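
That cleanup only takes a couple of shell commands. The sketch below runs against a throwaway directory so it's safe to try; in a real run, the checkpoints live in YOLOv5's output folder (typically `runs/train/<experiment-name>/weights/`), and this assumes the periodic checkpoints follow YOLOv5's `epoch<N>.pt` naming:

```shell
# Simulate a weights directory containing periodic checkpoints
mkdir -p /tmp/weights_demo && cd /tmp/weights_demo
touch best.pt last.pt epoch25.pt epoch50.pt epoch75.pt

# Delete the periodic epoch checkpoints, keeping best.pt (and last.pt)
find . -name "epoch*.pt" -delete
ls    # best.pt  last.pt
```
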
The model took **168** epochs to finish (early stopping happened, so it found the best model at the 68th epoch), with an average of **10 minutes** per epoch.

Remember that training time can be significantly reduced if you try this with a GPU. You can rent an OCI GPU at a fraction of the price you will find for GPUs from other cloud vendors. For example, I originally trained this model with 2 OCI Compute NVIDIA V100s *just for **$2.50/hr***, and training time went from ~30 hours to about 6 hours.
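
For reference, moving to GPUs mainly changes the `--device` flag; YOLOv5's multi-GPU (DistributedDataParallel) training is launched through `torch.distributed.run`. A sketch of a two-GPU invocation (dataset path illustrative, other flags as before):

```bash
<copy>
python -m torch.distributed.run --nproc_per_node 2 train.py \
    --device 0,1 --epochs 3000 --batch 32 --img 640 \
    --data /home/$USER/datasets/data.yaml --weights yolov5s.pt
</copy>
```
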
The model has a notable mAP of **70%**. This is awesome, but it can always be improved.
## Acknowledgements
* **Author** - Nacho Martinez, Data Science Advocate @ Oracle DevRel