Commit f0e4df0

committed
🐛
1 parent b98f054 commit f0e4df0

8 files changed: +95 -79 lines

README.md

Lines changed: 42 additions & 30 deletions
````diff
@@ -7,7 +7,7 @@

 ### 1. 📣 数据介绍

-确定了业务场景之后,需要手机大量的数据(之前参加过一个安全帽识别检测的比赛,但是数据在比赛平台无法下载为己用),一般来说包含两大来源,一部分是网络数据,可以通过百度、Google图片爬虫拿到,另一部分是用户场景的视频录像,后一部分相对来说数据量更大,但出于商业因素几乎不会开放。本文开源的安全帽检测数据集([SafetyHelmetWearing-Dataset, SHWD](https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset))主要通过爬虫拿到,总共有7581张图像,包含9044个佩戴安全帽的bounding box(正类),以及111514个未佩戴安全帽的bounding box(负类),所有的图像用labelimg标注出目标区域及类别。其中每个bounding box的标签:hat”表示佩戴安全帽,“person”表示普通未佩戴的行人头部区域的bounding box。另外本数据集中person标签的数据大多数来源于[SCUT-HEAD](https://github.com/HCIILAB/SCUT-HEAD-Dataset-Release)数据集,用于判断是未佩戴安全帽的人。大致说一下数据集构造的过程:
+确定了业务场景之后,需要收集大量的数据(之前参加过一个安全帽识别检测的比赛,但是数据在比赛平台无法下载为己用),一般来说包含两大来源,一部分是网络数据,可以通过百度、Google图片爬虫拿到,另一部分是用户场景的视频录像,后一部分相对来说数据量更大,但出于商业因素几乎不会开放。本文开源的安全帽检测数据集([SafetyHelmetWearing-Dataset, SHWD](https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset))主要通过爬虫拿到,总共有7581张图像,包含9044个佩戴安全帽的bounding box(正类),以及111514个未佩戴安全帽的bounding box(负类),所有的图像用labelimg标注出目标区域及类别。其中每个bounding box的标签:hat”表示佩戴安全帽,“person”表示普通未佩戴的行人头部区域的bounding box。另外本数据集中person标签的数据大多数来源于[SCUT-HEAD](https://github.com/HCIILAB/SCUT-HEAD-Dataset-Release)数据集,用于判断是未佩戴安全帽的人。大致说一下数据集构造的过程:

 1.数据爬取
````

(The change fixes a typo: 手机, "mobile phone", becomes 收集, "collect", in the sentence "after settling on the business scenario, you need to collect a large amount of data".)

````diff
@@ -47,15 +47,15 @@ Packages:
 - opencv-python
 - tqdm

-将预训练的darknet的权重下载,下载地址:<https://pjreddie.com/media/files/yolov3.weights>,并将该weight文件拷贝大`./data/darknet_weights/`下,因为这是darknet版本的预训练权重,需要转化为Tensorflow可用的版本,运行如下代码可以实现:
+将预训练的darknet的权重下载,下载地址:<https://pjreddie.com/media/files/yolov3.weights>,并将该weight文件拷贝到`./data/darknet_weights/`下,因为这是darknet版本的预训练权重,需要转化为Tensorflow可用的版本,运行如下代码可以实现:

 ```shell
 python convert_weight.py
 ```

 这样转化后的Tensorflow checkpoint文件被存放在:`./data/darknet_weights/`目录。你也可以下载已经转化好的模型:

-[Google云盘]((https://drive.google.com/drive/folders/1mXbNgNxyXPi7JNsnBaxEv1-nWr7SVoQt?usp=sharing) [GitHub Release](https://github.com/wizyoung/YOLOv3_TensorFlow/releases/)
+[Google云盘](https://drive.google.com/drive/folders/1mXbNgNxyXPi7JNsnBaxEv1-nWr7SVoQt?usp=sharing) [GitHub Release](https://github.com/wizyoung/YOLOv3_TensorFlow/releases/)


 ### 3.🔰 训练数据构建
````

(Fixes the typo 拷贝大 → 拷贝到, "copy to", and removes the stray opening parenthesis that broke the Google Drive markdown link.)
````diff
@@ -67,17 +67,19 @@ python convert_weight.py
 ```shell
 python data_pro.py
 ```
-分割训练集,验证集,测试集并在`./data/my_data/`下生成`train.txt/val.txt/test.txt`,对于一张图像对应一行数据,包括`image_index`,`image_absolute_path`,`box_1`,`box_2`,...,`box_n`,每个字段中间是用空格分隔的,其中:
+分割训练集,验证集,测试集并在`./data/my_data/`下生成`train.txt/val.txt/test.txt`,对于一张图像对应一行数据,包括`image_index`,`image_absolute_path`, `img_width`, `img_height`,`box_1`,`box_2`,...,`box_n`,每个字段中间是用空格分隔的,其中:

 + `image_index`文本的行号
++ `image_absolute_path` 一定是绝对路径
++ `img_width`, `img_height`,`box_1`,`box_2`,...,`box_n`中涉及数值的取值一定取int型
 + `box_x`的形式为:`label_index, x_min,y_min,x_max,y_max`(注意坐标原点在图像的左上角)
 + `label_index`是label对应的index(取值为0-class_num-1),这里要注意YOLO系列的模型训练与SSD不同,label不包含background

 例子:

 ```
-0 xxx/xxx/a.jpg 0 453 369 473 391 1 588 245 608 268
-1 xxx/xxx/b.jpg 1 466 403 485 422 2 793 300 809 320
+0 xxx/xxx/a.jpg 1920,1080,0 453 369 473 391 1 588 245 608 268
+1 xxx/xxx/b.jpg 1920,1080,1 466 403 485 422 2 793 300 809 320
 ...
 ```
````

(The annotation line now carries `img_width` and `img_height` after the image path, and the new bullets state that the path must be absolute and that numeric fields must be ints. Note the updated example writes `1920,1080,` with commas, while the text and `get_kmeans.py` expect space-separated fields; the commas look like a typo in the example.)
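As a sanity check on the format above, one annotation line can be decoded with a short sketch (`parse_annotation_line` is a hypothetical helper, the sample path is made up, and it follows the space-separated spec rather than the comma-bearing example):

```python
def parse_annotation_line(line):
    # fields: image_index image_absolute_path img_width img_height [label x_min y_min x_max y_max]...
    s = line.strip().split(' ')
    image_index, image_path = int(s[0]), s[1]
    img_w, img_h = int(float(s[2])), int(float(s[3]))
    rest = s[4:]
    # every group of five trailing fields is one box: label, x_min, y_min, x_max, y_max
    boxes = [tuple(int(float(v)) for v in rest[i:i + 5]) for i in range(0, len(rest), 5)]
    return image_index, image_path, (img_w, img_h), boxes

idx, path, size, boxes = parse_annotation_line(
    "0 /data/my_data/JPEGImages/000001.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268")
```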

````diff
@@ -98,6 +100,8 @@ person
 python get_kmeans.py
 ```

+![](docs/kmeans.png)
+
 可以得到9个anchors和平均的IOU,把anchors保存在文本文件:`./data/yolo_anchors.txt`,

 **注意: Kmeans计算出的YOLO Anchors是在在调整大小的图像比例的,默认的调整大小方法是保持图像的纵横比。**
````

(Adds the k-means visualization `docs/kmeans.png` to the README.)
````diff
@@ -112,20 +116,21 @@ python get_kmeans.py
 <summary><mark><font color=darkred>修改arg.py</font></mark></summary>
 <pre><code>
 ### Some paths
-train_file = './data/my_data/train.txt' # The path of the training txt file.
-val_file = './data/my_data/val.txt' # The path of the validation txt file.
+train_file = './data/my_data/label/train.txt' # The path of the training txt file.
+val_file = './data/my_data/label/val.txt' # The path of the validation txt file.
 restore_path = './data/darknet_weights/yolov3.ckpt' # The path of the weights to restore.
 save_dir = './checkpoint/' # The directory of the weights to save.
 log_dir = './data/logs/' # The directory to store the tensorboard log files.
 progress_log_path = './data/progress.log' # The path to record the training progress.
 anchor_path = './data/yolo_anchors.txt' # The path of the anchor txt file.
 class_name_path = './data/coco.names' # The path of the class names.
 ### Training releated numbers
-batch_size = 2 # 需要调整为自己的类别数
+batch_size = 32 #6
 img_size = [416, 416] # Images will be resized to `img_size` and fed to the network, size format: [width, height]
-total_epoches = 500 # 训练周期调整
-train_evaluation_step = 50 # Evaluate on the training batch after some steps.
-val_evaluation_epoch = 1 # Evaluate on the whole validation dataset after some steps. Set to None to evaluate every epoch.
+letterbox_resize = True # Whether to use the letterbox resize, i.e., keep the original aspect ratio in the resized image.
+total_epoches = 500
+train_evaluation_step = 100 # Evaluate on the training batch after some steps.
+val_evaluation_epoch = 50 # Evaluate on the whole validation dataset after some epochs. Set to None to evaluate every epoch.
 save_epoch = 10 # Save the model after some epochs.
 batch_norm_decay = 0.99 # decay in bn ops
 weight_decay = 5e-4 # l2 weight decay
@@ -134,45 +139,52 @@ global_step = 0 # used when resuming training
 num_threads = 10 # Number of threads for image processing used in tf.data pipeline.
 prefetech_buffer = 5 # Prefetech_buffer used in tf.data pipeline.
 ### Learning rate and optimizer
-optimizer_name = 'adam' # Chosen from [sgd, momentum, adam, rmsprop]
+optimizer_name = 'momentum' # Chosen from [sgd, momentum, adam, rmsprop]
 save_optimizer = True # Whether to save the optimizer parameters into the checkpoint file.
-learning_rate_init = 1e-3
-lr_type = 'exponential' # Chosen from [fixed, exponential, cosine_decay, cosine_decay_restart, piecewise]
+learning_rate_init = 1e-4
+lr_type = 'piecewise' # Chosen from [fixed, exponential, cosine_decay, cosine_decay_restart, piecewise]
 lr_decay_epoch = 5 # Epochs after which learning rate decays. Int or float. Used when chosen `exponential` and `cosine_decay_restart` lr_type.
 lr_decay_factor = 0.96 # The learning rate decay factor. Used when chosen `exponential` lr_type.
 lr_lower_bound = 1e-6 # The minimum learning rate.
-# piecewise params
-pw_boundaries = [60, 80] # epoch based boundaries
-pw_values = [learning_rate_init, 3e-5, 1e-4]
+# only used in piecewise lr type
+pw_boundaries = [30, 50] # epoch based boundaries
+pw_values = [learning_rate_init, 3e-5, 1e-5]
 ### Load and finetune
 # Choose the parts you want to restore the weights. List form.
-# Set to None to restore the whole model.
-restore_part = ['yolov3/darknet53_body']
+# restore_include: None, restore_exclude: None => restore the whole model
+# restore_include: None, restore_exclude: scope => restore the whole model except `scope`
+# restore_include: scope1, restore_exclude: scope2 => if scope1 contains scope2, restore scope1 and not restore scope2 (scope1 - scope2)
+# choise 1: only restore the darknet body
+# restore_include = ['yolov3/darknet53_body']
+# restore_exclude = None
+# choise 2: restore all layers except the last 3 conv2d layers in 3 scale
+restore_include = None
+restore_exclude = ['yolov3/yolov3_head/Conv_14', 'yolov3/yolov3_head/Conv_6', 'yolov3/yolov3_head/Conv_22']
 # Choose the parts you want to finetune. List form.
 # Set to None to train the whole model.
 update_part = ['yolov3/yolov3_head']
 ### other training strategies
-multi_scale_train = False # Whether to apply multi-scale training strategy. Image size varies from [320, 320] to [640, 640] by default.
-use_label_smooth = False # Whether to use class label smoothing strategy.
-use_focal_loss = False # Whether to apply focal loss on the conf loss.
-use_mix_up = False # Whether to use mix up data augmentation strategy. # 数据增强
+multi_scale_train = True # Whether to apply multi-scale training strategy. Image size varies from [320, 320] to [640, 640] by default.
+use_label_smooth = True # Whether to use class label smoothing strategy.
+use_focal_loss = True # Whether to apply focal loss on the conf loss.
+use_mix_up = True # Whether to use mix up data augmentation strategy.
 use_warm_up = True # whether to use warm up strategy to prevent from gradient exploding.
 warm_up_epoch = 3 # Warm up training epoches. Set to a larger value if gradient explodes.
 ### some constants in validation
-# nms 非极大值抑制
-nms_threshold = 0.5 # iou threshold in nms operation
-score_threshold = 0.5 # threshold of the probability of the classes in nms operation
-nms_topk = 50 # keep at most nms_topk outputs after nms
+# nms
+nms_threshold = 0.45 # iou threshold in nms operation
+score_threshold = 0.01 # threshold of the probability of the classes in nms operation, i.e. score = pred_confs * pred_probs. set lower for higher recall.
+nms_topk = 150 # keep at most nms_topk outputs after nms
 # mAP eval
 eval_threshold = 0.5 # the iou threshold applied in mAP evaluation
+use_voc_07_metric = False # whether to use voc 2007 evaluation metric, i.e. the 11-point metric
 ### parse some params
 anchors = parse_anchors(anchor_path)
 classes = read_class_names(class_name_path)
 class_num = len(classes)
 train_img_cnt = len(open(train_file, 'r').readlines())
 val_img_cnt = len(open(val_file, 'r').readlines())
-train_batch_num = int(math.ceil(float(train_img_cnt) / batch_size)) # iteration
-
+train_batch_num = int(math.ceil(float(train_img_cnt) / batch_size))
 lr_decay_freq = int(train_batch_num * lr_decay_epoch)
 pw_boundaries = [float(i) * train_batch_num + global_step for i in pw_boundaries]
 </code></pre>
````

(Chinese comments on the removed lines: 需要调整为自己的类别数 "adjust to your number of classes", which was misleading on `batch_size`; 训练周期调整 "tune the training epochs"; 数据增强 "data augmentation"; 非极大值抑制 "non-maximum suppression". Overall the new config switches to momentum with a piecewise learning-rate schedule, enables multi-scale training, label smoothing, focal loss, and mix-up, loosens NMS for higher recall, and replaces `restore_part` with the `restore_include`/`restore_exclude` pair.)
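The last two lines of the config convert epoch-based quantities into global steps. A worked sketch of that conversion, with a made-up image count (in the real config `train_img_cnt` is read from `train.txt`):

```python
import math

# illustrative numbers only; train_img_cnt really comes from counting lines in train.txt
train_img_cnt = 6065
batch_size = 32
global_step = 0

train_batch_num = int(math.ceil(float(train_img_cnt) / batch_size))  # steps per epoch
pw_boundaries = [float(i) * train_batch_num + global_step for i in [30, 50]]  # epochs -> steps

learning_rate_init = 1e-4
pw_values = [learning_rate_init, 3e-5, 1e-5]

def lr_at(step):
    # piecewise-constant lookup: value i applies until boundary i is reached
    for boundary, value in zip(pw_boundaries, pw_values):
        if step < boundary:
            return value
    return pw_values[-1]
```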

args.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -7,8 +7,8 @@
 import math

 ### Some paths
-train_file = './data/my_data/train.txt' # The path of the training txt file.
-val_file = './data/my_data/val.txt' # The path of the validation txt file.
+train_file = './data/my_data/label/train.txt' # The path of the training txt file.
+val_file = './data/my_data/label/val.txt' # The path of the validation txt file.
 restore_path = './data/darknet_weights/yolov3.ckpt' # The path of the weights to restore.
 save_dir = './checkpoint/' # The directory of the weights to save.
 log_dir = './data/logs/' # The directory to store the tensorboard log files.
```

data/coco.names

Lines changed: 2 additions & 1 deletion
```diff
@@ -1 +1,2 @@
-biopsy forceps
+hat
+person
```

data/yolo_anchors.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-676,197, 763,250, 684,283, 868,231, 745,273, 544,391, 829,258, 678,316, 713,355
+5,5, 6,7, 7,9, 10,11, 13,15, 19,21, 27,31, 43,50, 79,93
```

(Each pair is an anchor `width,height` on the resized-image scale; the regenerated anchors are far smaller than the defaults, matching the small helmet and head boxes in SHWD.)
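The file is a single line of nine comma-separated `w,h` pairs. A small sketch of reading it back into pairs (the project's actual `parse_anchors` helper is not shown in this diff, so this is only an illustration of the format):

```python
# one line of nine comma-separated width,height pairs, as in yolo_anchors.txt
line = "5,5, 6,7, 7,9, 10,11, 13,15, 19,21, 27,31, 43,50, 79,93"
vals = [int(v) for v in line.replace(' ', '').split(',')]
anchors = list(zip(vals[0::2], vals[1::2]))  # pair up alternating width/height values
```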

data_pro.py

Lines changed: 44 additions & 42 deletions
```diff
@@ -33,12 +33,12 @@ def __init__(self,data_path):

     def load_labels(self, model):
         if model == 'train':
-            txtname = os.path.join(self.data_path, 'train_img.txt')
+            txtname = os.path.join(self.data_path, 'ImageSets/Main/train.txt')
         if model == 'test':
-            txtname = os.path.join(self.data_path, 'test_img.txt')
+            txtname = os.path.join(self.data_path, 'ImageSets/Main/test.txt')

         if model == "val":
-            txtname = os.path.join(self.data_path, 'val_img.txt')
+            txtname = os.path.join(self.data_path, 'ImageSets/Main/val.txt')


         with open(txtname, 'r') as f:
```
```diff
@@ -47,23 +47,23 @@ def load_labels(self, model):

         my_index = 0
         for ind in image_ind:
-            class_inds, x1s, y1s, x2s, y2s = self.load_data(ind)
+            class_inds, x1s, y1s, x2s, y2s, img_width, img_height = self.load_data(ind)

             if len(class_inds) == 0:
                 pass
             else:
                 annotation_label = ""
                 # box_x: label_index, x_min,y_min,x_max,y_max
-                for label_i in range(len(clas_inds)):
+                for label_i in range(len(class_inds)):

                     annotation_label += " " + str(class_inds[label_i])
                     annotation_label += " " + str(x1s[label_i])
                     annotation_label += " " + str(y1s[label_i])
                     annotation_label += " " + str(x2s[label_i])
                     annotation_label += " " + str(y2s[label_i])

-                with open(model+".txt","a") as f:
-                    f.write(str(my_index) + " " + data_path+"/ImageSets/"+ind+".jpg" + annotation_label + "\n")
+                with open("./data/my_data/label/"+model+".txt","a") as f:
+                    f.write(str(my_index) + " " + data_path+"/JPEGImages/"+ind+".jpg"+" "+str(img_width)+" "+str(img_height)+ annotation_label + "\n")

                 my_index += 1
```

(Besides the new output path and the added width/height fields, this fixes a latent `NameError`: the old loop referenced the misspelled `clas_inds`.)

```diff
@@ -76,8 +76,8 @@ def load_data(self, index):
         filename = os.path.join(self.data_path, 'Annotations', index + '.xml')
         tree = ET.parse(filename)
         image_size = tree.find('size')
-        # image_width = float(image_size.find('width').text)
-        # image_height = float(image_size.find('height').text)
+        image_width = int(float(image_size.find('width').text))
+        image_height = int(float(image_size.find('height').text))
         # h_ratio = 1.0 * self.image_size / image_height
         # w_ratio = 1.0 * self.image_size / image_width
```

```diff
@@ -91,37 +91,38 @@ def load_data(self, index):

         for obj in objects:
             box = obj.find('bndbox')
-            x1 = float(box.find('xmin').text)
-            y1 = float(box.find('ymin').text)
-            x2 = float(box.find('xmax').text)
-            y2 = float(box.find('ymax').text)
+            x1 = int(float(box.find('xmin').text))
+            y1 = int(float(box.find('ymin').text))
+            x2 = int(float(box.find('xmax').text))
+            y2 = int(float(box.find('ymax').text))
             # x1 = max(min((float(box.find('xmin').text)) * w_ratio, self.image_size), 0)
             # y1 = max(min((float(box.find('ymin').text)) * h_ratio, self.image_size), 0)
             # x2 = max(min((float(box.find('xmax').text)) * w_ratio, self.image_size), 0)
             # y2 = max(min((float(box.find('ymax').text)) * h_ratio, self.image_size), 0)
-            class_ind = self.class_to_ind[obj.find('name').text]
-            # class_ind = self.class_to_ind[obj.find('name').text.lower().strip()]
-
-            # boxes = [0.5 * (x1 + x2) / self.image_size, 0.5 * (y1 + y2) / self.image_size, np.sqrt((x2 - x1) / self.image_size), np.sqrt((y2 - y1) / self.image_size)]
-            # cx = 1.0 * boxes[0] * self.cell_size
-            # cy = 1.0 * boxes[1] * self.cell_size
-            # xind = int(np.floor(cx))
-            # yind = int(np.floor(cy))
-
-            # label[yind, xind, :, 0] = 1
-            # label[yind, xind, :, 1:5] = boxes
-            # label[yind, xind, :, 5 + class_ind] = 1
-
-            if x1 >= x2 or y1 >= y2:
-                pass
-            else:
-                class_inds.append(class_ind)
-                x1s.append(x1)
-                y1s.append(y1)
-                x2s.append(x2)
-                y2s.append(y2)
-
-        return class_inds, x1s, y1s, x2s, y2s
+            if obj.find('name').text in self.classes:
+                class_ind = self.class_to_ind[obj.find('name').text]
+                # class_ind = self.class_to_ind[obj.find('name').text.lower().strip()]
+
+                # boxes = [0.5 * (x1 + x2) / self.image_size, 0.5 * (y1 + y2) / self.image_size, np.sqrt((x2 - x1) / self.image_size), np.sqrt((y2 - y1) / self.image_size)]
+                # cx = 1.0 * boxes[0] * self.cell_size
+                # cy = 1.0 * boxes[1] * self.cell_size
+                # xind = int(np.floor(cx))
+                # yind = int(np.floor(cy))
+
+                # label[yind, xind, :, 0] = 1
+                # label[yind, xind, :, 1:5] = boxes
+                # label[yind, xind, :, 5 + class_ind] = 1
+
+                if x1 >= x2 or y1 >= y2:
+                    pass
+                else:
+                    class_inds.append(class_ind)
+                    x1s.append(x1)
+                    y1s.append(y1)
+                    x2s.append(x2)
+                    y2s.append(y2)
+
+        return class_inds, x1s, y1s, x2s, y2s, image_width, image_height


 def data_split(img_path):
```

(Objects whose name is not in `self.classes` are now skipped instead of raising a `KeyError` on stray labels, coordinates are cast to `int`, and the image dimensions are returned alongside the boxes.)
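Condensed, the `load_data` pattern above is: read the `<size>` node, cast every coordinate through `int(float(...))`, and skip objects with unknown class names or degenerate boxes. A standalone sketch of that pattern, using a hypothetical annotation snippet rather than a real SHWD file:

```python
import xml.etree.ElementTree as ET

xml_text = """<annotation>
  <size><width>1920</width><height>1080</height></size>
  <object><name>hat</name>
    <bndbox><xmin>453.0</xmin><ymin>369</ymin><xmax>473</xmax><ymax>391</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>1</xmin><ymin>1</ymin><xmax>5</xmax><ymax>5</ymax></bndbox>
  </object>
</annotation>"""

classes = ['hat', 'person']
class_to_ind = {c: i for i, c in enumerate(classes)}

root = ET.fromstring(xml_text)
size = root.find('size')
img_w = int(float(size.find('width').text))
img_h = int(float(size.find('height').text))

boxes = []
for obj in root.findall('object'):
    name = obj.find('name').text
    if name not in classes:  # same guard as the new code: unknown labels are skipped
        continue
    bb = obj.find('bndbox')
    x1, y1, x2, y2 = (int(float(bb.find(t).text)) for t in ('xmin', 'ymin', 'xmax', 'ymax'))
    if x1 < x2 and y1 < y2:  # drop degenerate boxes, as load_data does
        boxes.append((class_to_ind[name], x1, y1, x2, y2))
```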
```diff
@@ -141,19 +142,19 @@ def data_split(img_path):
     for file in files:
         if file in val_part:

-            with open("./data/my_data/val_img.txt","a") as val_f:
+            with open("./data/my_data/ImageSets/Main/val.txt","a") as val_f:
                 val_f.write(file[:-4] + "\n" )

             val_index += 1

         elif file in test_part:
-            with open("./data/my_data/test_img.txt","a") as test_f:
+            with open("./data/my_data/ImageSets/Main/test.txt","a") as test_f:
                 test_f.write(file[:-4] + "\n")

             test_index += 1

         else:
-            with open("./data/my_data/train_img.txt","a") as train_f:
+            with open("./data/my_data/ImageSets/Main/train.txt","a") as train_f:
                 train_f.write(file[:-4] + "\n")

             train_index += 1
```
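`val_part` and `test_part` themselves are not shown in this diff. A minimal sketch of such a split, assuming a 10%/10% val/test ratio and a fixed seed (the actual ratios and randomization in `data_split` may differ):

```python
import random

def split_dataset(files, val_ratio=0.1, test_ratio=0.1, seed=42):
    # shuffle a sorted copy for reproducibility, then slice into val/test/train
    files = sorted(files)
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_ratio)
    n_test = int(len(files) * test_ratio)
    val_part = files[:n_val]
    test_part = files[n_val:n_val + n_test]
    train_part = files[n_val + n_test:]
    return train_part, val_part, test_part

files = ['%06d.jpg' % i for i in range(100)]
train, val, test = split_dataset(files)
```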
```diff
@@ -166,12 +167,13 @@ def data_split(img_path):
 if __name__ == "__main__":

     # 分割train, val, test
-    img_path = "./data/my_data/ImageSets"
-    data_split(img_path)
+    # img_path = "./data/my_data/ImageSets"
+    # data_split(img_path)
     print("===========split data finish============")

     # 做YOLO V3需要的训练集
-    data_path = "./data/my_data" # 尽量用绝对路径
+    base_path = os.getcwd()
+    data_path = os.path.join(base_path,"data/my_data") # 绝对路径

     data_p = Data_preprocess(data_path)
     data_p.load_labels("train")
```

(Comments, translated: 分割train, val, test "split train/val/test"; 做YOLO V3需要的训练集 "build the training list YOLOv3 needs"; 尽量用绝对路径 "use an absolute path if possible"; 绝对路径 "absolute path". The split step is commented out, and `data_path` is now built from `os.getcwd()` so the annotation lines contain the absolute path the README requires.)

docs/kmeans.png

8.79 KB

get_kmeans.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -98,8 +98,8 @@ def parse_anno(annotation_path, target_size=None):
     result = []
     for line in anno:
         s = line.strip().split(' ')
-        img_w = int(s[2])
-        img_h = int(s[3])
+        img_w = int(float(s[2]))
+        img_h = int(float(s[3]))
         s = s[4:]
         box_cnt = len(s) // 5
         for i in range(box_cnt):
@@ -139,7 +139,7 @@ def get_kmeans(anno, cluster_num=9):
     # if target_resize is speficied, the anchors are on the resized image scale
     # if target_resize is set to None, the anchors are on the original image scale
     target_size = [416, 416]
-    annotation_path = "./data/my_data/train.txt"
+    annotation_path = "./data/my_data/label/train.txt"
     anno_result = parse_anno(annotation_path, target_size=target_size)
     anchors, ave_iou = get_kmeans(anno_result, 9)
```

(`int(float(...))` tolerates width/height fields written as floats such as `1920.0`, which plain `int()` would reject; the annotation path moves under `label/` to match `data_pro.py`.)
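The `int(float(...))` change matters because Python's `int()` rejects float-formatted strings. A tiny demonstration:

```python
s = "1920.0"
try:
    w = int(s)          # raises ValueError: invalid literal for int() with base 10
except ValueError:
    w = int(float(s))   # parse as float first, then truncate to int
```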
