
Commit 74878d1

[Docs] Add BEV-based detection pipeline in NuScenes Dataset tutorial (#2672)
* update the part of in doc of nuScenes dataset
* update nuScenes tutorial
* add alternative bev sample code and necessary description for the nuscenes dataset
* update nuscenes tutorial
* update nuscenes tutorial
* update nuscenes tutorial
* use two subsections to introduce monocular and BEV
* use two subsections to introduce monocular and BEV
* use two subsections to introduce monocular and BEV
* update NuScenes dataset BEV based tutorial
* update NuScenes dataset BEV based tutorial
1 parent c04831c commit 74878d1

File tree

2 files changed: +131 -3 lines changed


docs/en/advanced_guides/datasets/nuscenes.md

Lines changed: 65 additions & 1 deletion
@@ -153,7 +153,9 @@ Intensity is not used by default due to its yielded noise when concatenating the

### Vision-Based Methods

#### Monocular-based

In the nuScenes dataset, for the multi-view images, this paradigm usually detects and outputs 3D detection results for each image separately, then obtains the final detection results through post-processing (such as NMS). Essentially, it directly extends monocular 3D detection to the multi-view setting. A typical training pipeline of image-based monocular 3D detection on nuScenes is as below.

```python
train_pipeline = [
@@ -184,6 +186,68 @@ It follows the general pipeline of 2D detection while differs in some details:

- Some data augmentation techniques need to be adjusted, such as `RandomFlip3D`.
  Currently we do not support more augmentation methods, because how to transfer and apply other techniques is still under exploration.
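The detect-per-view-then-merge post-processing described above can be sketched with a toy center-distance NMS. This is only an illustrative stand-in for the real 3D NMS used in post-processing (which also considers box size, yaw and overlap), and `merge_multiview_detections` is a hypothetical helper, not an mmdetection3d API:

```python
import numpy as np

def merge_multiview_detections(per_view_boxes, per_view_scores, dist_thr=0.5):
    """Greedy center-distance NMS over detections pooled from all views.

    per_view_boxes: list of (N_i, 3) arrays of box centers in the global
    frame, one array per camera. A detection is suppressed when a
    higher-scoring detection is already kept within dist_thr meters.
    """
    boxes = np.concatenate(per_view_boxes, axis=0)
    scores = np.concatenate(per_view_scores, axis=0)
    order = np.argsort(-scores)  # process highest-scoring detections first
    keep = []
    for idx in order:
        center = boxes[idx]
        # keep only if no already-kept detection is within the threshold
        if all(np.linalg.norm(center - boxes[k]) > dist_thr for k in keep):
            keep.append(idx)
    return boxes[keep], scores[keep]
```

With this sketch, an object seen by two adjacent cameras (two nearly identical centers) collapses to a single detection, which is the point of the post-processing step.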

#### BEV-based

Bird's-Eye-View (BEV) is another popular 3D detection paradigm. It directly takes multi-view images to perform 3D detection; for nuScenes, these are `CAM_FRONT`, `CAM_FRONT_LEFT`, `CAM_FRONT_RIGHT`, `CAM_BACK`, `CAM_BACK_LEFT` and `CAM_BACK_RIGHT`. A basic training pipeline of BEV-based 3D detection on nuScenes is as below.

```python
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
train_transforms = [
    dict(type='PhotoMetricDistortion3D'),
    dict(
        type='RandomResize3D',
        scale=(1600, 900),
        ratio_range=(1., 1.),
        keep_ratio=True)
]
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles',
         to_float32=True,
         num_views=6),
    dict(type='LoadAnnotations3D',
         with_bbox_3d=True,
         with_label_3d=True,
         with_attr_label=False),
    # optional, data augmentation
    dict(type='MultiViewWrapper', transforms=train_transforms),
    # optional, filter objects within the specific point cloud range
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    # optional, filter objects of specific classes
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='Pack3DDetInputs', keys=['img', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
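The idea behind `ObjectRangeFilter` can be sketched as a plain array mask: keep only ground-truth boxes whose centers fall inside `point_cloud_range`. This is a hedged approximation of the transform's effect on `gt_bboxes_3d`/`gt_labels_3d`, not its actual implementation:

```python
import numpy as np

point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]

def filter_boxes_by_range(centers, pc_range):
    """Keep boxes whose (x, y) centers lie inside the BEV range.

    centers: (N, 3) array of box centers; returns the kept centers and
    the boolean mask, which would also be applied to the labels.
    """
    x_min, y_min, _, x_max, y_max, _ = pc_range
    mask = ((centers[:, 0] >= x_min) & (centers[:, 0] <= x_max) &
            (centers[:, 1] >= y_min) & (centers[:, 1] <= y_max))
    return centers[mask], mask
```

A box centered at (60, 0) lies beyond the 51.2 m boundary and would be dropped, so the model never trains on targets it cannot localize within the BEV grid.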

To load multiple views of images, a slight modification needs to be made to the dataset.

```python
data_prefix = dict(
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT')
train_dataloader = dict(
    batch_size=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='NuScenesDataset',
        data_root='./data/nuScenes',
        ann_file='nuscenes_infos_train.pkl',
        data_prefix=data_prefix,
        modality=dict(use_camera=True, use_lidar=False),
        pipeline=train_pipeline,
        test_mode=False))
```
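Before launching training, it can be worth sanity-checking that every camera folder named in `data_prefix` actually exists under `data_root`. The helper below is illustrative only, not part of mmdetection3d:

```python
import os

def check_camera_folders(data_root, data_prefix):
    """Return the camera keys whose sample folders are missing on disk."""
    return [cam for cam, rel in data_prefix.items()
            if not os.path.isdir(os.path.join(data_root, rel))]
```

An empty result means all six `samples/CAM_*` directories are in place; otherwise the returned keys point at the folders to fix before the dataloader fails mid-epoch.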

## Evaluation

An example to evaluate PointPillars with 8 GPUs with nuScenes metrics is as follows.

docs/zh_cn/advanced_guides/datasets/nuscenes.md

Lines changed: 66 additions & 2 deletions
@@ -146,7 +146,9 @@ train_pipeline = [

### Vision-Based Methods

#### Monocular-based

In the nuScenes dataset, for the multi-view images, the monocular paradigm usually consists of two steps: detecting and outputting 3D detection results for each image, then obtaining the final detection results through post-processing (such as NMS). Essentially, this paradigm directly extends monocular 3D detection to the multi-view setting. A typical training pipeline of image-based 3D detection on nuScenes is as below.

```python
train_pipeline = [
@@ -159,7 +161,7 @@ train_pipeline = [
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='mmdet.Resize', scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='Pack3DDetInputs',
@@ -176,6 +178,68 @@ train_pipeline = [

- It needs to load 3D annotations.
- Some data augmentation techniques need to be adjusted, such as `RandomFlip3D`. Currently we do not support more augmentation methods, because how to transfer and apply other techniques is still under exploration.

#### BEV-based

Bird's-Eye-View (BEV) is another popular 3D detection paradigm. It directly uses images from multiple views for 3D detection. For the nuScenes dataset, these views are the front `CAM_FRONT`, front-left `CAM_FRONT_LEFT`, front-right `CAM_FRONT_RIGHT`, back `CAM_BACK`, back-left `CAM_BACK_LEFT` and back-right `CAM_BACK_RIGHT`. A basic training pipeline for BEV-based methods is as below.

```python
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
train_transforms = [
    dict(type='PhotoMetricDistortion3D'),
    dict(
        type='RandomResize3D',
        scale=(1600, 900),
        ratio_range=(1., 1.),
        keep_ratio=True)
]
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles',
         to_float32=True,
         num_views=6),
    dict(type='LoadAnnotations3D',
         with_bbox_3d=True,
         with_label_3d=True,
         with_attr_label=False),
    # optional, data augmentation
    dict(type='MultiViewWrapper', transforms=train_transforms),
    # optional, filter objects within the specific point cloud range
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    # optional, filter objects of specific classes
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='Pack3DDetInputs', keys=['img', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
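Loading with `num_views=6` ultimately yields one stacked array of images per sample. A minimal sketch of that stacking idea (illustrative shapes and a hypothetical helper, not the transform's actual code):

```python
import numpy as np

def stack_multiview_images(images):
    """Stack per-camera images into a single (num_views, H, W, 3) float array,
    mirroring the idea behind LoadMultiViewImageFromFiles with to_float32=True.
    """
    assert len(images) == 6, 'nuScenes provides six camera views'
    return np.stack([img.astype(np.float32) for img in images], axis=0)
```

Downstream BEV models then consume this view-stacked tensor together with each camera's intrinsics and extrinsics to lift features into the bird's-eye-view grid.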

To load images from multiple views, the dataset also needs a slight modification.

```python
data_prefix = dict(
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT')
train_dataloader = dict(
    batch_size=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='NuScenesDataset',
        data_root='./data/nuScenes',
        ann_file='nuscenes_infos_train.pkl',
        data_prefix=data_prefix,
        modality=dict(use_camera=True, use_lidar=False),
        pipeline=train_pipeline,
        test_mode=False))
```

## Evaluation

An example to evaluate PointPillars with 8 GPUs with nuScenes metrics is as follows.
