Commit 5b88c7b

[Refactor] Refactor Waymo dataset_converter/dataset/evaluator (#2836)
Co-authored-by: sjh <sunjiahao1999>
1 parent 395b86d

File tree

18 files changed: +990 -1285 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -134,3 +134,4 @@ data/sunrgbd/OFFICIAL_SUNRGBD/
 # Waymo evaluation
 mmdet3d/evaluation/functional/waymo_utils/compute_detection_metrics_main
 mmdet3d/evaluation/functional/waymo_utils/compute_detection_let_metrics_main
+mmdet3d/evaluation/functional/waymo_utils/compute_segmentation_metrics_main

configs/_base_/datasets/waymoD5-3d-3class.py

Lines changed: 9 additions & 8 deletions

@@ -89,7 +89,10 @@
             dict(
                 type='PointsRangeFilter', point_cloud_range=point_cloud_range)
         ]),
-    dict(type='Pack3DDetInputs', keys=['points'])
+    dict(
+        type='Pack3DDetInputs',
+        keys=['points'],
+        meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])
 ]
 # construct a pipeline for data and gt loading in show function
 # please keep its loading function consistent with test_pipeline (e.g. client)
@@ -100,7 +103,10 @@
         load_dim=6,
         use_dim=5,
         backend_args=backend_args),
-    dict(type='Pack3DDetInputs', keys=['points']),
+    dict(
+        type='Pack3DDetInputs',
+        keys=['points'],
+        meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])
 ]
 
 train_dataloader = dict(
@@ -164,12 +170,7 @@
         backend_args=backend_args))
 
 val_evaluator = dict(
-    type='WaymoMetric',
-    ann_file='./data/waymo/kitti_format/waymo_infos_val.pkl',
-    waymo_bin_file='./data/waymo/waymo_format/gt.bin',
-    data_root='./data/waymo/waymo_format',
-    backend_args=backend_args,
-    convert_kitti_format=False)
+    type='WaymoMetric', waymo_bin_file='./data/waymo/waymo_format/gt.bin')
 test_evaluator = val_evaluator
 
 vis_backends = [dict(type='LocalVisBackend')]
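
Taken together, the hunks above shift responsibility: `Pack3DDetInputs` now packs the sample identity (`sample_idx`, `context_name`, `timestamp`) into each sample's meta information, so `WaymoMetric` needs only the ground-truth bin file instead of the KITTI-format annotation file. A minimal sketch of the resulting pairing, with values copied from the config above (the variable name `pack_transform` is illustrative only; in the config it is inlined in the pipeline list):

```python
# Final transform of the test/eval pipelines: pack the points plus the
# metadata the evaluator needs to match predictions against gt.bin.
pack_transform = dict(  # illustrative name, not part of the config
    type='Pack3DDetInputs',
    keys=['points'],
    meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])

# The refactored evaluator: only the Waymo ground-truth bin is required.
val_evaluator = dict(
    type='WaymoMetric', waymo_bin_file='./data/waymo/waymo_format/gt.bin')
test_evaluator = val_evaluator
```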

configs/_base_/datasets/waymoD5-3d-car.py

Lines changed: 7 additions & 8 deletions

@@ -62,7 +62,8 @@
     dict(type='PointShuffle'),
     dict(
         type='Pack3DDetInputs',
-        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
+        keys=['points'],
+        meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])
 ]
 test_pipeline = [
     dict(
@@ -86,7 +87,10 @@
             dict(
                 type='PointsRangeFilter', point_cloud_range=point_cloud_range)
         ]),
-    dict(type='Pack3DDetInputs', keys=['points'])
+    dict(
+        type='Pack3DDetInputs',
+        keys=['points'],
+        meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])
 ]
 # construct a pipeline for data and gt loading in show function
 # please keep its loading function consistent with test_pipeline (e.g. client)
@@ -161,12 +165,7 @@
         backend_args=backend_args))
 
 val_evaluator = dict(
-    type='WaymoMetric',
-    ann_file='./data/waymo/kitti_format/waymo_infos_val.pkl',
-    waymo_bin_file='./data/waymo/waymo_format/gt.bin',
-    data_root='./data/waymo/waymo_format',
-    convert_kitti_format=False,
-    backend_args=backend_args)
+    type='WaymoMetric', waymo_bin_file='./data/waymo/waymo_format/gt.bin')
 test_evaluator = val_evaluator
 
 vis_backends = [dict(type='LocalVisBackend')]

docs/en/advanced_guides/datasets/waymo.md

Lines changed: 52 additions & 18 deletions
@@ -7,12 +7,7 @@ This page provides specific tutorials about the usage of MMDetection3D for Waymo
 Before preparing Waymo dataset, if you only installed requirements in `requirements/build.txt` and `requirements/runtime.txt` before, please install the official package for this dataset at first by running
 
 ```
-# tf 2.1.0.
-pip install waymo-open-dataset-tf-2-1-0==1.2.0
-# tf 2.0.0
-# pip install waymo-open-dataset-tf-2-0-0==1.2.0
-# tf 1.15.0
-# pip install waymo-open-dataset-tf-1-15-0==1.2.0
+pip install waymo-open-dataset-tf-2-6-0
 ```
 
 or
@@ -38,15 +33,19 @@ mmdetection3d
 │   │   │   ├── validation
 │   │   │   ├── testing
 │   │   │   ├── gt.bin
+│   │   │   ├── cam_gt.bin
+│   │   │   ├── fov_gt.bin
 │   │   ├── kitti_format
 │   │   │   ├── ImageSets
 
 ```
 
-You can download Waymo open dataset V1.2 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put `tfrecord` files into corresponding folders in `data/waymo/waymo_format/` and put the data split txt files into `data/waymo/kitti_format/ImageSets`. Download ground truth bin files for validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare Waymo data by running
+You can download Waymo open dataset V1.4 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put `tfrecord` files into corresponding folders in `data/waymo/waymo_format/` and put the data split txt files into `data/waymo/kitti_format/ImageSets`. Download ground truth bin files for validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare Waymo data by running
 
 ```bash
-python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
+# TF_CPP_MIN_LOG_LEVEL=3 will disable all logging output from TensorFlow.
+# The number of `--workers` depends on the maximum number of cores in your CPU.
+TF_CPP_MIN_LOG_LEVEL=3 python tools/create_data.py waymo --root-path ./data/waymo --out-dir ./data/waymo --workers 128 --extra-tag waymo --version v1.4
 ```
 
 Note that if your local disk does not have enough space for saving converted data, you can change the `--out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion.
@@ -65,22 +64,16 @@ mmdetection3d
 │   │   │   ├── validation
 │   │   │   ├── testing
 │   │   │   ├── gt.bin
+│   │   │   ├── cam_gt.bin
+│   │   │   ├── fov_gt.bin
 │   │   ├── kitti_format
 │   │   │   ├── ImageSets
 │   │   │   ├── training
-│   │   │   │   ├── calib
 │   │   │   │   ├── image_0
 │   │   │   │   ├── image_1
 │   │   │   │   ├── image_2
 │   │   │   │   ├── image_3
 │   │   │   │   ├── image_4
-│   │   │   │   ├── label_0
-│   │   │   │   ├── label_1
-│   │   │   │   ├── label_2
-│   │   │   │   ├── label_3
-│   │   │   │   ├── label_4
-│   │   │   │   ├── label_all
-│   │   │   │   ├── pose
 │   │   │   │   ├── velodyne
 │   │   │   ├── testing
 │   │   │   │   ├── (the same as training)
@@ -93,15 +86,56 @@ mmdetection3d
 
 ```
 
-Here because there are several cameras, we store the corresponding image and labels that can be projected to that camera respectively and save pose for further usage of consecutive frames point clouds. We use a coding way `{a}{bbb}{ccc}` to name the data for each frame, where `a` is the prefix for different split (`0` for training, `1` for validation and `2` for testing), `bbb` for segment index and `ccc` for frame index. You can easily locate the required frame according to this naming rule. We gather the data for training and validation together as KITTI and store the indices for different set in the `ImageSet` files.
+- `kitti_format/training/image_{0-4}/{a}{bbb}{ccc}.jpg`: Because there are several cameras, we store the corresponding images. We use the naming scheme `{a}{bbb}{ccc}` for each frame, where `a` is the prefix of the split (`0` for training, `1` for validation and `2` for testing), `bbb` is the segment index and `ccc` is the frame index. You can easily locate the required frame according to this naming rule. As in KITTI, we gather the data for training and validation together and store the indices of the different splits in the `ImageSets` files.
+- `kitti_format/training/velodyne/{a}{bbb}{ccc}.bin`: point cloud data for each frame.
+- `kitti_format/waymo_gt_database/xxx_{Car/Pedestrian/Cyclist}_x.bin`: point cloud data included in each 3D bounding box of the training dataset. These point clouds are used in data augmentation, e.g. `ObjectSample`. `xxx` is the index of the training sample and `x` is the index of the object in this frame.
+- `kitti_format/waymo_infos_train.pkl`: training dataset information, a dict containing two keys: `metainfo` and `data_list`. `metainfo` contains the basic information of the dataset itself, such as `dataset`, `version` and `info_version`, while `data_list` is a list of dicts, each of which (hereinafter referred to as `info`) contains all the detailed information of a single sample as follows:
+  - info\['sample_idx'\]: The index of this sample in the whole dataset.
+  - info\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates (4x4 list).
+  - info\['timestamp'\]: Timestamp of the sample data.
+  - info\['context_name'\]: The context name of the sample, indicating which `*.tfrecord` segment it was extracted from.
+  - info\['lidar_points'\]: A dict containing all the information related to the lidar points.
+    - info\['lidar_points'\]\['lidar_path'\]: The filename of the lidar point cloud data.
+    - info\['lidar_points'\]\['num_pts_feats'\]: The feature dimension of each point.
+  - info\['lidar_sweeps'\]: A list containing the sweep information of the lidar.
+    - info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\['lidar_path'\]: The lidar data path of the i-th sweep.
+    - info\['lidar_sweeps'\]\[i\]\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates (4x4 list).
+    - info\['lidar_sweeps'\]\[i\]\['timestamp'\]: Timestamp of the sweep data.
+  - info\['images'\]: A dict containing five keys corresponding to the cameras: `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_SIDE_LEFT'`, `'CAM_SIDE_RIGHT'`. Each value is a dict containing all the data information related to the corresponding camera.
+    - info\['images'\]\['CAM_XXX'\]\['img_path'\]: The filename of the image.
+    - info\['images'\]\['CAM_XXX'\]\['height'\]: The height of the image.
+    - info\['images'\]\['CAM_XXX'\]\['width'\]: The width of the image.
+    - info\['images'\]\['CAM_XXX'\]\['cam2img'\]: The transformation matrix recording the intrinsic parameters used when projecting 3D points to the image plane (4x4 list).
+    - info\['images'\]\['CAM_XXX'\]\['lidar2cam'\]: The transformation matrix from the lidar sensor to this camera (4x4 list).
+    - info\['images'\]\['CAM_XXX'\]\['lidar2img'\]: The transformation matrix from the lidar sensor to the image plane (4x4 list).
+  - info\['image_sweeps'\]: A list containing the sweep information of the images.
+    - info\['image_sweeps'\]\[i\]\['images'\]\['CAM_XXX'\]\['img_path'\]: The image path of the i-th sweep.
+    - info\['image_sweeps'\]\[i\]\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates (4x4 list).
+    - info\['image_sweeps'\]\[i\]\['timestamp'\]: Timestamp of the sweep data.
+  - info\['instances'\]: A list of dicts. Each dict contains all the annotation information of a single instance. For the i-th instance:
+    - info\['instances'\]\[i\]\['bbox_3d'\]: A list of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, w, h, yaw) order.
+    - info\['instances'\]\[i\]\['bbox'\]: A list of 4 numbers representing the 2D bounding box of the instance, in (x1, y1, x2, y2) order (some instances may not have a corresponding 2D bounding box).
+    - info\['instances'\]\[i\]\['bbox_label_3d'\]: An int indicating the label of the instance; -1 indicates ignore.
+    - info\['instances'\]\[i\]\['bbox_label'\]: An int indicating the label of the instance; -1 indicates ignore.
+    - info\['instances'\]\[i\]\['num_lidar_pts'\]: The number of lidar points included in the 3D bounding box.
+    - info\['instances'\]\[i\]\['camera_id'\]: The index of the most visible camera for this instance.
+    - info\['instances'\]\[i\]\['group_id'\]: The index of this instance in this sample.
+  - info\['cam_sync_instances'\]: A list of dicts. Each dict contains all the annotation information of a single instance. Its format is the same as that of info\['instances'\]; however, info\['cam_sync_instances'\] is used only for the multi-view camera-based 3D object detection task.
+  - info\['cam_instances'\]: A dict containing the keys `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_SIDE_LEFT'`, `'CAM_SIDE_RIGHT'`. For the monocular camera-based 3D object detection task, we split the 3D annotations of the whole scene according to the camera they belong to. For the i-th instance:
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_3d'\]: A list of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, h, w, yaw) order.
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox'\]: 2D bounding box annotation (exterior rectangle of the projected 3D box), a list arranged as \[x1, y1, x2, y2\].
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label_3d'\]: Label of the instance.
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label'\]: Label of the instance.
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['center_2d'\]: Projected center location on the image, a list of shape (2,).
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['depth'\]: The depth of the projected center.
 
 ## Training
 
 Considering there are many similar frames in the original dataset, we can basically use a subset to train our model primarily. In our preliminary baselines, we load one frame every five frames, and thanks to our hyper parameters settings and data augmentation, we obtain a better result compared with the performance given in the original dataset [paper](https://arxiv.org/pdf/1912.04838.pdf). For more details about the configuration and performance, please refer to README.md in the `configs/pointpillars/`. A more complete benchmark based on other settings and methods is coming soon.
 
 ## Evaluation
 
-For evaluation on Waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`. Basically, you can follow the commands below to install `bazel` and build the file.
+For evaluation on Waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/r1.3/docs/quick_start.md) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`. Basically, you can follow the commands below to install `bazel` and build the file.
 
 ```shell
 # download the code and enter the base directory
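
For readers working with the info files documented in the diff above: `waymo_infos_*.pkl` is a plain pickled dict, so it can be inspected directly. Below is an untested sketch that assumes the default `data/waymo/kitti_format` layout, that `sample_idx` follows the `{a}{bbb}{ccc}` naming rule, and that the 4x4 matrices are stored as nested lists. It loads the train infos, decodes a sample index, and projects one LiDAR-frame point into the front camera:

```python
import pickle

import numpy as np

# Load the converted info file (path assumes the default layout above).
with open('data/waymo/kitti_format/waymo_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)

print(infos['metainfo'])      # basic dataset info: dataset/version/info_version
info = infos['data_list'][0]  # `info` dict of the first sample

# Decode the {a}{bbb}{ccc} naming rule: split prefix, segment index, frame index.
idx = f"{int(info['sample_idx']):07d}"
split_prefix, segment, frame = idx[0], idx[1:4], idx[4:7]
print(f'split={split_prefix} segment={segment} frame={frame}')

# Project a LiDAR-frame point into the front camera. Per the list above,
# lidar2img is the 4x4 lidar-to-image-plane matrix for this camera.
cam = info['images']['CAM_FRONT']
lidar2img = np.array(cam['lidar2img'])
point = np.array([10.0, 0.0, 1.0, 1.0])  # homogeneous (x, y, z, 1) LiDAR point
u, v, w = (lidar2img @ point)[:3]
print(f'pixel=({u / w:.1f}, {v / w:.1f})')
```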
