`docs/en/advanced_guides/datasets/waymo.md`: 52 additions and 18 deletions.
````diff
@@ -7,12 +7,7 @@ This page provides specific tutorials about the usage of MMDetection3D for Waymo
 Before preparing the Waymo dataset, if you have only installed the requirements in `requirements/build.txt` and `requirements/runtime.txt`, please first install the official package for this dataset by running

 ```
-# tf 2.1.0.
-pip install waymo-open-dataset-tf-2-1-0==1.2.0
-# tf 2.0.0
-# pip install waymo-open-dataset-tf-2-0-0==1.2.0
-# tf 1.15.0
-# pip install waymo-open-dataset-tf-1-15-0==1.2.0
+pip install waymo-open-dataset-tf-2-6-0
 ```

 or
````
````diff
@@ -38,15 +33,19 @@ mmdetection3d
 │ │ │ ├── validation
 │ │ │ ├── testing
 │ │ │ ├── gt.bin
+│ │ │ ├── cam_gt.bin
+│ │ │ ├── fov_gt.bin
 │ │ ├── kitti_format
 │ │ │ ├── ImageSets

 ```

-You can download Waymo open dataset V1.2 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put the `tfrecord` files into the corresponding folders in `data/waymo/waymo_format/` and put the data split txt files into `data/waymo/kitti_format/ImageSets`. Download the ground truth bin files for the validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put them into `data/waymo/waymo_format/`. Tip: you can use `gsutil` to download the large-scale dataset from the command line; see this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) for an example. Subsequently, prepare Waymo data by running
+You can download Waymo open dataset V1.4 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put the `tfrecord` files into the corresponding folders in `data/waymo/waymo_format/` and put the data split txt files into `data/waymo/kitti_format/ImageSets`. Download the ground truth bin files for the validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put them into `data/waymo/waymo_format/`. Tip: you can use `gsutil` to download the large-scale dataset from the command line; see this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) for an example. Subsequently, prepare Waymo data by running
 Note that if your local disk does not have enough space for the converted data, you can set `--out-dir` to another location. Just remember to create the folders and prepare the data there in advance, and link them back to `data/waymo/kitti_format` after the data conversion.
````
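Before conversion, the raw data must follow the layout shown in the directory tree. The following is a minimal Python sketch, not part of MMDetection3D, that does two things. First, it lists which of the expected inputs are missing. Second, following the note about converting into a different `--out-dir`, it symlinks the converted output back into place. The helper names `missing_waymo_inputs` and `link_back` are hypothetical, and the expected paths are copied from the tree.

```python
from __future__ import annotations

import os
from pathlib import Path

# Inputs expected before conversion, relative to the Waymo data root
# (copied from the directory tree for data/waymo).
EXPECTED_INPUTS = [
    "waymo_format/training",
    "waymo_format/validation",
    "waymo_format/testing",
    "waymo_format/gt.bin",
    "waymo_format/cam_gt.bin",
    "waymo_format/fov_gt.bin",
    "kitti_format/ImageSets",
]


def missing_waymo_inputs(root: str) -> list[str]:
    """Return the expected input paths that do not exist under `root` (hypothetical helper)."""
    base = Path(root)
    return [rel for rel in EXPECTED_INPUTS if not (base / rel).exists()]


def link_back(out_dir: str, link_path: str) -> None:
    """Symlink a conversion output directory (e.g. on another disk) to `link_path`.

    Intended for linking a custom --out-dir back to data/waymo/kitti_format
    (hypothetical helper). Refuses to replace an existing path.
    """
    link = Path(link_path)
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(Path(out_dir).resolve(), link, target_is_directory=True)
```

For example, `missing_waymo_inputs("data/waymo")` returns an empty list once everything is in place. `link_back` will not overwrite an existing `kitti_format` directory. Move any prepared content, such as `ImageSets`, to the new location first.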
````diff
@@ -65,22 +64,16 @@ mmdetection3d
 │ │ │ ├── validation
 │ │ │ ├── testing
 │ │ │ ├── gt.bin
+│ │ │ ├── cam_gt.bin
+│ │ │ ├── fov_gt.bin
 │ │ ├── kitti_format
 │ │ │ ├── ImageSets
 │ │ │ ├── training
-│ │ │ │ ├── calib
 │ │ │ │ ├── image_0
 │ │ │ │ ├── image_1
 │ │ │ │ ├── image_2
 │ │ │ │ ├── image_3
 │ │ │ │ ├── image_4
-│ │ │ │ ├── label_0
-│ │ │ │ ├── label_1
-│ │ │ │ ├── label_2
-│ │ │ │ ├── label_3
-│ │ │ │ ├── label_4
-│ │ │ │ ├── label_all
-│ │ │ │ ├── pose
 │ │ │ │ ├── velodyne
 │ │ │ ├── testing
 │ │ │ │ ├── (the same as training)
````
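In the converted layout, each frame contributes one image per camera (`image_0` to `image_4`) and one point cloud under `velodyne`, and all of them share the same file stem. The sketch below collects those paths for a single frame. It is a hypothetical helper, not an MMDetection3D API, and it assumes the `.jpg` and `.bin` extensions from the file patterns described in this guide.

```python
from __future__ import annotations

from pathlib import Path

NUM_CAMERAS = 5  # image_0 ... image_4 in the converted layout


def frame_files(kitti_root: str, split_dir: str, frame_id: str) -> dict[str, Path]:
    """Collect the converted files of one frame (hypothetical helper).

    `split_dir` is a sub-folder of kitti_format such as "training" or "testing";
    `frame_id` is the shared file stem of the frame, e.g. "1000005".
    """
    base = Path(kitti_root) / split_dir
    files = {
        f"image_{cam}": base / f"image_{cam}" / f"{frame_id}.jpg"
        for cam in range(NUM_CAMERAS)
    }
    files["velodyne"] = base / "velodyne" / f"{frame_id}.bin"
    return files
```

Checking `all(p.exists() for p in frame_files(...).values())` is one way to spot frames whose conversion was interrupted.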
````diff
@@ -93,15 +86,56 @@ mmdetection3d

 ```

-Here because there are several cameras, we store the corresponding image and labels that can be projected to that camera respectively and save pose for further usage of consecutive frames point clouds. We use a coding way `{a}{bbb}{ccc}` to name the data for each frame, where `a` is the prefix for different split (`0` for training, `1` for validation and `2` for testing), `bbb` for segment index and `ccc` for frame index. You can easily locate the required frame according to this naming rule. We gather the data for training and validation together as KITTI and store the indices for different set in the `ImageSet` files.
+- `kitti_format/training/image_{0-4}/{a}{bbb}{ccc}.jpg`: because there are several cameras, the images of each camera are stored separately. Each frame is named with the pattern `{a}{bbb}{ccc}`, where `a` is the split prefix (`0` for training, `1` for validation and `2` for testing), `bbb` is the segment index and `ccc` is the frame index, so any frame can be located from its name. As in KITTI, the training and validation data are gathered together, and the indices of each set are stored in the `ImageSets` files.
+- `kitti_format/training/velodyne/{a}{bbb}{ccc}.bin`: point cloud data for each frame.
````
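The `{a}{bbb}{ccc}` naming rule can be expressed as a pair of small functions. This sketch assumes that `bbb` and `ccc` are always zero-padded to exactly three digits, as the pattern suggests. The function names are hypothetical.

```python
from __future__ import annotations

# Split prefixes from the naming rule: 0 training, 1 validation, 2 testing.
SPLIT_PREFIX = {"training": 0, "validation": 1, "testing": 2}
PREFIX_SPLIT = {v: k for k, v in SPLIT_PREFIX.items()}


def encode_frame_id(split: str, segment_idx: int, frame_idx: int) -> str:
    """Build the {a}{bbb}{ccc} name: split prefix, 3-digit segment, 3-digit frame."""
    if not (0 <= segment_idx <= 999 and 0 <= frame_idx <= 999):
        raise ValueError("segment and frame indices must fit in three digits")
    return f"{SPLIT_PREFIX[split]}{segment_idx:03d}{frame_idx:03d}"


def decode_frame_id(frame_id: str) -> tuple[str, int, int]:
    """Split a 7-digit frame name back into (split, segment index, frame index)."""
    if len(frame_id) != 7 or not frame_id.isdigit():
        raise ValueError(f"unexpected frame id: {frame_id!r}")
    split = PREFIX_SPLIT[int(frame_id[0])]
    return split, int(frame_id[1:4]), int(frame_id[4:])
```

For instance, `decode_frame_id("0001010")` gives `("training", 1, 10)`.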
````diff
+- `kitti_format/waymo_gt_database/xxx_{Car/Pedestrian/Cyclist}_x.bin`: point clouds inside each 3D bounding box of the training set, used for data augmentation such as `ObjectSample`. `xxx` is the index of the training sample and `x` is the index of the object in that frame.
````
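The `waymo_gt_database` file names encode the sample index, class and object index. A per-class count of the database can therefore be taken from the names alone, which is handy for checking what `ObjectSample` has to draw from. This is a minimal sketch that assumes names such as `1000005_Car_3.bin`. The example name, the regular expression and the helper names are all illustrative.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple

# Assumed shape of xxx_{Car/Pedestrian/Cyclist}_x.bin, e.g. "1000005_Car_3.bin".
GT_DB_NAME = re.compile(r"^(?P<sample>\d+)_(?P<cls>Car|Pedestrian|Cyclist)_(?P<obj>\d+)\.bin$")


class GtDbEntry(NamedTuple):
    sample_idx: int   # index of the training sample (xxx)
    class_name: str   # Car, Pedestrian or Cyclist
    object_idx: int   # index of the object within that frame (x)


def parse_gt_db_name(filename: str) -> GtDbEntry:
    """Parse one gt_database file name (hypothetical helper)."""
    m = GT_DB_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a gt_database file name: {filename!r}")
    return GtDbEntry(int(m["sample"]), m["cls"], int(m["obj"]))


def count_by_class(filenames: Iterable[str]) -> Counter:
    """Count database entries per class from their file names."""
    return Counter(parse_gt_db_name(name).class_name for name in filenames)
```

Passing the result of `os.listdir` on the database folder to `count_by_class` gives a rough picture of class balance before training.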
````diff
+- `kitti_format/waymo_infos_train.pkl`: training set information, a dict with two keys, `metainfo` and `data_list`. `metainfo` holds basic information about the dataset itself, such as `dataset`, `version` and `info_version`, while `data_list` is a list of dicts, each of which (hereinafter referred to as `info`) contains all the detailed information of a single sample:
+  - info\['sample_idx'\]: The index of this sample in the whole dataset.
+  - info\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list)
+  - info\['timestamp'\]: Timestamp of the sample data.
+  - info\['context_name'\]: The context name, i.e. the `*.tfrecord` segment this sample was extracted from.
+  - info\['lidar_points'\]: A dict containing all the information related to the lidar points.
+    - info\['lidar_points'\]\['lidar_path'\]: The filename of the lidar point cloud data.
+    - info\['lidar_points'\]\['num_pts_feats'\]: The feature dimension of each point.
+  - info\['lidar_sweeps'\]: A list containing the sweep information of the lidar.
+    - info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\['lidar_path'\]: The lidar data path of the i-th sweep.
+    - info\['lidar_sweeps'\]\[i\]\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list)
+    - info\['lidar_sweeps'\]\[i\]\['timestamp'\]: Timestamp of the sweep data.
+  - info\['images'\]: A dict with five keys, one per camera: `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_SIDE_LEFT'`, `'CAM_SIDE_RIGHT'`. Each value is a dict containing all the data information related to that camera.
+    - info\['images'\]\['CAM_XXX'\]\['img_path'\]: The filename of the image.
+    - info\['images'\]\['CAM_XXX'\]\['height'\]: The height of the image.
+    - info\['images'\]\['CAM_XXX'\]\['width'\]: The width of the image.
+    - info\['images'\]\['CAM_XXX'\]\['cam2img'\]: The transformation matrix holding the intrinsic parameters, used when projecting 3D points onto the image plane. (4x4 list)
+    - info\['images'\]\['CAM_XXX'\]\['lidar2cam'\]: The transformation matrix from the lidar sensor to this camera. (4x4 list)
+    - info\['images'\]\['CAM_XXX'\]\['lidar2img'\]: The transformation matrix from the lidar sensor to this image plane. (4x4 list)
+  - info\['image_sweeps'\]: A list containing the sweep information of the images.
+    - info\['image_sweeps'\]\[i\]\['images'\]\['CAM_XXX'\]\['img_path'\]: The image path of the i-th sweep.
+    - info\['image_sweeps'\]\[i\]\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list)
+    - info\['image_sweeps'\]\[i\]\['timestamp'\]: Timestamp of the sweep data.
+  - info\['instances'\]: A list of dicts, each containing all the annotation information of a single instance. For the i-th instance:
+    - info\['instances'\]\[i\]\['bbox_3d'\]: A list of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, w, h, yaw) order.
+    - info\['instances'\]\[i\]\['bbox'\]: A list of 4 numbers representing the 2D bounding box of the instance, in (x1, y1, x2, y2) order. (Some instances may not have a corresponding 2D bounding box.)
+    - info\['instances'\]\[i\]\['bbox_label_3d'\]: An int indicating the label of the instance; -1 means ignore.
+    - info\['instances'\]\[i\]\['bbox_label'\]: An int indicating the label of the instance; -1 means ignore.
+    - info\['instances'\]\[i\]\['num_lidar_pts'\]: The number of lidar points inside the 3D bounding box.
+    - info\['instances'\]\[i\]\['camera_id'\]: The index of the camera in which this instance is most visible.
+    - info\['instances'\]\[i\]\['group_id'\]: The index of this instance in this sample.
+  - info\['cam_sync_instances'\]: A list of dicts, each containing all the annotation information of a single instance, in the same format as info\['instances'\]. However, info\['cam_sync_instances'\] is only used for the multi-view camera-based 3D object detection task.
+  - info\['cam_instances'\]: A dict with keys `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_SIDE_LEFT'`, `'CAM_SIDE_RIGHT'`. For the monocular camera-based 3D object detection task, the 3D annotations of the whole scene are split according to the camera they belong to. For the i-th instance:
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_3d'\]: A list of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, h, w, yaw) order.
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox'\]: 2D bounding box annotation (the enclosing rectangle of the projected 3D box), a list arranged as \[x1, y1, x2, y2\].
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label_3d'\]: Label of the instance.
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label'\]: Label of the instance.
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['center_2d'\]: Projected center location on the image, a list of shape (2,).
+    - info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['depth'\]: The depth of the projected center.
````
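The sketch below shows one way the documented fields might be consumed. It reads the info file, drops instances labeled `-1`, projects lidar points with `lidar2img`, and reads a point cloud using `num_pts_feats`. It rests on three assumptions: the `.pkl` is a plain pickle, `lidar_path` is relative to a data root you pass in, and point files store `float32` values. None of the helpers are MMDetection3D APIs. The projection divides the first two homogeneous coordinates by the third, which is presumably the relationship between `center_2d` and `depth` in `cam_instances`.

```python
from __future__ import annotations

import pickle
from pathlib import Path

import numpy as np


def load_infos(pkl_path: str) -> tuple[dict, list[dict]]:
    """Return (metainfo, data_list); assumes a plain pickle of that dict."""
    with open(pkl_path, "rb") as f:
        infos = pickle.load(f)
    return infos["metainfo"], infos["data_list"]


def kept_instances(info: dict) -> list[dict]:
    """Drop instances whose 3D label is -1 (marked as ignore)."""
    return [inst for inst in info["instances"] if inst["bbox_label_3d"] != -1]


def project_to_image(points_xyz: np.ndarray, lidar2img) -> tuple[np.ndarray, np.ndarray]:
    """Project (N, 3) lidar points with a 4x4 lidar2img matrix.

    Returns pixel coordinates (N, 2) and depths (N,). Points with depth <= 0
    lie behind the camera and should be discarded by the caller.
    """
    mat = np.asarray(lidar2img, dtype=np.float64)
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # (N, 4)
    cam = homo @ mat.T  # row i is lidar2img @ [x, y, z, 1]
    depth = cam[:, 2]
    uv = cam[:, :2] / depth[:, None]
    return uv, depth


def load_points(info: dict, data_root: str) -> np.ndarray:
    """Read a frame's points as an (N, num_pts_feats) array (float32 assumed)."""
    lidar = info["lidar_points"]
    raw = np.fromfile(Path(data_root) / lidar["lidar_path"], dtype=np.float32)
    return raw.reshape(-1, lidar["num_pts_feats"])
```

Points whose returned depth is not positive should be filtered out before their pixel coordinates are used.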
````diff

 ## Training

 Since the original dataset contains many similar frames, a subset of it is largely sufficient for training. In our preliminary baselines, we load one frame out of every five, and thanks to our hyperparameter settings and data augmentation, we obtain better results than those reported in the original dataset [paper](https://arxiv.org/pdf/1912.04838.pdf). For more details about the configuration and performance, please refer to README.md in `configs/pointpillars/`. A more complete benchmark based on other settings and methods is coming soon.
````
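Loading one frame out of every five amounts to striding over the sample list. This is a minimal plain-Python sketch, and the function name is hypothetical. In practice the interval is presumably set in the dataset configuration (see the configs under `configs/pointpillars/`) rather than by hand.

```python
def subsample(data_list: list, interval: int = 5) -> list:
    """Keep every `interval`-th sample, starting from the first one."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return data_list[::interval]
```

A global stride like this ignores segment boundaries. Since neighbouring frames within a segment are near-duplicates, that is usually acceptable.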
````diff

 ## Evaluation

-For evaluation on Waymo, please follow the [instructions](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`. You can follow the commands below to install `bazel` and build the file.
+For evaluation on Waymo, please follow the [instructions](https://github.com/waymo-research/waymo-open-dataset/blob/r1.3/docs/quick_start.md) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`. You can follow the commands below to install `bazel` and build the file.
````
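Once `compute_detection_metrics_main` is built and placed in `mmdet3d/core/evaluation/waymo_utils/`, it can also be run directly. This is a minimal sketch that assumes the tool takes the prediction file first and the ground-truth file second, as in the Waymo quick-start example. The helper names and the `results.bin` file name are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Location named in this guide for the compiled metrics binary.
DEFAULT_BINARY = Path("mmdet3d/core/evaluation/waymo_utils/compute_detection_metrics_main")


def detection_metrics_cmd(pred_bin: str, gt_bin: str, binary: Path = DEFAULT_BINARY) -> list[str]:
    """Build the argv for the metrics tool (predictions first, ground truth second, assumed)."""
    return [str(binary), str(pred_bin), str(gt_bin)]


def run_detection_metrics(pred_bin: str, gt_bin: str, binary: Path = DEFAULT_BINARY) -> str:
    """Run the tool and return its text output; raises CalledProcessError on failure."""
    result = subprocess.run(
        detection_metrics_cmd(pred_bin, gt_bin, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_detection_metrics("results.bin", "data/waymo/waymo_format/gt.bin")` returns whatever the tool prints. `check=True` makes a failed run raise an exception instead of silently returning partial output.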