Toolkit for preprocessing, data loading with validation, and visualization of MORAI datasets.
The initial structure of the directory should look as follows:

```
2025_HYUNDAI_E2EAD_Competition/
├── hyundai_2025/
│   ├── tools/
│   │   ├── custom_dataset.py
│   │   ├── data_preprocessing.py
│   │   ├── dataset_config.py
│   │   └── download_dataset.py
│   ├── preprocess_global_path.py
│   ├── train.py
│   └── README.md
└── Dataset/ (Original Dataset)
```
Note: The custom dataset (e.g., `USRG_Dataset`) directory should not exist initially.
- Install dependencies:

  ```
  pip3 install boto3 cloudpathlib tqdm numpy
  ```

  (`multiprocessing` is part of the Python standard library and does not need to be installed with pip.)

- Change `LOCAL_SAVE_DIR` to the local path where you want to save the dataset.

- Run the download script:

  ```
  python3 download_dataset.py
  ```
Preprocess the original dataset (e.g., `Dataset`) into a custom dataset (e.g., `USRG_Dataset`).
The script performs the following steps:

- (Deep) copies the entire original dataset.
- Preprocesses the data (currently adds `GLOBAL_PATH`-synchronized data for each scenario).
- Renames directories to ensure compatibility with later processes.
Run the preprocessing script:

```
python3 preprocess_global_path.py
```

Default arguments:

- `--original_dataset` (default: `Dataset`): Original dataset name.
- `--target_dataset` (default: `USRG_Dataset`): Target dataset name.
- `--root_dir` (default: `/path/to/datasets`): Root directory where the datasets are stored. (!!Change this!!)

Note: The script prompts you to confirm deletion of the original dataset after preprocessing.
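The three steps above can be sketched as follows. This is only an illustration: the `GLOBAL_PATH` generation is reduced to a placeholder, and the renaming rule (spaces to underscores) is an assumption; the real logic lives in `preprocess_global_path.py`.

```python
import shutil
from pathlib import Path


def preprocess_dataset(root_dir: str, original_dataset: str = "Dataset",
                       target_dataset: str = "USRG_Dataset") -> Path:
    """Illustrative sketch of the preprocessing steps (not the real script)."""
    root = Path(root_dir)
    src, dst = root / original_dataset, root / target_dataset

    # 1. (Deep) copy the entire original dataset.
    shutil.copytree(src, dst)

    # 2. Preprocess: add GLOBAL_PATH data per scenario. Placeholder only --
    #    the real script writes synchronized global-path data here.
    for scenario in sorted(p for p in dst.iterdir() if p.is_dir()):
        (scenario / "GLOBAL_PATH").mkdir(exist_ok=True)

    # 3. Rename directories for compatibility. The space-to-underscore rule
    #    is an assumed example, not the actual renaming scheme.
    for scenario in sorted(p for p in dst.iterdir() if p.is_dir()):
        scenario.rename(scenario.with_name(scenario.name.replace(" ", "_")))

    return dst
```

Note that the original dataset is left untouched; the real script only asks about deleting it after the copy succeeds.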
The directory structure after preprocessing should look as follows:
2025_HYUNDAI_E2EAD_Competition/
├── hyundai_2025/
│ ├── tools/
│ │ ├── custom_dataset.py
│ │ ├── data_preprocessing.py
│ │ ├── dataset_config.py
│ │ ├── download_dataset.py
│ ├── preprocess_global_path.py
│ ├── train.py
│ ├── README.md
├── Dataset/ (Original Dataset)
└── USRG_Dataset/ (Custom Dataset)
This script prepares for training by performing the following steps:

- **Prepare Training**: Removes existing log files, parses and validates the directory structure, and saves a summary.
- **Create Dataset and DataLoader**: Utilizes PyTorch utilities to create the dataset and DataLoader.
- **Training Loop Skeleton**: Provides a skeleton for the training loop where users only need to input arguments and add training code in the designated area.
Run the training script:

```
python3 train.py
```

Default arguments:

- Directory arguments:
  - `--dataset_dir_name` (default: `USRG_Dataset`): Name of the dataset directory.
  - `--dataset_root_path` (default: `/path/to/root`): Root path for datasets.
- Training arguments:
  - `--epochs` (default: `2`): Number of training epochs.
  - `--batch_size` (default: `4`): Batch size for the DataLoader.
  - `--number_of_samples` (default: `10`): Number of sequences to generate.
  - `--sequence_length` (default: `5`): Length of each sequence in frames.
  - `--frequency` (default: `10`): Sampling frequency in Hz (currently only supports 10 Hz).
  - `--shuffle` (default: `True`): Shuffle the dataset during training.
  - `--num_workers` (default: `0`): Number of workers for the DataLoader.
  - `--drop_last` (default: `True`): Drop the last incomplete batch if set to `True`.
- Additional options:
  - `--verbose` (default: `1`): Verbosity level: 0 (none), 1 (basic), 2 (detailed).
  - `--visualization` (default: `False`): Enable visualization during dataset creation with identifiers.

Note: Real data is replaced with identifiers (float, string) during visualization for simplicity. Each identifier consists of `{index}_{subindex}_{timestamp}`, and the contents and order of the identifiers are validated during the visualization process.
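A minimal sketch of how the argument parsing, DataLoader creation, and training-loop skeleton fit together. The dummy `TensorDataset`, the subset of arguments shown, and the `(args, loader)` return value are illustrative assumptions, not the actual contents of `train.py`.

```python
import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(args) -> DataLoader:
    # Dummy data stands in for the MORAI custom dataset; the real samples
    # have the (N, L, ...) shapes listed in the tables below. (Assumption.)
    data = torch.zeros(args.number_of_samples, args.sequence_length, 3)
    return DataLoader(
        TensorDataset(data),
        batch_size=args.batch_size,
        shuffle=args.shuffle,
        num_workers=args.num_workers,
        drop_last=args.drop_last,
    )


def main(argv=None):
    # Only the training arguments are mirrored; names and defaults follow
    # the list above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--number_of_samples", type=int, default=10)
    parser.add_argument("--sequence_length", type=int, default=5)
    parser.add_argument("--num_workers", type=int, default=0)
    # Naive string-to-bool parsing, for illustration only.
    parser.add_argument("--shuffle", type=lambda s: s == "True", default=True)
    parser.add_argument("--drop_last", type=lambda s: s == "True", default=True)
    args = parser.parse_args(argv)

    loader = make_loader(args)
    for epoch in range(args.epochs):
        for (batch,) in loader:
            pass  # <-- add training code in the designated area
    return args, loader


if __name__ == "__main__":
    main()
```

With the defaults (10 samples, batch size 4, `drop_last=True`), each epoch yields two full batches and drops the last two samples.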
| Data Type | Shape | Format |
|---|---|---|
| CAMERA_1 | 3, 720, 1280 | jpeg |
| CAMERA_2 | 3, 720, 1280 | jpeg |
| CAMERA_3 | 3, 720, 1280 | jpeg |
| CAMERA_4 | 3, 720, 1280 | jpeg |
| CAMERA_5 | 3, 720, 1280 | jpeg |
| LIDAR_6 | 69504, 4 | bin |
| EGO_INFO | 23 | txt |
| OBJECT_INFO | *, 14 | txt |
| TRAFFIC_INFO | *, 3 or None | txt |
| GLOBAL_PATH (CUSTOM) | (NUM_OF_FORWARD + NUM_OF_BACKWARD), 5 | csv |
Note: `*` indicates a dynamically determined (variable) length.
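For example, a `LIDAR_6` frame can be read back with NumPy, assuming the `bin` file stores the (69504, 4) array as raw little-endian float32 values in row-major order. The per-point field meaning (e.g. x, y, z, intensity) is an assumption, not stated in this README.

```python
import numpy as np


def load_lidar_bin(path: str, num_points: int = 69504,
                   num_fields: int = 4) -> np.ndarray:
    """Read a raw LiDAR .bin file into a (num_points, num_fields) array.

    Assumes float32 storage in row-major order; both the dtype and the
    field layout are assumptions about the dataset format.
    """
    pts = np.fromfile(path, dtype=np.float32)
    return pts.reshape(num_points, num_fields)
```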
B: Batch Size, N: # of Samples, L: Sequence Length

- `MAX_NUM_OF_OBJECT` = 10 (currently fixed)
- `MAX_NUM_OF_TRAFFIC` = 5 (currently fixed)
- `NUM_OF_FORWARD` = 50 (currently fixed)
- `NUM_OF_BACKWARD` = 25 (currently fixed)
- `NUM_OF_GLOBAL_PATH` = 75 (= `NUM_OF_FORWARD` + `NUM_OF_BACKWARD`, currently fixed)
| Data Type | Name | Shape | Format |
|---|---|---|---|
| Multi-Camera | camera1 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera2 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera3 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera4 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera5 | B x N, L, 3, 720, 1280 | torch, float32 |
| Lidar | lidar6 | B x N, L, 69504, 4 | torch, float32 |
| Ego Info | ego_status | B x N, L, 21 | torch, float32 |
| | ego_linkid | B x N, L, 1 | list, string |
| | ego_trafficlightid | B x N, L, 1 | list, string |
| Object Info | object_class | B x N, L, MAX_NUM_OF_OBJECT | list, string |
| | object_info | B x N, L, MAX_NUM_OF_OBJECT, 12 | torch, float32 |
| | object_trackid | B x N, L, MAX_NUM_OF_OBJECT | torch, int32 |
| Traffic Info | trafficlight_id | B x N, L, MAX_NUM_OF_TRAFFIC | list, string |
| | trafficlight_type | B x N, L, MAX_NUM_OF_TRAFFIC | torch, int32 |
| | trafficlight_status | B x N, L, MAX_NUM_OF_TRAFFIC | torch, int32 |
| Global Path | global_path | B x N, L, NUM_OF_GLOBAL_PATH, 3 | torch, float32 |
| | global_path_linkid | B x N, L, NUM_OF_GLOBAL_PATH | list, string |
| | global_path_direction | B x N, L, NUM_OF_GLOBAL_PATH | list, string |
Note: The first point in the global path with the "Forward" direction represents the current ego position.
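A sanity check against the table above can be sketched as follows. This interprets "B x N" as a flattened first dimension of size B * N, and assumes the batch is a dict keyed by the names in the table; both are assumptions about the loader's output layout, and only a few tensor entries are checked.

```python
import torch

# Constants fixed by this README.
MAX_NUM_OF_OBJECT = 10
MAX_NUM_OF_TRAFFIC = 5
NUM_OF_GLOBAL_PATH = 75


def check_batch_shapes(batch: dict, B: int, N: int, L: int) -> bool:
    """Return True if the selected tensor entries match the documented shapes.

    Assumes a dict-of-tensors batch and a flattened B * N leading dimension.
    """
    expected = {
        "camera1": (B * N, L, 3, 720, 1280),
        "lidar6": (B * N, L, 69504, 4),
        "ego_status": (B * N, L, 21),
        "object_info": (B * N, L, MAX_NUM_OF_OBJECT, 12),
        "global_path": (B * N, L, NUM_OF_GLOBAL_PATH, 3),
    }
    return all(tuple(batch[k].shape) == shape for k, shape in expected.items())
```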