This repository was archived by the owner on Jul 16, 2025. It is now read-only.

kangjehun/MORAI_Dataset_Toolkit


MORAI Dataset Toolkit

Toolkit for preprocessing, data loading with validation, and visualization of MORAI datasets.


Quick Start

Initial Directory Tree

The initial directory structure should look as follows:

```
2025_HYUNDAI_E2EAD_Competition/
├── hyundai_2025/
│   ├── tools/
│   │   ├── custom_dataset.py
│   │   ├── data_preprocessing.py
│   │   ├── dataset_config.py
│   │   ├── download_dataset.py
│   ├── preprocess_global_path.py
│   ├── train.py
│   ├── README.md
└── Dataset/ (Original Dataset)
```

Note: The custom dataset directory (e.g., USRG_Dataset) should not exist initially.


Download the MORAI Dataset (HMG Sample Data)

  1. Install dependencies (`multiprocessing` is part of the Python standard library and does not need to be installed via pip):

     ```
     pip3 install boto3 cloudpathlib tqdm numpy
     ```

  2. Set `LOCAL_SAVE_DIR` in `download_dataset.py` to the local path where you want to save the dataset.

  3. Run the download script:

     ```
     python3 download_dataset.py
     ```

Preprocess Dataset

Preprocess the original dataset (e.g., Dataset) into a custom dataset (e.g., USRG_Dataset).
The script performs the following steps:

  1. Makes a deep copy of the entire original dataset.

  2. Preprocesses the data (currently adds GLOBAL_PATH synchronized data for each scenario).

  3. Renames directories to ensure compatibility with later processes.

Run the preprocessing script:

```
python3 preprocess_global_path.py
```

Default Arguments:

  • --original_dataset (default: Dataset): Original dataset name.

  • --target_dataset (default: USRG_Dataset): Target dataset name.

  • --root_dir (default: /path/to/datasets): Root directory where the datasets are stored. (Change this to your own path.)

Note:
The script prompts you to confirm deletion of the original dataset after preprocessing.
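The three preprocessing steps above can be sketched as follows. This is an illustrative outline, not the actual `preprocess_global_path.py`: the function name, the scenario layout, and the rename rule (replacing spaces with underscores) are assumptions, and the GLOBAL_PATH generation step is omitted.

```python
# Illustrative sketch of the preprocessing flow (NOT the actual script).
# Assumptions: each scenario is a subdirectory of the dataset root, and
# "renaming for compatibility" is modeled here as replacing spaces with
# underscores.
import shutil
from pathlib import Path

def preprocess(root_dir: str, original_dataset: str = "Dataset",
               target_dataset: str = "USRG_Dataset") -> Path:
    root = Path(root_dir)
    src = root / original_dataset
    dst = root / target_dataset
    # Step 1: deep-copy the entire original dataset
    shutil.copytree(src, dst)
    # Step 2 (omitted here): add GLOBAL_PATH synchronized data per scenario
    # Step 3: rename scenario directories for compatibility (assumed rule)
    for scenario in sorted(dst.iterdir()):
        if scenario.is_dir():
            scenario.rename(scenario.with_name(scenario.name.replace(" ", "_")))
    return dst
```

The original `Dataset/` directory is left untouched; the script itself handles the optional deletion prompt.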


Directory Tree After Preprocessing

The directory structure after preprocessing should look as follows:

```
2025_HYUNDAI_E2EAD_Competition/
├── hyundai_2025/
│   ├── tools/
│   │   ├── custom_dataset.py
│   │   ├── data_preprocessing.py
│   │   ├── dataset_config.py
│   │   ├── download_dataset.py
│   ├── preprocess_global_path.py
│   ├── train.py
│   ├── README.md
├── Dataset/ (Original Dataset)
└── USRG_Dataset/ (Custom Dataset)
```

Execute the Training Skeleton

This script prepares for training by performing the following steps:

  1. Prepare Training: Removes existing log files, parses and validates the directory structure, and saves a summary.

  2. Create Dataset and DataLoader: Uses PyTorch utilities to create the dataset and DataLoader.

  3. Training Loop Skeleton: Provides a skeleton for the training loop; users only need to supply arguments and add training code in the designated area.

Run the training script:

```
python3 train.py
```

Default Arguments:

  • Directory Arguments:

    • --dataset_dir_name (default: USRG_Dataset): Name of the dataset directory.

    • --dataset_root_path (default: /path/to/root): Root path for datasets.

  • Training Arguments:

    • --epochs (default: 2): Number of training epochs.

    • --batch_size (default: 4): Batch size for the DataLoader.

    • --number_of_samples (default: 10): Number of sequences to generate.

    • --sequence_length (default: 5): Length of each sequence in terms of frames.

    • --frequency (default: 10): Sampling frequency in Hz (currently only 10 Hz is supported).

    • --shuffle (default: True): Shuffle the dataset during training.

    • --num_workers (default: 0): Number of workers for the DataLoader.

    • --drop_last (default: True): Drop the last incomplete batch if set to True.

  • Additional Options:

    • --verbose (default: 1): Verbosity level: 0 (none), 1 (basic), 2 (detailed).

    • --visualization (default: False): Enable visualization during dataset creation, using identifiers in place of real data.

Note: Real data is replaced with identifiers (float, string) during visualization for simplicity. Each identifier consists of {index}_{subindex}_{timestamp}, and the contents and order of the identifiers are validated during the visualization process.
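To illustrate how the `--number_of_samples` and `--sequence_length` arguments relate, the sketch below spreads a given number of fixed-length frame windows evenly over a scenario. This is a hypothetical illustration; the actual sampling logic lives in `tools/custom_dataset.py` and may differ.

```python
# Hypothetical sketch: map --number_of_samples and --sequence_length to
# windows of consecutive frame indices at the fixed 10 Hz rate. The real
# implementation in tools/custom_dataset.py may use a different strategy.
def sample_windows(num_frames: int, number_of_samples: int,
                   sequence_length: int):
    """Return `number_of_samples` windows of `sequence_length` consecutive
    frame indices, spread evenly across the scenario."""
    last_start = num_frames - sequence_length
    if last_start < 0:
        raise ValueError("scenario shorter than one sequence")
    starts = [round(i * last_start / max(number_of_samples - 1, 1))
              for i in range(number_of_samples)]
    return [list(range(s, s + sequence_length)) for s in starts]
```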


Data Overview

Data Types in Scenarios

| Data Type | Shape | Format |
|---|---|---|
| CAMERA_1 | 3, 720, 1280 | jpeg |
| CAMERA_2 | 3, 720, 1280 | jpeg |
| CAMERA_3 | 3, 720, 1280 | jpeg |
| CAMERA_4 | 3, 720, 1280 | jpeg |
| CAMERA_5 | 3, 720, 1280 | jpeg |
| LIDAR_6 | 69504, 4 | bin |
| EGO_INFO | 23 | txt |
| OBJECT_INFO | *, 14 | txt |
| TRAFFIC_INFO | *, 3 or None | txt |
| GLOBAL_PATH (CUSTOM) | (NUM_OF_FORWARD + NUM_OF_BACKWARD), 5 | csv |

Note: * indicates a dynamically determined (variable-length) dimension.
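Given the (69504, 4) shape and `bin` format listed above, the LIDAR files can plausibly be read as flat float32 records, KITTI-style. This is an assumption about the on-disk encoding (dtype and endianness are not stated in this README), so verify against the actual data:

```python
# Read a LIDAR_6 .bin file into its (N, 4) shape. ASSUMPTION: flat
# little-endian float32 records (e.g., x, y, z, intensity), KITTI-style.
# Verify dtype and field meaning against the actual MORAI data.
import numpy as np

def load_lidar(path: str) -> np.ndarray:
    points = np.fromfile(path, dtype=np.float32)
    return points.reshape(-1, 4)  # expected (69504, 4) per the table above
```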


Data Names After Loading with DataLoader

B: Batch Size, N: # of Samples, L: Sequence Length

MAX_NUM_OF_OBJECT = 10 (currently fixed)

MAX_NUM_OF_TRAFFIC = 5 (currently fixed)

NUM_OF_FORWARD = 50 (currently fixed)

NUM_OF_BACKWARD = 25 (currently fixed)

NUM_OF_GLOBAL_PATH = 75 (=NUM_OF_FORWARD + NUM_OF_BACKWARD, currently fixed)

| Data Type | Name | Shape | Format |
|---|---|---|---|
| Multi-Camera | camera1 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera2 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera3 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera4 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera5 | B x N, L, 3, 720, 1280 | torch, float32 |
| Lidar | lidar6 | B x N, L, 69504, 4 | torch, float32 |
| Ego Info | ego_status | B x N, L, 21 | torch, float32 |
| | ego_linkid | B x N, L, 1 | list, string |
| | ego_trafficlightid | B x N, L, 1 | list, string |
| Object Info | object_class | B x N, L, MAX_NUM_OF_OBJECT | list, string |
| | object_info | B x N, L, MAX_NUM_OF_OBJECT, 12 | torch, float32 |
| | object_trackid | B x N, L, MAX_NUM_OF_OBJECT | torch, int32 |
| Traffic Info | trafficlight_id | B x N, L, MAX_NUM_OF_TRAFFIC | list, string |
| | trafficlight_type | B x N, L, MAX_NUM_OF_TRAFFIC | torch, int32 |
| | trafficlight_status | B x N, L, MAX_NUM_OF_TRAFFIC | torch, int32 |
| Global Path | global_path | B x N, L, NUM_OF_GLOBAL_PATH, 3 | torch, float32 |
| | global_path_linkid | B x N, L, NUM_OF_GLOBAL_PATH | list, string |
| | global_path_direction | B x N, L, NUM_OF_GLOBAL_PATH | list, string |

Note: The first point in the global path with the "Forward" direction represents the current ego position.
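Following the note above, the current ego position can be recovered from a single frame's path points and their matching direction labels by taking the first point labeled "Forward". A minimal sketch, assuming the direction strings are exactly "Forward"/"Backward" (the README does not spell out the label set):

```python
# Extract the current ego position from one frame's global path:
# per the note above, it is the first point whose direction is "Forward".
# ASSUMPTION: direction labels are the strings "Forward" / "Backward".
def current_ego_position(global_path, directions):
    """global_path: sequence of (x, y, z)-like points;
    directions: matching sequence of direction strings."""
    for point, direction in zip(global_path, directions):
        if direction == "Forward":
            return point
    raise ValueError("no Forward point in global path")
```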

About

Toolkit for MORAI dataset handling, training, and ROS-based visualization
