Toolkit for preprocessing, data loading with validation, and visualization of MORAI datasets.
The initial structure of the directory should look as follows:

```
2025_HYUNDAI_E2EAD_Competition/
├── hyundai_2025/
│   ├── tools/
│   │   ├── custom_dataset.py
│   │   ├── data_preprocessing.py
│   │   ├── dataset_config.py
│   │   └── download_dataset.py
│   ├── preprocess_global_path.py
│   ├── train.py
│   └── README.md
└── Dataset/ (Original Dataset)
```
Note: The custom dataset (e.g., `USRG_Dataset`) directory should not exist initially.
- Install dependencies:

  ```
  pip3 install boto3 cloudpathlib tqdm numpy
  ```

  (`multiprocessing` is part of the Python standard library and does not need to be installed with pip.)

- Change `LOCAL_SAVE_DIR` to the local path where you want to save the dataset.

- Run the download script:

  ```
  python3 download_dataset.py
  ```
Preprocess the original dataset (e.g., `Dataset`) into a custom dataset (e.g., `USRG_Dataset`).
The script performs the following steps:

- (Deep) copies the entire original dataset.
- Preprocesses the data (currently adds `GLOBAL_PATH`-synchronized data for each scenario).
- Renames directories to ensure compatibility with later processes.
Run the preprocessing script:

```
python3 preprocess_global_path.py
```

Default arguments:

- `--original_dataset` (default: `Dataset`): Original dataset name.
- `--target_dataset` (default: `USRG_Dataset`): Target dataset name.
- `--root_dir` (default: `/path/to/datasets`): Root directory where the datasets are stored. (!!Change this!!)

Note: The script prompts you to confirm deletion of the original dataset after preprocessing.
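The three steps above can be sketched as follows. This is only an illustration: the `GLOBAL_PATH` generation is reduced to a placeholder, and the renaming rule (spaces to underscores) is an assumption; the real logic lives in `preprocess_global_path.py`.

```python
import shutil
from pathlib import Path


def preprocess_dataset(root_dir: str, original_dataset: str = "Dataset",
                       target_dataset: str = "USRG_Dataset") -> Path:
    """Illustrative sketch of the preprocessing steps (not the real script)."""
    root = Path(root_dir)
    src, dst = root / original_dataset, root / target_dataset

    # 1. (Deep) copy the entire original dataset.
    shutil.copytree(src, dst)

    # 2. Preprocess: add GLOBAL_PATH data per scenario. Placeholder only --
    #    the real script writes synchronized global-path data here.
    for scenario in sorted(p for p in dst.iterdir() if p.is_dir()):
        (scenario / "GLOBAL_PATH").mkdir(exist_ok=True)

    # 3. Rename directories for compatibility. The space-to-underscore rule
    #    is an assumed example, not the actual renaming scheme.
    for scenario in sorted(p for p in dst.iterdir() if p.is_dir()):
        scenario.rename(scenario.with_name(scenario.name.replace(" ", "_")))

    return dst
```

Note that the original dataset is left untouched; the real script only asks about deleting it after the copy succeeds.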
The directory structure after preprocessing should look as follows:
2025_HYUNDAI_E2EAD_Competition/
├── hyundai_2025/
│ ├── tools/
│ │ ├── custom_dataset.py
│ │ ├── data_preprocessing.py
│ │ ├── dataset_config.py
│ │ ├── download_dataset.py
│ ├── preprocess_global_path.py
│ ├── train.py
│ ├── README.md
├── Dataset/ (Original Dataset)
└── USRG_Dataset/ (Custom Dataset)
This script prepares for training by performing the following steps:

- **Prepare Training**: Removes existing log files, parses and validates the directory structure, and saves a summary.
- **Create Dataset and DataLoader**: Utilizes PyTorch utilities to create the dataset and DataLoader.
- **Training Loop Skeleton**: Provides a skeleton for the training loop where users only need to input arguments and add training code in the designated area.
Run the training script:

```
python3 train.py
```

Default arguments:

- Directory arguments:
  - `--dataset_dir_name` (default: `USRG_Dataset`): Name of the dataset directory.
  - `--dataset_root_path` (default: `/path/to/root`): Root path for datasets.
- Training arguments:
  - `--epochs` (default: `2`): Number of training epochs.
  - `--batch_size` (default: `4`): Batch size for the DataLoader.
  - `--number_of_samples` (default: `10`): Number of sequences to generate.
  - `--sequence_length` (default: `5`): Length of each sequence in frames.
  - `--frequency` (default: `10`): Sampling frequency in Hz (currently only supports 10 Hz).
  - `--shuffle` (default: `True`): Shuffle the dataset during training.
  - `--num_workers` (default: `0`): Number of workers for the DataLoader.
  - `--drop_last` (default: `True`): Drop the last incomplete batch if set to `True`.
- Additional options:
  - `--verbose` (default: `1`): Verbosity level: 0 (none), 1 (basic), 2 (detailed).
  - `--visualization` (default: `False`): Enable visualization during dataset creation with identifiers.

Note: Real data is replaced with identifiers (float, string) during visualization for simplicity. Each identifier consists of `{index}_{subindex}_{timestamp}`, and the contents and order of the identifiers are validated during the visualization process.
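A minimal sketch of how the argument parsing, DataLoader creation, and training-loop skeleton fit together. The dummy `TensorDataset`, the subset of arguments shown, and the `(args, loader)` return value are illustrative assumptions, not the actual contents of `train.py`.

```python
import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(args) -> DataLoader:
    # Dummy data stands in for the MORAI custom dataset; the real samples
    # have the (N, L, ...) shapes listed in the tables below. (Assumption.)
    data = torch.zeros(args.number_of_samples, args.sequence_length, 3)
    return DataLoader(
        TensorDataset(data),
        batch_size=args.batch_size,
        shuffle=args.shuffle,
        num_workers=args.num_workers,
        drop_last=args.drop_last,
    )


def main(argv=None):
    # Only the training arguments are mirrored; names and defaults follow
    # the list above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--number_of_samples", type=int, default=10)
    parser.add_argument("--sequence_length", type=int, default=5)
    parser.add_argument("--num_workers", type=int, default=0)
    # Naive string-to-bool parsing, for illustration only.
    parser.add_argument("--shuffle", type=lambda s: s == "True", default=True)
    parser.add_argument("--drop_last", type=lambda s: s == "True", default=True)
    args = parser.parse_args(argv)

    loader = make_loader(args)
    for epoch in range(args.epochs):
        for (batch,) in loader:
            pass  # <-- add training code in the designated area
    return args, loader


if __name__ == "__main__":
    main()
```

With the defaults (10 samples, batch size 4, `drop_last=True`), each epoch yields two full batches and drops the last two samples.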
| Data Type | Shape | Format |
|---|---|---|
| CAMERA_1 | 3, 720, 1280 | jpeg |
| CAMERA_2 | 3, 720, 1280 | jpeg |
| CAMERA_3 | 3, 720, 1280 | jpeg |
| CAMERA_4 | 3, 720, 1280 | jpeg |
| CAMERA_5 | 3, 720, 1280 | jpeg |
| LIDAR_6 | 69504, 4 | bin |
| EGO_INFO | 23 | txt |
| OBJECT_INFO | *, 14 | txt |
| TRAFFIC_INFO | *, 3 or None | txt |
| GLOBAL_PATH (CUSTOM) | (NUM_OF_FORWARD + NUM_OF_BACKWARD), 5 | csv |
Note: `*` indicates a dynamically determined (variable) length.
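For example, a `LIDAR_6` frame can be read back with NumPy, assuming the `bin` file stores the (69504, 4) array as raw little-endian float32 values in row-major order. The per-point field meaning (e.g. x, y, z, intensity) is an assumption, not stated in this README.

```python
import numpy as np


def load_lidar_bin(path: str, num_points: int = 69504,
                   num_fields: int = 4) -> np.ndarray:
    """Read a raw LiDAR .bin file into a (num_points, num_fields) array.

    Assumes float32 storage in row-major order; both the dtype and the
    field layout are assumptions about the dataset format.
    """
    pts = np.fromfile(path, dtype=np.float32)
    return pts.reshape(num_points, num_fields)
```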
B: Batch Size, N: # of Samples, L: Sequence Length

- `MAX_NUM_OF_OBJECT` = 10 (currently fixed)
- `MAX_NUM_OF_TRAFFIC` = 5 (currently fixed)
- `NUM_OF_FORWARD` = 50 (currently fixed)
- `NUM_OF_BACKWARD` = 25 (currently fixed)
- `NUM_OF_GLOBAL_PATH` = 75 (= `NUM_OF_FORWARD` + `NUM_OF_BACKWARD`, currently fixed)
| Data Type | Name | Shape | Format |
|---|---|---|---|
| Multi-Camera | camera1 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera2 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera3 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera4 | B x N, L, 3, 720, 1280 | torch, float32 |
| | camera5 | B x N, L, 3, 720, 1280 | torch, float32 |
| Lidar | lidar6 | B x N, L, 69504, 4 | torch, float32 |
| Ego Info | ego_status | B x N, L, 21 | torch, float32 |
| | ego_linkid | B x N, L, 1 | list, string |
| | ego_trafficlightid | B x N, L, 1 | list, string |
| Object Info | object_class | B x N, L, MAX_NUM_OF_OBJECT | list, string |
| | object_info | B x N, L, MAX_NUM_OF_OBJECT, 12 | torch, float32 |
| | object_trackid | B x N, L, MAX_NUM_OF_OBJECT | torch, int32 |
| Traffic Info | trafficlight_id | B x N, L, MAX_NUM_OF_TRAFFIC | list, string |
| | trafficlight_type | B x N, L, MAX_NUM_OF_TRAFFIC | torch, int32 |
| | trafficlight_status | B x N, L, MAX_NUM_OF_TRAFFIC | torch, int32 |
| Global Path | global_path | B x N, L, NUM_OF_GLOBAL_PATH, 3 | torch, float32 |
| | global_path_linkid | B x N, L, NUM_OF_GLOBAL_PATH | list, string |
| | global_path_direction | B x N, L, NUM_OF_GLOBAL_PATH | list, string |
Note: The first point in the global path with the "Forward" direction represents the current ego position.
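A sanity check against the table above can be sketched as follows. This interprets "B x N" as a flattened first dimension of size B * N, and assumes the batch is a dict keyed by the names in the table; both are assumptions about the loader's output layout, and only a few tensor entries are checked.

```python
import torch

# Constants fixed by this README.
MAX_NUM_OF_OBJECT = 10
MAX_NUM_OF_TRAFFIC = 5
NUM_OF_GLOBAL_PATH = 75


def check_batch_shapes(batch: dict, B: int, N: int, L: int) -> bool:
    """Return True if the selected tensor entries match the documented shapes.

    Assumes a dict-of-tensors batch and a flattened B * N leading dimension.
    """
    expected = {
        "camera1": (B * N, L, 3, 720, 1280),
        "lidar6": (B * N, L, 69504, 4),
        "ego_status": (B * N, L, 21),
        "object_info": (B * N, L, MAX_NUM_OF_OBJECT, 12),
        "global_path": (B * N, L, NUM_OF_GLOBAL_PATH, 3),
    }
    return all(tuple(batch[k].shape) == shape for k, shape in expected.items())
```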