This application assists in understanding the reasoning of a 3D object detector, SpatialDETR. The machine learning framework used is PyTorch, and the graphical user interface is built with Tkinter. The application uses the MMDetection3D framework to load the model and dataset; the dataset used is nuScenes. It displays saliency maps from various explainability techniques, allowing you to see where the model is focusing. It also provides additional features such as visualization in Bird's Eye View (BEV), generation of a segmentation map from the saliency map, and visualization of an object's self-attention mechanism.
- Clone the repository together with its submodules:
git clone --recurse-submodules https://gitlab.ika.rwth-aachen.de/ma-zahr/xai.git
- If you have already cloned the repository without the `--recurse-submodules` flag, initialize all submodules manually:
git submodule update --init --recursive
- Follow the mmdetection3d instructions to preprocess the data of the nuScenes dataset.
- Download the weights for FCOS3D (which are used by SpatialDETR) and put them inside a directory called `pretrained`.
- Download the weights for SpatialDETR (query_proj_value_proj) and put them inside a directory called `checkpoints`.
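Assuming both weight directories are placed under the directory that will later be mounted as `work_dirs` (see the docker setup below), the layout would look roughly like this:

```
work_dirs/
├── pretrained/    # FCOS3D weights
└── checkpoints/   # SpatialDETR (query_proj_value_proj) weights
```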
The docker container is based on the following packages:
- Python 3.7
- PyTorch 1.9
- CUDA 11.1
- MMCV 1.5
- MMDetection 2.23
- MMSegmentation 0.20
To ease the setup of this project, we provide a docker container and some convenience scripts (see the `docker/` directory). To set it up:
- (if not done already) set up Docker and NVIDIA Docker
- Run `./docker/build.sh` to build the container.
- If on a Linux machine, open `docker/run_loc.sh`. If on a Windows machine, open `docker/run_win.sh`. If on a remote Linux server, open `docker/run_cluster.sh`.
- Adapt `nusc_data_dir` to the nuScenes directory and `work_dirs` to the directory where the model weights and pretrained weights directories are saved.
- If on a Linux machine, run `./docker/run_loc.sh`. If on a Windows machine, run `./docker/run_win.sh`. Otherwise, if working on a remote server, run `./docker/run_cluster.sh`.
- Now the container is running, but some packages still need to be installed. Run `./docker/in_docker_setup.sh`.
- After the docker container is set up, run the application: `python scripts/main.py`
- A GUI will appear. Select File > Load model to load the model configuration and the checkpoints. Alternatively, if the same configuration is used each time, modify the `config.toml` file accordingly and select File > Load from config file.
- Change the visualization settings with the drop-down menus. Then, click Visualize.
- For Advanced Mode, click Settings > Advanced Mode.
The application provides a GUI with a series of dropdown menus for customization.
Model Configuration: Start by loading a SpatialDETR model configuration. You can opt for random sample data or choose a specific index.
Prediction Threshold: Set your prediction threshold. This helps to disregard low-score queries and focus on confident predictions. The default threshold of 0.5 streamlines visualization by eliminating redundant queries.
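As a rough illustration of what this filtering amounts to (the tensor names and shapes below are hypothetical, not the application's actual internals):

```python
import torch

# Hypothetical decoder outputs: one confidence score and one 3D box per object query.
scores = torch.tensor([0.91, 0.12, 0.67, 0.05])
boxes = torch.rand(4, 9)            # e.g. (x, y, z, w, l, h, yaw, vx, vy) per query

threshold = 0.5                     # default prediction threshold in the GUI
keep = scores >= threshold          # mask of confident queries

filtered_scores = scores[keep]
filtered_boxes = boxes[keep]
print(f"Kept {keep.sum().item()} of {scores.numel()} queries")
```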
Explainability Methods: Choose Raw Attention, Grad-CAM, or Gradient Rollout to understand the model's decisions; each method offers unique insights. With Raw Attention, you can select a specific head or fuse the heads using maximum, minimum, or mean. In Advanced Mode, you can also set a discard threshold to filter out weaker attention maps, and run perturbation or sanity tests.
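For example, head selection and fusion for Raw Attention can be sketched as follows (a minimal sketch; the attention tensor shape is an assumption for illustration only, not SpatialDETR's exact internals):

```python
import torch

# Hypothetical raw cross-attention weights of one decoder layer:
# shape (num_heads, num_queries, num_image_tokens)
attn = torch.rand(8, 900, 1450)

query_idx = 42                          # object query selected in the GUI
per_head = attn[:, query_idx, :]        # one attention map per head

fused_mean = per_head.mean(dim=0)       # "mean" fusion
fused_max = per_head.max(dim=0).values  # "maximum" fusion
fused_min = per_head.min(dim=0).values  # "minimum" fusion
single_head = per_head[3]               # or inspect a specific head
```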
Saliency Maps Generation: The app will then generate saliency maps—heat maps of the model's "attention"—across all six cameras for your chosen sample data.
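A minimal sketch of overlaying such a heat map on one camera image (placeholder arrays are used here instead of real nuScenes frames; the application renders this in the GUI):

```python
import numpy as np
import matplotlib.pyplot as plt

image = np.random.rand(900, 1600, 3)   # placeholder for a camera frame
saliency = np.random.rand(900, 1600)   # placeholder saliency map scaled to [0, 1]

plt.imshow(image)
plt.imshow(saliency, cmap="jet", alpha=0.4)  # semi-transparent heat map overlay
plt.axis("off")
plt.show()
```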
Object-specific Analysis: View saliency maps for all objects in a sample or focus on a specific one for detailed analysis. The app generates saliency maps for all layers and self-attention scores for the selected object, helping you understand the detection process layer-by-layer.
Real-time Visualization: You can generate and view a sequence of images with saliency maps for all objects, giving a sense of the attention mechanism's dynamics. Pause the sequence, select objects for further analysis, and resume. Apply an object-specific filter to focus on samples containing a certain object.
Visualization Options: Visualize a Bird's Eye View (BEV) perspective using LiDAR points and bounding boxes for a broad environmental view. You can convert 3D bounding boxes into 2D for simpler visualization and overlay ground truth bounding boxes on model predictions to compare performance. You can also create a segmentation mask from the saliency map, helpful when comparing with the dataset's ground truth.
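Conceptually, the segmentation mask can be obtained by thresholding the saliency map; a minimal sketch (the cut-off value is an illustrative choice, not the application's setting):

```python
import numpy as np

saliency = np.random.rand(900, 1600)       # hypothetical saliency map in [0, 1]
mask = (saliency > 0.7).astype(np.uint8)   # 1 = salient pixel, 0 = background
```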
Overview of the GUI application:
Figure showing saliency maps of a car through the layers:
Figure illustrating the self-attention of a car with other objects in the scene:
- The shell script for running the docker container makes sure that the display environment is correctly forwarded to the docker container and the SSH server. However, display errors may still be encountered while trying to run the GUI application. To test the GUI inside the Docker container, run `xclock` and check whether a window opens. The following guides are useful: https://www.baeldung.com/linux/forward-x-over-ssh, https://x410.dev/cookbook/enabling-ssh-x11-forwarding-in-visual-studio-code-for-remote-development/
- The application works only with SpatialDETR.