more details about image-text pair data #8
Hi,
Thanks for your great work.
I have several questions about the data and method:
- I am curious about the pipeline for generating the text descriptions of the HD map. Could you share more details about how the text for the multi-view and BEV images is obtained? Is that information produced by a pretrained multi-modal model, or by rules derived from the HD map?
- Is the visual encoder the same for the multi-view images and the BEV point-cloud images? The encoders in the paper appear to be different, but in the inference code (https://github.com/LLVM-AD/MAPLM/blob/main/baseline/evaluation/inference.py#L72C29-L72C44) the image processors are the same.
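To make the second question concrete, here is a minimal sketch of what "the image processors are the same" would mean in practice: both a multi-view camera frame and a rendered BEV image pass through one identical preprocessing path. The `preprocess` function below is a hypothetical toy stand-in (nearest-neighbor resize plus normalization), not the repository's actual processor:

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Toy stand-in for a shared image processor: nearest-neighbor resize
    to (size, size) and normalization to [0, 1]. A real pipeline would
    use the model's own processor; this only illustrates the sharing."""
    h, w = image.shape[:2]
    ys = np.linspace(0, h - 1, size).astype(int)
    xs = np.linspace(0, w - 1, size).astype(int)
    resized = image[ys][:, xs]
    return resized.astype(np.float32) / 255.0

# Both input types go through the identical preprocessing path,
# which is what the linked inference code appears to do.
camera_frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
bev_render = np.random.randint(0, 256, (1000, 1000, 3), dtype=np.uint8)

camera_t = preprocess(camera_frame)
bev_t = preprocess(bev_render)
print(camera_t.shape, bev_t.shape)  # both (224, 224, 3)
```

If the paper instead uses distinct encoders for the two modalities, one would expect two different processor instances here rather than one shared path.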