
What is the intended usage of processed_LAE-1M? (Significant annotation reduction observed) #27

@Ananas367

Description


I am currently working with the provided datasets and noticed a significant discrepancy between the processed_LAE-1M versions and the original official datasets (e.g., DOTAv2 / DIOR).

Observation: I compared processed_LAE-1M_DOTAv2_train.json with the official DOTAv2_train.json and found that the "1M" version has drastically fewer annotations.

File Size: The 1M version (67MB) is much smaller than the original (165MB).

Annotation Count: In dense scenes, the original dataset has ~2000 objects, while the 1M version often has fewer than 1000.
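For anyone who wants to reproduce this comparison, below is a minimal sketch of how the counts above could be gathered. It assumes the annotation files follow a COCO-style schema (a top-level "annotations" list whose entries carry an "image_id"); the file names come from this issue, but the exact schema of the processed files is an assumption on my part.

```python
import json
from collections import Counter

def annotation_stats(path):
    # Load a COCO-style JSON file (assumed schema: a top-level "annotations"
    # list whose entries each have an "image_id") and return the total
    # annotation count plus a per-image Counter.
    with open(path) as f:
        data = json.load(f)
    per_image = Counter(ann["image_id"] for ann in data["annotations"])
    return len(data["annotations"]), per_image

def largest_drops(per_orig, per_proc, k=5):
    # Images where the processed version lost the most annotations
    # relative to the original.
    drops = {img: n - per_proc.get(img, 0) for img, n in per_orig.items()}
    return sorted(drops.items(), key=lambda kv: -kv[1])[:k]
```

Usage would be along the lines of `annotation_stats("DOTAv2_train.json")` versus `annotation_stats("processed_LAE-1M_DOTAv2_train.json")`, then `largest_drops(...)` to surface the dense scenes where the reduction is most visible.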


My Question: What is the specific purpose or intended usage of this processed_LAE-1M dataset?

Is it intended to be a subset for low-shot/semi-supervised learning?

Is it generated by a model (pseudo-labels) rather than human annotation?

Or is this a potential data processing error?

I am confused because the annotations seem too sparse to serve as standard ground truth for fully supervised training. Clarification on how this dataset fits into the LAE pipeline would be greatly appreciated.

Thanks!
