
What is the intended usage of processed_LAE-1M? (Significant annotation reduction observed) #27

@Ananas367

Description


I am currently working with the provided datasets and noticed a significant discrepancy between the processed_LAE-1M versions and the original official datasets (e.g., DOTAv2 / DIOR).

Observation: I compared processed_LAE-1M_DOTAv2_train.json with the official DOTAv2_train.json and found that the "1M" version has drastically fewer annotations.

File Size: The 1M version (67MB) is much smaller than the original (165MB).

Annotation Count: In dense scenes, the original dataset has ~2000 objects, while the 1M version often has fewer than 1000.
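For anyone who wants to reproduce this comparison, below is a minimal sketch of how the counts above could be gathered. It assumes the annotation files follow a COCO-style schema (a top-level "annotations" list whose entries carry an "image_id"); the file names come from this issue, but the exact schema of the processed files is an assumption on my part.

```python
import json
from collections import Counter

def annotation_stats(path):
    # Load a COCO-style JSON file (assumed schema: a top-level "annotations"
    # list whose entries each have an "image_id") and return the total
    # annotation count plus a per-image Counter.
    with open(path) as f:
        data = json.load(f)
    per_image = Counter(ann["image_id"] for ann in data["annotations"])
    return len(data["annotations"]), per_image

def largest_drops(per_orig, per_proc, k=5):
    # Images where the processed version lost the most annotations
    # relative to the original.
    drops = {img: n - per_proc.get(img, 0) for img, n in per_orig.items()}
    return sorted(drops.items(), key=lambda kv: -kv[1])[:k]
```

Usage would be along the lines of `annotation_stats("DOTAv2_train.json")` versus `annotation_stats("processed_LAE-1M_DOTAv2_train.json")`, then `largest_drops(...)` to surface the dense scenes where the reduction is most visible.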


My Question: What is the specific purpose or intended usage of this processed_LAE-1M dataset?

Is it intended to be a subset for low-shot/semi-supervised learning?

Is it generated by a model (pseudo-labels) rather than human annotation?

Or is this a potential data processing error?

I am confused because the annotations seem too sparse to serve as standard ground truth for fully supervised training. Clarification on how this dataset fits into the LAE pipeline would be greatly appreciated.

Thanks!
