
Feedback on Optimizing the Annotation and Sampling Strategy of the LAE-1M Dataset #25

@AnXMuy


Core Conclusion: The current random instance sampling strategy adopted by the LAE-1M dataset to balance class counts is inconsistent with the core requirements of detection tasks. Additionally, the random sampling method for optimizing dataset distribution is not clearly described in the paper or repository. It is recommended to adjust the sampling scheme to retain all instances and filter samples at the image level, while supplementing detailed explanations of the sampling logic.

Description of Existing Issues

To balance the number of instances per class, the dataset uses random instance sampling. The core goal of detection tasks is to identify all targets in a scene, but random instance sampling removes a large number of target annotations from the training set. If the discarded instances are still visible in the training images, they are effectively treated as background during training. Models trained on this dataset may therefore fail to learn complete target features, which can hurt detection performance and reduce the reference value of the reported experimental results.

The specific implementation of the random sampling method used to optimize the dataset distribution is not clearly explained in either the published paper or the dataset repository. This lack of transparency makes it difficult for researchers to reproduce experiments or evaluate the rationality of the sampling strategy.
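To make the first issue concrete, here is a minimal sketch of how per-class random instance sampling can leave images only partially labelled. Since the actual LAE-1M procedure is not documented, everything here is an assumption for illustration: the COCO-style annotation dicts, the `cap` parameter, and the helper names `sample_instances` and `partially_labelled_images` are all hypothetical.

```python
import random
from collections import defaultdict

def sample_instances(annotations, cap, seed=0):
    """Hypothetical instance-level sampling: keep at most `cap` random instances per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann["category_id"]].append(ann)
    kept = []
    for cat in sorted(by_class):
        anns = by_class[cat]
        kept.extend(anns if len(anns) <= cap else rng.sample(anns, cap))
    return kept

def partially_labelled_images(annotations, kept):
    """Return ids of images that lost some, but not all, of their instances."""
    total = defaultdict(int)
    remaining = defaultdict(int)
    for ann in annotations:
        total[ann["image_id"]] += 1
    for ann in kept:
        remaining[ann["image_id"]] += 1
    return sorted(i for i in total if 0 < remaining[i] < total[i])

# Toy data (assumed): 4 images, each with 5 instances of class 0 and 1 of class 1.
annotations = [{"id": k, "image_id": k // 5, "category_id": 0} for k in range(20)]
annotations += [{"id": 20 + k, "image_id": k, "category_id": 1} for k in range(4)]

kept = sample_instances(annotations, cap=8)
partial = partially_labelled_images(annotations, kept)
print(f"{len(kept)} of {len(annotations)} instances kept; "
      f"partially labelled images: {partial}")
```

In this toy setup, capping class 0 at 8 instances necessarily leaves most images with visible but unannotated targets, which is the situation described above.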

Suggestions for Optimization Schemes

Retain all target instances of all classes in the dataset to ensure that the training process can fully cover the true distribution of various targets.

Adjust the sampling logic to filter samples at the image level. Indirectly balance the number of different classes by removing some images containing redundant or repetitive targets, while ensuring the integrity of target instances in the remaining images.

Supplement detailed descriptions of the sampling logic (including the adjusted strategy) in both the paper and the repository, specifying key details such as sampling criteria, implementation steps, and parameter settings.
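The image-level suggestion above could be sketched roughly as follows. This is one possible greedy heuristic under stated assumptions, not a proposal for the exact implementation: the `filter_images` name, the `cap` threshold, and the per-image `Counter` input format are hypothetical. An image is removed only if every class it contains stays at or above `cap` afterwards, so images containing rare classes are never dropped and every retained image keeps all of its annotations.

```python
from collections import Counter

def filter_images(image_counts, cap):
    """Greedy image-level filtering (illustrative sketch).

    image_counts: dict mapping image_id -> Counter(category_id -> instance count).
    Returns the set of image ids to keep; annotations are never edited.
    """
    totals = Counter()
    for counts in image_counts.values():
        totals.update(counts)

    kept = set(image_counts)
    # Visit the most crowded images first; ties are broken by image id.
    order = sorted(image_counts, key=lambda i: (-sum(image_counts[i].values()), i))
    for img in order:
        counts = image_counts[img]
        # Removable only if no class in this image would fall below `cap`.
        if counts and all(totals[c] - n >= cap for c, n in counts.items()):
            kept.discard(img)
            totals.subtract(counts)
    return kept

# Toy data (assumed): class 0 is common, class 1 is rare.
image_counts = {i: Counter({0: 10}) for i in range(4)}
image_counts[4] = Counter({0: 10, 1: 1})
image_counts[5] = Counter({0: 10, 1: 1})

kept_images = filter_images(image_counts, cap=30)  # kept_images == {3, 4, 5}
remaining = Counter()
for i in kept_images:
    remaining.update(image_counts[i])
print(sorted(kept_images), dict(remaining))
```

Here class 0 drops from 60 to 30 instances while both rare class 1 instances survive. A greedy pass like this does not guarantee an optimal balance, but the sampling criteria and parameters are easy to document and reproduce.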
