Training validation dataset #5

@0xddom

To ensure that our code works as intended, we need to build a testing dataset and some form of validation.

To collect apps for the testing dataset we can use F-Droid. For each app we need two things: the compiled APK, the same one users would install on their devices, and the list of third-party libraries the app contains. For typical Android projects the third-party libraries are declared in the Gradle build scripts, so a quick manual inspection can reveal the list of libraries the app uses.
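If manual inspection becomes tedious, the library list could be pulled out of the build scripts with a small script. The following is only a rough sketch: the function name `extract_dependencies` and the regular expression are my own assumptions, and it only handles plain string-notation declarations (`group:artifact:version`). Dependencies declared through variables, version catalogs or map notation would still need a manual look.

```python
import re

# Matches string-notation declarations such as:
#   implementation 'com.squareup.okhttp3:okhttp:4.9.0'
#   api("com.google.code.gson:gson:2.8.6")
# Test-only configurations (testImplementation, ...) are skipped on purpose,
# because those libraries do not end up in the released APK.
DEPENDENCY_RE = re.compile(
    r"""^\s*(?:implementation|api|compile|runtimeOnly)\s*\(?\s*["']([^"':]+):([^"':]+)(?::([^"']+))?["']""",
    re.MULTILINE,
)


def extract_dependencies(gradle_text):
    """Return a sorted list of (group, artifact, version) tuples.

    `version` is None when the declaration does not state one.
    """
    found = set()
    for group, artifact, version in DEPENDENCY_RE.findall(gradle_text):
        found.add((group, artifact, version or None))
    return sorted(found, key=lambda d: (d[0], d[1], d[2] or ""))


if __name__ == "__main__":
    # Hypothetical build.gradle content, for illustration only.
    sample = """
    dependencies {
        implementation 'com.squareup.okhttp3:okhttp:4.9.0'
        api("com.google.code.gson:gson:2.8.6")
        testImplementation 'junit:junit:4.13'
        implementation project(':core')
    }
    """
    for dep in extract_dependencies(sample):
        print(dep)
```

The output of such a script should still be checked by hand against the APK, since the build script can declare libraries that are stripped or shaded at build time.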

The open problem is how to do the validation. If the dataset is large enough, we could split it into training and validation datasets. The problem with this and any similar approach is that if a library is not in the training dataset, the detector will not recognize it in the validation dataset. This is a fundamental limitation of signature databases, which is LibRadar's approach.
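To get a feeling for how bad this is on a given dataset, we could measure which libraries in the validation split never appear in the training split. This is a minimal sketch, assuming the ground truth is available as a dict from app id to the set of libraries it contains; the names `split_apps` and `unseen_libraries` and the app ids in the example are hypothetical.

```python
import random


def split_apps(app_libraries, validation_fraction=0.2, seed=0):
    """Split {app_id: set_of_libraries} into (training, validation) dicts."""
    app_ids = sorted(app_libraries)
    random.Random(seed).shuffle(app_ids)  # seeded, so the split is reproducible
    n_val = max(1, int(len(app_ids) * validation_fraction))
    val_ids = set(app_ids[:n_val])
    train = {a: libs for a, libs in app_libraries.items() if a not in val_ids}
    val = {a: libs for a, libs in app_libraries.items() if a in val_ids}
    return train, val


def unseen_libraries(train, val):
    """Libraries used by validation apps that no training app contains.

    A signature database built from `train` cannot detect any of these.
    """
    known = set().union(*train.values())
    used = set().union(*val.values())
    return used - known


if __name__ == "__main__":
    # Made-up ground truth, for illustration only.
    ground_truth = {
        "org.example.notes": {"okhttp", "gson"},
        "org.example.player": {"okhttp", "exoplayer"},
        "org.example.maps": {"osmdroid"},
        "org.example.chat": {"gson", "okhttp"},
        "org.example.feed": {"gson", "jsoup"},
    }
    train, val = split_apps(ground_truth, validation_fraction=0.4)
    print("validation apps:", sorted(val))
    print("undetectable libraries:", sorted(unseen_libraries(train, val)))
```

If the set of unseen libraries is large, the validation score says more about the coverage of the signature database than about the correctness of our code, so the two should probably be reported separately.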
