| 1 | +# Datasets |
| 2 | + |
| 3 | +Orion uses images from open-source object-detection datasets to build a dataset of military vehicles and format it for [YOLO](https://github.com/ultralytics/ultralytics) training. This page describes the datasets Orion uses. |
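To make the target format concrete: a YOLO label file holds one line per object, `class x_center y_center width height`, with coordinates normalized by the image size. The sketch below converts a pixel-space box to that form. The helper names `to_yolo_box` and `yolo_line` are hypothetical and are not part of Orion's API.

```python
def to_yolo_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box to normalized YOLO (cx, cy, w, h)."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_line(class_index, box):
    """Format one object as a line of a YOLO label file."""
    return f"{class_index} " + " ".join(f"{v:.6f}" for v in box)


# Hypothetical example: a 200x100 px box centered in a 400x200 px image.
box = to_yolo_box(100, 50, 300, 150, img_w=400, img_h=200)
print(yolo_line(0, box))
```

The class index is the position of the class name in the training configuration; which index each Orion class receives is not specified on this page.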
| 4 | + |
| 5 | +## ImageNet |
| 6 | + |
| 7 | +The first dataset Orion uses is `ImageNet21k`, available from the [image-net website](https://image-net.org/download-images.php). You need to register and be granted access before you can download the images. We use the Winter 2021 release because it allows downloading the images for a single synset (a class), and we are only interested in specific classes (military vehicles). |
| 8 | + |
| 9 | +* The processed version of ImageNet21k is available on the [ImageNet21k repository](https://github.com/Alibaba-MIIL/ImageNet21K). |
| 10 | +* The class ids and names are available in [this issue](https://github.com/google-research/big_transfer/issues/7#issuecomment-640048775). |
| 11 | + |
| 12 | +Orion provides a `search` function to search ImageNet class names for a given query. |
| 13 | + |
| 14 | +```python |
| 15 | +def search( |
| 16 | + keywords: list[str], |
| 17 | + dir: Path = settings.ORION_HOME_DIR / "imagenet", |
| 18 | +): |
| 19 | + """ |
| 20 | + Search ImageNet classes matching the given keywords. |
| 21 | +
|
| 22 | + Args: |
| 23 | + keywords (list[str]): List of keywords to search for. |
| 24 | + dir (Path, optional): directory where files will be downloaded. |
| 25 | + Defaults to ORION_HOME_DIR / "imagenet". |
| 26 | + """ |
| 27 | +``` |
| 28 | + |
| 29 | +!!! note |
| 30 | + The `search` function will download the list of class names and ids in the dataset `dir` if they are not already present. |
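As a rough illustration of a keyword search over class names, the sketch below does a case-insensitive substring match against a small in-memory mapping of class ids to names. It is an assumption about the general approach, not Orion's implementation; `match_classes` and the sample `classes` mapping are hypothetical.

```python
def match_classes(keywords, classes):
    """Return (class_id, name) pairs whose name contains any of the keywords."""
    lowered = [k.lower() for k in keywords]
    return [
        (class_id, name)
        for class_id, name in classes.items()
        if any(k in name.lower() for k in lowered)
    ]


# Illustrative sample only; the real list is downloaded into the dataset `dir`.
classes = {
    "n04389033": "tank, army tank, armored combat vehicle, armoured combat vehicle",
    "n02084071": "dog, domestic dog, Canis familiaris",
}

for class_id, name in match_classes(["tank"], classes):
    print(class_id, name)
```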
| 31 | + |
| 32 | +Orion also provides a convenience `download` function to download images and annotations for the given class ids. |
| 33 | + |
| 34 | +```python |
| 35 | +def download( |
| 36 | + ids: list[str], |
| 37 | + dir: Path = settings.ORION_HOME_DIR / "imagenet", |
| 38 | +): |
| 39 | + """ |
| 40 | + Download ImageNet images and annotations for the given class ids. |
| 41 | +
|
| 42 | + Args: |
| 43 | + ids (list[str]): the class ids to download. |
| 44 | + dir (Path, optional): the dataset directory. |
| 45 | + Defaults to settings.ORION_HOME_DIR / "imagenet". |
| 46 | + """ |
| 47 | +``` |
| 48 | + |
| 49 | +!!! note |
| 50 | + The `download` function will only download images for classes that actually have object detection annotations (a lot of classes in the ImageNet21k dataset do not have annotations). |
| 51 | + |
| 52 | +!!! info |
| 53 | + Orion uses **378 annotated images** from the ImageNet dataset, coming from the `n04389033` class (tank, army tank, armored combat vehicle, armoured combat vehicle), which are all mapped to the `AFV` class. |
| 54 | + |
| 55 | +## OpenImage |
| 56 | + |
| 57 | +The second dataset Orion uses is [Open Images](https://storage.googleapis.com/openimages/web/index.html), which contains images with `Tank` detection labels. Images from Open Images are downloaded and managed with [fiftyone](https://docs.voxel51.com/integrations/open_images.html). |
| 58 | + |
| 59 | +!!! info |
| 60 | + Orion uses **1246 annotated images** from the Open Images dataset, which are all mapped to the `AFV` class. |
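For readers unfamiliar with the fiftyone integration mentioned in this section, the following is a minimal sketch of loading `Tank` detections from the fiftyone dataset zoo. The Open Images version, split, and sample limit that Orion actually uses are not stated on this page, so `open-images-v7`, `validation`, and `max_samples` are assumptions, and `load_tank_samples` is a hypothetical helper.

```python
def load_tank_samples(max_samples=50):
    """Load Open Images samples that contain `Tank` detections via fiftyone."""
    # Imported lazily so this module can be imported without fiftyone installed.
    import fiftyone.zoo as foz

    return foz.load_zoo_dataset(
        "open-images-v7",            # assumed version, not confirmed by Orion
        split="validation",          # assumed split
        label_types=["detections"],  # only bounding-box labels
        classes=["Tank"],            # keep samples with at least one Tank box
        max_samples=max_samples,
    )
```

Calling `load_tank_samples()` triggers a download on first use, so it is best run once and reused from the local fiftyone dataset afterwards.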
| 61 | + |
| 62 | +## Russian Military annotated dataset |
| 63 | + |
| 64 | +Another dataset Orion uses is the [Russian Military vehicles](https://universe.roboflow.com/capstoneproject/russian-military-annotated) annotated dataset, published on Roboflow by Tuomo Hiippala from the Digital Geography Lab. It contains 1042 annotated images of Russian military vehicles in 10 classes, which we map to either the `AFV` or the `APC` class. |
| 65 | + |
| 66 | +```python |
| 67 | +LABEL_MAPPING = { |
| 68 | + "bm-21": "AFV", |
| 69 | + "t-80": "AFV", |
| 70 | + "t-64": "AFV", |
| 71 | + "t-72": "AFV", |
| 72 | + "bmp-1": "AFV", |
| 73 | + "bmp-2": "AFV", |
| 74 | + "bmd-2": "AFV", |
| 75 | + "btr-70": "APC", |
| 76 | + "btr-80": "APC", |
| 77 | + "mt-lb": "APC", |
| 78 | +} |
| 79 | +``` |
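The mapping above can be applied with a plain dictionary lookup. The sketch below is one hedged way to do it: `remap_label` is a hypothetical helper, and normalizing case and returning `None` for unknown labels are assumptions, not documented Orion behavior.

```python
from typing import Optional

# Copied from the mapping shown above so the example is self-contained.
LABEL_MAPPING = {
    "bm-21": "AFV",
    "t-80": "AFV",
    "t-64": "AFV",
    "t-72": "AFV",
    "bmp-1": "AFV",
    "bmp-2": "AFV",
    "bmd-2": "AFV",
    "btr-70": "APC",
    "btr-80": "APC",
    "mt-lb": "APC",
}


def remap_label(label: str) -> Optional[str]:
    """Map a source label to an Orion class, or None if the label is unknown."""
    return LABEL_MAPPING.get(label.strip().lower())


print(remap_label("T-72"), remap_label("btr-80"), remap_label("unknown"))
```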
| 80 | + |
| 81 | +Orion provides a `download` function to download the images and annotations from this dataset and structure the directory to be imported into a `fiftyone` dataset. |
| 82 | + |
| 83 | +```python |
| 84 | +def download(dir: Path = settings.ORION_HOME_DIR / "roboflow"): |
| 85 | + """ |
| 86 | + Download images and annotations from the Russian military annotated dataset |
| 87 | + on roboflow and format them to be imported into a fo.Dataset. |
| 88 | +
|
| 89 | + Args: |
| 90 | + dir (Path, optional): the dataset dir. |
| 91 | + Defaults to settings.ORION_HOME_DIR / "roboflow". |
| 92 | + """ |
| 93 | +``` |
| 94 | + |
| 95 | +!!! info |
| 96 | + Orion uses **1042 annotated images** from the Russian Military vehicles dataset which are mapped to the `AFV` or `APC` class. |
| 97 | + |
| 98 | +## Google Images |
| 99 | + |
| 100 | +To improve our training dataset, we also scraped images of military vehicles from Google Images and annotated them by hand. This sample dataset is available for download from Orion's GitHub repository and contains 669 images of vehicles from all four classes (`AFV`, `APC`, `MEV` and `LAV`). |
| 101 | + |
| 102 | +!!! info |
| 103 | + Orion uses **669 annotated images** scraped from Google Images for all four classes. |
| 104 | + |
| 105 | +## The Search 2 |
| 106 | + |
| 107 | +The [Search_2](https://figshare.com/articles/dataset/The_Search_2_dataset/1041463) dataset consists of 44 high-resolution digital color images of complex natural scenes, each containing a single military vehicle that serves as a search target. Orion does not use this dataset for training; it is used to evaluate the models on **realistic long range automatic target recognition (ATR) samples**. |
| 108 | + |
| 109 | +## Command-line |
| 110 | + |
| 111 | +Orion provides a CLI command to download and set up a dataset of annotated military vehicles for training and developing automatic target recognition models. The `prepare` command downloads images from the [ImageNet](#imagenet), [Open Images](#openimage), [Russian military](#russian-military-annotated-dataset) and [Google Images](#google-images) sources and combines them into a single dataset on disk. |
| 112 | + |
| 113 | +The `prepare` command takes as an option the directory where all the source images will be downloaded and where the full combined dataset will be saved (by default, `~/.cache/orion`). |
| 114 | + |
| 115 | +```bash |
| 116 | +orion prepare --help |
| 117 | + |
| 118 | + Usage: orion prepare [OPTIONS] |
| 119 | + |
| 120 | + Prepare a dataset of annotated military vehicle images. |
| 121 | + |
| 122 | +╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────╮ |
| 123 | +│ --dir -d DIRECTORY Orion home directory. [default: ~/.cache/orion] │ |
| 124 | +│ --ids TEXT List of class ids to download. [default: n04389033] │ |
| 125 | +│ --help Show this message and exit. │ |
| 126 | +╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ |
| 127 | +``` |