
Commit 944437e

changing directory name for windows compatibility

1 parent f796137

File tree

14 files changed: +2064140 -0 lines


PyTorch-Multi-Label-Image-Classification-Image-Tagging/Pipeline.ipynb

Lines changed: 462 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 57 additions & 0 deletions
# Setup

Before installation, create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

# Training

To train the model, run the [Jupyter notebook](Pipeline.ipynb).

# Additional instructions

## Data preparation

We use [the NUS-WIDE dataset](https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html) for this tutorial.
Instead of parsing Flickr to download the images, we use [a dump](https://drive.google.com/open?id=0B7IzDz-4yH_HSmpjSTlFeUlSS00) from [this GitHub repository](https://github.com/thuml/HashNet/tree/master/pytorch#datasets).
Download and extract it.

We also provide pre-processed annotations:
- `nus_wide/train.json`
- `nus_wide/test.json`

If you want to create them yourself, run:

```bash
python split_data_nus.py -i images
```

where `-i images` is the path to the folder with the extracted images.

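Both annotation files share a simple schema: a top-level `samples` list in which each entry pairs an `image_name` with its `image_labels`. A minimal sketch of writing and reading a file in that format (the file name and label values below are made up for illustration; the schema follows what `create_subset.py` reads):

```python
import json

# Hypothetical two-sample annotation file in the format that
# create_subset.py expects: {"samples": [{"image_name": ..., "image_labels": [...]}, ...]}
annotations = {
    "samples": [
        {"image_name": "images/0001.jpg", "image_labels": ["sky", "clouds"]},
        {"image_name": "images/0002.jpg", "image_labels": ["water", "boats"]},
    ]
}

with open("train_example.json", "w") as fp:
    json.dump(annotations, fp, indent=3)

with open("train_example.json") as fp:
    samples = json.load(fp)["samples"]

print(samples[0]["image_labels"])  # ['sky', 'clouds']
```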
## Subset creation

You can train the model on the entire dataset, but that takes a long time, so for this tutorial we use only part of the data.

To create the subset, run:

```bash
python create_subset.py -i images
```

where `-i images` is the path to the folder with the extracted images.

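Under the hood, the script keeps a sample only if at least one of its labels belongs to the chosen subset, and drops the labels outside the subset. A minimal sketch of that per-sample filter (the function name is illustrative, not part of the script's API):

```python
def filter_sample(sample, keep_labels):
    """Keep only labels from keep_labels; return None if none remain.

    Mirrors the per-sample filtering step in create_subset.py.
    """
    kept = [label for label in sample["image_labels"] if label in keep_labels]
    if kept:
        return {"image_name": sample["image_name"], "image_labels": kept}
    return None


sample = {"image_name": "0001.jpg", "image_labels": ["sky", "dog", "water"]}
print(filter_sample(sample, {"sky", "water"}))
# {'image_name': '0001.jpg', 'image_labels': ['sky', 'water']}
print(filter_sample(sample, {"car"}))  # None
```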
Additional options:

```bash
python create_subset.py -h
usage: Subset creation [-h] -i IMG_PATH [-v VAL_SIZE] [-t TRAIN_SIZE]
                       [--shuffle] [-l LABELS [LABELS ...]]

optional arguments:
  -h, --help            show this help message and exit
  -i IMG_PATH, --img-path IMG_PATH
                        Path to the "images" folder
  -v VAL_SIZE, --val-size VAL_SIZE
                        Size of the validation data
  -t TRAIN_SIZE, --train-size TRAIN_SIZE
                        Size of the train data
  --shuffle             Shuffle samples before splitting
  -l LABELS [LABELS ...], --labels LABELS [LABELS ...]
                        Subset labels
```
Lines changed: 74 additions & 0 deletions
import argparse
import json
import os
import sys
from random import shuffle


def main():
    parser = argparse.ArgumentParser('Subset creation')
    parser.add_argument("-i", "--img-path", required=True, type=str, help='Path to the "images" folder')
    parser.add_argument("-v", "--val-size", default=1000, type=int, help='Size of the validation data')
    parser.add_argument("-t", "--train-size", default=5000, type=int, help='Size of the train data')
    parser.add_argument('--shuffle', action='store_true', help='Shuffle samples before splitting')
    parser.add_argument("-l", "--labels", nargs='+', default=['house', 'birds', 'sun', 'valley',
                                                              'nighttime', 'boats', 'mountain', 'tree', 'snow',
                                                              'beach', 'vehicle', 'rocks',
                                                              'reflection', 'sunset', 'road', 'flowers', 'ocean',
                                                              'lake', 'window', 'plants',
                                                              'buildings', 'grass', 'water', 'animal', 'person',
                                                              'clouds', 'sky'], help='Subset labels')
    args = parser.parse_args()
    img_path = args.img_path
    labels = args.labels

    # Validate the requested labels against the full NUS-WIDE category list.
    with open('nus_wide/cats') as l_f:
        possible_labels = [i.strip() for i in l_f.readlines()]

    for label in labels:
        if label not in possible_labels:
            print('Label:', label, "is unknown. Possible labels:", ', '.join(possible_labels))
            sys.exit(1)

    with open(os.path.join(img_path, 'test.json')) as fp:
        test_samples = json.load(fp)['samples']

    with open(os.path.join(img_path, 'train.json')) as fp:
        train_samples = json.load(fp)['samples']

    if args.shuffle:
        shuffle(test_samples)
        shuffle(train_samples)

    train_size = args.train_size
    test_size = args.val_size

    # Collect train samples that carry at least one of the requested labels,
    # keeping only those labels; stop at train_size or when the data runs out.
    small_train = []
    i = 0
    while len(small_train) < train_size and i < len(train_samples):
        sample_img_path, sample_labels = train_samples[i]['image_name'], train_samples[i]['image_labels']
        sample_labels = [label for label in sample_labels if label in labels]
        if sample_labels:
            small_train.append({'image_name': sample_img_path, 'image_labels': sample_labels})
        i += 1

    # Apply the same filtering to the validation (test) split.
    small_test = []
    i = 0
    while len(small_test) < test_size and i < len(test_samples):
        sample_img_path, sample_labels = test_samples[i]['image_name'], test_samples[i]['image_labels']
        sample_labels = [label for label in sample_labels if label in labels]
        if sample_labels:
            small_test.append({'image_name': sample_img_path, 'image_labels': sample_labels})
        i += 1

    with open(os.path.join(img_path, 'small_train.json'), 'w') as fp:
        json.dump({'samples': small_train, 'labels': labels}, fp, indent=3)

    with open(os.path.join(img_path, 'small_test.json'), 'w') as fp:
        json.dump({'samples': small_test, 'labels': labels}, fp, indent=3)


if __name__ == '__main__':
    main()
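The resulting `small_train.json` and `small_test.json` keep the same per-sample format as the inputs but add a top-level `labels` key listing the subset's classes, which can be used to size the model's output layer. A minimal sketch of reading one back (the file contents below are a made-up stand-in for what the script writes):

```python
import json
import os
import tempfile

# Stand-in small_train.json in the format create_subset.py emits:
# a "samples" list plus a top-level "labels" key naming the subset classes.
subset = {
    "samples": [{"image_name": "a.jpg", "image_labels": ["sky"]}],
    "labels": ["sky", "water"],
}
path = os.path.join(tempfile.mkdtemp(), "small_train.json")
with open(path, "w") as fp:
    json.dump(subset, fp, indent=3)

with open(path) as fp:
    data = json.load(fp)

num_classes = len(data["labels"])
print(num_classes)  # 2
```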
