# Image Classification

**Warning:** the features in the `image_classification/` folder have been fully
integrated into vision/beta. Please use the [new code base](../beta/README.md).

This folder contains TF 2.0 model examples for image classification:

* [MNIST](#mnist)
* [Classifier Trainer](#classifier-trainer), a framework that uses the Keras
  compile/fit methods for image classification models, including:
  * ResNet
  * EfficientNet[^1]

[^1]: Currently a work in progress. We cannot match "AutoAugment (AA)" in [the original version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet).

For more information about other types of models, please refer to this
[README file](../../README.md).
## Before you begin

Please make sure that you have the latest version of TensorFlow
installed and
[add the models folder to your Python path](/official/#running-the-models).
### ImageNet preparation

#### Using TFDS

`classifier_trainer.py` supports ImageNet with
[TensorFlow Datasets (TFDS)](https://www.tensorflow.org/datasets/overview).

Please see the following [example snippet](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/scripts/download_and_prepare.py)
for more information on how to use TFDS to download and prepare datasets, and
specifically the [TFDS ImageNet readme](https://github.com/tensorflow/datasets/blob/master/docs/catalog/imagenet2012.md)
for manual download instructions.
#### Legacy TFRecords

Download the ImageNet dataset and convert it to TFRecord format.
The following [script](https://github.com/tensorflow/tpu/blob/master/tools/datasets/imagenet_to_gcs.py)
and [README](https://github.com/tensorflow/tpu/tree/master/tools/datasets#imagenet_to_gcspy)
provide a few options.

Note that the legacy ResNet runners, e.g.
[resnet/resnet_ctl_imagenet_main.py](resnet/resnet_ctl_imagenet_main.py),
require TFRecords, whereas `classifier_trainer.py` can use both by setting the
builder to 'records' or 'tfds' in the configurations.
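For example, selecting the builder might look like the following YAML fragment. This is an illustrative sketch, not a complete configuration; the exact field layout is defined by the files in [configs/examples](./configs/examples), and the `data_dir` path is a placeholder:

```yaml
# Hypothetical dataset fragment: switch between TFDS and legacy TFRecords.
train_dataset:
  builder: 'tfds'      # or 'records' to read legacy TFRecord files
  data_dir: 'gs://your-bucket/imagenet'
```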
### Running on Cloud TPUs

Note: These models will **not** work with TPUs on Colab.

You can train image classification models on Cloud TPUs using
[tf.distribute.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/TPUStrategy?version=nightly).
If you are not familiar with Cloud TPUs, it is strongly recommended that you go
through the
[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
create a TPU and GCE VM.
### Running on multiple GPU hosts

You can also train these models on multiple hosts, each with GPUs, using
[tf.distribute.experimental.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy).

The easiest way to run multi-host benchmarks is to set the
[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
environment variable appropriately on each host. For example, to run using
`MultiWorkerMirroredStrategy` on 2 hosts, the `cluster` in `TF_CONFIG` should
have 2 `host:port` entries, and host `i` should have the `task` in `TF_CONFIG`
set to `{"type": "worker", "index": i}`. `MultiWorkerMirroredStrategy` will
automatically use all the available GPUs on each host.
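The `TF_CONFIG` layout described above can be generated programmatically. A minimal sketch, assuming two workers; the host addresses are placeholders and `make_tf_config` is a hypothetical helper, not part of the trainer:

```python
import json
import os

def make_tf_config(hosts, task_index):
    """Build a TF_CONFIG value for MultiWorkerMirroredStrategy.

    hosts: list of 'host:port' strings, one entry per worker.
    task_index: index of this worker within `hosts`.
    """
    return json.dumps({
        "cluster": {"worker": hosts},
        "task": {"type": "worker", "index": task_index},
    })

# On host 0, export this before launching the trainer; host 1 would
# use task_index=1 with the same cluster definition.
hosts = ["10.0.0.1:2222", "10.0.0.2:2222"]
os.environ["TF_CONFIG"] = make_tf_config(hosts, task_index=0)
print(os.environ["TF_CONFIG"])
```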
## MNIST

To download the data and run the MNIST sample model locally for the first time,
run the following command:

```bash
python3 mnist_main.py \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --train_epochs=10 \
  --distribution_strategy=one_device \
  --num_gpus=$NUM_GPUS \
  --download
```

To train the model on a Cloud TPU, run the following command:

```bash
python3 mnist_main.py \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --train_epochs=10 \
  --distribution_strategy=tpu \
  --download
```

Note: the `--download` flag is only required the first time you run the model.
## Classifier Trainer

The classifier trainer is a unified framework for running image classification
models using Keras's compile/fit methods. Experiments should be provided in the
form of YAML files; some examples are included within the configs/examples
folder. Please see [configs/examples](./configs/examples) for more example
configurations.

The provided configuration files use a per-replica batch size, which is scaled
by the number of devices. For instance, if `batch size` = 64, then for 1 GPU
the global batch size would be 64 * 1 = 64. For 8 GPUs, the global batch size
would be 64 * 8 = 512. Similarly, for a v3-8 TPU, the global batch size would
be 64 * 8 = 512, and for a v3-32, the global batch size is 64 * 32 = 2048.
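The scaling rule above is simply the per-replica batch size multiplied by the replica count:

```python
def global_batch_size(per_replica_batch_size, num_replicas):
    """Global batch size scales linearly with the number of replicas."""
    return per_replica_batch_size * num_replicas

print(global_batch_size(64, 1))   # single GPU -> 64
print(global_batch_size(64, 8))   # 8 GPUs or a v3-8 TPU -> 512
print(global_batch_size(64, 32))  # v3-32 TPU -> 2048
```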
### ResNet50

#### On GPU:

```bash
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
  --params_override="runtime.num_gpus=$NUM_GPUS"
```
To train on multiple hosts, each with GPUs attached, using
[MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy),
please update the `runtime` section in gpu.yaml
(or override it using `--params_override`) with:

```YAML
# gpu.yaml
runtime:
  distribution_strategy: 'multi_worker_mirrored'
  worker_hosts: '$HOST1:port,$HOST2:port'
  num_gpus: $NUM_GPUS
  task_index: 0
```

Set `task_index: 0` on the first host, `task_index: 1` on the second, and so
on. `$HOST1` and `$HOST2` are the IP addresses of the hosts, and `port` can be
any free port on the hosts. Only the first host will write TensorBoard
summaries and save checkpoints.
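Since each host's YAML differs only in `task_index`, the `runtime` section can be rendered per host with a small script. A sketch, assuming two workers; `runtime_yaml` is a hypothetical helper and the addresses are placeholders:

```python
# Hypothetical helper: render the per-host `runtime` section of gpu.yaml.
RUNTIME_TEMPLATE = """\
runtime:
  distribution_strategy: 'multi_worker_mirrored'
  worker_hosts: '{hosts}'
  num_gpus: {num_gpus}
  task_index: {task_index}
"""

def runtime_yaml(hosts, num_gpus, task_index):
    return RUNTIME_TEMPLATE.format(
        hosts=hosts, num_gpus=num_gpus, task_index=task_index)

# Each host gets the same section except for task_index.
print(runtime_yaml("10.0.0.1:2222,10.0.0.2:2222", 8, 0))
```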
#### On TPU:

```bash
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/tpu.yaml
```
### EfficientNet

**Note: EfficientNet development is a work in progress.**

#### On GPU:

```bash
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=efficientnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml \
  --params_override="runtime.num_gpus=$NUM_GPUS"
```

#### On TPU:

```bash
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=efficientnet \
  --dataset=imagenet \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml
```
Note that the number of GPU devices can be overridden on the command line using
`--params_override`. The TPU does not need this override, as the device is fixed
by providing the TPU address or name with the `--tpu` flag.
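The dotted keys accepted by `--params_override` address nested fields of the YAML configuration. A simplified illustration of that semantics (this is not the trainer's actual parser, just a sketch of how a dotted key maps onto a nested dict):

```python
def apply_override(config, dotted_key, value):
    """Set config['a']['b'] = value for dotted_key 'a.b' (simplified)."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config

# 'runtime.num_gpus=4' updates the nested runtime section:
config = {"runtime": {"num_gpus": 1}}
apply_override(config, "runtime.num_gpus", 4)
print(config)  # {'runtime': {'num_gpus': 4}}
```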
This repository is deprecated and replaced by the implementations inside
vision/beta/. All the content has been moved to
[official/legacy/image_classification](https://github.com/tensorflow/models/tree/master/official/legacy/image_classification).