This repository was archived by the owner on Oct 31, 2025. It is now read-only.

Commit 67c9447

Merge pull request #61 from captain-pool/add_image_retraining_tpu
Updating TPU trainer Sample
2 parents 1e11a80 + a18ad71 commit 67c9447

3 files changed: +293 -212 lines changed

E1_TPU_Sample/README.md

Lines changed: 25 additions & 35 deletions
@@ -6,7 +6,7 @@
 ### Cloud TPU
 
 **TPU Type:** v2.8
-**Tensorflow Version:** Nightly
+**Tensorflow Version:** 1.14
 
 ### Cloud VM
 
@@ -17,7 +17,7 @@
 Launching Instance and VM
 ---------------------------
 - Open Google Cloud Shell
-- `ctpu up -tf-version nightly`
+- `ctpu up -tf-version 1.14`
 - If cloud bucket is not setup automatically, create a cloud storage bucket
 with the same name as TPU and the VM
 - enable HTTP traffic for the VM instance
@@ -26,35 +26,6 @@ with the same name as TPU and the VM
 - `pip3 install -r requirements.txt`
 - `export CTPU_NAME=<common name of the tpu, vm and bucket>`
 
-Chaning Tensorflow Source Code For Support to Cloud TPU:
---------------------------------------------------------
-TPU is not Officially Supported for Tensorflow 2.0, so it is not exposed in the Public API.
-However in the code, the python files containing the required modules are imported explicitly.
-There's a small bug in `CrossShardOptimizer` which tries to use OptimizerV1 and all Optimizers
-available in the Public API are in V2. To support V2 Optimizers, a small Code Fragment is needed
-to be changed in CrossShardOptimizer's `apply_gradients(...)` function.
-To do that
-- Browse (`cd`) to the installation directory of tensorflow.
-
-**To find the installation directory:**
-```python3
->>> import os
->>> import tensorflow as tf
->>> print(os.path.dirname(str(tf).split(" ")[-1][1:]))
-```
-
-- `cd` to `python/tpu` inside the installation directory
-- open `tpu_optimizer.py` in an editor
-- change line no. 173 (For Tensorflow 2.0 Beta)
-**From**
-```python3
-return self._opt.apply_gradients(summed_grads_and_vars, global_step, name)
-```
-**To**
-```python3
-return self._opt.apply_gradients(summed_grads_and_vars, name=name)
-```
-- Save Changes
 
 Running Tensorboard:
 ----------------------
@@ -74,11 +45,30 @@ To view Tensorboard, Browse to the Public IP of the VM Instance
 
 Running the Code:
 ----------------------
+#### Train The Model
+
 ```bash
 $ python3 image_retraining_tpu.py --tpu $CTPU_NAME --use_tpu \
---model_dir gs://$CTPU_NAME/model_dir \
---data_dir gs://$CTPU_NAME/data_dir \
---batch_size 16 \
---iterations 4 \
+--modeldir gs://$CTPU_NAME/modeldir \
+--datadir gs://$CTPU_NAME/datadir \
+--logdir gs://$CTPU_NAME/logdir \
+--num_steps 2000 \
 --dataset horses_or_humans
 ```
+Training Saves one single checkpoint at the end of training. This checkpoint can be loaded up
+later to export a SavedModel from it.
+
+#### Export Model
+
+```bash
+$ python3 image_retraining_tpu.py --tpu $CTPU_NAME --use_tpu \
+--modeldir gs://$CTPU_NAME/modeldir \
+--datadir gs://$CTPU_NAME/datadir \
+--logdir gs://$CTPU_NAME/logdir \
+--dataset horses_or_humans \
+--export_only \
+--export_path modeldir/model
+```
+Exporting SavedModel of trained model
+----------------------------
+The trained model gets saved at `gs://$CTPU_NAME/modeldir/model` by default if the path is not explicitly stated using `--export_path`
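
Reviewer note: the README hunk above says the export path falls back to `<modeldir>/model` when `--export_path` is omitted. The script itself is not shown in this view, so the following is only a minimal sketch of how such a fallback might be wired up. The flag names `--modeldir`, `--export_only`, and `--export_path` come from the README; `build_parser` and `resolve_export_path` are hypothetical names introduced here for illustration, and the real script may resolve the path differently.

```python
import argparse
import posixpath


def build_parser():
    # Flag names mirror the README commands; types and defaults are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--modeldir", required=True)
    parser.add_argument("--export_only", action="store_true")
    parser.add_argument("--export_path", default=None)
    return parser


def resolve_export_path(args):
    # Hypothetical helper: use --export_path when given, otherwise
    # fall back to "<modeldir>/model" as the README describes.
    if args.export_path:
        return args.export_path
    # posixpath keeps forward slashes, which suits gs:// style paths.
    return posixpath.join(args.modeldir, "model")


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    print(resolve_export_path(cli_args))
```

An explicit `--export_path` is returned unchanged in this sketch; whether the real script treats a relative value such as `modeldir/model` as relative to the bucket is not stated in the README.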

E1_TPU_Sample/image_retraining_tpu.py

Lines changed: 0 additions & 177 deletions
This file was deleted.
