This repository was archived by the owner on Sep 23, 2024. It is now read-only.

Commit eb04c17

Merge pull request #222 from FIRST-Tech-Challenge/pr_fix_gpu_utilization
Use a custom image to run training and evaluation jobs with GPU.
2 parents 96fc023 + d42aa38 commit eb04c17

8 files changed: +357 -40 lines changed


README.md

Lines changed: 22 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -160,6 +160,19 @@ popd
160160
## Install JDK
161161
Depending on your OS and distribution, there are various ways to install the JDK. See https://www.oracle.com/java/technologies/downloads/ for instructions.
162162
163+
## Docker
164+
165+
### Install
166+
167+
Depending on your OS and distribution, there are various ways to install Docker. See https://docs.docker.com/get-docker/ for instructions.
168+
169+
### Authenticate
170+
171+
```
172+
gcloud auth configure-docker
173+
Do you want to continue (Y/n)? y
174+
```
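Before authenticating, it can help to confirm that both tools are on the `PATH`. The following is a minimal sketch, not part of the repository; the helper name `have_cmd` is hypothetical.

```bash
# Hypothetical helper (not part of the repository): succeed if the named
# command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report on the two tools that the Docker steps rely on.
for tool in docker gcloud; do
  if have_cmd "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing" >&2
  fi
done
```

If either tool is reported missing, install it before running `gcloud auth configure-docker`.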
175+
163176
## Fill in the values in server/env_variables.yaml
164177
165178
1. Replace `<YOUR-PROJECT-ID>` with the Google Cloud Project ID for your project.
@@ -195,16 +208,19 @@ source env_setup.sh
195208
source env_setup.sh
196209
scripts/deploy_indexes.sh
197210
```
211+
198212
1. Deploy the static content (the CSS styles and the favicon).
199213
```
200214
source env_setup.sh
201215
scripts/deploy_static.sh
202216
```
217+
203218
1. Deploy the JavaScript code.
204219
```
205220
source env_setup.sh
206221
scripts/deploy_js.sh
207222
```
223+
208224
1. Deploy the Cloud Function.
209225
```
210226
source env_setup.sh
@@ -222,6 +238,12 @@ source env_setup.sh
222238
scripts/deploy_gae.sh
223239
```
224240
241+
1. Deploy the object detection Docker image.
242+
```
243+
source env_setup.sh
244+
scripts/deploy_docker_image.sh
245+
```
246+
225247
## Try it out
226248
227249
Go to the URL you found earlier at https://console.cloud.google.com/appengine?project=YOUR-PROJECT-ID (replace YOUR-PROJECT-ID with your actual project ID).

doc/object_detection.md

Lines changed: 303 additions & 0 deletions
@@ -0,0 +1,303 @@
1+
# This document describes how I created the docker image and object_detection-0.1.tar.gz that are used to run training and evaluation jobs.
2+
3+
November 25, 2021
4+
5+
## Preparing the git branch
6+
7+
1. Clone the repo
8+
```bash
9+
git clone git@github.com:lizlooney/models.git
10+
cd models
11+
git remote add upstream git@github.com:tensorflow/models.git
12+
```
13+
14+
1. Run git log to see what commits are in the master branch.
15+
```bash
16+
git log
17+
```
18+
19+
The output looked like this:
20+
```
21+
commit 65407126c5adc216d606d360429fe12ed3c3f187
22+
Author: Vighnesh Birodkar <[email protected]>
23+
Date: Tue Nov 23 13:50:18 2021 -0800
24+
25+
Fix conditional convs by adding ReLU.
26+
27+
PiperOrigin-RevId: 411887283
28+
29+
commit c280c4eefe13305a1e6b67c58ecf093ad5a754f0
30+
Author: A. Unique TensorFlower <[email protected]>
31+
Date: Tue Nov 23 09:55:42 2021 -0800
32+
33+
Internal change
34+
35+
PiperOrigin-RevId: 411835650
36+
37+
commit f6557386f8f82eb18a18be91693e4fcdf717fa31
38+
Author: Frederick Liu <[email protected]>
39+
Date: Mon Nov 22 23:24:37 2021 -0800
40+
41+
Internal change
42+
43+
PiperOrigin-RevId: 411729044
44+
45+
commit 65c81380cb3bcc0f1d0c756dc8fddf39630d90e0
46+
Author: Fan Yang <[email protected]>
47+
Date: Mon Nov 22 17:23:43 2021 -0800
48+
49+
Internal change.
50+
51+
PiperOrigin-RevId: 411683806
52+
```
53+
54+
1. Create the branch
55+
```bash
56+
git checkout -b for_fmltc_2021_11_25
57+
```
58+
59+
1. Modify research/object_detection/model_main_tf2.py so the evaluation job does not wait if the next checkpoint is already available.
60+
```
61+
90c90
62+
< wait_interval=300, timeout=FLAGS.eval_timeout)
63+
---
64+
> wait_interval=0, timeout=FLAGS.eval_timeout)
65+
```
66+
67+
```bash
68+
git add research/object_detection/model_main_tf2.py
69+
git commit -m "Set wait_interval to 0 when calling model_lib_v2.eval_continuously."
70+
```
71+
72+
1. Modify research/object_detection/packages/tf2/setup.py to prevent a binary incompatibility error.
73+
```
74+
26,28c26,31
75+
< # Workaround due to
76+
< # https://github.com/keras-team/keras/issues/15583
77+
< 'keras==2.6.0'
78+
---
79+
> # Prevent "ValueError: numpy.ndarray size changed, may indicate
80+
> # binary incompatibility. Expected 88 from C header, got 80 from
81+
> # PyObject". See https://stackoverflow.com/questions/66060487
82+
> # # Workaround due to
83+
> # # https://github.com/keras-team/keras/issues/15583
84+
> # 'keras==2.6.0'
85+
```
86+
87+
```bash
88+
git add research/object_detection/packages/tf2/setup.py
89+
git commit -m "Update setup.py to prevent numpy problem"
90+
```
91+
92+
1. Run git log to see what commits are in the new branch.
93+
```bash
94+
git log
95+
```
96+
97+
The output looked like this:
98+
```
99+
commit 164cf55d364e1b524282983c42c9a3ac2c6ab90c
100+
Author: lizlooney <[email protected]>
101+
Date: Thu Nov 25 09:28:57 2021 -0800
102+
103+
Update setup.py to prevent numpy problem
104+
105+
commit 5915d449455a966bd45569d76a7c1cd4d3b00574
106+
Author: lizlooney <[email protected]>
107+
Date: Wed Nov 24 14:55:51 2021 -0800
108+
109+
Set wait_interval to 0 when calling model_lib_v2.eval_continuously.
110+
111+
commit 65407126c5adc216d606d360429fe12ed3c3f187
112+
Author: Vighnesh Birodkar <[email protected]>
113+
Date: Tue Nov 23 13:50:18 2021 -0800
114+
115+
Fix conditional convs by adding ReLU.
116+
117+
PiperOrigin-RevId: 411887283
118+
119+
commit c280c4eefe13305a1e6b67c58ecf093ad5a754f0
120+
Author: A. Unique TensorFlower <[email protected]>
121+
Date: Tue Nov 23 09:55:42 2021 -0800
122+
123+
Internal change
124+
125+
PiperOrigin-RevId: 411835650
126+
```
127+
128+
1. Push the branch up to GitHub.
129+
```bash
130+
git push origin for_fmltc_2021_11_25
131+
```
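The `wait_interval` change above was made by hand. As a rough sketch, the same one-line edit could be scripted as follows; the helper name `set_wait_interval_zero` is hypothetical, and it assumes the text `wait_interval=300,` appears in the file only where the diff shows it.

```bash
# Hypothetical helper (an assumption, not the method used in this document):
# rewrite "wait_interval=300," to "wait_interval=0," in the given file.
# Writes to a temporary file first so the edit works with both GNU and BSD sed.
set_wait_interval_zero() {
  file="$1"
  tmp="${file}.tmp"
  sed 's/wait_interval=300,/wait_interval=0,/' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example call, run from the root of the models repository:
#   set_wait_interval_zero research/object_detection/model_main_tf2.py
```

Running `git diff` afterwards is a quick way to confirm that only the intended line changed.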
132+
133+
134+
## Creating the docker image
135+
136+
1. Build the docker image. Make sure that FMLTC_GCLOUD_PROJECT_ID is set before doing this step.
137+
```bash
138+
export IMAGE_TAG=2021_11_25
139+
export IMAGE_URI=gcr.io/$FMLTC_GCLOUD_PROJECT_ID/object_detection:$IMAGE_TAG
140+
141+
cd models/research
142+
cp object_detection/dockerfiles/tf2_ai_platform/Dockerfile .
143+
144+
docker build -f Dockerfile -t ${IMAGE_URI} .
145+
```
146+
147+
The partial output looked like this:
148+
> Successfully installed Cython-0.29.24 absl-py-0.12.0 apache-beam-2.34.0 attrs-21.2.0 avro-python3-1.9.2.1 charset-normalizer-2.0.8 colorama-0.4.4 contextlib2-21.6.0 crcmod-1.7 cycler-0.11.0 dill-0.3.1.1 dm-tree-0.1.6 docopt-0.6.2 fastavro-1.4.7 fonttools-4.28.2 future-0.18.2 gin-config-0.5.0 google-api-core-2.2.2 google-api-python-client-2.31.0 google-auth-httplib2-0.1.0 googleapis-common-protos-1.53.0 hdfs-2.6.0 httplib2-0.19.1 importlib-resources-5.4.0 joblib-1.1.0 kaggle-1.5.12 kiwisolver-1.3.2 lvis-0.5.3 matplotlib-3.5.0 numpy-1.20.3 oauth2client-4.1.3 object-detection-0.1 opencv-python-4.5.4.60 opencv-python-headless-4.5.4.60 orjson-3.6.4 packaging-21.3 pandas-1.3.4 portalocker-2.3.2 promise-2.3 psutil-5.8.0 py-cpuinfo-8.0.0 pyarrow-5.0.0 pycocotools-2.0.3 pydot-1.4.2 pymongo-3.12.1 pyparsing-2.4.7 python-dateutil-2.8.2 python-slugify-5.0.2 pytz-2021.3 pyyaml-6.0 regex-2021.11.10 requests-2.26.0 sacrebleu-2.0.0 scikit-learn-1.0.1 scipy-1.7.3 sentencepiece-0.1.96 seqeval-1.2.2 setuptools-scm-6.3.2 tabulate-0.8.9 tensorflow-addons-0.15.0 tensorflow-datasets-4.4.0 tensorflow-hub-0.12.0 tensorflow-io-0.22.0 tensorflow-io-gcs-filesystem-0.22.0 tensorflow-metadata-1.4.0 tensorflow-model-optimization-0.7.0 tensorflow-text-2.7.3 text-unidecode-1.3 tf-models-official-2.7.0 tf-slim-1.1.0 threadpoolctl-3.0.0 tomli-1.2.2 tqdm-4.62.3 typeguard-2.13.2 uritemplate-4.1.1 zipp-3.6.0
149+
150+
1. Push the docker image to Google Cloud Container Registry.
151+
```bash
152+
gcloud auth configure-docker
153+
docker push ${IMAGE_URI}
154+
```
155+
156+
1. Show the image URI so I can use it in model_trainer.py.
157+
```bash
158+
echo "IMAGE_URI is $IMAGE_URI"
159+
```
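The build step depends on FMLTC_GCLOUD_PROJECT_ID being set. If it is unset, the image URI silently becomes `gcr.io//object_detection:...`. A minimal sketch of a guard, assuming a hypothetical helper named `image_uri`:

```bash
# Hypothetical helper (not part of the repository): print the image URI for
# the given tag, or fail with a message if the project ID is not set.
image_uri() {
  tag="$1"
  if [ -z "${FMLTC_GCLOUD_PROJECT_ID:-}" ]; then
    echo "FMLTC_GCLOUD_PROJECT_ID is not set" >&2
    return 1
  fi
  echo "gcr.io/${FMLTC_GCLOUD_PROJECT_ID}/object_detection:${tag}"
}

# Example call:
#   IMAGE_URI="$(image_uri 2021_11_25)" || exit 1
```

With this guard, a missing project ID stops the workflow before `docker build` tags an image with an unusable name.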
160+
161+
162+
## Creating object_detection-0.1.tar.gz
163+
164+
1. Create a new python environment
165+
```bash
166+
python3 -m venv models_env
167+
source models_env/bin/activate
168+
pip install --upgrade pip
169+
```
170+
171+
1. Compile protos
172+
```bash
173+
cd models/research
174+
protoc object_detection/protos/*.proto --python_out=.
175+
```
176+
177+
1. Copy the setup.py file.
178+
```bash
179+
cd models/research
180+
cp object_detection/packages/tf2/setup.py .
181+
```
182+
183+
1. Modify models/research/setup.py with the exact versions from the output of docker build.
184+
```
185+
10,31c10,85
186+
< # Required for apache-beam with PY3
187+
< 'avro-python3',
188+
< 'apache-beam',
189+
< 'pillow',
190+
< 'lxml',
191+
< 'matplotlib',
192+
< 'Cython',
193+
< 'contextlib2',
194+
< 'tf-slim',
195+
< 'six',
196+
< 'pycocotools',
197+
< 'lvis',
198+
< 'scipy',
199+
< 'pandas',
200+
< 'tf-models-official>=2.5.1',
201+
< 'tensorflow_io',
202+
< # Prevent "ValueError: numpy.ndarray size changed, may indicate
203+
< # binary incompatibility. Expected 88 from C header, got 80 from
204+
< # PyObject". See https://stackoverflow.com/questions/66060487
205+
< # # Workaround due to
206+
< # # https://github.com/keras-team/keras/issues/15583
207+
< # 'keras==2.6.0'
208+
---
209+
> 'Cython==0.29.24',
210+
> 'absl-py==0.12.0',
211+
> 'apache-beam==2.34.0',
212+
> 'attrs==21.2.0',
213+
> 'avro-python3==1.9.2.1',
214+
> 'charset-normalizer==2.0.8',
215+
> 'colorama==0.4.4',
216+
> 'contextlib2==21.6.0',
217+
> 'crcmod==1.7',
218+
> 'cycler==0.11.0',
219+
> 'dill==0.3.1.1',
220+
> 'dm-tree==0.1.6',
221+
> 'docopt==0.6.2',
222+
> 'fastavro==1.4.7',
223+
> 'fonttools==4.28.2',
224+
> 'future==0.18.2',
225+
> 'gin-config==0.5.0',
226+
> 'google-api-core==2.2.2',
227+
> 'google-api-python-client==2.31.0',
228+
> 'google-auth-httplib2==0.1.0',
229+
> 'googleapis-common-protos==1.53.0',
230+
> 'hdfs==2.6.0',
231+
> 'httplib2==0.19.1',
232+
> 'importlib-resources==5.4.0',
233+
> 'joblib==1.1.0',
234+
> 'kaggle==1.5.12',
235+
> 'kiwisolver==1.3.2',
236+
> 'lvis==0.5.3',
237+
> 'matplotlib==3.5.0',
238+
> 'numpy==1.20.3',
239+
> 'oauth2client==4.1.3',
240+
> 'object-detection==0.1',
241+
> 'opencv-python==4.5.4.60',
242+
> 'opencv-python-headless==4.5.4.60',
243+
> 'orjson==3.6.4',
244+
> 'packaging==21.3',
245+
> 'pandas==1.3.4',
246+
> 'portalocker==2.3.2',
247+
> 'promise==2.3',
248+
> 'psutil==5.8.0',
249+
> 'py-cpuinfo==8.0.0',
250+
> 'pyarrow==5.0.0',
251+
> 'pycocotools==2.0.3',
252+
> 'pydot==1.4.2',
253+
> 'pymongo==3.12.1',
254+
> 'pyparsing==2.4.7',
255+
> 'python-dateutil==2.8.2',
256+
> 'python-slugify==5.0.2',
257+
> 'pytz==2021.3',
258+
> 'pyyaml==6.0',
259+
> 'regex==2021.11.10',
260+
> 'requests==2.26.0',
261+
> 'sacrebleu==2.0.0',
262+
> 'scikit-learn==1.0.1',
263+
> 'scipy==1.7.3',
264+
> 'sentencepiece==0.1.96',
265+
> 'seqeval==1.2.2',
266+
> 'setuptools-scm==6.3.2',
267+
> 'tabulate==0.8.9',
268+
> 'tensorflow-addons==0.15.0',
269+
> 'tensorflow-datasets==4.4.0',
270+
> 'tensorflow-hub==0.12.0',
271+
> 'tensorflow-io==0.22.0',
272+
> 'tensorflow-io-gcs-filesystem==0.22.0',
273+
> 'tensorflow-metadata==1.4.0',
274+
> 'tensorflow-model-optimization==0.7.0',
275+
> 'tensorflow-text==2.7.3',
276+
> 'text-unidecode==1.3',
277+
> 'tf-models-official==2.7.0',
278+
> 'tf-slim==1.1.0',
279+
> 'threadpoolctl==3.0.0',
280+
> 'tomli==1.2.2',
281+
> 'tqdm==4.62.3',
282+
> 'typeguard==2.13.2',
283+
> 'uritemplate==4.1.1',
284+
> 'zipp==3.6.0',
285+
```
286+
287+
1. Install the required packages
288+
```bash
289+
pip install Cython==0.29.24
290+
pip install numpy==1.20.3
291+
pip install pycocotools==2.0.3
292+
pip install .
293+
```
294+
295+
1. Build object_detection-0.1.tar.gz.
296+
```bash
297+
python3 setup.py sdist
298+
```
299+
300+
1. Locate object_detection-0.1.tar.gz.
301+
```bash
302+
ls -l dist/object_detection-0.1.tar.gz
303+
```
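The pinned entries in setup.py mirror the "Successfully installed" line printed by docker build. As a sketch of how that list could be generated rather than typed by hand, the hypothetical helper below splits each `name-version` token at its last hyphen. It assumes that no version string contains a hyphen, which holds for the output shown in this document.

```bash
# Hypothetical helper (an assumption, not the method used in this document):
# read a pip "Successfully installed ..." line on stdin and print one
# 'name==version', entry per package.
pins_from_pip_output() {
  # Drop everything up to and including the "Successfully installed" prefix,
  # put each token on its own line, then split at the last hyphen.
  sed 's/^.*Successfully installed *//' |
    tr ' ' '\n' |
    sed -n "s/^\(.*\)-\([^-][^-]*\)\$/'\1==\2',/p"
}

# Example call, assuming the docker build output was saved to build.log:
#   grep 'Successfully installed' build.log | pins_from_pip_output
```

The generated lines still need to be reviewed before they are pasted into setup.py.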

scripts/deploy_docker_image.sh

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
docker pull ghcr.io/lizlooney/object_detection:2021_11_25
2+
docker tag ghcr.io/lizlooney/object_detection:2021_11_25 gcr.io/${FMLTC_GCLOUD_PROJECT_ID}/object_detection:2021_11_25
3+
docker push gcr.io/${FMLTC_GCLOUD_PROJECT_ID}/object_detection:2021_11_25
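The script hard-codes the tag `2021_11_25` in three places. A possible refinement, shown here only as a sketch and not as the committed script, is a hypothetical helper that takes the project ID and tag as arguments and prints the three commands so they can be reviewed before running.

```bash
# Hypothetical helper (a sketch, not the committed script): print the docker
# commands that copy the prebuilt image into a project's Container Registry.
deploy_image_commands() {
  project_id="$1"
  tag="$2"
  src="ghcr.io/lizlooney/object_detection:${tag}"
  dst="gcr.io/${project_id}/object_detection:${tag}"
  echo "docker pull ${src}"
  echo "docker tag ${src} ${dst}"
  echo "docker push ${dst}"
}

# Example call that only prints the commands:
#   deploy_image_commands "$FMLTC_GCLOUD_PROJECT_ID" 2021_11_25
```

Piping the output to `sh -e` would execute the commands and stop at the first failure.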
