Commit 079c084

update gcloud tutorial according to the new elasticdl client tool (#2117)
* polish gcloud doc
* add gen dataset
1 parent 5f8e307 commit 079c084

File tree

2 files changed: 59 additions, 37 deletions

docs/tutorials/elasticdl_cloud.md

Lines changed: 32 additions & 37 deletions
````diff
@@ -16,37 +16,17 @@ We will use the project id and cluster name in next steps.
 
 ### Access the Kubernetes Cluster
 
-To access GKE, we need to install [Google Cloud
+To access GKE in a local computer, we need to install [Google Cloud
 SDK](https://cloud.google.com/sdk/install), which includes command-line tools
 like `gcloud`.
 
-Step 1: Set the PROJECT_ID environment variable in shell.
+Luckily, Google Cloud also provides Cloud Shell with `gcloud` installed already.
+In this tutorial, we use Cloud Shell to access the Kubernetes cluster.
+We run the following command in Cloud Shell.
 
 ```bash
 export PROJECT_ID=${your_project_id}
-gcloud config set project ${PROJECT_ID}
-```
-
-Step 2: List clusters info with gcloud, and double check it with web console.
-
-```bash
-gcloud container clusters list
-```
-
-Step 3: Use the command below to generate the corresponding kubeconfig.
-
-```bash
-gcloud container clusters get-credentials edl-cluster --zone us-central1-c
-```
-
-Make sure you have
-[`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/) available
-locally.
-
-Use the following command to list all the started components.
-
-```bash
-kubectl get all --all-namespaces
+gcloud container clusters get-credentials cluster-1 --zone us-central1-c --project ${PROJECT_ID}
 ```
````
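The new access flow reduces three steps to a single `get-credentials` call. As a minimal sketch (a hypothetical helper, not part of the tutorial or the elasticdl repo), the command can be composed from its three parameters so the same snippet works for any project, cluster, and zone:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the gcloud get-credentials command from its
# parameters instead of hard-coding project, cluster, and zone.
make_get_credentials_cmd() {
  local project="$1" cluster="$2" zone="$3"
  echo "gcloud container clusters get-credentials ${cluster} --zone ${zone} --project ${project}"
}

# Prints the command the tutorial runs for cluster-1 in us-central1-c:
# gcloud container clusters get-credentials cluster-1 --zone us-central1-c --project my-project
make_get_credentials_cmd "my-project" "cluster-1" "us-central1-c"
```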

5232
### Config the Kubernetes Cluster
@@ -56,6 +36,9 @@ have granted related permissions to the default or other related service
5636
accounts.
5737

5838
```bash
39+
export CODE_PATH=${your_code_dir}
40+
cd ${CODE_PATH} && git clone https://github.com/sql-machine-learning/elasticdl.git
41+
cd ${CODE_PATH}/elasticdl
5942
kubectl apply -f elasticdl/manifests/elasticdl-rbac.yaml
6043
```
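Re-running the setup commands above fails on the second `git clone` because the target directory already exists. A hedged sketch of an idempotent variant (a hypothetical helper, not in the elasticdl repo):

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone a repo only when the destination does not already
# contain a checkout, so repeated setup runs are harmless.
clone_if_missing() {
  local url="$1" dest="$2"
  if [ -d "$dest/.git" ]; then
    echo "exists"
  else
    git clone -q "$url" "$dest"
  fi
}

# Usage sketch:
#   clone_if_missing https://github.com/sql-machine-learning/elasticdl.git ${CODE_PATH}/elasticdl
```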
6144

````diff
@@ -106,19 +89,25 @@ In this example, we create a persistent value claim named `fileserver-claim`.
 ### Prepare Dataset
 
 Step 1: We generate MNIST training and evaluation data in RecordIO format.
+We provide a script in the elasticdl repo.
 
 ```bash
-python elasticdl/python/data/recordio_gen/image_label.py \
-    --dataset mnist \
-    --records_per_shard 4096 .
+cd ${CODE_PATH}/elasticdl
+docker run --rm -it \
+  -v $HOME/.keras/datasets:/root/.keras/datasets \
+  -v $PWD:/work \
+  -w /work elasticdl/elasticdl:dev \
+  bash -c "scripts/gen_dataset.sh data"
 ```
 
+The RecordIO format dataset will be generated in the `data` directory.
+
 Step 2: We launch a pod which mounts the volume, and use the `kubectl cp` command
-to copy data from local to the volume.
+to copy the MNIST dataset from local to the volume.
 
 ```bash
 kubectl create -f my-pod.yaml
-kubectl cp mnist my-pod:/data
+kubectl cp data/mnist my-pod:/data
 ```
 
 my-pod.yaml
````
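If dataset generation fails silently, the `kubectl cp` in Step 2 copies nothing. A minimal guard (a hypothetical helper, not part of the tutorial) checks that the generated directory is non-empty before copying:

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only when the dataset directory exists and
# contains at least one file, so a failed generation step is caught early.
check_dataset_dir() {
  local dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage sketch:
#   check_dataset_dir data/mnist && kubectl cp data/mnist my-pod:/data
```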
````diff
@@ -144,22 +133,28 @@ spec:
 
 ### Submit Job
 
-Please refer to [elasticdl_local tutorial](./elasticdl_local.md) to build the
-`elasticdl:ci` image. The difference is that we have to push the image to
-google cloud repo. We use the following command to get the authentication:
+Please refer to the [elasticdl_local tutorial](./elasticdl_local.md) for more details.
+The difference is that we have to push the image to the google cloud repo.
 
 ```bash
-gcloud auth configure-docker
+pip install elasticdl-client
+
+cd ${CODE_PATH}/elasticdl/model_zoo
+
+elasticdl zoo init
+
+elasticdl zoo build --image=gcr.io/${PROJECT_ID}/elasticdl:mnist .
+
+elasticdl zoo push gcr.io/${PROJECT_ID}/elasticdl:mnist
 ```
 
 We launch a training job with 2 PS pods and 4 worker pods. The master pod and
 PS pods are set with high priority, while worker pods are set with low priority. The
 training docker image will be pushed to the google cloud repo.
 
 ```bash
-python -m elasticdl.python.elasticdl.client train \
-  --image_base=elasticdl:ci \
-  --docker_image_repository=gcr.io/${PROJECT_ID} \
+elasticdl train \
+  --image_name=gcr.io/${PROJECT_ID}/elasticdl:mnist \
   --model_zoo=model_zoo \
   --model_def=mnist_functional_api.mnist_functional_api.custom_model \
   --training_data=/data/mnist/train \
````
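The same image name now appears in `zoo build`, `zoo push`, and `train`. A minimal sketch (a hypothetical helper, not part of the elasticdl client) builds it in one place so the three commands cannot drift apart:

```shell
#!/usr/bin/env bash
# Hypothetical naming helper: derive the GCR image name from the project id
# and a tag, mirroring the gcr.io/${PROJECT_ID}/elasticdl:mnist convention
# used in the tutorial.
gcr_image() {
  local project="$1" tag="$2"
  echo "gcr.io/${project}/elasticdl:${tag}"
}

# With PROJECT_ID=my-project this prints gcr.io/my-project/elasticdl:mnist
gcr_image "my-project" "mnist"
```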

scripts/gen_dataset.sh

Lines changed: 27 additions & 0 deletions
```diff
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+# Copyright 2020 The ElasticDL Authors. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Generate mnist dataset
+DATA_PATH=$1
+
+python elasticdl/python/data/recordio_gen/image_label.py --dataset mnist \
+  --records_per_shard 4096 "$DATA_PATH"
+
+# Generate frappe dataset
+python elasticdl/python/data/recordio_gen/frappe_recordio_gen.py --data /root/.keras/datasets \
+  --output_dir "$DATA_PATH"/frappe
+
+# Generate heart dataset
+python elasticdl/python/data/recordio_gen/heart_recordio_gen.py --data_dir /root/.keras/datasets \
+  --output_dir "$DATA_PATH"/heart
```
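The script uses `$1` as `DATA_PATH` without validating it, so calling it with no argument passes an empty path to the generators. A hedged sketch of a fail-fast prologue (hypothetical, not in the repo):

```shell
#!/usr/bin/env bash
# Hypothetical fail-fast prologue for gen_dataset.sh: abort with a usage
# message when DATA_PATH is missing, and create the output directory up front.
set -euo pipefail

require_data_path() {
  local path="${1:-}"
  if [ -z "$path" ]; then
    echo "usage: gen_dataset.sh DATA_PATH" >&2
    return 1
  fi
  mkdir -p "$path"
  echo "$path"
}
```

With this in place, the three `python ... recordio_gen ...` calls would run against a directory that is guaranteed to exist.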
