@@ -16,37 +16,17 @@ We will use the project ID and cluster name in the next steps.

### Access the Kubernetes Cluster

-To access GKE, we need to install [Google Cloud
+To access GKE from a local computer, we need to install the [Google Cloud
SDK](https://cloud.google.com/sdk/install), which includes command-line tools
like `gcloud`.

-Step 1: Set the PROJECT_ID environment variable in shell.
+Luckily, Google Cloud also provides Cloud Shell with `gcloud` preinstalled.
+In this tutorial, we use Cloud Shell to access the Kubernetes cluster.
+We run the following commands in Cloud Shell.

```bash
export PROJECT_ID=${your_project_id}
-gcloud config set project ${PROJECT_ID}
-```
-
-Step 2: List clusters info with gcloud, and double check it with web console.
-
-```bash
-gcloud container clusters list
-```
-
-Step 3: Use the command below to generate the corresponding kubeconfig.
-
-```bash
-gcloud container clusters get-credentials edl-cluster --zone us-central1-c
-```
-
-Make sure you have
-[`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/) available
-locally.
-
-Use the following command to list all the started components.
-
-```bash
-kubectl get all --all-namespaces
+gcloud container clusters get-credentials cluster-1 --zone us-central1-c --project ${PROJECT_ID}
```

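As a side note on what `get-credentials` does: it writes an entry into the local kubeconfig whose context name follows GKE's `gke_<project>_<zone>_<cluster>` pattern. A minimal sketch of that naming, where the project ID is a placeholder and the zone and cluster name match the command above:

```shell
# Sketch: kubeconfig context name GKE derives after `get-credentials`.
PROJECT_ID=my-project   # placeholder; use your real project ID
ZONE=us-central1-c
CLUSTER=cluster-1
CONTEXT="gke_${PROJECT_ID}_${ZONE}_${CLUSTER}"
echo "${CONTEXT}"   # prints: gke_my-project_us-central1-c_cluster-1
```

Running `kubectl config current-context` afterwards should show a context of this form.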
### Config the Kubernetes Cluster
@@ -56,6 +36,9 @@ have granted related permissions to the default or other related service
accounts.

```bash
+export CODE_PATH=${your_code_dir}
+cd ${CODE_PATH} && git clone https://github.com/sql-machine-learning/elasticdl.git
+cd ${CODE_PATH}/elasticdl
kubectl apply -f elasticdl/manifests/elasticdl-rbac.yaml
```

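For readers curious what such a manifest grants, the sketch below shows a generic namespaced Role plus RoleBinding in the same spirit. It is illustrative only and is not the contents of `elasticdl/manifests/elasticdl-rbac.yaml`; the point is that the ElasticDL master creates and deletes worker pods at runtime, so its service account needs pod-management verbs.

```yaml
# Hypothetical RBAC sketch (not the actual elasticdl-rbac.yaml):
# a Role granting pod/service management, bound to the default service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elasticdl-example
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elasticdl-example
  namespace: default
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: Role
  name: elasticdl-example
  apiGroup: rbac.authorization.k8s.io
```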
@@ -106,19 +89,25 @@ In this example, we create a persistent volume claim named `fileserver-claim`.

### Prepare Dataset

Step 1: We generate MNIST training and evaluation data in RecordIO format.
+We provide a script for this in the elasticdl repo.

```bash
-python elasticdl/python/data/recordio_gen/image_label.py \
-  --dataset mnist \
-  --records_per_shard 4096 .
+cd ${CODE_PATH}/elasticdl
+docker run --rm -it \
+  -v $HOME/.keras/datasets:/root/.keras/datasets \
+  -v $PWD:/work \
+  -w /work elasticdl/elasticdl:dev \
+  bash -c "scripts/gen_dataset.sh data"
```

+The RecordIO format dataset will be generated in the `data` directory.
+
Step 2: We launch a pod which mounts the volume, and use the `kubectl cp` command
-to copy data from local to the volume.
+to copy the MNIST dataset from local to the volume.

```bash
kubectl create -f my-pod.yaml
-kubectl cp mnist my-pod:/data
+kubectl cp data/mnist my-pod:/data
```

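A note on the `kubectl cp` invocation above: like `cp -r`, copying a directory into an existing directory nests it as a subdirectory, so local `data/mnist` ends up as `/data/mnist` inside the pod, matching the `--training_data=/data/mnist/train` flag used when submitting the job. A local sketch of the same semantics (the paths here are illustrative stand-ins):

```shell
# Local stand-in for `kubectl cp data/mnist my-pod:/data`: copying a
# directory into an existing directory nests it, yielding dest/mnist/train.
mkdir -p /tmp/cp-demo/data/mnist/train /tmp/cp-demo/dest
cp -r /tmp/cp-demo/data/mnist /tmp/cp-demo/dest/
ls /tmp/cp-demo/dest/mnist   # prints: train
```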
my-pod.yaml
@@ -144,22 +133,28 @@ spec:

### Submit Job

-Please refer to [elasticdl_local tutorial](./elasticdl_local.md) to build the
-`elasticdl:ci` image. The difference is that we have to push the image to
-google cloud repo. We use the following command to get the authentication:
+Please refer to the [elasticdl_local tutorial](./elasticdl_local.md) for more details.
+The difference is that we have to push the image to the Google Cloud registry.

```bash
-gcloud auth configure-docker
+pip install elasticdl-client
+
+cd ${CODE_PATH}/elasticdl/model_zoo
+
+elasticdl zoo init
+
+elasticdl zoo build --image=gcr.io/${PROJECT_ID}/elasticdl:mnist .
+
+elasticdl zoo push gcr.io/${PROJECT_ID}/elasticdl:mnist
```

We launch a training job with 2 PS pods and 4 worker pods. The master pod and
PS pods are set with high priority, while worker pods are set with low priority. The
training docker image will be pushed to the Google Cloud registry.

```bash
-python -m elasticdl.python.elasticdl.client train \
-  --image_base=elasticdl:ci \
-  --docker_image_repository=gcr.io/${PROJECT_ID} \
+elasticdl train \
+  --image_name=gcr.io/${PROJECT_ID}/elasticdl:mnist \
  --model_zoo=model_zoo \
  --model_def=mnist_functional_api.mnist_functional_api.custom_model \
  --training_data=/data/mnist/train \
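The image reference passed to `--image_name` follows Container Registry's naming scheme, `gcr.io/<project-id>/<image>:<tag>`. A small sketch composing it from the `PROJECT_ID` variable set earlier (the project ID value is a placeholder):

```shell
# Compose a GCR image reference: gcr.io/<project-id>/<image>:<tag>.
PROJECT_ID=my-project   # placeholder; use your real project ID
IMAGE="gcr.io/${PROJECT_ID}/elasticdl:mnist"
echo "${IMAGE}"   # prints: gcr.io/my-project/elasticdl:mnist
```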