Commit b56212b

Fix markdownlint errors of docs/tutorials/*.md (#2061)
* Use mdfmt
* Manual fixes
1 parent 8868ae9 commit b56212b

File tree

5 files changed: +264 -136 lines changed


docs/tutorials/dev.md

Lines changed: 0 additions & 2 deletions
@@ -9,7 +9,6 @@ git clone https://github.com/sql-machine-learning/elasticdl
 cd elasticdl
 ```

-
 ## Development Tools in a Docker Image

 We prefer to install all building tools in a Docker image.
@@ -18,7 +17,6 @@ We prefer to install all building tools in a Docker image.
 docker build --target dev -t elasticdl:dev -f elasticdl/docker/Dockerfile .
 ```

-
 ## Check Code Style

 The above Docker image contains pre-commit and hooks. We can run it as a
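
Note on the "Check Code Style" hunk above: the last context line is cut off by the hunk boundary, but the point of that section is that pre-commit runs from the dev image. A minimal sketch, assuming the `elasticdl:dev` image has `pre-commit` on its PATH and the repository is mounted as the working directory (neither is shown in this diff):

```bash
# Run the style checks from inside the dev image; adjust the mount as needed.
docker run --rm -it -v "$PWD":/work -w /work elasticdl:dev \
    pre-commit run --all-files
```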

docs/tutorials/elasticdl_cloud.md

Lines changed: 70 additions & 55 deletions
@@ -1,70 +1,71 @@
 # ElasticDL on Public Cloud

-ElasticDL is a Kubernetes-native machine learning framework. This document explains how to run an ElasticDL job on a public cloud, namely, Google Kubernetes Engine (GKE).
+ElasticDL is a Kubernetes-native machine learning framework. This document
+explains how to run an ElasticDL job on a public cloud, namely, Google
+Kubernetes Engine (GKE).

 ## Configure GKE Environment

 ### Create a Project and a Kubernetes Cluster

-First, we create a new project for elasticdl in [web console](https://console.cloud.google.com/) and a new Kubernetes cluster under this project.
+First, we create a new project for elasticdl in [web
+console](https://console.cloud.google.com/) and a new Kubernetes cluster under
+this project.

 We will use the project id and cluster name in next steps.

 ### Access the Kubernetes Cluster

-To access GKE, we need to install [Google Cloud SDK](https://cloud.google.com/sdk/install), which includes command-line tools like `gcloud`.
-
+To access GKE, we need to install [Google Cloud
+SDK](https://cloud.google.com/sdk/install), which includes command-line tools
+like `gcloud`.

 Step 1: Set the PROJECT_ID environment variable in shell.

-```
+```bash
 export PROJECT_ID=${your_project_id}
 gcloud config set project ${PROJECT_ID}
 ```

-
 Step 2: List clusters info with gcloud, and double check it with web console.

-```
+```bash
 gcloud container clusters list
 ```

-Following is an our testing cluster
-
-```
-NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
-edl-cluster us-central1-c 1.14.10-gke.36 x.x.x.x n1-standard-8 1.14.10-gke.36 3 RUNNING
-```
-
 Step 3: Use the command below to generate the corresponding kubeconfig.

-```
+```bash
 gcloud container clusters get-credentials edl-cluster --zone us-central1-c
 ```

-Make sure you have [`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/) available locally.
+Make sure you have
+[`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/) available
+locally.

 Use the following command to list all the started components.

-```
+```bash
 kubectl get all --all-namespaces
 ```

-
 ### Config the Kubernetes Cluster

-ElasticDL jobs require pod creation and deletion permissions. Make sure you have granted related permissions to the default or other related service accounts.
+ElasticDL jobs require pod creation and deletion permissions. Make sure you
+have granted related permissions to the default or other related service
+accounts.

 ```bash
 kubectl apply -f elasticdl/manifests/elasticdl-rbac.yaml
 ```

-ElasticDL supports elastic scheduling, and works well the priority-based scheduling of Kubernetes. We create two customized PriorityClass in the cluster, high and low.
-
+ElasticDL supports elastic scheduling, and works well the priority-based
+scheduling of Kubernetes. We create two customized PriorityClass in the
+cluster, high and low.

 high.yaml

-```
+```yaml
 apiVersion: scheduling.k8s.io/v1
 kind: PriorityClass
 metadata:
@@ -75,7 +76,7 @@ globalDefault: false

 low.yaml

-```
+```yaml
 apiVersion: scheduling.k8s.io/v1
 kind: PriorityClass
 metadata:
@@ -84,17 +85,19 @@ value: 1000
 globalDefault: false
 ```

-```
+```bash
 kubectl create -f high.yaml
 kubectl create -f low.yaml
 ```

-
 ### Mount a Volume for the Kubernetes Cluster

-First, we create a [Cloud Filestore](https://cloud.google.com/filestore) instance in web console.
+First, we create a [Cloud Filestore](https://cloud.google.com/filestore)
+instance in web console.

-Then we follow the [doc](https://cloud.google.com/filestore/docs/accessing-fileshares) to access fileshares from the Kubernetes cluster.
+Then we follow the
+[doc](https://cloud.google.com/filestore/docs/accessing-fileshares) to access
+fileshares from the Kubernetes cluster.

 In this example, we create a persistent value claim named `fileserver-claim`.
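
The `fileserver-claim` object referenced above is defined outside the changed hunks, so it does not appear in this diff. As a rough sketch only, an NFS-backed PersistentVolume plus claim in the spirit of the linked Filestore doc might look like the following; the volume name, server IP, and share path are placeholders, not values taken from the tutorial:

```bash
# Hypothetical example: create a Filestore-backed PV and the fileserver-claim
# PVC. The IP address and share path below are placeholders.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fileserver
spec:
  capacity:
    storage: 1T
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.2   # Filestore instance IP (placeholder)
    path: /vol1        # Filestore share name (placeholder)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fileserver-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: fileserver
  resources:
    requests:
      storage: 1T
EOF
```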

@@ -104,21 +107,23 @@ In this example, we create a persistent value claim named `fileserver-claim`.

 Step 1: We generate MNIST training and evaluation data in RecordIO format.

-```
-python elasticdl/python/data/recordio_gen/image_label.py --dataset mnist --records_per_shard 4096 .
+```bash
+python elasticdl/python/data/recordio_gen/image_label.py \
+  --dataset mnist \
+  --records_per_shard 4096 .
 ```

-Step 2: We launch a pod which mounts the volume, and use `kubectl cp` command to copy data from local to the volume.
+Step 2: We launch a pod which mounts the volume, and use `kubectl cp` command
+to copy data from local to the volume.

-```
+```bash
 kubectl create -f my-pod.yaml
-
 kubectl cp mnist my-pod:/data
 ```

 my-pod.yaml

-```
+```yaml
 apiVersion: v1
 kind: Pod
 metadata:
@@ -139,15 +144,19 @@ spec:

 ### Submit Job

-Please refer to [elasticdl_local tutorial](./elasticdl_local.md) to build the `elasticdl:ci` image. The difference is that we have to push the image to google cloud repo. We use the following command to get the authentication:
+Please refer to [elasticdl_local tutorial](./elasticdl_local.md) to build the
+`elasticdl:ci` image. The difference is that we have to push the image to
+google cloud repo. We use the following command to get the authentication:

-```
+```bash
 gcloud auth configure-docker
 ```

-We launch a training job with 2 PS pods and 4 worker pods. The master pod and PS pods are set with priority, while worker pods are set with low priority. The training docker image will be pushed to google cloud repo.
+We launch a training job with 2 PS pods and 4 worker pods. The master pod and
+PS pods are set with priority, while worker pods are set with low priority. The
+training docker image will be pushed to google cloud repo.

-```
+```bash
 python -m elasticdl.python.elasticdl.client train \
   --image_base=elasticdl:ci \
   --docker_image_repository=gcr.io/${PROJECT_ID} \
@@ -180,28 +189,31 @@ python -m elasticdl.python.elasticdl.client train \

 To see the status of each pod:

-```
+```bash
 kubectl get pods
 ```

 To see the loss in worker pod:

-```
+```bash
 kubectl logs elasticdl-test-mnist-worker-0 | grep "Loss"
 ```
+
 To see the evaluation metrics in the master pod:

-```
+```bash
 kubectl logs elasticdl-test-mnist-master | grep "Evaluation"
 ```

 ## Example of Job Fault Tolerance

-ElasticDL supports fault tolerance in distributed training. When a worker pod is killed, the training job does not crash and the master pod will try to relaunch a new worker pod.
+ElasticDL supports fault tolerance in distributed training. When a worker pod
+is killed, the training job does not crash and the master pod will try to
+relaunch a new worker pod.

 At first, all pods are running:

-```
+```text
 elasticdl-test-mnist-master 1/1 Running 0 35s
 elasticdl-test-mnist-ps-0 1/1 Running 0 29s
 elasticdl-test-mnist-ps-1 1/1 Running 0 28s
@@ -213,13 +225,13 @@ elasticdl-test-mnist-worker-3 1/1 Running 0 28s

 Then, we delete a worker pod:

-```
+```bash
 kubectl delete pod elasticdl-test-mnist-worker-0
 ```

 The master pod creates a new worker pod `elasticdl-test-mnist-worker-4` at once.

-```
+```text
 NAME READY STATUS RESTARTS AGE
 elasticdl-test-mnist-master 1/1 Running 0 51s
 elasticdl-test-mnist-ps-0 1/1 Running 0 45s
@@ -232,11 +244,12 @@ elasticdl-test-mnist-worker-4 1/1 Running 0 6s

 ## Example of Elastic Scheduling

-After we launch the MNIST training job, we launch another nginx service with high priority in the same cluster.
+After we launch the MNIST training job, we launch another nginx service with
+high priority in the same cluster.

 nginx.yaml

-```
+```yaml
 apiVersion: apps/v1
 kind: Deployment
 metadata:
@@ -272,13 +285,14 @@ spec:
 restartPolicy: Always
 ```

-```
+```bash
 kubectl create -f nginx.yaml
 ```

-We will find that some worker pods with low priority are preempted by nginx pods with high priority.
+We will find that some worker pods with low priority are preempted by nginx
+pods with high priority.

-```
+```text
 NAME READY STATUS RESTARTS AGE
 elasticdl-test-mnist-master 1/1 Running 0 34s
 elasticdl-test-mnist-ps-0 1/1 Running 0 27s
@@ -296,7 +310,7 @@ test-nginx-7585fc5976-ss8pk 0/1 Pending 0 2s

 After preemption, the training job still goes on with one worker pod.

-```
+```text
 elasticdl-test-mnist-master 1/1 Running 0 61s
 elasticdl-test-mnist-ps-0 1/1 Running 0 54s
 elasticdl-test-mnist-ps-1 1/1 Running 0 54s
@@ -311,16 +325,17 @@ test-nginx-7585fc5976-ckd94 1/1 Running 0 29s
 test-nginx-7585fc5976-ss8pk 1/1 Running 0 29s
 ```

-Then, we scale the nginx deployment down to 1 replica. Some cluster resources are freed.
-
+Then, we scale the nginx deployment down to 1 replica. Some cluster resources
+are freed.

-```
+```bash
 kubectl scale deployment.v1.apps/test-nginx --replicas=1
 ```

-We find that the training job takes over the freed resources, and goes on with 4 worker pods.
+We find that the training job takes over the freed resources, and goes on with
+4 worker pods.

-```
+```text
 NAME READY STATUS RESTARTS AGE
 elasticdl-test-mnist-master 1/1 Running 0 2m3s
 elasticdl-test-mnist-ps-0 1/1 Running 0 116s
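
As a small aside to the elastic scheduling walkthrough above (illustrative, not part of this commit), the two PriorityClass objects and the preemption behavior can be inspected directly while the nginx deployment scales up and down:

```bash
# List the custom priority classes created earlier, then watch pods being
# preempted and relaunched as cluster resources change.
kubectl get priorityclass high low
kubectl get pods -w
```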

docs/tutorials/elasticdl_on_prem_cluster.md

Lines changed: 4 additions & 2 deletions
@@ -2,7 +2,8 @@

 ## Environment preparation

-You should install ElasticDL first. Please refer to the installation part in [elastic_local](elasticdl_local.md) doc.
+You should install ElasticDL first. Please refer to the installation part in
+[elastic_local](elasticdl_local.md) doc.

 Then, build needed images.

@@ -13,7 +14,8 @@ bash scripts/travis/build_images.sh

 ## Submit job to cluster

-The submit command is similar to local mode. The local scripts will be built into a docker image, and pushed to `$DOCKER_HUB_REPO` remote docker hub.
+The submit command is similar to local mode. The local scripts will be built
+into a docker image, and pushed to `$DOCKER_HUB_REPO` remote docker hub.

 Following is an exmaple:
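
The example announced by the "Following is an exmaple:" line lies outside the changed hunks and is not shown here. Purely as a hedged sketch based on the GKE submit command in elasticdl_cloud.md, an on-prem submission would presumably differ mainly in the image repository flag; the remaining model and training flags are omitted because they are not part of this diff:

```bash
# Illustrative sketch only; the tutorial's real example is in the unchanged
# part of the file. Other model/training flags are omitted here.
python -m elasticdl.python.elasticdl.client train \
  --image_base=elasticdl:ci \
  --docker_image_repository=${DOCKER_HUB_REPO}
```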
