tools/aws_benchmarking/README.md
Lines changed: 3 additions & 3 deletions
@@ -54,6 +54,7 @@ Training nodes will run your `ENTRYPOINT` script with the following environment
 - `TRAINERS`: trainer count
 - `SERVER_ENDPOINT`: current server end point if the node role is a pserver
 - `TRAINER_INDEX`: an integer to identify the index of current trainer if the node role is a trainer.
+- `PADDLE_INIT_TRAINER_ID`: same as above
 
 Now we have a working distributed training script which takes advantage of node environment variables and docker file to generate the training image. Run the following command:
 
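Reviewer note: the variables in this hunk could be consumed by an `ENTRYPOINT` script roughly as follows. This is a minimal sketch, not the project's own code. The helper name `read_node_env`, the default of one trainer, and the choice to prefer `TRAINER_INDEX` over `PADDLE_INIT_TRAINER_ID` are assumptions made for illustration; the README only says the two carry the same value.

```python
import os


def read_node_env(env=None):
    """Collect the node variables named in this hunk into a dict.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    # The README describes PADDLE_INIT_TRAINER_ID as "same as above",
    # so it is used here as a fallback for TRAINER_INDEX (assumption).
    raw_index = env.get("TRAINER_INDEX") or env.get("PADDLE_INIT_TRAINER_ID")
    return {
        # Defaulting to a single trainer is an assumption of this sketch.
        "trainers": int(env.get("TRAINERS") or "1"),
        # Only meaningful when the node role is a pserver.
        "server_endpoint": env.get("SERVER_ENDPOINT"),
        # Left as None on nodes that do not set either index variable.
        "trainer_index": int(raw_index) if raw_index else None,
    }


if __name__ == "__main__":
    print(read_node_env())
```

Accepting either index variable means a script written against the older name keeps working after this change.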
@@ -81,8 +82,7 @@ putcn/paddle_aws_client \
 --action create \
 --key_name <your key pare name> \
 --security_group_id <your security group id> \
---pserver_image_id <your pserver image id> \
---trainer_image_id <your trainer images id> \
+--docker_image myreponame/paddle_benchmark \
 --pserver_count 2 \
 --trainer_count 2
 ```
@@ -146,7 +146,7 @@ When the training is finished, pservers and trainers will be terminated. All the
 Master exposes 4 major services:
 
 - GET `/status`: return master log
-- GET `/list_logs`: return list of log file names
+- GET `/logs`: return list of log file names
 - GET `/log/<logfile name>`: return a particular log by log file name
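Reviewer note: a client for the renamed endpoint might look like the sketch below. It covers only the three services visible in this hunk (`/status`, `/logs`, `/log/<logfile name>`). The helper names `master_url` and `fetch` are hypothetical, and the master's address and port are not stated here, so the base URL is left to the caller.

```python
from urllib.parse import quote
from urllib.request import urlopen


def master_url(base_url, service, logfile=None):
    """Build the URL for one of the master services listed in this hunk."""
    base = base_url.rstrip("/")
    if service == "log":
        if not logfile:
            raise ValueError("service 'log' needs a log file name")
        # Percent-encode the file name so it stays a single path segment.
        return f"{base}/log/{quote(logfile, safe='')}"
    if service in ("status", "logs"):
        return f"{base}/{service}"
    raise ValueError(f"unknown service: {service!r}")


def fetch(base_url, service, logfile=None, timeout=10):
    """GET a master service and return the response body as text."""
    with urlopen(master_url(base_url, service, logfile), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Callers that still request `/list_logs` would need to switch to `fetch(base_url, "logs")` once this change lands.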