Commit ed36591

Add fluid benchmark Dockerfile (#11095)
* add fluid benchmark Dockerfile
* add_fluid_benchmark_dockerfile
1 parent d6997e5 commit ed36591

3 files changed (+51, −13 lines)


benchmark/fluid/Dockerfile

Lines changed: 22 additions & 0 deletions
```diff
@@ -0,0 +1,22 @@
+FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
+RUN apt-get update && apt-get install -y python python-pip iputils-ping libgtk2.0-dev wget vim net-tools iftop
+RUN ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/lib/libcudnn.so && ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/lib/libnccl.so
+RUN pip install -U pip
+RUN pip install -U kubernetes opencv-python paddlepaddle
+
+# IMPORTANT:
+# Add "ENV http_proxy=http://ip:port" if your download is slow, and don't forget to unset it at runtime.
+
+RUN sh -c 'echo "import paddle.v2 as paddle\npaddle.dataset.cifar.train10()\npaddle.dataset.flowers.fetch()" | python'
+RUN sh -c 'echo "import paddle.v2 as paddle\npaddle.dataset.mnist.train()\npaddle.dataset.mnist.test()\npaddle.dataset.imdb.fetch()" | python'
+RUN sh -c 'echo "import paddle.v2 as paddle\npaddle.dataset.imikolov.fetch()" | python'
+RUN pip uninstall -y paddlepaddle && mkdir /workspace
+
+ADD https://raw.githubusercontent.com/PaddlePaddle/cloud/develop/docker/paddle_k8s /usr/bin
+ADD https://raw.githubusercontent.com/PaddlePaddle/cloud/develop/docker/k8s_tools.py /root
+
+ADD *.whl /
+RUN pip install /*.whl && rm -f /*.whl && chmod +x /usr/bin/paddle_k8s
+
+ENV LD_LIBRARY_PATH=/usr/local/lib
+ADD fluid_benchmark.py dataset.py models/ /workspace/
```
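The `RUN sh -c 'echo ... | python'` lines above pre-download the benchmark datasets at image-build time, so every container starts with a warm dataset cache instead of fetching over the network at run time. A minimal, self-contained sketch of that cache-warming idea (the `prefetch` helper, the cache location, and the fake `fetch` callback are ours for illustration, not PaddlePaddle's API):

```python
import os
import tempfile

# Hypothetical cache location; paddle keeps its real cache elsewhere.
CACHE_DIR = tempfile.mkdtemp(prefix="dataset_cache_")

def prefetch(name, fetch=lambda n: b"payload for " + n.encode()):
    """Fetch `name` once and cache it on disk; later calls are cache hits."""
    path = os.path.join(CACHE_DIR, name)
    if not os.path.exists(path):        # cold cache: do the (simulated) download
        with open(path, "wb") as f:
            f.write(fetch(name))
        return path, "fetched"
    return path, "cached"               # warm cache: no network needed

print(prefetch("mnist"))   # first call writes the file
print(prefetch("mnist"))   # second call reuses it
```

Baking the fetch into a `RUN` instruction turns this into a cached image layer, which is why the Dockerfile can immediately `pip uninstall` paddlepaddle afterwards: only the downloaded datasets, not the package, need to survive into the final image.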

benchmark/fluid/README.md

Lines changed: 15 additions & 1 deletion
````diff
@@ -44,11 +44,25 @@ Currently supported `--model` argument include:
 
 ## Run Distributed Benchmark on Kubernetes Cluster
 
+You may need to build a Docker image before submitting a cluster job to Kubernetes; otherwise you will
+have to start all those processes manually on each node, which is not recommended.
+
+To build the Docker image, you need to choose a PaddlePaddle "whl" package to run with. You may either
+download it from
+http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_en.html or
+build it yourself. Once you have the "whl" package, put it in the current directory and run:
+
+```bash
+docker build -t [your docker image name]:[your docker image tag] .
+```
+
+Then push the image to a Docker registry that your Kubernetes cluster can reach.
+
 We provide a script `kube_gen_job.py` to generate Kubernetes yaml files to submit
 distributed benchmark jobs to your cluster. To generate a job yaml, just run:
 
 ```bash
-python kube_gen_job.py --jobname myjob --pscpu 4 --cpu 8 --gpu 8 --psmemory 20 --memory 40 --pservers 4 --trainers 4 --entry "python fluid_benchmark.py --model mnist --parallel 1 --device GPU --update_method pserver " --disttype pserver
+python kube_gen_job.py --jobname myjob --pscpu 4 --cpu 8 --gpu 8 --psmemory 20 --memory 40 --pservers 4 --trainers 4 --entry "python fluid_benchmark.py --model mnist --gpus 8 --device GPU --update_method pserver " --disttype pserver
 ```
 
 Then the yaml files are generated under directory `myjob`, you can run:
````
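As an aside on the `kube_gen_job.py` one-liner shown above: a small wrapper script can keep per-job tweaks out of the long command line. This is our sketch, not part of the commit; it only echoes the assembled command (a dry run), so you would swap the final `echo` for `eval` to actually invoke the generator:

```shell
#!/bin/sh
# Hypothetical wrapper around kube_gen_job.py: assemble the command from
# variables so changing one knob doesn't mean editing the whole one-liner.
JOBNAME=myjob
PSERVERS=4
TRAINERS=4
ENTRY="python fluid_benchmark.py --model mnist --gpus 8 --device GPU --update_method pserver"

CMD="python kube_gen_job.py --jobname $JOBNAME \
--pscpu 4 --cpu 8 --gpu 8 --psmemory 20 --memory 40 \
--pservers $PSERVERS --trainers $TRAINERS \
--entry \"$ENTRY\" --disttype pserver"

echo "$CMD"   # dry run; use `eval "$CMD"` to really generate the yaml files
```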

benchmark/fluid/run.sh

Lines changed: 14 additions & 12 deletions
```diff
@@ -37,7 +37,8 @@ nohup stdbuf -oL nvidia-smi \
 -l 1 &
 # mnist
 # mnist gpu mnist 128
-FLAGS_benchmark=true stdbuf -oL python fluid/mnist.py \
+FLAGS_benchmark=true stdbuf -oL python fluid_benchmark.py \
+  --model=mnist \
   --device=GPU \
   --batch_size=128 \
   --skip_batch_num=5 \
@@ -46,15 +47,17 @@ FLAGS_benchmark=true stdbuf -oL python fluid/mnist.py \
 
 # vgg16
 # gpu cifar10 128
-FLAGS_benchmark=true stdbuf -oL python fluid/vgg16.py \
+FLAGS_benchmark=true stdbuf -oL python fluid_benchmark.py \
+  --model=vgg16 \
   --device=GPU \
   --batch_size=128 \
   --skip_batch_num=5 \
   --iterations=30 \
   2>&1 | tee -a vgg16_gpu_128.log
 
 # flowers gpu 128
-FLAGS_benchmark=true stdbuf -oL python fluid/vgg16.py \
+FLAGS_benchmark=true stdbuf -oL python fluid_benchmark.py \
+  --model=vgg16 \
   --device=GPU \
   --batch_size=32 \
   --data_set=flowers \
@@ -64,40 +67,39 @@ FLAGS_benchmark=true stdbuf -oL python fluid/vgg16.py \
 
 # resnet50
 # resnet50 gpu cifar10 128
-FLAGS_benchmark=true stdbuf -oL python fluid/resnet50.py \
+FLAGS_benchmark=true stdbuf -oL python fluid_benchmark.py \
+  --model=resnet50 \
   --device=GPU \
   --batch_size=128 \
   --data_set=cifar10 \
-  --model=resnet_cifar10 \
   --skip_batch_num=5 \
   --iterations=30 \
   2>&1 | tee -a resnet50_gpu_128.log
 
 # resnet50 gpu flowers 64
-FLAGS_benchmark=true stdbuf -oL python fluid/resnet50.py \
+FLAGS_benchmark=true stdbuf -oL python fluid_benchmark.py \
+  --model=resnet50 \
   --device=GPU \
   --batch_size=64 \
   --data_set=flowers \
-  --model=resnet_imagenet \
   --skip_batch_num=5 \
   --iterations=30 \
   2>&1 | tee -a resnet50_gpu_flowers_64.log
 
 # lstm
 # lstm gpu imdb 32 # tensorflow only support batch=32
-FLAGS_benchmark=true stdbuf -oL python fluid/stacked_dynamic_lstm.py \
+FLAGS_benchmark=true stdbuf -oL python fluid_benchmark.py \
+  --model=stacked_dynamic_lstm \
   --device=GPU \
   --batch_size=32 \
   --skip_batch_num=5 \
   --iterations=30 \
-  --hidden_dim=512 \
-  --emb_dim=512 \
-  --crop_size=1500 \
   2>&1 | tee -a lstm_gpu_32.log
 
 # seq2seq
 # seq2seq gpu wmb 128
-FLAGS_benchmark=true stdbuf -oL python fluid/machine_translation.py \
+FLAGS_benchmark=true stdbuf -oL python fluid_benchmark.py \
+  --model=machine_translation \
   --device=GPU \
   --batch_size=128 \
   --skip_batch_num=5 \
```
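The run.sh hunks above repeat one invocation shape per model, with only the model name, batch size, and an occasional extra flag changing. A table-driven rewrite (our sketch, not part of the commit; log-file names are simplified) makes that shape explicit. It collects the commands as text, a dry run, instead of executing them:

```shell
#!/bin/sh
# Hypothetical table-driven version of run.sh's repeated blocks.
# Each entry: model:batch_size[:extra flag]
TABLE="mnist:128
vgg16:128
vgg16:32:--data_set=flowers
resnet50:128:--data_set=cifar10
resnet50:64:--data_set=flowers
stacked_dynamic_lstm:32
machine_translation:128"

CMDS=""
for spec in $TABLE; do
  model=${spec%%:*}             # strip everything after the first colon
  rest=${spec#*:}
  bs=${rest%%:*}
  extra=""
  case "$rest" in *:*) extra=${rest#*:} ;; esac   # optional third field
  CMDS="$CMDS
FLAGS_benchmark=true stdbuf -oL python fluid_benchmark.py --model=$model --device=GPU --batch_size=$bs $extra --skip_batch_num=5 --iterations=30 2>&1 | tee -a ${model}_gpu_${bs}.log"
done
echo "$CMDS"   # dry run; pipe each line through `sh -c` to actually execute
```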
