Commit db045ac

author
王益
committed
Improve the design doc of Docker build
1 parent 56fcf9c commit db045ac

1 file changed

+130
-22
lines changed

paddle/scripts/docker/README.md

Lines changed: 130 additions & 22 deletions
@@ -1,38 +1,146 @@
1-
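Because we do not support building on operating systems other than Ubuntu, users on other systems, such as CoreOS, CentOS, Mac OS X, and Windows, have to develop inside Docker. So we need to be able to build locally modified code.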
1+
We need to complete the initial draft https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/README.md.
22

3-
We probably need two Docker images:
3+
I am recording some ideas here, and we should file a PR later.
44

5-
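1. development image: does not include the source code, but includes the development environment (with all tools pre-installed), so Dockerfile.dev needs neither COPY nor RUN git clone. Although this image is independent of the source code, different versions of the source depend on different third-party libraries, so the image tag should still contain the git branch/tag name, for example `paddlepaddle/paddle:dev-0.10.0rc1`, where 0.10.0rc1 is a branch name and rc stands for release candidate. After the official release, it becomes a tag in the master branch named 0.10.0.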
5+
## Current Status
66

7-
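1. production image: includes neither the build environment nor the source code, only the built libpaddle.so and the necessary Python packages. This is the image used to run applications on a Kubernetes cluster. For example, `paddlepaddle/paddle:0.10.0rc1`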
7+
Currently, we have four sets of Dockerfiles:
88

9-
The process of generating 2. from 1. is as follows:
9+
1. Kubernetes examples:
1010

11-
1. Develop on the local machine (host). Suppose the source code is in `~/work/paddle`
11+
```
12+
doc/howto/usage/k8s/src/Dockerfile -- based on the released image, but adds start.sh
13+
doc/howto/usage/k8s/src/k8s_data/Dockerfile -- contains only get_data.sh
14+
doc/howto/usage/k8s/src/k8s_train/Dockerfile -- duplicates the first one.
15+
```
16+
17+
1. Generate .deb packages:
18+
19+
```
20+
paddle/scripts/deb/build_scripts/Dockerfile -- significantly overlaps with the `docker` directory
21+
```
22+
23+
1. In the `docker` directory:
24+
25+
```
26+
paddle/scripts/docker/Dockerfile
27+
paddle/scripts/docker/Dockerfile.gpu
28+
```
29+
30+
1. Document building
31+
32+
```
33+
paddle/scripts/tools/build_docs/Dockerfile -- a subset of the above two sets.
34+
```
35+
36+
## Goal
37+
38+
We want two Docker images for each version of PaddlePaddle:
39+
40+
1. `paddle:<version>-dev`
41+
42+
This is a development image that contains only the development tools. It standardizes the build tools and procedure. Users include:
43+
44+
- developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer).
45+
- release engineers -- use this to build the official release from a certain branch/tag on GitHub.com.
46+
- document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code. We need tools to extract the information, typeset, and generate Web pages.
47+
48+
So the development image must contain not only source code building tools, but also documentation tools:
49+
50+
- gcc/clang
51+
- nvcc
52+
- Python
53+
- sphinx
54+
- woboq
55+
- sshd
56+
57+
where `sshd` makes it easy for developers to have multiple terminals connecting into the container.
58+
59+
1. `paddle:<version>`
60+
61+
This is the production image, generated using the development image. This image might have multiple variants:
62+
63+
- GPU/AVX `paddle:<version>-gpu`
64+
- GPU/no-AVX `paddle:<version>-gpu-noavx`
65+
- no-GPU/AVX `paddle:<version>`
66+
- no-GPU/no-AVX `paddle:<version>-noavx`
67+
68+
We'd like to give users choices of GPU and no-GPU, because the GPU version image is much larger than the no-GPU version.
69+
70+
We'd like to give users choices of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs.
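
As an illustration of the naming scheme above, the following sketch composes a production image tag from the GPU and AVX switches. The helper name `paddle_image_tag` and its argument order are assumptions made for this example; they are not part of the repository.

```bash
# Hypothetical helper: map (version, GPU, AVX) to one of the four tags
# listed above. GPU defaults to OFF and AVX defaults to ON (assumed defaults).
# The body runs in a subshell so the variables do not leak into the caller.
paddle_image_tag() (
  version="$1"
  gpu="${2:-OFF}"
  avx="${3:-ON}"
  tag="paddle:${version}"
  if [ "$gpu" = "ON" ]; then tag="${tag}-gpu"; fi
  if [ "$avx" = "OFF" ]; then tag="${tag}-noavx"; fi
  printf '%s\n' "$tag"
)

# Example: the GPU/no-AVX variant of a given version.
paddle_image_tag 0.10.0 ON OFF
```

With this scheme the plain `paddle:<version>` tag stays reserved for the most common case (no GPU, AVX enabled), and each suffix marks a deviation from it.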
71+
72+
## Dockerfile
73+
74+
To realize the above goals, we need only one Dockerfile for the development image. We can put it in the root source directory.
75+
76+
Let us go over our daily development procedure to show how developers can use this file.
77+
78+
1. Check out the source code
1279

13-
1. Build our source code with the dev image:
1480
```bash
15-
docker run -it -p 2022:22 -v $PWD:/paddle paddlepaddle/paddle:dev-0.10.0rc1 /paddle/build.sh
16-
```
17-
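Note that the `-v` option maps the source directory on the host to the `/paddle` directory in the container, so `/paddle/build.sh` in the container is the `build.sh` in the source directory. The above command calls build.sh from the local source tree to build the local source code. The results are in the container's `/paddle/build` directory, that is, the `build` subdirectory of the local source directory.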
81+
git clone https://github.com/PaddlePaddle/Paddle paddle
82+
```
83+
84+
1. Do something
1885

19-
1. We want the above `build.sh` script to generate a Dockerfile in the `build` subdirectory, so that we can run:
2086
```bash
21-
docker build -t paddle ./build
87+
cd paddle
88+
git checkout -b my_work
89+
# Edit some files
2290
```
23-
to generate our production image.
24-
25-
1. Once we have this production image, we may want to docker push it under our own name on dockerhub.com, and then use it to start local or remote (Kubernetes) jobs:
91+
92+
1. Build/update the development image (if not done yet)
2693

2794
```bash
28-
docker tag paddle yiwang/paddle:did-some-change
29-
docker push
30-
paddlectl run yiwang/paddle:did-some-change /paddle/demo/mnist/train.py
95+
docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory.
96+
```
97+
98+
1. Build the source code
99+
100+
```bash
101+
docker run -v $PWD:/paddle -e "GPU=OFF" -e "AVX=ON" -e "TEST=ON" paddle:dev
102+
```
103+
104+
This command maps the source directory on the host into `/paddle` in the container.
105+
106+
Please be aware that the default entrypoint of `paddle:dev` is a shell script file `build.sh`, which builds the source code, and outputs to `/paddle/build` in the container, which is actually `$PWD/build` on the host.
107+
108+
`build.sh` doesn't only build binaries, but also generates a `$PWD/build/Dockerfile` file, which can be used to build the production image. We will talk about it later.
109+
110+
1. Run on the host (Not recommended)
111+
112+
If the host computer happens to have all dependent libraries and Python runtimes installed, we can now run/test the built program. But the recommended way is to run it in a production image.
113+
114+
1. Run in the development container
115+
116+
`build.sh` generates binary files and invokes `make install`. So we can run the built program within the development container. This is convenient for developers.
117+
118+
1. Build a production image
119+
120+
On the host, we can use the `$PWD/build/Dockerfile` to generate a production image.
121+
122+
```bash
123+
docker build -t paddle --build-arg "BOOK=ON" -f build/Dockerfile .
31124
```
32125

33-
Here paddlectl would be a script written by ourselves that calls kubectl to start a job on a Kubernetes cluster.
126+
1. Run the Paddle Book
127+
128+
Once we have the production image, we can run [Paddle Book](http://book.paddlepaddle.org/) chapters in Jupyter Notebooks (if we chose to build them):
34129

130+
```bash
131+
docker run -it paddle
132+
```
133+
134+
Note that the default entrypoint of the production image starts a Jupyter server if we chose to build Paddle Book.
135+
136+
1. Run on Kubernetes
137+
138+
We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kubernetes cluster:
139+
140+
```bash
141+
docker tag paddle me/paddle
142+
docker push me/paddle
143+
kubectl ...
144+
```
35145

36-
Background of earlier discussions:
37-
["PR 1599"](https://github.com/PaddlePaddle/Paddle/pull/1599)
38-
["PR 1598"](https://github.com/PaddlePaddle/Paddle/pull/1598)
146+
For end users, we will provide more convenient tools to run distributed jobs.
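
The behavior of the `build.sh` entrypoint described in the "Build the source code" step (reading `GPU`, `AVX`, and `TEST` from the environment and writing `build/Dockerfile`) could be sketched roughly as follows. This is only an outline under stated assumptions: the default values, the `check_flag` and `write_dockerfile` helper names, the `BUILD_DIR` variable, and the placeholder Dockerfile contents (including the `ubuntu:16.04` base image) are all hypothetical and are not taken from the actual script.

```bash
# Sketch only: the real build.sh may differ. Defaults below are assumptions.
GPU="${GPU:-OFF}"
AVX="${AVX:-ON}"
TEST="${TEST:-OFF}"

# Accept only ON or OFF so that a typo such as GPU=yes fails early.
check_flag() {
  case "$2" in
    ON|OFF) return 0 ;;
    *) echo "$1 must be ON or OFF, got '$2'" >&2; return 1 ;;
  esac
}

# Write a placeholder production Dockerfile into the given build directory.
# The contents are illustrative; BOOK mirrors the --build-arg used above.
write_dockerfile() {
  mkdir -p "$1"
  cat > "$1/Dockerfile" <<EOF
# Generated by build.sh (sketch) with GPU=${GPU} AVX=${AVX}
FROM ubuntu:16.04
ARG BOOK=OFF
# ... copy the built artifacts and install the Python packages here ...
EOF
}

# Validate the flags; the actual compile step (configure, make,
# make install) would run between validation and Dockerfile generation.
check_flag GPU "$GPU" &&
  check_flag AVX "$AVX" &&
  check_flag TEST "$TEST" &&
  write_dockerfile "${BUILD_DIR:-./build}"
```

Because the script reads its options from environment variables, the same image can serve every variant: the caller only changes the `-e` arguments passed to `docker run`.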
