|
1 |
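| -Because we do not provide build support for non-Ubuntu systems, users on other operating systems such as CoreOS, CentOS, Mac OS X, or Windows have to develop inside Docker. So we need to be able to build locally modified code. |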
| 1 | +We need to complete the initial draft https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/README.md. |
2 | 2 |
|
3 |
| -We may need two Docker images: |
| 3 | +I am recording some ideas here, and we should file a PR later. |
4 | 4 |
|
5 |
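| -1. development image: does not include the source code, but includes the development environment (with all tools preinstalled), which means Dockerfile.dev needs neither COPY nor RUN git clone. Although this image is independent of the source code, different versions of the source code depend on different third-party libraries, so the image tag should still contain the git branch/tag name, for example `paddlepaddle/paddle:dev-0.10.0rc1`, where 0.10.0rc1 is a branch name and rc stands for release candidate. After the official release, it becomes a tag in the master branch named 0.10.0. |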
| 5 | +## Current Status |
6 | 6 |
|
7 |
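| -1. production image: includes neither the build environment nor the source code, only the built libpaddle.so and the necessary Python packages; this is the image used to run applications on a Kubernetes cluster. For example, `paddlepaddle/paddle:0.10.0rc1`. |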
| 7 | +Currently, we have four sets of Dockerfiles: |
8 | 8 |
|
9 |
| -The process of generating 2. from 1. is as follows: |
| 9 | +1. Kubernetes examples: |
10 | 10 |
|
11 |
| -1. Develop on the local machine (host). Suppose the source code is in `~/work/paddle`. |
| 11 | + ``` |
| 12 | + doc/howto/usage/k8s/src/Dockerfile -- based on released image but add start.sh |
| 13 | + doc/howto/usage/k8s/src/k8s_data/Dockerfile -- contains only get_data.sh |
| 14 | + doc/howto/usage/k8s/src/k8s_train/Dockerfile -- this duplicates the first one. |
| 15 | + ``` |
| 16 | + |
| 17 | +1. Generate .deb packages: |
| 18 | + |
| 19 | + ``` |
| 20 | + paddle/scripts/deb/build_scripts/Dockerfile -- significantly overlaps with the `docker` directory |
| 21 | + ``` |
| 22 | + |
| 23 | +1. In the `docker` directory: |
| 24 | + |
| 25 | + ``` |
| 26 | + paddle/scripts/docker/Dockerfile |
| 27 | + paddle/scripts/docker/Dockerfile.gpu |
| 28 | + ``` |
| 29 | + |
| 30 | +1. Documentation building: |
| 31 | + |
| 32 | + ``` |
| 33 | + paddle/scripts/tools/build_docs/Dockerfile -- a subset of the above two sets. |
| 34 | + ``` |
| 35 | + |
| 36 | +## Goal |
| 37 | + |
| 38 | +We want two Docker images for each version of PaddlePaddle: |
| 39 | + |
| 40 | +1. `paddle:<version>-dev` |
| 41 | + |
| 42 | + This is a development image that contains only the development tools. It standardizes the build tools and procedure. Users include: |
| 43 | + |
| 44 | + - developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer). |
| 45 | + - release engineers -- use this to build the official release from a certain branch/tag on GitHub. |
| 46 | + - document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code. We need tools to extract the information, typeset, and generate Web pages. |
| 47 | + |
| 48 | + So the development image must contain not only source code building tools, but also documentation tools: |
| 49 | + |
| 50 | + - gcc/clang |
| 51 | + - nvcc |
| 52 | + - Python |
| 53 | + - sphinx |
| 54 | + - woboq |
| 55 | + - sshd |
| 56 | + |
| 57 | + where `sshd` makes it easy for developers to have multiple terminals connecting into the container. |
| 58 | + |
| 59 | +1. `paddle:<version>` |
| 60 | + |
| 61 | + This is the production image, generated using the development image. This image might have multiple variants: |
| 62 | + |
| 63 | + - GPU/AVX `paddle:<version>-gpu` |
| 64 | + - GPU/no-AVX `paddle:<version>-gpu-noavx` |
| 65 | + - no-GPU/AVX `paddle:<version>` |
| 66 | + - no-GPU/no-AVX `paddle:<version>-noavx` |
| 67 | + |
| 68 | + We'd like to give users the choice of GPU and no-GPU, because the GPU image is much larger than the no-GPU image. |
| 69 | + |
| 70 | + We'd like to give users the choice of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs. |
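
The variant naming listed above can be sketched as a small shell helper. This is only an illustration: the function name `paddle_image_tag` is hypothetical and not part of the repository, and it assumes the `ON`/`OFF` values that this document uses for the `GPU` and `AVX` switches.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PaddlePaddle repository): map a
# version plus GPU/AVX switches to the production image name proposed above.
paddle_image_tag() {
    version="$1"   # e.g. 0.10.0
    gpu="$2"       # ON or OFF
    avx="$3"       # ON or OFF
    tag="paddle:${version}"
    # GPU builds get the -gpu suffix.
    if [ "$gpu" = "ON" ]; then
        tag="${tag}-gpu"
    fi
    # Builds without AVX get the -noavx suffix.
    if [ "$avx" = "OFF" ]; then
        tag="${tag}-noavx"
    fi
    echo "$tag"
}
```

For example, `paddle_image_tag 0.10.0 ON OFF` prints `paddle:0.10.0-gpu-noavx`, and `paddle_image_tag 0.10.0 OFF ON` prints `paddle:0.10.0`.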
| 71 | + |
| 72 | +## Dockerfile |
| 73 | + |
| 74 | +To realize the above goals, we need only one Dockerfile for the development image. We can put it in the root source directory. |
| 75 | + |
| 76 | +Let us go over our daily development procedure to show how developers can use this file. |
| 77 | + |
| 78 | +1. Check out the source code |
12 | 79 |
|
13 |
| -1. Build our source code with the dev image: |
14 | 80 | ```bash
|
15 |
| - docker run -it -p 2022:22 -v $PWD:/paddle paddlepaddle/paddle:dev-0.10.0rc1 /paddle/build.sh |
16 |
| - ``` |
17 |
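| - Note that the `-v` option maps the source directory on the host to the `/paddle` directory in the container, and `/paddle/build.sh` in the container is the `build.sh` in the source directory. The above command invokes the build.sh in the local source tree to build the local source code; the results go to the `/paddle/build` directory in the container, which is the `build` subdirectory of the local source directory. |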
| 81 | + git clone https://github.com/PaddlePaddle/Paddle paddle |
| 82 | + ``` |
| 83 | + |
| 84 | +1. Do something |
18 | 85 |
|
19 |
| -1. We would like the above `build.sh` script to generate a Dockerfile in the `build` subdirectory, so that we can run: |
20 | 86 | ```bash
|
21 |
| - docker build -t paddle ./build |
| 87 | + cd paddle |
| 88 | + git checkout -b my_work |
| 89 | + # edit some files |
22 | 90 | ```
|
23 |
| - to generate our production image. |
24 |
| - |
25 |
| -1. Once we have this production image, we may want to docker push it to our own account on dockerhub.com, and then use it to start local or remote (Kubernetes) jobs: |
| 91 | + |
| 92 | +1. Build/update the development image (if not done yet) |
26 | 93 |
|
27 | 94 | ```bash
|
28 |
| - docker tag paddle yiwang/paddle:did-some-change |
29 |
| - docker push |
30 |
| - paddlectl run yiwang/paddle:did-some-change /paddle/demo/mnist/train.py |
| 95 | + docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory. |
| 96 | + ``` |
| 97 | + |
| 98 | +1. Build the source code |
| 99 | + |
| 100 | + ```bash |
| 101 | + docker run -v $PWD:/paddle -e "GPU=OFF" -e "AVX=ON" -e "TEST=ON" paddle:dev |
| 102 | + ``` |
| 103 | + |
| 104 | + This command maps the source directory on the host into `/paddle` in the container. |
| 105 | + |
| 106 | + Please be aware that the default entrypoint of `paddle:dev` is a shell script file `build.sh`, which builds the source code, and outputs to `/paddle/build` in the container, which is actually `$PWD/build` on the host. |
| 107 | + |
| 108 | + `build.sh` not only builds binaries, but also generates a `$PWD/build/Dockerfile` file, which can be used to build the production image. We will talk about it later. |
| 109 | + |
| 110 | +1. Run on the host (Not recommended) |
| 111 | + |
| 112 | + If the host computer happens to have all dependent libraries and Python runtimes installed, we can now run/test the built program. But the recommended way is to run it in a production image. |
| 113 | + |
| 114 | +1. Run in the development container |
| 115 | + |
| 116 | + `build.sh` generates binary files and invokes `make install`. So we can run the built program within the development container. This is convenient for developers. |
| 117 | + |
| 118 | +1. Build a production image |
| 119 | + |
| 120 | + On the host, we can use the `$PWD/build/Dockerfile` to generate a production image. |
| 121 | + |
| 122 | + ```bash |
| 123 | + docker build -t paddle --build-arg "BOOK=ON" -f build/Dockerfile . |
31 | 124 | ```
|
32 | 125 |
|
33 |
| - Here paddlectl would be a script that we write ourselves, which calls kubectl to start a job on the Kubernetes cluster. |
| 126 | +1. Run the Paddle Book |
| 127 | + |
| 128 | + Once we have the production image, we can run [Paddle Book](http://book.paddlepaddle.org/) chapters in Jupyter Notebooks (if we chose to build them): |
34 | 129 |
|
| 130 | + ```bash |
| 131 | + docker run -it paddle |
| 132 | + ``` |
| 133 | + |
| 134 | + Note that the default entrypoint of the production image starts a Jupyter server if we chose to build Paddle Book. |
| 135 | + |
| 136 | +1. Run on Kubernetes |
| 137 | + |
| 138 | + We can push the production image to Docker Hub, so developers can run distributed training jobs on the Kubernetes cluster: |
| 139 | + |
| 140 | + ```bash |
| 141 | + docker tag paddle me/paddle |
| 142 | + docker push me/paddle |
| 143 | + kubectl ... |
| 144 | + ``` |
35 | 145 |
|
36 |
| -Background of previous discussions: |
37 |
| -["PR 1599"](https://github.com/PaddlePaddle/Paddle/pull/1599) |
38 |
| -["PR 1598"](https://github.com/PaddlePaddle/Paddle/pull/1598) |
| 146 | + For end users, we will provide more convenient tools to run distributed jobs. |
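
Going back to the "Build the source code" step: the `GPU`, `AVX`, and `TEST` environment variables are passed to the container, and the entrypoint has to turn them into build options. The following is a minimal sketch of that translation, not the actual `build.sh`. The function name `cmake_flags` is hypothetical, and both the CMake option names (`WITH_GPU`, `WITH_AVX`, `WITH_TESTING`) and the default values are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: how an entrypoint such as build.sh might turn the GPU, AVX
# and TEST environment variables into CMake options. The option names and
# the defaults below are assumptions, not taken from the real script.
cmake_flags() {
    # ${VAR:-default} falls back to the default when VAR is unset or empty.
    echo "-DWITH_GPU=${GPU:-OFF} -DWITH_AVX=${AVX:-ON} -DWITH_TESTING=${TEST:-ON}"
}
```

With `GPU=ON`, `AVX=OFF`, and `TEST=OFF` set in the environment, `cmake_flags` prints `-DWITH_GPU=ON -DWITH_AVX=OFF -DWITH_TESTING=OFF`. A real entrypoint could then pass the result on to `cmake` before running `make` and `make install`.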