Improve the design doc of Docker build

王益 · 王益 · commit db045acb2197 · 2017-03-15T17:35:49.000-07:00
diff --git a/paddle/scripts/docker/README.md b/paddle/scripts/docker/README.md
@@ -1,38 +1,146 @@
-因为我们不提供非Ubuntu的bulid支持，所以如果用户用其他操作系统，比如CoreOS、CentOS、MacOS X、Windows，开发都得在docker里。所以需要能build本地修改后的代码。
+We need to complete the initial draft https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/README.md.
 
-我们可能需要两个 Docker images：
+I am recording some ideas here, and we should file a PR later.
 
-1. development image：不包括源码，但是包括开发环境（预先安装好各种工具），也就是说Dockerfile.dev里既不需要  COPY 也不需要 RUN git clone。虽然这个image和源码无关，但是不同版本的源码需要依赖不同的第三方库，所以这个image的tag里还是要包含git branch/tag name，比如叫做 `paddlepaddle/paddle:dev-0.10.0rc1`，这里的0.10.0.rc1是一个branch name，其中rc是release candidate的意思。正是发布之后就成了master branch里的一个tag，叫做0.10.0。
+## Current Status
 
-1. production image： 不包括编译环境，也不包括源码，只包括build好的libpaddle.so和必要的Python packages，用于在Kubernetes机群上跑应用的image。比如叫做 `paddlepaddle/paddle:0.10.0rc1`。
+Currently, we have four sets of Dockefiles:
 
-从1.生成2.的过程如下：
+1. Kubernetes examples:
 
-1. 在本机（host）上开发。假设源码位于 `~/work/paddle`。
+   ```
+   doc/howto/usage/k8s/src/Dockerfile -- based on released image but add start.sh
+   doc/howto/usage/k8s/src/k8s_data/Dockerfile -- contains only get_data.sh
+   doc/howto/usage/k8s/src/k8s_train/Dockerfile -- this duplicates with the first one.
+   ```
+
+1. Generate .deb packages:
+
+   ```
+   paddle/scripts/deb/build_scripts/Dockerfile -- significantly overlaps with the `docker` directory
+   ```
+
+1. In the `docker` directory:
+
+   ```
+   paddle/scripts/docker/Dockerfile
+   paddle/scripts/docker/Dockerfile.gpu
+   ```
+
+1. Document building
+
+   ```
+   paddle/scripts/tools/build_docs/Dockerfile -- a subset of above two sets.
+   ```
+
+## Goal
+
+We want two Docker images for each version of PaddlePaddle:
+
+1. `paddle:<version>-dev`
+
+   This a development image contains only the development tools.  This standardizes the building tools and procedure.  Users include:
+
+   - developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer).
+   - release engineers -- use this to build the official release from certain branch/tag on Github.com.
+   - document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code.  We need tools to extract the information, typeset, and generate Web pages.
+
+   So the development image must contain not only source code building tools, but also documentation tools:
+
+   - gcc/clang
+   - nvcc
+   - Python
+   - sphinx
+   - woboq
+   - sshd
+
+   where `sshd` makes it easy for developers to have multiple terminals connecting into the container.
+
+1. `paddle:<version>`
+
+   This is the production image, generated using the development image. This image might have multiple variants:
+
+   - GPU/AVX   `paddle:<version>-gpu`
+   - GPU/no-AVX  `paddle:<version>-gpu-noavx`
+   - no-GPU/AVX  `paddle:<version>`
+   - no-GPU/no-AVX  `paddle:<version>-noavx`
+
+   We'd like to give users choices of GPU and no-GPU, because the GPU version image is much larger than then the no-GPU version.
+
+   We'd like to give users choices of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs.
+
+## Dockerfile
+
+To realize above goals, we need only one Dockerfile for the development image.  We can put it in the root source directory.
+
+Let us go over our daily development procedure to show how developers can use this file.
+
+1. Check out the source code
 
-1. 用dev image build 我们的源码：
    ```bash
-   docker run -it -p 2022:22 -v $PWD:/paddle paddlepaddle/paddle:dev-0.10.0rc1  /paddle/build.sh
-   ```  
-   注意，这里的 `-v ` 参数把host上的源码目录里的内容映射到了container里的`/paddle` 目录；而container里的 `/paddle/build.sh` 就是源码目录里的 `build.sh`。上述命令调用了本地源码中的 bulid.sh 来build了本地源码，结果在container里的 `/paddle/build` 目录里，也就是本地的源码目录里的 `build` 子目录。
+   git clone https://github.com/PaddlePaddle/Paddle paddle
+   ```
+
+1. Do something
 
-1. 我们希望上述 `build.sh` 脚本在 `build` 子目录里生成一个Dockerfile，使得我们可以运行：
    ```bash
-   docker build -t paddle  ./build
+   cd paddle
+   git checkout -b my_work
+   Edit some files
    ```
-   来生成我们的production image。
-   
-1. 有了这个production image之后，我们可能会希望docker push 到dockerhub.com的我们自己的名下，然后可以用来启动本地或者远程（Kubernetes）jobs：
+
+1. Build/update the development image (if not yet)
 
    ```bash
-   docker tag paddle yiwang/paddle:did-some-change
-   docker push
-   paddlectl run yiwang/paddle:did-some-change /paddle/demo/mnist/train.py
+   docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory.
+   ```
+
+1. Build the source code
+
+   ```bash
+   docker run -v $PWD:/paddle -e "GPU=OFF" -e "AVX=ON" -e "TEST=ON" paddle:dev
+   ```
+
+   This command maps the source directory on the host into `/paddle` in the container.
+
+   Please be aware that the default entrypoint of `paddle:dev` is a shell script file `build.sh`, which builds the source code, and outputs to `/paddle/build` in the container, which is actually `$PWD/build` on the host.
+
+   `build.sh` doesn't only build binaries, but also generates a `$PWD/build/Dockerfile` file, which can be used to build the production image.  We will talk about it later.
+
+1. Run on the host (Not recommended)
+
+   If the host computer happens to have all dependent libraries and Python runtimes installed, we can now run/test the built program.  But the recommended way is to running in a production image.
+
+1. Run in the development container
+
+   `build.sh` generates binary files and invokes `make install`.  So we can run the built program within the development container.  This is convenient for developers.
+
+1. Build a production image
+
+    On the host, we can use the `$PWD/build/Dockerfile` to generate a production image.
+
+   ```bash
+   docker build -t paddle --build-arg "BOOK=ON" -f build/Dockerfile .
    ```
 
-   其中 paddlectl 应该是我们自己写的一个脚本，调用kubectl来在Kubernetes机群上启动一个job的。
+1. Run the Paddle Book
+
+   Once we have the production image, we can run [Paddle Book](http://book.paddlepaddle.org/) chapters in Jupyter Notebooks (if we chose to build them)
 
+   ```bash
+   docker run -it paddle
+   ```
+
+   Note that the default entrypoint of the production image starts Jupyter server, if we chose to build Paddle Book.
+
+1. Run on Kubernetes
+
+   We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kuberentes cluster:
+
+   ```bash
+   docker tag paddle me/paddle
+   docker push
+   kubectl ...
+   ```
 
-曾经的讨论背景：   
-["PR 1599"](https://github.com/PaddlePaddle/Paddle/pull/1599)  
-["PR 1598"](https://github.com/PaddlePaddle/Paddle/pull/1598)
+   For end users, we will provide more convinient tools to run distributed jobs.