Commit 313f845

Merge pull request #5430 from typhoonzero/enable_manylinux
Enable manylinux builds
2 parents 5f9f990 + 79e50c1 commit 313f845

File tree

3 files changed

+164
-134
lines changed

paddle/scripts/deb/postinst

Lines changed: 0 additions & 6 deletions
This file was deleted.

paddle/scripts/docker/README.md

Lines changed: 128 additions & 108 deletions
Original file line numberDiff line numberDiff line change
@@ -2,178 +2,198 @@
22

33
## Goals
44

5-
We want the building procedure generates Docker images so that we can run PaddlePaddle applications on Kubernetes clusters.
5+
We want to make the building procedures:
66

7-
We want to build .deb packages so that enterprise users can run PaddlePaddle applications without Docker.
7+
1. Static, so that builds can be reproduced easily.
8+
1. Generate Python `whl` packages that can be widely used across many distributions.
9+
1. Build different binaries per release to satisfy different environments:
10+
- Binaries for different CUDA and CUDNN versions, like CUDA 7.5, 8.0, 9.0
11+
- Binaries containing only capi
12+
- Binaries for python with wide unicode support or not.
13+
1. Build docker images with PaddlePaddle pre-installed, so that we can run
14+
PaddlePaddle applications directly in docker or on Kubernetes clusters.
815

9-
We want to minimize the size of generated Docker images and .deb packages so to reduce the download time.
16+
To achieve this, we created a repo: https://github.com/PaddlePaddle/buildtools
17+
which provides several docker images that are `manylinux1` compliant. Then we
18+
can build PaddlePaddle using these images to generate corresponding `whl`
19+
binaries.
1020

11-
We want to encapsulate building tools and dependencies in a *development* Docker image so to ease the tools installation for developers.
21+
## Run The Build
1222

13-
Developers use various editors (emacs, vim, Eclipse, Jupyter Notebook), so the development Docker image contains only building tools, not editing tools, and developers are supposed to git clone source code into their development computers and map the code into the development container.
23+
### Build Environments
1424

15-
We want the procedure and tools also work with testing, continuous integration, and releasing.
25+
The pre-built build environment images are:
1626

27+
| Image | Tag |
28+
| ----- | --- |
29+
| paddlepaddle/paddle_manylinux_devel | cuda7.5_cudnn5 |
30+
| paddlepaddle/paddle_manylinux_devel | cuda8.0_cudnn5 |
31+
| paddlepaddle/paddle_manylinux_devel | cuda7.5_cudnn7 |
32+
| paddlepaddle/paddle_manylinux_devel | cuda9.0_cudnn7 |
1733

18-
## Docker Images
19-
20-
So we need two Docker images for each version of PaddlePaddle:
21-
22-
1. `paddle:<version>-dev`
23-
24-
This a development image contains only the development tools and standardizes the building procedure. Users include:
34+
### Start Build
2535

26-
- developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer).
27-
- release engineers -- use this to build the official release from certain branch/tag on Github.com.
28-
- document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code. We need tools to extract the information, typeset, and generate Web pages.
36+
Choose one docker image that suits your environment and run the following
37+
command to start a build:
2938

30-
Of course, developers can install building tools on their development computers. But different versions of PaddlePaddle might require different set or version of building tools. Also, it makes collaborative debugging easier if all developers use a unified development environment.
31-
32-
The development image should include the following tools:
33-
34-
- gcc/clang
35-
- nvcc
36-
- Python
37-
- sphinx
38-
- woboq
39-
- sshd
39+
```bash
40+
git clone https://github.com/PaddlePaddle/Paddle.git
41+
cd Paddle
42+
docker run --rm -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TESTING=OFF" -e "RUN_TEST=OFF" -e "PYTHON_ABI=cp27-cp27mu" paddlepaddle/paddle_manylinux_devel /paddle/paddle/scripts/docker/build.sh
43+
```
4044

41-
Many developers work on a remote computer with GPU; they could ssh into the computer and `docker exec` into the development container. However, running `sshd` in the container allows developers to ssh into the container directly.
45+
After the build finishes, you can find the output `whl` package under
46+
`build/python/dist`.
4247

43-
1. `paddle:<version>`
48+
This command mounts the source directory on the host into `/paddle` in the container, then runs the build script `/paddle/paddle/scripts/docker/build.sh`
49+
in the container. When the script writes to `/paddle/build` in the container, it actually writes to `$PWD/build` on the host.
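
The build command can be parameterized by image tag and Python ABI. The following is only a sketch: the helper name `paddle_build_cmd` and its defaults are hypothetical and not part of the repository, and it prints the command rather than running it, so you can inspect it first. It assumes the tag from the table of build environment images is appended to the image name in the usual `image:tag` form.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the Paddle repo): print a build command
# for a chosen build-environment image tag, Python ABI, and GPU setting.
paddle_build_cmd() {
  local tag="${1:-cuda8.0_cudnn5}"   # a tag from the build environment table
  local abi="${2:-cp27-cp27mu}"      # value for PYTHON_ABI
  local with_gpu="${3:-OFF}"         # ON or OFF
  printf 'docker run --rm -v "%s:/paddle" -e "WITH_GPU=%s" -e "WITH_AVX=ON" -e "WITH_TESTING=OFF" -e "RUN_TEST=OFF" -e "PYTHON_ABI=%s" paddlepaddle/paddle_manylinux_devel:%s /paddle/paddle/scripts/docker/build.sh\n' \
    "$PWD" "$with_gpu" "$abi" "$tag"
}

# Dry run: show the command for a CUDA 8.0 / cuDNN 5 GPU build.
paddle_build_cmd cuda8.0_cudnn5 cp27-cp27mu ON
```

Run the printed command from the root of the Paddle source tree so that `$PWD` is the directory mounted at `/paddle`.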
4450

45-
This is the production image, generated using the development image. This image might have multiple variants:
51+
### Build Options
4652

47-
- GPU/AVX `paddle:<version>-gpu`
48-
- GPU/no-AVX `paddle:<version>-gpu-noavx`
49-
- no-GPU/AVX `paddle:<version>`
50-
- no-GPU/no-AVX `paddle:<version>-noavx`
53+
Users can specify the following Docker build arguments with either "ON" or "OFF" value:
5154

52-
We allow users to choose between GPU and no-GPU because the GPU version image is much larger than then the no-GPU version.
55+
| Option | Default | Description |
56+
| ------ | -------- | ----------- |
57+
| `WITH_GPU` | OFF | Generates NVIDIA CUDA GPU code and relies on CUDA libraries. |
58+
| `WITH_AVX` | OFF | Set to "ON" to enable AVX support. |
59+
| `WITH_TESTING` | ON | Build unit test binaries. |
60+
| `WITH_MKLDNN` | ON | Build with [Intel® MKL DNN](https://github.com/01org/mkl-dnn) support. |
61+
| `WITH_MKLML` | ON | Build with [Intel® MKL](https://software.intel.com/en-us/mkl) support. |
62+
| `WITH_GOLANG` | ON | Build fault-tolerant parameter server written in go. |
63+
| `WITH_SWIG_PY` | ON | Build with SWIG python API support. |
64+
| `WITH_C_API` | OFF | Build capi libraries for inference. |
65+
| `WITH_PYTHON` | ON | Build with python support. Turn this off if build is only for capi. |
66+
| `WITH_STYLE_CHECK` | ON | Check the code style when building. |
67+
| `PYTHON_ABI` | "" | Build for different python ABI support, can be cp27-cp27m or cp27-cp27mu |
68+
| `RUN_TEST` | OFF | Run unit tests immediately after the build. |
69+
| `WITH_DOC` | OFF | Build docs after building binaries. |
70+
| `WOBOQ` | OFF | Generate WOBOQ code viewer under `build/woboq_out` |
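
Since most of the options above take only "ON" or "OFF", a small wrapper can catch mistyped values before a long build starts. This is a sketch under assumptions: `build_env_flags` is a hypothetical helper, not something shipped with PaddlePaddle, and it deliberately does not handle `PYTHON_ABI`, which takes an ABI string instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn NAME=ON|OFF pairs into docker "-e" flags,
# rejecting any other value. PYTHON_ABI is not covered here.
build_env_flags() {
  local pair flags=""
  for pair in "$@"; do
    case "$pair" in
      *=ON|*=OFF) flags="$flags -e $pair" ;;
      *) echo "invalid option (expected NAME=ON or NAME=OFF): $pair" >&2
         return 1 ;;
    esac
  done
  # Strip the single leading space left by the loop.
  echo "${flags# }"
}

# Example: flags for a build that only produces the capi libraries.
build_env_flags WITH_C_API=ON WITH_PYTHON=OFF
```

The resulting flags can be placed between `docker run --rm -v $PWD:/paddle` and the image name in the build command.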
5371

54-
We allow users the choice between AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs.
5572

73+
## Docker Images
5674

57-
## Development Environment
75+
You can get the latest PaddlePaddle docker images by
76+
`docker pull paddlepaddle/paddle:<version>` or build one by yourself.
5877

59-
Here we describe how to use above two images. We start from considering our daily development environment.
78+
### Official Docker Releases
6079

61-
Developers work on a computer, which is usually a laptop or desktop:
80+
Official docker images are available
81+
[here](https://hub.docker.com/r/paddlepaddle/paddle/tags/).
82+
You can choose either `latest` or an image with a release tag like `0.10.0`.
83+
Currently available tags are:
6284

63-
<img src="doc/paddle-development-environment.png" width=500 />
85+
| Tag | Description |
86+
| ------ | --------------------- |
87+
| latest | latest CPU only image |
88+
| latest-gpu | latest binary with GPU support |
89+
| 0.10.0 | release 0.10.0 CPU only binary image |
90+
| 0.10.0-gpu | release 0.10.0 with GPU support |
6491

65-
or, they might rely on a more sophisticated box (like with GPUs):
92+
### Build Your Own Image
6693

67-
<img src="doc/paddle-development-environment-gpu.png" width=500 />
94+
Building PaddlePaddle docker images is quite simple since PaddlePaddle can
95+
be installed by just running `pip install`. A sample `Dockerfile` is:
6896

69-
A principle here is that source code lies on the development computer (host) so that editors like Eclipse can parse the source code to support auto-completion.
97+
```dockerfile
98+
FROM nvidia/cuda:7.5-cudnn5-runtime-centos6
99+
RUN yum install -y centos-release-SCL
100+
RUN yum install -y python27
101+
# This whl package is generated by previous build steps.
102+
ADD python/dist/paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl /
103+
RUN pip install /paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl && rm -f /*.whl
104+
```
70105

106+
Then build the image by running `docker build -t [REPO]/paddle:[TAG] .` under
107+
the directory containing your own `Dockerfile`.
71108

72-
## Usages
109+
- NOTE: you can choose a different base image for your environment; all available versions are listed [here](https://hub.docker.com/r/nvidia/cuda/).
73110

74-
### Build the Development Docker Image
111+
### Use Docker Images
75112

76-
The following commands check out the source code to the host and build the development image `paddle:dev`:
113+
Suppose that you have written an application program `train.py` using
114+
PaddlePaddle, we can test and run it using docker:
77115

78116
```bash
79-
git clone https://github.com/PaddlePaddle/Paddle paddle
80-
cd paddle
81-
docker build -t paddle:dev .
117+
docker run --rm -it -v $PWD:/work paddlepaddle/paddle /work/train.py
82118
```
83119

84-
The `docker build` command assumes that `Dockerfile` is in the root source tree. Note that in this design, this `Dockerfile` is this only one in our repo.
85-
86-
Users can specify a Ubuntu mirror server for faster downloading:
87-
88-
```bash
89-
docker build -t paddle:dev --build-arg UBUNTU_MIRROR=mirror://mirrors.ubuntu.com/mirrors.txt .
90-
```
120+
But this works only if all dependencies of `train.py` are in the production image. If this is not the case, we need to build a new Docker image from the production image with the additional dependencies installed.
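
One way to do this is to generate a small `Dockerfile` on top of the production image. The sketch below is illustrative only: the `0.10.0` tag, the extra package `matplotlib`, and the image name `myapp` are assumptions chosen for the example, and the script only writes the `Dockerfile` and prints the build command.

```bash
#!/usr/bin/env bash
# Sketch: derive an application image from the production image.
# The tag, the extra package, and the name "myapp" are illustrative assumptions.
workdir="$(mktemp -d)"

# Write a minimal Dockerfile that adds one extra dependency and the script.
cat > "$workdir/Dockerfile" <<'EOF'
FROM paddlepaddle/paddle:0.10.0
RUN pip install matplotlib
COPY train.py /work/train.py
EOF

# Print the build command; run it from the directory that contains train.py.
echo "docker build -t myapp -f $workdir/Dockerfile ."
```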
91121

92-
### Build PaddlePaddle from Source Code
122+
### Run PaddlePaddle Book In Docker
93123

94-
Given the development image `paddle:dev`, the following command builds PaddlePaddle from the source tree on the development computer (host):
124+
Our [book repo](https://github.com/paddlepaddle/book) also provides a docker
125+
image to start a Jupyter notebook inside docker so that you can run this book
126+
using docker:
95127

96128
```bash
97-
docker run --rm -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TESTING=OFF" -e "RUN_TEST=OFF" paddle:dev
129+
docker run -d -p 8888:8888 paddlepaddle/book
98130
```
99131

100-
This command mounts the source directory on the host into `/paddle` in the container, so the default entry point of `paddle:dev`, `build.sh`, could build the source code with possible local changes. When it writes to `/paddle/build` in the container, it writes to `$PWD/build` on the host indeed.
101-
102-
`build.sh` builds the following:
103-
104-
- PaddlePaddle binaries,
105-
- `$PWD/build/paddle-<version>.deb` for production installation, and
106-
- `$PWD/build/Dockerfile`, which builds the production Docker image.
132+
Please refer to https://github.com/paddlepaddle/book if you want to build this
133+
docker image by yourself.
107134

108-
Users can specify the following Docker build arguments with either "ON" or "OFF" value:
109-
- `WITH_GPU`: ***Required***. Generates NVIDIA CUDA GPU code and relies on CUDA libraries.
110-
- `WITH_AVX`: ***Required***. Set to "OFF" prevents from generating AVX instructions. If you don't know what is AVX, you might want to set "ON".
111-
- `WITH_TEST`: ***Optional, default OFF***. Build unit tests binaries. Once you've built the unit tests, you can run these test manually by the following command:
112-
```bash
113-
docker run --rm -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" paddle:dev sh -c "cd /paddle/build; make coverall"
114-
```
115-
- `RUN_TEST`: ***Optional, default OFF***. Run unit tests after building. You can't run unit tests without building it.
135+
### Run Distributed Applications
116136

117-
### Build the Production Docker Image
137+
In our [API design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/api.md#distributed-training), we proposed an API that starts a distributed training job on a cluster. This API needs to build a PaddlePaddle application into a Docker image as above and call kubectl to run it on the cluster. This API might need to generate a Dockerfile that looks like the one above and call `docker build`.
118138

119-
The following command builds the production image:
139+
Of course, we can manually build an application image and launch the job using the kubectl tool:
120140

121141
```bash
122-
docker build -t paddle -f build/Dockerfile ./build
142+
docker build -f some/Dockerfile -t myapp .
143+
docker tag myapp me/myapp
144+
docker push me/myapp
145+
kubectl ...
123146
```
124147

125-
This production image is minimal -- it includes binary `paddle`, the shared library `libpaddle.so`, and Python runtime.
148+
## Docker Images for Developers
126149

127-
### Run PaddlePaddle Applications
150+
We have a special docker image for developers:
151+
`paddlepaddle/paddle:<version>-dev`. This image is also generated from
152+
https://github.com/PaddlePaddle/buildtools
128153

129-
Again the development happens on the host. Suppose that we have a simple application program in `a.py`, we can test and run it using the production image:
154+
This development image contains only the
155+
development tools and standardizes the building procedure. Users include:
130156

131-
```bash
132-
docker run --rm -it -v $PWD:/work paddle /work/a.py
133-
```
157+
- developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer).
158+
- release engineers -- use this to build the official release from a certain branch/tag on GitHub.com.
159+
- document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code. We need tools to extract the information, typeset, and generate Web pages.
134160

135-
But this works only if all dependencies of `a.py` are in the production image. If this is not the case, we need to build a new Docker image from the production image and with more dependencies installs.
161+
Of course, developers can install build tools on their development computers. But different versions of PaddlePaddle might require a different set or version of build tools. Also, it makes collaborative debugging easier if all developers use a unified development environment.
136162

137-
### Build and Run PaddlePaddle Applications
163+
The development image contains the following tools:
138164

139-
We need a Dockerfile in https://github.com/paddlepaddle/book that builds Docker image `paddlepaddle/book:<version>`, basing on the PaddlePaddle production image:
165+
- gcc/clang
166+
- nvcc
167+
- Python
168+
- sphinx
169+
- woboq
170+
- sshd
140171

141-
```
142-
FROM paddlepaddle/paddle:<version>
143-
RUN pip install -U matplotlib jupyter ...
144-
COPY . /book
145-
EXPOSE 8080
146-
CMD ["jupyter"]
147-
```
172+
Many developers work on a remote computer with GPU; they could ssh into the computer and `docker exec` into the development container. However, running `sshd` in the container allows developers to ssh into the container directly.
148173

149-
The book image is an example of PaddlePaddle application image. We can build it
150174

151-
```bash
152-
git clone https://github.com/paddlepaddle/book
153-
cd book
154-
docker build -t book .
155-
```
175+
### Development Workflow
156176

157-
### Build and Run Distributed Applications
177+
Here we describe the development workflow, starting from our daily development environment.
158178

159-
In our [API design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/api.md#distributed-training), we proposed an API that starts a distributed training job on a cluster. This API need to build a PaddlePaddle application into a Docker image as above and calls kubectl to run it on the cluster. This API might need to generate a Dockerfile look like above and call `docker build`.
179+
Developers work on a computer, which is usually a laptop or desktop:
160180

161-
Of course, we can manually build an application image and launch the job using the kubectl tool:
181+
<img src="doc/paddle-development-environment.png" width=500 />
162182

163-
```bash
164-
docker build -f some/Dockerfile -t myapp .
165-
docker tag myapp me/myapp
166-
docker push
167-
kubectl ...
168-
```
183+
or, they might rely on a more sophisticated box (like with GPUs):
184+
185+
<img src="doc/paddle-development-environment-gpu.png" width=500 />
186+
187+
A principle here is that source code lies on the development computer (host) so that editors like Eclipse can parse the source code to support auto-completion.
169188

170189
### Reading source code with woboq codebrowser
190+
171191
For developers who are interested in the C++ source code, please use -e "WOBOQ=ON" to enable the building of C++ source code into HTML pages using [Woboq codebrowser](https://github.com/woboq/woboq_codebrowser).
172192

173193
- The following command builds PaddlePaddle, generates HTML pages from C++ source code, and writes HTML pages into `$HOME/woboq_out` on the host:
174194

175195
```bash
176-
docker run -v $PWD:/paddle -v $HOME/woboq_out:/woboq_out -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TEST=ON" -e "WOBOQ=ON" paddle:dev
196+
docker run -v $PWD:/paddle -v $HOME/woboq_out:/woboq_out -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TESTING=ON" -e "WOBOQ=ON" paddlepaddle/paddle:latest-dev
177197
```
178198

179199
- You can open the generated HTML files in your Web browser. Or, if you want to run a Nginx container to serve them for a wider audience, you can run:
