
Commit 6266f89

Add deepspeed in docker (#3829)
* enable deepspeed in docker and update the llm readme accordingly
* update compile_bundle and README.md, remove DOCKER_BUILDKIT
* Update Dockerfile: remove BUILDKIT comments
* Update env_setup.sh: fix llm_eval package name
* Update env_setup.sh: set llm_eval version by dependency_version.yml
* Update dependency_version.yml: add llm_eval version 0.3.0
* update Dockerfile.compile from compile_bundle_main
* add ccl in compile bundle, copy from compile_bundle_main
1 parent 6384575 commit 6266f89

8 files changed: +201, -240 lines changed

dependency_version.yml

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,8 @@ transformers:
   commit: v4.31.0
 protobuf:
   version: 3.20.3
+llm_eval:
+  version: 0.3.0
 basekit:
   dpcpp-cpp-rt:
     version: 2024.0.0
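For context, the new `llm_eval` entry is consumed by `tools/env_setup.sh` (see its diff below). A minimal sketch of how the pinned version flows from the YAML file into the generated install script, assuming the repository's `tools/yaml_utils.py` helper and the `${AUX_INSTALL_SCRIPT}` path defined in `env_setup.sh`:

```bash
# Sketch: read the pinned llm_eval version from dependency_version.yml ...
VER_LLM_EVAL=$(python tools/yaml_utils.py -f dependency_version.yml -d llm_eval -k version)
# ... and record the matching pip install command, as env_setup.sh does.
echo "python -m pip install llm_eval==${VER_LLM_EVAL}" >> ${AUX_INSTALL_SCRIPT}
```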

docker/Dockerfile.compile

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ RUN cp ./intel-extension-for-pytorch/scripts/compile_bundle.sh ./ && \
     sed -i "s/VER_IPEX=.*/VER_IPEX=/" compile_bundle.sh
 RUN . ./miniconda3/bin/activate && \
     conda create -y -n compile_py310 python=3.10 && conda activate compile_py310 && \
-    bash compile_bundle.sh /opt/intel/oneapi/compiler/latest /opt/intel/oneapi/mkl/latest pvc,ats-m150,acm-g11 && \
+    bash compile_bundle.sh /opt/intel/oneapi/compiler/latest /opt/intel/oneapi/mkl/latest /opt/intel/oneapi/ccl/latest pvc,ats-m150,acm-g11 && \
     mkdir wheels && cp pytorch/dist/*.whl vision/dist/*.whl audio/dist/*.whl intel-extension-for-pytorch/dist/*.whl ./wheels
 
 FROM base AS deploy
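The updated `compile_bundle.sh` now takes the oneCCL root as a third positional argument, before the AOT target list. A hedged sketch of the standalone invocation implied by the change above, assuming the default oneAPI install locations and the same AOT string used in this Dockerfile:

```bash
# Sketch only: argument order follows the updated Dockerfile.compile
# (DPC++ compiler root, oneMKL root, oneCCL root, AOT target list).
bash compile_bundle.sh \
    /opt/intel/oneapi/compiler/latest \
    /opt/intel/oneapi/mkl/latest \
    /opt/intel/oneapi/ccl/latest \
    pvc,ats-m150,acm-g11
```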

examples/gpu/inference/python/llm/Dockerfile

Lines changed: 1 addition & 6 deletions
@@ -1,12 +1,7 @@
-# NOTE: To build this you will need a docker version >= 19.03 and DOCKER_BUILDKIT=1
-#
-# If you do not use buildkit you are not going to have a good time
-#
-# For reference:
-# https://docs.docker.com/develop/develop-images/build_enhancements/
 
 ARG BASE_IMAGE=ubuntu:22.04
 FROM ${BASE_IMAGE} AS base
+SHELL ["/bin/bash", "-c"]
 RUN if [ -f /etc/apt/apt.conf.d/proxy.conf ]; then rm /etc/apt/apt.conf.d/proxy.conf; fi && \
     if [ ! -z ${HTTP_PROXY} ]; then echo "Acquire::http::Proxy \"${HTTP_PROXY}\";" >> /etc/apt/apt.conf.d/proxy.conf; fi && \
     if [ ! -z ${HTTPS_PROXY} ]; then echo "Acquire::https::Proxy \"${HTTPS_PROXY}\";" >> /etc/apt/apt.conf.d/proxy.conf; fi
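With the BuildKit-only comments removed, the image builds with a plain `docker build`. As a hedged illustration of the proxy handling in the RUN step above, this is the apt configuration that step would leave behind when `HTTP_PROXY` is set in the build environment; the proxy URL is hypothetical and the sketch writes to a local file so it can run without root:

```bash
# Hypothetical proxy URL, for illustration only.
HTTP_PROXY=http://proxy.example.com:8080
# Same conditional as the Dockerfile, redirected to a local file instead of
# /etc/apt/apt.conf.d/proxy.conf.
if [ ! -z ${HTTP_PROXY} ]; then
    echo "Acquire::http::Proxy \"${HTTP_PROXY}\";" >> ./proxy.conf
fi
cat ./proxy.conf   # -> Acquire::http::Proxy "http://proxy.example.com:8080";
```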

examples/gpu/inference/python/llm/README.md

Lines changed: 46 additions & 60 deletions
@@ -27,84 +27,75 @@ Here you can find the inference benchmarking scripts for large language models (
 
 ## Environment Setup
 
-1. Get the Intel® Extension for PyTorch\* source code:
+
+### [Recommended] Docker-based environment setup with compilation from source
+
+
 
 ```bash
+# Get the Intel® Extension for PyTorch* source code
 git clone https://github.com/intel/intel-extension-for-pytorch.git
 cd intel-extension-for-pytorch
-git checkout v2.1.10+xpu
+git checkout v2.1.20+xpu
 git submodule sync
 git submodule update --init --recursive
-```
-
-2. Do one of the following:
 
-If you are planning to use DeepSpeed for execution, please use a bare-metal environment directly and follow 2.b session for the environment setup. Otherwise, we recommend you follow 2.a session with Docker, where the environment is already configured.
-
-a. (Recommended) Build a Docker container from the provided `Dockerfile` for single-instance executions.
+# Build an image with the provided Dockerfile by compiling Intel® Extension for PyTorch* from source
+docker build -f examples/gpu/inference/python/llm/Dockerfile --build-arg GID_RENDER=$(getent group render | sed -E 's,^render:[^:]*:([^:]*):.*$,\1,') --build-arg COMPILE=ON -t ipex-llm:2.1.20 .
 
-```bash
-# Build an image with the provided Dockerfile by compiling Intel® Extension for PyTorch* from source
-DOCKER_BUILDKIT=1 docker build -f examples/gpu/inference/python/llm/Dockerfile --build-arg GID_RENDER=$(getent group render | sed -E 's,^render:[^:]*:([^:]*):.*$,\1,') --build-arg COMPILE=ON -t ipex-llm:2.1.10 .
+# Build an image with the provided Dockerfile by installing from Intel® Extension for PyTorch* prebuilt wheel files
+docker build -f examples/gpu/inference/python/llm/Dockerfile --build-arg GID_RENDER=$(getent group render | sed -E 's,^render:[^:]*:([^:]*):.*$,\1,') -t ipex-llm:2.1.20 .
 
-# Build an image with the provided Dockerfile by installing from Intel® Extension for PyTorch* prebuilt wheel files
-DOCKER_BUILDKIT=1 docker build -f examples/gpu/inference/python/llm/Dockerfile --build-arg GID_RENDER=$(getent group render | sed -E 's,^render:[^:]*:([^:]*):.*$,\1,') -t ipex-llm:2.1.10 .
+# Run the container with command below
+docker run --privileged -it --rm --device /dev/dri:/dev/dri -v /dev/dri/by-path:/dev/dri/by-path \
+--ipc=host --net=host --cap-add=ALL -v /lib/modules:/lib/modules --workdir /workspace \
+--volume `pwd`/examples/gpu/inference/python/llm/:/workspace/llm ipex-llm:2.1.20 /bin/bash
 
-# Run the container with command below
-docker run --rm -it --privileged --device=/dev/dri --ipc=host ipex-llm:2.1.10 bash
 
-# When the command prompt shows inside the docker container, enter llm examples directory
-cd llm
-```
-b. Alternatively, use the provided environment configuration script to set up environment without using a docker container:
+# When the command prompt shows inside the docker container, enter llm examples directory
+cd llm
 
-Make sure the driver and Base Toolkit are installed without using a docker container. Refer to [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.1.10%2Bxpu&os=linux%2Fwsl2&package=source).
-
-OneCCL is also required if you run with DeepSpeed. We recommend to use apt/yum/dnf to install the oneCCL package. Refer to [Base Toolkit Installation](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html) for adding the APT/YUM/DNF key and sources for first-time users.
+# Activate environment variables
+source ./tools/env_activate.sh
+```
 
-Example command:
+### Conda-based environment setup with compilation from source
 
-```bash
-sudo apt install intel-oneapi-ccl-devel=2021.11.1-6
-sudo yum install intel-oneapi-ccl-devel=2021.11.1-6
-sudo dnf install intel-oneapi-ccl-devel=2021.11.1-6
-```
+Make sure the driver and Base Toolkit are installed without using a docker container. Refer to [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.1.10%2Bxpu&os=linux%2Fwsl2&package=source).
 
 
-```bash
-# Make sure you have GCC >= 11 is installed on your system.
-# Create a conda environment
-conda create -n llm python=3.10 -y
-conda activate llm
 
-# Setup the environment with the provided script
-cd examples/gpu/inference/python/llm
-# If you want to install Intel® Extension for PyTorch\* from prebuilt wheel files, use the command below:
-bash ./tools/env_setup.sh 7
-# If you want to install Intel® Extension for PyTorch\* from source, use the commands below:
-bash ./tools/env_setup.sh 3 <DPCPP_ROOT> <ONEMKL_ROOT> <ONECCL_ROOT> <AOT>
-export LD_PRELOAD=$(bash ../../../../../tools/get_libstdcpp_lib.sh)
-export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
-source <DPCPP_ROOT>/env/vars.sh
-source <ONEMKL_ROOT>/env/vars.sh
-source <ONECCL_ROOT>/env/vars.sh
-source <MPI_ROOT>/env/vars.sh
-```
-where <br />
-- `DPCPP_ROOT` is the path to the DPC++ compiler. By default, it is `/opt/intel/oneapi/compiler/latest`.<br />
-- `ONEMKL_ROOT` is the path to oneMKL. By default, it is `/opt/intel/oneapi/mkl/latest`.<br />
-- `ONECCL_ROOT` is the path to oneCCL. By default, it is `/opt/intel/oneapi/ccl/latest`.<br />
-- `MPI_ROOT` is the path to oneAPI MPI library. By default, it is `/opt/intel/oneapi/mpi/latest`.<br />
-- `AOT` is a text string to enable `Ahead-Of-Time` compilation for specific GPU models. Check [tutorial](../../../../../docs/tutorials/technical_details/AOT.md) for details.<br />
+```bash
 
-3. Set necessary environment variables with the environment variables activation script.
+# Get the Intel® Extension for PyTorch* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout v2.1.20+xpu
+git submodule sync
+git submodule update --init --recursive
 
-```bash
-# Activate environment variables
+# Make sure you have GCC >= 11 is installed on your system.
+# Create a conda environment
+conda create -n llm python=3.10 -y
+conda activate llm
+conda install pkg-config
+# Setup the environment with the provided script
+cd examples/gpu/inference/python/llm
+# If you want to install Intel® Extension for PyTorch\* from prebuilt wheel files, use the command below:
+bash ./tools/env_setup.sh 7
+# If you want to install Intel® Extension for PyTorch\* from source, use the commands below:
+bash ./tools/env_setup.sh 3 <DPCPP_ROOT> <ONEMKL_ROOT> <ONECCL_ROOT> <AOT>
+export LD_PRELOAD=$(bash ../../../../../tools/get_libstdcpp_lib.sh)
+export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
 source ./tools/env_activate.sh
+
 ```
 
+where <br />
+- `AOT` is a text string to enable `Ahead-Of-Time` compilation for specific GPU models. Check [tutorial](../../../../../docs/tutorials/technical_details/AOT.md) for details.<br />
 
+
+
 ## Run Models Generation
 
 | Benchmark mode | FP16 | Weight only quantization INT4 |
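The placeholders in `bash ./tools/env_setup.sh 3 <DPCPP_ROOT> <ONEMKL_ROOT> <ONECCL_ROOT> <AOT>` can be hard to read out of the diff above, so here is a hedged, concrete sketch assuming a default oneAPI installation under `/opt/intel/oneapi` and the same AOT target list used in `docker/Dockerfile.compile`; adjust the paths and targets for your own system:

```bash
# Sketch with assumed defaults; substitute your own oneAPI roots and AOT targets.
cd examples/gpu/inference/python/llm
bash ./tools/env_setup.sh 3 \
    /opt/intel/oneapi/compiler/latest \
    /opt/intel/oneapi/mkl/latest \
    /opt/intel/oneapi/ccl/latest \
    pvc,ats-m150,acm-g11
export LD_PRELOAD=$(bash ../../../../../tools/get_libstdcpp_lib.sh)
export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
source ./tools/env_activate.sh
```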
@@ -141,8 +132,6 @@ bash run_benchmark_ds.sh
 ```
 
 ```bash
-# distributed env setting
-source ${ONECCL_ROOT}/env/setvars.sh
 # fp16 benchmark
 mpirun -np 2 --prepend-rank python -u run_generation_with_deepspeed.py --benchmark -m ${model} --num-beams ${beam} --num-iter ${iter} --batch-size ${bs} --input-tokens ${input} --max-new-tokens ${output} --device xpu --ipex --dtype float16 --token-latency
 ```
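The `mpirun` line above relies on shell variables that are set elsewhere in the README. A hedged example with illustrative values only; the model ID and the numbers below are placeholders, not recommendations:

```bash
# Placeholder values for the variables consumed by the mpirun command above.
model=meta-llama/Llama-2-7b-hf   # hypothetical model ID
beam=4
iter=10
bs=1
input=1024
output=128
mpirun -np 2 --prepend-rank python -u run_generation_with_deepspeed.py --benchmark \
    -m ${model} --num-beams ${beam} --num-iter ${iter} --batch-size ${bs} \
    --input-tokens ${input} --max-new-tokens ${output} \
    --device xpu --ipex --dtype float16 --token-latency
```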
@@ -190,9 +179,6 @@ LLM_ACC_TEST=1 python -u run_generation.py -m ${model} --ipex --dtype float16 --
 ### Distributed Accuracy with DeepSpeed
 
 ```bash
-# Run distributed accuracy with 2 ranks of one node for float16 with ipex
-source ${ONECCL_ROOT}/env/setvars.sh
-
 # one-click bash script
 bash run_accuracy_ds.sh
 
Lines changed: 15 additions & 0 deletions
@@ -1,3 +1,18 @@
 #!/bin/bash
 
+ONEAPI_ROOT=${ONEAPI_ROOT:-/opt/intel/oneapi}
+if test -f ${ONEAPI_ROOT}/setvars.sh ; then
+    source ${ONEAPI_ROOT}/setvars.sh
+else
+    export LD_LIBRARY_PATH=/opt/intel/oneapi/redist/opt/mpi/libfabric/lib:$LD_LIBRARY_PATH
+    export PATH=/opt/intel/oneapi/redist/bin:$PATH
+    export I_MPI_ROOT=/opt/intel/oneapi/redist/lib
+    export CCL_ROOT=/opt/intel/oneapi/redist
+    export FI_PROVIDER_PATH=/opt/intel/oneapi/redist/opt/mpi/libfabric/lib/prov
+fi
+
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2
+export ENABLE_SDP_FUSION=1
+export TORCH_LLM_ALLREDUCE=1
+
+
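For context, this activation script is meant to be sourced from the example directory, as the README's Docker instructions do. A brief hedged usage sketch; the `echo` is only there to confirm the new DeepSpeed-related variables are exported:

```bash
# Source the activation script (falls back to the /opt/intel/oneapi/redist
# layout when setvars.sh is not present, per the diff above).
source ./tools/env_activate.sh
# Confirm the new flags are set (expected output: 1 1 2).
echo ${ENABLE_SDP_FUSION} ${TORCH_LLM_ALLREDUCE} ${SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS}
```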

examples/gpu/inference/python/llm/tools/env_setup.sh

Lines changed: 7 additions & 99 deletions
@@ -50,7 +50,6 @@ fi
 # Save current directory path
 BASEFOLDER=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
 WHEELFOLDER=${BASEFOLDER}/../wheels
-TORCH_INSTALL_SCRIPT=${WHEELFOLDER}/torch_install.sh
 AUX_INSTALL_SCRIPT=${WHEELFOLDER}/aux_install.sh
 cd ${BASEFOLDER}/..
 
@@ -99,6 +98,7 @@ if [ $((${MODE} & 0x02)) -ne 0 ]; then
     VER_TORCH=$(python tools/yaml_utils.py -f dependency_version.yml -d pytorch -k version)
     TRANSFORMERS_COMMIT=$(python tools/yaml_utils.py -f dependency_version.yml -d transformers -k commit)
     VER_PROTOBUF=$(python tools/yaml_utils.py -f dependency_version.yml -d protobuf -k version)
+    VER_LLM_EVAL=$(python tools/yaml_utils.py -f dependency_version.yml -d llm_eval -k version)
    VER_IPEX_MAJOR=$(grep "VERSION_MAJOR" version.txt | cut -d " " -f 2)
     VER_IPEX_MINOR=$(grep "VERSION_MINOR" version.txt | cut -d " " -f 2)
     VER_IPEX_PATCH=$(grep "VERSION_PATCH" version.txt | cut -d " " -f 2)
@@ -122,114 +122,29 @@ if [ $((${MODE} & 0x02)) -ne 0 ]; then
     conda install -y cmake ninja
 
     echo "#!/bin/bash" > ${AUX_INSTALL_SCRIPT}
-    echo "#!/bin/bash" > ${TORCH_INSTALL_SCRIPT}
     if [ $((${MODE} & 0x04)) -ne 0 ]; then
-        echo "python -m pip install torch==${VER_TORCH} intel-extension-for-pytorch==${VER_IPEX} oneccl-bind-pt==${VER_TORCHCCL} --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/" >> ${TORCH_INSTALL_SCRIPT}
+        echo "python -m pip install torch==${VER_TORCH} intel-extension-for-pytorch==${VER_IPEX} oneccl-bind-pt==${VER_TORCHCCL} --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/" >> ${AUX_INSTALL_SCRIPT}
         python -m pip install torch==${VER_TORCH} intel-extension-for-pytorch==${VER_IPEX} oneccl-bind-pt==${VER_TORCHCCL} --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
     else
         if [ ! -f ${ONECCL_ROOT}/env/vars.sh ]; then
             echo "oneCCL environment ${ONECCL_ROOT} doesn't seem to exist."
             exit 6
         fi
-        ONEAPIROOT=${ONEMKL_ROOT}/../..
 
         # Install PyTorch and Intel® Extension for PyTorch*
         cp intel-extension-for-pytorch/scripts/compile_bundle.sh .
         sed -i "s/VER_IPEX=.*/VER_IPEX=/" compile_bundle.sh
-        bash compile_bundle.sh ${DPCPP_ROOT} ${ONEMKL_ROOT} ${AOT} 0
+        bash compile_bundle.sh ${DPCPP_ROOT} ${ONEMKL_ROOT} ${ONECCL_ROOT} ${AOT} 1
         cp pytorch/dist/*.whl ${WHEELFOLDER}
         cp intel-extension-for-pytorch/dist/*.whl ${WHEELFOLDER}
-        rm -rf compile_bundle.sh llvm-project llvm-release pytorch
+        cp torch-ccl/dist/*.whl ${WHEELFOLDER}
+        rm -rf compile_bundle.sh llvm-project llvm-release pytorch torch-ccl
         export LD_PRELOAD=$(bash intel-extension-for-pytorch/tools/get_libstdcpp_lib.sh)
-
-        # The following is only for DeepSpeed case
-        #Install oneccl-bind-pt(also named torch-ccl)
-        set +e
-        function env_backup() {
-            key=$1
-            env | grep ${key} > /dev/null
-            if [ $? -gt 0 ]; then
-                echo "unset"
-            else
-                value=$(env | grep "^${key}=")
-                echo ${value#"${key}="}
-            fi
-        }
-        function env_recover() {
-            key=$1
-            value=$2
-            if [ "$value" == "unset" ]; then
-                unset ${key}
-            else
-                export ${key}=${value}
-            fi
-        }
-
-        PKG_CONFIG_PATH_BK=$(env_backup PKG_CONFIG_PATH)
-        ACL_BOARD_VENDOR_PATH_BK=$(env_backup ACL_BOARD_VENDOR_PATH)
-        FPGA_VARS_DIR_BK=$(env_backup FPGA_VARS_DIR)
-        DIAGUTIL_PATH_BK=$(env_backup DIAGUTIL_PATH)
-        MANPATH_BK=$(env_backup MANPATH)
-        CMAKE_PREFIX_PATH_BK=$(env_backup CMAKE_PREFIX_PATH)
-        CMPLR_ROOT_BK=$(env_backup CMPLR_ROOT)
-        FPGA_VARS_ARGS_BK=$(env_backup FPGA_VARS_ARGS)
-        LIBRARY_PATH_BK=$(env_backup LIBRARY_PATH)
-        OCL_ICD_FILENAMES_BK=$(env_backup OCL_ICD_FILENAMES)
-        INTELFPGAOCLSDKROOT_BK=$(env_backup INTELFPGAOCLSDKROOT)
-        LD_LIBRARY_PATH_BK=$(env_backup LD_LIBRARY_PATH)
-        MKLROOT_BK=$(env_backup MKLROOT)
-        NLSPATH_BK=$(env_backup NLSPATH)
-        PATH_BK=$(env_backup PATH)
-        CPATH_BK=$(env_backup CPATH)
-        set -e
-        source ${DPCPP_ROOT}/env/vars.sh
-        source ${ONEMKL_ROOT}/env/vars.sh
-
-        if [ -d torch-ccl ]; then
-            rm -rf torch-ccl
-        fi
-        git clone ${TORCHCCL_REPO}
-        cd torch-ccl
-        git checkout ${TORCHCCL_COMMIT}
-        git submodule sync
-        git submodule update --init --recursive
-        if [ -d ${CONDA_PREFIX}/lib/gcc/x86_64-conda-linux-gnu ]; then
-            export DPCPP_GCC_INSTALL_DIR="${CONDA_PREFIX}/lib/gcc/x86_64-conda-linux-gnu/12.3.0"
-        fi
-        export INTELONEAPIROOT=${ONEAPIROOT}
-        USE_SYSTEM_ONECCL=ON COMPUTE_BACKEND=dpcpp python setup.py bdist_wheel
-        unset INTELONEAPIROOT
-        if [ -d ${CONDA_PREFIX}/lib/gcc/x86_64-conda-linux-gnu ]; then
-            unset DPCPP_GCC_INSTALL_DIR
-        fi
-        cp dist/*.whl ${WHEELFOLDER}
-        python -m pip install dist/*.whl
-        cd ..
-        rm -rf torch-ccl
-
-        set +e
-        env_recover PKG_CONFIG_PATH ${PKG_CONFIG_PATH_BK}
-        env_recover ACL_BOARD_VENDOR_PATH ${ACL_BOARD_VENDOR_PATH_BK}
-        env_recover FPGA_VARS_DIR ${FPGA_VARS_DIR_BK}
-        env_recover DIAGUTIL_PATH ${DIAGUTIL_PATH_BK}
-        env_recover MANPATH ${MANPATH_BK}
-        env_recover CMAKE_PREFIX_PATH ${CMAKE_PREFIX_PATH_BK}
-        env_recover CMPLR_ROOT ${CMPLR_ROOT_BK}
-        env_recover FPGA_VARS_ARGS ${FPGA_VARS_ARGS_BK}
-        env_recover LIBRARY_PATH ${LIBRARY_PATH_BK}
-        env_recover OCL_ICD_FILENAMES ${OCL_ICD_FILENAMES_BK}
-        env_recover INTELFPGAOCLSDKROOT ${INTELFPGAOCLSDKROOT_BK}
-        env_recover LD_LIBRARY_PATH ${LD_LIBRARY_PATH_BK}
-        env_recover MKLROOT ${MKLROOT_BK}
-        env_recover NLSPATH ${NLSPATH_BK}
-        env_recover PATH ${PATH_BK}
-        env_recover CPATH ${CPATH_BK}
-        set -e
     fi
 
     echo "python -m pip install impi-devel" >> ${AUX_INSTALL_SCRIPT}
-    echo "python -m pip install cpuid accelerate datasets sentencepiece protobuf==${VER_PROTOBUF} huggingface_hub mpi4py mkl" >> ${AUX_INSTALL_SCRIPT}
-    echo "python -m pip install lm_eval" >> ${AUX_INSTALL_SCRIPT}
+    echo "python -m pip install cpuid accelerate datasets sentencepiece diffusers protobuf==${VER_PROTOBUF} huggingface_hub mpi4py mkl" >> ${AUX_INSTALL_SCRIPT}
+    echo "python -m pip install llm_eval==${VER_LLM_EVAL}" >> ${AUX_INSTALL_SCRIPT}
 
 
     # Install Transformers
@@ -277,14 +192,7 @@ if [ $((${MODE} & 0x02)) -ne 0 ]; then
     rm -rf DeepSpeed
 fi
 if [ $((${MODE} & 0x01)) -ne 0 ]; then
-    bash ${TORCH_INSTALL_SCRIPT}
     python -m pip install ${WHEELFOLDER}/*.whl
     bash ${AUX_INSTALL_SCRIPT}
     rm -rf ${WHEELFOLDER}
-    if [ -f ${TORCH_INSTALL_SCRIPT} ]; then
-        rm ${TORCH_INSTALL_SCRIPT}
-    fi
-    if [ -f ${AUX_INSTALL_SCRIPT} ]; then
-        rm ${AUX_INSTALL_SCRIPT}
-    fi
 fi
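The `MODE` checks above (`0x01`, `0x02`, `0x04`) explain the `7` and `3` arguments used in the README: bit `0x02` enables the build/prepare phase, `0x04` switches that phase to prebuilt wheels, and `0x01` installs the collected wheels plus the auxiliary packages. A hedged arithmetic sketch, assuming this is the full meaning of the flags:

```bash
# Assumed decomposition of the MODE argument to env_setup.sh:
#   0x01 -> install the collected wheels and run ${AUX_INSTALL_SCRIPT}
#   0x02 -> run the build/prepare phase
#   0x04 -> within that phase, use prebuilt wheels instead of compile_bundle.sh
echo $((0x01 | 0x02 | 0x04))   # 7: prebuilt-wheel path  (bash ./tools/env_setup.sh 7)
echo $((0x01 | 0x02))          # 3: compile-from-source path (bash ./tools/env_setup.sh 3 ...)
```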
