I tested building and running the Dockerfile on a MacBook Pro M4 Max running Rancher Desktop. I wasn't able to convert to TL1 for a tensor-optimized look-up table, so I used I2_S (Integer 2-bit Symmetric) instead.
You might need to increase the resources available to Rancher Desktop (or Docker Desktop) to get decent performance; I was seeing ~20-30 tokens/s. I used QEMU emulation and haven't yet tested the Apple Virtualization framework.
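Before tuning anything, you can check what the VM is actually exposing to containers; docker info reports the CPU count and memory the engine sees (the format string below is just one way to pull out those two fields).
# Check the CPUs and memory available inside the Rancher Desktop / Docker Desktop VM
docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}} bytes'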
I put together the steps to get this working on ARM from Bijan Bowen's blog.
git clone https://github.com/ajsween/bitnet-b1-58-arm-docker.git
cd bitnet-b1-58-arm-docker
docker build -t bitnet-b1.58-2b-4t-arm:latest .
docker run -it --rm bitnet-b1.58-2b-4t-arm:latest
docker run --rm bitnet-b1.58-2b-4t-arm:latest \
-m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
-p "How do I change a tire?\n" \
-t 4 \
-c 4096 \
--temp 0.4 \
-n 1024 2>/dev/null
I find the statistics that come through when STDERR is included in STDOUT useful, but if you want to see only the prompts and responses, remove the pseudo-TTY option (-t) so STDERR can be redirected separately (see the variant after the usage listing below).
docker run -i --rm bitnet-b1.58-2b-4t-arm:latest
usage: run_inference.py [-h] [-m MODEL] [-n N_PREDICT] -p PROMPT [-t THREADS] [-c CTX_SIZE] [-temp TEMPERATURE] [-cnv]
Run inference
optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Path to model file
  -n N_PREDICT, --n-predict N_PREDICT
                        Number of tokens to predict when generating text
  -p PROMPT, --prompt PROMPT
                        Prompt to generate text from
  -t THREADS, --threads THREADS
                        Number of threads to use
  -c CTX_SIZE, --ctx-size CTX_SIZE
                        Size of the prompt context
  -temp TEMPERATURE, --temperature TEMPERATURE
                        Temperature, a hyperparameter that controls the randomness of the generated text
  -cnv, --conversation  Whether to enable chat mode or not (for instruct models.)
                        (When this option is turned on, the prompt specified by -p will be used as the system prompt.)
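If you'd rather keep those statistics than discard them, leave -t off and redirect STDERR to a file. The run below reuses the model path and prompt from the one-shot example above; bitnet-stats.log is just a name I picked.
# Same one-shot run as above, but capture the generation statistics in a file
# (without a TTY, STDERR stays separate from the response on STDOUT)
docker run --rm bitnet-b1.58-2b-4t-arm:latest \
    -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "How do I change a tire?\n" \
    -n 1024 2>bitnet-stats.log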
apt update && apt install -y \
python3-pip python3-dev cmake build-essential \
git software-properties-common wget
wget -O - https://apt.llvm.org/llvm.sh | bash -s 18
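It's worth confirming the toolchain at this point, since the export CC/CXX step below assumes the versioned clang-18 binaries that the LLVM script installs.
# Sanity-check that the LLVM 18 toolchain and CMake are on the PATH
clang-18 --version
clang++-18 --version
cmake --version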
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt
python utils/codegen_tl1.py \
--model bitnet_b1_58-3B \
--BM 160,320,320 \
--BK 64,128,64 \
--bm 32,64,32
export CC=clang-18 CXX=clang++-18
rm -rf build && mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
cd ..
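A quick listing confirms the build produced the inference binaries; with this llama.cpp-style CMake layout they should end up under build/bin, which, as far as I can tell, is where run_inference.py looks for them.
# Confirm the build produced binaries before moving on to the model download
ls -l build/bin/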
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
--local-dir models/BitNet-b1.58-2B-4T
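If the TL1 codegen/CMake route above gives you trouble (the TL1 conversion is what failed for me), the upstream BitNet repo also ships a setup_env.py helper that prepares an I2_S build against the model directory downloaded above. The flags here are my reading of the Microsoft BitNet README, so verify them against your checkout.
# Assumed alternative from the upstream README: set up an I2_S build instead of TL1
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s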
python run_inference.py \
-m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
-p "Hello from BitNet on Pi4!" -cnv
python run_inference.py \
-m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
-p "Hello from BitNet running on ARM in a container on a Apple M4 Max!" \
-cnv -t 4 -c 2048