Dockerfile for Running BitNet-b1.58-2B-4T on ARM

I tested building and running the Dockerfile on a MacBook Pro M4 Max running Rancher Desktop. I wasn't able to convert the model to TL1 for a tensor-optimized lookup table, so I used I2_S (Integer 2-bit Symmetric) instead.

You might need to increase the resources available to Rancher Desktop (or Docker Desktop) to get decent performance. I was seeing ~20-30 tokens/s using QEMU emulation; I haven't yet tested the Apple Virtualization framework.
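If you want to confirm the container runtime is actually ARM-native rather than emulating x86, a quick sanity check is to print the machine architecture from inside a throwaway container (the expected output below assumes an arm64 host):

docker run --rm alpine uname -m
# expect: aarch64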

I put together the steps to get this working on ARM from Bijan Bowen's blog.

Clone from GitHub:

git clone https://github.com/ajsween/bitnet-b1-58-arm-docker.git

Build Docker container:

cd bitnet-b1-58-arm-docker
docker build -t bitnet-b1.58-2b-4t-arm:latest .
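If your Docker setup defaults to a different platform, you can pin the target architecture explicitly; --platform is a standard docker build flag, not something this Dockerfile requires:

docker build --platform linux/arm64 -t bitnet-b1.58-2b-4t-arm:latest .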

Run interactive:

docker run -it --rm bitnet-b1.58-2b-4t-arm:latest
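As noted above, performance depends heavily on the resources the VM gets. You can also cap what this particular container may use with standard Docker flags; the values here are illustrative, not tuned:

docker run -it --rm --cpus 4 --memory 8g bitnet-b1.58-2b-4t-arm:latest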

Run noninteractive with arguments:

docker run --rm bitnet-b1.58-2b-4t-arm:latest \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "How do I change a tire?\n" \
  -t 4 \
  -c 4096 \
  --temp 0.4 \
  -n 1024 2>/dev/null
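Because arguments after the image name go straight to the inference script, the container is easy to wrap for scripting. A minimal sketch (the bitnet function name and the flag values are my own, not part of the repo):

bitnet() {
  docker run --rm bitnet-b1.58-2b-4t-arm:latest \
    -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "$1" -t 4 -n 256 2>/dev/null
}

bitnet "Summarize BitNet b1.58 in one sentence."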

Run interactive without STDERR:

I find the statistics printed to STDERR useful, but if you only want to see the prompts and responses, remove the pseudo-TTY option (-t) so the container's STDERR stays a separate stream you can discard:

docker run -i --rm bitnet-b1.58-2b-4t-arm:latest 2>/dev/null

Reference for run_inference.py (the image's ENTRYPOINT):

usage: run_inference.py [-h] [-m MODEL] [-n N_PREDICT] -p PROMPT [-t THREADS] [-c CTX_SIZE] [-temp TEMPERATURE] [-cnv]

Run inference

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Path to model file
  -n N_PREDICT, --n-predict N_PREDICT
                        Number of tokens to predict when generating text
  -p PROMPT, --prompt PROMPT
                        Prompt to generate text from
  -t THREADS, --threads THREADS
                        Number of threads to use
  -c CTX_SIZE, --ctx-size CTX_SIZE
                        Size of the prompt context
  -temp TEMPERATURE, --temperature TEMPERATURE
                        Temperature, a hyperparameter that controls the randomness of the generated text
  -cnv, --conversation  Whether to enable chat mode or not (for instruct models.)
                        (When this option is turned on, the prompt specified by -p will be used as the system prompt.)
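Since run_inference.py is the ENTRYPOINT, you can also drop into a shell inside the image to inspect the model files or the build; --entrypoint is a standard Docker flag:

docker run -it --rm --entrypoint /bin/bash bitnet-b1.58-2b-4t-arm:latest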

The bash commands I based the Dockerfile on:

apt update && apt install -y \
  python3-pip python3-dev cmake build-essential \
  git software-properties-common wget

wget -O - https://apt.llvm.org/llvm.sh | bash -s 18

git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

python utils/codegen_tl1.py \
  --model bitnet_b1_58-3B \
  --BM 160,320,320 \
  --BK 64,128,64 \
  --bm 32,64,32
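If the TL1 codegen step fails for you like it did for me, the upstream repo also documents a one-step helper that generates kernels, builds, and converts the model in one go (run it after the huggingface-cli download step below); the flags follow the microsoft/BitNet README, so verify them there:

python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s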

export CC=clang-18 CXX=clang++-18
rm -rf build && mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
cd ..

huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
  --local-dir models/BitNet-b1.58-2B-4T
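It's worth checking that the I2_S GGUF actually landed where the inference commands below expect it:

ls -lh models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf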

python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "Hello from BitNet on Pi4!" -cnv

python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "Hello from BitNet running on ARM in a container on a Apple M4 Max!" \
  -cnv -t 4 -c 2048
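To reproduce the tokens/s figures mentioned above, upstream BitNet ships a benchmark script; the invocation below mirrors the microsoft/BitNet README, so treat the exact flags as something to verify there:

python utils/e2e_benchmark.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -n 200 -p 256 -t 4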
