# ExecuTorch on Raspberry Pi

## TLDR

This tutorial demonstrates how to deploy **Llama models on Raspberry Pi 4/5 devices** using ExecuTorch:

- **Prerequisites**: Linux host machine, Python 3.10-3.12, conda environment, Raspberry Pi 4/5
- **Setup**: Automated cross-compilation using the `setup.sh` script for ARM toolchain installation
- **Export**: Convert Llama models to the optimized `.pte` format, with quantization options
- **Deploy**: Transfer binaries to the Raspberry Pi and configure runtime libraries
- **Optimize**: Apply build and runtime performance tuning
- **Result**: Efficient on-device Llama inference

## Prerequisites and Hardware Requirements

### Host Machine Requirements

**Operating System**: Linux x86_64 (Ubuntu 20.04+ or CentOS Stream 9+)

**Software Dependencies**:

- **Python 3.10-3.12** (ExecuTorch requirement)
- **conda** or **venv** for environment management
- **CMake 3.29.6+** for cross-compilation
- **Git** for repository cloning

### Target Device Requirements

**Supported Devices**: **Raspberry Pi 4** and **Raspberry Pi 5** with a **64-bit OS**

**Memory and Storage Requirements** (see the quick on-device check below):

- **Minimum 4GB RAM** (8GB recommended for larger models)
- **8GB+ storage** for model files and binaries
- **64-bit Raspberry Pi OS** (Bullseye or newer)
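
You can confirm these requirements directly on the Raspberry Pi with standard tools (a quick sketch; exact outputs vary across OS releases):

```bash
# Architecture check - should print: aarch64 (64-bit OS)
uname -m

# Available RAM and free storage
free -h
df -h ~

# OS release - should be Bullseye or newer
grep PRETTY_NAME /etc/os-release
```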

### Verification Commands

Verify your host machine compatibility:

```bash
# Check OS and architecture
uname -s           # Should output: Linux
uname -m           # Should output: x86_64

# Check Python version
python3 --version  # Should be 3.10-3.12

# Check required tools
which cmake git md5sum
cmake --version    # Should be 3.29.6 or newer
```

## Development Environment Setup

### Clone ExecuTorch Repository

First, clone the ExecuTorch repository with Raspberry Pi support:

```bash
# Create project directory
mkdir ~/executorch-rpi && cd ~/executorch-rpi

# Clone ExecuTorch repository
git clone -b release/1.0 https://github.com/pytorch/executorch.git
cd executorch
```

### Create Conda Environment

```bash
# Create conda environment
conda create -yn executorch python=3.10.0
conda activate executorch

# Upgrade pip
pip install --upgrade pip
```

### Alternative: Virtual Environment

If you prefer Python's built-in virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
```

Refer to {doc}`getting-started` for more details.
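
Depending on your checkout, you may also need to install ExecuTorch's Python components into the fresh environment before exporting models. The repository ships an install script for this; a minimal sketch, run from the `executorch` repository root:

```bash
# Install ExecuTorch Python packages and their dependencies
# (re-run after switching branches or pulling new commits)
./install_executorch.sh
```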

## Cross-Compilation Toolchain Setup

Run the automated cross-compilation script on your Linux host machine. It installs the ARM toolchain and builds the Llama runner for the target device:

```bash
# Run the Raspberry Pi setup script for Pi 5
./examples/raspberry_pi/setup.sh pi5
```

Expected output (abridged):

```
[100%] Linking CXX executable llama_main
[100%] Built target llama_main
[SUCCESS] LLaMA runner built successfully

==== Verifying Build Outputs ====
[SUCCESS] ✓ llama_main (6.1M)
[SUCCESS] ✓ libllama_runner.so (4.0M)
[SUCCESS] ✓ libextension_module.a (89K) - static library

✓ ExecuTorch cross-compilation setup completed successfully!
```

## Model Preparation and Export

### Download Llama Models

Download the Llama model from Hugging Face or another source, and make sure the following files are present:

- `consolidated.00.pth` (model weights)
- `params.json` (model config)
- `tokenizer.model` (tokenizer)

### Export Llama to ExecuTorch Format

After downloading the Llama model, export it to the ExecuTorch `.pte` format using the provided export script:

```bash
# Set these paths to point to the downloaded files. The following is an
# example invocation that exports a Llama 3.2 model with the XNNPACK
# SpinQuant configuration.
LLAMA_QUANTIZED_CHECKPOINT=path/to/consolidated.00.pth
LLAMA_PARAMS=path/to/params.json

python -m extension.llm.export.export_llm \
    --config examples/models/llama/config/llama_xnnpack_spinquant.yaml \
    +base.model_class="llama3_2" \
    +base.checkpoint="${LLAMA_QUANTIZED_CHECKPOINT:?}" \
    +base.params="${LLAMA_PARAMS:?}"
```

The file `llama3_2.pte` will be generated in the directory where you run the command.
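
Before moving on, it is worth sanity-checking the exported artifact; a small sketch using the tools from the prerequisites (this assumes `tokenizer.model` is also in the current directory, as the transfer step below does):

```bash
# Confirm the exported model exists and note its size
ls -lh llama3_2.pte

# Record checksums to verify the transfer later
md5sum llama3_2.pte tokenizer.model
```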

## Raspberry Pi Deployment

### Transfer Binaries to Raspberry Pi

After successful cross-compilation, transfer the required files:

```bash
# Set Raspberry Pi details
export RPI_UN="pi"                  # Your Raspberry Pi username
export RPI_IP="your-rpi-ip-address"

# Create deployment directory on the Raspberry Pi
ssh $RPI_UN@$RPI_IP 'mkdir -p ~/executorch-deployment'

# Copy main executable
scp cmake-out/examples/models/llama/llama_main $RPI_UN@$RPI_IP:~/executorch-deployment/

# Copy runtime library
scp cmake-out/examples/models/llama/runner/libllama_runner.so $RPI_UN@$RPI_IP:~/executorch-deployment/

# Copy model file and tokenizer
scp llama3_2.pte $RPI_UN@$RPI_IP:~/executorch-deployment/
scp ./tokenizer.model $RPI_UN@$RPI_IP:~/executorch-deployment/
```
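
To rule out a truncated or failed copy, compare the checksums recorded earlier against the copies on the device (a quick sketch using the same `md5sum` tool):

```bash
# Checksums on the Pi should match the ones recorded on the host
ssh $RPI_UN@$RPI_IP 'cd ~/executorch-deployment && md5sum llama3_2.pte tokenizer.model'
```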

### Configure Runtime Libraries on Raspberry Pi

SSH into your Raspberry Pi and configure the runtime:

#### Set up library environment

```bash
cd ~/executorch-deployment
echo 'export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH' > setup_env.sh
chmod +x setup_env.sh

# Make the binary executable
chmod +x llama_main
```

## Dry Run

```bash
source setup_env.sh
./llama_main --help
```

Make sure the output does not contain any GLIBC or other library-mismatch errors. If it does, follow the troubleshooting steps below.
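
You can also check for unresolved libraries and GLIBC requirements explicitly with standard tools (a quick sketch, run from the deployment directory):

```bash
# Flag any shared libraries that fail to resolve
ldd ./llama_main | grep "not found" || echo "all shared libraries resolved"

# Show the highest GLIBC version the binary requires
strings ./llama_main | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -n 1
```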

## Troubleshooting

### Issue 1: GLIBC Version Mismatch

**Problem:** The binary was compiled with a newer GLIBC version (2.38) than what's available on your Raspberry Pi (2.36).

**Error Symptoms:**

```bash
./llama_main: /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by ./llama_main)
./llama_main: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by /lib/libllama_runner.so)
```

#### Solution A: Upgrade GLIBC on Raspberry Pi (Recommended)

1. **Check your current GLIBC version:**

```bash
ldd --version
# Output: ldd (Debian GLIBC 2.36-9+rpt2+deb12u12) 2.36
```

2. **Upgrade to a newer GLIBC:**

```bash
# Add Debian unstable repository
echo "deb http://deb.debian.org/debian sid main contrib non-free" | sudo tee -a /etc/apt/sources.list

# Update package lists
sudo apt update

# Install newer GLIBC packages
sudo apt-get -t sid install libc6 libstdc++6

# Reboot system
sudo reboot
```

3. **Test the fix:**

```bash
cd ~/executorch-deployment
source setup_env.sh
./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --seq_len 128 --prompt "Hello"
```

**Important Notes:**

- Select "Yes" when prompted to restart services
- Press Enter to keep the current version of configuration files
- Back up important data before upgrading

#### Solution B: Rebuild with Raspberry Pi's GLIBC (Advanced)

If you prefer not to upgrade your Raspberry Pi system:

1. **Copy the Pi's filesystem to the host machine:**

```bash
# On the Raspberry Pi - install rsync
ssh pi@<your-rpi-ip>
sudo apt update && sudo apt install rsync
exit

# On the host machine - copy the Pi's filesystem
mkdir -p ~/rpi5-sysroot
rsync -aAXv --exclude={"/proc","/sys","/dev","/run","/tmp","/mnt","/media","/lost+found"} \
    pi@<your-rpi-ip>:/ ~/rpi5-sysroot
```

2. **Update the CMake toolchain file** (`arm-toolchain-pi5.cmake`), replacing the existing sysroot line:

```cmake
# Replace this line:
# set(CMAKE_SYSROOT "${TOOLCHAIN_PATH}/aarch64-none-linux-gnu/libc")

# With this, pointing at the copied sysroot:
set(CMAKE_SYSROOT "/home/yourusername/rpi5-sysroot")
set(CMAKE_FIND_ROOT_PATH "${CMAKE_SYSROOT}")
```

3. **Rebuild the binaries:**

```bash
# Clean and rebuild
rm -rf cmake-out
./examples/raspberry_pi/setup.sh pi5 --force-rebuild

# Verify the required GLIBC version
strings ./cmake-out/examples/models/llama/llama_main | grep GLIBC_
# Should show at most GLIBC_2.36 (matching your Pi)
```

---

### Issue 2: Library Not Found

**Problem:** Required libraries are not found at runtime.

**Error Symptoms:**

```bash
./llama_main: error while loading shared libraries: libllama_runner.so: cannot open shared object file
```

**Solution:**

```bash
# Ensure you're in the correct directory and the environment is set
cd ~/executorch-deployment
source setup_env.sh
./llama_main --help
```

**Root Cause:** Either `LD_LIBRARY_PATH` is not set or you're not in the deployment directory.
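
If you would rather not source `setup_env.sh` in every new SSH session, one option is to pin the library path in your shell profile instead (a sketch; note that `setup_env.sh` resolves `$(pwd)` at source time, so a fixed path is safer here):

```bash
# Make the deployment directory's libraries visible in all new shells
echo 'export LD_LIBRARY_PATH=$HOME/executorch-deployment:$LD_LIBRARY_PATH' >> ~/.bashrc
```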

---

### Issue 3: Tokenizer JSON Parsing Warnings

**Problem:** Warning messages about JSON parsing errors when running the `llama_main` binary.

**Error Symptoms:**

```bash
E tokenizers:hf_tokenizer.cpp:60] Error parsing json file: [json.exception.parse_error.101]
```

**Solution:** These warnings can be safely ignored. They don't affect model inference.

---

## Quick Test Command

After resolving issues, test with:

```bash
cd ~/executorch-deployment
source setup_env.sh
./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --seq_len 128 --prompt "What is the meaning of life?"
```

## Debugging Tools

Enable ExecuTorch logging:

```bash
# Set log level for debugging
export ET_LOG_LEVEL=Info
./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --verbose
```
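
For a first-order latency number on-device (relevant to the tuning goals in the TLDR), the standard `time` utility is enough; a short prompt and a small `--seq_len` keep the run quick (a sketch reusing the flags from the commands above):

```bash
# Wall-clock timing of a short generation run
time ./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model \
    --seq_len 64 --prompt "Hello"
```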

## Final Run Command

```bash
cd ~/executorch-deployment
source setup_env.sh
./llama_main --model_path ./llama3_2.pte --tokenizer_path ./tokenizer.model --seq_len 128 --prompt "What is the meaning of life?"
```

Happy Inferencing!