Bug: Step 3.5 Flash higher VRAM usage and loading issues after #1307 #1324
What happened?
After #1307, loading Step 3.5 Flash with `-sm graph` takes more VRAM than it did before #1309, and the per-GPU distribution is different, so loading fails unless I reduce the number of GPU layers.
https://streamable.com/6tsws9
https://streamable.com/1kbey0
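To compare the per-GPU allocation shown in the videos above, one simple way to snapshot memory use while the model loads is to poll `nvidia-smi` from a second terminal (this command is an illustration, not part of the original report; it assumes `nvidia-smi` is on PATH):

```shell
# Poll per-GPU memory once a second while llama-server loads the model.
# Run this in a separate terminal; assumes nvidia-smi is available.
watch -n 1 'nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader'
```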
My build script:

```shell
#!/bin/bash
set -e  # Exit immediately if a command exits with a non-zero status
cd /home/quair/Downloads/ik_llama.cpp/
rm -rf build
cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=native
cmake --build build --config Release -j$(nproc)
```

My load script:
```shell
#!/bin/bash
llama-server \
  --host 0.0.0.0 \
  --port 8085 \
  --model "/mnt/Speed/AI/Models/AesSedai/Step-3.5-Flash-GGUF/Step-3.5-Flash-IQ4_XS-00001-of-00003.gguf" \
  -a Step3.5 \
  -b 8192 \
  -ub 8192 \
  --threads 16 \
  --ctx-size 36864 \
  --n-gpu-layers 999 \
  -ot "(1[4-9]|[0-9][0-9])\..*_exps.*=CPU" \
  --no-mmap \
  -fa on \
  -sm graph \
  -ts 180,191 \
  -np 1 \
  -smgs \
  -cram 0 \
  -cuda fusion=1
```

Name and Version

version: 4225 (7065488)
built with cc (GCC) 15.2.1 20260209 for x86_64-pc-linux-gnu
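For reference, the `-ot` pattern in the load script above is meant to keep expert tensors of higher layers on the CPU: `1[4-9]` matches layer indices 14-19 and `[0-9][0-9]` matches any two-digit index. A quick self-contained check of the regex against illustrative tensor names (the names here are examples, not read from the actual GGUF):

```shell
#!/bin/bash
# Probe which illustrative tensor names the -ot regex would pin to the CPU.
pattern='(1[4-9]|[0-9][0-9])\..*_exps.*'
for name in blk.5.ffn_gate_exps.weight blk.14.ffn_gate_exps.weight blk.42.ffn_up_exps.weight; do
  if echo "$name" | grep -Eq "$pattern"; then
    echo "$name -> CPU"
  else
    echo "$name -> stays on GPU"
  fi
done
```

Here `blk.5.*` stays on the GPU while `blk.14.*` and `blk.42.*` match and go to the CPU.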
What operating system are you seeing the problem on?
Linux
Relevant log output