Description
OS
Windows
GPU Library
CUDA 12.x
Python version
3.12
Pytorch version
2.7.0
Model
Llama-3.1-8B-Instruct-exl2 (turboderp), but the bug occurs with every model I've tested
Describe the bug
This is almost certainly something on my end, but I can't work out what; other CUDA applications (e.g. the llama.cpp server) still work fine.
The process exits without any error during model load. I first noticed it with TabbyAPI, but it also occurs with minimal_chat.py from the examples directory.
Running under a debugger, it gets as far as line 583 of model.py (module.load()) and then exits; no error is reported.
The module is ExLlamaV2Embedding, and within that it reaches line 49 of embedding.py (w = self.load_weight()), then line 143 of module.py (tensors = self.load_multi(key, ["weight"], cpu = cpu)).
The exit appears to happen at line 96 of module.py (stfile.get_tensor()).
I also noticed that the "Building C++/CUDA extension" progress indicator never goes above 0%, although loading continues past it. I read about an issue with setuptools, so I downgraded to 75.8.2 in case that was the cause, but it made no difference.
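Since the process dies with no Python traceback, one stdlib-only diagnostic (a hedged suggestion on my part, nothing exllamav2-specific) is to enable faulthandler before the loader runs; if a native extension hard-crashes the interpreter, it dumps the Python stack to stderr before the process disappears:

```python
# Enable faulthandler before importing/loading exllamav2, so that a hard
# native crash (e.g. inside the C++/CUDA extension) still prints the
# Python stack to stderr instead of the process silently vanishing.
import faulthandler
import sys

faulthandler.enable(file=sys.stderr, all_threads=True)
print("faulthandler enabled:", faulthandler.is_enabled())

# Then run the loader as usual; equivalently, without editing the script:
#   python -X faulthandler examples/minimal_chat.py
```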
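To rule out a damaged download before blaming stfile.get_tensor(), the .safetensors shards can be validated with the standard library alone. This is a sketch based on the published safetensors file layout (8-byte little-endian header length, JSON header with per-tensor byte offsets, then raw data); check_safetensors and its output strings are my own names, not part of any library:

```python
# Stdlib-only sanity check of a .safetensors shard: verify that every
# tensor's declared byte range actually fits inside the file, which
# catches truncated or corrupt downloads.
import json
import os
import struct

def check_safetensors(path: str) -> bool:
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    data_start = 8 + header_len
    ok = True
    for name, meta in header.items():
        if name == "__metadata__":
            continue
        begin, end = meta["data_offsets"]
        if data_start + end > size:
            print(f"TRUNCATED: {name} needs byte {data_start + end}, file has {size}")
            ok = False
    if ok:
        print(f"OK: {sum(1 for k in header if k != '__metadata__')} tensors, {size} bytes")
    return ok
```

Running this over each shard in the model directory should print OK for every file; a truncated shard could plausibly cause a hard exit deep in the native reader.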
Reproduction steps
Windows 10, NVIDIA A6000. I was on CUDA 12.4 and upgraded to 12.8, which is when the issue appeared (I also upgraded exllamav2 from 0.2.8 to 0.2.9 at the same time, but downgrading it again doesn't help). I've since upgraded to CUDA 12.9.
minimal_chat.py is sufficient to reproduce the behaviour (loading the turboderp quant of Llama-3.1-8B-Instruct-exl2).
Expected behavior
The model loads, or at least an error message is printed
Logs
Package Version
------------------------- -----------
aiofiles 24.1.0
aiohappyeyeballs 2.6.1
aiohttp 3.11.18
aiosignal 1.3.2
annotated-types 0.7.0
anyio 4.9.0
async-lru 2.0.5
attrs 25.3.0
certifi 2025.4.26
charset-normalizer 3.4.2
click 8.1.8
colorama 0.4.6
cramjam 2.10.0
einops 0.8.1
exllamav2 0.2.9
fastapi-slim 0.115.12
fastparquet 2024.11.0
filelock 3.18.0
flash_attn 2.7.4.post1
formatron 0.4.11
frozendict 2.4.6
frozenlist 1.6.0
fsspec 2025.3.2
general_sam 1.0.2
h11 0.16.0
httptools 0.6.4
huggingface-hub 0.30.2
idna 3.10
Jinja2 3.1.6
jsonschema 4.23.0
jsonschema-specifications 2025.4.1
kbnf 0.4.1
loguru 0.7.3
markdown-it-py 3.0.0
MarkupSafe 3.0.2
mdurl 0.1.2
mpmath 1.3.0
multidict 6.4.3
networkx 3.4.2
ninja 1.11.1.4
numpy 2.2.5
packaging 25.0
pandas 2.2.3
pillow 11.2.1
pip 24.0
propcache 0.3.1
psutil 7.0.0
pydantic 2.11.4
pydantic_core 2.33.2
Pygments 2.19.1
python-dateutil 2.9.0.post0
pytz 2025.2
PyYAML 6.0.2
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rich 14.0.0
rpds-py 0.24.0
ruamel.yaml 0.18.10
ruamel.yaml.clib 0.2.12
safetensors 0.5.3
sentencepiece 0.2.0
setuptools 75.8.2
six 1.17.0
sniffio 1.3.1
sse-starlette 2.3.4
starlette 0.46.2
sympy 1.14.0
tabbyAPI 0.0.1
tokenizers 0.21.1
torch 2.7.0+cu128
tqdm 4.67.1
typing_extensions 4.13.2
typing-inspection 0.4.0
tzdata 2025.2
urllib3 2.4.0
uvicorn 0.34.2
websockets 15.0.1
win32_setctime 1.2.0
winloop 0.1.8
yarl 1.20.0
Additional context
No response
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.