[BUG] Quits without obvious error during model load #787

@thigger

Description

OS

Windows

GPU Library

CUDA 12.x

Python version

3.12

Pytorch version

2.7.0

Model

Llama-3.1-8B-Instruct-exl2 (turboderp) - but occurs with all I've tested

Describe the bug

This is clearly caused by something I've changed, but I don't know what. Other CUDA apps (llama.cpp server) are still working fine.

The process exits without any error during model load. I first noticed this with TabbyAPI, but it also occurs with minimal_chat.py from the examples dir.
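A silent exit like this can sometimes be narrowed down by running the script as a child process with Python's built-in `faulthandler` enabled and then inspecting the exit code. The following is a minimal sketch under that assumption; `run_with_faulthandler` and `describe_exit` are hypothetical helper names, not part of exllamav2 or TabbyAPI.

```python
import subprocess
import sys


def run_with_faulthandler(args):
    """Run a child Python process with faulthandler enabled; return its exit code.

    The child inherits stdin/stdout/stderr, so an interactive script still works
    and any traceback dumped by faulthandler appears on the console.
    """
    cmd = [sys.executable, "-X", "faulthandler", *args]
    return subprocess.run(cmd).returncode


def describe_exit(code):
    """Turn a process exit code into a short human-readable description."""
    if code == 0:
        return "exit code 0 (clean exit)"
    if code < 0:
        # POSIX convention used by subprocess: negative value = killed by signal
        return f"terminated by signal {-code}"
    if code >= 0x80000000:
        # On Windows, very large codes are usually NTSTATUS values,
        # e.g. 0xC0000005 is an access violation.
        return f"exit code 0x{code:08X} (large values on Windows are usually NTSTATUS codes)"
    return f"exit code {code}"
```

For example, `print(describe_exit(run_with_faulthandler(["examples/minimal_chat.py"])))` would run the example script (the relative path is an assumption about the working directory) and report how the process ended. Whether faulthandler manages to print a traceback depends on how the native code fails.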

Running in a debugger, execution appears to reach line 583 of model.py (module.load()) and then the process exits with no error reported.
The module is ExLlamaV2Embedding, and within that the path seems to be line 49 of embedding.py (w = self.load_weight()),
then line 143 of module.py (tensors = self.load_multi(key, ["weight"], cpu=cpu)).

The exit seems to occur at line 96 of module.py (stfile.get_tensor())
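To check whether the model file itself can be read outside exllamav2, one option is to load every tensor with the upstream `safetensors` package. This is a sketch of a different technique from the one in the trace above: it uses the `safetensors` library's own reader, not exllamav2's loader, and it assumes the weights are in a standard .safetensors file. `read_all_tensors` is a hypothetical helper name.

```python
def read_all_tensors(path, framework="pt"):
    """Read every tensor in a .safetensors file, printing each key before loading it.

    If the process dies during a read, the last key printed identifies the tensor
    that was being loaded. Returns a list of (key, shape) pairs on success.
    """
    # Imported lazily so this module can be loaded even without safetensors installed.
    from safetensors import safe_open

    shapes = []
    with safe_open(path, framework=framework, device="cpu") as f:
        for key in f.keys():
            # flush=True so the message is visible even if the next line crashes
            print(f"reading {key} ...", flush=True)
            tensor = f.get_tensor(key)
            shapes.append((key, tuple(tensor.shape)))
    return shapes
```

Calling `read_all_tensors("path/to/model.safetensors")` (placeholder path) with the default `framework="pt"` requires torch. If this completes without a crash, the file is readable and the failure is more likely tied to the loader or the compiled extension than to the file.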

I noticed that during "Building C++/CUDA extension" the progress never goes above 0%, although the software continues to load. I read about an issue with setuptools, so I downgraded to 75.8.2 in case that was the cause, but it made no difference.

Reproduction steps

Windows 10, NVIDIA A6000. I was on CUDA 12.4 and upgraded to 12.8, which is when this issue developed. I also upgraded exllamav2 from 0.2.8 to 0.2.9, but downgrading doesn't fix this. I've since upgraded to CUDA 12.9.

minimal_chat.py is sufficient to exhibit this behaviour (loading the turboderp version of Llama-3.1-8B-Instruct-exl2)
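Since several components changed around the time the problem appeared, it may help to record exactly which builds are present in the environment. The sketch below uses only `importlib.metadata` plus an optional torch import; `installed_versions` and `torch_cuda_info` are hypothetical helper names. Note that `torch.version.cuda` reports the CUDA version the torch wheel was built for, which is not necessarily the version of the toolkit installed on the system.

```python
from importlib import metadata


def installed_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def torch_cuda_info():
    """Return torch's own view of its CUDA build, or None if torch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,       # CUDA version of the wheel
        "cuda_available": torch.cuda.is_available(),
    }


print(installed_versions(["exllamav2", "torch", "flash_attn", "safetensors", "setuptools"]))
print(torch_cuda_info())
```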

Expected behavior

Model loads, or at least an error message appears

Logs

Package                   Version
------------------------- -----------
aiofiles                  24.1.0
aiohappyeyeballs          2.6.1
aiohttp                   3.11.18
aiosignal                 1.3.2
annotated-types           0.7.0
anyio                     4.9.0
async-lru                 2.0.5
attrs                     25.3.0
certifi                   2025.4.26
charset-normalizer        3.4.2
click                     8.1.8
colorama                  0.4.6
cramjam                   2.10.0
einops                    0.8.1
exllamav2                 0.2.9
fastapi-slim              0.115.12
fastparquet               2024.11.0
filelock                  3.18.0
flash_attn                2.7.4.post1
formatron                 0.4.11
frozendict                2.4.6
frozenlist                1.6.0
fsspec                    2025.3.2
general_sam               1.0.2
h11                       0.16.0
httptools                 0.6.4
huggingface-hub           0.30.2
idna                      3.10
Jinja2                    3.1.6
jsonschema                4.23.0
jsonschema-specifications 2025.4.1
kbnf                      0.4.1
loguru                    0.7.3
markdown-it-py            3.0.0
MarkupSafe                3.0.2
mdurl                     0.1.2
mpmath                    1.3.0
multidict                 6.4.3
networkx                  3.4.2
ninja                     1.11.1.4
numpy                     2.2.5
packaging                 25.0
pandas                    2.2.3
pillow                    11.2.1
pip                       24.0
propcache                 0.3.1
psutil                    7.0.0
pydantic                  2.11.4
pydantic_core             2.33.2
Pygments                  2.19.1
python-dateutil           2.9.0.post0
pytz                      2025.2
PyYAML                    6.0.2
referencing               0.36.2
regex                     2024.11.6
requests                  2.32.3
rich                      14.0.0
rpds-py                   0.24.0
ruamel.yaml               0.18.10
ruamel.yaml.clib          0.2.12
safetensors               0.5.3
sentencepiece             0.2.0
setuptools                75.8.2
six                       1.17.0
sniffio                   1.3.1
sse-starlette             2.3.4
starlette                 0.46.2
sympy                     1.14.0
tabbyAPI                  0.0.1
tokenizers                0.21.1
torch                     2.7.0+cu128
tqdm                      4.67.1
typing_extensions         4.13.2
typing-inspection         0.4.0
tzdata                    2025.2
urllib3                   2.4.0
uvicorn                   0.34.2
websockets                15.0.1
win32_setctime            1.2.0
winloop                   0.1.8
yarl                      1.20.0

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
