Description
Describe the bug
With any model and any number of nodes, EXO enters its "LOADING" step, then switches to "WARMING UP". On Ubuntu 25, which ships with and runs Python 3.13 natively, the Python process crashes during the "WARMING UP" stage and loses all RAM assigned to it. The Python process then becomes a zombie.
To Reproduce
Steps to reproduce the behavior:
- Launch EXO on Ubuntu 25 (Python 3.13) and load a model.
- Wait for the WARMING UP stage.
- A FAILED message should appear after the Python process has crashed, possibly along with a "Python 3.13 has quit unexpectedly" pop-up.
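To confirm which interpreter a node is actually running before reproducing, a small script like the following can be run on each machine. This is only a sketch for collecting report details; the function name `describe_environment` is hypothetical and is not part of EXO.

```python
import platform
import sys


def describe_environment() -> dict:
    """Collect interpreter and OS details worth pasting into a bug report."""
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    # Print one "key: value" line per field so the output can be pasted as-is.
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that launches EXO, since a machine may have more than one Python installed.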
Expected behavior
EXO should correctly load the model as it does with Python 3.11 on all connected nodes.
Actual behavior
EXO starts and begins to load the model (e.g. Llama3.1:8B) on one or more nodes. The model loads into RAM successfully on each node. About 5 seconds later, the Python process held by EXO quits, apparently because an incompatible memory manager causes a crash at the C-extension level.
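One way to gather evidence for a C-extension-level crash is Python's standard `faulthandler` module, which dumps the Python traceback of every thread when the process receives a fatal signal such as SIGSEGV. The sketch below assumes the call can be placed early in the process that crashes; where that would be in EXO is not known from this report, and the helper name `enable_crash_tracebacks` and the log path are hypothetical.

```python
import faulthandler
from typing import TextIO


def enable_crash_tracebacks(log_path: str) -> TextIO:
    """Write Python tracebacks to log_path when a fatal signal is received.

    The returned file object must stay open for as long as faulthandler
    is enabled, so the caller should keep a reference to it.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Without touching any code, the same handler can be switched on by setting the `PYTHONFAULTHANDLER=1` environment variable or by starting the interpreter with `python -X faulthandler`, in which case the traceback goes to stderr.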
Environment
- macOS Version: N/A
- EXO Version: Latest (Official)
- Hardware:
  - Device 1: Dell PowerEdge (64GB) x2 Intel Xeon
  - Device 2: Dell PowerEdge (128GB) x2 AMD
  - Additional devices:
- Interconnection:
  - 1GbE Ethernet between all devices.
Additional context
All packages are up to date, many models were attempted, and Python was reinstalled. I understand this is likely a known issue, but I would like to share my experience to help improve the software.