Troubleshooting
This page lists common errors and ways to address them.
-
I get this error:
hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others
on WSL. What should I do?
Petals needs clocks on all nodes to be synchronized. Please set the date using an NTP server:
sudo apt install ntpdate
sudo ntpdate pool.ntp.org
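To see how far the local clock has drifted before or after syncing, a single SNTP query is enough. The sketch below is not part of Petals or hivemind; the function names are hypothetical, and it assumes that comparing against pool.ntp.org is a reasonable stand-in for the peer-to-peer comparison that produces the error. The 3-second threshold is taken from the error message.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800
# Threshold quoted in the ValidationError message.
TOLERANCE_SECONDS = 3.0


def parse_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp of an NTP reply as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP reply must be at least 48 bytes")
    # Bytes 40..47 hold the transmit timestamp: 32-bit seconds + 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server: str = "pool.ntp.org", timeout: float = 5.0) -> float:
    """Estimate (local clock - server clock) in seconds with one SNTP query."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version=3, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent_at = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
        received_at = time.time()
    # Compare the server time with the midpoint of the round trip.
    return (sent_at + received_at) / 2 - parse_transmit_time(reply)


def is_within_tolerance(offset: float, tolerance: float = TOLERANCE_SECONDS) -> bool:
    """True if the absolute offset does not exceed the tolerance."""
    return abs(offset) <= tolerance

# Example (needs network access):
#   offset = clock_offset()
#   print(f"offset: {offset:+.2f}s, ok: {is_within_tolerance(offset)}")
```

If the reported offset is close to or above 3 seconds, running the ntpdate command again should bring it back down.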
-
The server starts loading blocks and then prints:
Killed. What should I do?
This happens because Windows doesn't allocate much RAM to WSL by default, so the server gets OOM-killed.
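To check how much RAM WSL actually exposes to Linux, you can read MemTotal from /proc/meminfo. This is a minimal sketch with hypothetical helper names, not a Petals utility; it only assumes the standard Linux /proc/meminfo layout, where MemTotal is reported in kB.

```python
from pathlib import Path


def mem_total_gib(meminfo_text: str) -> float:
    """Extract MemTotal from /proc/meminfo contents and convert it to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # value is reported in kB (1024 bytes)
            return kib / 1024**2
    raise ValueError("MemTotal not found")


def read_mem_total_gib(path: str = "/proc/meminfo") -> float:
    """Read the total memory visible to this Linux (or WSL) instance."""
    return mem_total_gib(Path(path).read_text())

# Example (inside WSL):
#   print(f"WSL sees {read_mem_total_gib():.1f} GiB of RAM")
```

Running it before and after changing the memory limit shows whether the new setting took effect.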
To increase the memory limit, go to C:/Users/username and create the .wslconfig file with these contents:
[wsl2]
memory=12GB
Then reboot WSL (run sudo reboot in the WSL console) and it should work fine.
-
I get this error:
torch.cuda.OutOfMemoryError: CUDA out of memory.
What should I do?
If you use an Anaconda env, run this before starting the server:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
If you use Docker, add this argument after --rm in the Docker command:
-e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"
-
The WSL clock tends to drift out of sync, which prevents the Petals server from launching and produces the error
hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others
To sync the WSL clock, run
sudo ntpdate pool.ntp.org
More fixes are discussed on Stack Overflow.
If your error is not covered here, let us know in Discord and we will help!