Troubleshooting


Alexander Borzunov edited this page Aug 6, 2023 · 3 revisions

This page lists common errors and ways to address them.

I get this error on WSL: `hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others`. What should I do?

Petals needs clocks on all nodes to be synchronized. Please set the date using an NTP server:
```
sudo apt install ntpdate
sudo ntpdate pool.ntp.org
```
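To see whether the clock is still off after syncing, you can compare the local time with a reference timestamp. The snippet below is a minimal sketch, not part of Petals: `clock_skew_ok` is a hypothetical helper, the 3-second default is taken from the error message above, and the commented usage assumes GNU `date`, `curl`, and a reachable HTTP server whose `Date` header is accurate to about a second.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if two Unix timestamps differ by at most
# TOLERANCE seconds (default 3, the limit quoted in the error message).
clock_skew_ok() {
    local local_ts=$1 remote_ts=$2 tolerance=${3:-3}
    local diff=$(( local_ts - remote_ts ))
    # Take the absolute value of the difference.
    (( diff < 0 )) && diff=$(( -diff ))
    (( diff <= tolerance ))
}

# Example usage (assumption: the reference time comes from an HTTP Date header):
# remote=$(date -d "$(curl -sI https://example.com | sed -n 's/^[Dd]ate: //p' | tr -d '\r')" +%s)
# clock_skew_ok "$(date +%s)" "$remote" && echo "clock OK" || echo "clock skewed"
```

An HTTP header only has one-second resolution and includes network latency, so treat a borderline result as a hint to rerun `ntpdate` rather than as a precise measurement.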
The server starts loading blocks and then prints: Killed. What should I do?

This happens because Windows doesn't allocate much RAM to WSL by default, so the server gets killed by the out-of-memory (OOM) killer.

To increase the memory limit, go to C:/Users/username (your Windows user folder) and create a .wslconfig file with the following contents:
```
[wsl2]
memory=12GB
```
Then restart WSL (run sudo reboot in the WSL console, or wsl --shutdown from a Windows terminal) so that the new limit takes effect, and it should work fine.
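After the restart, you can check how much RAM WSL actually sees. This is a minimal sketch, not part of Petals: `total_ram_gib` is a hypothetical helper that reads the `MemTotal` line of `/proc/meminfo` (reported in kB) and prints it in GiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print total RAM in GiB from a /proc/meminfo-style file.
total_ram_gib() {
    local meminfo=${1:-/proc/meminfo}
    # MemTotal is reported in kB; divide twice by 1024 to get GiB.
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "$meminfo"
}

# Print the value for the current system when /proc/meminfo is available.
[ -r /proc/meminfo ] && total_ram_gib
```

The reported value is usually slightly below the configured limit because the kernel reserves some memory for itself. If it is far below what you set in .wslconfig, the file was probably not picked up.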
I get this error: torch.cuda.OutOfMemoryError: CUDA out of memory. What should I do?

If you use an Anaconda env, run this before starting the server:
```
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```
If you use Docker, add this argument after --rm in the Docker command:
```
-e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"
```

This project is a part of the BigScience research workshop.