Device Assignment Problem for model.eval() in TGN #7008
-
I augmented this TGN example to work on my own dataset, which is composed of 36 CSVs of 37.3 MB each. The original TGN example runs on CUDA without issues, and my augmented network runs fine on CPU. I would keep it on CPU, but training and testing for just 10 epochs would take over 18 hours, so I need to use the NVIDIA RTX A6000 GPUs (48 GB) available through my university.

Unfortunately, when I switch the device to CUDA, I get the error below. I have searched the internet high and low for how to place all components on the CUDA device and have tried many fixes, to no avail. I now have many redundant device assignments in my code, but I still cannot get the devices to match. Could the error be in how I created the datasets? (See the "2. Temporal Dataset creation:" code block.) Since I have separate datasets for training, testing, and validation, I also moved the code that calls [...]

I am happy to provide any further information. This is my first time posting a help discussion, so any feedback on how clearly I have asked the question is also welcome. Thank you in advance!
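For context, the sketch below shows the general shape of what I am trying to do: build a TemporalData from one CSV and place it, together with the TGN memory, on the GPU. It is based on the bundled tgn.py example rather than my actual code; the file path, column names, and dimensions are placeholders.

```python
# Simplified sketch (placeholders, not my real code): load one CSV into a
# TemporalData and move the data plus the TGN memory onto the same CUDA device.
import pandas as pd
import torch
from torch_geometric.data import TemporalData
from torch_geometric.nn import TGNMemory
from torch_geometric.nn.models.tgn import IdentityMessage, LastAggregator

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

df = pd.read_csv('events.csv')  # placeholder path
data = TemporalData(
    src=torch.tensor(df['src'].values, dtype=torch.long),
    dst=torch.tensor(df['dst'].values, dtype=torch.long),
    t=torch.tensor(df['t'].values, dtype=torch.long),
    msg=torch.tensor(df[['f0', 'f1']].values, dtype=torch.float),  # edge features
).to(device)  # event tensors must end up on the same device as the model

memory_dim = time_dim = 100
memory = TGNMemory(
    data.num_nodes,
    data.msg.size(-1),
    memory_dim,
    time_dim,
    message_module=IdentityMessage(data.msg.size(-1), memory_dim, time_dim),
    aggregator_module=LastAggregator(),
).to(device)  # .to(device) after construction so the internal buffers move as well
```

The embedding GNN and link predictor from the example are moved with `.to(device)` in the same way.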
Replies: 1 comment 1 reply
-
I pushed a fix here: #7028

With this, you should be able to run:

```python
memory = TGNMemory(...).to(device)
memory.reset_state()
memory.eval()
```
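For reference, a minimal, self-contained version of that pattern (the node count and dimensions here are arbitrary, and it assumes a PyG build that includes the linked fix):

```python
# Minimal sketch: a TGNMemory placed on the GPU, then reset and switched to
# eval mode -- the sequence that previously raised the device-mismatch error.
import torch
from torch_geometric.nn import TGNMemory
from torch_geometric.nn.models.tgn import IdentityMessage, LastAggregator

device = torch.device('cuda')
num_nodes, raw_msg_dim, memory_dim, time_dim = 1000, 16, 32, 32

memory = TGNMemory(
    num_nodes,
    raw_msg_dim,
    memory_dim,
    time_dim,
    message_module=IdentityMessage(raw_msg_dim, memory_dim, time_dim),
    aggregator_module=LastAggregator(),
).to(device)

memory.reset_state()  # clear the memory and message store
memory.eval()         # switch to evaluation mode before running the test split
```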