Replies: 1 comment 1 reply
-
Hi @tristansdj - you should try using the KeOps integration instead: https://docs.gpytorch.ai/en/latest/examples/02_Scalable_Exact_GPs/KeOps_GP_Regression.html. It is able to precompute caches and do training on a single GPU, often much more efficiently than our multi-GPU implementation.
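For reference, a minimal sketch of the KeOps setup from that tutorial (placeholder data and kernel choice; it assumes the `pykeops` package and a CUDA GPU are available):

```python
import torch
import gpytorch
from gpytorch.kernels.keops import RBFKernel  # KeOps-backed kernel (requires pykeops)

# Placeholder data; KeOps handles hundreds of thousands of points on one GPU.
train_x = torch.randn(100_000, 6).cuda()
train_y = torch.randn(100_000).cuda()

class KeOpsGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # The KeOps kernel never materializes the full covariance matrix,
        # which is what makes single-GPU exact GPs at this scale feasible.
        self.covar_module = gpytorch.kernels.ScaleKernel(RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().cuda()
model = KeOpsGPModel(train_x, train_y, likelihood).cuda()
# Training and prediction then work exactly as with an ordinary ExactGP model.
```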
-
Hello GPyTorchers!
I have implemented a script that trains scalable GP models as shown on this webpage.
I am training my model on a single GPU with 200,000 data points. This takes about 20 minutes, which is fine since the model is then saved to a '.pth' file for later reuse.
However, after loading the trained parameters into a new instance of the model, it takes a long time for the cache to be computed on a single GPU before predictions become almost instantaneous. In the caption of Table 1 of this paper, the authors mention that they compute the one-time pre-computed cache using 8 GPUs, but I only have a single GPU available on my computer and my application of GPs is rather time-critical. To be more specific, I am building several GP surrogates that I want to load before running a script that evaluates those surrogates repeatedly, and I would like this loading process (which includes the cache pre-computation) to run as fast as possible.
For example, right now, for a model built with 200,000 data points, it takes 900 s to pre-compute the cache on my single GPU. If I have, say, 5 such models to load, that means more than an hour just to pre-compute the caches. Is there a way to accelerate this cache pre-computation on a single GPU?
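To make the workflow concrete, here is a stripped-down sketch of what I mean (the model class, kernel, data sizes, and file name below are placeholders rather than my actual surrogate); the first prediction in eval mode after loading is the step that triggers the expensive one-time cache computation:

```python
import torch
import gpytorch

# Placeholder data; my real surrogates use ~200,000 training points.
train_x = torch.randn(10_000, 6).cuda()
train_y = torch.randn(10_000).cuda()

# Placeholder model standing in for my actual surrogate.
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().cuda()
model = ExactGPModel(train_x, train_y, likelihood).cuda()

# ... training loop omitted; afterwards only the hyperparameters are saved ...
torch.save(model.state_dict(), 'surrogate.pth')

# Later, in the time-critical script: rebuild the model and load the weights.
model.load_state_dict(torch.load('surrogate.pth'))
model.eval()
likelihood.eval()

# The FIRST prediction after loading triggers the one-time cache
# pre-computation (the ~900 s step for my 200,000-point models);
# subsequent predictions are then almost instantaneous.
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    _ = likelihood(model(train_x[:10]))
```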
Two related questions:
Thanks a lot for your insights!