How to define customized keops.kernels to enable GPU computation? #2408

cookbook-ms · 2023-09-18T21:54:41Z

Below is how I define my custom kernel. It now works on the GPU,
but it runs out of memory when the Laplacians are large (sizes around 15k).

import numpy as np
from scipy import sparse
from scipy.sparse import linalg
import scipy
import torch 
import math 
from gpytorch.kernels import Kernel, ScaleKernel 
from gpytorch.constraints import Positive, Interval

class Kernel(Kernel):
    def __init__(self, laplacians, kappa_bounds=(1e-5,1e5)): 
        super().__init__()
        self.L1, self.L1_down, self.L1_up = laplacians
        
        # register the raw parameters
        self.register_parameter(
            name='raw_kappa_down', parameter=torch.nn.Parameter(torch.zeros(1,1))
        )
        self.register_parameter(
            name='raw_kappa_up', parameter=torch.nn.Parameter(torch.zeros(1,1))
        )
        # set the kappa constraints
        self.register_constraint(
            'raw_kappa_down', Interval(*kappa_bounds)
        )
        self.register_constraint(
            'raw_kappa_up', Interval(*kappa_bounds)
        )
        # we do not set the prior on the parameters 

    # set up the actual parameters 
    @property
    def kappa_down(self):
        return self.raw_kappa_down_constraint.transform(self.raw_kappa_down)

    @kappa_down.setter
    def kappa_down(self, value):
        self._set_kappa_down(value)

    def _set_kappa_down(self, value):
        if not torch.is_tensor(value):
            value = torch.as_tensor(value).to(self.raw_kappa_down)
        self.initialize(raw_kappa_down=self.raw_kappa_down_constraint.inverse_transform(value))

    @property
    def kappa_up(self):
        return self.raw_kappa_up_constraint.transform(self.raw_kappa_up)
    
    @kappa_up.setter
    def kappa_up(self, value):
        self._set_kappa_up(value)

    def _set_kappa_up(self, value):
        if not torch.is_tensor(value):
            value = torch.as_tensor(value).to(self.raw_kappa_up)
        self.initialize(raw_kappa_up=self.raw_kappa_up_constraint.inverse_transform(value))
 
    def _eval_covar_matrix(self):
        """Evaluate the full covariance (kernel) matrix. It is exposed through the covar_matrix property below and is re-evaluated on every access."""
        K1 = torch.linalg.matrix_exp(- (self.kappa_down*self.L1_down + self.kappa_up*self.L1_up))
        return K1
    
    @property
    def covar_matrix(self):
        return self._eval_covar_matrix()
        
    # define the kernel function
    def forward(self, x1, x2=None, **params):
        # fall back to the symmetric case before x2 is used
        if x2 is None:
            x2 = x1
        # inputs hold indices with shape (n, 1); convert them to 1-D integer indices
        x1 = x1.long().squeeze(-1)
        x2 = x2.long().squeeze(-1)
        # select the rows for x1, then the columns for x2
        return self.covar_matrix[x1, :][:, x2]

The out-of-memory error I get is:

File ~/miniconda3/lib/python3.11/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py:397, in LazyEvaluatedKernelTensor.representation(self)
393 return super().representation()
394 # Otherwise, we'll evaluate the kernel (or at least its LinearOperator representation) and use its
395 # representation
396 else:
--> 397 return self.evaluate_kernel().representation()

File ~/miniconda3/lib/python3.11/site-packages/gpytorch/utils/memoize.py:59, in _cached.<locals>.g(self, *args, **kwargs)
57 kwargs_pkl = pickle.dumps(kwargs)
58 if not _is_in_cache(self, cache_name, *args, kwargs_pkl=kwargs_pkl):
---> 59 return _add_to_cache(self, cache_name, method(self, *args, **kwargs), *args, kwargs_pkl=kwargs_pkl)
60 return _get_from_cache(self, cache_name, *args, kwargs_pkl=kwargs_pkl)

File ~/miniconda3/lib/python3.11/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py:25, in recall_grad_state.<locals>.wrapped(self, *args, **kwargs)
22 @functools.wraps(method)
...
180 # K2 = torch.linalg.matrix_exp(-self.kappa_up*self.L1_up)
181 # This is equivalent to K1+K2-h_0 * I (remove the repeated identity part)
182 return K1

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.35 GiB. GPU 0 has a total capacty of 9.77 GiB of which 833.75 MiB is free. Process 27433 has 348.00 MiB memory in use. Process 27952 has 1.25 GiB memory in use. Including non-PyTorch memory, this process has 6.99 GiB memory in use. Process 28537 has 352.00 MiB memory in use. Of the allocated memory 6.74 GiB is allocated by PyTorch, and 9.38 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
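
For a sense of scale, the footprint of a single dense n × n matrix can be estimated with simple arithmetic. The helper below is only an illustration (the name `dense_matrix_gib` is made up for this note, and the thread does not state which dtype is in use); it shows that one 15k × 15k matrix already takes a noticeable share of a roughly 10 GiB GPU, before counting the intermediate buffers that `torch.linalg.matrix_exp` and autograd may allocate.

```python
def dense_matrix_gib(n: int, bytes_per_element: int) -> float:
    """Memory needed to store one dense n x n matrix, in GiB."""
    return n * n * bytes_per_element / 2**30


# One 15k x 15k matrix in double and in single precision.
size_float64 = dense_matrix_gib(15_000, 8)
size_float32 = dense_matrix_gib(15_000, 4)
print(f"float64: {size_float64:.2f} GiB, float32: {size_float32:.2f} GiB")
```

Several such matrices are alive at once in the kernel above (the two Laplacians, their weighted sum, and the matrix exponential), which is consistent with the allocator running out of space.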


Balandat · 2023-09-19T13:42:14Z

Balandat
Sep 19, 2023
Maintainer

but it doesn't on GPU

Can you be more specific? Can you provide a stack trace of the error you're running into?

4 replies

cookbook-ms Sep 19, 2023
Author

Hi Balandat, I just solved the issue, so I will mark it as answered. It may be better to remove the discussion.

cookbook-ms Sep 20, 2023
Author

Hi Balandat, could you have a look at how I could write a keops version of the kernel above?

Or is there a workaround that avoids the memory issue described above?

gpleiss Sep 21, 2023
Maintainer

@cookbook-ms if you define your kernel with keops, you should avoid OOM errors. I am about to push a fix to #2363; once that is in, look at the RBF/Matern/Periodic examples that we have in GPyTorch to see how to define your kernel as a keops kernel.

Otherwise, there will be no way to avoid OOM errors without using a scalable method.

Answer selected by cookbook-ms

cookbook-ms Sep 21, 2023
Author

Thanks for your reply! I figured out a way to avoid the OOM error by slicing the eigenvector matrix from the left and from the right. This greatly reduces the dimensions of the matrix multiplication. Of course, it still depends on the training and test set sizes, but it now works on a GPU with 10 GB of memory.
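
The author's code for this fix is not shown in the thread, so the following is only a minimal sketch of what slicing the eigenvector matrix could look like. It assumes that the two Laplacians share an orthonormal eigenbasis U (which holds, for example, when `L1_down @ L1_up` is zero), so that the kernel exp(-(kappa_down * L_down + kappa_up * L_up)) equals U diag(exp(-(kappa_down * lam_down + kappa_up * lam_up))) U^T. The function name `sliced_kernel` and all variable names are hypothetical.

```python
import torch


def sliced_kernel(eigvecs, eigvals_down, eigvals_up, kappa_down, kappa_up, x1, x2):
    """Return K[x1, x2] for K = exp(-(kappa_down * L_down + kappa_up * L_up)).

    Assumption (not from the thread): L_down and L_up share the orthonormal
    eigenvector matrix `eigvecs`, with eigenvalues eigvals_down and eigvals_up.
    """
    spectrum = torch.exp(-(kappa_down * eigvals_down + kappa_up * eigvals_up))
    # Slice the rows of U first, so that only a (len(x1) x n) block and a
    # (len(x2) x n) block are multiplied instead of two n x n matrices.
    return (eigvecs[x1] * spectrum) @ eigvecs[x2].T


# Small synthetic check against the dense matrix exponential.
torch.manual_seed(0)
n = 40
U, _ = torch.linalg.qr(torch.randn(n, n, dtype=torch.float64))
lam_down = torch.rand(n, dtype=torch.float64)
lam_up = torch.rand(n, dtype=torch.float64)
L_down = U @ torch.diag(lam_down) @ U.T
L_up = U @ torch.diag(lam_up) @ U.T

x1 = torch.tensor([0, 3, 7])
x2 = torch.tensor([1, 2, 3, 4])
K_sliced = sliced_kernel(U, lam_down, lam_up, 0.5, 2.0, x1, x2)
K_dense = torch.linalg.matrix_exp(-(0.5 * L_down + 2.0 * L_up))[x1][:, x2]
max_abs_error = (K_sliced - K_dense).abs().max().item()
```

Under this assumption the eigendecomposition does not depend on the kappa parameters, so it can be computed once (possibly on the CPU) and reused during training. The n × n eigenvector matrix still has to be stored, but the product in each kernel evaluation scales with the number of training and test points rather than with n squared.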
