Skip to content

Out of memory error using Matrix.get with many observations #465

@nikobenho

Description

@nikobenho

I ran into an out of memory error using the Matrix.get method in a case where I have a lot of zero-weighted observations that I would like to carry through history-matching as metadata. It looks like the error occurs when np.diag is called:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
Cell In[59], line 6
      3 print(len(obs_cov.x))
      5 # reduce cov down to only include non-zero obsverations
----> 6 obs_cov = obs_cov.get(row_names=pst.nnz_obs_names, col_names=pst.nnz_obs_names, )

File ~\Documents\Development\repos\GMDSI_notebooks\dependencies\pyemu\pyemu\mat\mat_handler.py:1686, in Matrix.get(self, row_names, col_names, drop)
   1684     return Cov(x=extract, names=names, isdiagonal=self.isdiagonal)
   1685 if self.isdiagonal:
-> 1686     extract = np.diag(self.__x[:, 0])
   1687 else:
   1688     extract = self.__x.copy()

File ~\AppData\Local\anaconda3\envs\gmdsitut\lib\site-packages\numpy\lib\twodim_base.py:293, in diag(v, k)
    291 if len(s) == 1:
    292     n = s[0]+abs(k)
--> 293     res = zeros((n, n), v.dtype)
    294     if k >= 0:
    295         i = k

MemoryError: Unable to allocate 110. GiB for an array with shape (121252, 121252) and data type float64

It seems modifying the method to use scipy.sparse.diags can offer a workaround:

    def spget(self, row_names=None, col_names=None, drop=False):

        if row_names is None and col_names is None:
            raise Exception(
                "Matrix.get(): must pass at least" + " row_names or col_names"
            )

        if row_names is not None and not isinstance(row_names, list):
            row_names = [row_names]
        if col_names is not None and not isinstance(col_names, list):
            col_names = [col_names]

        if isinstance(self, Cov) and (row_names is None or col_names is None):
            if row_names is not None:
                idxs = self.indices(row_names, axis=0)
                names = row_names
            else:
                idxs = self.indices(col_names, axis=1)
                names = col_names

            if self.isdiagonal:
                extract = self.__x[idxs].copy()
            else:
                extract = self.__x[idxs, :].copy()
                extract = extract[:, idxs]
            if drop:
                self.drop(names, 0)
            return Cov(x=extract, names=names, isdiagonal=self.isdiagonal)
        if self.isdiagonal:
            extract = sp.diags(self.__x[:, 0], offsets=0, format='csr') # switch from np to sp
        else:
            extract = self.__x.copy()

        if row_names is not None:
            row_idxs = self.indices(row_names, axis=0)
            extract = extract[row_idxs, :]  # this row is modified
            if drop:
                self.drop(row_names, axis=0)
        else:
            row_names = self.row_names

        if col_names is not None:
            col_idxs = self.indices(col_names, axis=1)
            extract = extract[:, col_idxs]  # this row is also modified
            if drop:
                self.drop(col_names, axis=1)
        else:
            col_names = copy.deepcopy(self.col_names)

        return type(self)(x=extract.toarray(), row_names=row_names, col_names=col_names) # modified to use .toarray()

Comparing the methods using a smaller amount of observations (allowing the use of both get and spget):

obs_cov_reduced = obs_cov.get(row_names=pst.nnz_obs_names, col_names=pst.nnz_obs_names, )
obs_cov_reduced_sp = obs_cov.spget(row_names=pst.nnz_obs_names, col_names=pst.nnz_obs_names, )
print(np.array_equal(obs_cov_reduced.x, obs_cov_reduced_sp.x))

True

I wanted to bring this to attention because quite a lot of use-cases seem to rely on this method. Although I'm not sure whether this modification has any unforseen impact further down the line.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions