Out of memory error using Matrix.get with many observations

I ran into an out of memory error using the `Matrix.get` method in a case where I have a lot of zero-weighted observations that I would like to carry through history-matching as metadata. It looks like the error occurs when np.diag is called:

```
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
Cell In[59], line 6
      3 print(len(obs_cov.x))
      5 # reduce cov down to only include non-zero obsverations
----> 6 obs_cov = obs_cov.get(row_names=pst.nnz_obs_names, col_names=pst.nnz_obs_names, )

File ~\Documents\Development\repos\GMDSI_notebooks\dependencies\pyemu\pyemu\mat\mat_handler.py:1686, in Matrix.get(self, row_names, col_names, drop)
   1684     return Cov(x=extract, names=names, isdiagonal=self.isdiagonal)
   1685 if self.isdiagonal:
-> 1686     extract = np.diag(self.__x[:, 0])
   1687 else:
   1688     extract = self.__x.copy()

File ~\AppData\Local\anaconda3\envs\gmdsitut\lib\site-packages\numpy\lib\twodim_base.py:293, in diag(v, k)
    291 if len(s) == 1:
    292     n = s[0]+abs(k)
--> 293     res = zeros((n, n), v.dtype)
    294     if k >= 0:
    295         i = k

MemoryError: Unable to allocate 110. GiB for an array with shape (121252, 121252) and data type float64
```

It seems modifying the method to use scipy.sparse.diags can offer a workaround:

```
    def spget(self, row_names=None, col_names=None, drop=False):

        if row_names is None and col_names is None:
            raise Exception(
                "Matrix.get(): must pass at least" + " row_names or col_names"
            )

        if row_names is not None and not isinstance(row_names, list):
            row_names = [row_names]
        if col_names is not None and not isinstance(col_names, list):
            col_names = [col_names]

        if isinstance(self, Cov) and (row_names is None or col_names is None):
            if row_names is not None:
                idxs = self.indices(row_names, axis=0)
                names = row_names
            else:
                idxs = self.indices(col_names, axis=1)
                names = col_names

            if self.isdiagonal:
                extract = self.__x[idxs].copy()
            else:
                extract = self.__x[idxs, :].copy()
                extract = extract[:, idxs]
            if drop:
                self.drop(names, 0)
            return Cov(x=extract, names=names, isdiagonal=self.isdiagonal)
        if self.isdiagonal:
            extract = sp.diags(self.__x[:, 0], offsets=0, format='csr') # switch from np to sp
        else:
            extract = self.__x.copy()

        if row_names is not None:
            row_idxs = self.indices(row_names, axis=0)
            extract = extract[row_idxs, :]  # this row is modified
            if drop:
                self.drop(row_names, axis=0)
        else:
            row_names = self.row_names

        if col_names is not None:
            col_idxs = self.indices(col_names, axis=1)
            extract = extract[:, col_idxs]  # this row is also modified
            if drop:
                self.drop(col_names, axis=1)
        else:
            col_names = copy.deepcopy(self.col_names)

        return type(self)(x=extract.toarray(), row_names=row_names, col_names=col_names) # modified to use .toarray()
```

Comparing the methods using a smaller amount of observations (allowing the use of both `get` and `spget`):
```
obs_cov_reduced = obs_cov.get(row_names=pst.nnz_obs_names, col_names=pst.nnz_obs_names, )
obs_cov_reduced_sp = obs_cov.spget(row_names=pst.nnz_obs_names, col_names=pst.nnz_obs_names, )
print(np.array_equal(obs_cov_reduced.x, obs_cov_reduced_sp.x))

True
```

I wanted to bring this to attention because quite a lot of use-cases seem to rely on this method. Although I'm not sure whether this modification has any unforseen impact further down the line.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Out of memory error using Matrix.get with many observations #465

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Out of memory error using Matrix.get with many observations #465

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions