WCS pixelToPosition gives very large number #129

@craigwarner-ufastro

This is a CPU bug: it occurs in gpumode 0 (the factored CPU version) in galaxy.py.

We get a crash because an excessive amount of memory is requested, e.g.

  File "/global/cfs/cdirs/desi/users/cdwarner/code/Tractor/legacypipe/py/legacypipe/image.py", line 2299, in getFourierTransform
    fft, (cx,cy), shape, (v,w) = super().getFourierTransform(px, py, radius)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tractor/psf.py", line 333, in tractor.psf.PixelizedPSF.getFourierTransform
  File "tractor/psf.py", line 308, in tractor.psf.PixelizedPSF._padInImage
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 4.00 EiB for an array with shape (1073741824, 1073741824) and data type float32

I traced this down to bad values of px, py, and halfsize in galaxy.py:

px0=508422395.67614156 py0=145391495.9114129
px=508422395.67614156 py=145391495.9114129 halfsize=np.float64(508422204.67614156)
H=1073741824 W=1073741824
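
As a rough consistency check, these numbers match the allocation in the traceback. The sketch below assumes the padded image side is the next power of two at or above 2 × halfsize; that padding rule is an assumption made for illustration, not something confirmed from the tractor source, and `next_pow2` is a hypothetical helper.

```python
import math

def next_pow2(n):
    """Smallest power of two >= n (hypothetical helper, not tractor code)."""
    n = math.ceil(n)
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

halfsize = 508422204.67614156      # value from the debug output above
side = next_pow2(2 * halfsize)     # ASSUMED padding rule: next power of two
nbytes = side * side * 4           # float32 uses 4 bytes per element
eib = nbytes / 2**60               # 1 EiB = 2**60 bytes

print(f"side={side} nbytes={nbytes} EiB={eib}")
```

Under that assumption the side comes out as 1073741824, which matches H and W, and a float32 array of that shape needs 4 EiB.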

It seems to originate in radec2pixelxy, called from positionToPixel in class ConstantFitsWcs(ParamList, ducks.WCS):

    def positionToPixel(self, pos, src=None):
        '''
        Converts an :class:`tractor.RaDecPos` to a pixel position.
        Returns: tuple of floats ``(x, y)``
        '''
        X = self.wcs.radec2pixelxy(pos.ra, pos.dec)
        # handle X = (ok,x,y) and X = (x,y) return values
        if len(X) == 3:
            ok, x, y = X
        else:
            assert(len(X) == 2)
            x, y = X
        print (f'PTP3 {x=} {y=} {pos.ra=} {pos.dec=}')
        # MAGIC: subtract 1 to convert from FITS to zero-indexed pixels.
        return x - 1 - self.x0, y - 1 - self.y0
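
Note that the method above unpacks `ok` but never looks at it. A minimal sketch of a check that could run right after the unpacking is shown below. It assumes the first element of the three-value return is a success flag (inferred from the `(ok,x,y)` comment, not verified against the WCS library); `unpack_pixel` and the `max_abs` limit are hypothetical and chosen only for illustration.

```python
import math

def unpack_pixel(X, max_abs=1e6):
    """Unpack (ok, x, y) or (x, y) and flag implausible pixel coordinates.

    Hypothetical helper, not part of tractor. Returns (x, y, good).
    max_abs is an arbitrary illustrative bound on |x| and |y|.
    """
    if len(X) == 3:
        ok, x, y = X          # ASSUMPTION: ok is a success flag
    else:
        assert len(X) == 2
        x, y = X
        ok = True             # no flag available in this form
    good = (bool(ok)
            and math.isfinite(x) and math.isfinite(y)
            and abs(x) <= max_abs and abs(y) <= max_abs)
    return x, y, good

# Values from the debug output above are flagged as bad.
print(unpack_pixel((508422396.67614156, 145391496.9114129)))
```

Checking at this point would report the bad value where it first appears instead of later, when the memory allocation fails.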

At the moment I have this fix in galaxy.py:

        if halfsize > 32768:
            print (f"WARNING: Bad positionToPixel results {px=} {py=} {halfsize=}")
            return None

However, reproducing this error has proved troublesome.

[container] cdwarner@nid001112:/global/cfs/cdirs/desi/users/cdwarner/code/Tractor/legacypipe/bin$ grep positionToPixel $SCRATCH/dr11-gpu-*/logs/*/*.log
/pscratch/sd/c/cdwarner/dr11-gpu-test-cw-gpu0only-redux/logs/014/0145p340.log:WARNING: Bad positionToPixel results px=-29740530052122.24 py=-1271579979548.0696 halfsize=29740530052261.24
/pscratch/sd/c/cdwarner/dr11-gpu-test-cw-gpu0only/logs/014/0145p340.log:WARNING: Bad positionToPixel results px=-29740530052122.24 py=-1271579979548.0696 halfsize=29740530052261.24

I have been able to get it to occur consistently, in 0145p340 only, when running with --gpumode 0 and using the current versions of tractor and legacypipe in my environment. It does not occur when running in the images legacypipe:gpu-1.4.3 or legacypipe-gpu:1.4.4, even though both are up to date with the legacypipe branch gpu-powered and the tractor branch craig_factored_merge. Even when I point PYTHONPATH at my versions of legacypipe and tractor, I still can't reproduce the error inside the container.
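
Since the two setups behave differently even with the same source trees, one thing worth comparing is which copy of each module actually gets imported in each environment. The sketch below is only a suggestion: `describe_modules` is a hypothetical helper, and the module names in the example call (including `astrometry` for the WCS library) are assumptions about what is relevant here.

```python
import importlib

def describe_modules(names):
    """Return {name: (version, file)} for importable modules, else None.

    Hypothetical debugging helper, not part of tractor or legacypipe.
    """
    report = {}
    for name in names:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            report[name] = None        # not importable in this environment
            continue
        report[name] = (getattr(mod, "__version__", "unknown"),
                        getattr(mod, "__file__", "unknown"))
    return report

# ASSUMED list of modules worth comparing; adjust as needed.
modules = ["tractor", "legacypipe", "astrometry", "numpy"]
for name, info in describe_modules(modules).items():
    print(name, info)
```

Running this both inside the container and in the local environment, then diffing the output, would show whether the PYTHONPATH override is picking up the intended copies and whether any dependency versions differ.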
