Similarities not reproducible - likely due to changes in tr object as operated on by code

Every source whose similarity is < 0.999 or > 1.0001 is now saved as a pickle file.  The code also now saves an individual image when its similarity falls outside that range.  To make this easier I created a new method, `outputPickle()`, which replaces dumping `tr` into a pickle directly.  Its first argument is a filename fragment that identifies what type of pickle it is, and a counter is appended to the filename.

```
        if s < 0.999 or s > 1.0001:
            src = tr.catalog[0]
            print('Source:', src)
            self.outputPickle("full", tr)
            for i,((x_cpu,ic_cpu), (x_gpu,ic_gpu)) in enumerate(zip(R_cpu, R_gpu)):
                print('  CPU:', x_cpu)
                print('  GPU:', x_gpu)
                s = np.sum(x_cpu * x_gpu) / np.sqrt(np.sum(x_cpu**2) * np.sum(x_gpu**2))
                print('  similarity:', s)
                if s < 0.999 or s > 1.0001:
                  try:
                    print ("I", i, len(tr.images))
                    tim = tr.images[i]
                    print('Tim:', tim)

                    num = tt2[0]
                    fname = os.getenv('SCRATCH')+'/pickles/bad_x_'+str(num)+'.pickle'
                    while os.access(fname, os.F_OK):
                        num += 100
                        fname = os.getenv('SCRATCH')+'/pickles/bad_x_'+str(num)+'.pickle'
                    z = tr.images
                    tr.images = [tim]
                    self.outputPickle("x", tr, num=num)
                    tr.images = z

```
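One thing the excerpt above leaves open is what happens if `self.outputPickle("x", tr, num=num)` raises: the restore `tr.images = z` sits on the next line inside the `try`, so it would be skipped and `tr` would keep the single-image list (the `except` clause is not shown here, so this may already be handled). A minimal sketch of a swap that always restores, assuming only that `tr.images` is a plain assignable attribute as in the excerpt; `only_images` and `FakeTractor` are hypothetical names used for illustration, not part of the existing code:

```python
from contextlib import contextmanager


@contextmanager
def only_images(tr, images):
    """Temporarily replace tr.images, restoring the original list on exit."""
    saved = tr.images
    tr.images = list(images)
    try:
        yield tr
    finally:
        # Runs on normal exit and when the body raises.
        tr.images = saved


class FakeTractor:
    """Stand-in with an `images` attribute; not the real Tractor class."""

    def __init__(self, images):
        self.images = images


tr = FakeTractor(["tim0", "tim1", "tim2"])
try:
    with only_images(tr, [tr.images[0]]):
        seen_inside = list(tr.images)  # single-image view
        raise RuntimeError("simulated failure while writing the pickle")
except RuntimeError:
    pass

print(seen_inside, tr.images)
```

In the real loop the body of the `with` block would be the `outputPickle` call, so the full image list is back in place before the next iteration regardless of how the call ends.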

There are a couple of issues:
1. Sometimes the first conditional (similarity for the entire source) is hit, but all of the individual images look within tolerance.
```
CPU: [-1.94855789e-09 -2.37200258e-09 -1.00845262e-04 -5.95055369e-04
 -9.88460304e-04 -1.11022299e-03 -3.07939059e-04  1.07036752e-04
  2.11255839e-05 -9.09598352e-04]
GPU: [-1.9429132e-09 -2.3678488e-09 -2.6965767e-04 -1.1411997e-03
 -1.8144025e-03 -2.1322942e-03 -4.6383368e-04  1.2251147e-04
  2.4668923e-05 -3.1230543e-03]
GPU V: [-1.9429132e-09 -2.3678488e-09 -2.6965767e-04 -1.1411997e-03
 -1.8144025e-03 -2.1322942e-03 -4.6383368e-04  1.2251147e-04
  2.4668923e-05 -3.1230543e-03]
Similarity CPU/GPU: 0.958294125564936
Similarity GPU/V: 1.0
Times: [4.91667554e+01 2.63725516e+02 2.26906826e+02 2.73396142e+02
 8.18582860e+02 8.19188472e+02 2.72948265e-01 3.17515744e+02] [1380 3122 3122 3122 4502 4502    4 3446    0]
Source: SersicGalaxy at RaDecPos: RA, Dec = (0.16020, -0.06799) with NanoMaggies: g=21.8, r=20.5, i=20.1, z=19.8 and EllipseWithPriors(0.25): log r_e=-0.526873, ee1=-0.119724, ee2=-0.0363989, Sersic index 3.675
OUTPUTTING /pscratch/sd/c/cdwarner/pickles/bad_full_5.pickle
```
In this instance, 0.958 is well outside tolerance, but all of the individual images have a similarity close to
```
  similarity: 0.9999925982017274
```
2. More insidiously, the results in the log are sometimes not reproducible: they differ from what I get after loading the actual pickle.  For instance:
```
CPU: [ 1.71827953e-07  7.14116073e-08  1.68543393e-04  2.05189181e-04
  3.71118189e-04  6.91424516e-04 -1.03329863e-04 -5.95335048e-04
  4.03377133e-04  2.96176253e-02]
GPU: [ 1.7212375e-07  7.2333719e-08 -2.8993364e-04 -3.7468699e-04
 -7.0493174e-04 -6.7323109e-04  2.9682931e-02 -6.1853853e-04
  4.1152083e-04 -5.6388748e-01]
GPU V: [ 1.7212375e-07  7.2333719e-08 -2.8993364e-04 -3.7468699e-04
 -7.0493174e-04 -6.7323109e-04  2.9682931e-02 -6.1853853e-04
  4.1152083e-04 -5.6388748e-01]
Similarity CPU/GPU: -0.9981266378101821
Similarity GPU/V: 1.0
Times: [3.42996228e+01 1.71076887e+02 1.47010379e+02 1.76862300e+02
 5.32594621e+02 5.32957772e+02 2.72948265e-01 2.02855842e+02] [ 934 1970 1970 1970 2904 2904    4 2118    0]
Source: SersicGalaxy at RaDecPos: RA, Dec = (0.15381, -0.07981) with NanoMaggies: g=23.3, r=23.1, i=22.4, z=22 and EllipseWithPriors(0.25): log r_e=-1.96401, ee1=0.0245634, ee2=-0.0118898, Sersic index 4.000
OUTPUTTING /pscratch/sd/c/cdwarner/pickles/bad_full_1.pickle
  CPU: [-3.53870942e-04  2.82380381e-04 -4.38462347e-01  0.00000000e+00
  0.00000000e+00  0.00000000e+00  5.49891624e+01 -2.66686901e-02
  4.85001272e-03 -9.09358765e+02]
  GPU: [-3.6021651e-04  2.8019722e-04 -3.5751897e-01  0.0000000e+00
  0.0000000e+00  0.0000000e+00 -7.0184013e+01 -2.6615649e-02
  4.9438118e-03  9.2109912e+02]
  similarity: -0.9998771358298645
I 0 36
Tim: Image c4d_171012_022428_ooi_g_ls9-S16 g
OUTPUTTING /pscratch/sd/c/cdwarner/pickles/bad_x_2.pickle
```

Here it says that `bad_full_1.pickle` has a very bad similarity of -0.998!  The first image in it, `c4d_171012_022428_ooi_g_ls9-S16 g`, has a similarity of -0.9998.
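As a sanity check on the logged number, the similarity can be recomputed from the printed vectors with the same formula used in the code above. This is only a sketch: the inputs are the rounded values copied from the log for `bad_full_1.pickle`, so the result should agree with the log to a few digits, not exactly. The `similarity` helper name and the per-component breakdown are additions for illustration.

```python
import numpy as np


def similarity(a, b):
    """Normalized dot product, same formula as in the snippet above."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))


# Full-source update directions copied from the log (rounded as printed).
cpu = np.array([1.71827953e-07, 7.14116073e-08, 1.68543393e-04, 2.05189181e-04,
                3.71118189e-04, 6.91424516e-04, -1.03329863e-04, -5.95335048e-04,
                4.03377133e-04, 2.96176253e-02])
gpu = np.array([1.7212375e-07, 7.2333719e-08, -2.8993364e-04, -3.7468699e-04,
                -7.0493174e-04, -6.7323109e-04, 2.9682931e-02, -6.1853853e-04,
                4.1152083e-04, -5.6388748e-01])

s = similarity(cpu, gpu)
flagged = s < 0.999 or s > 1.0001

# Contribution of each parameter to the similarity; these sum to s.
contrib = cpu * gpu / np.sqrt(np.sum(cpu**2) * np.sum(gpu**2))
dominant = int(np.argmax(np.abs(contrib)))

print("similarity:", s, "flagged:", flagged)
print("dominant index:", dominant, "contribution:", contrib[dominant])
```

With these inputs the last parameter (index 9) carries almost all of the similarity, and it has opposite signs in the CPU and GPU vectors, so the metric here is effectively reporting the sign of one component.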

However:
```
>>> import pickle
>>> tr = pickle.load(open('/pscratch/sd/c/cdwarner/pickles/bad_full_1.pickle','rb'))
GPU POWERED
>>> tr.model_kwargs = {}
>>> print('Got', tr)
Got Tractor with 1 sources and 36 images (c4d_171012_022428_ooi_g_ls9-S16 g, c4d_171016_050038_ooi_g_ls9-S16 g, c4d_171017_042743_ooi_r_ls9-S16 r, c4d_171011_044524_ooi_z_ls9-S16 z, c4d_171011_044723_ooi_i_ls10-S16 i, c4d_180914_071654_ooi_i_ls10-N3 i, c4d_180914_071456_ooi_r_ls9-N3 r, c4d_181001_055110_ooi_z_ls9-N3 z, c4d_181109_033212_ooi_g_ls9-N3 g, c4d_161105_012819_ooi_i_ls10-S31 i, c4d_161105_013016_ooi_z_ls9-S31 z, c4d_161105_012618_ooi_r_ls9-S31 r, c4d_170918_052316_ooi_g_ls9-S31 g, c4d_141028_042632_ooi_r_ls9-S10 r, c4d_141002_055234_ooi_g_ls9-S10 g, c4d_141027_043046_ooi_r_ls9-S10 r, c4d_141115_022545_ooi_z_ls9-S10 z, c4d_141021_045954_ooi_i_ls10-S10 i, c4d_171109_012149_ooi_g_ls9-N1 g, c4d_160911_043845_ooi_i_ls10-S3 i, c4d_160911_043642_ooi_z_ls9-S3 z, c4d_160922_024456_ooi_g_ls9-S3 g, c4d_160922_024102_ooi_i_ls10-S3 i, c4d_160922_024301_ooi_r_ls9-S3 r, c4d_141029_024719_ooi_z_ls9-N31 z, c4d_141021_045554_ooi_g_ls9-N31 g, c4d_171028_034537_ooi_i_ls10-N1 i, c4d_171109_011950_ooi_r_ls9-N1 r, c4d_131012_025041_ooi_i_ls11-N6 i, c4d_131013_023631_ooi_i_ls11-N15 i, c4d_131028_014505_ooi_i_ls11-N31 i, c4d_131013_013510_ooi_z_ls9-N15 z, c4d_131123_020645_ooi_r_ls9-N6 r, c4d_130911_063001_ooi_g_ls9-N6 g, c4d_131122_012923_ooi_r_ls9-N31 r, c4d_131028_014102_ooi_r_ls9-N31 r)
>>> x = tr.optimizer.getSingleImageUpdateDirections(tr, shared_params=False)
GPU getSingleImageUpdateDirections
Running GPU code...
...
Running CPU code for comparison...
CPU time 0.20320940017700195
CPU: [ 1.72213781e-07  7.23783357e-08 -3.23055266e-04 -4.16843673e-04
 -7.82671772e-04 -7.74270657e-04  3.16002615e-02 -6.20008822e-04
  4.13360365e-04 -6.15027482e-01]
GPU: [ 1.7213013e-07  7.2359242e-08 -3.0045255e-04 -3.8802138e-04
 -7.2966097e-04 -7.0428784e-04  3.0196032e-02 -6.1912311e-04
  4.1174897e-04 -5.8618325e-01]
GPU V: [ 1.7213013e-07  7.2359242e-08 -3.0045255e-04 -3.8802138e-04
 -7.2966097e-04 -7.0428784e-04  3.0196032e-02 -6.1912311e-04
  4.1174897e-04 -5.8618325e-01]
Similarity CPU/GPU: 1.0000000042354398
Similarity GPU/V: 1.0
```
And when I select only the first image, which is `c4d_171012_022428_ooi_g_ls9-S16 g` as in the log, the similarity is within tolerance as well:

```
>>> tim = tr.images[0]
>>> tr.images = [tim]
>>> tr.model_kwargs = {}
>>> print('Got', tr)
Got Tractor with 1 sources and 1 images (c4d_171012_022428_ooi_g_ls9-S16 g)
>>> x = tr.optimizer.getSingleImageUpdateDirections(tr, shared_params=False)
GPU getSingleImageUpdateDirections
Running GPU code...
...
Running CPU code for comparison...
CPU time 0.004518985748291016
CPU: [-3.60891514e-04  2.80291628e-04 -3.58753234e-01 -6.55961561e-08
 -6.61225386e-09 -6.37968201e-08 -6.75175476e+01 -2.66080126e-02
  4.93564643e-03  8.99177917e+02]
GPU: [-3.6021951e-04  2.8020644e-04 -3.5781381e-01  1.1527768e-07
  1.1815857e-08  2.2702565e-07 -6.9467804e+01 -2.6615916e-02
  4.9437745e-03  9.2840247e+02]
GPU V: [-3.6021951e-04  2.8020644e-04 -3.5781381e-01  1.1527768e-07
  1.1815857e-08  2.2702565e-07 -6.9467804e+01 -2.6615916e-02
  4.9437745e-03  9.2840247e+02]
Similarity CPU/GPU: 0.9999999905698144
Similarity GPU/V: 1.0
```

And when I try `bad_x_2.pickle`, which is the single-image version that was created the same way as above but gave -0.9998 in the log:
```
>>> tr = pickle.load(open('/pscratch/sd/c/cdwarner/pickles/bad_x_2.pickle','rb'))
>>> tr.model_kwargs = {}
>>> print('Got', tr)
Got Tractor with 1 sources and 1 images (c4d_171012_022428_ooi_g_ls9-S16 g)
>>> x = tr.optimizer.getSingleImageUpdateDirections(tr, shared_params=False)
GPU getSingleImageUpdateDirections
Running GPU code...
...
Running CPU code for comparison...
CPU time 0.004496097564697266
CPU: [-3.60891514e-04  2.80291628e-04 -3.58753234e-01 -6.55961561e-08
 -6.61225386e-09 -6.37968201e-08 -6.75175476e+01 -2.66080126e-02
  4.93564643e-03  8.99177917e+02]
GPU: [-3.6021951e-04  2.8020644e-04 -3.5781381e-01  1.1527768e-07
  1.1815857e-08  2.2702565e-07 -6.9467804e+01 -2.6615916e-02
  4.9437745e-03  9.2840247e+02]
GPU V: [-3.6021951e-04  2.8020644e-04 -3.5781381e-01  1.1527768e-07
  1.1815857e-08  2.2702565e-07 -6.9467804e+01 -2.6615916e-02
  4.9437745e-03  9.2840247e+02]
Similarity CPU/GPU: 0.9999999905698144
Similarity GPU/V: 1.0
```

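Notice that the GPU results for the individual image are close to what is in the log but not exactly the same, with one major difference: in the log version, indices 3, 4, and 5 are all exactly 0!  The same is true of the CPU results, but there the nonzero values also differ much more than in the GPU case.  In particular, indices 6 and 9 have the opposite sign in the log (5.50e+01 in the log vs. -6.75e+01 after reloading, and -9.09e+02 vs. 8.99e+02), and since those two entries are by far the largest, that flip is what drives the logged similarity to nearly -1.  For `bad_full_1.pickle` there are no zeros in either the command-line output or the log, but the values differ for both GPU and CPU: only by a few percent for the GPU, while the CPU vector changes sign in several entries.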
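One way to test the hypothesis in the title would be to fingerprint the state of `tr` immediately before and after the update-direction call, and again right before the pickle is written, and log the three values. Below is a minimal sketch of that idea. `fingerprint`, `run_and_report`, and `FakeModel` are hypothetical helpers, and the way the state is extracted (`get_state`) is an assumption: it has to be something that returns a flat sequence of numbers for the object under test.

```python
import hashlib

import numpy as np


def fingerprint(values):
    """Hash a flat sequence of numbers so two snapshots can be compared cheaply."""
    arr = np.ascontiguousarray(np.asarray(values, dtype=np.float64))
    return hashlib.sha256(arr.tobytes()).hexdigest()[:16]


def run_and_report(get_state, func):
    """Call func() and report whether get_state() changed across the call."""
    before = fingerprint(get_state())
    result = func()
    after = fingerprint(get_state())
    return result, before != after


class FakeModel:
    """Stand-in used only to exercise the helpers; not the Tractor API."""

    def __init__(self):
        self.params = [1.0, 2.0, 3.0]

    def pure(self):
        return sum(self.params)

    def mutating(self):
        self.params[0] += 0.5  # side effect on the stored state
        return sum(self.params)


model = FakeModel()
_, changed_pure = run_and_report(lambda: model.params, model.pure)
_, changed_mut = run_and_report(lambda: model.params, model.mutating)
print("pure call changed state:", changed_pure)
print("mutating call changed state:", changed_mut)
```

If the fingerprint logged next to the `OUTPUTTING ...` line differs from the one taken before the comparison ran, the pickle does not contain the state that produced the logged similarity. A matching fingerprint would not rule out changes in state that `get_state` does not cover, such as per-image data.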

Similarities not reproducible - likely due to changes in tr object as operated on by code #122
