
Similarities not reproducible - likely due to changes in tr object as operated on by code #122

@craigwarner-ufastro


I now save every source whose similarity is < 0.999 or > 1.0001 as a pickle file, and the code also saves an individual image whenever that image's similarity falls outside the same range. To make this easier, I created a new method, outputPickle(), which replaces dumping tr into a pickle directly. Its first argument is a filename fragment identifying the type of pickle (e.g. "full" or "x"), and a counter is appended to the filename.

        if s < 0.999 or s > 1.0001:
            src = tr.catalog[0]
            print('Source:', src)
            self.outputPickle("full", tr)
            for i, ((x_cpu, ic_cpu), (x_gpu, ic_gpu)) in enumerate(zip(R_cpu, R_gpu)):
                print('  CPU:', x_cpu)
                print('  GPU:', x_gpu)
                # Per-image cosine similarity between the CPU and GPU update directions
                s = np.sum(x_cpu * x_gpu) / np.sqrt(np.sum(x_cpu**2) * np.sum(x_gpu**2))
                print('  similarity:', s)
                if s < 0.999 or s > 1.0001:
                    print("I", i, len(tr.images))
                    tim = tr.images[i]
                    print('Tim:', tim)

                    # Pick the first bad_x_<num>.pickle name that is not already taken
                    num = tt2[0]
                    pickle_dir = os.path.join(os.getenv('SCRATCH'), 'pickles')
                    fname = os.path.join(pickle_dir, 'bad_x_' + str(num) + '.pickle')
                    while os.access(fname, os.F_OK):
                        num += 100
                        fname = os.path.join(pickle_dir, 'bad_x_' + str(num) + '.pickle')
                    # Temporarily reduce tr to the single bad image; the finally
                    # clause restores the full image list even if outputPickle raises
                    z = tr.images
                    tr.images = [tim]
                    try:
                        self.outputPickle("x", tr, num=num)
                    finally:
                        tr.images = z
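
The outputPickle() helper itself is not shown in this issue. The block below is a minimal sketch of what such a helper might look like, assuming it simply pickles its argument to `<dir>/bad_<kind>_<num>.pickle` and bumps the counter until the name is free. `output_pickle` and `compute_directions` are hypothetical names, and a `SimpleNamespace` stands in for tr. It also tries out the hypothesis in the title: if the computation modifies tr in place, a pickle written afterwards records the modified state rather than the inputs that produced the bad similarity. Taking `snapshot = pickle.dumps(tr)` before the computation, and writing those bytes when a mismatch is found, preserves the state that was actually used.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def output_pickle(kind, obj, outdir, num=0, step=1):
    """Hypothetical stand-in for outputPickle().

    Writes obj to <outdir>/bad_<kind>_<num>.pickle, bumping num by step until
    an unused name is found. obj may be a live object (pickled now) or bytes
    from an earlier pickle.dumps() call (written verbatim).
    """
    os.makedirs(outdir, exist_ok=True)
    data = obj if isinstance(obj, bytes) else pickle.dumps(obj)
    while True:
        fname = os.path.join(outdir, 'bad_%s_%i.pickle' % (kind, num))
        try:
            # 'xb' refuses to overwrite, so parallel workers cannot claim the same name
            with open(fname, 'xb') as f:
                f.write(data)
            break
        except FileExistsError:
            num += step
    print('OUTPUTTING', fname)
    return fname


def compute_directions(tr):
    """Hypothetical computation that (wrongly) modifies its input in place."""
    tr.params[0] = -tr.params[0]
    return list(tr.params)


outdir = tempfile.mkdtemp()
tr = SimpleNamespace(params=[1.0, 2.0])       # placeholder for the Tractor object
snapshot = pickle.dumps(tr)                    # state before the computation runs
directions = compute_directions(tr)
mutated = pickle.dumps(tr) != snapshot         # did the computation change tr?
before = output_pickle('full', snapshot, outdir)  # what the computation actually saw
after = output_pickle('full', tr, outdir)         # what a post-hoc dump would save
```

To mirror the `num += 100` stepping in the code above, this could be called with `step=100` and a per-worker starting `num`. Comparing pickle bytes is a blunt check: harmless changes such as cached values would also register as a modification.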

There are a couple of issues:

  1. Sometimes the first conditional (the similarity for the entire source) is triggered even though every individual image is within tolerance.
CPU: [-1.94855789e-09 -2.37200258e-09 -1.00845262e-04 -5.95055369e-04
 -9.88460304e-04 -1.11022299e-03 -3.07939059e-04  1.07036752e-04
  2.11255839e-05 -9.09598352e-04]
GPU: [-1.9429132e-09 -2.3678488e-09 -2.6965767e-04 -1.1411997e-03
 -1.8144025e-03 -2.1322942e-03 -4.6383368e-04  1.2251147e-04
  2.4668923e-05 -3.1230543e-03]
GPU V: [-1.9429132e-09 -2.3678488e-09 -2.6965767e-04 -1.1411997e-03
 -1.8144025e-03 -2.1322942e-03 -4.6383368e-04  1.2251147e-04
  2.4668923e-05 -3.1230543e-03]
Similarity CPU/GPU: 0.958294125564936
Similarity GPU/V: 1.0
Times: [4.91667554e+01 2.63725516e+02 2.26906826e+02 2.73396142e+02
 8.18582860e+02 8.19188472e+02 2.72948265e-01 3.17515744e+02] [1380 3122 3122 3122 4502 4502    4 3446    0]
Source: SersicGalaxy at RaDecPos: RA, Dec = (0.16020, -0.06799) with NanoMaggies: g=21.8, r=20.5, i=20.1, z=19.8 and EllipseWithPriors(0.25): log r_e=-0.526873, ee1=-0.119724, ee2=-0.0363989, Sersic index 3.675
OUTPUTTING /pscratch/sd/c/cdwarner/pickles/bad_full_5.pickle

In this instance, 0.958 is far outside the tolerance, but every individual image has a similarity close to

  similarity: 0.9999925982017274
  2. More insidiously, the results in the log are sometimes not reproducible: they differ when the saved pickle is loaded and rerun. For instance:
CPU: [ 1.71827953e-07  7.14116073e-08  1.68543393e-04  2.05189181e-04
  3.71118189e-04  6.91424516e-04 -1.03329863e-04 -5.95335048e-04
  4.03377133e-04  2.96176253e-02]
GPU: [ 1.7212375e-07  7.2333719e-08 -2.8993364e-04 -3.7468699e-04
 -7.0493174e-04 -6.7323109e-04  2.9682931e-02 -6.1853853e-04
  4.1152083e-04 -5.6388748e-01]
GPU V: [ 1.7212375e-07  7.2333719e-08 -2.8993364e-04 -3.7468699e-04
 -7.0493174e-04 -6.7323109e-04  2.9682931e-02 -6.1853853e-04
  4.1152083e-04 -5.6388748e-01]
Similarity CPU/GPU: -0.9981266378101821
Similarity GPU/V: 1.0
Times: [3.42996228e+01 1.71076887e+02 1.47010379e+02 1.76862300e+02
 5.32594621e+02 5.32957772e+02 2.72948265e-01 2.02855842e+02] [ 934 1970 1970 1970 2904 2904    4 2118    0]
Source: SersicGalaxy at RaDecPos: RA, Dec = (0.15381, -0.07981) with NanoMaggies: g=23.3, r=23.1, i=22.4, z=22 and EllipseWithPriors(0.25): log r_e=-1.96401, ee1=0.0245634, ee2=-0.0118898, Sersic index 4.000
OUTPUTTING /pscratch/sd/c/cdwarner/pickles/bad_full_1.pickle
  CPU: [-3.53870942e-04  2.82380381e-04 -4.38462347e-01  0.00000000e+00
  0.00000000e+00  0.00000000e+00  5.49891624e+01 -2.66686901e-02
  4.85001272e-03 -9.09358765e+02]
  GPU: [-3.6021651e-04  2.8019722e-04 -3.5751897e-01  0.0000000e+00
  0.0000000e+00  0.0000000e+00 -7.0184013e+01 -2.6615649e-02
  4.9438118e-03  9.2109912e+02]
  similarity: -0.9998771358298645
I 0 36
Tim: Image c4d_171012_022428_ooi_g_ls9-S16 g
OUTPUTTING /pscratch/sd/c/cdwarner/pickles/bad_x_2.pickle

Here the log reports that bad_full_1.pickle has a very bad similarity of -0.998, and that its first image, c4d_171012_022428_ooi_g_ls9-S16 g, has a similarity of -0.9998.

However, loading that pickle and rerunning gives CPU and GPU results that agree:

>>> import pickle
>>> tr = pickle.load(open('/pscratch/sd/c/cdwarner/pickles/bad_full_1.pickle','rb'))
GPU POWERED
>>> tr.model_kwargs = {}
>>> print('Got', tr)
Got Tractor with 1 sources and 36 images (c4d_171012_022428_ooi_g_ls9-S16 g, c4d_171016_050038_ooi_g_ls9-S16 g, c4d_171017_042743_ooi_r_ls9-S16 r, c4d_171011_044524_ooi_z_ls9-S16 z, c4d_171011_044723_ooi_i_ls10-S16 i, c4d_180914_071654_ooi_i_ls10-N3 i, c4d_180914_071456_ooi_r_ls9-N3 r, c4d_181001_055110_ooi_z_ls9-N3 z, c4d_181109_033212_ooi_g_ls9-N3 g, c4d_161105_012819_ooi_i_ls10-S31 i, c4d_161105_013016_ooi_z_ls9-S31 z, c4d_161105_012618_ooi_r_ls9-S31 r, c4d_170918_052316_ooi_g_ls9-S31 g, c4d_141028_042632_ooi_r_ls9-S10 r, c4d_141002_055234_ooi_g_ls9-S10 g, c4d_141027_043046_ooi_r_ls9-S10 r, c4d_141115_022545_ooi_z_ls9-S10 z, c4d_141021_045954_ooi_i_ls10-S10 i, c4d_171109_012149_ooi_g_ls9-N1 g, c4d_160911_043845_ooi_i_ls10-S3 i, c4d_160911_043642_ooi_z_ls9-S3 z, c4d_160922_024456_ooi_g_ls9-S3 g, c4d_160922_024102_ooi_i_ls10-S3 i, c4d_160922_024301_ooi_r_ls9-S3 r, c4d_141029_024719_ooi_z_ls9-N31 z, c4d_141021_045554_ooi_g_ls9-N31 g, c4d_171028_034537_ooi_i_ls10-N1 i, c4d_171109_011950_ooi_r_ls9-N1 r, c4d_131012_025041_ooi_i_ls11-N6 i, c4d_131013_023631_ooi_i_ls11-N15 i, c4d_131028_014505_ooi_i_ls11-N31 i, c4d_131013_013510_ooi_z_ls9-N15 z, c4d_131123_020645_ooi_r_ls9-N6 r, c4d_130911_063001_ooi_g_ls9-N6 g, c4d_131122_012923_ooi_r_ls9-N31 r, c4d_131028_014102_ooi_r_ls9-N31 r)
>>> x = tr.optimizer.getSingleImageUpdateDirections(tr, shared_params=False)
GPU getSingleImageUpdateDirections
Running GPU code...
...
Running CPU code for comparison...
CPU time 0.20320940017700195
CPU: [ 1.72213781e-07  7.23783357e-08 -3.23055266e-04 -4.16843673e-04
 -7.82671772e-04 -7.74270657e-04  3.16002615e-02 -6.20008822e-04
  4.13360365e-04 -6.15027482e-01]
GPU: [ 1.7213013e-07  7.2359242e-08 -3.0045255e-04 -3.8802138e-04
 -7.2966097e-04 -7.0428784e-04  3.0196032e-02 -6.1912311e-04
  4.1174897e-04 -5.8618325e-01]
GPU V: [ 1.7213013e-07  7.2359242e-08 -3.0045255e-04 -3.8802138e-04
 -7.2966097e-04 -7.0428784e-04  3.0196032e-02 -6.1912311e-04
  4.1174897e-04 -5.8618325e-01]
Similarity CPU/GPU: 1.0000000042354398
Similarity GPU/V: 1.0

Selecting only the first image, which is c4d_171012_022428_ooi_g_ls9-S16 g as in the log, also gives agreement:

>>> tim = tr.images[0]
>>> tr.images = [tim]
>>> tr.model_kwargs = {}
>>> print('Got', tr)
Got Tractor with 1 sources and 1 images (c4d_171012_022428_ooi_g_ls9-S16 g)
>>> x = tr.optimizer.getSingleImageUpdateDirections(tr, shared_params=False)
GPU getSingleImageUpdateDirections
Running GPU code...
...
Running CPU code for comparison...
CPU time 0.004518985748291016
CPU: [-3.60891514e-04  2.80291628e-04 -3.58753234e-01 -6.55961561e-08
 -6.61225386e-09 -6.37968201e-08 -6.75175476e+01 -2.66080126e-02
  4.93564643e-03  8.99177917e+02]
GPU: [-3.6021951e-04  2.8020644e-04 -3.5781381e-01  1.1527768e-07
  1.1815857e-08  2.2702565e-07 -6.9467804e+01 -2.6615916e-02
  4.9437745e-03  9.2840247e+02]
GPU V: [-3.6021951e-04  2.8020644e-04 -3.5781381e-01  1.1527768e-07
  1.1815857e-08  2.2702565e-07 -6.9467804e+01 -2.6615916e-02
  4.9437745e-03  9.2840247e+02]
Similarity CPU/GPU: 0.9999999905698144
Similarity GPU/V: 1.0

And bad_x_2.pickle, the single-image pickle written by the code above (which reported -0.9998 in the log), gives exactly the same results as selecting the image by hand:

>>> tr = pickle.load(open('/pscratch/sd/c/cdwarner/pickles/bad_x_2.pickle','rb'))
>>> tr.model_kwargs = {}
>>> print('Got', tr)
Got Tractor with 1 sources and 1 images (c4d_171012_022428_ooi_g_ls9-S16 g)
>>> x = tr.optimizer.getSingleImageUpdateDirections(tr, shared_params=False)
GPU getSingleImageUpdateDirections
Running GPU code...
...
Running CPU code for comparison...
CPU time 0.004496097564697266
CPU: [-3.60891514e-04  2.80291628e-04 -3.58753234e-01 -6.55961561e-08
 -6.61225386e-09 -6.37968201e-08 -6.75175476e+01 -2.66080126e-02
  4.93564643e-03  8.99177917e+02]
GPU: [-3.6021951e-04  2.8020644e-04 -3.5781381e-01  1.1527768e-07
  1.1815857e-08  2.2702565e-07 -6.9467804e+01 -2.6615916e-02
  4.9437745e-03  9.2840247e+02]
GPU V: [-3.6021951e-04  2.8020644e-04 -3.5781381e-01  1.1527768e-07
  1.1815857e-08  2.2702565e-07 -6.9467804e+01 -2.6615916e-02
  4.9437745e-03  9.2840247e+02]
Similarity CPU/GPU: 0.9999999905698144
Similarity GPU/V: 1.0

Notice that the GPU results for the individual image are close to the logged values but not identical, with one major difference: in the log, indices 3, 4, and 5 are all exactly 0. The logged CPU vector also has zeros at those indices, and its nonzero values differ from the reloaded CPU results by more than the GPU values do. In particular, the two largest logged CPU components (indices 6 and 9) have the opposite sign from the reloaded run, which is what drives the logged similarity to about -1. For bad_full_1.pickle there are no zeros in either the command-line run or the log, but the values still differ somewhat, for both GPU and CPU.
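
As a consistency check, the logged single-image vectors can be fed back through the same cosine-similarity formula used in the logging code. The sketch below uses NumPy, with arrays copied from the printed log and from the bad_x_2.pickle rerun above; the variable names are illustrative only. It reproduces the logged -0.99988 from the printed values, finds the components whose sign differs between the logged and reloaded CPU vectors, and shows that undoing just those flips brings the similarity back above 0.999. This only shows that the logged numbers are self-consistent; it does not explain why the CPU result changed.

```python
import numpy as np


def cosine_similarity(a, b):
    """Same formula as the logging code: sum(a*b) / sqrt(sum(a**2) * sum(b**2))."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))


# Single-image vectors for c4d_171012_022428_ooi_g_ls9-S16 g as printed in the log
log_cpu = np.array([-3.53870942e-04, 2.82380381e-04, -4.38462347e-01, 0.0, 0.0, 0.0,
                    5.49891624e+01, -2.66686901e-02, 4.85001272e-03, -9.09358765e+02])
log_gpu = np.array([-3.6021651e-04, 2.8019722e-04, -3.5751897e-01, 0.0, 0.0, 0.0,
                    -7.0184013e+01, -2.6615649e-02, 4.9438118e-03, 9.2109912e+02])
# CPU vector for the same image after reloading bad_x_2.pickle
reload_cpu = np.array([-3.60891514e-04, 2.80291628e-04, -3.58753234e-01, -6.55961561e-08,
                       -6.61225386e-09, -6.37968201e-08, -6.75175476e+01, -2.66080126e-02,
                       4.93564643e-03, 8.99177917e+02])

s_log = cosine_similarity(log_cpu, log_gpu)

# Indices where the logged and reloaded CPU vectors have strictly opposite signs
# (exact zeros give a sign product of 0, so they are not counted)
flipped = np.nonzero(np.sign(log_cpu) * np.sign(reload_cpu) < 0)[0]

# Undo those sign flips in the logged CPU vector and recompute against the logged GPU vector
fixed_cpu = log_cpu.copy()
fixed_cpu[flipped] *= -1
s_fixed = cosine_similarity(fixed_cpu, log_gpu)

print('logged similarity:', s_log)
print('sign-flipped indices:', flipped)
print('similarity after undoing flips:', s_fixed)
```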
