
Running multiple instances of PyMC3 scripts simultaneously causes error! #1463

@parashardhapola


Hi,

Please see the error log below.

INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '16768')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/parashar/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.7-Santiago-x86_64-3.5.2-64/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by process '31361' (I am process '16768')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/parashar/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.7-Santiago-x86_64-3.5.2-64/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by process '31931' (I am process '16768')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/parashar/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.7-Santiago-x86_64-3.5.2-64/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by process '81671' (I am process '16768')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/parashar/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.7-Santiago-x86_64-3.5.2-64/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by process '81592' (I am process '16768')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/parashar/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.7-Santiago-x86_64-3.5.2-64/lock_dir
Traceback (most recent call last):
  File "G4_Seq_QG_overlap_switchpoint.py", line 83, in <module>
    traces.append(get_switchpoint(read_level_data[l.name][1].copy()))
  File "G4_Seq_QG_overlap_switchpoint.py", line 43, in get_switchpoint
    step = pm.Metropolis([early_rate, late_rate, switchpoint])
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/pymc3/step_methods/arraystep.py", line 60, in __new__
    step.__init__([var], *args, **kwargs)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/pymc3/step_methods/metropolis.py", line 110, in __init__
    self.delta_logp = delta_logp(model.logpt, vars, shared)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/pymc3/step_methods/metropolis.py", line 310, in delta_logp
    f = theano.function([inarray1, inarray0], logp1 - logp0)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/compile/function.py", line 320, in function
    output_keys=output_keys)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/compile/pfunc.py", line 479, in pfunc
    output_keys=output_keys)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 1777, in orig_function
    defaults)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 1641, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/link.py", line 690, in make_thunk
    storage_map=storage_map)[:3]
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/vm.py", line 1003, in make_all
    no_recycling))
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/op.py", line 970, in make_thunk
    no_recycling)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/op.py", line 879, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cc.py", line 1200, in make_thunk
    keep_lock=keep_lock)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cc.py", line 1143, in __compile__
    keep_lock=keep_lock)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cc.py", line 1595, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 1108, in module_from_key
    module = self._get_from_hash(module_hash, key, keep_lock=keep_lock)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 1008, in _get_from_hash
    key_data.add_key(key, save_pkl=bool(key[0]))
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 483, in add_key
    self.save_pkl()
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 504, in save_pkl
    with open(self.key_pkl, 'wb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/parashar/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.7-Santiago-x86_64-3.5.2-64/tmpi22zbwyw/key.pkl'
WARNING (theano.gof.cmodule): Removing key file /home/parashar/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.7-Santiago-x86_64-3.5.2-64/tmpbnxffomo/key.pkl because the corresponding module is gone from the file system.
WARNING (theano.gof.cmodule): A module that was loaded by this ModuleCache can no longer be read from file /home/parashar/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.7-Santiago-x86_64-3.5.2-64/tmpi22zbwyw/m5080109b7465bc969faf7603bf21e896.so... this could lead to problems.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 1466, in _on_atexit
    self.clear_old()
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 1277, in clear_old
    cleanup=False)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 945, in refresh
    key_data.delete_keys_from(self.entry_from_key)
  File "/home/parashar/anaconda3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 535, in delete_keys_from
    del entry_from_key[key]
KeyError: (((12, (3, (3, (4,), (4,)), (4,), (4,), (4,), (4,), (4,), (4,), (4,), (4,), (4,), (4,)), (13, '1.11.1'), (13, '1.11.1'), (13, '1.11.1'), (13, '1.11.1'), (13, '1.11.1'), (13, '1.11.1'), (13, '1.11.1'), (13, '1.11.1'), (13, '1.11.1'), (13, '1.11.1'), ('openmp', False)), (11, 13, '1.11.1'), (11, 13, '1.11.1'), (11, 13, '1.11.1'), (11, 13, '1.11.1'), (11, 13, '1.11.1'), (11, 13, '1.11.1'), (11, 13, '1.11.1'), (11, 13, '1.11.1'), (11, 13, '1.11.1'), (11, 13, '1.11.1')), ('CLinker.cmodule_key', ('--param', '--param', '--param', '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION', '-O3', '-Wno-unused-label', '-Wno-unused-variable', '-Wno-write-strings', '-fPIC', '-fno-math-errno', '-m64', '-maes', '-march=core2', '-mavx', '-mcx16', '-mpclmul', '-mpopcnt', '-msahf', '-mtune=generic', 'l1-cache-line-size=64', 'l1-cache-size=32', 'l2-cache-size=20480'), (), (), 'NPY_ABI_VERSION=0x1000009', 'c_compiler_str=/usr/bin/g++ 4.4.7', 'md5:mcf7a57c9bb90efec81411a07c50224ca', (<theano.tensor.elemwise.Elemwise object at 0x2b2dabf61f60>, ((TensorType(int64, (True,)), ((-1, 0), False)), (TensorType(int64, vector), (('m983fd54d4e98388c1368723aa0e707a7', 0, 1), False)), (TensorType(float64, (True,)), ((-1, 2), False)), (TensorType(float64, (True,)), ((-1, 3), False)), (TensorType(int8, (True,)), (('m64cd427e9256099cfe5adc179cd8bf82', 0, 4), False)), (TensorType(int8, vector), (('m9f7ab43a62804b796c006032065d30e1', 0, 5), False)), (TensorType(float32, (True,)), (('m04dd9722ae525447134fc924204422a5', 0, 6), False)), (TensorType(float64, vector), (('mb71ebc0435c9bedc241effb12d3fcca8', 0, 7), False)), (TensorType(float64, vector), (('mc8711b416fa30c1b1eaa5fea9e6da412', 0, 8), False))), (1, (False,)))))
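The log suggests that every process is using the same Theano compile directory (the `compiledir_Linux-...` path under `~/.theano`). The processes therefore queue on a single lock, and cleanup in one process appears to remove a temporary module directory that another process is still writing to, which would explain the `FileNotFoundError` on `key.pkl`. A commonly used workaround (not confirmed in this issue) is to give each process its own cache through Theano's `base_compiledir` flag. Below is a minimal sketch, assuming the flag is passed via the `THEANO_FLAGS` environment variable before Theano is first imported. The helper name `use_private_theano_compiledir` is hypothetical. The trade-off is that every run then compiles its Theano functions from scratch, and with ~10K runs the per-process directories add up, so the sketch deletes each one at normal exit.

```python
import atexit
import os
import shutil
import tempfile


def use_private_theano_compiledir(root=None):
    """Give this process its own Theano compile cache (hypothetical helper).

    Assumption: Theano reads THEANO_FLAGS once, at import time, so this must
    be called before `import theano` or `import pymc3`.
    """
    # A fresh directory per process, e.g. /tmp/theano_16768_abc123
    base = tempfile.mkdtemp(prefix="theano_%d_" % os.getpid(), dir=root)
    flag = "base_compiledir=" + base
    existing = os.environ.get("THEANO_FLAGS", "")
    # Append to any flags that are already set instead of overwriting them
    os.environ["THEANO_FLAGS"] = existing + "," + flag if existing else flag
    # Delete the private cache when the process exits normally
    atexit.register(shutil.rmtree, base, ignore_errors=True)
    return base


# Usage at the top of the script, before any Theano/PyMC3 import:
#     use_private_theano_compiledir()
#     import pymc3 as pm
```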

The function containing the PyMC3 code I'm using in my script:

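import numpy as np
import pymc3 as pm

def get_switchpoint(data):
    # Poisson counts must be non-negative: clamp negative values to zero
    data[data < 0] = 0
    bases = np.arange(len(data))  # position index 0 .. len(data) - 1
    with pm.Model(verbose=False) as phredscore_model:
        # Position at which the rate switches from early_rate to late_rate
        switchpoint = pm.DiscreteUniform('switchpoint', lower=bases.min(),
                                         upper=bases.max())
        early_rate = pm.Exponential('early_rate', 1)
        late_rate = pm.Exponential('late_rate', 1)
        # early_rate for positions <= switchpoint, late_rate afterwards
        rate = pm.math.switch(switchpoint >= bases, early_rate, late_rate)
        phredscore = pm.Poisson('phredscore', rate, observed=data)
        # Building the step method compiles Theano functions (where the traceback starts)
        step = pm.Metropolis([early_rate, late_rate, switchpoint])
        trace = pm.sample(4000, step=[step], progressbar=False)
    # Keep the last 1000 draws of each variable
    return np.array([
        trace['switchpoint'][-1000:],
        trace['early_rate'][-1000:],
        trace['late_rate'][-1000:]
    ])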
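For reference, the likelihood that this model evaluates can be written without PyMC3 or Theano, which may help when checking a port to another library. The sketch below uses hypothetical helper names. It mirrors `pm.math.switch(switchpoint >= bases, early_rate, late_rate)` with `bases = 0, 1, ..., n - 1`, plus the Poisson log-probability of the observed counts.

```python
import math


def switch_rate(switchpoint, early_rate, late_rate, n):
    """Per-position rate: early_rate where switchpoint >= position, else late_rate.

    Mirrors pm.math.switch(switchpoint >= bases, early_rate, late_rate)
    with bases = 0, 1, ..., n - 1.
    """
    return [early_rate if switchpoint >= b else late_rate for b in range(n)]


def poisson_loglik(data, rates):
    """Sum of Poisson log-pmfs: k*log(rate) - rate - log(k!) for each count k."""
    return sum(k * math.log(r) - r - math.lgamma(k + 1)
               for k, r in zip(data, rates))
```

For example, `switch_rate(2, 1.0, 5.0, 5)` returns `[1.0, 1.0, 1.0, 5.0, 5.0]`. The comparison is inclusive, so the switchpoint position itself still uses the early rate.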

I'm running the script in an HPC environment and launching it many times (approximately 10K runs) from a wrapper script that supplies the arguments for each run. Hence #1174 won't apply to my case. Each instance of the script calls the PyMC3 function thousands of times. As a quick fix, could somebody also show me how to port this code to PyMC2?
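On the wrapper side, the same idea can be applied without editing the script. Each concurrently running worker gets its own `base_compiledir`, and that directory is reused by the next job on the same worker so compiled modules stay cached across runs. This is a sketch under several assumptions: the wrapper launches runs locally with `subprocess`, each run takes one command-line argument, and `command_for`, `run_env` and `run_all` are hypothetical names. On a batch scheduler the equivalent would be setting `THEANO_FLAGS` per job.

```python
import os
import queue
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def command_for(arg, script="G4_Seq_QG_overlap_switchpoint.py"):
    # Hypothetical: one command-line argument per run
    return [sys.executable, script, str(arg)]


def run_env(slot, root):
    """Copy of the current environment with a per-slot Theano compiledir."""
    env = dict(os.environ)
    base = os.path.join(root, "slot_%d" % slot)
    os.makedirs(base, exist_ok=True)
    flag = "base_compiledir=" + base
    existing = env.get("THEANO_FLAGS", "")
    env["THEANO_FLAGS"] = existing + "," + flag if existing else flag
    return env


def run_all(commands, n_slots=4, root=None):
    """Run commands at most n_slots at a time; no two running jobs share a cache.

    Returns the cache root directory and the exit codes in input order.
    """
    if root is None:
        root = tempfile.mkdtemp(prefix="theano_slots_")
    free_slots = queue.Queue()
    for slot in range(n_slots):
        free_slots.put(slot)

    def run_one(cmd):
        slot = free_slots.get()  # claim a compiledir that no running job is using
        try:
            return subprocess.run(cmd, env=run_env(slot, root)).returncode
        finally:
            free_slots.put(slot)  # release it for the next job

    with ThreadPoolExecutor(max_workers=n_slots) as pool:
        return root, list(pool.map(run_one, commands))
```

For example, `run_all([command_for(a) for a in args], n_slots=8)` (with `args` standing in for the wrapper's own argument list) runs at most eight instances at once. Each instance has its own cache directory, so runs never contend for the same compile lock.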

Thank you

Best regards,
Parashar
