Description
Hi. I'm seeing divide-by-zero and overflow warnings under certain circumstances. They can occur when a data column is all zeros and Corex is run with a Gaussian marginal and smooth marginals turned off.
E.g.:
import numpy as np
from corex import Corex  # assumes corex.py from bio_corex is importable

X = np.array([[0,0,0,1,1], # A matrix with rows as samples and columns as variables.
              [0,0,0,1,1],
              [0,1,1,0,0],
              [0,1,1,0,0]])
layer1 = Corex(n_hidden=2, dim_hidden=2, marginal_description='gaussian', smooth_marginals=False, n_repeat=10, n_cpu=10)
layer1.fit(X) # Fit on data.
This produces the following output (it does not happen on every run, because it depends on the random numbers drawn):

I had a look through the code and did some testing, and this error seems to happen in the marginal_p function when sig = 0. As a proposed solution, I suggest clipping the minimum value of sig_ml to 0.5 in the estimate parameters function. I don't have a good a priori reason to pick the value 0.5, but it seems to work well in tests. I have implemented this solution in this fork: https://github.com/jpkrooney/bio_corex
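To illustrate why sig = 0 causes trouble, here is a minimal standalone sketch. It is not the actual Corex code: `gaussian_log_density` and `estimate_sigma` are hypothetical stand-ins for the Gaussian density evaluated in marginal_p and for the sig_ml estimate, and I am assuming sig_ml is essentially the standard deviation of the column.

```python
import numpy as np

def gaussian_log_density(x, mu, sig):
    # Log of the normal density N(x; mu, sig^2).
    # Hypothetical stand-in, not the library's marginal_p.
    return -0.5 * ((x - mu) / sig) ** 2 - np.log(sig) - 0.5 * np.log(2 * np.pi)

def estimate_sigma(column, min_sig=None):
    # Maximum-likelihood standard deviation, optionally floored at min_sig.
    # Hypothetical stand-in for the sig_ml estimate.
    sig_ml = np.std(column)
    if min_sig is not None:
        sig_ml = np.maximum(sig_ml, min_sig)
    return sig_ml

column = np.zeros(4)  # an all-zero data column, like column 0 of X
mu = column.mean()

# Without a floor, sig is exactly 0: (x - mu) / sig is 0/0 and log(sig) is -inf,
# so numpy emits the warnings (silenced here) and the result is not finite.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = gaussian_log_density(column, mu, estimate_sigma(column))

# With the 0.5 floor proposed above, every value stays finite.
clipped = gaussian_log_density(column, mu, estimate_sigma(column, min_sig=0.5))

print(np.isfinite(raw).all(), np.isfinite(clipped).all())
```

Any strictly positive floor avoids the non-finite values in this sketch; 0.5 is simply the value I tested with.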
Note that the issue also seems to allow negative TCS values to occur:

After implementing the 0.5 minimum value for sig:

The second run, with the 0.5 limit in place, also converges in fewer iterations.
I will submit my fork as a PR, but I encourage you to do more testing too!
Greg, I wonder if this might partly explain the issues you noticed with zero-inflated data?