Divide by zero and overflow bug #19

@jpkrooney

Hi. I'm seeing divide-by-zero and overflow warnings under certain circumstances. They can occur when a data column contains all zeros and Corex is run with a Gaussian marginal description and smooth marginals turned off.
E.g.:

import numpy as np
from corex import Corex  # assumes corex.py from bio_corex is on the import path

X = np.array([[0,0,0,1,1], # A matrix with rows as samples and columns as variables.
              [0,0,0,1,1], # Note that the first column is all zeros.
              [0,1,1,0,0],
              [0,1,1,0,0]])

layer1 = Corex(n_hidden=2, dim_hidden=2, marginal_description='gaussian', smooth_marginals=False, n_repeat=10, n_cpu=10)
layer1.fit(X)  # Fit on data.

This produces the following output (it does not happen on every run, because it depends on the random initialization):
Screenshot 2020-04-25 at 10 04 47

I had a look through the code and did some testing, and this error seems to happen in the marginal_p function when sig = 0. As a proposed solution, I suggest clipping the minimum value of sig_ml to 0.5 in the estimate parameters function. I don't have a good a priori reason to pick the value 0.5, but it seems to work well in tests. I have implemented this solution in this fork: https://github.com/jpkrooney/bio_corex
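To illustrate the failure mode outside of Corex, here is a minimal sketch in plain NumPy. It is not the library's code: `gaussian_log_density` and `weighted_mean_and_std` are hypothetical helpers written for this illustration, and the uniform weights are an assumed stand-in for soft assignments to a hidden state. It shows that a maximum-likelihood standard deviation estimated on an all-zero column is exactly 0, that evaluating a Gaussian log-density with that value yields NaN, and that flooring the standard deviation at 0.5 keeps the result finite.

```python
import numpy as np

def gaussian_log_density(x, mu, sig):
    """Elementwise log of the normal density N(x; mu, sig^2)."""
    return -0.5 * ((x - mu) / sig) ** 2 - np.log(sig) - 0.5 * np.log(2 * np.pi)

def weighted_mean_and_std(x, weights, min_sig=0.0):
    """Weighted maximum-likelihood mean and std; the std is floored at min_sig."""
    mu = np.average(x, weights=weights)
    sig = np.sqrt(np.average((x - mu) ** 2, weights=weights))
    return mu, max(sig, min_sig)

column = np.zeros(4)        # an all-zero data column, like the first column of X
weights = np.full(4, 0.25)  # hypothetical soft assignments (assumption)

# Without a floor: sig is exactly 0, so the density evaluates 0/0 and log(0).
mu, sig = weighted_mean_and_std(column, weights)
with np.errstate(divide="ignore", invalid="ignore"):  # silence the warnings here
    unclipped = gaussian_log_density(column, mu, sig)

# With the proposed floor of 0.5 the log-density stays finite.
mu_c, sig_c = weighted_mean_and_std(column, weights, min_sig=0.5)
clipped = gaussian_log_density(column, mu_c, sig_c)

print("sig without floor:", sig, "-> log-density:", unclipped)
print("sig with floor:   ", sig_c, "-> log-density:", clipped)
```

The floor value is a tuning choice rather than something derived from the data. A smaller floor would stay closer to the maximum-likelihood estimate, but it would also give a much sharper density spike on the constant column.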

Note that the issue also seems to allow negative TCS values to occur:
Screenshot 2020-04-25 at 10 18 08
After implementing the 0.5 minimum value for sig:
Screenshot 2020-04-25 at 10 19 38
The second run, with the 0.5 limit in place, also converges in fewer iterations.

I will submit my fork as a PR, but I encourage you to do more testing as well!
Greg, I wonder if this might partly explain the issues you noticed with zero-inflated data?
