[FIX] Fix Chi2 computation for variables with values with no instances #2031

Merged
astaric merged 1 commit into biolab:master from jerneju:ghissue-2020
Feb 21, 2017
jerneju (Contributor) commented Feb 20, 2017

The chi-squared statistic is nan when a variable declares values that have no instances in the data. The expected count for such a value is zero, so computing the residual divides zero by zero. The residual should be 0 in that case.

  • Code changes
  • Tests
  • Documentation
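
A minimal sketch of how the nan arises and how masking it restores a finite statistic. It uses plain NumPy with a made-up 2x3 contingency table whose last column stands for a declared value that never occurs; the variable names are illustrative and are not taken from the Orange code.

```python
import numpy as np

# Hypothetical contingency table: the last column is a declared value
# with no instances in the data.
observed = np.array([[2.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0]])
n = observed.sum()

# Expected counts under independence: row total * column total / n.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# The empty column gives 0 / sqrt(0), which evaluates to nan.
# errstate only silences the NumPy warnings for this demonstration.
with np.errstate(divide="ignore", invalid="ignore"):
    residuals = (observed - expected) / np.sqrt(expected)

chi2_before = np.sum(residuals ** 2)   # nan propagates into the sum

# Same idea as the fix in this PR: treat undefined residuals as 0.
residuals[np.isnan(residuals)] = 0
chi2_after = np.sum(residuals ** 2)    # finite again
```

Setting the residual to 0 means the empty value contributes nothing to the statistic, which matches the intent described above.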

codecov-io commented Feb 20, 2017

Codecov Report

Merging #2031 into master will decrease coverage by 1.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2031      +/-   ##
==========================================
- Coverage    70.7%   69.69%   -1.01%     
==========================================
  Files         343      343              
  Lines       54469    54478       +9     
==========================================
- Hits        38510    37967     -543     
- Misses      15959    16511     +552

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9f7a94a...0d67dc1.

self.residuals = \
    (self.observed - self.expected) / np.sqrt(self.expected)
where_are_NaNs = np.isnan(self.residuals)
self.residuals[where_are_NaNs] = 0
gh-2031
Check that chi-square can be calculated when a variable has values with no instances in the data.
"""
tempdir = tempfile.mkdtemp()
Instead of creating files in code, you could either commit the file along with the test or, even better, create the table directly with something along the lines of:

a = Orange.data.DiscreteVariable("a", values=["y", "n"])
b = Orange.data.DiscreteVariable("b", values=["y", "n", "other"])
t = Orange.data.Table(Orange.data.Domain([a, b]), list(zip("yynny", "ynyyn")))
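
To see why this small table exercises the bug, here is a sketch that counts the same five rows with plain NumPy. It is a stand-in for illustration, not Orange's contingency code; `a_values`, `b_values`, and `observed` are names chosen for the example.

```python
import numpy as np

# Declared values of the two variables from the suggestion above.
a_values = ["y", "n"]
b_values = ["y", "n", "other"]

# The same five rows: zip pairs the i-th characters of both strings.
rows = list(zip("yynny", "ynyyn"))

# Count co-occurrences: one row per value of a, one column per value of b.
observed = np.zeros((len(a_values), len(b_values)))
for a_val, b_val in rows:
    observed[a_values.index(a_val), b_values.index(b_val)] += 1

# "other" is declared for b but never appears, so its column total is 0.
column_totals = observed.sum(axis=0)
```

Because the column total for "other" is zero, every expected count in that column is zero as well, which is the case the test needs to cover.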

…which suppose to be

Check whether the residuals array contains NaN and replace those entries with 0.

- [X] Code changes
- [X] Tests
- [ ] Documentation
astaric changed the title from "[FIX] ghissue-2020 Chisq not calculated when there are no attributes …" to "[FIX] Fix Chi2 computation for variables with values with no instances" Feb 21, 2017
astaric merged commit 4a22791 into biolab:master Feb 21, 2017
jerneju deleted the ghissue-2020 branch April 20, 2017 13:45