
[ENH] CN2 Rules: prefer equality (with optional restriction) for categorical variables #7128

Merged
markotoplak merged 2 commits into biolab:master from janezd:cn2-restrict-equality on Jul 25, 2025


janezd (Contributor) commented Jul 13, 2025

Issue

Fixes #7120.

Description of changes

I haven't written any tests, because there are none so far; I would consider adding them a separate project.

As for documentation: I can add it, but only after we agree on this PR.

Includes
  • Code changes
  • Tests
  • Documentation

codecov bot commented Jul 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.77%. Comparing base (5eb97b8) to head (e4b9fff).
⚠️ Report is 60 commits behind head on master.

Additional details and impacted files
```diff
@@           Coverage Diff           @@
##           master    #7128   +/-   ##
=======================================
  Coverage   88.77%   88.77%           
=======================================
  Files         334      334           
  Lines       73671    73695   +24     
=======================================
+ Hits        65402    65425   +23     
- Misses       8269     8270    +1     
```

janezd force-pushed the cn2-restrict-equality branch from 3d3209c to db3d9a2 on July 18, 2025 08:51

markotoplak (Member) commented Jul 18, 2025

Some tests already exist: Orange/tests/test_rules.py and Orange/widgets/model/tests/test_owrulesclassification.py.

markotoplak (Member) commented Jul 18, 2025

I think I may have a more elegant solution to the original problem. When the best rule is chosen from a set of candidate rules, ties in quality score always go to the first one.
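A minimal illustration of why a "first among equals" tie-break makes generation order matter. It does not reproduce Orange's selection code; it uses Python's built-in max(), which likewise returns the first maximal item it encounters. The attribute name and the quality scores are made up for illustration.

```python
# Hypothetical quality scores for two candidate selectors. On a two-valued
# attribute such as hair (values "no"/"yes"), "hair != no" and
# "hair == yes" cover exactly the same examples, so any quality measure ties.
quality = {"hair != no": 0.85, "hair == yes": 0.85}

# max() returns the first maximal item it encounters, so whichever
# candidate was generated first wins the tie.
best_old = max(["hair != no", "hair == yes"], key=quality.get)
best_new = max(["hair == yes", "hair != no"], key=quality.get)
print(best_old)  # hair != no
print(best_new)  # hair == yes
```

Generating the equality selector first is therefore enough to make it win every such tie, without changing any quality measure.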

Now, what if we simply change the order in which rules are generated? Currently the code loops over values and, for each value, over operators. If we loop over operators first and values second, the selection algorithm downstream will prefer the == operator.

On the zoo dataset, this approach gives the same result as enabling the new option. Here is my change:

```diff
-                for val in np.unique(X[:, i]):
+                for op in disc_operators:
                     possible_selectors += (
                         Selector(column=i, op=op, value=val)
-                        for op in disc_operators)
+                        for val in np.unique(X[:, i]))
```

The downside is that this approach is not backward compatible: the same settings would induce different rules.
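A small sketch of the two loop orders from the diff, to show which selectors come first. It assumes the operators are the strings "==" and "!=" (in that order) and uses a NamedTuple stand-in for Selector; Orange's actual disc_operators and Selector may be defined differently, and the helper names value_first, operator_first and labels are hypothetical.

```python
from typing import NamedTuple

import numpy as np


class Selector(NamedTuple):
    """Hypothetical stand-in for the Selector used in the diff."""
    column: int
    op: str
    value: float


def value_first(X, i, disc_operators):
    # Old order: for each value, emit one selector per operator.
    return [Selector(column=i, op=op, value=val)
            for val in np.unique(X[:, i])
            for op in disc_operators]


def operator_first(X, i, disc_operators):
    # New order: emit all selectors for the first operator, then the next.
    return [Selector(column=i, op=op, value=val)
            for op in disc_operators
            for val in np.unique(X[:, i])]


def labels(selectors):
    # Compact "op value" strings, e.g. "==1", for printing.
    return [f"{s.op}{s.value:g}" for s in selectors]


X = np.array([[0.0], [1.0], [1.0], [0.0]])  # one categorical column, values 0 and 1
ops = ["==", "!="]
print(labels(value_first(X, 0, ops)))     # ['==0', '!=0', '==1', '!=1']
print(labels(operator_first(X, 0, ops)))  # ['==0', '==1', '!=0', '!=1']
```

With only two values, "!=0" selects the same rows as "==1". In the old order "!=0" is generated first and wins the tie; in the new order "==1" precedes it, which is also why the same settings can now yield different rules.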

janezd (Contributor, Author) commented Jul 20, 2025

Thanks for a great find. I'd keep both.

  • As the OP of the issue points out, rules that use inequality are less "actionable"; even if an attribute has five values, we should let the user prohibit inequality.
  • But your suggestion would give priority to equality, and should always be enabled. I don't care too much about the lost backward compatibility here.
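Keeping both ideas could look roughly like this: operators form the outer loop so == is always preferred, and an optional switch drops != entirely. This is a sketch only; only_equality and discrete_selectors are hypothetical names, not the option or function actually added in this PR, and selectors are plain (column, op, value) tuples here.

```python
import numpy as np


def discrete_selectors(X, i, only_equality=False):
    """Yield hypothetical (column, op, value) candidates for column i.

    When only_equality is set, "!=" conditions are never generated.
    Otherwise "==" still comes first because operators form the outer loop.
    """
    disc_operators = ["=="] if only_equality else ["==", "!="]
    for op in disc_operators:
        for val in np.unique(X[:, i]):
            yield (i, op, float(val))


X = np.array([[0.0], [1.0], [2.0]])
print(list(discrete_selectors(X, 0, only_equality=True)))
# [(0, '==', 0.0), (0, '==', 1.0), (0, '==', 2.0)]
print(len(list(discrete_selectors(X, 0))))  # 6
```

The restriction simply shrinks the candidate set, so it composes with the reordering rather than replacing it.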

markotoplak (Member) commented:

> But your suggestion would give priority to equality, and should always be enabled. I don't care too much about the lost backward compatibility here.

I also don't think compatibility is an issue here.

markotoplak changed the title from "CN2 Rules: Add restriction to == for categorical variables" to "[ENH] CN2 Rules: prefer equality (with optional restriction) for categorical variables" on Jul 25, 2025
markotoplak merged commit 8dedb48 into biolab:master on Jul 25, 2025
22 of 30 checks passed

Development

Successfully merging this pull request may close these issues.

Option to disable "not equals" conditions in CN2 Rule Induction

2 participants