Skip to content

[RFC] Domain: return all variables#2706

Closed
jerneju wants to merge 2 commits intobiolab:masterfrom
jerneju:domain-variables
Closed

[RFC] Domain: return all variables#2706
jerneju wants to merge 2 commits intobiolab:masterfrom
jerneju:domain-variables

Conversation

@jerneju
Copy link
Contributor

@jerneju jerneju commented Oct 24, 2017

Issue

Due to historical background Orange Data Mining has issues:

  1. domain.variables are attributes and classes but not metas
  2. There is no such thing as domain.something where that would be attributes, classes, and metas.
  3. Yes, some_meta in domain may return True, but when one would iterate over domain it does not iterate over metas.
  4. Iteration over attributes, classes, and metas or over all primitive features including metas would be useful in many widgets.

See also:
#2699 (comment)
#2699 (comment)

Description of changes
  • PrimitiveVariable (created only to be used for separating primitive and non primitive variables).

  • Domain method which returns list of all features. This method can be called with an argument which specifies which features to return (all by default).

Includes
  • Code changes
  • Tests
  • Documentation

@codecov-io
Copy link

codecov-io commented Oct 24, 2017

Codecov Report

Merging #2706 into master will decrease coverage by <.01%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master    #2706      +/-   ##
==========================================
- Coverage    75.9%   75.89%   -0.01%     
==========================================
  Files         338      338              
  Lines       59594    59598       +4     
==========================================
- Hits        45233    45232       -1     
- Misses      14361    14366       +5

@astaric
Copy link
Member

astaric commented Oct 24, 2017

Great. Now please update the description of the pull request with the description of the problem (what are we trying to solve) and the proposed way to fix it. If there are alternative ways we could do this, list those as well so we can discuss and find out which of them should be implemented.

@astaric
Copy link
Member

astaric commented Oct 24, 2017

Which of the issues does this PR solve? If I understand correctly, problems 1. and 3. remain even after the fix.

Furthermore, I do not like the addition of a new "base class". Variable already has method is_primitive. Why not use that?

Are there any other ways this could be solved? What are their (dis)advantages? Why should use the fix in this PR and not one of the alternative approaches?

@janezd
Copy link
Contributor

janezd commented Oct 24, 2017

Furthermore, I do not like the addition of a new "base class". Variable already has method is_primitive. Why not use that?

So you can call domain.all(PrimitiveVariable). (Or domain.whatever(PrimitiveVariable).) That's the whole purpose - so you no longer need (var for var in chain(domain, domain.metas) if var.is_primitive).

@astaric
Copy link
Member

astaric commented Oct 24, 2017

domain.all(Variable.is_primitive)?

@janezd
Copy link
Contributor

janezd commented Oct 24, 2017

class Domain:
    ...
    def all(self, pred):
       return [var for var in chain(self, self.metas) if pred(var)]

Interesting. And it can be called all.

@astaric
Copy link
Member

astaric commented Oct 24, 2017

I would still like to know what are the alternatives. The correct way IMHO would be to fix __iter__.

@kernc
Copy link
Contributor

kernc commented Oct 24, 2017

👍² for fixing __iter__.

The predicate nature of input parameter is precisely why .filter() or .where() could work.

👎 for .all(), which has a well-established meaning throughout.

@janezd
Copy link
Contributor

janezd commented Oct 24, 2017

Fixing __iter__: I enthusiastically agree, but it will be difficult to find all places that need to be adapted.

all: sorry, I meant filter. If __iter__ is fixed, there may be no need for a filter method. If __iter__ stays broken, filter method would help.

@jerneju
Copy link
Contributor Author

jerneju commented Oct 25, 2017

Yes, fixing __iter__ would be great. @janezd ?

@janezd
Copy link
Contributor

janezd commented Oct 25, 2017

Yes, fixing iter would be great. @janezd?

Me? No chance. Too risky. I've no idea how to scan through all current iterations through Domain. Any place I miss would add metas in a place where only variables and classes are expected, possibly adding non-primitive attributes to a method that deals only with primitive ones.

Besides, add-ons do this, too.

The first step towards this would be adding a warning to the existing Domain.__iter__ and leaving it there for half a year.

@jerneju
Copy link
Contributor Author

jerneju commented Oct 26, 2017

I propose:

  1. Adding filter method.
  2. Changing iteration over domain to iteration over domain.variables wherever we can find these occurrences.
  3. Adding warning to the __iter__ method.
  4. Changing __iter__ someday in the future.

@astaric
Copy link
Member

astaric commented Oct 26, 2017

I am not too fond of adding methods we plan to remove in half a year (once __iter__ starts working the way it is supposed to).

@astaric
Copy link
Member

astaric commented Oct 26, 2017

@janezd: #2714 replaces all (tested) iterations over domain with iteration over domain.variables

@janezd
Copy link
Contributor

janezd commented Oct 27, 2017

I still like filter. It's not a temporary solution: domain.filter(DiscreteVariable) is so much shorter and simpler than (var for var in domain if isinstance(var, DiscreteVariable).

I don't mind if it's called domain.get_variables_by_type(DiscreteVariable) (OK, this is stupid), but I think a function like this would be generally useful.

@kernc
Copy link
Contributor

kernc commented Oct 27, 2017

for var in domain if var.is_discrete.

To be more generally useful, .filter() would need to accept a predicate.(builtins.filter())

@janezd
Copy link
Contributor

janezd commented Oct 27, 2017

filter(Variable.is_discrete, domain)? Not too happy, but OK. We close this PR?

@kernc
Copy link
Contributor

kernc commented Oct 27, 2017

Why aren't you happy? What's wrong with for var in domain if var.is_discrete?

@janezd
Copy link
Contributor

janezd commented Oct 27, 2017

You mean (var for var in domain if var.is_discrete)? domain.filter(DiscreteVariable) is shorter and more readable (under reasonable assumptions that one can easily guess what Domain.filter does).

But I'm OK with (builtin) filter(Variable.is_discrete, domain) if you don't want to multiply the number of Domain's methods.

@astaric
Copy link
Member

astaric commented Oct 27, 2017

I really really really dislike the artificial PrimitiveVariable base class. Does domain.filter make sense without it?

@kernc
Copy link
Contributor

kernc commented Oct 27, 2017

We already have Variable.is_primitive ...

@janezd
Copy link
Contributor

janezd commented Oct 27, 2017

I really really really dislike the artificial PrimitiveVariable base class. Does domain.filter make sense without it?

Not much. But it's not so artificial: Variable can be either PrimitiveVariable or NonPrimitiveVariable. Primitive can be Discrete or Continuous. There is a hierarchy. There are just no specific common methods for PrimitiveVariable. Think about PrimitiveVariable as an ABC.

We already have Variable.is_primitive

... which can be used as a predicate if filter also supports predicates -- which it should, anyway.

@janezd
Copy link
Contributor

janezd commented Oct 27, 2017

Assuming we will some day fix Domain.__iter__, we will be able to use filter(Variable.is_primitive, domain).

With this PR we can already have domain.filter(Variable.is_primitive) or domain.filter(PrimitiveVariable).

Without any obvious advantage of the latter over the former, I'm applying Occam here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants