[ENH] Enable multitarget problem types for OWTestAndScore and OWPredictions #5848
markotoplak merged 4 commits into biolab:master
Codecov Report
@@ Coverage Diff @@
## master #5848 +/- ##
==========================================
- Coverage 86.29% 86.28% -0.02%
==========================================
Files 315 315
Lines 66830 66884 +54
==========================================
+ Hits 57674 57712 +38
- Misses 9156 9172 +16
janezd left a comment
I read the code to see the idea. My comments refer to what I spotted, and do not mean I like the idea. (I don't. :)
The problem is not the idea itself. What I dislike is that it is rather a patch over a bad overall design. We should consider some deeper changes, though I fear they lead towards discussing finally moving to pandas.
I think that in light of survival analysis and similar problems, we should consider multiple roles of variables. Currently, a variable can be an independent variable, a dependent variable or a meta, and they are stored in different matrices. We had another type (a weight), which was supposed to be unique (e.g. you cannot choose between multiple weight variables on the same data). You propose to add another role...
Going pandas would mean abandoning X, Y and metas as permanently materialized matrices, and instead having a column-based representation. At the same time, every column could be assigned a role. class_var would then be a property that returns the variable that is assigned a "target role".
We need to decide whether to continue patching or bite into pandas.
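One way to picture the role-based design raised in this comment is the minimal sketch below. It is not Orange's API and does not use pandas itself: `Role`, `RoleTable`, `add_column` and `with_role` are hypothetical names, and returning `None` from `class_var` when there is not exactly one target is an assumption made for the illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Hypothetical set of roles a column can take."""
    FEATURE = "feature"
    TARGET = "target"
    META = "meta"
    WEIGHT = "weight"


@dataclass
class RoleTable:
    """Hypothetical column-based table: data and roles are stored per column."""
    columns: dict = field(default_factory=dict)  # column name -> list of values
    roles: dict = field(default_factory=dict)    # column name -> Role

    def add_column(self, name, values, role):
        self.columns[name] = list(values)
        self.roles[name] = role

    def with_role(self, role):
        # Column names carrying the given role, in insertion order.
        return [name for name, r in self.roles.items() if r is role]

    @property
    def class_vars(self):
        return self.with_role(Role.TARGET)

    @property
    def class_var(self):
        # Assumption: a single target is returned, otherwise None.
        targets = self.class_vars
        return targets[0] if len(targets) == 1 else None


table = RoleTable()
table.add_column("age", [61, 47, 58], Role.FEATURE)
table.add_column("time", [12.0, 30.5, 7.2], Role.TARGET)
single_target = table.class_var          # one column has the target role
table.add_column("event", [1, 0, 1], Role.TARGET)
survival_targets = table.class_vars      # two target columns, as in survival data
```

In such a layout, a survival-specific role would be one more `Role` member rather than another matrix.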
Issue
Survival analysis expects two target variables: the first is the time until the event of interest, and the second is the censoring indicator.
Preferably, the Survival Analysis add-on would use the same infrastructure for testing and scoring survival models that is currently in place for classification and regression problems. To achieve this, we need to loosen the constraint of a single target variable for the input data.
Description of changes
The goal of this pull request was not to change the interface to accommodate every future task one might want to support, but to:
Since the registration of Scorers is already implemented, the next step was to find the usable scorers for the given input data. The Scorer base class now holds additional [information](https://github.com/biolab/orange3/compare/master...JakaKokosar:multi_target?expand=1#diff-ebac791194327a764153704a5e2567261585ad3971623c2138be81e8c02b8da5R69) to distinguish built-in Scorers from those implemented through add-ons. If a Scorer is defined as built-in, nothing changes. For non-built-in scorers, we look into the Table attributes to determine the 'problem type' of the input data. For example, for survival analysis, the As Survival Data widget will set the class variables on the output table and set the table's attributes as follows:

{ ..., 'problem_type': 'time_to_event' }

Usable scorers are those whose problem_type matches that of the input data. This is not necessarily the best solution and could use further debate. At this stage, no significant changes to the code base were needed. In theory, everything else should be handled by the Learners, Models and Scorers defined for the related tasks (biolab/orange3-survival-analysis#27).
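The scorer selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this pull request: `Table`, `Scorer`, `Accuracy`, `ConcordanceIndex`, `is_built_in`, `is_compatible` and `usable_scorers` are hypothetical stand-ins, and the `target_kind` check only represents whatever compatibility test built-in scorers already use.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a data table with free-form attributes."""
    target_kind: str                       # e.g. "discrete", "continuous", "multiple"
    attributes: dict = field(default_factory=dict)


class Scorer:
    """Hypothetical scorer base class; attribute names are illustrative."""
    is_built_in = True
    target_kind = None    # used by the placeholder check for built-in scorers
    problem_type = None   # used only by add-on scorers

    @classmethod
    def is_compatible(cls, table):
        # Placeholder for the existing, unchanged check on built-in scorers.
        return cls.target_kind == table.target_kind


class Accuracy(Scorer):
    target_kind = "discrete"


class ConcordanceIndex(Scorer):
    # An add-on scorer declares the problem type it can handle.
    is_built_in = False
    problem_type = "time_to_event"


def usable_scorers(table, scorers):
    """Return the scorers that apply to the given table."""
    problem_type = table.attributes.get("problem_type")
    usable = []
    for scorer in scorers:
        if scorer.is_built_in:
            # Built-in scorers: nothing changes, the existing check applies.
            if scorer.is_compatible(table):
                usable.append(scorer)
        elif problem_type is not None and scorer.problem_type == problem_type:
            # Add-on scorers: match on the table's declared problem type.
            usable.append(scorer)
    return usable


all_scorers = [Accuracy, ConcordanceIndex]
survival_data = Table("multiple", {"problem_type": "time_to_event"})
classification_data = Table("discrete")
for_survival = usable_scorers(survival_data, all_scorers)
for_classification = usable_scorers(classification_data, all_scorers)
```

Here the survival table selects only the add-on scorer, while an ordinary classification table keeps the built-in one.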
Some examples:

Includes