Description
The current major blocker for getting rid of `TableStatisticalModel` wrappers (#32) and the `ModelFrame`/`ModelMatrix` structs is that they also keep a mask for the rows that are actually included from the underlying table. It's necessary to keep these around if (possibly among other things) you want to `predict` back into the original table, since you need to know where the missing values were. So just keeping the `FormulaTerm` around isn't enough to have a completely stand-alone table-to-matrix transformation which can replace `ModelFrame`.
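To make the role of the row mask concrete, here is a minimal sketch in plain Julia (Base only). It does not use the StatsModels API; the column `x`, the names `keep` and `yhat`, and the doubling "model" are all hypothetical stand-ins. It only illustrates why predictions computed on the complete rows cannot be placed back into the original table without knowing which rows were dropped.

```julia
# Hypothetical column with missing values, standing in for a table column.
x = [1.0, missing, 3.0, 4.0, missing]

# Mask of the rows that survive missing-value removal.
keep = .!ismissing.(x)

# Stand-in for "predictions" computed on the complete rows only.
x_complete = collect(skipmissing(x))
yhat_complete = 2 .* x_complete

# Scatter the predictions back into a vector aligned with the original rows.
# Without `keep`, there is no way to know where the three values belong.
yhat = Vector{Union{Missing, Float64}}(missing, length(x))
yhat[keep] = yhat_complete
```

The resulting `yhat` has the same length as `x`, with `missing` in exactly the rows where the input was missing.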
A related issue is that missing values in the output are now also introduced by terms themselves, because of the `lead`/`lag` support.
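As an illustration of how a term can introduce missing values even when the input has none, here is a small sketch. `lag_column` is a hypothetical helper written for this example, not the function StatsModels uses, and it assumes a non-negative shift `n`.

```julia
# Hypothetical helper (not the StatsModels implementation): shift a column
# down by `n` rows, padding the first `n` entries with `missing`.
function lag_column(x::AbstractVector, n::Integer=1)
    out = Vector{Union{Missing, eltype(x)}}(missing, length(x))
    for i in (n + 1):length(x)
        out[i] = x[i - n]
    end
    return out
end

x = [10, 20, 30, 40]       # complete input, no missing values
lagged = lag_column(x, 1)  # first entry has no predecessor, so it is missing
```

The input column is complete, but the lagged column is not, so a mask computed from the table alone would not account for the first row.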
So my current thinking is that we ought to give up on trying to never generate missing values in the output (consistent with #153). Instead, we would set a row to missing wherever a missing value is encountered in any column of the table, and provide some kind of functionality to compute the missings mask if necessary so that consumers can decide what to do. That leads to a situation where the consumer has to do a bit more fiddly book-keeping, and it also means that the terms are not fully stand-alone, but I'm not sure I see an alternative at this point (except for some kind of extremely lazy architecture where the terms actually hold views of the underlying table or something like that; I think @nalimilan was talking about this at ZiF...)
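A rough sketch of what this could look like from the consumer's side, in plain Julia (Base only). Everything here is an assumption made for illustration: the table is a `NamedTuple` of columns, `missing_mask` is a hypothetical helper name, and the two-column matrix (intercept plus `a`) stands in for whatever the terms would actually produce.

```julia
# Hypothetical table, represented as a NamedTuple of column vectors.
table = (y = [1.0, 2.0, missing, 4.0],
         a = [0.5, missing, 1.5, 2.5],
         b = ["u", "v", "w", "x"])

# Hypothetical helper: `true` for every row where any column is missing.
function missing_mask(cols)
    n = length(first(cols))
    mask = falses(n)
    for col in cols
        mask .|= ismissing.(col)
    end
    return mask
end

mask = missing_mask(table)

# Build a matrix with one row per table row; rows flagged by the mask
# stay entirely missing instead of being dropped.
X = Matrix{Union{Missing, Float64}}(missing, length(mask), 2)
X[.!mask, 1] .= 1.0               # intercept column
X[.!mask, 2] = table.a[.!mask]    # predictor column

# The consumer decides what to do, e.g. keep only the complete rows.
X_complete = X[.!mask, :]
```

The book-keeping cost mentioned above shows up in the last step: the consumer has to carry `mask` alongside `X` to map results back to table rows.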