Description
Hi,
As I understand it (now), providing multiple dataframes (i.e. "sequences") is useful, for instance, when training on a single (semantically unique) sequence observed over multiple non-contiguous periods (say, because of gaps with missing data). The state posterior (and transition) probabilities are then computed independently for each dataframe / period, and only afterwards are those probabilities concatenated into one sample for estimating the parameters of the emission and transition models during the M_step.
What I don't understand, though, is why providing the exact same dataframe multiple times (.set_data([df] * nb_repetition)) does NOT give the same fitting results as providing it once (I checked by comparing the sequence's state posterior probabilities after training). The only difference I can see is that the various regressions done during the M_step run on repeated samples in one case and not in the other. How could that lead to different results? Is it somehow linked to the optimization algorithms used during fitting?
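To make my expectation concrete, here is a minimal sketch in plain NumPy. It is not the library's code: `weighted_lstsq` and the random data are made up for illustration, and `w` only stands in for state posterior probabilities. It assumes the M_step regression is an unpenalized weighted least squares problem solved exactly, in which case repeating every sample should leave the coefficients unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3

# Synthetic regression data (hypothetical, for illustration only).
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Per-sample weights, standing in for state posterior probabilities.
w = rng.uniform(0.1, 1.0, size=n)


def weighted_lstsq(X, y, w):
    """Solve min_b sum_i w_i * (y_i - x_i @ b)**2 via a rescaled least squares."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


nb_repetition = 3

# Fit once on the original sample.
coef_once = weighted_lstsq(X, y, w)

# Fit on the same sample stacked nb_repetition times.
coef_rep = weighted_lstsq(
    np.tile(X, (nb_repetition, 1)),
    np.tile(y, nb_repetition),
    np.tile(w, nb_repetition),
)

# Compare the two sets of coefficients up to floating-point tolerance.
print(np.allclose(coef_once, coef_rep))
```

Under those assumptions the two fits agree up to floating-point error, which is why the difference I observe after training surprises me.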
(I reckon that's a weird test I stumbled into by accident.)
Any insight would be appreciated.
Thanks a lot for this and the great work!