Training on multiple IDENTICAL sequences / dataframes


Hi,

As I understand it (now), providing multiple dataframes (i.e. "sequences") is useful, for instance, when training on a single (semantically unique) sequence observed over multiple non-contiguous periods (say, because of periods with missing data). The state posterior (and transition) probabilities are then computed independently for each dataframe / period, and only afterwards are those probabilities concatenated into one sample for estimating the parameters of the emission and transition models during the M_step.
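To make that understanding concrete, here is a minimal NumPy sketch of the E-step as described above. It is not the library's code: `state_posteriors`, `e_step`, and the toy inputs are hypothetical names chosen for illustration, and the emission log-likelihoods are random placeholders.

```python
import numpy as np

def state_posteriors(log_lik, start, trans):
    """Scaled forward-backward for ONE sequence (hypothetical helper).

    log_lik: (T, K) per-step log emission likelihoods
    start:   (K,)   initial state probabilities
    trans:   (K, K) transition matrix, rows sum to 1
    Returns gamma of shape (T, K); each row sums to 1.
    """
    T, K = log_lik.shape
    # Rescale each row before exponentiating; a per-row constant cancels out.
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = start * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def e_step(sequences_log_lik, start, trans):
    """Run forward-backward per sequence, then pool the posteriors."""
    gammas = [state_posteriors(ll, start, trans) for ll in sequences_log_lik]
    return np.concatenate(gammas, axis=0)

rng = np.random.default_rng(0)
start = np.array([0.6, 0.4])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
seq_a = rng.normal(size=(5, 2))  # placeholder log-likelihoods, period A
seq_b = rng.normal(size=(3, 2))  # placeholder log-likelihoods, period B

pooled = e_step([seq_a, seq_b], start, trans)
print(pooled.shape)  # one row per observation across both periods
```

Each period restarts from `start`, so no information flows across the gap between periods; only the pooled posteriors are shared in the parameter update.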

What I don't understand, though, is why providing the exact same dataframe multiple times (.set_data([df] * nb_repetition)) does NOT give the same fitting results as providing it once (I checked by comparing the sequence's state posterior probabilities after training). The only difference is that the various regressions done during the M_step see each sample repeated (or not). How could that lead to different results? Is it somehow linked to the optimization algorithms used during fitting?
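The expectation behind this question can be checked in isolation. The sketch below assumes a closed-form weighted least-squares regression, which may not be what the library actually uses; all function names are hypothetical. It shows that duplicating every sample leaves the closed-form solution unchanged, whereas a fixed ridge penalty (`alpha`, a made-up value) breaks that invariance. Whether any such penalty, or a solver stopping tolerance, is involved in the library's fit is only a guess.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Closed-form WLS: solve (X' W X) beta = X' W y."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def weighted_ridge(X, y, w, alpha):
    """Same as above plus a fixed L2 penalty alpha (hypothetical setting)."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw + alpha * np.eye(X.shape[1]), Xw.T @ y)

rng = np.random.default_rng(0)
n, repeats = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)
w = rng.uniform(0.1, 1.0, size=n)  # stand-in for state posteriors

# Stack the same data `repeats` times, as [df] * nb_repetition would.
X_rep = np.tile(X, (repeats, 1))
y_rep = np.tile(y, repeats)
w_rep = np.tile(w, repeats)

beta_once = weighted_least_squares(X, y, w)
beta_rep = weighted_least_squares(X_rep, y_rep, w_rep)
ridge_once = weighted_ridge(X, y, w, alpha=1.0)
ridge_rep = weighted_ridge(X_rep, y_rep, w_rep, alpha=1.0)

print("unpenalized fits agree:", np.allclose(beta_once, beta_rep))
print("penalized fits agree:  ", np.allclose(ridge_once, ridge_rep))
```

In the unpenalized case both sides of the normal equations scale by the same factor, so the solution is identical. With a fixed penalty, repeating the data shrinks the penalty's relative weight, so the coefficients move.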

(I realize that's a weird test, one I stumbled into by accident.)

Any insight would be appreciated.
Thanks a lot for this and the great work!

Training on multiple IDENTICAL sequences / dataframes #43
