
hv10 commented Nov 22, 2024

This is an attempt to fix #291.

After adding a test that reproduces my specific issue, the solution I came up with is to avoid the division by zero by adding OnlineStats.$\epsilon$ to the offending lines.

The only downside:
The basic test has to be modified to accept a result that is slightly off (by $\epsilon$) instead of an exact match.

Aside from this, all other tests pass.
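For readers skimming the thread, here is a minimal sketch of the general idea, not the actual diff. It is written in plain Python rather than Julia, and `EPS`, `norm`, and `normalize_guarded` are hypothetical names; the value of `EPS` is an assumption and not necessarily what OnlineStats uses.

```python
import math

# Hypothetical stand-in for the library's small constant; the real value may differ.
EPS = 1e-10


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def normalize_guarded(v, eps=EPS):
    """Scale v by 1 / (||v|| + eps) so that a zero vector stays finite."""
    n = norm(v)
    return [x / (n + eps) for x in v]


unit = normalize_guarded([3.0, 4.0])          # very close to [0.6, 0.8], but not exact
guarded_zero = normalize_guarded([0.0, 0.0])  # stays [0.0, 0.0] instead of failing
```

The price of the guard is visible in `unit`: the result is no longer exactly normalized, only normalized up to an error on the order of `eps`.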
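To illustrate what the relaxed test amounts to, here is a small sketch in plain Python. The helper `approx_equal`, the tolerance, and the sample numbers are all assumptions made for illustration; they are not taken from the OnlineStats test suite.

```python
import math


def approx_equal(xs, ys, abs_tol=1e-6):
    """Element-wise comparison that tolerates a small absolute error."""
    return len(xs) == len(ys) and all(
        math.isclose(x, y, abs_tol=abs_tol) for x, y in zip(xs, ys)
    )


expected = [0.6, 0.8]
# A small epsilon in the denominator shifts the result slightly away from the exact value.
computed = [3.0 / (5.0 + 1e-10), 4.0 / (5.0 + 1e-10)]

exact_match = computed == expected                 # False: the values differ in the last digits
tolerant_match = approx_equal(computed, expected)  # True: within tolerance
```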

joshday (Owner) commented Jan 14, 2025

This is probably fine, but why do NaNs occur when fitting the same vector? I might want this behavior to be opt-in if fitting the same vector is breaking assumptions.

hv10 (Author) commented Jan 14, 2025

When fitting the same vector twice, the variance between the two observations is $0.0$, so all variance can be explained by simply repeating the original vector as the principal component (when starting from a freshly initialized CCIPCA).
This should not be an issue for datasets where we expect at least some level of measurement noise (leading to a natural spread in the variables, and therefore variance), even if we measure the same underlying value twice.
In my case, though, one of my time series has no measurement noise and holds a constant value for a number of time steps, which leads to this issue.
As the CCIPCA method has a parameter describing its forgetfulness, this can even happen after a number of observations have already been fitted, if we observe the same value for long enough.
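A rough way to see this numerically is sketched below. It assumes a deflation-style update, where each further component is fitted to what remains of the observation after projecting out the earlier components. This is a simplified illustration in plain Python, not the OnlineStats implementation, and `dot` and `deflate` are hypothetical helpers.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def deflate(u, v):
    """Remove from u its projection onto v (v must be nonzero)."""
    coeff = dot(u, v) / dot(v, v)
    return [x - coeff * y for x, y in zip(u, v)]


observation = [3.0, 4.0]
# Assumption: after a single observation the first component is that observation itself.
component = observation[:]

# Fitting the identical vector again leaves nothing for a second component.
residual = deflate(observation, component)
residual_norm = math.sqrt(dot(residual, residual))

# Normalizing the zero residual is a 0/0 division. Python raises here;
# with IEEE floats as used in Julia, 0.0 / 0.0 evaluates to NaN instead.
try:
    direction = [x / residual_norm for x in residual]
    unguarded_ok = True
except ZeroDivisionError:
    unguarded_ok = False
```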

I think the current implementation makes no assumption about the underlying data that would disallow fitting repeated observations.

Also, I have not checked whether the issue occurs when only one variable has variance $0$.

Alternatively, one could consider only "applying" an update to the CCIPCA when it does not lead to an eigenvalue of $0.0$.
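A sketch of what such a guard could look like, again in plain Python and purely hypothetical: `guarded_update` and `tol` are made-up names, and it assumes the eigenvalue estimate is the norm of the unnormalized component vector.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def guarded_update(current, candidate, tol=1e-12):
    """Accept the candidate component only if its norm is clearly nonzero.

    Assumption: the norm of the candidate plays the role of the eigenvalue
    estimate, so a (near-)zero norm marks a degenerate update to be skipped.
    """
    return candidate if norm(candidate) > tol else current


kept = guarded_update([1.0, 0.0], [0.0, 0.0])      # degenerate update is skipped
replaced = guarded_update([1.0, 0.0], [0.0, 2.0])  # regular update is applied
```

Compared with adding $\epsilon$, this would leave existing results exact, at the cost of silently ignoring some observations.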


Development

Successfully merging this pull request may close these issues.

Encountering NaN when fitting Vectors with CCIPCA
