Commit a36bf0b

Added open issue of TS Goodness Of Fit calculations
1 parent cd90f7c commit a36bf0b

1 file changed

+40
-4
lines changed

docs/Mathematical_Foundations.md

Lines changed: 40 additions & 4 deletions
@@ -165,10 +165,6 @@ Further improvements to the implementation of the Theil-Sen is an active topic o
165165

166166
See [`/app/engine/utils/MovingWindowRegressor.js`](../app/engine/utils/MovingWindowRegressor.js)
167167

168-
## Open design issues
169-
170-
@@@@@
171-
172168
### Choices for the specific algorithms
173169

174170
#### Implementation choices in the Theil-Sen regression
@@ -198,6 +194,46 @@ Comparison across these tables shows that using the Goodness Of Fit is needed to
198194

199195
Finding a better approximation algorithm that ignores outlying values while maintaining the true data responsiveness is a subject for further improvement.
200196

197+
## Open issues, known problems and regrettable design decisions
198+
199+
### Using iteration instead of running sums for Theil-Sen Goodness Of Fit
200+
201+
Currently, both Theil-Sen regressors (i.e. `TSLinearSeries.js` and `TSQuadraticSeries.js`) iterate over the datapoints in the flank to determine the sse and sst for the Goodness Of Fit calculation. An alternative approach is to use running sums. This approach has the major benefit that running sums are less CPU-intensive to maintain and do not force a sudden large calculation. The key concern is the dragfactor calculation at the end of a recovery, which iterates over a large collection (often over 200 datapoints for a Concept2 RowErg) and thus causes a significant workload at one specific moment. Running sums that are maintained throughout the recovery should keep a lower profile, as much of the work is done in small pieces throughout the recovery phase.
202+
203+
For linear regression, sse and sst are defined as:
204+
205+
$$ sse = \left( \sum_{i=1}^n weight_i y_i^2 \right)
206+
- \left( 2b \sum_{i=1}^n weight_i y_i \right)
207+
- \left( 2a \sum_{i=1}^n weight_i x_i y_i \right)
208+
+ \left( b^2 \sum_{i=1}^n weight_i \right)
209+
+ \left( 2ab \sum_{i=1}^n weight_i x_i \right)
210+
+ \left( a^2 \sum_{i=1}^n weight_i x_i^2 \right)
211+
212+
sst = \left( \sum_{i=1}^n weight_i y_i^2 \right) - \left( 2 \overline{y} \sum_{i=1}^n weight_i y_i \right) + \left( \overline{y}^2 \sum_{i=1}^n weight_i \right)$$
213+
214+
Where $(x_i, y_i)$ is the i-th datapoint in the flank, and $weight_i$ its weight. $\overline{y}$ is the weighted average of the y values over the entire flank. $a$ and $b$ are the coefficients in $y = a x + b$.
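The linear expressions above can be sketched in code as follows. This is a minimal illustration, not the OpenRowingMonitor implementation: `createLinearRunningSums` and `iterativeSseSst` are hypothetical helpers, the datapoints are made up, and the Goodness Of Fit is assumed to be the usual $1 - sse / sst$. It shows that each new datapoint costs a constant amount of work, and that the running-sum result matches the iterative result on a small, well-scaled dataset.

```javascript
// Hypothetical helper (not part of OpenRowingMonitor): keeps the six weighted
// running sums that the linear sse and sst expressions need.
function createLinearRunningSums () {
  const sums = { w: 0, wx: 0, wy: 0, wxx: 0, wxy: 0, wyy: 0 }
  return {
    // Adding a datapoint is a constant amount of work
    push (x, y, weight = 1) {
      sums.w += weight
      sums.wx += weight * x
      sums.wy += weight * y
      sums.wxx += weight * x * x
      sums.wxy += weight * x * y
      sums.wyy += weight * y * y
    },
    // sse for the line y = a x + b, using the running sums only
    sse (a, b) {
      return sums.wyy - 2 * b * sums.wy - 2 * a * sums.wxy +
        b * b * sums.w + 2 * a * b * sums.wx + a * a * sums.wxx
    },
    // sst around the weighted average of y, using the running sums only
    sst () {
      const yMean = sums.wy / sums.w
      return sums.wyy - 2 * yMean * sums.wy + yMean * yMean * sums.w
    }
  }
}

// Reference calculation: iterate over all datapoints in one go
function iterativeSseSst (points, a, b) {
  let sumW = 0
  let sumWY = 0
  for (const p of points) {
    sumW += p.weight
    sumWY += p.weight * p.y
  }
  const yMean = sumWY / sumW
  let sse = 0
  let sst = 0
  for (const p of points) {
    sse += p.weight * (p.y - (a * p.x + b)) ** 2
    sst += p.weight * (p.y - yMean) ** 2
  }
  return { sse, sst }
}

// Made-up datapoints scattered around y = 2x + 1
const points = [
  { x: 0, y: 1.1, weight: 1 },
  { x: 1, y: 2.9, weight: 0.5 },
  { x: 2, y: 5.2, weight: 1 },
  { x: 3, y: 6.8, weight: 0.8 }
]
const a = 2
const b = 1

const running = createLinearRunningSums()
for (const p of points) running.push(p.x, p.y, p.weight)

const reference = iterativeSseSst(points, a, b)
// Assumption: Goodness Of Fit taken as 1 - sse / sst
const goodnessOfFit = 1 - running.sse(a, b) / running.sst()
console.log(running.sse(a, b), reference.sse, goodnessOfFit)
```

On data of this scale both approaches agree to within floating-point rounding.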
215+
216+
For quadratic regression, sse and sst are defined as:

$$ sse = \left( \sum_{i=1}^n weight_i y_i^2 \right)
217+
- \left( 2c \sum_{i=1}^n weight_i y_i \right)
218+
- \left( 2b \sum_{i=1}^n weight_i x_i y_i \right)
219+
- \left( 2a \sum_{i=1}^n weight_i x_i^2 y_i \right)
220+
+ \left( 2bc \sum_{i=1}^n weight_i x_i \right)
221+
+ \left( 2ac \sum_{i=1}^n weight_i x_i^2 \right)
222+
+ \left( b^2 \sum_{i=1}^n weight_i x_i^2 \right)
223+
+ \left( 2ab \sum_{i=1}^n weight_i x_i^3 \right)
224+
+ \left( a^2 \sum_{i=1}^n weight_i x_i^4 \right)
225+
+ \left( c^2 \sum_{i=1}^n weight_i \right)
226+
227+
sst = \left( \sum_{i=1}^n weight_i y_i^2 \right) - \left( 2 \overline{y} \sum_{i=1}^n weight_i y_i \right) + \left( \overline{y}^2 \sum_{i=1}^n weight_i \right)$$
228+
229+
Where $(x_i, y_i)$ is the i-th datapoint in the flank, and $weight_i$ its weight. $\overline{y}$ is the weighted average of the y values over the entire flank. $a$, $b$ and $c$ are the coefficients in $y = a x^2 + b x + c$.
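The quadratic sse needs more sums than the linear one, namely the additional powers $x_i^3$ and $x_i^4$ and the cross term $x_i^2 y_i$. The sketch below makes this explicit. The function names and datapoints are hypothetical and only serve as an illustration; it compares the expanded expression with a direct evaluation of the squared residuals.

```javascript
// Hypothetical helper: accumulate the weighted sums used by the quadratic sse
function accumulateQuadraticSums (points) {
  const s = { w: 0, wx: 0, wx2: 0, wx3: 0, wx4: 0, wy: 0, wxy: 0, wx2y: 0, wy2: 0 }
  for (const { x, y, weight } of points) {
    s.w += weight
    s.wx += weight * x
    s.wx2 += weight * x * x
    s.wx3 += weight * x * x * x
    s.wx4 += weight * x * x * x * x
    s.wy += weight * y
    s.wxy += weight * x * y
    s.wx2y += weight * x * x * y
    s.wy2 += weight * y * y
  }
  return s
}

// sse for y = a x^2 + b x + c, written term by term as in the expression above
function quadraticSseFromSums (s, a, b, c) {
  return s.wy2 - 2 * c * s.wy - 2 * b * s.wxy - 2 * a * s.wx2y +
    2 * b * c * s.wx + 2 * a * c * s.wx2 + b * b * s.wx2 +
    2 * a * b * s.wx3 + a * a * s.wx4 + c * c * s.w
}

// Direct evaluation of the weighted squared residuals, for comparison
function quadraticSseDirect (points, a, b, c) {
  let sse = 0
  for (const { x, y, weight } of points) {
    sse += weight * (y - (a * x * x + b * x + c)) ** 2
  }
  return sse
}

// Made-up datapoints scattered around y = 2x^2 + 0.5x + 1
const quadPoints = [
  { x: 0, y: 1.0, weight: 1 },
  { x: 1, y: 3.7, weight: 1 },
  { x: 2, y: 9.9, weight: 0.5 },
  { x: 3, y: 20.6, weight: 1 }
]
const sseFromSums = quadraticSseFromSums(accumulateQuadraticSums(quadPoints), 2, 0.5, 1)
const sseDirect = quadraticSseDirect(quadPoints, 2, 0.5, 1)
console.log(sseFromSums, sseDirect)
```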
230+
231+
However, these implementations suffered from numerical instability. This showed up in relatively short sessions (a 2500 meter row on a Concept2 RowErg), where the Goodness Of Fit started to drift and the error between the iterative and running-sum results grew from $10^{-15}$ to $10^{-2}$. An error of the latter magnitude disturbs the functioning of OpenRowingMonitor. As the running-sum variant frequently produced a Goodness Of Fit above 1, we considered it very likely that it is faulty. Making the underlying `Series.js` object, which is responsible for maintaining these running sums, much more robust by forcing continuous recalculation of the running sums did not resolve this issue.
232+
233+
The current implementation thus relies on the iterative approach, despite the running sum being computationally much more efficient.
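One generic mechanism that can make expanded sum expressions unstable is cancellation: they subtract large, nearly equal totals, so rounding errors in those totals can exceed the small result that remains. Whether this is what happened in OpenRowingMonitor is an assumption, not a confirmed diagnosis. The sketch below uses deliberately exaggerated, made-up values (a large offset with a small spread, all weights equal to 1) to show the effect on sst.

```javascript
// Made-up datapoints: a large offset (1e8) with a small spread (+/- 0.5)
const yValues = [1e8 + 0.5, 1e8 - 0.5, 1e8 + 0.5, 1e8 - 0.5]
const n = yValues.length

// Running-sum style: only the totals of y and y^2 are kept
let sumY = 0
let sumY2 = 0
for (const y of yValues) {
  sumY += y
  sumY2 += y * y // each square is around 1e16, where doubles can no longer represent 0.25
}
const yMean = sumY / n
const sstFromSums = sumY2 - 2 * yMean * sumY + yMean * yMean * n

// Iterative style: subtract the average first, then square the small difference
let sstIterative = 0
for (const y of yValues) {
  sstIterative += (y - yMean) ** 2
}

// The mathematically exact sst for these values is 4 * 0.5^2 = 1
const exactSst = 1
console.log({ sstFromSums, sstIterative, exactSst })
```

In this sketch the iterative form keeps its precision because it squares small differences, whereas the sum form has to recover a small number from totals in the order of $10^{16}$.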
234+
235+
@@@@@
236+
201237
## References
202238

203239
<a id="1">[1]</a> Anu Dudhia, "The Physics of ErgoMeters" <http://eodg.atm.ox.ac.uk/user/dudhia/rowing/physics/ergometer.html>
