Description
One thing that may overshadow all of this -- the zero RMSE problem.
I decided to look at the .csvs from this run. I'm looking at the TCW one, filtered down to just our favorite cluster 37. And lo and behold, there are a ton of runs with an RMSE of zero because the fit was just a straight line. Worse, the reason those are zero is that the recovery threshold is set to 0.25 or 0.5. I think roughly half the fits for cluster 37 are like this!
In the TCW spreadsheet, across all 200 points and 144 runs, only 700 out of more than 31,000 runs have the zero RMSE problem. BUT that RMSE issue is going to show up in exactly the situation where we are judging our outputs -- where the water comes in for a reservoir! In those situations, the apparent signal is one of very fast recovery -- a steep increase in TCW. And so everywhere that sees a steep increase will preferentially get a straight-line fit when the recovery threshold is set low, and that will result in the unfortunate RMSE of zero.
Thus, runs with a recovery threshold of 0.25 will be preferentially chosen because their RMSE of zero looks really good.
In a quick check of the TCG spreadsheet, the problem seems to be even worse. And again, MOST of the time the zero RMSE shows up, it's where the recovery_threshold is 0.25.
Which means -- we NEED to fix the RMSE problem, no matter what else we find. We can calculate it from the fitted and source values -- we just need to find where in the workflow would be easiest to do that.
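For reference, recalculating RMSE from the fitted and source values could look roughly like the sketch below. The function name `rmse` and the sample numbers are made up for illustration; nothing here is taken from the existing workflow.

```python
import numpy as np


def rmse(source, fitted):
    """Root-mean-square error between source (observed) and fitted values."""
    source = np.asarray(source, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    if source.shape != fitted.shape:
        raise ValueError("source and fitted must have the same shape")
    # Square the residuals, average them, then take the square root.
    return float(np.sqrt(np.mean((source - fitted) ** 2)))


# Made-up values, for illustration only.
source = [0.10, 0.12, 0.45, 0.50]
fitted = [0.10, 0.20, 0.40, 0.50]
print(rmse(source, fitted))
```

With this, a straight-line fit only gets an RMSE of zero if the source values really do sit exactly on that line.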
I suspect the easiest place would be in ltop_lt_paramater_scoring_01.py, in the dfprep function. We would need to read in each line and check its RMSE.
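A rough sketch of what that per-line check might look like with pandas. This is not the actual dfprep code: the helper name `recompute_rmse`, the column names (`rmse`, `source`, `fitted`, `recovery_threshold`), and the assumption that each row already holds its source and fitted values as lists are all hypothetical, and the sample rows are invented.

```python
import numpy as np
import pandas as pd


def recompute_rmse(df, source_col="source", fitted_col="fitted", rmse_col="rmse"):
    """Flag rows whose stored RMSE is zero and recompute RMSE for every row.

    Column names are assumptions; adjust them to the real .csv layout.
    """
    out = df.copy()
    # Remember which rows had the suspicious stored value.
    out["rmse_was_zero"] = out[rmse_col] == 0
    # Recompute from the source and fitted values held in each row.
    out["rmse_recalc"] = [
        float(np.sqrt(np.mean((np.asarray(s, dtype=float) - np.asarray(f, dtype=float)) ** 2)))
        for s, f in zip(out[source_col], out[fitted_col])
    ]
    return out


# Invented rows, for illustration only.
runs = pd.DataFrame({
    "recovery_threshold": [0.25, 1.0],
    "rmse": [0.0, 0.03],
    "source": [[0.1, 0.3, 0.2, 0.6], [0.1, 0.3, 0.2, 0.6]],
    "fitted": [[0.1, 0.25, 0.4, 0.55], [0.1, 0.3, 0.25, 0.6]],
})
fixed = recompute_rmse(runs)
print(fixed[["recovery_threshold", "rmse", "rmse_was_zero", "rmse_recalc"]])
```

Keeping the stored value alongside the recalculated one makes it easy to see how many runs were affected before the scoring step picks a winner.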
Originally posted by @ramblingrek in #6 (comment)