[SPARK-28062][ML] Avoid unnecessary copy of coefficients vector in HuberAggregator
## What changes were proposed in this pull request?
Modifies the HuberAggregator class so that a copy of the coefficients vector isn't created every time an instance is added. Follows the approach of LeastSquaresAggregator and uses a transient lazy class variable to store the reused quantities. (See apache#14109 for an explanation of the use of transient lazy variables.)
On the test case in the linked JIRA, this change gives an order-of-magnitude performance improvement, reducing the time taken to fit the model from 540 to 47 seconds.
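For readers unfamiliar with the pattern, here is a minimal sketch of the idea. It is not the actual HuberAggregator code. `FakeBroadcast`, `CopyingAggregator`, `CachingAggregator` and the `copies` counter are hypothetical names introduced for illustration, a plain Scala `Vector[Double]` stands in for the broadcast coefficients, and a dot product stands in for the real Huber loss and gradient computation.

```scala
// Hypothetical stand-in for a broadcast variable; only `value` is modelled.
final case class FakeBroadcast(value: Vector[Double])

// Before (sketch): the coefficients are copied into an array on every add().
class CopyingAggregator(bc: FakeBroadcast) extends Serializable {
  var copies = 0 // counts array copies, for illustration only
  var sum = 0.0

  def add(features: Array[Double]): this.type = {
    val coefficients = bc.value.toArray // O(numFeatures) copy per instance
    copies += 1
    var i = 0
    while (i < features.length) {
      sum += features(i) * coefficients(i)
      i += 1
    }
    this
  }
}

// After (sketch): the array is built once, on first use, and reused.
// @transient keeps the cached array out of the serialized aggregator, and
// lazy means it is rebuilt on first access after deserialization.
class CachingAggregator(bc: FakeBroadcast) extends Serializable {
  var copies = 0 // counts array copies, for illustration only
  var sum = 0.0

  @transient private lazy val coefficients: Array[Double] = {
    copies += 1
    bc.value.toArray
  }

  def add(features: Array[Double]): this.type = {
    var i = 0
    while (i < features.length) {
      sum += features(i) * coefficients(i)
      i += 1
    }
    this
  }
}
```

In this sketch, adding N instances costs N array copies in the first class and a single copy in the second, which is the kind of saving the change targets.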
## How was this patch tested?
Existing unit tests.
See https://issues.apache.org/jira/browse/SPARK-28062 for results from running a benchmark script.
Closes apache#24880 from Andrew-Crosby/spark-28062.
Authored-by: Andrew-Crosby <[email protected]>
Signed-off-by: Sean Owen <[email protected]>