Contributor

@aherbert aherbert commented Apr 3, 2025

No description provided.


codecov-commenter commented Apr 3, 2025

Codecov Report

Attention: Patch coverage is 96.58120% with 12 lines in your changes missing coverage. Please review.

Project coverage is 87.12%. Comparing base (f554608) to head (bb2ec67).
Report is 151 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../legacy/ml/clustering/KMeansPlusPlusClusterer.java | 0.00% | 3 Missing ⚠️ |
| ...acy/optim/nonlinear/scalar/SimulatedAnnealing.java | 0.00% | 3 Missing ⚠️ |
| ...mons/math4/legacy/stat/descriptive/Statistics.java | 97.05% | 3 Missing ⚠️ |
| ...rg/apache/commons/math4/legacy/stat/StatUtils.java | 97.33% | 1 Missing and 1 partial ⚠️ |
| .../legacy/stat/descriptive/moment/VectorialMean.java | 83.33% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #260      +/-   ##
============================================
+ Coverage     86.54%   87.12%   +0.57%     
+ Complexity     9787       89    -9698     
============================================
  Files           532      504      -28     
  Lines         35516    33488    -2028     
  Branches       6194     5831     -363     
============================================
- Hits          30738    29175    -1563     
+ Misses         3518     3192     -326     
+ Partials       1260     1121     -139     


@aherbert aherbert force-pushed the refactor-descriptive branch from 1684862 to 5fd1193 Compare April 5, 2025 06:28
aherbert added 10 commits April 5, 2025 08:09
Removed percentile classes from descriptive.rank:

- CentralPivotingStrategy
- KthSelector
- Median
- MedianOf3PivotingStrategy
- Percentile
- PivotingStrategy
- RandomPivotingStrategy
Refactor the SummaryStatistics default implementations to use
DoubleStatistics.

Removes methods:

- getPopulationVariance
- getSecondMoment

The population variance relies on the second moment implementation, which
computes a statistic related to the central second moment. It is not a
standard statistic and is not supported in Commons Statistics.
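
For callers that still need a population variance after this removal, one possible workaround is to rescale a sample variance. This is a minimal sketch, not code from this change: it assumes the variance you have is the bias-corrected sample variance (divisor n - 1), and the class and method names are hypothetical.

```java
final class PopulationVarianceSketch {
    /**
     * Rescales a bias-corrected sample variance (divisor n - 1) to the
     * population variance (divisor n). Hypothetical helper, not a library method.
     */
    public static double fromSampleVariance(double sampleVariance, long n) {
        if (n == 0) {
            return Double.NaN; // no data: undefined
        }
        if (n == 1) {
            return 0.0; // a single value has no spread
        }
        return sampleVariance * (n - 1) / n;
    }

    public static void main(String[] args) {
        // Data {1, 2, 3, 4}: sum of squared deviations is 5, so the
        // sample variance is 5/3 and the population variance is 5/4.
        double sampleVariance = 5.0 / 3.0;
        System.out.println(fromSampleVariance(sampleVariance, 4));
    }
}
```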

Updates the min/max implementations to use Math.min/max. The previous
behaviour ignored NaN values; the new behaviour matches that of JDK
streams.
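
The difference in NaN handling can be illustrated with a small standalone sketch. The class and method names below are hypothetical; the NaN-skipping loop is a simplified stand-in for the old behaviour, not the removed implementation, and its edge cases (for example all-NaN input) may differ.

```java
import java.util.stream.DoubleStream;

final class MinMaxNaNDemo {
    /** Minimum via Math.min: any NaN input makes the result NaN. */
    public static double minPropagatingNaN(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
        }
        return min;
    }

    /** Minimum that skips NaN values (simplified illustration of ignoring NaN). */
    public static double minIgnoringNaN(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        for (double v : values) {
            if (!Double.isNaN(v) && v < min) {
                min = v;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        double[] data = {3.0, Double.NaN, 1.0};
        System.out.println(minPropagatingNaN(data));                    // NaN
        System.out.println(minIgnoringNaN(data));                       // 1.0
        System.out.println(DoubleStream.of(data).min().getAsDouble());  // NaN
    }
}
```

Code that relied on NaN being skipped would need to filter its input before computing the statistic.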
Refactor the DescriptiveStatistics default implementations to use
DoubleStatistics.

Removes method:

getPopulationVariance

The variance implementation can be overridden if desired.

Updates the min/max implementations to use Math.min/max. The previous
behaviour ignored NaN values; the new behaviour matches that of JDK
streams.
Removes redundant classes.

descriptive.moment:

- FirstMoment
- FourthMoment
- GeometricMean
- Kurtosis
- SecondMoment
- Skewness
- StandardDeviation
- ThirdMoment

Mean + Variance have been changed to only implement the weighted
evaluation interface.

descriptive.rank:
- Min
- Max

descriptive.summary:
- Sum
- SumOfLogs
- SumOfSquares

Product has been changed to only implement the weighted evaluation
interface.

The utility class StatUtils has been updated to delegate all calls to
Commons Statistics. Legacy Math exceptions have been preserved. Removes
methods to compute the variance using an existing mean:

- public static double variance(double[] values, double mean, int begin, int length)
- public static double variance(double[] values, double mean)
- public static double populationVariance(double[] values, double mean, int begin, int length)
- public static double populationVariance(double[] values, double mean)
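
Code that depended on the removed overloads could compute the variance around a known mean locally. The following is a minimal sketch under stated assumptions: the class name is hypothetical, the result is the bias-corrected sample variance, and the correction term (which compensates for an inexact supplied mean) is a common two-pass technique rather than a claim about the removed code.

```java
final class VarianceWithKnownMean {
    /**
     * Sample variance of values[begin .. begin + length - 1] around a
     * precomputed mean. Hypothetical helper, not part of any library.
     */
    public static double variance(double[] values, double mean, int begin, int length) {
        if (length == 0) {
            return Double.NaN; // no data: undefined
        }
        if (length == 1) {
            return 0.0;
        }
        double sumSq = 0;   // sum of squared deviations
        double sumDev = 0;  // sum of deviations; zero if the mean is exact
        for (int i = begin; i < begin + length; i++) {
            double d = values[i] - mean;
            sumSq += d * d;
            sumDev += d;
        }
        // Subtract the correction term to reduce error from a rounded mean.
        return (sumSq - sumDev * sumDev / length) / (length - 1);
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        System.out.println(variance(data, 2.5, 0, data.length));
    }
}
```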

Note: StatUtils has inconsistent documentation of what to return for an
empty array. The documentation states NaN but StatUtilsTest requires
otherwise:

Sum-of-squares = 0
Product = 1
Sum-of-logs = 0

This is inconsistent and has been updated to NaN for all statistics.

The class MultivariateSummaryStatistics has been updated with partial
implementations of StorelessUnivariateStatistic that delegate to Commons
Statistics.

Some test classes have been updated to pass the build after removal of
the statistic implementations.
Remove sum-of-logs, geometric mean and sum-of-squares from
SummaryStatistics for performance reasons.
The SemiVariance class had many untested methods and a buggy
implementation. This change corrects the implementation:

- the loop condition changes from (i=start; i<length; i++) to i<start+length
- the array sub-range method now passes start and length instead of
0 and values.length

Tests have been added for sub-range evaluation and to complete code
coverage for the class.
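
The loop-bound error described above is easy to reproduce in isolation. This sketch uses a plain sum instead of the semi-variance computation, and its class and method names are hypothetical; it only shows why the bound must be start + length.

```java
final class SubRangeLoopSketch {
    /** Buggy bound: treats length as an end index, so the range is cut short. */
    public static double sumBuggy(double[] values, int start, int length) {
        double sum = 0;
        for (int i = start; i < length; i++) {
            sum += values[i];
        }
        return sum;
    }

    /** Correct bound: visits exactly length elements beginning at start. */
    public static double sumFixed(double[] values, int start, int length) {
        double sum = 0;
        for (int i = start; i < start + length; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5};
        System.out.println(sumBuggy(values, 2, 3)); // 3.0: only index 2 is visited
        System.out.println(sumFixed(values, 2, 3)); // 12.0: indices 2, 3 and 4
    }
}
```

With start equal to 0 the two loops agree, which is consistent with the bug going unnoticed until sub-range tests were added.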
@aherbert aherbert force-pushed the refactor-descriptive branch from 5fd1193 to bb2ec67 Compare April 5, 2025 07:09
@aherbert aherbert merged commit bad259a into apache:master Apr 5, 2025
11 of 12 checks passed

2 participants