* feat: Implement MaxAbsScaler and QuantileTransformer normalizers (#317)
Implements two new specialized data normalization techniques:
**MaxAbsScaler (13 points)**
- Scales features to [-1, 1] range based on maximum absolute value
- Preserves zeros and maintains sign of values (important for sparse data)
- Formula: scaled_value = value / max(|values|)
- Includes comprehensive unit tests covering:
  - Dense and sparse data
  - Positive, negative, and mixed values
  - Edge cases (all zeros, single values)
  - Matrix and Tensor support
  - Float and double type support
  - Round-trip normalization/denormalization
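
As an illustration of the formula above, here is a minimal sketch in Python (not the project's C# code). The function names and the all-zeros fallback (return the input unchanged with a scale of 1.0) are assumptions made for this example, not behavior confirmed by the commit.

```python
def max_abs_scale(values):
    """Scale values into [-1, 1] by dividing by the largest absolute value."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0:
        # Assumed fallback: all zeros (or empty input) cannot be scaled,
        # so return the data unchanged with a neutral scale factor.
        return list(values), 1.0
    return [v / max_abs for v in values], max_abs


def max_abs_unscale(scaled, max_abs):
    """Invert max_abs_scale using the stored scale factor."""
    return [v * max_abs for v in scaled]


scaled, factor = max_abs_scale([-4.0, 0.0, 2.0])
restored = max_abs_unscale(scaled, factor)
```

Zeros stay zero and signs are unchanged because the only operation is division by a positive constant, which is why this scaling suits sparse data.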
**QuantileTransformer (21 points)**
- Non-linear transformation mapping data to uniform or normal distributions
- Robust to outliers because it works on quantiles (ranks) rather than raw magnitudes
- Configurable output distribution (uniform/normal) and number of quantiles
- Formula: Maps values through empirical CDF to target distribution
- Includes comprehensive unit tests covering:
  - Uniform and normal output distributions
  - Skewed data and outliers
  - Column-wise matrix normalization
  - Rank-order preservation
  - Repeated values handling
  - Float and double type support
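
The idea of mapping values through an empirical CDF can be sketched as follows, in Python rather than the project's C#. This is a simplification under stated assumptions: it uses every training value instead of a configurable number of quantiles, a mid-rank CDF for ties, and a hypothetical `eps` clip before the inverse normal CDF. None of these details are confirmed by the commit.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def empirical_cdf(sorted_train, x):
    """Mid-rank empirical CDF: mean of the fractions of values < x and <= x."""
    n = len(sorted_train)
    below = bisect_left(sorted_train, x)
    at_or_below = bisect_right(sorted_train, x)
    return (below + at_or_below) / (2 * n)


def quantile_transform(train, values, output="uniform", eps=1e-7):
    """Map values to a uniform or standard normal target distribution."""
    sorted_train = sorted(train)
    std_normal = NormalDist()
    result = []
    for x in values:
        p = empirical_cdf(sorted_train, x)
        if output == "normal":
            # Clip away from 0 and 1, where the inverse CDF is undefined.
            p = min(max(p, eps), 1.0 - eps)
            result.append(std_normal.inv_cdf(p))
        else:
            result.append(p)
    return result


train = [1.0, 2.0, 3.0, 4.0, 1000.0]
uniform = quantile_transform(train, train)
normal = quantile_transform(train, train, output="normal")
```

The outlier 1000.0 lands at 0.9, just one rank step above 4.0, which shows how only the ordering of the data affects the output.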
**Architecture Updates**
- Added MaxAbsScaler and QuantileTransformer to NormalizationMethod enum
- Extended NormalizationParameters with:
  - MaxAbs property for MaxAbsScaler
  - Quantiles list for QuantileTransformer
  - OutputDistribution property for target distribution
- All implementations follow project patterns:
  - Use INumericOperations<T> for arithmetic
  - Use NumOps.Zero instead of default(T)
  - Generic inheritance pattern
  - Complete XML documentation with "For Beginners" sections
  - Support for Vector, Matrix, and Tensor data structures
Resolves #317
* fix: replace linear search with binary search and add division-by-zero protection
Resolves review comments on QuantileTransformer.cs:
- Lines 406-414: Replaced O(n) linear search with O(log n) binary search
  for finding quantile position. With the default 1000 quantiles, this
  cuts the worst case from 1000 comparisons to ~10 comparisons per value.
- Lines 431-450: Added division-by-zero protection when consecutive
  quantiles have equal values (occurs with duplicate values in data).
  Returns the midpoint percentile when upperValue == lowerValue.
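
The two fixes can be sketched together in Python (not the project's C# code). The function name, the evenly spaced percentile grid, and the index clamping at the edges are assumptions for illustration; only the binary search and the midpoint fallback come from the description above.

```python
from bisect import bisect_left


def percentile_of(value, quantiles):
    """Locate value among sorted quantiles (len >= 2) and interpolate a percentile."""
    n = len(quantiles)
    # Binary search: first index whose quantile is >= value, O(log n).
    idx = bisect_left(quantiles, value)
    # Clamp so that (lower, upper) is always a valid neighbouring pair.
    upper = min(max(idx, 1), n - 1)
    lower = upper - 1
    lower_p = lower / (n - 1)
    upper_p = upper / (n - 1)
    lower_v, upper_v = quantiles[lower], quantiles[upper]
    if upper_v == lower_v:
        # Duplicate quantiles: avoid dividing by zero, return the midpoint.
        return (lower_p + upper_p) / 2
    # Linear interpolation, clamped for values outside the fitted range.
    frac = (value - lower_v) / (upper_v - lower_v)
    frac = min(max(frac, 0.0), 1.0)
    return lower_p + frac * (upper_p - lower_p)


quantiles = [0.0, 0.0, 1.0, 2.0, 2.0]
inside = percentile_of(0.5, quantiles)
at_duplicate_low = percentile_of(0.0, quantiles)
above_range = percentile_of(5.0, quantiles)
```

In this sketch the equal-value branch is reached at the edges of the quantile list, where duplicated minimum or maximum values produce a zero-width interval.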
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* fix: correct INumericOperations method names in QuantileTransformer
This commit fixes pre-existing build errors in QuantileTransformer.cs by
correcting method names to match the actual INumericOperations interface:
Changes:
- Replace NumOps.Compare (doesn't exist) with inline comparator using
  LessThan/GreaterThan for Array.Sort calls (lines 106-111, 144-149,
  213-218, 267-272)
- Replace NumOps.LessThanOrEqual with NumOps.LessThanOrEquals (note the 's')
- Replace NumOps.GreaterThanOrEqual with NumOps.GreaterThanOrEquals (note the 's')
- Replace NumOps.ToDouble (doesn't exist) with Convert.ToDouble((object)value!)
  for T to double conversions (lines 508, 530, 622)
These errors were blocking the build and are now fixed, allowing the
QuantileTransformer to compile successfully.
* refactor: fix 9 unresolved review comments in PR #411
This commit resolves all remaining unresolved review comments:
Test file improvements (7 fixes):
- MaxAbsScalerTests.cs:223,260: Replace unused `normalized` with `_` discard
- QuantileTransformerTests.cs:113,282,296,336,354: Replace unused variables with `_` discard
- Remove redundant test for invalid outputDistribution (now enforced by enum type safety)
Source file improvements (2 fixes):
- QuantileTransformer.cs:473: Simplify if/else to ternary operator for output distribution
- QuantileTransformer.cs:481: Simplify if/else to ternary operator for percentile calculation
Note: The test at MaxAbsScalerTests.cs:109 actually uses `normalized`, so that variable was not replaced with a discard.
* feat: replace string outputDistribution with type-safe enum
This commit improves code quality and production readiness by replacing
the string-based outputDistribution parameter with a type-safe enum.
Changes:
- Created OutputDistribution enum with Uniform and Normal values
- Updated NormalizationParameters.OutputDistribution from string to enum
- Updated QuantileTransformer constructor to accept enum instead of string
- Updated all string comparisons to use enum comparisons
- Removed redundant validation code (enum provides compile-time type safety)
- Updated all test files to use OutputDistribution.Uniform/Normal
Benefits:
- Compile-time type safety (prevents typos like "unifrom")
- IntelliSense support for valid values
- Better refactoring support
- Self-documenting code
- No runtime string validation needed
* fix: handle degenerate distributions and tensor constructors
- Add degenerate distribution check in QuantileTransformer when all quantiles are identical
- Fix Tensor constructor calls in tests to use Vector instead of double[]
- Map constant features to midpoint (0.5) to avoid skewing to extreme tails
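
To show why the midpoint is a reasonable target for a constant feature, here is a small Python sketch (not the project's C# code). The helper names and the `eps` clip value are hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def to_output(percentile, output="uniform", eps=1e-7):
    """Convert a percentile in [0, 1] to the requested target distribution."""
    if output == "normal":
        clipped = min(max(percentile, eps), 1.0 - eps)
        return NormalDist().inv_cdf(clipped)
    return percentile


def transform_constant_feature(values, quantiles, output="uniform"):
    """Map every value of a constant feature to the distribution midpoint."""
    if quantiles[0] != quantiles[-1]:
        raise ValueError("feature is not constant")
    return [to_output(0.5, output) for _ in values]


uniform_mid = transform_constant_feature([7.0, 7.0, 7.0], [7.0, 7.0, 7.0])
normal_mid = transform_constant_feature([7.0, 7.0], [7.0, 7.0], output="normal")
extreme_tail = to_output(0.0, output="normal")
```

Under the normal output, percentile 0.5 maps to 0.0, whereas a percentile of 0 would be pushed to a value below -5 in this sketch, which is the tail skew the fix avoids.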
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* refactor: move OutputDistribution enum to Enums folder
- Move OutputDistribution.cs from src/Normalizers to src/Enums
- Update namespace from AiDotNet.Normalizers to AiDotNet.Enums
- Add using AiDotNet.Enums to NormalizationParameters.cs and QuantileTransformer.cs
- Update property type references to use unqualified OutputDistribution
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
---------
Co-authored-by: Claude <[email protected]>