acknowledgements #1158
Closed
Implement fixest's Irons-Tuck-Grand acceleration algorithm for high-dimensional fixed effects demeaning in Rust. This is a coefficient-space iterative method that provides significant speedups over naive alternating projections.

Key features:
- Irons-Tuck acceleration with grand acceleration steps
- Support for 2-FE and 3+ FE cases with optimized projectors
- Algorithm aligned with R fixest implementation
- Auto-vectorized loops (no explicit SIMD dependencies)

Reference: https://github.com/lrberge/fixest (CCC_demean.cpp)

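To make the acceleration idea concrete, here is a minimal pure-Python sketch of an Irons-Tuck extrapolation step applied to a generic fixed-point map `G`. It is an illustration only, not the Rust code from this PR: the function names, the toy linear map, and the simple max-abs stopping rule are assumptions, and the "grand acceleration" layer is not shown.

```python
def irons_tuck_step(x, gx, ggx):
    """Extrapolate from x, G(x), G(G(x)); return None if the second difference vanishes."""
    delta_gx = [a - b for a, b in zip(ggx, gx)]                    # G(G(x)) - G(x)
    delta2_x = [d - (b - c) for d, b, c in zip(delta_gx, gx, x)]   # second difference
    ssq = sum(d * d for d in delta2_x)
    if ssq == 0.0:
        return None  # no extrapolation possible
    coef = sum(a * b for a, b in zip(delta_gx, delta2_x)) / ssq
    return [a - coef * d for a, d in zip(ggx, delta_gx)]


def accelerated_fixed_point(g, x0, tol=1e-6, max_iter=100):
    """Iterate x -> G(x) with an Irons-Tuck extrapolation after every two applications of G."""
    x = list(x0)
    for _ in range(max_iter):
        gx = g(x)
        if max(abs(a - b) for a, b in zip(gx, x)) < tol:
            return gx
        ggx = g(gx)
        x_new = irons_tuck_step(x, gx, ggx)
        x = ggx if x_new is None else x_new  # fall back to the plain iterate
    return x


def toy_map(x):
    # Hypothetical linear contraction whose fixed point is (4, 4).
    return [0.5 * x[0] + 0.25 * x[1] + 1.0, 0.25 * x[0] + 0.5 * x[1] + 1.0]


solution = accelerated_fixed_point(toy_map, [0.0, 0.0])
```

On this toy map the plain iteration shrinks the error by a constant factor per step, while the extrapolated iterate lands on the fixed point after a single acceleration step, because the error lies along one eigenvector of the map.
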
Performance improvements to the accelerated demeaning implementation:
- Optimize memory layout and share FEInfo across columns
- Add SSR (sum of squared residuals) stopping criterion for 2-FE
- Loop unrolling for 3-FE projection hot paths
- Align tolerance default with fixest (1e-6 instead of 1e-8)

Restructure the Rust demeaning code for clarity and maintainability:
- Introduce Projector trait for FE-specific projection strategies
- Introduce Demeaner trait for high-level solver strategies
- Unified DemeanBuffers struct for scratch space management
- Replace unsafe pointer code with safe iterator-based implementations
- Move related functions into appropriate impl blocks

Eliminate Python/numba overhead in the estimation pipeline:
- Implement detect_singletons in Rust to avoid numba JIT compilation
- Add Python wrapper maintaining API compatibility
- Optimize factorize() using pd.factorize instead of category conversion
- Replace slow df.isin() with np.isinf() for infinite value detection

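For readers unfamiliar with singleton detection, the sketch below shows one common way to do it in plain Python. It assumes a singleton is an observation that is alone in its group for at least one fixed effect, and that dropping singletons can create new ones, so the scan repeats until nothing changes. The function name `singleton_mask` and the list-of-columns input are hypothetical and are not the signature of the `detect_singletons` function in this PR.

```python
from collections import Counter


def singleton_mask(fe_columns):
    """Return a list of booleans, True where the observation is (iteratively) a singleton."""
    n_obs = len(fe_columns[0])
    dropped = [False] * n_obs
    changed = True
    while changed:
        changed = False
        for column in fe_columns:
            # Count group sizes among the observations that are still kept.
            counts = Counter(g for g, d in zip(column, dropped) if not d)
            for i, g in enumerate(column):
                if not dropped[i] and counts[g] == 1:
                    dropped[i] = True
                    changed = True
    return dropped


fe1 = [0, 0, 1, 2, 2]
fe2 = [0, 1, 1, 2, 2]
mask = singleton_mask([fe1, fe2])
```

In this example observation 2 is alone in its `fe1` group; once it is dropped, observation 1 becomes alone in its `fe2` group, which illustrates why a single pass is not enough.
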
Testing and code quality improvements:
- Add edge case tests for demean_accelerated
- Implement buffer reuse via for_each_init pattern
- Extract MultiFEBuffers struct for better readability
- Refactor Demeaner trait to own context and config references

Remove unnecessary abstractions after experimentation phase:
- Remove Accelerator trait in favor of direct IronsTuckGrand impl
- Move config into IronsTuckGrand struct
- Consolidate ConvergenceState and related types
- Update to PyO3 0.26 API (allow_threads -> detach)

Connect the new demean_accelerated module to Python and polish:
- Wire rust backend to use demean_accelerated instead of simple demean
- Fix MultiFE early convergence bug in 3+ FE demeaning
- Rename scatter/gather to apply_design_matrix for clarity
- Avoid per-column copy for Fortran-ordered input arrays
- Add type cast guard and #[inline(always)] on hot methods

Replace the simple alternating projections implementation with the accelerated Irons-Tuck algorithm as the sole Rust demean backend.

Changes:
- Remove src/demean.rs (old simple implementation)
- Update demean.py to call _demean_accelerated_rs
- Remove demean_accelerated.py (was only needed during development)
- Update backends.py and demean_.py imports
- Clean up tests to remove redundant fixtures

The public Python API is unchanged: users calling demean() or using the "rust" backend get the accelerated implementation transparently.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

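For context, the "simple alternating projections" baseline being replaced can be sketched in a few lines of plain Python: subtract group means for each fixed effect in turn and repeat until the residual stops changing. This is an unweighted illustration under assumed names (`demean_alternating`, list inputs), not the removed `src/demean.rs` code.

```python
def demean_alternating(y, fe_columns, tol=1e-8, max_iter=1000):
    """Demean y by sweeping over the fixed effects and removing group means each time."""
    res = [float(v) for v in y]
    for _ in range(max_iter):
        prev = res[:]
        for column in fe_columns:
            sums, counts = {}, {}
            for r, g in zip(res, column):
                sums[g] = sums.get(g, 0.0) + r
                counts[g] = counts.get(g, 0) + 1
            res = [r - sums[g] / counts[g] for r, g in zip(res, column)]
        # Stop once a full sweep no longer moves the residual.
        if max(abs(a - b) for a, b in zip(res, prev)) < tol:
            break
    return res


# Balanced toy design: y is exactly the sum of two group effects.
fe1 = [0, 0, 1, 1]
fe2 = [0, 1, 0, 1]
y = [11.0, 21.0, 12.0, 22.0]
residual = demean_alternating(y, [fe1, fe2])
```

Because `y` here is exactly additive in the two group effects, the residual is zero after one sweep. Unbalanced designs typically need many more sweeps, which is the case the accelerated algorithm targets.
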
Reorder fixed effects by number of groups (largest first) to match fixest's default `fixef.reorder = TRUE` behavior. This improves convergence for 3+ FE cases by making the 2-FE sub-convergence phase work on the largest FEs first.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

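The reordering described above amounts to a stable sort of the fixed effects by their number of groups, in descending order. A minimal sketch, assuming the fixed effects arrive as a list of integer-coded columns (the function name is hypothetical):

```python
def reorder_by_group_count(fe_columns):
    """Return (order, reordered_columns) with the FE that has the most groups first."""
    n_groups = [len(set(column)) for column in fe_columns]
    # sorted() is stable, so FEs with equal group counts keep their original order.
    order = sorted(range(len(fe_columns)), key=lambda j: -n_groups[j])
    return order, [fe_columns[j] for j in order]


fe_columns = [
    [0, 0, 1, 1],  # 2 groups
    [0, 1, 2, 3],  # 4 groups
    [0, 0, 0, 1],  # 2 groups
]
order, reordered = reorder_by_group_count(fe_columns)
```

Here `order[k]` is the original index of the fixed effect placed at position `k`, so the result is `[1, 0, 2]`.
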
- Add coef_sums_buffer to SingleFEDemeaner, TwoFEDemeaner, and MultiFEBuffers
- Change apply_design_matrix_t to write to caller-provided buffer
- Remove unnecessary in_out_2fe.to_vec() copy in MultiFEDemeaner
- Rename in_out to coef_sums/coef_sums_buffer for clarity

This eliminates per-column allocations: 1 for 2FE, 4 for 3+FE cases. Benchmarks show 4-12% improvement for medium-sized datasets (100K obs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Unroll the accumulate_fe_contributions loop 4x to enable better instruction-level parallelism. This produces paired loads (ldp) and reduces loop overhead, providing ~7% speedup on large 3FE demeaning workloads. Also refactor compute_ssr to reuse the optimized accumulate method.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Fixed effects are now always sorted by number of groups (largest first), matching fixest's default behavior. This simplifies the API and ensures optimal convergence properties.

Changes:
- Remove `reorder_fe` field from `FixestConfig`
- Remove `with_reorder` method from `FixedEffectsIndex`
- Remove `with_config` method from `DemeanContext`
- Simplify `FixedEffectsIndex::new()` to always reorder

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Replace `convergence_len() -> usize` with `convergence_range() -> Range<usize>` in the Projector trait. This makes the accelerator fully generic over any Projector implementation, not just FE-specific ones that check a prefix. The accelerator extracts (start, end) from the range to avoid cloning overhead.

Following fixest's approach, FE projectors exclude the last FE (smallest after reordering) from convergence checking. At a fixed point, if (n_fe - 1) FEs have converged, the remaining one must also have converged.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

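As a rough illustration of checking convergence over a range rather than the whole coefficient vector, the sketch below stores all FE coefficients in one flat list (one block per FE) and tests only the indices before the last FE's block. The flat layout, the helper names, and the plain absolute-difference criterion are assumptions made for this example; the criterion used in the PR may differ. The sketch also assumes at least two fixed effects.

```python
def convergence_range(group_counts):
    """Indices of the flat coefficient vector covering every FE block except the last."""
    return range(0, sum(group_counts[:-1]))


def has_converged(old, new, check_range, tol=1e-6):
    """True if no coefficient inside check_range moved by more than tol."""
    return all(abs(new[i] - old[i]) <= tol for i in check_range)


group_counts = [3, 2]            # first FE has 3 groups, last FE has 2
check = convergence_range(group_counts)

old = [1.0, 2.0, 3.0, 10.0, 20.0]
moved_last_only = [1.0, 2.0, 3.0, 11.0, 21.0]
moved_first = [1.0, 2.0, 3.5, 10.0, 20.0]

ignored = has_converged(old, moved_last_only, check)   # last block is not inspected
detected = has_converged(old, moved_first, check)      # change in first block is seen
```

Returning a range instead of a length lets a projector that is not FE-based point the accelerator at any contiguous slice, which is the generality the commit message describes.
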
- Add original_to_reordered mapping to FixedEffectsIndex for tracking how FEs are reordered internally (by size for optimal convergence)
- Add fe_coefficients field to DemeanResult
- Add reorder_coefficients_to_original() method to restore coefficients to the user's original FE order
- Add total_coef buffer to MultiFEBuffers for accumulating coefficients across all demeaning phases (warmup, two_fe_convergence, reacceleration)
- Update all demeaners to populate and return FE coefficients

Co-Authored-By: Claude Opus 4.5 <[email protected]>

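Restoring coefficients to the user's FE order is a matter of undoing the internal permutation. The sketch below assumes the permutation is stored as `order`, where `order[k]` is the original index of the FE at internal position `k`. Both the name `restore_original_order` and this representation are hypothetical and may not match the `original_to_reordered` field in the PR.

```python
def restore_original_order(coefs_reordered, order):
    """Place each FE's coefficient block back at its original position."""
    restored = [None] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = coefs_reordered[position]
    return restored


order = [1, 0, 2]                      # internal position -> original FE index
coefs_reordered = [
    [0.1, 0.2, 0.3, 0.4],              # internally first: original FE 1
    [1.0, 2.0],                        # internally second: original FE 0
    [5.0, 6.0],                        # internally third: original FE 2
]
coefs_original = restore_original_order(coefs_reordered, order)
```
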
Test cases:
- Single FE coefficient correctness
- Two FE coefficient correctness
- Three FE coefficient correctness (random order)
- Coefficient ordering preservation (verifies coefficients match original FE order, not internal reordered order)
- Weighted demeaning with coefficient extraction

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Rust changes:
- DemeanContext now has weights: Option<ObservationWeights>
- When None, uses group_counts for denominators (no per-obs multiplication)
- _demean_rs binding takes weights=None by default

Python changes:
- demean() wrapper detects uniform weights (all equal) via np.allclose
- Passes None to Rust when weights are uniform, enabling fast path
- Public API unchanged (weights parameter still required)

This saves memory (no per-obs weight storage) and computation (no weight multiplication in scatter operations) for unweighted regression.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

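The uniform-weights fast path can be illustrated in plain Python. The sketch uses `math.isclose` from the standard library as a stand-in for the `np.allclose` check mentioned above, and hypothetical helper names. It assumes non-empty weights and that every group index in `0..n_groups` occurs at least once.

```python
import math


def normalize_weights(weights):
    """Return None when all weights are (nearly) equal, else the weights as a list."""
    first = weights[0]
    if all(math.isclose(w, first) for w in weights):
        return None
    return list(weights)


def group_means(values, groups, n_groups, weights=None):
    """Group means; with weights=None the denominators are plain group counts."""
    sums = [0.0] * n_groups
    denom = [0.0] * n_groups
    if weights is None:
        for v, g in zip(values, groups):   # fast path: no per-observation multiply
            sums[g] += v
            denom[g] += 1.0
    else:
        for v, g, w in zip(values, groups, weights):
            sums[g] += w * v
            denom[g] += w
    return [s / d for s, d in zip(sums, denom)]


values = [1.0, 3.0, 10.0, 20.0]
groups = [0, 0, 1, 1]

uniform = normalize_weights([2.0, 2.0, 2.0, 2.0])      # collapses to None
unweighted_means = group_means(values, groups, 2, uniform)
weighted_means = group_means(values, groups, 2, normalize_weights([1.0, 3.0, 1.0, 1.0]))
```

Uniform weights cancel in the ratio of weighted sum to weight sum, so dropping them changes nothing in the group means while skipping one multiplication per observation.
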
Refactor/demean accelerated
Remove manual 4x loop unrolling from compute_ssr methods in TwoFEProjector and MultiFEProjector. LLVM auto-vectorizes simple loops effectively, making manual unrolling unnecessary complexity.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Previously, fixed effects were always reordered by size (largest first) during demeaning. This adds a `reorder_fe` boolean parameter that allows users to control this behavior. Default is `false` (no reordering).

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Codecov Report: ✅ All modified and coverable lines are covered by tests.
Description of other software packages that have influenced pyfixest.