acknowledgements #1158

s3alfisc · 2026-02-01T20:21:28Z

Description of other software packages that have influenced pyfixest.

Implement fixest's Irons-Tuck-Grand acceleration algorithm for high-dimensional fixed effects demeaning in Rust. This is a coefficient-space iterative method that provides significant speedups over naive alternating projections.

Key features:
- Irons-Tuck acceleration with grand acceleration steps
- Support for 2-FE and 3+ FE cases with optimized projectors
- Algorithm aligned with R fixest implementation
- Auto-vectorized loops (no explicit SIMD dependencies)

Reference: https://github.com/lrberge/fixest (CCC_demean.cpp)
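For readers unfamiliar with the technique, here is a minimal sketch of a single Irons-Tuck extrapolation step on a generic fixed-point map `G`. It follows the update form as I understand it from fixest's C++ code; `irons_tuck_step` and `toy_map` are hypothetical names for illustration, not part of pyfixest, and the "grand" acceleration layer is left out.

```python
from typing import Callable, List

Vector = List[float]


def irons_tuck_step(g: Callable[[Vector], Vector], x: Vector) -> Vector:
    """Apply G twice, then extrapolate toward the fixed point of G."""
    gx = g(x)
    ggx = g(gx)
    # First difference of the last two iterates: G(G(x)) - G(x).
    delta_gx = [b - a for a, b in zip(gx, ggx)]
    # Second difference: G(G(x)) - 2 G(x) + x.
    delta2_x = [d - a + c for d, a, c in zip(delta_gx, gx, x)]
    ssq = sum(d * d for d in delta2_x)
    if ssq == 0.0:
        # Second difference is zero: nothing to extrapolate from,
        # so fall back to the plain iterate.
        return ggx
    coef = sum(a * b for a, b in zip(delta_gx, delta2_x)) / ssq
    return [b - coef * d for b, d in zip(ggx, delta_gx)]


def toy_map(x: Vector) -> Vector:
    """Toy linear contraction (an assumption for the demo); fixed point is 2.0."""
    return [0.5 * v + 1.0 for v in x]


accelerated = irons_tuck_step(toy_map, [0.0, 0.0])
plain = toy_map(toy_map([0.0, 0.0]))
```

On this toy linear map the accelerated step lands on the fixed point, while two plain applications of `toy_map` are still short of it. In the demeaning setting, `G` would be one sweep of the fixed-effects projections over the coefficient vector.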

Performance improvements to the accelerated demeaning implementation:
- Optimize memory layout and share FEInfo across columns
- Add SSR (sum of squared residuals) stopping criterion for 2-FE
- Loop unrolling for 3-FE projection hot paths
- Align tolerance default with fixest (1e-6 instead of 1e-8)

Restructure the Rust demeaning code for clarity and maintainability:
- Introduce Projector trait for FE-specific projection strategies
- Introduce Demeaner trait for high-level solver strategies
- Unified DemeanBuffers struct for scratch space management
- Replace unsafe pointer code with safe iterator-based implementations
- Move related functions into appropriate impl blocks

Eliminate Python/numba overhead in the estimation pipeline:
- Implement detect_singletons in Rust to avoid numba JIT compilation
- Add Python wrapper maintaining API compatibility
- Optimize factorize() using pd.factorize instead of category conversion
- Replace slow df.isin() with np.isinf() for infinite value detection
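As a rough illustration of what singleton detection has to do, the sketch below repeatedly drops observations that are alone in their group for any fixed effect, since removing one singleton can create another. The function name `detect_singletons_sketch` and its list-of-sequences input are assumptions made for this example; the Rust implementation in this PR may be structured quite differently.

```python
from collections import Counter
from typing import List, Sequence


def detect_singletons_sketch(fixed_effects: Sequence[Sequence[int]]) -> List[bool]:
    """Return a mask that is True for observations to drop as singletons.

    `fixed_effects` holds one sequence of group ids per fixed effect,
    each with one entry per observation.
    """
    n_obs = len(fixed_effects[0]) if fixed_effects else 0
    dropped = [False] * n_obs
    changed = True
    while changed:
        changed = False
        for groups in fixed_effects:
            # Count group sizes among observations that are still kept.
            counts = Counter(g for g, d in zip(groups, dropped) if not d)
            for i, g in enumerate(groups):
                if not dropped[i] and counts[g] == 1:
                    dropped[i] = True
                    changed = True
    return dropped


firm = [0, 0, 1, 1, 2]
year = [0, 1, 0, 0, 0]
mask = detect_singletons_sketch([firm, year])
```

In this toy data, dropping the single observation of firm 2 leaves year 1 with one observation, and dropping that one in turn makes firm 0 a singleton, so three of the five observations end up flagged.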

Testing and code quality improvements:
- Add edge case tests for demean_accelerated
- Implement buffer reuse via for_each_init pattern
- Extract MultiFEBuffers struct for better readability
- Refactor Demeaner trait to own context and config references

Remove unnecessary abstractions after experimentation phase:
- Remove Accelerator trait in favor of direct IronsTuckGrand impl
- Move config into IronsTuckGrand struct
- Consolidate ConvergenceState and related types
- Update to PyO3 0.26 API (allow_threads -> detach)

Connect the new demean_accelerated module to Python and polish:
- Wire rust backend to use demean_accelerated instead of simple demean
- Fix MultiFE early convergence bug in 3+ FE demeaning
- Rename scatter/gather to apply_design_matrix for clarity
- Avoid per-column copy for Fortran-ordered input arrays
- Add type cast guard and #[inline(always)] on hot methods

Replace the simple alternating projections implementation with the accelerated Irons-Tuck algorithm as the sole Rust demean backend.

Changes:
- Remove src/demean.rs (old simple implementation)
- Update demean.py to call _demean_accelerated_rs
- Remove demean_accelerated.py (was only needed during development)
- Update backends.py and demean_.py imports
- Clean up tests to remove redundant fixtures

The public Python API is unchanged: users calling demean() or using the "rust" backend get the accelerated implementation transparently.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Reorder fixed effects by number of groups (largest first) to match fixest's default `fixef.reorder = TRUE` behavior. This improves convergence for 3+ FE cases by making the 2-FE sub-convergence phase work on the largest FEs first.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
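The reordering rule described in this commit can be sketched in a few lines. `order_by_group_count` is a hypothetical helper written for this example, not a pyfixest function, and it assumes each fixed effect is given as a sequence of integer group ids.

```python
from typing import List, Sequence


def order_by_group_count(fixed_effects: Sequence[Sequence[int]]) -> List[int]:
    """Indices of the fixed effects sorted by number of distinct groups, largest first."""
    n_groups = [len(set(groups)) for groups in fixed_effects]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(range(len(fixed_effects)), key=lambda j: -n_groups[j])


year = [0, 0, 1, 1, 0, 1]    # 2 groups
worker = [0, 1, 2, 3, 4, 5]  # 6 groups
firm = [0, 0, 1, 1, 2, 2]    # 3 groups
order = order_by_group_count([year, worker, firm])
```

Here `order` lists worker first, then firm, then year, so a 2-FE sub-phase that uses the first two entries would work on the two largest fixed effects.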

- Add coef_sums_buffer to SingleFEDemeaner, TwoFEDemeaner, and MultiFEBuffers
- Change apply_design_matrix_t to write to caller-provided buffer
- Remove unnecessary in_out_2fe.to_vec() copy in MultiFEDemeaner
- Rename in_out to coef_sums/coef_sums_buffer for clarity

This eliminates per-column allocations: 1 for 2FE, 4 for 3+FE cases. Benchmarks show 4-12% improvement for medium-sized datasets (100K obs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Unroll the accumulate_fe_contributions loop 4x to enable better instruction-level parallelism. This produces paired loads (ldp) and reduces loop overhead, providing ~7% speedup on large 3FE demeaning workloads.

Also refactor compute_ssr to reuse the optimized accumulate method.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

… nice to have

Fixed effects are now always sorted by number of groups (largest first), matching fixest's default behavior. This simplifies the API and ensures optimal convergence properties.

Changes:
- Remove `reorder_fe` field from `FixestConfig`
- Remove `with_reorder` method from `FixedEffectsIndex`
- Remove `with_config` method from `DemeanContext`
- Simplify `FixedEffectsIndex::new()` to always reorder

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Replace `convergence_len() -> usize` with `convergence_range() -> Range<usize>` in the Projector trait. This makes the accelerator fully generic over any Projector implementation, not just FE-specific ones that check a prefix. The accelerator extracts (start, end) from the range to avoid cloning overhead.

Following fixest's approach, FE projectors exclude the last FE (smallest after reordering) from convergence checking. At a fixed point, if (n_fe - 1) FEs have converged, the remaining one must also have converged.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
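The idea of checking convergence only over a sub-range of the coefficient vector can be sketched as follows. This is written in Python rather than Rust, and it assumes the coefficients of all fixed effects are stored back to back in one flat vector with at least two fixed effects; both function names are hypothetical, and the plain absolute-difference test is a simplification rather than the criterion used in the PR.

```python
from typing import Sequence


def convergence_range_sketch(n_groups_per_fe: Sequence[int]) -> range:
    """Coefficient indices to check: every block except the last FE's."""
    return range(0, sum(n_groups_per_fe[:-1]))


def has_converged(
    old: Sequence[float], new: Sequence[float], check: range, tol: float = 1e-6
) -> bool:
    """True when every checked coefficient moved by at most `tol`."""
    return all(abs(new[i] - old[i]) <= tol for i in check)


n_groups = [6, 3, 2]  # already ordered largest first
check = convergence_range_sketch(n_groups)
old = [0.0] * 11
moved_last_block = [0.0] * 9 + [1.0, 1.0]
moved_first_block = [1.0] + [0.0] * 10
```

Because the range is a general (start, end) pair rather than a prefix length, a projector with a different layout could return any contiguous slice without the accelerator needing to know why.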

- Add original_to_reordered mapping to FixedEffectsIndex for tracking how FEs are reordered internally (by size for optimal convergence)
- Add fe_coefficients field to DemeanResult
- Add reorder_coefficients_to_original() method to restore coefficients to the user's original FE order
- Add total_coef buffer to MultiFEBuffers for accumulating coefficients across all demeaning phases (warmup, two_fe_convergence, reacceleration)
- Update all demeaners to populate and return FE coefficients

Co-Authored-By: Claude Opus 4.5 <[email protected]>
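A small sketch of how such a mapping can restore the user's ordering. It assumes that `original_to_reordered[i]` gives the internal position of the user's i-th fixed effect, which is my reading of the name rather than something the commit message states; `reorder_to_original` is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_to_original(
    reordered: Sequence[T], original_to_reordered: Sequence[int]
) -> List[T]:
    """Pick, for each original FE, the entry stored at its internal position."""
    return [reordered[pos] for pos in original_to_reordered]


# Internal order: position k holds the original FE with index order[k].
order = [1, 2, 0]
original_to_reordered = [0] * len(order)
for internal_pos, original_idx in enumerate(order):
    original_to_reordered[original_idx] = internal_pos

# Coefficient blocks as produced internally (largest FE first).
internal_blocks = ["worker_coefs", "firm_coefs", "year_coefs"]
user_blocks = reorder_to_original(internal_blocks, original_to_reordered)
```

With the user's order being year, worker, firm, `user_blocks` comes back in that order even though the solver worked on worker, firm, year.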

Test cases:
- Single FE coefficient correctness
- Two FE coefficient correctness
- Three FE coefficient correctness (random order)
- Coefficient ordering preservation (verifies coefficients match original FE order, not internal reordered order)
- Weighted demeaning with coefficient extraction

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Rust changes:
- DemeanContext now has weights: Option<ObservationWeights>
- When None, uses group_counts for denominators (no per-obs multiplication)
- _demean_rs binding takes weights=None by default

Python changes:
- demean() wrapper detects uniform weights (all equal) via np.allclose
- Passes None to Rust when weights are uniform, enabling fast path
- Public API unchanged (weights parameter still required)

This saves memory (no per-obs weight storage) and computation (no weight multiplication in scatter operations) for unweighted regression.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
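The fast path described above can be illustrated with a standard-library sketch. `math.isclose` stands in for the `np.allclose` check named in the commit, and both `normalize_weights` and `group_means` are hypothetical helpers, not the wrapper or binding from this PR. Group ids are assumed to be dense integers with every id present.

```python
import math
from typing import List, Optional, Sequence


def normalize_weights(weights: Sequence[float]) -> Optional[List[float]]:
    """Return None when all weights are numerically equal, else a list copy."""
    if not weights:
        return None
    first = weights[0]
    if all(math.isclose(w, first, rel_tol=1e-9) for w in weights):
        return None
    return list(weights)


def group_means(
    values: Sequence[float],
    groups: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Group means; with weights=None the denominators are plain group counts."""
    n_groups = max(groups) + 1
    num = [0.0] * n_groups
    den = [0.0] * n_groups
    if weights is None:
        # Fast path: no per-observation multiplication.
        for v, g in zip(values, groups):
            num[g] += v
            den[g] += 1.0
    else:
        for v, g, w in zip(values, groups, weights):
            num[g] += w * v
            den[g] += w
    return [n / d for n, d in zip(num, den)]


values = [1.0, 3.0, 10.0]
groups = [0, 0, 1]
uniform = group_means(values, groups, normalize_weights([2.0, 2.0, 2.0]))
weighted = group_means(values, groups, normalize_weights([1.0, 3.0, 1.0]))
```

Uniform weights of any magnitude give the same group means as no weights at all, which is why collapsing them to `None` does not change the result.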

Refactor/demean accelerated

Remove manual 4x loop unrolling from compute_ssr methods in TwoFEProjector and MultiFEProjector. LLVM auto-vectorizes simple loops effectively, making manual unrolling unnecessary complexity.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Previously, fixed effects were always reordered by size (largest first) during demeaning. This adds a `reorder_fe` boolean parameter that allows users to control this behavior. Default is `false` (no reordering).

Co-Authored-By: Claude Opus 4.5 <[email protected]>

codecov · 2026-02-01T20:35:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Flag | Coverage Δ | |
|---|---|---|
| core-tests | `74.97% <100.00%> (+0.05%)` | ⬆️ |
| tests-extended | `?` | |
| tests-vs-r | `17.48% <31.03%> (+0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| pyfixest/core/__init__.py | `100.00% <100.00%> (ø)` | |
| pyfixest/core/demean.py | `100.00% <100.00%> (ø)` | |
| pyfixest/core/detect_singletons.py | `100.00% <100.00%> (ø)` | |
| pyfixest/estimation/__init__.py | `100.00% <100.00%> (ø)` | |
| pyfixest/estimation/backends.py | `72.41% <100.00%> (ø)` | |
| pyfixest/estimation/demean_.py | `54.91% <100.00%> (ø)` | |
| pyfixest/estimation/feols_.py | `86.83% <ø> (-4.56%)` | ⬇️ |
| pyfixest/estimation/model_matrix_fixest_.py | `90.95% <100.00%> (-0.55%)` | ⬇️ |

... and 13 files with indirect coverage changes


schroedk and others added 30 commits January 4, 2026 23:35

Minor grammar and typo fixes

a39ab4b

documentation clarifications in types.rs

28eaf83

document ssc = 0 convergence reason

1610a70

Rename coef to omega in Irons-Tuck accelerate for clarity

06ef560

DemeanResult struct does not contain coefficients (though it would be…

de60290

… nice to have

Refactor Gauss-Seidel sweeper and cache FE slices

1e11f97

Merge pull request #1 from schroedk/refactor/demean_accelerated

f25a9ac

Refactor/demean accelerated

initial commit

4c18a8e

adjustments

4f51c4a

Merge branch 'master' into geneology

8277033

updates

bdad2d2

updates

277d421

s3alfisc added 2 commits February 1, 2026 21:16

more updates

dd053d5

adjustments

856aba2

s3alfisc closed this Feb 1, 2026

s3alfisc deleted the geneology branch February 1, 2026 20:22
