s3alfisc commented Feb 1, 2026

Description of other software packages that have influenced pyfixest.

schroedk and others added 30 commits January 4, 2026 23:35
Implement fixest's Irons-Tuck-Grand acceleration algorithm for high-dimensional
fixed effects demeaning in Rust. This is a coefficient-space iterative method
that provides significant speedups over naive alternating projections.

Key features:
- Irons-Tuck acceleration with grand acceleration steps
- Support for 2-FE and 3+ FE cases with optimized projectors
- Algorithm aligned with R fixest implementation
- Auto-vectorized loops (no explicit SIMD dependencies)

Reference: https://github.com/lrberge/fixest (CCC_demean.cpp)
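For readers unfamiliar with the coefficient-space formulation, here is a minimal pure-Python sketch of the two-FE case with an Irons-Tuck step. It is an illustration, not the Rust code in this PR. The helper names (`group_means`, `sweep`, `irons_tuck`, `demean_2fe`) are hypothetical. FE levels are assumed to be integer codes `0..n-1` with no empty groups. The grand-acceleration step and the SSR criterion are left out. A sweep maps the FE2 coefficients `beta` to new ones by first solving for the FE1 coefficients. The Irons-Tuck step then extrapolates from X, G(X) and G(G(X)).

```python
def group_means(values, groups, n_groups):
    """Mean of `values` within each group (assumes no empty groups)."""
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [s / c for s, c in zip(sums, counts)]


def sweep(beta, y, fe1, fe2, n1, n2):
    """One alternating-projection pass in coefficient space: beta -> G(beta)."""
    # FE1 coefficients given the current FE2 coefficients
    alpha = group_means([yi - beta[g] for yi, g in zip(y, fe2)], fe1, n1)
    # New FE2 coefficients given the updated FE1 coefficients
    return group_means([yi - alpha[g] for yi, g in zip(y, fe1)], fe2, n2)


def irons_tuck(x, gx, ggx):
    """Irons-Tuck extrapolation from the iterates X, G(X), G(G(X))."""
    d_gx = [c - b for b, c in zip(gx, ggx)]                # GGX - GX
    d2 = [c - 2.0 * b + a for a, b, c in zip(x, gx, ggx)]  # GGX - 2*GX + X
    ssq = sum(v * v for v in d2)
    if ssq == 0.0:  # degenerate step (d2 == 0): fall back to the plain iterate
        return list(ggx)
    coef = sum(u * v for u, v in zip(d_gx, d2)) / ssq
    return [c - coef * d for c, d in zip(ggx, d_gx)]


def demean_2fe(y, fe1, fe2, tol=1e-10, max_iter=1000):
    """Residual of y after removing two sets of fixed effects."""
    n1, n2 = max(fe1) + 1, max(fe2) + 1
    beta = [0.0] * n2
    for _ in range(max_iter):
        gx = sweep(beta, y, fe1, fe2, n1, n2)
        # Stop once a plain sweep no longer moves the FE2 coefficients
        if max(abs(a - b) for a, b in zip(gx, beta)) < tol:
            beta = gx
            break
        ggx = sweep(gx, y, fe1, fe2, n1, n2)
        beta = irons_tuck(beta, gx, ggx)
    # Recover FE1 coefficients and form y - alpha[fe1] - beta[fe2]
    alpha = group_means([yi - beta[g] for yi, g in zip(y, fe2)], fe1, n1)
    return [yi - alpha[g1] - beta[g2] for yi, g1, g2 in zip(y, fe1, fe2)]
```

Convergence is checked on a plain sweep (G(beta) versus beta). Stopping therefore means the residual's FE2 group means are close to zero. Its FE1 group means are zero by construction of the final step.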
Performance improvements to the accelerated demeaning implementation:
- Optimize memory layout and share FEInfo across columns
- Add SSR (sum of squared residuals) stopping criterion for 2-FE
- Loop unrolling for 3-FE projection hot paths
- Align tolerance default with fixest (1e-6 instead of 1e-8)
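The SSR stopping criterion can be pictured as a scalar check on the sum of squared residuals between two iterations. This is a minimal sketch under an assumed rule: stop when the absolute or relative change falls below `tol`, with `tol` defaulting to 1e-6 as in the commit. The names `ssr` and `ssr_converged` are hypothetical. The exact rule used by the Rust code and by fixest may differ.

```python
def ssr(residuals):
    """Sum of squared residuals."""
    return sum(r * r for r in residuals)


def ssr_converged(ssr_old, ssr_new, tol=1e-6):
    """Hypothetical SSR stopping rule.

    Stop once the SSR change is at most `tol`, either in absolute terms or
    relative to the previous SSR (the 0.1 offset avoids dividing by ~0).
    """
    diff = abs(ssr_new - ssr_old)
    return diff <= tol or diff / (0.1 + abs(ssr_old)) <= tol
```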
Restructure the Rust demeaning code for clarity and maintainability:
- Introduce Projector trait for FE-specific projection strategies
- Introduce Demeaner trait for high-level solver strategies
- Unified DemeanBuffers struct for scratch space management
- Replace unsafe pointer code with safe iterator-based implementations
- Move related functions into appropriate impl blocks
Eliminate Python/numba overhead in the estimation pipeline:
- Implement detect_singletons in Rust to avoid numba JIT compilation
- Add Python wrapper maintaining API compatibility
- Optimize factorize() using pd.factorize instead of category conversion
- Replace slow df.isin() with np.isinf() for infinite value detection
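Singleton detection is usually defined iteratively. An observation is dropped if its level of any FE appears only once among the remaining observations, and dropping it can create new singletons in other FEs. The pure-Python sketch below illustrates that rule only. `singleton_mask` and its list-of-columns input are hypothetical and do not reflect the signature of the Rust `detect_singletons` or its Python wrapper.

```python
from collections import Counter


def singleton_mask(fes):
    """Flag observations that are singletons in at least one fixed effect.

    `fes` is a list of FE columns (equal-length lists of hashable levels).
    Dropping a singleton can turn another observation into a singleton in a
    different FE, so passes repeat until nothing changes.
    """
    n = len(fes[0])
    dropped = [False] * n
    changed = True
    while changed:
        changed = False
        for fe in fes:
            # Level counts among the observations that are still kept
            counts = Counter(fe[i] for i in range(n) if not dropped[i])
            for i in range(n):
                if not dropped[i] and counts[fe[i]] == 1:
                    dropped[i] = True
                    changed = True
    return dropped
```

Take FEs `["a", "a", "b", "b"]` and `["x", "y", "y", "z"]`. The first pass drops the `x` and `z` observations, which leaves `a` and `b` as singletons, so all four observations end up dropped.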
Testing and code quality improvements:
- Add edge case tests for demean_accelerated
- Implement buffer reuse via for_each_init pattern
- Extract MultiFEBuffers struct for better readability
- Refactor Demeaner trait to own context and config references
Remove unnecessary abstractions after experimentation phase:
- Remove Accelerator trait in favor of direct IronsTuckGrand impl
- Move config into IronsTuckGrand struct
- Consolidate ConvergenceState and related types
- Update to PyO3 0.26 API (allow_threads -> detach)
Connect the new demean_accelerated module to Python and polish:
- Wire rust backend to use demean_accelerated instead of simple demean
- Fix MultiFE early convergence bug in 3+ FE demeaning
- Rename scatter/gather to apply_design_matrix for clarity
- Avoid per-column copy for Fortran-ordered input arrays
- Add type cast guard and #[inline(always)] on hot methods
Replace the simple alternating projections implementation with the
accelerated Irons-Tuck algorithm as the sole Rust demean backend.

Changes:
- Remove src/demean.rs (old simple implementation)
- Update demean.py to call _demean_accelerated_rs
- Remove demean_accelerated.py (was only needed during development)
- Update backends.py and demean_.py imports
- Clean up tests to remove redundant fixtures

The public Python API is unchanged - users calling demean() or using
the "rust" backend get the accelerated implementation transparently.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Reorder fixed effects by number of groups (largest first) to match
fixest's default `fixef.reorder = TRUE` behavior. This improves
convergence for 3+ FE cases by making the 2-FE sub-convergence
phase work on the largest FEs first.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
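The reordering amounts to sorting FE indices by their number of groups. This is a minimal sketch, assuming the group counts per FE are already known. `reorder_by_size` is a hypothetical helper, not the `FixedEffectsIndex` API.

```python
def reorder_by_size(n_groups):
    """Internal processing order of the FEs: most groups first.

    Entry j is the original index of the FE processed j-th. sorted() is
    stable, so FEs with equal group counts keep their original order.
    """
    return sorted(range(len(n_groups)), key=lambda k: -n_groups[k])
```

Python's `sorted` is stable, so FEs with equal group counts keep their original relative order. The commit does not say whether the Rust code breaks ties the same way.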
- Add coef_sums_buffer to SingleFEDemeaner, TwoFEDemeaner, and MultiFEBuffers
- Change apply_design_matrix_t to write to caller-provided buffer
- Remove unnecessary in_out_2fe.to_vec() copy in MultiFEDemeaner
- Rename in_out to coef_sums/coef_sums_buffer for clarity

This eliminates per-column allocations: 1 for 2FE, 4 for 3+FE cases.
Benchmarks show 4-12% improvement for medium-sized datasets (100K obs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Unroll the accumulate_fe_contributions loop 4x to enable better
instruction-level parallelism. This produces paired loads (ldp)
and reduces loop overhead, providing ~7% speedup on large 3FE
demeaning workloads.

Also refactor compute_ssr to reuse the optimized accumulate method.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fixed effects are now always sorted by number of groups (largest first),
matching fixest's default behavior. This simplifies the API and ensures
optimal convergence properties.

Changes:
- Remove `reorder_fe` field from `FixestConfig`
- Remove `with_reorder` method from `FixedEffectsIndex`
- Remove `with_config` method from `DemeanContext`
- Simplify `FixedEffectsIndex::new()` to always reorder

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace `convergence_len() -> usize` with `convergence_range() -> Range<usize>`
in the Projector trait. This makes the accelerator fully generic over any
Projector implementation, not just FE-specific ones that check a prefix.

The accelerator extracts (start, end) from the range to avoid cloning overhead.

Following fixest's approach, FE projectors exclude the last FE (smallest after
reordering) from convergence checking. Because each sweep computes the last FE's
coefficients from the current coefficients of the other FEs, once those (n_fe - 1)
FEs have converged, the remaining one has converged as well.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
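To illustrate the convergence range, assume all FE coefficients are stored in one flat vector, FE by FE in processing order. Excluding the last FE then means checking only the prefix that ends where the last FE's block starts. `convergence_range` and `has_converged` are hypothetical Python helpers standing in for the Rust trait method. The max-abs-change rule is also an assumption.

```python
def convergence_range(n_groups):
    """Coefficient indices checked for convergence.

    Assumes all FE coefficients live in one flat vector, stored FE by FE in
    processing order; the block of the last FE is left out.
    """
    return range(0, sum(n_groups) - n_groups[-1])


def has_converged(old, new, rng, tol=1e-6):
    """Max-abs-change check restricted to the given index range."""
    start, end = rng.start, rng.stop  # mirrors extracting (start, end) from the range
    return all(abs(new[i] - old[i]) <= tol for i in range(start, end))
```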
- Add original_to_reordered mapping to FixedEffectsIndex for tracking
  how FEs are reordered internally (by size for optimal convergence)
- Add fe_coefficients field to DemeanResult
- Add reorder_coefficients_to_original() method to restore coefficients
  to the user's original FE order
- Add total_coef buffer to MultiFEBuffers for accumulating coefficients
  across all demeaning phases (warmup, two_fe_convergence, reacceleration)
- Update all demeaners to populate and return FE coefficients

Co-Authored-By: Claude Opus 4.5 <[email protected]>
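Restoring the user's FE order is a gather through the index mapping. The sketch assumes `original_to_reordered` is a plain list whose k-th entry is the internal position of the user's k-th FE, and that coefficients come as one block per FE. `coefficients_to_original` is a hypothetical name, not the Rust `reorder_coefficients_to_original()` method.

```python
def coefficients_to_original(blocks, original_to_reordered):
    """Return per-FE coefficient blocks in the user's original FE order.

    blocks[j] holds the coefficients of the FE processed j-th internally;
    original_to_reordered[k] is the internal position of the user's k-th FE.
    """
    return [list(blocks[j]) for j in original_to_reordered]
```

Suppose the internal order is (original FE 1, original FE 2, original FE 0). Then `original_to_reordered` is `[2, 0, 1]`, and the function returns `[blocks[2], blocks[0], blocks[1]]`.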
Test cases:
- Single FE coefficient correctness
- Two FE coefficient correctness
- Three FE coefficient correctness (random order)
- Coefficient ordering preservation (verifies coefficients match
  original FE order, not internal reordered order)
- Weighted demeaning with coefficient extraction

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Rust changes:
- DemeanContext now has weights: Option<ObservationWeights>
- When None, uses group_counts for denominators (no per-obs multiplication)
- _demean_rs binding takes weights=None by default

Python changes:
- demean() wrapper detects uniform weights (all equal) via np.allclose
- Passes None to Rust when weights are uniform, enabling fast path
- Public API unchanged (weights parameter still required)

This saves memory (no per-obs weight storage) and computation
(no weight multiplication in scatter operations) for unweighted regression.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
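The uniform-weight fast path is safe because, when every weight equals some w > 0, a weighted group mean sum(w*v)/sum(w) reduces to sum(v)/count. Group counts can therefore replace weight sums. The sketch below illustrates both paths in pure Python. `uniform_to_none` and `fe_group_means` are hypothetical helpers, weights are assumed positive, and `math.isclose` stands in for `np.allclose`, which uses a different default tolerance.

```python
import math


def uniform_to_none(weights):
    """Return None if all weights are (approximately) equal, else the weights.

    math.isclose (default rel_tol=1e-9) stands in for np.allclose here.
    """
    first = weights[0]
    if all(math.isclose(w, first) for w in weights):
        return None
    return weights


def fe_group_means(values, groups, n_groups, weights=None):
    """(Weighted) group means; weights=None uses plain group counts."""
    sums = [0.0] * n_groups
    denom = [0.0] * n_groups
    for i, (v, g) in enumerate(zip(values, groups)):
        if weights is None:
            sums[g] += v        # fast path: no per-observation multiplication
            denom[g] += 1.0     # denominator is the group count
        else:
            sums[g] += weights[i] * v
            denom[g] += weights[i]
    return [s / d for s, d in zip(sums, denom)]
```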
Remove manual 4x loop unrolling from compute_ssr methods in
TwoFEProjector and MultiFEProjector. LLVM auto-vectorizes simple
loops effectively, making manual unrolling unnecessary complexity.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Previously, fixed effects were always reordered by size (largest first)
during demeaning. This adds a `reorder_fe` boolean parameter that allows
users to control this behavior. Default is `false` (no reordering).

Co-Authored-By: Claude Opus 4.5 <[email protected]>
s3alfisc closed this Feb 1, 2026
s3alfisc deleted the geneology branch February 1, 2026 20:22

codecov bot commented Feb 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Flag | Coverage Δ |
|---|---|
| core-tests | 74.97% <100.00%> (+0.05%) ⬆️ |
| tests-extended | ? |
| tests-vs-r | 17.48% <31.03%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| `pyfixest/core/__init__.py` | 100.00% <100.00%> (ø) |
| `pyfixest/core/demean.py` | 100.00% <100.00%> (ø) |
| `pyfixest/core/detect_singletons.py` | 100.00% <100.00%> (ø) |
| `pyfixest/estimation/__init__.py` | 100.00% <100.00%> (ø) |
| `pyfixest/estimation/backends.py` | 72.41% <100.00%> (ø) |
| `pyfixest/estimation/demean_.py` | 54.91% <100.00%> (ø) |
| `pyfixest/estimation/feols_.py` | 86.83% <ø> (-4.56%) ⬇️ |
| `pyfixest/estimation/model_matrix_fixest_.py` | 90.95% <100.00%> (-0.55%) ⬇️ |

... and 13 files with indirect coverage changes
