Feat: Optional multi-collinearity check based on Residual Variance #1139

@s3alfisc

Description

Based on the discussion in #1042 (comment), we noticed that FixedEffectModels.jl uses a different multicollinearity check than fixest (which we follow in pyfixest).

Instead of computing a Cholesky decomposition after demeaning, it simply compares the "residual variance" of X_demeaned with the "initial variance" of X.

We would like to add the residual variance check as an optional check. See here for the implementation: link

API

We will do so by introducing a new argument to `feols`, `fepois`, `feglm`, and `quantreg`, which we might simply name `collin_tol2`.
If it is `None`, we do not apply the additional check; otherwise we do.

Open question: is this the best option? An alternative would be to introduce `collin_kwargs` as a new argument in a non-API-breaking way and, if users provide a value for `collin_tol`, announce that it will be deprecated.

Math

Let $x_i$ denote the $i$-th column of the design matrix $X$, and $\tilde{x}_i$ the demeaned counterpart.

We then compute the ratio of residual to original sum of squares:

$$\rho_i = \frac{\lVert \tilde{x}_i \rVert^2}{\lVert x_i \rVert^2} = \frac{\sum_{j=1}^{n} \tilde{x}_{ij}^2}{\sum_{j=1}^{n} x_{ij}^2}$$

We flag a variable as multicollinear if

$$\rho_i < \texttt{collin\_tol2}.$$
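The ratio above can be sketched in a few lines of NumPy. This is only an illustration, not the pyfixest or FixedEffectModels.jl implementation: the helper names `demean_by_group` and `residual_variance_flags`, the one-way demeaning, the default tolerance of `1e-6`, and the handling of all-zero columns are all assumptions made for the example.

```python
import numpy as np


def demean_by_group(X, groups):
    """Subtract group means from every column of X (one-way fixed effect only)."""
    X = np.asarray(X, dtype=float)
    _, codes = np.unique(np.asarray(groups), return_inverse=True)
    counts = np.bincount(codes)
    X_demeaned = np.empty_like(X)
    for k in range(X.shape[1]):
        group_sums = np.bincount(codes, weights=X[:, k])
        X_demeaned[:, k] = X[:, k] - (group_sums / counts)[codes]
    return X_demeaned


def residual_variance_flags(X, X_demeaned, collin_tol2=1e-6):
    """Return (rho, flags), where rho_i = ||x_demeaned_i||^2 / ||x_i||^2.

    Assumption for this sketch: an all-zero column gets rho = 0 and is flagged.
    """
    ss_orig = np.sum(np.asarray(X, dtype=float) ** 2, axis=0)
    ss_resid = np.sum(np.asarray(X_demeaned, dtype=float) ** 2, axis=0)
    rho = np.divide(ss_resid, ss_orig, out=np.zeros_like(ss_resid), where=ss_orig > 0)
    return rho, rho < collin_tol2


groups = np.array([0, 0, 1, 1, 2, 2])
x1 = np.array([1.0, 2.0, 3.0, 5.0, 4.0, 8.0])  # varies within groups
x2 = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])  # constant within groups
X = np.column_stack([x1, x2])

rho, flags = residual_variance_flags(X, demean_by_group(X, groups))
```

Here `x2` is a dummy for one group, so demeaning wipes it out and its ratio is zero, while `x1` keeps part of its variation and is not flagged.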

Comparison:

Note that the FixedEffectModels.jl method only checks for multicollinearity of X with the fixed effects, while the Cholesky-based check also detects collinearity among the covariates in X.
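A small example may make the difference concrete. In the sketch below, the second column is an exact multiple of the first, so neither is absorbed by the fixed effect and the residual variance ratio stays well above any small tolerance. As a stand-in for the Cholesky-based check (an assumption made for simplicity, not what fixest or pyfixest actually run), a rank computation on the demeaned matrix does reveal the problem. The tolerance `1e-6` is likewise only illustrative.

```python
import numpy as np

groups = np.array([0, 0, 1, 1, 2, 2])
x1 = np.array([1.0, 2.0, 3.0, 5.0, 4.0, 8.0])
X = np.column_stack([x1, 2.0 * x1])  # second column is an exact multiple of the first

# One-way demeaning: subtract each group's mean from its rows.
counts = np.bincount(groups)
group_means = np.column_stack(
    [np.bincount(groups, weights=X[:, k]) / counts for k in range(X.shape[1])]
)
X_demeaned = X - group_means[groups]

# Residual variance ratio per column: only sensitive to the fixed effects.
rho = np.sum(X_demeaned**2, axis=0) / np.sum(X**2, axis=0)
flagged_by_ratio = rho < 1e-6

# Stand-in for the Cholesky-based check: is the demeaned matrix rank deficient?
rank_deficient = np.linalg.matrix_rank(X_demeaned) < X.shape[1]
```

With these inputs no column is flagged by the ratio, yet the demeaned matrix has rank one, which is why the residual variance check is proposed as an additional, optional check rather than a replacement.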


Labels: good-issue-to-start (Good first issue for newcomers interested in contributing to pyfixest. Quick wins possible! =))
