Linear Model Gotchas with Python vs R

Problem

  • You might expect ordinary least squares (OLS) regression to return the same coefficients in Python and R, since the solution to multiple linear regression is simple and well known. When the design matrix is rank-deficient, however, the two can report different coefficients.

Discussion

A design matrix $X_{n \times p}$ is rank-deficient if (i) $p > n$, or (ii) $n \ge p$ but $\mathrm{rank}(X) < p$ (e.g., some columns are identical). When $X$ is rank-deficient, the solution to the least squares problem is not unique. That is, the set $B = \{ \tilde{\beta} : \| y - X \tilde{\beta} \|^2 = \min_{\beta} \| y - X \beta \|^2 \}$ contains more than one element.
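To make the non-uniqueness concrete, here is a minimal sketch using NumPy. The toy data and the helper `rss` are made up for illustration; the third column of `X` duplicates the second, so $\mathrm{rank}(X) = 2 < p = 3$, and several different coefficient vectors reach the same minimal residual sum of squares.

```python
import numpy as np

# Hypothetical toy data: column 2 is an exact copy of column 1,
# so X has 3 columns but only rank 2.
X = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 3.0, 3.0],
    [1.0, 4.0, 4.0],
])
y = np.array([2.0, 4.1, 5.9, 8.0])

rank = np.linalg.matrix_rank(X)
print("rank(X) =", rank, "with p =", X.shape[1])


def rss(beta):
    """Residual sum of squares |y - X beta|^2 (illustrative helper)."""
    r = y - X @ beta
    return float(r @ r)


# Fit on the two linearly independent columns only (intercept + slope).
beta_reduced, *_ = np.linalg.lstsq(X[:, :2], y, rcond=None)
b0, b1 = beta_reduced

# Any way of splitting the slope across the two identical columns
# gives the same fitted values, hence the same RSS.
beta_a = np.array([b0, b1, 0.0])
beta_b = np.array([b0, 0.0, b1])
beta_c = np.array([b0, b1 / 2, b1 / 2])

for name, beta in [("a", beta_a), ("b", beta_b), ("c", beta_c)]:
    print(name, beta, "RSS =", rss(beta))
```

All three vectors belong to the set $B$; which one a library reports depends on how it resolves the rank deficiency.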

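  • R picks one of the sparsest solutions (which may not be unique either), i.e., a $\beta$ from the set $B$ with the smallest $\|\beta\|_0$ (the number of non-zero coefficients). In practice, `lm` drops the aliased columns and reports their coefficients as `NA`.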

  • Python (e.g., `numpy.linalg.lstsq`) picks the $\beta$ from the set $B$ with the smallest $\|\beta\|_2$, i.e., the minimum-norm solution, which is unique.
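The two choices can be compared side by side in Python. This is a sketch under stated assumptions: the data are hypothetical, and the "drop a column" branch only imitates the kind of answer R reports (with `NaN` standing in for `NA`); it is not R's actual pivoting algorithm. The minimum-norm branch uses `numpy.linalg.lstsq`, which returns the solution with the smallest 2-norm when several minimizers exist, and checks it against the pseudoinverse.

```python
import numpy as np

# Hypothetical toy data: the last two columns are identical.
X = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 3.0, 3.0],
    [1.0, 4.0, 4.0],
])
y = np.array([2.0, 4.1, 5.9, 8.0])

# Minimum 2-norm solution, as returned by NumPy for rank-deficient X.
beta_min_norm, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# The same solution via the Moore-Penrose pseudoinverse.
beta_pinv = np.linalg.pinv(X) @ y

# Sparse-style solution (assumption: imitates dropping the aliased column).
keep = [0, 1]                              # columns retained in the fit
beta_sparse = np.full(X.shape[1], np.nan)  # NaN marks the dropped coefficient
coef_keep, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
beta_sparse[keep] = coef_keep

fitted_min_norm = X @ beta_min_norm
fitted_sparse = X[:, keep] @ coef_keep

print("min-norm :", beta_min_norm)
print("sparse   :", beta_sparse)
print("same fitted values:", np.allclose(fitted_min_norm, fitted_sparse))
```

The minimum-norm solution splits the slope equally across the duplicated columns, while the sparse-style solution puts all of it on one column. The fitted values agree, so predictions on the training data match even though the coefficient tables do not.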