Analytic clustered SEs #26

As discussed offline, I just added these to **dbreg**. See this PR for the implementation and some examples (validated against `fixest`): https://github.com/grantmcdermott/dbreg/pull/29. It should be a relatively low lift to adapt to Python.

(You want to look at the main `dbreg.R` file. Apologies; it's gotten long and unwieldy. I plan to modularize soon.)

I haven't benchmarked against the current bootstrap approach of `duckreg`, but I'm pretty pleased with the example I highlight in the PR (and the updated README): clustered SEs on the ~180M-row NYC taxi dataset in under 3 seconds :-)

```r
dbreg(
   tip_amount ~ fare_amount + passenger_count | month + vendor_name,
   path     = "read_parquet('nyc-taxi/**/*.parquet')",
   vcov     = ~month,    # clustered SEs
   strategy = "compress" # skip auto strategy overhead
)
#> [dbreg] Using strategy: compress
#> [dbreg] Executing compress strategy SQL
#>
#> Compressed OLS estimation, Dep. Var.: tip_amount 
#> Observations.: 178,544,324 (original) | 70,782 (compressed)
#> Standard Errors: Clustered (12 clusters)
#>                  Estimate Std. Error  t value   Pr(>|t|)    
#> fare_amount      0.106744   0.000657 162.4934  < 2.2e-16 ***
#> passenger_count -0.029086   0.001030 -28.2278 1.2923e-11 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 1.7                 Adj. R2: 0.243549
```
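For the Python side, here is a rough NumPy sketch of how analytic clustered SEs can be computed from compressed data. This is not the dbreg code; the function name, its arguments, and the `G / (G - 1)` small-sample correction are my own assumptions for illustration (packages differ on the exact correction). The one requirement is that the cluster variable is part of the compression key, so that every compressed group sits inside a single cluster.

```python
import numpy as np

def clustered_ols_compressed(X, n, sum_y, cluster):
    """Hypothetical helper: OLS with cluster-robust SEs from compressed groups.

    X       : (g, k) regressor values, one row per compressed group
    n       : (g,)   number of original rows in each group
    sum_y   : (g,)   sum of the outcome within each group
    cluster : (g,)   cluster id of each group (must be constant within a group)
    """
    X = np.asarray(X, dtype=float)
    n = np.asarray(n, dtype=float)
    sum_y = np.asarray(sum_y, dtype=float)

    # Frequency-weighted normal equations: (X'NX)^-1 X' sum_y
    bread = np.linalg.inv(X.T @ (X * n[:, None]))
    beta = bread @ (X.T @ sum_y)

    # Sum of residuals within each group; only sum_y and n are needed for this
    resid_sum = sum_y - n * (X @ beta)

    # Cluster scores: add up x_g * (summed residual) over the groups in each cluster
    ids, inv = np.unique(np.asarray(cluster), return_inverse=True)
    inv = inv.ravel()
    scores = np.zeros((ids.size, X.shape[1]))
    np.add.at(scores, inv, X * resid_sum[:, None])

    # Sandwich estimator; G / (G - 1) is an assumed, illustrative correction
    G = ids.size
    vcov = (G / (G - 1)) * (bread @ (scores.T @ scores) @ bread)
    return beta, np.sqrt(np.diag(vcov))

# Simulated data (not the taxi data): one discrete regressor, 12 clusters
rng = np.random.default_rng(0)
N, G = 5000, 12
x1 = rng.integers(0, 5, N)
cl = rng.integers(0, G, N)
y = 1.0 + 0.5 * x1 + rng.normal(size=G)[cl] + rng.normal(size=N)

# Compress by (x1, cluster), i.e. the equivalent of a SQL GROUP BY
key = x1 * G + cl
uniq, inv = np.unique(key, return_inverse=True)
inv = inv.ravel()
n = np.bincount(inv)
sum_y = np.bincount(inv, weights=y)
Xc = np.column_stack([np.ones(uniq.size), uniq // G])
beta_c, se_c = clustered_ols_compressed(Xc, n, sum_y, uniq % G)

# Same estimator on the uncompressed rows (every row is its own group)
Xf = np.column_stack([np.ones(N), x1])
beta_f, se_f = clustered_ols_compressed(Xf, np.ones(N), y, cl)

print(uniq.size, "compressed rows from", N)
print(np.allclose(beta_c, beta_f), np.allclose(se_c, se_f))
```

The point of the comparison at the end is that the compressed and row-level computations agree, because the cluster score only depends on within-group residual sums. In practice the group-by step would run in DuckDB rather than NumPy.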
