Analytic clustered SEs #26

@grantmcdermott

As discussed offline, I just added analytic clustered SEs to dbreg. See grantmcdermott/dbreg#29 for the implementation and some examples (validated against fixest). It should be a relatively low lift to adapt to Python.

(The file to look at is the main dbreg.R. Apologies; it's gotten long and unwieldy. I plan to modularize it soon.)

I haven't benchmarked against duckreg's current bootstrap approach, but I'm pretty pleased with the example I highlight in the PR (and in the updated README): clustered SEs on the roughly 180 million row NYC taxi dataset in under 3 seconds :-)

dbreg(
   tip_amount ~ fare_amount + passenger_count | month + vendor_name,
   path     = "read_parquet('nyc-taxi/**/*.parquet')",
   vcov     = ~month,    # clustered SEs
   strategy = "compress" # skip auto strategy overhead
)
#> [dbreg] Using strategy: compress
#> [dbreg] Executing compress strategy SQL
#>
#> Compressed OLS estimation, Dep. Var.: tip_amount 
#> Observations.: 178,544,324 (original) | 70,782 (compressed)
#> Standard Errors: Clustered (12 clusters)
#>                  Estimate Std. Error  t value   Pr(>|t|)    
#> fare_amount      0.106744   0.000657 162.4934  < 2.2e-16 ***
#> passenger_count -0.029086   0.001030 -28.2278 1.2923e-11 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 1.7                 Adj. R2: 0.243549
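
For the Python port, the general idea of computing clustered SEs from compressed data can be sketched as below. This is a minimal illustration under stated assumptions, not the dbreg implementation: the function name `clustered_se_compressed` and the `(x, cluster, y)` row layout are hypothetical, the model is restricted to an intercept plus one regressor with no fixed effects, and no small-sample correction is applied (CR0), so the numbers would not be expected to match fixest defaults exactly. The point it shows is that the per-cluster score sums need only the row count and the sum of `y` in each compressed cell, provided the cluster variable is part of the grouping key.

```python
from collections import defaultdict
from math import sqrt


def clustered_se_compressed(rows):
    """Fit y ~ 1 + x and return (beta, se) with CR0 cluster-robust SEs.

    rows is an iterable of (x, cluster, y) tuples. All names here are
    hypothetical; this is not the dbreg or duckreg API.
    """
    # Step 1: compress to one cell per (x, cluster), keeping the row count
    # and the sum of y. In a database this would be a GROUP BY.
    cells = defaultdict(lambda: [0, 0.0])
    for x, g, y in rows:
        cells[(x, g)][0] += 1
        cells[(x, g)][1] += y

    # Step 2: build X'X and X'y from the cells alone.
    a = b = d = xy0 = xy1 = 0.0  # X'X = [[a, b], [b, d]]
    for (x, _), (n, sum_y) in cells.items():
        a += n
        b += n * x
        d += n * x * x
        xy0 += sum_y
        xy1 += x * sum_y
    det = a * d - b * b  # zero if x never varies
    inv = [[d / det, -b / det], [-b / det, a / det]]  # (X'X)^-1
    beta = [inv[0][0] * xy0 + inv[0][1] * xy1,
            inv[1][0] * xy0 + inv[1][1] * xy1]

    # Step 3: per-cluster score sums s_g = X_g' e_g. Within a cell the
    # regressors are constant, so only the summed residual is needed.
    scores = defaultdict(lambda: [0.0, 0.0])
    for (x, g), (n, sum_y) in cells.items():
        resid_sum = sum_y - n * (beta[0] + beta[1] * x)
        scores[g][0] += resid_sum
        scores[g][1] += x * resid_sum

    # Step 4: sandwich (X'X)^-1 [sum_g s_g s_g'] (X'X)^-1, diagonal only.
    var = [0.0, 0.0]
    for s0, s1 in scores.values():
        u0 = inv[0][0] * s0 + inv[0][1] * s1
        u1 = inv[1][0] * s0 + inv[1][1] * s1
        var[0] += u0 * u0
        var[1] += u1 * u1
    return beta, [sqrt(var[0]), sqrt(var[1])]


if __name__ == "__main__":
    # Made-up toy data: (x, cluster, y)
    toy = [(1, "a", 2.0), (1, "a", 4.0), (2, "a", 5.0),
           (1, "b", 3.0), (2, "b", 7.0), (3, "c", 9.0)]
    beta, se = clustered_se_compressed(toy)
    print("coefficients:", beta)
    print("clustered SEs:", se)
```

In a database-backed version, steps 1 to 3 would presumably be pushed into SQL aggregations so that only the small per-cluster score table is pulled into memory; that division of labor is an assumption here rather than something taken from the PR.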
