Open
Description
As discussed offline, I just added these to dbreg. See grantmcdermott/dbreg#29 for the implementation and some examples (validated against fixest). It should be a relatively low lift to adapt to Python.
(You want to look at the main dbreg.R file. Apologies; it's gotten long and unwieldy. I plan to modularize soon.)
I haven't benchmarked against duckreg's current bootstrap approach, but I'm pretty pleased with the example I highlight in the PR (and the updated README): clustered SEs on the ~180m-row NYC taxi dataset in < 3 seconds :-)
```r
dbreg(
  tip_amount ~ fare_amount + passenger_count | month + vendor_name,
  path = "read_parquet('nyc-taxi/**/*.parquet')",
  vcov = ~month,          # clustered SEs
  strategy = "compress"   # skip auto strategy overhead
)
#> [dbreg] Using strategy: compress
#> [dbreg] Executing compress strategy SQL
#>
#> Compressed OLS estimation, Dep. Var.: tip_amount
#> Observations.: 178,544,324 (original) | 70,782 (compressed)
#> Standard Errors: Clustered (12 clusters)
#>                  Estimate Std. Error  t value   Pr(>|t|)
#> fare_amount      0.106744   0.000657 162.4934  < 2.2e-16 ***
#> passenger_count -0.029086   0.001030 -28.2278 1.2923e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 1.7    Adj. R2: 0.243549
```
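For the Python side, here is a rough sketch of how clustered SEs could be computed from compressed data. It is not taken from dbreg.R; the function name `compressed_cluster_ols`, the NumPy-only compression step (a real port would presumably do this as a `GROUP BY` in DuckDB), and the `G/(G-1) * (N-1)/(N-k)` small-sample correction are all assumptions made for illustration, and the correction may differ from what fixest or dbreg apply by default. The idea is that grouping on the regressors plus the cluster variable keeps enough information (cell counts and sums of `y`) to recover both the OLS coefficients and the per-cluster score sums exactly.

```python
import numpy as np


def compressed_cluster_ols(y, X, cluster):
    """OLS with cluster-robust SEs computed from compressed cells.

    Hypothetical helper, for illustration only.
    y: (N,) outcome; X: (N, k) design matrix including any intercept or
    dummy columns; cluster: (N,) numeric cluster ids.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    cluster = np.asarray(cluster, dtype=float)
    k = X.shape[1]

    # Compress: one row per unique (regressors, cluster) cell.
    keys = np.column_stack([X, cluster])
    cells, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.ravel()  # guard against shape differences across NumPy versions
    Xc, cl_c = cells[:, :k], cells[:, k]

    # Sufficient statistics per cell: count and sum of y.
    n = np.bincount(inv, minlength=len(cells)).astype(float)
    sum_y = np.bincount(inv, weights=y, minlength=len(cells))

    # Normal equations on the compressed data: (X' diag(n) X) b = X' sum_y.
    XtX = Xc.T @ (Xc * n[:, None])
    beta = np.linalg.solve(XtX, Xc.T @ sum_y)

    # Sum of residuals within each cell, then score contribution per cell.
    resid_sum = sum_y - n * (Xc @ beta)
    cell_scores = Xc * resid_sum[:, None]

    # Aggregate cell scores up to clusters.
    _, codes = np.unique(cl_c, return_inverse=True)
    codes = codes.ravel()
    G = int(codes.max()) + 1
    S = np.zeros((G, k))
    np.add.at(S, codes, cell_scores)

    # Sandwich estimator with an assumed small-sample correction.
    bread = np.linalg.inv(XtX)
    N = n.sum()
    adj = (G / (G - 1)) * ((N - 1) / (N - k))
    vcov = adj * bread @ (S.T @ S) @ bread
    return beta, np.sqrt(np.diag(vcov))
```

Because the cluster id is part of the grouping key, each compressed cell belongs to exactly one cluster, so summing `x * (sum_y - n * x'b)` over a cluster's cells gives the same score as summing `x_i * e_i` over its raw rows. The cost is a somewhat larger compressed table than grouping on the regressors alone.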