| 1 | +# diffindiff: Difference-in-Differences (DiD) Analysis Python Library |
| 2 | + |
| 3 | +This Python library provides a convenient workflow for Difference-in-Differences (DiD) analyses. Users can construct datasets, define treatment and control groups, and set treatment periods. DiD models can be fitted both to datasets created with the built-in functions and to ready-to-use external datasets. Both simultaneous and staggered adoption are supported. Available extensions include two-way fixed effects models, group- or individual-specific effects, post-treatment periods, and triple-difference estimation. The library also includes functions for visualizing results, such as plotting DiD coefficients with confidence intervals and illustrating the temporal evolution of staggered treatments, as well as functions for defining treatments rigorously and for data diagnostics. |
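
As a rough illustration of the quantity a standard DiD analysis estimates, the following sketch computes the canonical 2x2 estimate by hand: the pre-to-post change in the treatment group minus the pre-to-post change in the control group. It uses only the Python standard library and made-up numbers. The function name `did_2x2` and the data are illustrative assumptions and are not part of diffindiff.

```python
from statistics import mean


def did_2x2(treat_pre, treat_post, control_pre, control_post):
    """Canonical 2x2 DiD estimate from four lists of outcome values.

    Hypothetical helper for illustration only; not part of diffindiff.
    """
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change


# Made-up outcomes for two groups observed before and after a treatment
treat_pre = [10.0, 12.0, 14.0]
treat_post = [15.0, 17.0, 19.0]
control_pre = [9.0, 11.0, 13.0]
control_post = [18.0, 20.0, 22.0]

effect = did_2x2(treat_pre, treat_post, control_pre, control_post)
print(effect)  # -4.0: the treatment group grew by 5, the control group by 9
```

Under the parallel trends assumption, the treatment group would have changed by the same amount as the control group had it not been treated, so the negative value would be read as a reduction attributable to the treatment. In a regression with a group indicator, a period indicator, and their interaction, the same number appears as the coefficient on the interaction term.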
| 4 | + |
| 5 | + |
| 6 | +## Author |
| 7 | + |
| 8 | +Thomas Wieland [ORCID](https://orcid.org/0000-0001-5168-9846) [EMail](mailto:geowieland@googlemail.com) |
| 9 | + |
| 10 | + |
| 11 | +## Updates v2.2.4 |
| 12 | +- Bugfixes: |
| 13 | + - Spillover treatment now works correctly (only relevant in rare cases) |
| 14 | + - Fixed merging in diddata.DiffData.add_covariates() (only relevant in rare cases) |
| 15 | + - Missing values are now dropped consistently (only relevant in rare cases) |
| 16 | + |
| 17 | + |
| 18 | +## Features |
| 19 | + |
| 20 | +- **Data preparation and pre-analysis**: |
| 21 | + - Define custom treatment and control groups as well as treatment periods |
| 22 | + - Create ready-to-fit DiD data objects |
| 23 | + - Create predictive counterfactuals |
| 24 | +- **DiD analysis**: |
| 25 | + - Perform standard DiD analysis |
| 26 | + - Model extensions: |
| 27 | + - Staggered adoption |
| 28 | + - Multiple treatments |
| 29 | + - Two-way fixed effects models |
| 30 | + - Group- or individual-specific treatment effects |
| 31 | + - Group- or individual-specific time trends |
| 32 | + - Including covariates |
| 33 | + - Including after-treatment period |
| 34 | + - Triple Difference (DDD) |
| 35 | + - Own counterfactuals |
| 36 | + - Bonferroni correction for treatment effects |
| 37 | + - Placebo test |
| 38 | +- **Visualization**: |
| 39 | + - Plot observed and expected time course of treatment and control group |
| 40 | + - Plot expected time course of treatment group and counterfactual |
| 41 | + - Plot model coefficients with confidence intervals |
| 42 | + - Plot individual or group-specific treatment effects with confidence intervals |
| 43 | + - Visualize the temporal evolution of staggered treatments |
| 44 | +- **Diagnosis tools**: |
| 45 | + - Test for control conditions |
| 46 | + - Test for type of adoption |
| 47 | + - Test whether the panel dataset is balanced |
| 48 | + - Test for parallel trend assumption |
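
One of the diagnostics listed above checks whether a panel dataset is balanced. The sketch below shows what such a check involves, assuming that "balanced" means every unit is observed exactly once in every time period. The helper `is_balanced` and the toy data are hypothetical and written only for illustration; the library's own test may be implemented differently.

```python
from collections import defaultdict


def is_balanced(records):
    """Return True if every unit is observed exactly once in every period.

    `records` is an iterable of (unit_id, period) pairs.
    Hypothetical helper for illustration only; not part of diffindiff.
    """
    periods_by_unit = defaultdict(list)
    for unit, period in records:
        periods_by_unit[unit].append(period)

    # All periods that occur anywhere in the data
    all_periods = {p for ps in periods_by_unit.values() for p in ps}

    # Each unit needs every period, without duplicates
    return all(
        len(ps) == len(all_periods) and set(ps) == all_periods
        for ps in periods_by_unit.values()
    )


balanced = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
unbalanced = [("A", 1), ("A", 2), ("B", 1)]  # unit B lacks period 2

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```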
| 49 | + |
| 50 | + |
| 51 | +## Literature |
| 52 | + |
| 53 | + - Baker AC, Larcker DF, Wang CCY (2022) How much should we trust staggered difference-in-differences estimates? *Journal of Financial Economics* 144(2): 370-395. [10.1016/j.jfineco.2022.01.004](https://doi.org/10.1016/j.jfineco.2022.01.004) |
| 54 | + - Card D, Krueger AB (1994) Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. *The American Economic Review* 84(4): 772-793. [JSTOR](https://www.jstor.org/stable/2677856) |
| 55 | + - de Haas S, Götz G, Heim S (2022) Measuring the effect of COVID‑19‑related night curfews in a bundled intervention within Germany. *Scientific Reports* 12: 19732. [10.1038/s41598-022-24086-9](https://doi.org/10.1038/s41598-022-24086-9) |
| 56 | + - Goodman-Bacon A (2021) Difference-in-differences with variation in treatment timing. *Journal of Econometrics* 225(2): 254-277. [10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016/j.jeconom.2021.03.014) |
| 57 | + - Greene WH (2012) *Econometric Analysis*. |
| 58 | + - Goldfarb A, Tucker C, Wang Y (2022) Conducting Research in Marketing with Quasi-Experiments. *Journal of Marketing* 86(3): 1-19. [10.1177/00222429221082977](https://doi.org/10.1177/00222429221082977) |
| 59 | + - Isphording IE, Lipfert M, Pestel N (2021) Does re-opening schools contribute to the spread of SARS-CoV-2? Evidence from staggered summer breaks in Germany. *Journal of Public Economics* 198: 104426. [10.1016/j.jpubeco.2021.104426](https://doi.org/10.1016/j.jpubeco.2021.104426) |
| 60 | + - Li KT, Luo L, Pattabhiramaiah A (2024) Causal Inference with Quasi-Experimental Data. *IMPACT at JMR* November 13, 2024. [AMA](https://www.ama.org/marketing-news/causal-inference-with-quasi-experimental-data/) |
| 61 | + - Olden A (2018) What do you buy when no one's watching? The effect of self-service checkouts on the composition of sales in retail. Discussion paper FOR 3/18, Norwegian School of Economics, Norway. [http://hdl.handle.net/11250/2490886](http://hdl.handle.net/11250/2490886) |
| 62 | + - Olden A, Moen J (2022) The triple difference estimator. *The Econometrics Journal* 25(3): 531-553. [10.1093/ectj/utac010](https://doi.org/10.1093/ectj/utac010) |
| 63 | + - Strassmann A, Çolak Y, Serra-Burriel M, Nordestgaard BG, Turk A, Afzal S, Puhan MA (2023) Nationwide indoor smoking ban and impact on smoking behaviour and lung function: a two-population natural experiment. *Thorax* 78(2): 144-150. [10.1136/thoraxjnl-2021-218436](https://doi.org/10.1136/thoraxjnl-2021-218436) |
| 64 | + - Villa JM (2016) diff: Simplifying the estimation of difference-in-differences treatment effects. *The Stata Journal* 16(1): 52-71. [10.1177/1536867X1601600108](https://doi.org/10.1177/1536867X1601600108) |
| 65 | + - von Bismarck-Osten C, Borusyak K, Schönberg U (2022) The role of schools in transmission of the SARS-CoV-2 virus: quasi-experimental evidence from Germany. *Economic Policy* 37(109): 87-130. [10.1093/epolic/eiac001](https://doi.org/10.1093/epolic/eiac001) |
| 66 | + - Wieland T (2025) Assessing the effectiveness of non-pharmaceutical interventions in the SARS-CoV-2 pandemic: results of a natural experiment regarding Baden-Württemberg (Germany) and Switzerland in the second infection wave. *Journal of Public Health: From Theory to Practice* 33(11): 2497-2511. [10.1007/s10389-024-02218-x](https://doi.org/10.1007/s10389-024-02218-x) |
| 67 | + - Wooldridge JM (2012) *Introductory Econometrics. A Modern Approach*. |
| 68 | + |
| 69 | + |
| 70 | +## Examples |
| 71 | + |
| 72 | +```python |
| 73 | +curfew_DE=pd.read_csv("data/curfew_DE.csv", sep=";", decimal=",") |
| 74 | +# Test dataset: Daily and cumulative COVID-19 infections in German counties |
| 75 | + |
| 76 | +curfew_data=create_data( |
| 77 | + outcome_data=curfew_DE, |
| 78 | + unit_id_col="county", |
| 79 | + time_col="infection_date", |
| 80 | + outcome_col="infections_cum_per100000", |
| 81 | + treatment_group= |
| 82 | + curfew_DE.loc[curfew_DE["Bundesland"].isin([9,10,14])]["county"], |
| 83 | + control_group= |
| 84 | + curfew_DE.loc[~curfew_DE["Bundesland"].isin([9,10,14])]["county"], |
| 85 | + study_period=["2020-03-01", "2020-05-15"], |
| 86 | + treatment_period=["2020-03-21", "2020-05-05"], |
| 87 | + freq="D" |
| 88 | + ) |
| 89 | +# Creating DiD dataset by defining groups and treatment time |
| 90 | + |
| 91 | +curfew_data.summary() |
| 92 | +# Summary of created treatment data |
| 93 | + |
| 94 | +curfew_model = curfew_data.analysis() |
| 95 | +# Model analysis of created data |
| 96 | + |
| 97 | +curfew_model.summary() |
| 98 | +# Model summary |
| 99 | + |
| 100 | +curfew_model.plot( |
| 101 | + y_label="Cumulative infections per 100,000", |
| 102 | + plot_title="Curfew effectiveness - Groups over time", |
| 103 | + plot_observed=True |
| 104 | + ) |
| 105 | +# Plot observed vs. predicted (means) separated by group (treatment and control) |
| 106 | + |
| 107 | +curfew_model.plot_effects( |
| 108 | + x_label="Coefficients with 95% CI", |
| 109 | + plot_title="Curfew effectiveness - DiD effects" |
| 110 | + ) |
| 111 | +# Plot DiD effects (model coefficients with confidence intervals) |
| 112 | + |
| 113 | +counties_DE=pd.read_csv("data/counties_DE.csv", sep=";", decimal=",", encoding='latin1') |
| 114 | +# Dataset with German county data |
| 115 | + |
| 116 | +curfew_data_withgroups = curfew_data.add_covariates( |
| 117 | + additional_df=counties_DE, |
| 118 | + unit_col="county", |
| 119 | + time_col=None, |
| 120 | + variables=["BL"]) |
| 121 | +# Adding federal state column as covariate |
| 122 | + |
| 123 | +curfew_model_withgroups = curfew_data_withgroups.analysis( |
| 124 | + GTE=True, |
| 125 | + group_by="BL") |
| 126 | +# Model analysis with group-specific treatment effects, grouped by federal state |
| 127 | + |
| 128 | +curfew_model_withgroups.summary() |
| 129 | +# Model summary |
| 130 | + |
| 131 | +curfew_model_withgroups.plot_group_treatment_effects( |
| 132 | + treatment_group_only=True |
| 133 | + ) |
| 134 | +# Plot of group-specific treatment effects |
| 135 | +``` |
| 136 | + |
| 137 | +See the /tests directory for usage examples of most of the included functions. |
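
To make the roles of the study period and the treatment period in the example above more concrete, the following sketch builds the three indicator variables a basic DiD design relies on: a treatment-group dummy, a treatment-time dummy, and their interaction. It reuses the dates from the example but is a standalone illustration using only the standard library. The function `did_dummies` and the column names `TG`, `TT`, and `TGxTT` are assumptions made here and do not describe how diffindiff builds its data objects internally.

```python
from datetime import date, timedelta


def did_dummies(units, treated_units, study_period, treatment_period):
    """Build one row per unit and day with group, time and interaction dummies.

    Hypothetical helper for illustration only; diffindiff's internals may differ.
    """
    start, end = study_period
    t_start, t_end = treatment_period
    rows = []
    day = start
    while day <= end:
        tt = int(t_start <= day <= t_end)    # 1 inside the treatment period
        for unit in units:
            tg = int(unit in treated_units)  # 1 for treatment-group units
            rows.append(
                {"unit": unit, "date": day, "TG": tg, "TT": tt, "TGxTT": tg * tt}
            )
        day += timedelta(days=1)
    return rows


rows = did_dummies(
    units=["A", "B"],
    treated_units={"A"},
    study_period=(date(2020, 3, 1), date(2020, 5, 15)),
    treatment_period=(date(2020, 3, 21), date(2020, 5, 5)),
)

print(len(rows))                      # 152 rows: 76 days for each of 2 units
print(sum(r["TGxTT"] for r in rows))  # 46 treated unit-days
```

Because the treatment period ends before the study period does, the last ten days form an after-treatment period in which `TT` is 0 again.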
| 138 | + |
| 139 | + |
| 140 | +## Installation |
| 141 | + |
| 142 | +To install the package, use `pip`: |
| 143 | + |
| 144 | +```bash |
| 145 | +pip install diffindiff |
| 146 | +``` |