title |
---|
synthdid |
This Julia
package implements the synthetic difference-in-differences estimation procedure, along with a range of inference and graphing procedures, following Arkhangelsky et al., (2021). Arkhangelsky et al. provide a code implementation in R, with accompanying materials here: synthdid.
This package is currently in beta and the functionality and interface is subject to change.
using Pkg
Pkg.add("synthdid")
summary_synth (generic function with 2 methods)
using synthdid
Y
: Outcome variable (numeric)N0
: Unit variable (numeric or string)T0
: Time variable (numeric)
Get data
setup_data = panel_matrices(data("california_prop99"));
Estimate DID
tau_hat = synthdid_estimate(setup_data.Y, setup_data.N0, setup_data.T0)
summary_synth(tau_hat, panel = setup_data);
synthdid: -15.604 +- NaN. Effective N0/N0 = 16.388/38~0.431. Effective T0/T
0 = 2.783/19 ~ 0.146. N1,T1 = 1, 12.
Plots treated and synthetic control trajectories.
p = synthdid_plot(tau_hat, year_unit_trayectory = setup_data.time)
plot(p["plot"])
Plots treated and synthetic control trajectories and overlays a 2x2 diff-in-diff diagram.
synthdid_units_plot(tau_hat, x_ticks = setup_data.names)
-
data("$(df)")
: Datasetscalifornia_prop99
: A dataset containing per-capita cigarette consumption (in packs).PENN
CPS
-
panel_matrices(panel::DataFrame; ...)
: Converts a data set in panel form to matrix format required by synthdid estimatorspanel
: A data.frame with columns consisting of units, time, outcome, and treatment indicator.- ...args
unit
: The column number/name corresponding to the unit identifier. Default is 1.time
: The column number/name corresponding to the time identifier. Default is 2.outcome
: The column number/name corresponding to the outcome identifier. Default is 3.treatment
: The column number/name corresponding to the treatment status. Default is 4.treated
:.last Should we sort the rows of Y and W so treated units are last. If FALSE, sort by unit number/name. Default is TRUE.
return
:- A
panelMatrix
constructor with entriesY
: the data matrix,N0
: the number of control units,T0
: the number of time periods before treatment,W
: the matrix of treatment indicators,names
: unit names vector,time
: time vector.
- A
-
vcov_synthdid_estimate(object; method="bootstrap", replications=200)
: Calculate Variance-Covariance Matrix for a Fitted Model Object.object
: A syntdid_est1, constructormethod
: the CI method. The default is bootstrap (warning: this may be slow on large data sets, the jackknife option is the fastest, with the caveat that it is not recommended for SC).replications
: the number of bootstrap replications
-
synthdid_estimate(Y::Matrix, N0::Int, T0::Int;...)
: Computes the synthetic diff-in-diff estimate for an average treatment effect on a treated block.Y
: the observation matrix.N0
: the number of control units (N_co in the paper). Rows 1-N0 of Y correspond to the control units.T0
: the number of pre-treatment time steps (T_pre in the paper). Columns 1-T0 of Y correspond to pre-treatment time steps.X
: an optional 3-D array of time-varying covariates. Shape should be N X T X C for C covariates.noise_level
:, an estimate of the noise standard deviation sigma. Defaults to the standard deviation of first differences of Y.eta_omega
: determines the tuning parameter zeta.omega = eta.omega * noise.level. Defaults to the value (N_tr T_post)^(1/4).eta_lambda
: analogous for lambda. Defaults to an 'infinitesimal' value 1e-6.zeta_omega
: if passed, overrides the defaultzeta_omega = eta_omega * noise.level
.zeta_lambda
: analogous for lambda.omega_intercept
: Binary. Use an intercept when estimating omega.lambda_intercept
: Binary. Use an intercept when estimating lambda.weights
: a Dict with fields lambda and omega. If non-nothingweights["lambda"]
is passed, we use them instead of estimating lambda weights. Same forweights["omega"]
.update_omega
: If true, solve for omega using the passed value of weights$omega only as an initialization. If false, use it exactly as passed. Defaults to false if a non-null value of `weights["omega"] is passed.update_lambda
: Analogous.min_decrease
: Tunes a stopping criterion for our weight estimator. Stop after an iteration results in a decrease in penalized MSE smaller thanmin_decrease^2
.max.iter
: A fallback stopping criterion for our weight estimator. Stop after this number of iterations.sparsify
: A function mapping a numeric vector to a (presumably sparser) numeric vector of the same shape, which must sum to one. If not null, we try to estimate sparse weights via a second round of Frank-Wolfe optimization initialized at sparsify (the solution to the first round ).max_iter_pre_sparsify
: Analogous to max_iter, but for the pre-sparsification first-round of optimization. Not used ifsparsify=nothing
. -return
(syntdid_est
constructor): An average treatment effect estimate withweights
andsetup
attached as attributes.weights
contains the estimated weights lambda and omega and corresponding intercepts, as well as regression coefficients beta if X is passed.setup
is a list describing the problem passed in: Y, N0, T0, X.
-
sc_estimate(Y, N0, T0, eta_omega=1e-6; kargs...)
: synthdid_estimate for synthetic control estimates.Y
the observation matrix.N0
the number of control units. Rows 1-N0 of Y correspond to the control units.T0
the number of pre-treatment time steps. Columns 1-T0 of Y correspond to pre-treatment time steps.eta
.omega determines the level of ridge regularization,zeta_omega = eta_omega * noise_level
, as in synthdid_estimate.kargs...
additional options for synthdid_estimatereturn
:syntdid_est1
constructor
-
did_estimate(Y, N0, T0; kargs...)
: synthdid_estimate for diff-in-diff estimates.Y
the observation matrix.N0
the number of control units. Rows 1-N0 of Y correspond to the control units.T0
the number of pre-treatment time steps. Columns 1-T0 of Y correspond to pre-treatment time steps.kargs...
additional options for synthdid_estimatereturn
:syntdid_est1
constructor
-
synthdid_effect_curve(estimate::synthdid_est1)
: Outputs the effect curve that was averaged to produce our estimateestimate
: output by synthdid_estimatereturn
:vector
synthdid_effect_curve
-
estimate_dgp(Y, assignment_vector, j_rank)
: Estimates the DGP parameters used in the placebo studies in Sections 3 and 5 of the synthetic difference in differences paper.Y
: an NxT matrix of outcomesassignment_vector
: Nx1 vector of treatment assignmentsj_rank
: the rank of the estimated signal component L- return:
- A
Dict
with elements F, M, Sigma, pi as described in Section 3.1.1 and an element ar_coef with the AR(2) model coefficients underlying the covariance Sigma
- A
-
simulate_dgp(parameters, N1, T1)
: Simulates data from DGPs used in the placebo studies in Sections 3 and 5 of the synthetic difference in differences paper. Described there in Section 3.1.1.parameters
: aDict
of dgp parameters (F,M,Sigma,pi) as output by estimate.dgpN1
: a cap on the number of treated units,T1
: the number of treated periods.return
:- a
Dict
with 3 elements: the outcome matrix Y, the number of control units N0, and the number of control periods T0. The first N0 rows of Y are for units assigned to control, the remaining rows are for units assigned to treatment.
- a
-
lindsey_density_estimate(x, K, deg)
: Computes a density estimator by smoothing a histogram using Poisson regression.x
: one-dimensional vector of data;K
: number of bins in the histogram;deg
: degree of natural splines used in Poisson regression;return
:- A
Dict
with 2 fields, centers and density, which are K-dimensional vectors containing the bin centers and estimated density within each bin respectively.
- A
-
synthdid_plot(estimates::synthdid_est1; ...)
: Plots treated and synthetic control trajectories and overlays a 2x2 diff-in-diff diagram of our estimator. In this overlay, the treatment effect is indicated by an arrow. The weights lambda defining our synthetic pre-treatment time period are plotted below.-
estimates
: a syntdid_est1 object -
treated_name
: the name of the treated curve that appears in the legend. Defaults to 'Treated' -
control_name
the name of the control curve that appears in the legend. Defaults to 'Synthetic control' -
treated_color
: the color of the treated curve -
control_color
: the color of the control curve -
year_unit_trayectory
: x ticks -
facet
: a Dict of the same length as estimates indicating the facet in which to plot each estimate. -
lambda_comparable
: true if the weights lambda should be plotted in such a way that the ribbons have the same mass from plot to plot, assuming the treated curve is the same. Useful for side-by-side or overlaid plots. Defaults to false if facet is not passed, true if passed. -
overlay
: a number in [0,1] defaulting to 0, -
lambda_plot_scale
: determines the scale of the plot of the weights lambda. -
line_width
: the line width. -
guide_linetype
: linetype of the vertical segments of the parallelogram -
point_size
: determines the size of the points of the parallelogram -
diagram_alpha
: determines transparency of diff-in-diff diagram -
trayectory_linewidth
: determines the trajectory line_width -
se_method
: determines the method used to calculate the standard error used for this confidence interval. if nothing, don't show the interval -
return: A Dict
- see plot:
plot(synthdid_plot(tau_hat)["plot"])
- see plot:
-
-
synthdid_units_plot(estimate::synthdid_est1; ...)
: Plots unit by unit difference-in-differences. Dot size indicates the weights omega_i used in the average that yields our treatment effect estimate.estiamte
: a syntdid_est1 objectx_ticks
: x ticksnegligible_threshold
: Unit weight threshold below which units are plotted as small, transparent xs instead of circles. Defaults to .001.negligible_alpha
: Determines transparency of those xs.se_method
: the method used to calculate standard errors for the CI. See vcov.synthdid_estimate. Defaults to 'jackknife' for speed. If nothing, don't plot a CI.
-
synthdid_rmse_plot(estimate)
: A diagnostic plot for sc.weight.fw.covariates. Plots the objective function, regularized RMSE, as a function of the number of Frank-Wolfe / Gradient steps taken.estimate
: a syntdid_est1 object
Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager. Synthetic Difference in Differences, American Economic Review, December 2021.