v2.3.1.904: time-varying weights fix, fix_weights param, namespace cleanup by pedrohcgs · Pull Request #256 · bcallaway11/did

pedrohcgs · 2026-04-01T17:18:55Z

Summary

Fix time-varying weights mismatch between faster_mode=TRUE and faster_mode=FALSE on balanced panel data. The fast path was always using first-period weights; now it uses a weights_tensor to select the correct period's weights, matching the slow path behavior.
New fix_weights parameter in att_gt() gives users explicit control over how time-varying weights are resolved:
- NULL (default): preserves existing behavior
- "varying": per-observation weights using RC DRDID estimators (even for balanced panel)
- "base_period": fix weights at period g-1 for all cells
- "first_period": fix weights at first period in dataset
Reduced namespace pollution: replaced blanket import(stats), import(utils), import(BMisc) with selective importFrom(). The package no longer re-exports stats::filter or stats::lag, which previously masked dplyr::filter/dplyr::lag.
Fixed aggte() crash ("Error in get(gname): invalid first argument") caused by dreamerr >= 1.5.0 intercepting get() inside data.table's [ operator. Replaced with set().
Runtime message when time-varying weights are detected in balanced panel data
Inference tests now skip gracefully when did v2.1.2 is unavailable from CRAN (instead of failing)
GitHub Action to auto-bump dev version in DESCRIPTION on PR merge

Test plan

All fix_weights options match between faster_mode=TRUE and FALSE (reg, dr, ipw × varying, universal base period)
Time-invariant weights produce identical results across all fix_weights options
aggte() works when columns are named gname/tname/idname
No dplyr::filter masking after loading did
Wolfers dataset smoke test passes
User's NYC STR dataset (60K units, 25 periods) runs end-to-end
236 PASS, 0 FAIL, 8 SKIP

🤖 Generated with Claude Code

…amespace pollution, fix dreamerr/get() crash - Fix faster_mode mismatch with time-varying sampling weights on balanced panel - Add fix_weights parameter: NULL (default), "varying", "base_period", "first_period" - Replace import(stats), import(utils), import(BMisc) with selective importFrom - Fix aggte() crash from dreamerr intercepting get() inside data.table - Add runtime message for time-varying weights detection - Skip inference tests gracefully when did v2.1.2 unavailable from CRAN - Add GitHub Action to auto-bump version on PR merge - 236 PASS, 0 FAIL, 8 SKIP

Copilot

Pull request overview

This PR updates the did package’s handling of time-varying sampling weights to make faster_mode=TRUE consistent with the non-fast path, introduces a new fix_weights control knob in att_gt(), and reduces namespace pollution by switching to selective imports. It also includes a fix for an aggte() crash related to data.table usage, expands docs/NEWS, updates tests to better handle unavailable historical did versions, and adds a GitHub Action to auto-bump the dev version.

Changes:

Fix faster_mode vs non-fast mismatches for time-varying weights by introducing a per-period weights_tensor, and add fix_weights to control weight resolution behavior.
Replace problematic data.table get() usage in aggte() with set() to avoid crashes.
Switch from blanket import() to importFrom() and add/adjust tests, docs, and a version-bump workflow.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/testthat/test-inference.R	Skip inference tests when `did` v2.1.2 can’t be installed; installs old version into a temp lib.
tests/testthat/test-att_gt.R	Adds regression tests for `fix_weights`, time-varying weights consistency, and `gname/tname/idname` naming edge cases.
R/utility_functions.R	Updates `check_balance()` to avoid `get()` patterns and use safer column access.
R/pre_process_did2.R	Adds time-varying weight detection message; builds `weights_tensor`; adds `fix_weights` arg plumbing.
R/pre_process_did.R	Adds `fix_weights` arg plumbing and time-varying weight detection message for the non-fast path.
R/imports.R	Replaces blanket imports with selective `@importFrom` roxygen directives.
R/DIDparams2.R	Adds `weights_tensor` and `fix_weights` to the `DIDparams` object for faster mode.
R/DIDparams.R	Adds `fix_weights` to the `DIDparams` object for the non-fast path.
R/compute.att_gt2.R	Uses `weights_tensor` for correct period weights in faster mode; adds RC fallback for `fix_weights="varying"`.
R/compute.att_gt.R	Adds `fix_weights` behavior to the non-fast path, including RC fallback for `varying`.
R/compute.aggte.R	Replaces `get()`-inside-`[.data.table` with `set()` to avoid the reported crash.
R/att_gt.R	Documents and validates `fix_weights`; passes it through preprocessing.
NEWS.md	Adds release notes for 2.3.1.904 (weights fix, `fix_weights`, namespace cleanup, crash fix, etc.).
NAMESPACE	Converts broad imports to `importFrom()` declarations (stats/utils/BMisc).
man/pre_process_did2.Rd	Documents `fix_weights` and expanded `weightsname` semantics.
man/pre_process_did.Rd	Documents `fix_weights` and expanded `weightsname` semantics.
man/DIDparams.Rd	Documents `fix_weights` and expanded `weightsname` semantics.
man/conditional_did_pretest.Rd	Expands `weightsname` documentation (references `fix_weights`).
man/att_gt.Rd	Documents new `fix_weights` argument and time-varying weight behavior.
DESCRIPTION	Bumps package version to 2.3.1.904.
.github/workflows/bump-version.yaml	Adds an action to auto-increment the dev version on PR merge.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-01T17:27:53Z

R/compute.att_gt.R

+            # RC influence function has 2*n1 rows (stacked pre + post);
+            # aggregate back to unit level by summing pre and post contributions
+            inf_rc <- res$att.inf.func
+            n1_half <- length(inf_rc) %/% 2L
+            res$att.inf.func <- inf_rc[1:n1_half] + inf_rc[(n1_half + 1):(2 * n1_half)]


In the fix_weights == "varying" (RC) path, the influence function is being collapsed by splitting the vector in half and summing. But disdat_long is constructed via data[time_mask] without reordering, so in balanced panel data it will typically be interleaved by unit (pre/post alternating), not stacked as (all pre, then all post). This will mis-aggregate the influence function and lead to incorrect standard errors/inference. Reorder disdat_long to stack periods before calling the RC estimator, or aggregate att.inf.func back to unit level using the id (or by pairing consecutive pre/post rows) rather than summing halves.

Suggested change

# RC influence function has 2*n1 rows (stacked pre + post);

# aggregate back to unit level by summing pre and post contributions

inf_rc <- res$att.inf.func

n1_half <- length(inf_rc) %/% 2L

res$att.inf.func <- inf_rc[1:n1_half] + inf_rc[(n1_half + 1):(2 * n1_half)]

# RC influence function has 2 * n1_unit rows (pre and post per unit);

# aggregate back to unit level by summing consecutive pre/post contributions

inf_rc <- res$att.inf.func

n_pairs <- length(inf_rc) %/% 2L

pre_idx <- seq.int(1L, by = 2L, length.out = n_pairs)

post_idx <- pre_idx + 1L

res$att.inf.func <- inf_rc[pre_idx] + inf_rc[post_idx]

Copilot · 2026-04-01T17:27:53Z

R/pre_process_did2.R


+  # Check for time-varying weights in panel data
+  if (!is.null(args$weightsname) && args$panel) {
+    w_range <- data[, .(w_range = max(weights) - min(weights)), by = get(args$idname)]


The by = get(args$idname) grouping is avoidable and inconsistent with the newer by = c(id_col) approach used elsewhere. Consider switching to by = c(args$idname) (or similar) to avoid get() inside data.table expressions and improve robustness/performance.

Suggested change

w_range <- data[, .(w_range = max(weights) - min(weights)), by = get(args$idname)]

w_range <- data[, .(w_range = max(weights) - min(weights)), by = c(args$idname)]

Copilot · 2026-04-01T17:27:53Z

tests/testthat/test-inference.R

 dir.create(temp_lib)
-remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org")
+old_did_available <- tryCatch({
+  remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org", quiet = TRUE)


This test installs from a hard-coded HTTP CRAN mirror. Use an HTTPS mirror (e.g., https://cloud.r-project.org) to avoid potential network/proxy blocks and to match common repository policies.

Suggested change

remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org", quiet = TRUE)

remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "https://cloud.r-project.org", quiet = TRUE)

Copilot · 2026-04-01T17:27:54Z

tests/testthat/test-inference.R

-remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org")
+old_did_available <- tryCatch({
+  remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org", quiet = TRUE)
+  TRUE


old_did_available is set to TRUE as long as install_version() doesn't throw, but installation can still fail (e.g., non-zero exit with only warnings), which will make the later library(did, lib.loc = temp_lib) calls fail. Consider verifying the install explicitly (e.g., check requireNamespace("did", lib.loc = temp_lib) or inspect the installed package version) before declaring the old version available.

Suggested change

TRUE

isTRUE(requireNamespace("did", lib.loc = temp_lib, quietly = TRUE))

Copilot · 2026-04-01T17:27:54Z

.github/workflows/bump-version.yaml

+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.base.ref }}
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Bump dev version in DESCRIPTION
+        run: |
+          # Extract current version
+          current=$(grep '^Version:' DESCRIPTION | sed 's/Version: //')
+          echo "Current version: $current"
+
+          # Split into parts
+          IFS='.' read -ra parts <<< "$current"
+          major="${parts[0]}"
+          minor="${parts[1]}"
+          patch="${parts[2]}"
+          dev="${parts[3]:-0}"
+
+          # Increment dev version
+          new_dev=$((dev + 1))
+          new_version="${major}.${minor}.${patch}.${new_dev}"
+          echo "New version: $new_version"
+
+          # Update DESCRIPTION
+          sed -i "s/^Version: .*/Version: ${new_version}/" DESCRIPTION
+
+          # Configure git
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+
+          # Commit and push
+          git add DESCRIPTION
+          git diff --cached --quiet || git commit -m "Bump version to ${new_version}"
+          git push


This workflow directly commits & pushes to the base branch on PR merge; if multiple PRs merge close together, pushes can race and fail with a non-fast-forward error. Consider adding a concurrency group and/or pulling/rebasing before pushing (or using a dedicated version-bump action that retries) to make the bump more reliable.

pedrohcgs

@copilot review

Copilot

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

R/pre_process_did2.R:318

unit_count is now grouped with by = c(args$idname), so mis_unit no longer has a column named get. The subsequent mis_unit$get will error when dropping incomplete units. Use the actual id column from mis_unit (e.g., mis_unit[[args$idname]] / mis_unit[[1]]) instead of $get.

      unit_count <- data[, .(count = .N), by = c(args$idname)]
      if(any(unit_count[, count < raw_time_size])){
        mis_unit <- unit_count[count < raw_time_size]
        warning(nrow(mis_unit), " units are missing in some periods. Converting to balanced panel by dropping them.")
        data <- data[!get(args$idname) %in% mis_unit$get]
      }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-01T21:14:31Z

R/compute.att_gt.R

+        # Handle fix_weights for RC/unbalanced panel
+        if (!is.null(fix_weights) && fix_weights %in% c("base_period", "first_period")) {
+          # Determine which period's weight to use
+          if (fix_weights == "base_period") {
+            target_period <- tlist[pret_g]
+          } else {
+            target_period <- tlist[1]
+          }
+          # Build lookup: weight from target period per unit
+          target_rows <- data[t_col == target_period, ]
+          target_w <- stats::setNames(target_rows$.w, target_rows$.rowid)
+          # Look up weight for each observation's unit
+          w <- as.numeric(target_w[as.character(disdat$.rowid)])
+          # Drop units not observed in the target period
+          missing_w <- is.na(w)
+          if (any(missing_w)) {
+            n_dropped <- length(unique(disdat$.rowid[missing_w]))
+            warning(paste0("Dropped ", n_dropped, " units not observed in ",
+                           fix_weights, " (period ", target_period, ") ",
+                           "for group ", glist[g], " in time period ", tlist[t + tfac]))
+            disdat <- disdat[!missing_w, ]
+            G <- disdat$.G
+            C <- disdat$.C
+            Y <- disdat[[yname]]
+            post <- 1 * (disdat[[tname]] == tlist[t + tfac])
+            n1 <- sum(G + C)
+            w <- w[!missing_w]
+          }


fix_weights = "base_period"/"first_period" in the RC/unbalanced branch looks up weights by .rowid. When panel = FALSE with true repeated cross-sections, .rowid is unique per observation, so all observations from the non-target period get NA weights and are dropped, making the 2x2 comparison invalid (often resulting in NA). Consider rejecting these fix_weights options for true repeated cross-sections (or defining an alternative RC-appropriate behavior) and only applying the lookup/drop logic for unbalanced panels where units can appear in multiple periods.

Copilot · 2026-04-01T21:14:32Z

R/att_gt.R

+        !(fix_weights %in% c("varying", "base_period", "first_period"))) {
+      stop("fix_weights must be NULL or one of \"varying\", \"base_period\", or \"first_period\".")
+    }
+  }


fix_weights is validated for allowed strings, but not for compatibility with panel = FALSE true repeated cross-sections. With current implementation, fix_weights = "base_period"/"first_period" can effectively drop one period’s observations (since IDs don’t persist across time), producing invalid/NA results. Please add validation to disallow these options when panel = FALSE and the data are true repeated cross-sections (or clearly document/implement a different RC-compatible definition).

Suggested change

}

}

if (!panel && !is.null(fix_weights) &&

fix_weights %in% c("base_period", "first_period")) {

stop(paste0(

"fix_weights = \"", fix_weights, "\" is not supported when panel = FALSE ",

"(true repeated cross-sections). Use fix_weights = \"varying\" or NULL ",

"for repeated cross-section data."

))

}

Copilot · 2026-04-01T21:14:32Z

tests/testthat/test-inference.R

 temp_lib <- tempfile()
 dir.create(temp_lib)
-remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org")
+old_did_available <- tryCatch({
+  remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "https://cloud.r-project.org", quiet = TRUE)
+  isTRUE(requireNamespace("did", lib.loc = temp_lib, quietly = TRUE))
+}, error = function(e) FALSE)


This test file performs remotes::install_version() at file-load time. On CRAN/CI environments without internet (or where network access during checks is disallowed), the attempted download can still violate policies even if errors are caught and tests are later skipped. Move the install attempt inside test_that() (or a helper called after skip_on_cran()/skip_if_offline()), so the file has no side effects when sourced during R CMD check.

Copilot · 2026-04-01T21:14:32Z

tests/testthat/test-inference.R

 temp_lib <- tempfile()
 dir.create(temp_lib)
-remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org")
+old_did_available <- tryCatch({
+  remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "https://cloud.r-project.org", quiet = TRUE)
+  isTRUE(requireNamespace("did", lib.loc = temp_lib, quietly = TRUE))
+}, error = function(e) FALSE)
+
+if (!old_did_available) {
+  # Clean up and skip all tests in this file
+  unlink(temp_lib, recursive = TRUE)
+}


temp_lib is only removed when old_did_available is FALSE. When it is TRUE, the temporary library directory is left behind for the rest of the test run. Add an on.exit(unlink(temp_lib, recursive = TRUE), add = TRUE) (or equivalent teardown) to ensure cleanup regardless of install success/skips.

Copilot · 2026-04-01T21:14:33Z

.github/workflows/bump-version.yaml

+on:
+  pull_request:
+    types: [closed]
+    branches: [master, main]
+
+concurrency:
+  group: bump-version
+  cancel-in-progress: false
+
+jobs:
+  bump-version:
+    if: github.event.pull_request.merged == true
+    runs-on: ubuntu-latest
+
+    permissions:
+      contents: write
+
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.base.ref }}
+          token: ${{ secrets.GITHUB_TOKEN }}
+


This workflow attempts to push a commit back to the base branch on the pull_request event. For PRs from forks, GITHUB_TOKEN is typically read-only on pull_request workflows, so git push will fail. Consider triggering on push to the default branch (after merge) or using pull_request_target with appropriate hardening, so the workflow can reliably bump the version without permission issues.

Copilot · 2026-04-01T21:14:33Z

R/compute.att_gt.R

+            Y_rc <- disdat_long[[yname]]
+            G_rc <- disdat_long$.G
+            post_rc <- as.numeric(disdat_long[[tname]] == tlist[t + tfac])
+            w_rc <- disdat_long$.w
+            covariates_rc <- model.matrix(xformla, data = disdat_long)
+            n1_rc <- sum(G_rc + disdat_long$.C) # careful: n1 for RC is different
+
+            if (inherits(est_method, "function")) {


n1_rc is computed but never used. This is confusing in a section that is already doing special handling for the RC estimator; either remove it or use it for the subsequent scaling if that was the intent.

Copilot · 2026-04-01T21:14:33Z

R/compute.att_gt2.R

+    # Handle fix_weights for RC/unbalanced panel
+    if (!is.null(dp2$fix_weights) && dp2$fix_weights %in% c("base_period", "first_period")) {
+      if (dp2$fix_weights == "base_period") {
+        target_period <- dp2$time_periods[dp2$.pret_by_group[g]]
+      } else {
+        target_period <- dp2$time_periods[1]
+      }
+      # Build weight lookup from target period
+      tid <- dp2$time_invariant_data
+      target_mask <- tid[[dp2$tname]] == target_period
+      target_ids <- tid[[dp2$idname]][target_mask]
+      target_ws <- tid[["weights"]][target_mask]
+      target_w_lookup <- stats::setNames(target_ws, as.character(target_ids))
+      # Look up weight for each observation
+      obs_ids <- as.character(tid[[dp2$idname]])
+      fixed_w <- as.numeric(target_w_lookup[obs_ids])
+      # Units not in target period get NA weight — will be filtered in run_DRDID
+      cohort_data <- data.table(did_cohort_index, tid[[dp2$yname]], tid$post, fixed_w, tid$.rowid)
+    } else {
+      cohort_data <- data.table(did_cohort_index, dp2$time_invariant_data[[dp2$yname]], dp2$time_invariant_data$post, dp2$time_invariant_data$weights, dp2$time_invariant_data$.rowid)
+    }


In the panel = FALSE branch, the fixed-weight handling (base_period/first_period) builds a per-"unit" lookup using dp2$idname. For true repeated cross-sections, idname is set to .rowid (unique per row), so observations from the non-target period will always have missing lookups (and will be filtered out downstream), breaking the RC 2x2 comparison. This logic should be limited to unbalanced panels (persistent unit IDs) or blocked via validation for true repeated cross-sections.

pedrohcgs · 2026-04-02T01:31:42Z

Closing in favor of a cleaner PR. The changes are correct but the review history got messy.

Copilot AI review requested due to automatic review settings April 1, 2026 17:18

Copilot started reviewing on behalf of pedrohcgs April 1, 2026 17:19 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

pedrohcgs commented Apr 1, 2026

View reviewed changes

Copilot started work on behalf of pedrohcgs April 1, 2026 20:58 View session

Copilot finished work on behalf of pedrohcgs April 1, 2026 21:02

pedrohcgs requested a review from Copilot April 1, 2026 21:04

Copilot started reviewing on behalf of pedrohcgs April 1, 2026 21:05 View session

Repository owner deleted a comment from Copilot AI Apr 1, 2026

Copilot AI reviewed Apr 1, 2026

View reviewed changes

pedrohcgs force-pushed the v2.3.1.902-bugfixes branch from 11bd00b to f7104f1 Compare April 2, 2026 01:09

pedrohcgs closed this Apr 2, 2026

	w_range <- data[, .(w_range = max(weights) - min(weights)), by = get(args$idname)]
	w_range <- data[, .(w_range = max(weights) - min(weights)), by = c(args$idname)]

	remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org", quiet = TRUE)
	remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "https://cloud.r-project.org", quiet = TRUE)

	TRUE
	isTRUE(requireNamespace("did", lib.loc = temp_lib, quietly = TRUE))

-  }
+  }
+  if (!panel && !is.null(fix_weights) &&
+      fix_weights %in% c("base_period", "first_period")) {
+    stop(paste0(
+      "fix_weights = \"", fix_weights, "\" is not supported when panel = FALSE ",
+      "(true repeated cross-sections). Use fix_weights = \"varying\" or NULL ",
+      "for repeated cross-section data."
+    ))
+  }

Conversation

pedrohcgs commented Apr 1, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

pedrohcgs left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

pedrohcgs commented Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants