Skip to content

v2.3.1.904: time-varying weights fix, fix_weights param, namespace cleanup#256

Closed
pedrohcgs wants to merge 1 commit intomasterfrom
v2.3.1.902-bugfixes
Closed

v2.3.1.904: time-varying weights fix, fix_weights param, namespace cleanup#256
pedrohcgs wants to merge 1 commit intomasterfrom
v2.3.1.902-bugfixes

Conversation

@pedrohcgs
Copy link
Copy Markdown
Collaborator

Summary

  • Fix time-varying weights mismatch between faster_mode=TRUE and faster_mode=FALSE on balanced panel data. The fast path was always using first-period weights; now it uses a weights_tensor to select the correct period's weights, matching the slow path behavior.

  • New fix_weights parameter in att_gt() gives users explicit control over how time-varying weights are resolved:

    • NULL (default): preserves existing behavior
    • "varying": per-observation weights using RC DRDID estimators (even for balanced panel)
    • "base_period": fix weights at period g-1 for all cells
    • "first_period": fix weights at first period in dataset
  • Reduced namespace pollution: replaced blanket import(stats), import(utils), import(BMisc) with selective importFrom(). The package no longer re-exports stats::filter or stats::lag, which previously masked dplyr::filter/dplyr::lag.

  • Fixed aggte() crash ("Error in get(gname): invalid first argument") caused by dreamerr >= 1.5.0 intercepting get() inside data.table's [ operator. Replaced with set().

  • Runtime message when time-varying weights are detected in balanced panel data

  • Inference tests now skip gracefully when did v2.1.2 is unavailable from CRAN (instead of failing)

  • GitHub Action to auto-bump dev version in DESCRIPTION on PR merge

Test plan

  • All fix_weights options match between faster_mode=TRUE and FALSE (reg, dr, ipw × varying, universal base period)
  • Time-invariant weights produce identical results across all fix_weights options
  • aggte() works when columns are named gname/tname/idname
  • No dplyr::filter masking after loading did
  • Wolfers dataset smoke test passes
  • User's NYC STR dataset (60K units, 25 periods) runs end-to-end
  • 236 PASS, 0 FAIL, 8 SKIP

🤖 Generated with Claude Code

…amespace pollution, fix dreamerr/get() crash

- Fix faster_mode mismatch with time-varying sampling weights on balanced panel
- Add fix_weights parameter: NULL (default), "varying", "base_period", "first_period"
- Replace import(stats), import(utils), import(BMisc) with selective importFrom
- Fix aggte() crash from dreamerr intercepting get() inside data.table
- Add runtime message for time-varying weights detection
- Skip inference tests gracefully when did v2.1.2 unavailable from CRAN
- Add GitHub Action to auto-bump version on PR merge
- 236 PASS, 0 FAIL, 8 SKIP
Copilot AI review requested due to automatic review settings April 1, 2026 17:18
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the did package’s handling of time-varying sampling weights to make faster_mode=TRUE consistent with the non-fast path, introduces a new fix_weights control knob in att_gt(), and reduces namespace pollution by switching to selective imports. It also includes a fix for an aggte() crash related to data.table usage, expands docs/NEWS, updates tests to better handle unavailable historical did versions, and adds a GitHub Action to auto-bump the dev version.

Changes:

  • Fix faster_mode vs non-fast mismatches for time-varying weights by introducing a per-period weights_tensor, and add fix_weights to control weight resolution behavior.
  • Replace problematic data.table get() usage in aggte() with set() to avoid crashes.
  • Switch from blanket import() to importFrom() and add/adjust tests, docs, and a version-bump workflow.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/testthat/test-inference.R Skip inference tests when did v2.1.2 can’t be installed; installs old version into a temp lib.
tests/testthat/test-att_gt.R Adds regression tests for fix_weights, time-varying weights consistency, and gname/tname/idname naming edge cases.
R/utility_functions.R Updates check_balance() to avoid get() patterns and use safer column access.
R/pre_process_did2.R Adds time-varying weight detection message; builds weights_tensor; adds fix_weights arg plumbing.
R/pre_process_did.R Adds fix_weights arg plumbing and time-varying weight detection message for the non-fast path.
R/imports.R Replaces blanket imports with selective @importFrom roxygen directives.
R/DIDparams2.R Adds weights_tensor and fix_weights to the DIDparams object for faster mode.
R/DIDparams.R Adds fix_weights to the DIDparams object for the non-fast path.
R/compute.att_gt2.R Uses weights_tensor for correct period weights in faster mode; adds RC fallback for fix_weights="varying".
R/compute.att_gt.R Adds fix_weights behavior to the non-fast path, including RC fallback for varying.
R/compute.aggte.R Replaces get()-inside-[.data.table with set() to avoid the reported crash.
R/att_gt.R Documents and validates fix_weights; passes it through preprocessing.
NEWS.md Adds release notes for 2.3.1.904 (weights fix, fix_weights, namespace cleanup, crash fix, etc.).
NAMESPACE Converts broad imports to importFrom() declarations (stats/utils/BMisc).
man/pre_process_did2.Rd Documents fix_weights and expanded weightsname semantics.
man/pre_process_did.Rd Documents fix_weights and expanded weightsname semantics.
man/DIDparams.Rd Documents fix_weights and expanded weightsname semantics.
man/conditional_did_pretest.Rd Expands weightsname documentation (references fix_weights).
man/att_gt.Rd Documents new fix_weights argument and time-varying weight behavior.
DESCRIPTION Bumps package version to 2.3.1.904.
.github/workflows/bump-version.yaml Adds an action to auto-increment the dev version on PR merge.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +340 to +344
# RC influence function has 2*n1 rows (stacked pre + post);
# aggregate back to unit level by summing pre and post contributions
inf_rc <- res$att.inf.func
n1_half <- length(inf_rc) %/% 2L
res$att.inf.func <- inf_rc[1:n1_half] + inf_rc[(n1_half + 1):(2 * n1_half)]
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the fix_weights == "varying" (RC) path, the influence function is being collapsed by splitting the vector in half and summing. But disdat_long is constructed via data[time_mask] without reordering, so in balanced panel data it will typically be interleaved by unit (pre/post alternating), not stacked as (all pre, then all post). This will mis-aggregate the influence function and lead to incorrect standard errors/inference. Reorder disdat_long to stack periods before calling the RC estimator, or aggregate att.inf.func back to unit level using the id (or by pairing consecutive pre/post rows) rather than summing halves.

Suggested change
# RC influence function has 2*n1 rows (stacked pre + post);
# aggregate back to unit level by summing pre and post contributions
inf_rc <- res$att.inf.func
n1_half <- length(inf_rc) %/% 2L
res$att.inf.func <- inf_rc[1:n1_half] + inf_rc[(n1_half + 1):(2 * n1_half)]
# RC influence function has 2 * n1_unit rows (pre and post per unit);
# aggregate back to unit level by summing consecutive pre/post contributions
inf_rc <- res$att.inf.func
n_pairs <- length(inf_rc) %/% 2L
pre_idx <- seq.int(1L, by = 2L, length.out = n_pairs)
post_idx <- pre_idx + 1L
res$att.inf.func <- inf_rc[pre_idx] + inf_rc[post_idx]

Copilot uses AI. Check for mistakes.

# Check for time-varying weights in panel data
if (!is.null(args$weightsname) && args$panel) {
w_range <- data[, .(w_range = max(weights) - min(weights)), by = get(args$idname)]
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The by = get(args$idname) grouping is avoidable and inconsistent with the newer by = c(id_col) approach used elsewhere. Consider switching to by = c(args$idname) (or similar) to avoid get() inside data.table expressions and improve robustness/performance.

Suggested change
w_range <- data[, .(w_range = max(weights) - min(weights)), by = get(args$idname)]
w_range <- data[, .(w_range = max(weights) - min(weights)), by = c(args$idname)]

Copilot uses AI. Check for mistakes.
dir.create(temp_lib)
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org")
old_did_available <- tryCatch({
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org", quiet = TRUE)
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test installs from a hard-coded HTTP CRAN mirror. Use an HTTPS mirror (e.g., https://cloud.r-project.org) to avoid potential network/proxy blocks and to match common repository policies.

Suggested change
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org", quiet = TRUE)
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "https://cloud.r-project.org", quiet = TRUE)

Copilot uses AI. Check for mistakes.
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org")
old_did_available <- tryCatch({
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org", quiet = TRUE)
TRUE
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old_did_available is set to TRUE as long as install_version() doesn't throw, but installation can still fail (e.g., non-zero exit with only warnings), which will make the later library(did, lib.loc = temp_lib) calls fail. Consider verifying the install explicitly (e.g., check requireNamespace("did", lib.loc = temp_lib) or inspect the installed package version) before declaring the old version available.

Suggested change
TRUE
isTRUE(requireNamespace("did", lib.loc = temp_lib, quietly = TRUE))

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +50
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.base.ref }}
token: ${{ secrets.GITHUB_TOKEN }}

- name: Bump dev version in DESCRIPTION
run: |
# Extract current version
current=$(grep '^Version:' DESCRIPTION | sed 's/Version: //')
echo "Current version: $current"

# Split into parts
IFS='.' read -ra parts <<< "$current"
major="${parts[0]}"
minor="${parts[1]}"
patch="${parts[2]}"
dev="${parts[3]:-0}"

# Increment dev version
new_dev=$((dev + 1))
new_version="${major}.${minor}.${patch}.${new_dev}"
echo "New version: $new_version"

# Update DESCRIPTION
sed -i "s/^Version: .*/Version: ${new_version}/" DESCRIPTION

# Configure git
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"

# Commit and push
git add DESCRIPTION
git diff --cached --quiet || git commit -m "Bump version to ${new_version}"
git push
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workflow directly commits & pushes to the base branch on PR merge; if multiple PRs merge close together, pushes can race and fail with a non-fast-forward error. Consider adding a concurrency group and/or pulling/rebasing before pushing (or using a dedicated version-bump action that retries) to make the bump more reliable.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

@pedrohcgs pedrohcgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot review

@pedrohcgs pedrohcgs requested a review from Copilot April 1, 2026 21:04
Repository owner deleted a comment from Copilot AI Apr 1, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

R/pre_process_did2.R:318

  • unit_count is now grouped with by = c(args$idname), so mis_unit no longer has a column named get. The subsequent mis_unit$get will error when dropping incomplete units. Use the actual id column from mis_unit (e.g., mis_unit[[args$idname]] / mis_unit[[1]]) instead of $get.
      unit_count <- data[, .(count = .N), by = c(args$idname)]
      if(any(unit_count[, count < raw_time_size])){
        mis_unit <- unit_count[count < raw_time_size]
        warning(nrow(mis_unit), " units are missing in some periods. Converting to balanced panel by dropping them.")
        data <- data[!get(args$idname) %in% mis_unit$get]
      }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +394 to +421
# Handle fix_weights for RC/unbalanced panel
if (!is.null(fix_weights) && fix_weights %in% c("base_period", "first_period")) {
# Determine which period's weight to use
if (fix_weights == "base_period") {
target_period <- tlist[pret_g]
} else {
target_period <- tlist[1]
}
# Build lookup: weight from target period per unit
target_rows <- data[t_col == target_period, ]
target_w <- stats::setNames(target_rows$.w, target_rows$.rowid)
# Look up weight for each observation's unit
w <- as.numeric(target_w[as.character(disdat$.rowid)])
# Drop units not observed in the target period
missing_w <- is.na(w)
if (any(missing_w)) {
n_dropped <- length(unique(disdat$.rowid[missing_w]))
warning(paste0("Dropped ", n_dropped, " units not observed in ",
fix_weights, " (period ", target_period, ") ",
"for group ", glist[g], " in time period ", tlist[t + tfac]))
disdat <- disdat[!missing_w, ]
G <- disdat$.G
C <- disdat$.C
Y <- disdat[[yname]]
post <- 1 * (disdat[[tname]] == tlist[t + tfac])
n1 <- sum(G + C)
w <- w[!missing_w]
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix_weights = "base_period"/"first_period" in the RC/unbalanced branch looks up weights by .rowid. When panel = FALSE with true repeated cross-sections, .rowid is unique per observation, so all observations from the non-target period get NA weights and are dropped, making the 2x2 comparison invalid (often resulting in NA). Consider rejecting these fix_weights options for true repeated cross-sections (or defining an alternative RC-appropriate behavior) and only applying the lookup/drop logic for unbalanced panels where units can appear in multiple periods.

Copilot uses AI. Check for mistakes.
!(fix_weights %in% c("varying", "base_period", "first_period"))) {
stop("fix_weights must be NULL or one of \"varying\", \"base_period\", or \"first_period\".")
}
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix_weights is validated for allowed strings, but not for compatibility with panel = FALSE true repeated cross-sections. With current implementation, fix_weights = "base_period"/"first_period" can effectively drop one period’s observations (since IDs don’t persist across time), producing invalid/NA results. Please add validation to disallow these options when panel = FALSE and the data are true repeated cross-sections (or clearly document/implement a different RC-compatible definition).

Suggested change
}
}
if (!panel && !is.null(fix_weights) &&
fix_weights %in% c("base_period", "first_period")) {
stop(paste0(
"fix_weights = \"", fix_weights, "\" is not supported when panel = FALSE ",
"(true repeated cross-sections). Use fix_weights = \"varying\" or NULL ",
"for repeated cross-section data."
))
}

Copilot uses AI. Check for mistakes.
Comment on lines 31 to +36
temp_lib <- tempfile()
dir.create(temp_lib)
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org")
old_did_available <- tryCatch({
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "https://cloud.r-project.org", quiet = TRUE)
isTRUE(requireNamespace("did", lib.loc = temp_lib, quietly = TRUE))
}, error = function(e) FALSE)
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test file performs remotes::install_version() at file-load time. On CRAN/CI environments without internet (or where network access during checks is disallowed), the attempted download can still violate policies even if errors are caught and tests are later skipped. Move the install attempt inside test_that() (or a helper called after skip_on_cran()/skip_if_offline()), so the file has no side effects when sourced during R CMD check.

Copilot uses AI. Check for mistakes.
Comment on lines 31 to +41
temp_lib <- tempfile()
dir.create(temp_lib)
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "http://cran.us.r-project.org")
old_did_available <- tryCatch({
remotes::install_version("did", version = "2.1.2", lib = temp_lib, repos = "https://cloud.r-project.org", quiet = TRUE)
isTRUE(requireNamespace("did", lib.loc = temp_lib, quietly = TRUE))
}, error = function(e) FALSE)

if (!old_did_available) {
# Clean up and skip all tests in this file
unlink(temp_lib, recursive = TRUE)
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temp_lib is only removed when old_did_available is FALSE. When it is TRUE, the temporary library directory is left behind for the rest of the test run. Add an on.exit(unlink(temp_lib, recursive = TRUE), add = TRUE) (or equivalent teardown) to ensure cleanup regardless of install success/skips.

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +25
on:
pull_request:
types: [closed]
branches: [master, main]

concurrency:
group: bump-version
cancel-in-progress: false

jobs:
bump-version:
if: github.event.pull_request.merged == true
runs-on: ubuntu-latest

permissions:
contents: write

steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.base.ref }}
token: ${{ secrets.GITHUB_TOKEN }}

Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workflow attempts to push a commit back to the base branch on the pull_request event. For PRs from forks, GITHUB_TOKEN is typically read-only on pull_request workflows, so git push will fail. Consider triggering on push to the default branch (after merge) or using pull_request_target with appropriate hardening, so the workflow can reliably bump the version without permission issues.

Copilot uses AI. Check for mistakes.
Comment on lines +279 to +286
Y_rc <- disdat_long[[yname]]
G_rc <- disdat_long$.G
post_rc <- as.numeric(disdat_long[[tname]] == tlist[t + tfac])
w_rc <- disdat_long$.w
covariates_rc <- model.matrix(xformla, data = disdat_long)
n1_rc <- sum(G_rc + disdat_long$.C) # careful: n1 for RC is different

if (inherits(est_method, "function")) {
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n1_rc is computed but never used. This is confusing in a section that is already doing special handling for the RC estimator; either remove it or use it for the subsequent scaling if that was the intent.

Copilot uses AI. Check for mistakes.
Comment on lines +420 to +440
# Handle fix_weights for RC/unbalanced panel
if (!is.null(dp2$fix_weights) && dp2$fix_weights %in% c("base_period", "first_period")) {
if (dp2$fix_weights == "base_period") {
target_period <- dp2$time_periods[dp2$.pret_by_group[g]]
} else {
target_period <- dp2$time_periods[1]
}
# Build weight lookup from target period
tid <- dp2$time_invariant_data
target_mask <- tid[[dp2$tname]] == target_period
target_ids <- tid[[dp2$idname]][target_mask]
target_ws <- tid[["weights"]][target_mask]
target_w_lookup <- stats::setNames(target_ws, as.character(target_ids))
# Look up weight for each observation
obs_ids <- as.character(tid[[dp2$idname]])
fixed_w <- as.numeric(target_w_lookup[obs_ids])
# Units not in target period get NA weight — will be filtered in run_DRDID
cohort_data <- data.table(did_cohort_index, tid[[dp2$yname]], tid$post, fixed_w, tid$.rowid)
} else {
cohort_data <- data.table(did_cohort_index, dp2$time_invariant_data[[dp2$yname]], dp2$time_invariant_data$post, dp2$time_invariant_data$weights, dp2$time_invariant_data$.rowid)
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the panel = FALSE branch, the fixed-weight handling (base_period/first_period) builds a per-"unit" lookup using dp2$idname. For true repeated cross-sections, idname is set to .rowid (unique per row), so observations from the non-target period will always have missing lookups (and will be filtered out downstream), breaking the RC 2x2 comparison. This logic should be limited to unbalanced panels (persistent unit IDs) or blocked via validation for true repeated cross-sections.

Copilot uses AI. Check for mistakes.
@pedrohcgs pedrohcgs force-pushed the v2.3.1.902-bugfixes branch from 11bd00b to f7104f1 Compare April 2, 2026 01:09
@pedrohcgs
Copy link
Copy Markdown
Collaborator Author

Closing in favor of a cleaner PR. The changes are correct but the review history got messy.

@pedrohcgs pedrohcgs closed this Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants