Skip to content

Conversation

@kaitejohnson
Copy link
Collaborator

@kaitejohnson kaitejohnson commented Jan 5, 2026

Per Dylan's suggestion, swapping out the NHSN signal for the percent ED visits due to COVID from NSSP data. I obtained this from downloading the data here

Note:

  • I added the data to the repo so it is fully reproducible, as I couldn't figure out how to get an API link to the csv file from data.cdc.gov

Summary by CodeRabbit

Release Notes

  • New Features

    • Redesigned hospital visualisation to display the percentage of emergency department visits due to COVID-19 over time, replacing the previous hospital admissions metric.
    • Updated chart format from bar to line graph for improved temporal trend visualisation.
  • Documentation

    • Updated function documentation to reflect revised data source parameters and visualisation outputs.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 5, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Updates hospital admissions plotting function to use local CSV file instead of URL, changes metric from hospital admissions to percent of ED visits due to COVID, modifies associated global variables, updates coverage plotting function documentation, and adjusts target data subset from US minus California to full US.

Changes

Cohort / File(s) Summary
Core plotting function update
R/fig_data.R
Added data_fp parameter with default path to local CSV file; updated data loading to read from CSV and join with location_data; changed plot metric from value (hospital admissions) to percent_visits_covid; changed plot type from bar to line; updated axis label and added location title to plot
Global variables
R/globals.R
Added week_end and percent_visits_covid to get_plot_hosp_admissions group; removed value from same group; reassigned coverage-related variables (model_id, location, interval_range, interval_coverage, empirical_coverage, interval_label) from get_plot_coverage_overall to get_plot_coverage_by_loc group with updated interval thresholds
Hospital admissions documentation
man/get_plot_hosp_admissions.Rd
Updated title and description to reflect "Percent of hospital admissions due to COVID over time"; replaced url parameter with data_fp parameter; added data source provenance information
Coverage plotting documentation
man/get_plot_coverage_overall.Rd
Removed locs parameter from function signature; updated title and description to include "and locations"; updated file reference to R/fig_overall_scores.R
New coverage function documentation
man/get_plot_coverage_by_loc.Rd
New roxygen2 documentation file for get_plot_coverage_by_loc function with parameter and description details
Target configuration
targets/fig_data_targets.R
Changed plot_hosp_admissions_usminca target to use "US" location instead of "US minus CA"

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: replacing hospital admissions data with percent ED visits due to COVID data to extend the timerange.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kaitejohnson kaitejohnson marked this pull request as ready for review January 5, 2026 13:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Fix all issues with AI Agents 🤖
In @man/get_plot_hosp_admissions.Rd:
- Line 5: The Rd documentation incorrectly uses "hospital admissions" instead of
"ED visits"; update the \title and the description field (the text around line
38) in get_plot_hosp_admissions.Rd to say "ED visits" or "emergency department
(ED) visits" consistently, and adjust any phrasing that implies hospital
admissions (e.g., change "Percent of hospital admissions due to covid over time"
to "Percent of ED visits due to COVID-19 over time") so the wording matches the
data source and implementation.

In @R/fig_data.R:
- Line 201: Update the roxygen title string that currently reads "Percent of
hospital admissions due to covid over time" to use correct terminology "Percent
of ED visits due to COVID-19 over time" (match capitalization if desired) and
ensure any other metadata/title variables in the same block are consistent with
the y-axis label and NSSP source; specifically, edit the roxygen comment line
starting with "#' Percent of hospital admissions due to covid over time" and
verify there are no other occurrences of "hospital admissions" in this help/plot
block.
- Around line 226-236: The left_join between nssp_cdcgov and location_data is
using the wrong key; change the join in the pipeline that produces raw_data (the
left_join call involving nssp_cdcgov and location_data) to match geography to
the FluSight `location` column instead of `location_name` so rows with FIPS
codes align correctly (i.e., use by = c("geography" = "location") in the
left_join definition).

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
targets/fig_data_targets.R (1)

77-85: Critical: Inconsistent location selection for US minus CA figure.

The target plot_hosp_admissions_usminca has been changed to plot location_to_plot = "US", but:

  1. The target name suffix _usminca implies "US minus CA"
  2. It's grouped with other US minus CA targets (lines 50-97)
  3. It's used in first_data_fig_alt (line 91) as plot_hosp1, which should represent US minus CA data to maintain consistency with plot_freq1 and plot_seq1

Unlike sequence data (which can be aggregated excluding CA via obs_data_us_min_ca), the NSSP data is loaded by state-level location_to_plot. This means the hospital admissions plot will show full US data whilst the other two panels show US minus CA data, creating an inconsistent comparison.

Possible solutions

Option 1: If NSSP data includes state-level data, filter to exclude CA:

# Would require changes to get_plot_hosp_admissions to accept excluded states
# or pre-filter the data in the target

Option 2: If the intention is to show full US for all panels, rename the target and update the figure assembly logic to reflect that this is no longer a US minus CA comparison.

Option 3: If NSSP data doesn't support state exclusion, document this limitation clearly in comments and consider whether this comparison figure should still be generated.

🧹 Nitpick comments (1)
R/fig_data.R (1)

246-246: Remove commented-out code.

The commented line # scale_y_continuous(transform = "log10")+ should be removed rather than left in the codebase.

🔎 Proposed fix
     ylab("Percent of ED visits\ndue to COVID") +
-    # scale_y_continuous(transform = "log10")+
     scale_x_date(
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac058fa and 706ec14.

⛔ Files ignored due to path filters (1)
  • input/nssp_states_and_national.csv is excluded by !**/*.csv
📒 Files selected for processing (6)
  • R/fig_data.R
  • R/globals.R
  • man/get_plot_coverage_by_loc.Rd
  • man/get_plot_coverage_overall.Rd
  • man/get_plot_hosp_admissions.Rd
  • targets/fig_data_targets.R
🔇 Additional comments (5)
man/get_plot_coverage_by_loc.Rd (1)

1-19: LGTM!

The documentation for get_plot_coverage_by_loc is well-formed and complete.

R/globals.R (2)

72-74: LGTM!

The new global variables week_end and percent_visits_covid correctly correspond to the NSSP data columns used in the updated get_plot_hosp_admissions function.


109-116: LGTM!

The reorganisation of coverage-related global variables between get_plot_coverage_overall and get_plot_coverage_by_loc appears correct and aligns with the updated plotting function scopes.

Also applies to: 149-156

man/get_plot_coverage_overall.Rd (2)

2-2: Documentation source updated appropriately.

The function documentation now references R/fig_overall_scores.R, which appears to be a more semantically appropriate location for this overall coverage plotting function.


5-6: Documentation updated consistently for parameter removal.

The documentation correctly reflects the removal of the locs parameter from the function signature. The title and description now clarify that the plot summarises across both nowcast dates and locations, which aligns well with the "overall" nature of the function.

Verification confirms that all call sites throughout the codebase have been updated to use the new single-parameter signature:

  • targets/fig_overall_targets.R:61get_plot_coverage_overall(coverage_summarised_ep)
  • targets/fig_overall_targets.R:136get_plot_coverage_overall(coverage_summarised_ca)

The breaking change has been properly propagated.

\name{get_plot_hosp_admissions}
\alias{get_plot_hosp_admissions}
\title{Hospital admissions over time}
\title{Percent of hospital admissions due to covid over time}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Correct terminology: ED visits, not hospital admissions.

Both the title (line 5) and description (line 38) incorrectly state "hospital admissions" when the data source and implementation use "ED visits" (emergency department visits).

🔎 Proposed fix
-\title{Percent of hospital admissions due to covid over time}
+\title{Percent of ED visits due to COVID over time}
-Percent of hospital admissions due to covid over time
+Percent of ED visits due to COVID over time

Also applies to: 38-38

🤖 Prompt for AI Agents
In @man/get_plot_hosp_admissions.Rd around line 5, The Rd documentation
incorrectly uses "hospital admissions" instead of "ED visits"; update the \title
and the description field (the text around line 38) in
get_plot_hosp_admissions.Rd to say "ED visits" or "emergency department (ED)
visits" consistently, and adjust any phrasing that implies hospital admissions
(e.g., change "Percent of hospital admissions due to covid over time" to
"Percent of ED visits due to COVID-19 over time") so the wording matches the
data source and implementation.

R/fig_data.R Outdated
}

#' Hospital admissions over time
#' Percent of hospital admissions due to covid over time
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Correct terminology: ED visits, not hospital admissions.

The title states "Percent of hospital admissions due to covid" but the data source (NSSP) and y-axis label (line 245) correctly reference "ED visits" (emergency department visits). Hospital admissions and ED visits are distinct metrics.

🔎 Proposed fix
-#' Percent of hospital admissions due to covid over time
+#' Percent of ED visits due to COVID over time
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#' Percent of hospital admissions due to covid over time
#' Percent of ED visits due to COVID over time
🤖 Prompt for AI Agents
In @R/fig_data.R around line 201, Update the roxygen title string that currently
reads "Percent of hospital admissions due to covid over time" to use correct
terminology "Percent of ED visits due to COVID-19 over time" (match
capitalization if desired) and ensure any other metadata/title variables in the
same block are consistent with the y-axis label and NSSP source; specifically,
edit the roxygen comment line starting with "#' Percent of hospital admissions
due to covid over time" and verify there are no other occurrences of "hospital
admissions" in this help/plot block.

Comment on lines +226 to +236
nssp_cdcgov <- read_csv(data_fp)
raw_data <- nssp_cdcgov |>
left_join(location_data,
by = c("geography" = "location_name") # nolint
) |>
rename(date = week_end) |>
filter(
abbreviation == !!location_to_plot,
date >= min(date_range),
date <= max(date_range)
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the NSSP CSV schema and check for a sample of join key values

if [ -f "input/nssp_states_and_national.csv" ]; then
  echo "=== CSV column names ==="
  head -1 input/nssp_states_and_national.csv
  
  echo ""
  echo "=== Sample geography values (first 10 unique) ==="
  tail -n +2 input/nssp_states_and_national.csv | cut -d',' -f1 | sort -u | head -10
else
  echo "NSSP CSV file not found at input/nssp_states_and_national.csv"
  echo ""
  echo "Searching for NSSP-related CSV files in the repository..."
  find . -name "*nssp*.csv" -o -name "*states*.csv" | head -20
fi

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 608


🏁 Script executed:

#!/bin/bash
# Find where location_data is created/loaded in the R file
grep -n "location_data" R/fig_data.R | head -20

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 248


🏁 Script executed:

#!/bin/bash
# View the function signature and parameter documentation
sed -n '200,240p' R/fig_data.R

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 1772


🏁 Script executed:

#!/bin/bash
# Find where get_plot_hosp_admissions is called
echo "=== Function calls to get_plot_hosp_admissions ==="
rg "get_plot_hosp_admissions" --type r -B 2 -A 5

echo ""
echo "=== Search for location_data creation/loading ==="
rg "location_data\s*<-|location_data\s*=" --type r -B 2 -A 2 | head -50

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 5289


🏁 Script executed:

#!/bin/bash
# Find the tar_target definition for location_data
rg "tar_target.*location_data|name\s*=\s*location_data" --type r -B 5 -A 10

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 928


🏁 Script executed:

#!/bin/bash
# Find location_fp definition
echo "=== Finding location_fp ==="
rg "location_fp\s*=" --type r -B 2 -A 2

echo ""
echo "=== Search for location CSV files ==="
find . -name "*location*" -type f | grep -E "\.(csv|tsv)$" | head -20

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 454


🏁 Script executed:

#!/bin/bash
# Search for location_fp assignment
echo "=== Finding location_fp assignment ==="
rg "location_fp.*<-|location_fp\s*=" --type r -B 1 -A 1

echo ""
echo "=== Check targets config file for location_fp ==="
grep -n "location_fp" targets/config_targets.R | head -20

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 430


🏁 Script executed:

#!/bin/bash
# View the tar_target definition for location_fp
sed -n '30,45p' targets/config_targets.R

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 849


🏁 Script executed:

#!/bin/bash
# Fetch and inspect the FluSight locations CSV
echo "=== Fetching FluSight locations CSV ==="
curl -s "https://raw.githubusercontent.com/cdcepi/FluSight-forecast-hub/refs/heads/main/auxiliary-data/locations.csv" | head -5 > /tmp/locations.csv

if [ -s /tmp/locations.csv ]; then
  echo "=== Column names ==="
  head -1 /tmp/locations.csv
  
  echo ""
  echo "=== First few rows ==="
  head -6 /tmp/locations.csv
else
  echo "Failed to fetch locations CSV"
fi

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 742


Correct the join key: use location instead of location_name.

The NSSP CSV's geography column contains numeric FIPS codes (e.g., "1", "10", "100", "1000"), which correspond to the FluSight locations CSV's location column (e.g., "01", "02", "04"), not location_name (which contains place names like "Alabama", "Alaska"). The current join will fail to match any rows.

Change:

by = c("geography" = "location_name")

To:

by = c("geography" = "location")
🤖 Prompt for AI Agents
In @R/fig_data.R around lines 226-236, The left_join between nssp_cdcgov and
location_data is using the wrong key; change the join in the pipeline that
produces raw_data (the left_join call involving nssp_cdcgov and location_data)
to match geography to the FluSight `location` column instead of `location_name`
so rows with FIPS codes align correctly (i.e., use by = c("geography" =
"location") in the left_join definition).

@kaitejohnson kaitejohnson merged commit 615c782 into main Jan 5, 2026
2 checks passed
@kaitejohnson kaitejohnson deleted the 14-use-pct-ed-visits branch January 5, 2026 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants