-
Notifications
You must be signed in to change notification settings - Fork 1
Issue 14: Use percent ED visits due to COVID data so we have data for whole timerange #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughUpdates hospital admissions plotting function to use local CSV file instead of URL, changes metric from hospital admissions to percent of ED visits due to COVID, modifies associated global variables, updates coverage plotting function documentation, and adjusts target data subset from US minus California to full US. Changes
Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Fix all issues with AI Agents 🤖
In @man/get_plot_hosp_admissions.Rd:
- Line 5: The Rd documentation incorrectly uses "hospital admissions" instead of
"ED visits"; update the \title and the description field (the text around line
38) in get_plot_hosp_admissions.Rd to say "ED visits" or "emergency department
(ED) visits" consistently, and adjust any phrasing that implies hospital
admissions (e.g., change "Percent of hospital admissions due to covid over time"
to "Percent of ED visits due to COVID-19 over time") so the wording matches the
data source and implementation.
In @R/fig_data.R:
- Line 201: Update the roxygen title string that currently reads "Percent of
hospital admissions due to covid over time" to use correct terminology "Percent
of ED visits due to COVID-19 over time" (match capitalization if desired) and
ensure any other metadata/title variables in the same block are consistent with
the y-axis label and NSSP source; specifically, edit the roxygen comment line
starting with "#' Percent of hospital admissions due to covid over time" and
verify there are no other occurrences of "hospital admissions" in this help/plot
block.
- Around line 226-236: The left_join between nssp_cdcgov and location_data is
using the wrong key; change the join in the pipeline that produces raw_data (the
left_join call involving nssp_cdcgov and location_data) to match geography to
the FluSight `location` column instead of `location_name` so rows with FIPS
codes align correctly (i.e., use by = c("geography" = "location") in the
left_join definition).
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
targets/fig_data_targets.R (1)
77-85: Critical: Inconsistent location selection for US minus CA figure.The target
plot_hosp_admissions_usmincahas been changed to plotlocation_to_plot = "US", but:
- The target name suffix
_usmincaimplies "US minus CA"- It's grouped with other US minus CA targets (lines 50-97)
- It's used in
first_data_fig_alt(line 91) asplot_hosp1, which should represent US minus CA data to maintain consistency withplot_freq1andplot_seq1Unlike sequence data (which can be aggregated excluding CA via
obs_data_us_min_ca), the NSSP data is loaded by state-levellocation_to_plot. This means the hospital admissions plot will show full US data whilst the other two panels show US minus CA data, creating an inconsistent comparison.Possible solutions
Option 1: If NSSP data includes state-level data, filter to exclude CA:
# Would require changes to get_plot_hosp_admissions to accept excluded states # or pre-filter the data in the targetOption 2: If the intention is to show full US for all panels, rename the target and update the figure assembly logic to reflect that this is no longer a US minus CA comparison.
Option 3: If NSSP data doesn't support state exclusion, document this limitation clearly in comments and consider whether this comparison figure should still be generated.
🧹 Nitpick comments (1)
R/fig_data.R (1)
246-246: Remove commented-out code.The commented line
# scale_y_continuous(transform = "log10")+should be removed rather than left in the codebase.🔎 Proposed fix
ylab("Percent of ED visits\ndue to COVID") + - # scale_y_continuous(transform = "log10")+ scale_x_date(
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
input/nssp_states_and_national.csvis excluded by!**/*.csv
📒 Files selected for processing (6)
R/fig_data.RR/globals.Rman/get_plot_coverage_by_loc.Rdman/get_plot_coverage_overall.Rdman/get_plot_hosp_admissions.Rdtargets/fig_data_targets.R
🔇 Additional comments (5)
man/get_plot_coverage_by_loc.Rd (1)
1-19: LGTM!The documentation for
get_plot_coverage_by_locis well-formed and complete.R/globals.R (2)
72-74: LGTM!The new global variables
week_endandpercent_visits_covidcorrectly correspond to the NSSP data columns used in the updatedget_plot_hosp_admissionsfunction.
109-116: LGTM!The reorganisation of coverage-related global variables between
get_plot_coverage_overallandget_plot_coverage_by_locappears correct and aligns with the updated plotting function scopes.Also applies to: 149-156
man/get_plot_coverage_overall.Rd (2)
2-2: Documentation source updated appropriately.The function documentation now references
R/fig_overall_scores.R, which appears to be a more semantically appropriate location for this overall coverage plotting function.
5-6: Documentation updated consistently for parameter removal.The documentation correctly reflects the removal of the
locsparameter from the function signature. The title and description now clarify that the plot summarises across both nowcast dates and locations, which aligns well with the "overall" nature of the function.Verification confirms that all call sites throughout the codebase have been updated to use the new single-parameter signature:
targets/fig_overall_targets.R:61–get_plot_coverage_overall(coverage_summarised_ep)targets/fig_overall_targets.R:136–get_plot_coverage_overall(coverage_summarised_ca)The breaking change has been properly propagated.
man/get_plot_hosp_admissions.Rd
Outdated
| \name{get_plot_hosp_admissions} | ||
| \alias{get_plot_hosp_admissions} | ||
| \title{Hospital admissions over time} | ||
| \title{Percent of hospital admissions due to covid over time} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct terminology: ED visits, not hospital admissions.
Both the title (line 5) and description (line 38) incorrectly state "hospital admissions" when the data source and implementation use "ED visits" (emergency department visits).
🔎 Proposed fix
-\title{Percent of hospital admissions due to covid over time}
+\title{Percent of ED visits due to COVID over time}-Percent of hospital admissions due to covid over time
+Percent of ED visits due to COVID over timeAlso applies to: 38-38
🤖 Prompt for AI Agents
In @man/get_plot_hosp_admissions.Rd around line 5, The Rd documentation
incorrectly uses "hospital admissions" instead of "ED visits"; update the \title
and the description field (the text around line 38) in
get_plot_hosp_admissions.Rd to say "ED visits" or "emergency department (ED)
visits" consistently, and adjust any phrasing that implies hospital admissions
(e.g., change "Percent of hospital admissions due to covid over time" to
"Percent of ED visits due to COVID-19 over time") so the wording matches the
data source and implementation.
R/fig_data.R
Outdated
| } | ||
|
|
||
| #' Hospital admissions over time | ||
| #' Percent of hospital admissions due to covid over time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct terminology: ED visits, not hospital admissions.
The title states "Percent of hospital admissions due to covid" but the data source (NSSP) and y-axis label (line 245) correctly reference "ED visits" (emergency department visits). Hospital admissions and ED visits are distinct metrics.
🔎 Proposed fix
-#' Percent of hospital admissions due to covid over time
+#' Percent of ED visits due to COVID over time📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #' Percent of hospital admissions due to covid over time | |
| #' Percent of ED visits due to COVID over time |
🤖 Prompt for AI Agents
In @R/fig_data.R around line 201, Update the roxygen title string that currently
reads "Percent of hospital admissions due to covid over time" to use correct
terminology "Percent of ED visits due to COVID-19 over time" (match
capitalization if desired) and ensure any other metadata/title variables in the
same block are consistent with the y-axis label and NSSP source; specifically,
edit the roxygen comment line starting with "#' Percent of hospital admissions
due to covid over time" and verify there are no other occurrences of "hospital
admissions" in this help/plot block.
| nssp_cdcgov <- read_csv(data_fp) | ||
| raw_data <- nssp_cdcgov |> | ||
| left_join(location_data, | ||
| by = c("geography" = "location_name") # nolint | ||
| ) |> | ||
| rename(date = week_end) |> | ||
| filter( | ||
| abbreviation == !!location_to_plot, | ||
| date >= min(date_range), | ||
| date <= max(date_range) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the NSSP CSV schema and check for a sample of join key values
if [ -f "input/nssp_states_and_national.csv" ]; then
echo "=== CSV column names ==="
head -1 input/nssp_states_and_national.csv
echo ""
echo "=== Sample geography values (first 10 unique) ==="
tail -n +2 input/nssp_states_and_national.csv | cut -d',' -f1 | sort -u | head -10
else
echo "NSSP CSV file not found at input/nssp_states_and_national.csv"
echo ""
echo "Searching for NSSP-related CSV files in the repository..."
find . -name "*nssp*.csv" -o -name "*states*.csv" | head -20
fiRepository: epiforecasts/evalvariantnowcasthub
Length of output: 608
🏁 Script executed:
#!/bin/bash
# Find where location_data is created/loaded in the R file
grep -n "location_data" R/fig_data.R | head -20Repository: epiforecasts/evalvariantnowcasthub
Length of output: 248
🏁 Script executed:
#!/bin/bash
# View the function signature and parameter documentation
sed -n '200,240p' R/fig_data.RRepository: epiforecasts/evalvariantnowcasthub
Length of output: 1772
🏁 Script executed:
#!/bin/bash
# Find where get_plot_hosp_admissions is called
echo "=== Function calls to get_plot_hosp_admissions ==="
rg "get_plot_hosp_admissions" --type r -B 2 -A 5
echo ""
echo "=== Search for location_data creation/loading ==="
rg "location_data\s*<-|location_data\s*=" --type r -B 2 -A 2 | head -50Repository: epiforecasts/evalvariantnowcasthub
Length of output: 5289
🏁 Script executed:
#!/bin/bash
# Find the tar_target definition for location_data
rg "tar_target.*location_data|name\s*=\s*location_data" --type r -B 5 -A 10Repository: epiforecasts/evalvariantnowcasthub
Length of output: 928
🏁 Script executed:
#!/bin/bash
# Find location_fp definition
echo "=== Finding location_fp ==="
rg "location_fp\s*=" --type r -B 2 -A 2
echo ""
echo "=== Search for location CSV files ==="
find . -name "*location*" -type f | grep -E "\.(csv|tsv)$" | head -20Repository: epiforecasts/evalvariantnowcasthub
Length of output: 454
🏁 Script executed:
#!/bin/bash
# Search for location_fp assignment
echo "=== Finding location_fp assignment ==="
rg "location_fp.*<-|location_fp\s*=" --type r -B 1 -A 1
echo ""
echo "=== Check targets config file for location_fp ==="
grep -n "location_fp" targets/config_targets.R | head -20Repository: epiforecasts/evalvariantnowcasthub
Length of output: 430
🏁 Script executed:
#!/bin/bash
# View the tar_target definition for location_fp
sed -n '30,45p' targets/config_targets.RRepository: epiforecasts/evalvariantnowcasthub
Length of output: 849
🏁 Script executed:
#!/bin/bash
# Fetch and inspect the FluSight locations CSV
echo "=== Fetching FluSight locations CSV ==="
curl -s "https://raw.githubusercontent.com/cdcepi/FluSight-forecast-hub/refs/heads/main/auxiliary-data/locations.csv" | head -5 > /tmp/locations.csv
if [ -s /tmp/locations.csv ]; then
echo "=== Column names ==="
head -1 /tmp/locations.csv
echo ""
echo "=== First few rows ==="
head -6 /tmp/locations.csv
else
echo "Failed to fetch locations CSV"
fiRepository: epiforecasts/evalvariantnowcasthub
Length of output: 742
Correct the join key: use location instead of location_name.
The NSSP CSV's geography column contains numeric FIPS codes (e.g., "1", "10", "100", "1000"), which correspond to the FluSight locations CSV's location column (e.g., "01", "02", "04"), not location_name (which contains place names like "Alabama", "Alaska"). The current join will fail to match any rows.
Change:
by = c("geography" = "location_name")To:
by = c("geography" = "location")🤖 Prompt for AI Agents
In @R/fig_data.R around lines 226-236, The left_join between nssp_cdcgov and
location_data is using the wrong key; change the join in the pipeline that
produces raw_data (the left_join call involving nssp_cdcgov and location_data)
to match geography to the FluSight `location` column instead of `location_name`
so rows with FIPS codes align correctly (i.e., use by = c("geography" =
"location") in the left_join definition).
…s/evalvariantnowcasthub into 14-use-pct-ed-visits
Per Dylan's suggestion, swapping out the NHSN signal for the percent ED visits due to COVID from NSSP data. I obtained this from downloading the data here
Note:
Summary by CodeRabbit
Release Notes
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.