-
Notifications
You must be signed in to change notification settings - Fork 1
Figure functions #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Figure functions #108
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe PR introduces multilocation comparison plotting capabilities: adds two exported functions ( Changes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @R/plot_figures.R:
- Around line 91-106: When locations is NULL the code uses
sample(available_locations, ...) without a seed which makes selection
non-deterministic; change this to a deterministic selection by sorting
available_locations and taking the first min(3, length(available_locations))
entries (replace the sample(...) call), or alternatively set a fixed seed
immediately before sample() to ensure reproducible picks; update the block that
assigns locations (referencing variables available_locations, locations,
sample(), forecast_dates, and first_forecast_path) accordingly.
- Line 85: The function accepts a hosp_data_long parameter that is never used;
either remove it from the function signature and the roxygen docs (remove the
documented lines 57-58) or wire it into the plotting flow by replacing the
in-function CSV-derived observed hospital extraction (the block that
reads/derives observed hospital data around the forecast CSV handling, currently
lines ~200-218) to use hosp_data_long instead; locate the parameter in
plot_figures.R and update the corresponding roxygen @param entry and any
examples, or adjust the code path that constructs observed hospital data so it
uses the provided hosp_data_long when non-NULL (fall back to CSV only if NULL).
🧹 Nitpick comments (5)
man/plot_multilocation_comparison.Rd (1)
24-26: Consider reproducibility when selecting random locations.The documentation indicates that when
locations = NULL, three random locations are selected. For reproducible outputs, consider either:
- Documenting that users should set a seed before calling the function
- Adding a
seedparameter to the function for reproducibilityThis is a minor concern for documentation/analysis purposes but worth noting.
R/globals.R (1)
113-125: Duplicate global variable declarations exist in this file.Several of the newly added variables are already declared earlier in the file (e.g.,
q_0.5,q_0.25,q_0.75,q_0.05,q_0.95,quantile_level,predicted,forecast_date_model_ww,model_ww,log_genome_copies_per_ml). While this doesn't cause functional issues, it adds maintenance overhead. Consider whetherroxyglobalscan be configured to deduplicate these entries, or consolidate them manually if this file is edited directly.R/plot_figures.R (1)
81-89: Consider adding input validation forforecast_dates.If
forecast_datesis empty orNULL, the function will fail at line 95 (forecast_dates[1]) with an unclear error. Early validation would provide a clearer error message.Suggested validation
plot_multilocation_comparison <- function( output_path, forecast_dates, locations = NULL, hosp_data_long, forecast_horizon_to_plot = 28, historical_data_to_plot = 90, scale_selected = "natural", save_path = NULL) { + if (is.null(forecast_dates) || length(forecast_dates) == 0) { + stop("forecast_dates must be a non-empty character vector") + } + # Get locations from first forecast date if not specifiedtargets/multilocation_plot_targets.R (2)
86-86: Inconsistent namespace usage:read_csvshould bereadr::read_csv.For consistency with the explicit namespace usage of
dplyr::functions on lines 88-90, and to avoid potential conflicts, usereadr::read_csv.Suggested fix
- temp_data <- read_csv(file_path, show_col_types = FALSE) + temp_data <- readr::read_csv(file_path, show_col_types = FALSE)
119-136: Hardcoded locations may not exist in all datasets.The
locations_of_interesttarget hardcodes German cities ("Bremen", "Hamburg", "Berlin"). If running this pipeline on data from a different region or if these locations are missing, themultilocation_specific_plotstarget may fail or produce empty plots.Consider intersecting with
available_locations_multilocto ensure only valid locations are used:Suggested modification
tar_target( name = locations_of_interest, - command = c("Bremen", "Hamburg", "Berlin") + command = { + desired <- c("Bremen", "Hamburg", "Berlin") + intersect(desired, available_locations_multiloc) + } ),
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
renv.lockis excluded by!**/*.lock
📒 Files selected for processing (9)
DESCRIPTIONNAMESPACER/globals.RR/plot_figures.R_targets.Rman/lshtm_theme.Rdman/plot_multilocation_comparison.Rdrenv/activate.Rtargets/multilocation_plot_targets.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: lint-changed-files
- GitHub Check: ubuntu-latest (release)
- GitHub Check: windows-latest (release)
- GitHub Check: macOS-latest (release)
- GitHub Check: test-coverage
- GitHub Check: render
🔇 Additional comments (10)
DESCRIPTION (1)
47-49: LGTM!The new dependencies
patchworkandpurrrare appropriate additions for the multilocation plotting functionality.patchworkis commonly used for composing multiple ggplot objects, andpurrrprovides functional programming utilities that are already being used in the targets pipeline.man/lshtm_theme.Rd (1)
1-19: LGTM!The documentation is clear and appropriately describes the LSHTM custom theme function. The colour code and styling purpose are well documented.
renv/activate.R (1)
1-6: Auto-generated renv bootstrap file updated to version 1.1.5.This file is auto-generated by
renvand should not be manually edited. The update from the previous version includes internal improvements to platform prefix handling, JSON parsing, and token handling. These changes are standard upstream updates from the renv package.man/plot_multilocation_comparison.Rd (1)
1-52: LGTM!The documentation is comprehensive and clearly describes the function's purpose, parameters, and return value. The exclusion of ARIMA models and the reference to the original
get_plot_model_comparisonfunction fromEDA_plots.Rprovides helpful context._targets.R (1)
97-105: Code integration looks good.The
multilocation_plot_targetsis correctly added to theplot_targetslist with proper syntax, integrating the new multilocation comparison plotting pipeline into the targets workflow.NAMESPACE (1)
5-6: LGTM!The new exports (
lshtm_theme,plot_multilocation_comparison) and imports align correctly with the roxygen2 documentation inR/plot_figures.R. The additions forpatchwork,ggh4x, and additionalggplot2functions support the new multilocation plotting functionality.Also applies to: 17-17, 36-36, 38-41, 53-53, 63-66
R/plot_figures.R (3)
13-41: LGTM!The theme function is well-structured with clear LSHTM branding. The theme settings provide consistent styling for borders, backgrounds, grids, and text elements.
294-455: Well-structured multi-panel plot construction.The approach of creating per-location plots and combining them with patchwork handles the variable number of wastewater sites per location effectively. The spacer logic ensures consistent alignment across rows.
507-527: LGTM!The save logic with dynamic width calculation based on facet count is a good approach for handling varying plot sizes.
targets/multilocation_plot_targets.R (1)
15-42: Thescorestarget is correctly defined intargets/scoring_targets.R(lines 55–61) and will be properly resolved as a dependency by the targets framework.
R/plot_figures.R
Outdated
| output_path, | ||
| forecast_dates, | ||
| locations = NULL, | ||
| hosp_data_long, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hosp_data_long parameter is never used.
This parameter is documented and accepted but never referenced in the function body. The observed hospital data is instead extracted from the forecast CSV files (lines 200-218). Either remove this parameter or integrate it into the plotting logic.
If the parameter should be removed
plot_multilocation_comparison <- function(
output_path,
forecast_dates,
locations = NULL,
- hosp_data_long,
forecast_horizon_to_plot = 28,
historical_data_to_plot = 90,
scale_selected = "natural",
save_path = NULL) {Also update the roxygen documentation to remove lines 57-58.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| hosp_data_long, | |
| plot_multilocation_comparison <- function( | |
| output_path, | |
| forecast_dates, | |
| locations = NULL, | |
| forecast_horizon_to_plot = 28, | |
| historical_data_to_plot = 90, | |
| scale_selected = "natural", | |
| save_path = NULL) { |
🤖 Prompt for AI Agents
In @R/plot_figures.R at line 85, The function accepts a hosp_data_long parameter
that is never used; either remove it from the function signature and the roxygen
docs (remove the documented lines 57-58) or wire it into the plotting flow by
replacing the in-function CSV-derived observed hospital extraction (the block
that reads/derives observed hospital data around the forecast CSV handling,
currently lines ~200-218) to use hosp_data_long instead; locate the parameter in
plot_figures.R and update the corresponding roxygen @param entry and any
examples, or adjust the code path that constructs observed hospital data so it
uses the provided hosp_data_long when non-NULL (fall back to CSV only if NULL).
R/plot_figures.R
Outdated
| if (is.null(locations)) { | ||
| first_forecast_path <- file.path( | ||
| output_path, | ||
| "individual_forecasts_all_runs", | ||
| forecast_dates[1] | ||
| ) | ||
| available_locations <- list.dirs( | ||
| first_forecast_path, | ||
| full.names = FALSE, | ||
| recursive = FALSE | ||
| ) | ||
| locations <- sample( | ||
| available_locations, | ||
| size = min(3, length(available_locations)) | ||
| ) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random location selection may affect reproducibility.
When locations = NULL, sample() is called without a seed, producing different results each run. This could make debugging difficult and cause inconsistent outputs in the targets pipeline.
Consider either:
- Setting a seed before sampling, or
- Using a deterministic selection (e.g., first 3 alphabetically).
Suggested fix for deterministic selection
if (is.null(locations)) {
first_forecast_path <- file.path(
output_path,
"individual_forecasts_all_runs",
forecast_dates[1]
)
available_locations <- list.dirs(
first_forecast_path,
full.names = FALSE,
recursive = FALSE
)
- locations <- sample(
- available_locations,
- size = min(3, length(available_locations))
- )
+ locations <- sort(available_locations)[seq_len(min(3, length(available_locations)))]
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (is.null(locations)) { | |
| first_forecast_path <- file.path( | |
| output_path, | |
| "individual_forecasts_all_runs", | |
| forecast_dates[1] | |
| ) | |
| available_locations <- list.dirs( | |
| first_forecast_path, | |
| full.names = FALSE, | |
| recursive = FALSE | |
| ) | |
| locations <- sample( | |
| available_locations, | |
| size = min(3, length(available_locations)) | |
| ) | |
| } | |
| if (is.null(locations)) { | |
| first_forecast_path <- file.path( | |
| output_path, | |
| "individual_forecasts_all_runs", | |
| forecast_dates[1] | |
| ) | |
| available_locations <- list.dirs( | |
| first_forecast_path, | |
| full.names = FALSE, | |
| recursive = FALSE | |
| ) | |
| locations <- sort(available_locations)[seq_len(min(3, length(available_locations)))] | |
| } |
🤖 Prompt for AI Agents
In @R/plot_figures.R around lines 91 - 106, When locations is NULL the code uses
sample(available_locations, ...) without a seed which makes selection
non-deterministic; change this to a deterministic selection by sorting
available_locations and taking the first min(3, length(available_locations))
entries (replace the sample(...) call), or alternatively set a fixed seed
immediately before sample() to ensure reproducible picks; update the block that
assigns locations (referencing variables available_locations, locations,
sample(), forecast_dates, and first_forecast_path) accordingly.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @R/plot_figures.R:
- Around line 96-104: The code assumes forecast_dates is non-empty when
sampling; add an early validation check in the block that handles
!show_multiple_dates to detect empty forecast_dates and return or throw a clear
error (or set a default) before any sample() calls. Specifically, in the
function or scope using the forecast_dates and show_multiple_dates variables,
check length(forecast_dates) == 0 and handle it (e.g., stop("forecast_dates is
empty") or set forecast_dates to a safe default) so that the subsequent uses of
sample(forecast_dates, size = 1) (and the dates_2024 logic) cannot attempt to
sample from an empty vector.
- Around line 85-94: The function plot_multilocation_comparison declares a
unused parameter hosp_data_long but instead reads hospital CSVs inside the
function; either remove the parameter and its @param roxygen entry and update
callers (e.g., multilocation_plot_targets.R / remove computing
hosp_data_all_locations) or replace the internal CSV-read block with usage of
the hosp_data_long argument (match the expected columns/format and any
filtering/grouping currently done by the CSV-read code) so the precomputed
target is actually consumed; update roxygen and any callers accordingly.
In @targets/multilocation_plot_targets.R:
- Around line 15-42: The tar target selected_forecast_dates_multiloc currently
uses available_locations_multiloc[1] without declaring a dependency and also
grows valid_dates with repeated c() calls; fix by referencing
available_locations_multiloc at the top of the command block (e.g., assign
test_loc <- available_locations_multiloc[1] immediately) so targets sees the
dependency, and replace the O(n^2) growth pattern by collecting valid paths with
a vectorized or preallocated approach (use vapply/lapply + Filter or preallocate
a character vector of appropriate length then subset) when checking file.exists
on file.path(...) built from scores$forecast_date and output_path_multiloc to
build valid_dates, then return the subsampled sequence as before.
🧹 Nitpick comments (2)
R/plot_figures.R (1)
106-122: Consider adding a check for emptyavailable_locations.If no location directories exist at the forecast path,
available_locationswill be empty. Whilesample(x, size = 0)returns an empty vector without error, this would likely cause downstream failures. A clearer error message here would aid debugging.Suggested improvement
available_locations <- list.dirs( first_forecast_path, full.names = FALSE, recursive = FALSE ) + if (length(available_locations) == 0) { + stop("No location directories found at: ", first_forecast_path) + } locations <- sample( available_locations, size = min(3, length(available_locations)) )targets/multilocation_plot_targets.R (1)
103-117: Random location selection affects reproducibility.Using
locations = NULLcauses random selection of 3 locations, which means the plot output will vary between pipeline runs. If reproducibility is important, consider setting a seed or specifying explicit locations.# Option: Set seed for reproducibility within the target command = { set.seed(42) plot_multilocation_comparison(...) }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
renv.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
NAMESPACER/plot_figures.Rman/plot_multilocation_comparison.Rdtargets/multilocation_plot_targets.R
🚧 Files skipped from review as they are similar to previous changes (2)
- man/plot_multilocation_comparison.Rd
- NAMESPACE
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test-coverage
- GitHub Check: lint-changed-files
- GitHub Check: ubuntu-latest (release)
- GitHub Check: macOS-latest (release)
- GitHub Check: windows-latest (release)
🔇 Additional comments (13)
R/plot_figures.R (8)
1-41: LGTM - Clean theme implementation.The
lshtm_theme()function is well-documented and correctly implements a custom ggplot2 theme with LSHTM branding. The styling choices are consistent and the function follows ggplot2 conventions.
124-171: Data loading logic is sound.The nested loop structure handles missing files gracefully with
file.exists()checks. The approach of building lists and combining withbind_rows()later is appropriate.Consider wrapping
read_csv()calls intryCatch()if malformed CSV files are a possibility in production.
173-211: LGTM - Future observations loading is well-handled.The logic to find a forecast date approximately 8 weeks later and load wastewater observations gracefully handles the case where no such date exists.
213-298: Data processing and filtering logic is correct.The approach of:
- Combining forecasts via
bind_rows()- Validating data presence with an informative error
- Filtering to forecast period and specific model
- De-duplicating observations
follows good data wrangling practices. The
obs_timingcategorisation (historical vs future) provides useful visual distinction in plots.
300-398: Wastewater data processing is thorough.The approach correctly:
- Filters to forecast period
- Maps later observations to existing site names via merge
- Handles de-duplication to avoid double-counting observations
The
obs_timingdistinction between historical and future wastewater observations maintains consistency with hospital data handling.
400-599: Plot composition with patchwork is well-designed.The approach of creating separate plots per location and combining them with
patchwork::wrap_plots()effectively avoids empty facet cells that would occur with a standardfacet_grid(). The use ofplot_spacer()to align columns across locations with differing wastewater site counts is a good solution.The nested
geom_ribbon()calls withalpha = 0.3for both the 25-75% and 5-95% credible intervals create appropriate visual layering.
600-649: Hospital-only fallback plot is consistent.The fallback when no wastewater data is available correctly mirrors the styling of the combined plot, maintaining visual consistency.
651-674: Save logic with dynamic sizing is practical.The dynamic width calculation based on facet count, capped at 40 inches, is a sensible approach for multi-panel plots. The use of
dir.create(..., recursive = TRUE, showWarnings = FALSE)appropriately handles existing directories.targets/multilocation_plot_targets.R (5)
1-11: Path target definitions are clear.The output path targets are straightforward and well-named.
43-47: LGTM.Simple and clear extraction of unique locations from scores.
48-102: This target's output is unused due to upstream bug.As noted in
R/plot_figures.R, thehosp_data_longparameter ofplot_multilocation_comparison()is never used. This target computeshosp_data_all_locationswhich is passed to that parameter but the data is discarded.Once the upstream issue is resolved (either by using the parameter or removing it), this target may need adjustment accordingly.
118-136: Specific locations target is well-defined.The
locations_of_interesttarget provides explicit control over which locations to plot, andmultilocation_specific_plotscorrectly uses this for reproducible output.
137-169: Single-date plot variants are appropriately configured.The
show_multiple_dates = FALSEparameter is correctly used to generate single-date variants. The save paths include clear subdirectory names (single_date) for organisation.
R/plot_figures.R
Outdated
| plot_multilocation_comparison <- function( | ||
| output_path, | ||
| forecast_dates, | ||
| locations = NULL, | ||
| hosp_data_long, | ||
| forecast_horizon_to_plot = 28, | ||
| historical_data_to_plot = 90, | ||
| scale_selected = "natural", | ||
| save_path = NULL, | ||
| show_multiple_dates = TRUE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused parameter hosp_data_long - dead code.
The hosp_data_long parameter is declared and documented but never used within the function body. The function reads hospital data directly from CSV files (lines 136-160) instead of using this parameter.
This causes the hosp_data_all_locations target in multilocation_plot_targets.R to compute data that is never utilised.
Either:
- Remove the parameter if it's not needed, or
- Use the parameter instead of re-reading from files
Option 1: Remove unused parameter
plot_multilocation_comparison <- function(
output_path,
forecast_dates,
locations = NULL,
- hosp_data_long,
forecast_horizon_to_plot = 28,
historical_data_to_plot = 90,
scale_selected = "natural",
save_path = NULL,
show_multiple_dates = TRUE) {Also update the roxygen documentation to remove @param hosp_data_long.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| plot_multilocation_comparison <- function( | |
| output_path, | |
| forecast_dates, | |
| locations = NULL, | |
| hosp_data_long, | |
| forecast_horizon_to_plot = 28, | |
| historical_data_to_plot = 90, | |
| scale_selected = "natural", | |
| save_path = NULL, | |
| show_multiple_dates = TRUE) { | |
| plot_multilocation_comparison <- function( | |
| output_path, | |
| forecast_dates, | |
| locations = NULL, | |
| forecast_horizon_to_plot = 28, | |
| historical_data_to_plot = 90, | |
| scale_selected = "natural", | |
| save_path = NULL, | |
| show_multiple_dates = TRUE) { |
🤖 Prompt for AI Agents
In @R/plot_figures.R around lines 85 - 94, The function
plot_multilocation_comparison declares a unused parameter hosp_data_long but
instead reads hospital CSVs inside the function; either remove the parameter and
its @param roxygen entry and update callers (e.g., multilocation_plot_targets.R
/ remove computing hosp_data_all_locations) or replace the internal CSV-read
block with usage of the hosp_data_long argument (match the expected
columns/format and any filtering/grouping currently done by the CSV-read code)
so the precomputed target is actually consumed; update roxygen and any callers
accordingly.
R/plot_figures.R
Outdated
| if (!show_multiple_dates) { | ||
| # Filter to 2024 dates if available | ||
| dates_2024 <- forecast_dates[grepl("^2024-", forecast_dates)] | ||
| if (length(dates_2024) > 0) { | ||
| forecast_dates <- sample(dates_2024, size = 1) | ||
| } else { | ||
| forecast_dates <- sample(forecast_dates, size = 1) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add guard for empty forecast_dates input.
If forecast_dates is passed as an empty vector, sample(forecast_dates, size = 1) at line 102 will throw an error. Consider adding an early validation check.
Suggested guard
plot_multilocation_comparison <- function(
output_path,
forecast_dates,
...
show_multiple_dates = TRUE) {
+ # Validate inputs
+ if (length(forecast_dates) == 0) {
+ stop("forecast_dates must not be empty")
+ }
+
# If show_multiple_dates is FALSE, randomly select one forecast date
if (!show_multiple_dates) {🤖 Prompt for AI Agents
In @R/plot_figures.R around lines 96 - 104, The code assumes forecast_dates is
non-empty when sampling; add an early validation check in the block that handles
!show_multiple_dates to detect empty forecast_dates and return or throw a clear
error (or set a default) before any sample() calls. Specifically, in the
function or scope using the forecast_dates and show_multiple_dates variables,
check length(forecast_dates) == 0 and handle it (e.g., stop("forecast_dates is
empty") or set forecast_dates to a safe default) so that the subsequent uses of
sample(forecast_dates, size = 1) (and the dates_2024 logic) cannot attempt to
sample from an empty vector.
targets/multilocation_plot_targets.R
Outdated
| tar_target( | ||
| name = selected_forecast_dates_multiloc, | ||
| command = { | ||
| dates <- unique(scores$forecast_date) | ||
| dates <- sort(dates) | ||
|
|
||
| # Filter to only dates with complete data | ||
| valid_dates <- c() | ||
| test_loc <- available_locations_multiloc[1] | ||
| for (test_date in dates) { | ||
| date_str <- as.character(as.Date(test_date, origin = "1970-01-01")) | ||
| test_path <- file.path( | ||
| output_path_multiloc, | ||
| "individual_forecasts_all_runs", | ||
| date_str, | ||
| test_loc, | ||
| "data", | ||
| "hosp_quantiles_ww_TRUE.csv" | ||
| ) | ||
| if (file.exists(test_path)) { | ||
| valid_dates <- c(valid_dates, date_str) | ||
| } | ||
| } | ||
|
|
||
| # Return every other valid date for clarity | ||
| valid_dates[seq(1, length(valid_dates), by = 2)] | ||
| } | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Target ordering dependency may cause issues.
This target references available_locations_multiloc[1] at line 23, but available_locations_multiloc is defined later in the list (lines 44-47). In targets, the order of target definitions in a list doesn't guarantee execution order — that's determined by the dependency graph. Since this target doesn't explicitly depend on available_locations_multiloc, the reference may fail.
Additionally, the vector-growing pattern valid_dates <- c(valid_dates, date_str) is O(n²) due to reallocation. Consider using vapply or pre-allocating.
Fix dependency and improve efficiency
tar_target(
name = selected_forecast_dates_multiloc,
command = {
+ test_loc <- available_locations_multiloc[1]
dates <- unique(scores$forecast_date)
dates <- sort(dates)
# Filter to only dates with complete data
- valid_dates <- c()
- test_loc <- available_locations_multiloc[1]
- for (test_date in dates) {
- date_str <- as.character(as.Date(test_date, origin = "1970-01-01"))
- test_path <- file.path(
- output_path_multiloc,
- "individual_forecasts_all_runs",
- date_str,
- test_loc,
- "data",
- "hosp_quantiles_ww_TRUE.csv"
- )
- if (file.exists(test_path)) {
- valid_dates <- c(valid_dates, date_str)
- }
- }
+ date_strs <- as.character(as.Date(dates, origin = "1970-01-01"))
+ valid_dates <- date_strs[vapply(date_strs, function(date_str) {
+ test_path <- file.path(
+ output_path_multiloc,
+ "individual_forecasts_all_runs",
+ date_str,
+ test_loc,
+ "data",
+ "hosp_quantiles_ww_TRUE.csv"
+ )
+ file.exists(test_path)
+ }, logical(1))]
# Return every other valid date for clarity
valid_dates[seq(1, length(valid_dates), by = 2)]
}
),This ensures available_locations_multiloc is referenced at the start of the command block, establishing the dependency.
🤖 Prompt for AI Agents
In @targets/multilocation_plot_targets.R around lines 15 - 42, The tar target
selected_forecast_dates_multiloc currently uses available_locations_multiloc[1]
without declaring a dependency and also grows valid_dates with repeated c()
calls; fix by referencing available_locations_multiloc at the top of the command
block (e.g., assign test_loc <- available_locations_multiloc[1] immediately) so
targets sees the dependency, and replace the O(n^2) growth pattern by collecting
valid paths with a vectorized or preallocated approach (use vapply/lapply +
Filter or preallocate a character vector of appropriate length then subset) when
checking file.exists on file.path(...) built from scores$forecast_date and
output_path_multiloc to build valid_dates, then return the subsampled sequence
as before.
PR to create figures showing:
Following on from #107, still need to:
Summary by CodeRabbit
New Features
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.