Issue 106: analysis pipeline #107
Changes from 8 commits
74cfc3e
68a859c
6495364
3e8d6a9
841ab78
b0d5b08
b1ec388
8a90410
99fbd27
5ed7169
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| prep_scores_to_model <- function(scores_long, | ||
| ww_metadata) { | ||
| # Pivot scores from long to wide | ||
|
|
||
| # Join wastewater metadata | ||
|
|
||
| # Placeholder | ||
| return(NULL) | ||
| } | ||
|
Comment on lines +1 to +9
Review comment: Implement the placeholder function before merging. This function is called by [...] Would you like me to generate a basic implementation template for pivoting scores and joining metadata, or should this target be temporarily removed from the pipeline?
Collaborator, Author: I am going to ignore this and leave it for a subsequent PR.
Reply: @kaitejohnson, understood! Would you like me to open a follow-up issue to track the implementation of [...] |
||
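The review thread above offers a basic template for pivoting scores and joining metadata. A minimal sketch of what `prep_scores_to_model()` might look like follows. It assumes `scores_long` has `location`, `forecast_date`, `model`, and `wis` columns, and that `ww_metadata` has a `location` key plus a hypothetical `pop_served` column; none of this schema is confirmed by the PR, so treat the column names and the join key as placeholders.

```r
library(dplyr)
library(tidyr)

prep_scores_to_model <- function(scores_long, ww_metadata) {
  # Pivot scores from long to wide: one row per location/forecast date,
  # one `wis_<model>` column per model (column names are assumptions)
  scores_wide <- scores_long |>
    pivot_wider(
      id_cols = c(location, forecast_date),
      names_from = model,
      values_from = wis,
      names_prefix = "wis_"
    )

  # Join wastewater metadata by location (the join key is an assumption)
  left_join(scores_wide, ww_metadata, by = "location")
}

# Toy inputs for illustration only
scores_long <- tibble::tibble(
  location = c("A", "A", "B", "B"),
  forecast_date = as.Date("2024-01-01"),
  model = c("wwinference", "baseline", "wwinference", "baseline"),
  wis = c(1.0, 2.0, 3.0, 4.0)
)
ww_metadata <- tibble::tibble(
  location = c("A", "B"),
  pop_served = c(1000, 5000) # hypothetical metadata column
)

scores_model_input <- prep_scores_to_model(scores_long, ww_metadata)
```

Using `left_join()` keeps every scored location even when metadata is missing, which seems the safer default for a modelling input; an inner join would silently drop rows.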
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| # Targets script for generating forecasts and performing immediate | ||
| # post-processing (quantiling and scoring) | ||
|
|
||
| # The pipeline can be run using `tar_make(script = "_targets_model_run.R")` | ||
|
|
||
| library(targets) | ||
| library(jsonlite) | ||
| library(httr) | ||
| library(tarchetypes) | ||
| library(wwinference) | ||
| library(dplyr) | ||
| library(ggplot2) | ||
| library(readr) | ||
| library(here) | ||
| library(purrr) | ||
| library(lubridate) | ||
| library(tidyr) | ||
| library(glue) | ||
| library(fs) | ||
| library(rlang) | ||
| library(scoringutils) | ||
| library(forecast) | ||
| library(future) | ||
| library(future.callr) | ||
|
|
||
| # load functions | ||
| functions <- list.files(here("R"), full.names = TRUE) | ||
| walk(functions, source) | ||
| rm("functions") | ||
|
|
||
| n_workers <- max(1L, as.integer(floor(future::availableCores() / 4))) | ||
| plan(multisession, workers = n_workers) | ||
|
||
|
|
||
| # load target modules | ||
| targets <- list.files(here("targets"), full.names = TRUE) | ||
| targets <- grep("\\.R$", targets, value = TRUE) | ||
| purrr::walk(targets, source) | ||
|
||
|
|
||
| tar_option_set( | ||
| packages = c( | ||
| "wwinference", | ||
| "tibble", | ||
| "dplyr", | ||
| "ggplot2", | ||
| "readr", | ||
| "lubridate", | ||
| "tidyr", | ||
| "glue", | ||
| "forecast", | ||
| "jsonlite", | ||
| "httr" | ||
| ), | ||
| workspace_on_error = TRUE, | ||
| storage = "worker", | ||
| retrieval = "worker", | ||
| memory = "transient", | ||
| garbage_collection = TRUE, | ||
| format = "parquet", # default storage format | ||
| error = "continue" | ||
| ) | ||
|
|
||
| ## Set up the date:location:model:ww+/-:right-trunc+/- permutations | ||
| set_up <- list( | ||
| create_permutations_targets | ||
| ) | ||
|
|
||
|
|
||
| ## Iterate over all permutations. For each: | ||
| # - extract the necessary data | ||
| # - pre-process the data based on the model's requirements | ||
| # - fit the model | ||
| # - extract posterior hospital admissions (calibration and forecast) | ||
| # - score the forecasts using CRPS and extract | ||
| # - quantile the calibration and forecasted admissions and extract | ||
| # - extract input data (hosp and/or ww) | ||
| # - extract model diagnostics | ||
|
|
||
| # Current set up: uses the `scenarios` tibble to do dynamic branching within | ||
| # each function via pattern = map(ind_data_created, scenarios) | ||
| load_data <- list( | ||
| # Load data for each location/forecast date combination | ||
| load_data_targets, | ||
| load_baseline_data_targets | ||
| ) | ||
|
|
||
| fit_models <- list( | ||
| fit_model_targets, | ||
| fit_baseline_model_targets | ||
| ) | ||
|
|
||
| scoring <- list( | ||
| scoring_targets | ||
| ) | ||
|
|
||
| list( | ||
| set_up, | ||
| load_data, | ||
| fit_models, | ||
| scoring | ||
| ) | ||
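The "Current set up" comment in the script above describes dynamic branching over the `scenarios` tibble. A minimal sketch of that pattern is shown here, assuming a toy `scenarios` table and a hypothetical `summarise_scenario()` helper; neither is taken from the PR's `targets/` modules.

```r
library(targets)
library(tibble)

# Hypothetical helper: stands in for the load/fit/score steps
summarise_scenario <- function(forecast_date, location_name) {
  tibble(
    forecast_date = forecast_date,
    location_name = location_name,
    label = paste(location_name, forecast_date)
  )
}

example_targets <- list(
  # One row per scenario (toy values)
  tar_target(
    name = scenarios,
    command = tibble(
      forecast_date = c("2024-01-01", "2024-01-08"),
      location_name = c("A", "B")
    )
  ),
  # pattern = map(scenarios) creates one branch per row of `scenarios`;
  # inside the command, `scenarios` refers to that single-row slice
  tar_target(
    name = scenario_summary,
    command = summarise_scenario(
      forecast_date = scenarios$forecast_date,
      location_name = scenarios$location_name
    ),
    pattern = map(scenarios)
  )
)
```

Placed in a targets script, a list like this would be run with `tar_make()`, and the branches of `scenario_summary` would be combined row-wise when the target is read back.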
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,33 @@ | ||||||
| scores <- read_csv(file.path("output", "overall_data", "scores.csv")) | ||||||
|
|
||||||
| scores_overall <- scores |> | ||||||
| group_by(model, include_ww) |> | ||||||
| summarise(wis = mean(wis)) |> | ||||||
|
Review comment (🛠️ Refactor suggestion | 🟠 Major): Add [...] If the [...]
🔎 Proposed fix
- summarise(wis = mean(wis)) |>
+ summarise(wis = mean(wis, na.rm = TRUE)) |>
Collaborator, Author: Also ignoring stuff in scratch.
Reply: @kaitejohnson, understood! Since [...] If you'd like to prevent automated reviews on the [...]
reviews:
  path_filters:
    - "!scratch/**"
This will exclude all files under [...] |
||||||
| ungroup() |> | ||||||
| mutate(model_ww = glue::glue("{model}-{include_ww}")) | ||||||
|
|
||||||
|
|
||||||
| ggplot(scores_overall) + | ||||||
| geom_bar(aes(x = model_ww, y = wis, fill = model_ww), | ||||||
| stat = "identity", position = "stack" | ||||||
| ) | ||||||
|
|
||||||
| rwis <- scores_overall$wis / scores_overall$wis[2] | ||||||
|
||||||
|
|
||||||
| scores_by_date <- scores |> | ||||||
| group_by(model, include_ww, forecast_date) |> | ||||||
| summarise(wis = mean(wis)) |> | ||||||
| ungroup() |> | ||||||
| mutate(model_ww = glue::glue("{model}-{include_ww}")) | ||||||
|
|
||||||
| ggplot(scores_by_date) + | ||||||
| geom_line(aes(x = forecast_date, y = wis, color = model_ww)) | ||||||
|
|
||||||
| scores_by_loc <- scores |> | ||||||
| group_by(model, include_ww, location) |> | ||||||
| summarise(wis = mean(wis)) |> | ||||||
| ungroup() |> | ||||||
| mutate(model_ww = glue::glue("{model}-{include_ww}")) | ||||||
|
|
||||||
| ggplot(scores_by_loc) + | ||||||
| geom_point(aes(x = location, y = wis, color = model_ww)) | ||||||
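The review comment earlier in this file proposes `na.rm = TRUE` in the `summarise()` call. A small sketch with made-up scores illustrates the difference. The data values are invented for illustration; the column names follow the script above.

```r
library(dplyr)

# Toy scores with one missing WIS value (values are invented)
scores <- tibble::tibble(
  model = c("wwinference", "wwinference", "baseline", "baseline"),
  include_ww = c(TRUE, TRUE, FALSE, FALSE),
  wis = c(1.0, NA, 2.0, 4.0)
)

# Without na.rm, a single NA makes the whole group mean NA
scores_default <- scores |>
  group_by(model, include_ww) |>
  summarise(wis = mean(wis), .groups = "drop")

# With na.rm = TRUE, the mean is taken over the non-missing values
scores_na_rm <- scores |>
  group_by(model, include_ww) |>
  summarise(wis = mean(wis, na.rm = TRUE), .groups = "drop")
```

Passing `.groups = "drop"` has the same effect as the trailing `ungroup()` in the script. Whether silently dropping missing scores is appropriate depends on why they are missing, so a count of non-missing values per group may be worth reporting alongside the mean.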
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| analysis_EDA_plot_targets <- list( | ||
| tar_target( | ||
| name = plot_scores_by_date, | ||
| command = get_plot_scores_by_date(scores) | ||
| ), | ||
| tar_target( | ||
| name = scatterplot_scores, | ||
| command = get_scatterplot_scores(scores) | ||
| ) | ||
| ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| analysis_config_targets <- list( | ||
| tar_target( | ||
| ww_data_post, | ||
| get_ww_as_of_forecast_date( | ||
| forecast_date = scenarios$forecast_date, | ||
| location_name = scenarios$location_name, | ||
| location_abbr = scenarios$location_abbr, | ||
| calibration_period = calibration_period_wwinference, | ||
| path_to_lod_vals = path_to_lod_vals | ||
| ), | ||
| pattern = map(scenarios) | ||
| ), | ||
| tar_target( | ||
| name = scores_fp, | ||
| command = file.path("output", "overall_data_all_runs", "scores.csv") | ||
| ) | ||
| ) |
I added brackets here to avoid this error:
Error in rename():
! Can't rename columns with TRUE.
✖ TRUE must be numeric or character, not TRUE.