Conversation

@kaitejohnson
Collaborator

@kaitejohnson kaitejohnson commented Jan 5, 2026

First pass from the Variant Nowcast Hub's metadata files. Citations are missing for the models that don't have one; I'm thinking of just putting a repo URL for those?

Co-authors:

  • Feel free to review and edit any columns in your team's row. You can either make a suggestion on this PR or send me an edited Markdown file, and I will merge everything.

Summary by CodeRabbit

  • Documentation
    • Added reference documentation table for seven COVID-19 variant nowcasting predictive models, including descriptions, citations, data sources, geographic coverage, output types (point and probabilistic forecasts), and ensemble classification to support model evaluation and selection.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai bot commented Jan 5, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces a new documentation file containing a Markdown table that documents seven predictive models for COVID-19 variant nowcasting. The table includes details such as model names, descriptions, citations, data sources, locations, output types, and ensemble status for each model.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Documentation: docs/model_description_table.md | Added Markdown table documenting seven COVID-19 variant nowcasting models with columns for model name, description, citation, data sources, locations, output type, and ensemble status |

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Draft model description table' clearly and concisely describes the main change: introducing a new documentation file with a table describing predictive models, which aligns with the changeset content. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch model-description-table

@kaitejohnson kaitejohnson marked this pull request as ready for review January 5, 2026 16:39
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Fix all issues with AI Agents 🤖
In @docs/model_description_table.md:
- Around line 3-4: Replace the placeholder "[citation]" entries for Hub-baseline
and UMass-HMLR with their full citation metadata: obtain the author list, title,
publication or preprint server, year, DOI/URL, and any version info from the
respective model owners/co-authors, then update the table rows for the
Hub-baseline and UMass-HMLR entries to include the formatted citation
(consistent with the project's citation style) and validate the links/DOIs;
notify the contributors if any details are missing and do not merge until both
citation fields are populated and link-checked.
- Line 8: The CADPH-CATaLog table row has a malformed Description cell mixing
citation/version notes and an unclosed quote; replace the Description content
for the CADPH-CATaLog entry so it contains only a concise model description
(e.g., "Fits a logistic growth function to estimate variant frequency over
time") and move any citation/version details back to the Citation or Notes
column if needed; update the CADPH-CATaLog row to match the format/style of the
CADPH-CATaMaran row so the Description column is clean and markdown is valid.
- Line 6: The table entry for LANL-CovTransformer has two issues: use "14-day
prediction" (hyphenate the compound adjective) and resolve the ensemble mismatch
between the description and the "Ensemble?" column; update either the
description for "LANL-CovTransformer" to clarify that the model is an internal
multi-stage ensemble (e.g., "an integrated multi-stage ensemble of 5 Stage 1
models used to produce a 14-day prediction") or change the "Ensemble?" column to
"Yes" if it should indicate internal ensemble status so the description and the
"Ensemble?" column are consistent.
🧹 Nitpick comments (1)
docs/model_description_table.md (1)

7-7: Minor wordiness improvement.

The phrase "prior to the fitting period" is slightly wordy. Consider shortening to "before the fitting period" for improved readability.

🔎 Proposed fix
- We use similar parameters as our PANGO lineage Multinomial Logistic Regression on CalCAT, except that at any time 31 days of data are obscured prior to the fitting period to exclude noisy data being backfilled.
+ We use similar parameters as our PANGO lineage Multinomial Logistic Regression on CalCAT, except that at any time 31 days of data are obscured before the fitting period to exclude noisy data being backfilled.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 615c782 and 3fa98e0.

📒 Files selected for processing (1)
  • docs/model_description_table.md
🧰 Additional context used
🪛 LanguageTool
docs/model_description_table.md

[uncategorized] ~6-~6: When a number forms part of an adjectival compound, use a hyphen.
Context: ...ensemble of models, which first makes a 14 day prediction using 5 models (Stage 1 mode...

(MISSING_HYPHEN)


[style] ~7-~7: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...t any time 31 days of data are obscured prior to the fitting period to exclude noisy dat...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

🔇 Additional comments (2)
docs/model_description_table.md (2)

1-2: Table structure looks good.

The markdown table is well-formatted with clear column headers and appropriate separators.


5-5: UGA-multicast entry is well-documented.

Clear description, complete citation, and appropriate metadata. No concerns.

Comment on lines 3 to 4
| Hub-baseline | A Bayesian multinomial logistic regression model that makes predictions at the national level. This model uses a linear in logit space model for the growth of the variants and makes the same predictions for each state. | [citation] | NextStrain | All | Point and probabilistic | No |
| UMass-HMLR | A Bayesian hierarchical multinomial logistic regression (HMLR) model for nowcasting COVID variants. Regression coefficients are modeled hierarchically across variants and locations. | [citation] | NextStrain | All | Point and probabilistic | No |

⚠️ Potential issue | 🟠 Major

Citation placeholders require community contribution.

Lines 3 and 4 contain [citation] placeholders for Hub-baseline and UMass-HMLR. As noted in the PR description, these are awaiting contributions from co-authors. Please ensure citation details are collected and merged before this documentation is published to users.

I can help format and validate citations once the details are provided. Would you like me to generate a template or checklist to help collect these from the respective teams?

🤖 Prompt for AI Agents
In @docs/model_description_table.md around lines 3-4, Replace the placeholder
"[citation]" entries for Hub-baseline and UMass-HMLR with their full citation
metadata: obtain the author list, title, publication or preprint server, year,
DOI/URL, and any version info from the respective model owners/co-authors, then
update the table rows for the Hub-baseline and UMass-HMLR entries to include the
formatted citation (consistent with the project's citation style) and validate
the links/DOIs; notify the contributors if any details are missing and do not
merge until both citation fields are populated and link-checked.
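
A minimal sketch of how the placeholder check requested above could be automated before merging. It assumes the table is a plain pipe-delimited Markdown table with the citation in the third column; the function names, the header labels, and the shortened sample cells are illustrative assumptions, not the contents of the actual file.

```python
def parse_table_rows(markdown: str) -> list[list[str]]:
    """Return the cells of every non-separator row in a pipe-delimited Markdown table."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row, e.g. |---|:---:|
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


def find_placeholder_citations(markdown: str, citation_col: int = 2,
                               placeholder: str = "[citation]") -> list[str]:
    """Return the model names (first cell) whose citation cell is still the placeholder."""
    return [
        row[0]
        for row in parse_table_rows(markdown)
        if len(row) > citation_col and row[citation_col] == placeholder
    ]


# Abbreviated sample: header labels and shortened cells are assumptions.
SAMPLE = """
| Model | Description | Citation | Data source | Locations | Output type | Ensemble? |
|---|---|---|---|---|---|---|
| Hub-baseline | Bayesian multinomial logistic regression | [citation] | NextStrain | All | Point and probabilistic | No |
| UMass-HMLR | Bayesian hierarchical multinomial logistic regression | [citation] | NextStrain | All | Point and probabilistic | No |
| UGA-multicast | Multinomial logistic regression | Feng et al. (2024) | NextStrain | All | Probabilistic | No |
"""

print(find_placeholder_citations(SAMPLE))  # ['Hub-baseline', 'UMass-HMLR']
```

This simple splitter does not handle escaped pipes inside cells, so it is only suitable for tables like the one in this PR where cells contain no literal pipe characters.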

| Hub-baseline | A Bayesian multinomial logistic regression model that makes predictions at the national level. This model uses a linear in logit space model for the growth of the variants and makes the same predictions for each state. | [citation] | NextStrain | All | Point and probabilistic | No |
| UMass-HMLR | A Bayesian hierarchical multinomial logistic regression (HMLR) model for nowcasting COVID variants. Regression coefficients are modeled hierarchically across variants and locations. | [citation] | NextStrain | All | Point and probabilistic | No |
| UGA-multicast | A multinomial logistic regression model with no fancy features for nowcasting COVID-19 variants. | Feng, Y., Goldberg, E. E., Kupperman, M., Zhang, X., Lin, Y., and Ke, R. (2024). CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting. Virus Evolution, to appear. | NextStrain | All | Probabilistic | No |
| LANL-CovTransformer | CovTransformer is a streamlined single-layer transformer architecture augmented with linear input and output layers, using embedding dimensions of 8 and dual attention heads. CovTransformer is an integrated ensemble of models, which first makes a 14 day prediction using 5 models (Stage 1 models). | Feng, Y., Goldberg, E. E., Kupperman, M., Zhang, X., Lin, Y., and Ke, R. (2024). CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting. Virus Evolution, to appear. | NextStrain for inference, GISAID for training | All | Point | No |

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n docs/model_description_table.md

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 2932


Fix hyphenation and clarify ensemble terminology.

Change "14 day prediction" to "14-day prediction" (compound adjectives should be hyphenated).

The description states "an integrated ensemble of models" whilst the Ensemble? column shows "No." Clarify whether this refers to the model's internal structure (ensemble of stages) or its participation in the Nowcast Hub ensemble, and update the description or column accordingly for consistency.

🔎 Proposed hyphenation fix
- CovTransformer is a streamlined single-layer transformer architecture augmented with linear input and output layers, using embedding dimensions of 8 and dual attention heads. CovTransformer is an integrated ensemble of models, which first makes a 14 day prediction using 5 models (Stage 1 models).
+ CovTransformer is a streamlined single-layer transformer architecture augmented with linear input and output layers, using embedding dimensions of 8 and dual attention heads. CovTransformer is an integrated ensemble of models, which first makes a 14-day prediction using 5 models (Stage 1 models).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- | LANL-CovTransformer | CovTransformer is a streamlined single-layer transformer architecture augmented with linear input and output layers, using embedding dimensions of 8 and dual attention heads. CovTransformer is an integrated ensemble of models, which first makes a 14 day prediction using 5 models (Stage 1 models). | Feng, Y., Goldberg, E. E., Kupperman, M., Zhang, X., Lin, Y., and Ke, R. (2024). CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting. Virus Evolution, to appear. | NextStrain for inference, GISAID for training | All | Point | No |
+ | LANL-CovTransformer | CovTransformer is a streamlined single-layer transformer architecture augmented with linear input and output layers, using embedding dimensions of 8 and dual attention heads. CovTransformer is an integrated ensemble of models, which first makes a 14-day prediction using 5 models (Stage 1 models). | Feng, Y., Goldberg, E. E., Kupperman, M., Zhang, X., Lin, Y., and Ke, R. (2024). CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting. Virus Evolution, to appear. | NextStrain for inference, GISAID for training | All | Point | No |
🧰 Tools
🪛 LanguageTool

[uncategorized] ~6-~6: When a number forms part of an adjectival compound, use a hyphen.
Context: ...ensemble of models, which first makes a 14 day prediction using 5 models (Stage 1 mode...

(MISSING_HYPHEN)

🤖 Prompt for AI Agents
In @docs/model_description_table.md around line 6, The table entry for
LANL-CovTransformer has two issues: use "14-day prediction" (hyphenate the
compound adjective) and resolve the ensemble mismatch between the description
and the "Ensemble?" column; update either the description for
"LANL-CovTransformer" to clarify that the model is an internal multi-stage
ensemble (e.g., "an integrated multi-stage ensemble of 5 Stage 1 models used to
produce a 14-day prediction") or change the "Ensemble?" column to "Yes" if it
should indicate internal ensemble status so the description and the "Ensemble?"
column are consistent.
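
Both issues raised for this row could be flagged mechanically. The sketch below is a heuristic only: the regular expression and the function names are assumptions made for illustration, and the pattern can produce false positives (for example "lasted 1 day before"), so any hit still needs a human decision.

```python
import re

# A number followed by a singular time unit and another word, e.g. "14 day prediction".
# Plural units such as "31 days of data" are deliberately not matched.
UNHYPHENATED = re.compile(r"\b\d+ (?:day|week|month|year)\b(?= \w)")


def find_unhyphenated_compounds(text: str) -> list[str]:
    """Return phrases like '14 day' that probably need a hyphen."""
    return UNHYPHENATED.findall(text)


def ensemble_mismatch(description: str, ensemble_flag: str) -> bool:
    """True when the description mentions an ensemble but the Ensemble? cell says No."""
    return "ensemble" in description.lower() and ensemble_flag.strip().lower() == "no"


description = (
    "CovTransformer is an integrated ensemble of models, which first makes "
    "a 14 day prediction using 5 models (Stage 1 models)."
)

print(find_unhyphenated_compounds(description))  # ['14 day']
print(ensemble_mismatch(description, "No"))      # True
```

The mismatch check cannot tell an internal multi-stage ensemble apart from participation in a hub ensemble; it only surfaces rows where the wording and the column disagree so the authors can decide which meaning the column carries.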

| UGA-multicast | A multinomial logistic regression model with no fancy features for nowcasting COVID-19 variants. | Feng, Y., Goldberg, E. E., Kupperman, M., Zhang, X., Lin, Y., and Ke, R. (2024). CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting. Virus Evolution, to appear. | NextStrain | All | Probabilistic | No |
| LANL-CovTransformer | CovTransformer is a streamlined single-layer transformer architecture augmented with linear input and output layers, using embedding dimensions of 8 and dual attention heads. CovTransformer is an integrated ensemble of models, which first makes a 14 day prediction using 5 models (Stage 1 models). | Feng, Y., Goldberg, E. E., Kupperman, M., Zhang, X., Lin, Y., and Ke, R. (2024). CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting. Virus Evolution, to appear. | NextStrain for inference, GISAID for training | All | Point | No |
| CADPH-CATaMaran | We use similar parameters as our PANGO lineage Multinomial Logistic Regression on CalCAT, except that at any time 31 days of data are obscured prior to the fitting period to exclude noisy data being backfilled. This model can accommodate resampling eventually. | Wadford et al. Implementation of California COVIDNet - a multi-sector collaboration for statewide SARS-CoV-2 genomic surveillance. Front Public Health. 2023 Oct 23;11:1249614. doi: 10.3389/fpubh.2023.1249614. PMID: 37937074; PMCID: PMC10627185. | Theaigen Genomics | All | Point | No |
| CADPH-CATaLog | Fits a logistic growth function to data. Based on Althaus, Christian L., et al. "A Tale of Two Variants: Spread of SARS-CoV-2 Variants Alpha in Geneva, Switzerland, and Beta in South Africa. 1.1 Adds filtering criteria/ GISAID data addition | Wadford et al. Implementation of California COVIDNet - a multi-sector collaboration for statewide SARS-CoV-2 genomic surveillance. Front Public Health. 2023 Oct 23;11:1249614. doi: 10.3389/fpubh.2023.1249614. PMID: 37937074; PMCID: PMC10627185. | Theaigen Genomics | All | Point | No |
No newline at end of file

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the file
find . -name "model_description_table.md" -type f

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 111


🏁 Script executed:

# Read the file with line numbers to examine the content around line 8
cat -n docs/model_description_table.md | head -20

Repository: epiforecasts/evalvariantnowcasthub

Length of output: 2932


Fix the malformed description in the CADPH-CATaLog entry.

The description field contains an unclosed quotation mark and mixes model description with citation text and version notes, resulting in corrupted markdown. The text should provide only a coherent description of the model, separate from the Citation column which already contains the reference.

Correct the entry to follow the format of other rows (e.g. CADPH-CATaMaran on the preceding line), with a clear description in the Description column only.

🤖 Prompt for AI Agents
In @docs/model_description_table.md around line 8, The CADPH-CATaLog table row
has a malformed Description cell mixing citation/version notes and an unclosed
quote; replace the Description content for the CADPH-CATaLog entry so it
contains only a concise model description (e.g., "Fits a logistic growth
function to estimate variant frequency over time") and move any citation/version
details back to the Citation or Notes column if needed; update the CADPH-CATaLog
row to match the format/style of the CADPH-CATaMaran row so the Description
column is clean and markdown is valid.
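
A malformed cell like this one could be caught with a small row check. The following is a sketch under stated assumptions: the table has seven columns (as listed in the walkthrough), cells contain no escaped pipes, and the function name and the shortened sample rows are hypothetical.

```python
def check_row(line: str, expected_cols: int = 7) -> list[str]:
    """Return a list of human-readable problems found in one Markdown table row."""
    problems = []
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != expected_cols:
        problems.append(f"expected {expected_cols} cells, found {len(cells)}")
    for index, cell in enumerate(cells):
        # An odd number of straight double quotes means one was never closed.
        if cell.count('"') % 2 == 1:
            problems.append(f"cell {index} has an unclosed double quote")
    return problems


# Shortened versions of the two CADPH rows; the cell text is abbreviated for illustration.
bad_row = (
    '| CADPH-CATaLog | Fits a logistic growth function to data. Based on Althaus, '
    'Christian L., et al. "A Tale of Two Variants | Wadford et al. | '
    'Theaigen Genomics | All | Point | No |'
)
good_row = (
    "| CADPH-CATaMaran | We use similar parameters as our PANGO lineage model "
    "| Wadford et al. | Theaigen Genomics | All | Point | No |"
)

print(check_row(bad_row))   # ['cell 1 has an unclosed double quote']
print(check_row(good_row))  # []
```

The quote check only counts straight double quotes, so it would miss an unclosed curly quote; it is meant as a quick lint, not a Markdown validator.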
