Phylogenetic: Add frequencies panel #89

Merged
j23414 merged 2 commits into main from add-frequencies on Mar 26, 2025
@j23414 j23414 commented Mar 18, 2025

Description of proposed changes

Add frequencies panel

Related issue(s)

Checklist

Pushed to staging at:

Comment on lines +83 to +85
max_date: "6M"
narrow_bandwidth: 0.2
wide_bandwidth: 0.6
Contributor

Recapping a couple of comments from Slack, could you document the reason for selecting these values for the frequencies parameters? As I understand the process, these parameters are copied from the seasonal-cov builds, but they aren't necessarily what we want for WNV. Below, I've summarized some details about how these parameters alter frequency estimation behavior which could help you make decisions about which parameter values to keep or change.

The "6M" max date sets a relative upper limit on when frequencies can be estimated, based on when the augur frequencies command is run. This "6M" notation behaves just like augur filter relative dates, where "6M" means "6 months prior to the current date". In the context of this WNV build, the maximum frequency estimation date will be 6 months prior to the build date. If there are sequences collected in the last 6 months, they won't be represented by the frequencies panel. If no sequences have been collected recently at all, the upper limit for the frequencies panel will keep moving forward in time, even when there aren't any data points to estimate frequencies for. If you omit the max date parameter from augur frequencies, it will use the maximum collection date from the sample metadata as the max date. This approach prevents the build from trying to estimate frequencies later than the available data.

Similarly, augur frequencies will use the minimum collection date in the given metadata to set the earliest timepoint to estimate frequencies. For phylogenetic builds that have sparse historical data that act as genetic context for recent data, we want to set the minimum date to a later value. The --min-date parameter also accepts relative dates just like the --max-date parameter, so if you only wanted to estimate frequencies for the last two years of data from the build date, you could set this parameter to “2Y”.
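
As a rough illustration of the date-bound behavior described above, here is a minimal Python sketch. The helper names `resolve_relative_date` and `frequency_date_bounds` are hypothetical, and the month arithmetic is a simplification rather than augur's actual implementation. It only assumes the semantics described in this comment: relative values count back from the run date, and omitted bounds fall back to the earliest and latest collection dates.

```python
import calendar
import re
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple


def resolve_relative_date(spec: str, today: date) -> date:
    """Resolve "6M"/"2Y"-style values relative to `today`; pass ISO dates through.

    Hypothetical helper: a simplified stand-in, not augur's own parser.
    """
    match = re.fullmatch(r"(\d+)([YMWD])", spec)
    if match is None:
        return date.fromisoformat(spec)  # absolute date such as "2000-01-01"
    amount, unit = int(match.group(1)), match.group(2)
    if unit == "D":
        return today - timedelta(days=amount)
    if unit == "W":
        return today - timedelta(weeks=amount)
    # Months and years: step back whole calendar months, clamping the day
    # to the length of the target month (e.g. Aug 31 minus 6M -> Feb 28).
    months = amount * 12 if unit == "Y" else amount
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def frequency_date_bounds(
    collection_dates: Sequence[date],
    today: date,
    min_date: Optional[str] = None,
    max_date: Optional[str] = None,
) -> Tuple[date, date]:
    """Return the (earliest, latest) dates for frequency estimation.

    Omitted bounds fall back to the metadata's min/max collection dates.
    """
    start = resolve_relative_date(min_date, today) if min_date else min(collection_dates)
    end = resolve_relative_date(max_date, today) if max_date else max(collection_dates)
    return start, end
```

With a run date of 2025-03-26, a max date of "6M" resolves to 2024-09-26 in this sketch, so a sample collected on 2024-12-01 would fall after the upper bound; omitting the max date would instead end the range at that sample's collection date.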

The narrow and wide bandwidth parameters are in units of years and set the KDE width of each sample centered on the sample’s collection date. To set a bandwidth of a sample to 1 month, you could set the narrow bandwidth to 1 / 12 or 0.0833 and set the “proportion wide” parameter to 0.

The wide bandwidth parameter was added back in the day when we were still experimenting with the KDE frequency implementation and wanted a way to implement longer-tailed distributions. The narrow and wide bandwidths get added together based on the requested proportion of the wide bandwidth to include (default for --proportion-wide is 0.25). In practice, I’ve never seen a need to use the wide bandwidth (we disable it in ncov and seasonal flu). An easy way to inspect your KDE distributions for a build is to load the main and frequencies JSONs into Auspice, click a single sample in the tree, and toggle off the “Normalize frequencies” option in the left control panel. These changes make the frequency panel show the KDE distribution for the single sample you’ve selected, which is representative of all samples in the tree.
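
To make the narrow/wide mixture concrete, here is a small Python sketch of a single sample's KDE density. It assumes Gaussian kernels, treats each bandwidth as a standard deviation in years, and combines them as a weighted sum controlled by the wide proportion. The function names are hypothetical and this is not copied from augur's code.

```python
import math


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def sample_density(
    t: float,
    collection_date: float,
    narrow_bandwidth: float,
    wide_bandwidth: float,
    proportion_wide: float,
) -> float:
    """Unnormalized KDE contribution of one sample at time `t` (decimal years).

    Assumption: the kernel is a mixture of a narrow and a wide Gaussian,
    both centered on the sample's collection date.
    """
    narrow = gaussian_pdf(t, collection_date, narrow_bandwidth)
    wide = gaussian_pdf(t, collection_date, wide_bandwidth)
    return (1 - proportion_wide) * narrow + proportion_wide * wide


ONE_MONTH = 1 / 12  # bandwidths are in years, so one month is about 0.0833

# Compare the tail two years after collection, with and without the wide kernel.
with_wide = sample_density(2010.0, 2008.0, 0.2, 0.6, 0.25)
without_wide = sample_density(2010.0, 2008.0, 0.2, 0.6, 0.0)
print(f"two years out, wide enabled:  {with_wide:.3e}")
print(f"two years out, wide disabled: {without_wide:.3e}")
```

With the proportion set to 0, only the narrow kernel contributes, so the density two years from the collection date is effectively zero; with a nonzero proportion, the wide kernel keeps that tail visibly above zero.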

As an example of this view, the first image attached here shows the effect of the wider tails for a single WNV sample that causes that sample to have a nonzero frequency between 2006 and 2010. The second image shows an example distribution from the H3N2 12y tree where the sample has a nonzero frequency from April-November in 2022 (narrow bandwidth=0.0833 or approx. 1 month, wide bandwidth proportion=0). Unless there is a good reason to add the wider tails to the KDE distributions, I’d err towards disabling the wide bandwidth by setting the proportion wide to 0.

image

image 2

I should also emphasize that most of what I've written above doesn't exist in any official documentation, so there's no way anyone could know the behavior of augur frequencies without digging into the code. I'll use these comments here to start some proper documentation about frequency estimation that should help folks with future builds.

Contributor Author

Thanks! After some exploratory graphics, picked a narrow bandwidth (page 2), and added frequency documentation comments to the config file:

tip_frequencies:
  # 2000 since there is an increase in WNV at that time
  min_date: "2000-01-01"
  max_date: "6M"
  # Quarterly narrow_bandwidth or every 3 months (3 / 12.0 = 0.25)
  narrow_bandwidth: 0.25
  proportion_wide: 0.0
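
For reference, a hedged Python sketch of how these config values could map onto an `augur frequencies` call. The file paths and the `build_frequencies_command` helper are hypothetical, and the workflow's actual rule may pass additional options; only the flag names are taken from the augur frequencies command line.

```python
from typing import Any, Dict, List


def build_frequencies_command(
    config: Dict[str, Any], tree: str, metadata: str, output: str
) -> List[str]:
    """Assemble an `augur frequencies` argument list from a tip_frequencies config.

    Hypothetical helper for illustration; not the workflow's actual rule.
    """
    params = config["tip_frequencies"]
    command = [
        "augur", "frequencies",
        "--method", "kde",
        "--tree", tree,
        "--metadata", metadata,
        "--output", output,
        "--narrow-bandwidth", str(params["narrow_bandwidth"]),
        "--proportion-wide", str(params["proportion_wide"]),
    ]
    # Date bounds are optional: when omitted, augur falls back to the
    # collection dates in the metadata, as described earlier in this thread.
    for key, flag in (("min_date", "--min-date"), ("max_date", "--max-date")):
        if params.get(key) is not None:
            command += [flag, str(params[key])]
    return command


config = {
    "tip_frequencies": {
        "min_date": "2000-01-01",
        "max_date": "6M",
        "narrow_bandwidth": 0.25,
        "proportion_wide": 0.0,
    }
}
# Placeholder paths; substitute the build's real tree, metadata, and output files.
print(" ".join(build_frequencies_command(config, "tree.nwk", "metadata.tsv", "tip-frequencies.json")))
```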

And staged the new trees:

j23414 added 2 commits March 24, 2025 14:15
Generally, instead of wide_bandwidth used proportion_wide
Added documentation of frequencies parameter choices in the config file
@j23414 j23414 merged commit 036de0e into main Mar 26, 2025
29 checks passed
@j23414 j23414 deleted the add-frequencies branch March 26, 2025 16:30
kimandrews added a commit to nextstrain/tb that referenced this pull request Aug 29, 2025
Remove wide-bandwidth parameter based on prior discussion: nextstrain/WNV#89
kimandrews added a commit to nextstrain/tb that referenced this pull request Sep 23, 2025
Remove wide-bandwidth parameter based on prior discussion: nextstrain/WNV#89

Development

Successfully merging this pull request may close these issues.

Add frequencies plot

3 participants