phylogenetic/defaults/config.yaml
max_date: "6M"
narrow_bandwidth: 0.2
wide_bandwidth: 0.6
Recapping a couple of comments from Slack, could you document the reason for selecting these values for the frequencies parameters? As I understand the process, these parameters are copied from the seasonal-cov builds, but they aren't necessarily what we want for WNV. Below, I've summarized some details about how these parameters alter frequency estimation behavior which could help you make decisions about which parameter values to keep or change.
The "6M" max date sets a relative upper limit on when frequencies can be estimated, based on when the augur frequencies command is run. This "6M" notation behaves just like the augur filter relative dates, where "6M" means "6 months prior to the current date". In the context of this WNV build, the behavior will be to set the maximum frequency estimation date to 6 months prior to the build date. If there are sequences collected in the last 6 months, they won't be represented in the frequencies panel. If there are no recently collected sequences at all, the upper limit for the frequencies panel will keep moving forward in time, even if there aren't any data points to estimate frequencies for. If you omit the max date parameter from augur frequencies, it will use the maximum collection date from the sample metadata as the max date. This approach prevents the build from trying to estimate frequencies later than the available data.
Similarly, augur frequencies will use the minimum collection date in the given metadata as the earliest timepoint at which to estimate frequencies. For phylogenetic builds with sparse historical data that acts as genetic context for recent data, we want to set the minimum date to a later value. The --min-date parameter also accepts relative dates just like the --max-date parameter, so if you only wanted to estimate frequencies for the last two years of data from the build date, you could set this parameter to "2Y".
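The relative-date behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not augur's implementation: the helper names `subtract_relative` and `resolve_bounds` are hypothetical, only the "nM" and "nY" notations mentioned in this thread are handled, and the sample dates are made up.

```python
import calendar
import re
from datetime import date


def subtract_relative(relative, today):
    """Return the date `relative` (e.g. "6M" or "2Y") before `today`.

    Hypothetical helper: only whole months ("M") and years ("Y") are supported.
    """
    match = re.fullmatch(r"(\d+)([MY])", relative)
    if match is None:
        raise ValueError(f"unsupported relative date: {relative!r}")
    count, unit = int(match.group(1)), match.group(2)
    months = count if unit == "M" else 12 * count
    # Count months since year 0, step back, then split into year and month again.
    total = today.year * 12 + (today.month - 1) - months
    year, month = divmod(total, 12)
    month += 1
    # Clamp the day so that e.g. 31 March minus 6 months lands on 30 September.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def resolve_bounds(collection_dates, today, min_date=None, max_date=None):
    """Pick the frequency estimation window.

    Relative bounds are anchored to `today` (the build date); omitted bounds
    fall back to the earliest and latest collection dates in the metadata.
    """
    start = subtract_relative(min_date, today) if min_date else min(collection_dates)
    end = subtract_relative(max_date, today) if max_date else max(collection_dates)
    return start, end


# Made-up metadata: sparse historical context plus one more recent sample.
samples = [date(1999, 8, 1), date(2012, 7, 15), date(2023, 9, 1)]
build_date = date(2025, 3, 31)

print(resolve_bounds(samples, build_date))                 # window follows the data
print(resolve_bounds(samples, build_date, max_date="6M"))  # upper limit follows the build date
print(resolve_bounds(samples, build_date, min_date="2Y"))  # skips the sparse historical range
```

In the second call the upper limit falls after the latest sample, which is the "keeps moving forward in time" case: the window is tied to the build date rather than to the data.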
The narrow and wide bandwidth parameters are in units of years and set the KDE width of each sample centered on the sample’s collection date. To set a bandwidth of a sample to 1 month, you could set the narrow bandwidth to 1 / 12 or 0.0833 and set the “proportion wide” parameter to 0.
The wide bandwidth parameter was added back in the day when we were still experimenting with the KDE frequency implementation and wanted a way to implement longer-tailed distributions. The narrow and wide bandwidths get added together based on the requested proportion of the wide bandwidth to include (default for --proportion-wide is 0.25). In practice, I’ve never seen a need to use the wide bandwidth (we disable it in ncov and seasonal flu). An easy way to inspect your KDE distributions for a build is to load the main and frequencies JSONs into Auspice, click a single sample in the tree, and toggle off the “Normalize frequencies” option in the left control panel. These changes make the frequency panel show the KDE distribution for the single sample you’ve selected, which is representative of all samples in the tree.
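To make the effect of the wide bandwidth concrete, here is a minimal sketch of a single sample's KDE contribution. It assumes that each kernel is a normal density whose standard deviation is the bandwidth, and that the two kernels are mixed by the wide proportion; the function names are hypothetical and this is not augur's code. The bandwidth values are the ones discussed in this thread.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def sample_density(pivot, collection_date, narrow_bandwidth, wide_bandwidth, proportion_wide):
    """KDE contribution of one sample at `pivot`; all times are in decimal years.

    Assumption: narrow and wide kernels are normal densities mixed by `proportion_wide`.
    """
    narrow = normal_pdf(pivot, collection_date, narrow_bandwidth)
    wide = normal_pdf(pivot, collection_date, wide_bandwidth)
    return (1 - proportion_wide) * narrow + proportion_wide * wide


collection_date = 2022.5  # made-up sample collected in mid-2022
one_month = 1 / 12        # bandwidths are in years, so 1 month is about 0.0833
two_years_later = collection_date + 2.0

# Narrow kernel only: the wide bandwidth is ignored when the proportion is 0.
narrow_only = sample_density(two_years_later, collection_date, one_month, 0.6, 0.0)
# Values from the original config, with a quarter of the weight on the wide kernel.
with_tails = sample_density(two_years_later, collection_date, 0.2, 0.6, 0.25)

print(f"narrow only, 2 years out: {narrow_only:.3g}")
print(f"with wide tails, 2 years out: {with_tails:.3g}")
```

With the wide proportion at 0 the density two years from the collection date is effectively zero, while the wide kernel leaves a visible tail there. That tail is what shows up in Auspice as a sample with nonzero frequency years away from its collection date.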
As an example of this view, the first image attached here shows the effect of the wider tails for a single WNV sample that causes that sample to have a nonzero frequency between 2006 and 2010. The second image shows an example distribution from the H3N2 12y tree where the sample has a nonzero frequency from April-November in 2022 (narrow bandwidth=0.0833 or approx. 1 month, wide bandwidth proportion=0). Unless there is a good reason to add the wider tails to the KDE distributions, I’d err towards disabling the wide bandwidth by setting the proportion wide to 0.
I should also emphasize that most of what I've written above doesn't exist in any official documentation, so there's no way anyone could know the behavior of augur frequencies without digging into the code. I'll use these comments here to start some proper documentation about frequency estimation that should help folks with future builds.
Thanks! After some exploratory graphics, picked a narrow bandwidth (page 2), and added frequency documentation comments to the config file:
WNV/phylogenetic/defaults/config.yaml
Lines 81 to 87 in e3505cd
And staged the new trees:
Generally, instead of wide_bandwidth, used proportion_wide
Added documentation of frequencies parameter choices in the config file
Remove wide-bandwidth parameter based on prior discussion: nextstrain/WNV#89


Description of proposed changes
Add frequencies panel
Related issue(s)
Checklist
reference.gb from NC_009942 to NC_009942_REF
Pushed to staging at: