📊 survey: Add data on dietary choices of Americans #5598
pabloarosado merged 11 commits into master
Quick links (staging server):
chart-diff: ✅
data-diff: ✅ No differences found
+ Dataset garden/survey/2026-02-02/dietary_choices
+ + Table dietary_choices
+ + Column base
+ + Column base_unweighted
+ + Column flexitarian
+ + Column meat_eater
+ + Column none
+ + Column pescetarian
+ + Column vegan
+ + Column vegetarian
- Dataset garden/survey/2026-02-02/dietary_choices_uk
Legend: + New, ~ Modified, - Removed, = Identical
Hint: Run this locally with `etl diff REMOTE data/ --include yourdataset --verbose --snippet`
Automatically updated datasets matching `excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk` are not included.
Edited: 2026-02-28 17:44:24 UTC
Pull request overview
Adds the YouGov “Dietary choices of Americans” survey to the ETL, mirroring the existing UK pipeline, and wires it into the survey DAG to produce Grapher-ready outputs.
Changes:
- Added a new US survey snapshot (`dietary_choices_us.xlsx`) and corresponding Meadow/Garden/Grapher ETL steps.
- Updated the UK snapshot license metadata to `CC BY-NC 4.0` and aligned UK garden processing to use `map_series`.
- Registered the new US pipeline steps in `dag/survey.yml`.
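The idea behind a stricter mapping check can be illustrated with a plain-Python sketch. This is not the ETL's `map_series` helper, whose signature is not shown in this PR; `strict_map` and the raw label strings below are hypothetical, and the sketch only shows the principle of failing loudly when a raw label has no mapping instead of silently passing it through.

```python
def strict_map(values, mapping):
    """Map each value through `mapping`, raising if any value is unmapped."""
    missing = sorted({v for v in values if v not in mapping})
    if missing:
        raise ValueError(f"Unmapped labels: {missing}")
    return [mapping[v] for v in values]


# Hypothetical raw survey labels, mapped to the column names of this dataset.
DIET_LABELS = {
    "Flexitarian": "flexitarian",
    "Meat eater": "meat_eater",
    "None of these": "none",
    "Pescetarian": "pescetarian",
    "Vegan": "vegan",
    "Vegetarian": "vegetarian",
}

mapped = strict_map(["Vegan", "Meat eater"], DIET_LABELS)
```

With a check like this, a renamed or misspelled label in a future survey release stops the step rather than producing an unexpected column.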
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| snapshots/survey/2026-02-02/dietary_choices_us.xlsx.dvc | Adds snapshot metadata and DVC pointer for the US survey extract. |
| snapshots/survey/2026-02-02/dietary_choices_uk.xlsx.dvc | Updates UK snapshot license metadata. |
| etl/steps/data/meadow/survey/2026-02-02/dietary_choices_us.py | Loads the US snapshot and concatenates all sheets into a Meadow table. |
| etl/steps/data/garden/survey/2026-02-02/dietary_choices_us.py | Cleans/reshapes the Meadow table into a Garden dataset with sanity checks. |
| etl/steps/data/grapher/survey/2026-02-02/dietary_choices_us.py | Converts the Garden table into a Grapher dataset (day-based time axis). |
| etl/steps/data/garden/survey/2026-02-02/dietary_choices_us.meta.yml | Adds variable metadata for the US Garden dataset. |
| etl/steps/data/garden/survey/2026-02-02/dietary_choices_uk.py | Switches diet label mapping to map_series for stricter mapping checks. |
| dag/survey.yml | Adds Meadow/Garden/Grapher pipeline nodes for the US survey dataset. |
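As an illustration of the day-based time axis mentioned for the Grapher step, the sketch below converts ISO dates into integer day offsets from a reference date. The reference date `ZERO_DAY` and the function name `days_since_zero_day` are assumptions made for this example; the actual step may derive its time axis differently.

```python
from datetime import date

# Assumed reference date, chosen for illustration only.
ZERO_DAY = date(2020, 1, 1)


def days_since_zero_day(iso_date, zero_day=ZERO_DAY):
    """Return the number of days between `zero_day` and an ISO-formatted date."""
    return (date.fromisoformat(iso_date) - zero_day).days


# Made-up survey dates expressed as day offsets.
offsets = [days_since_zero_day(d) for d in ["2020-01-01", "2020-01-31", "2021-01-01"]]
```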
veronikasamborska1994 left a comment:
I didn’t go over the code in much detail since it’s been reviewed before, and overall it looks very good.
I do have a few general comments, which you should feel free to ignore since I don’t know the full context for how these will be used:
I noticed that in both datasets the “All adults” category is missing from the by-age charts. I think it would be fine to include it since it's still technically also an age group.
To me, the two charts in each dataset also look a bit redundant, though I may be missing some context. I’d probably just show the stacked bar chart with all categories (including “All adults”). If you have time, you could also consider combining the UK and US datasets to show the breakdown of eating habits among all adults in both countries. The results seem fairly comparable, especially once you account for the different sample sizes and the slightly odd “None of the above” category. But that still feels more useful than having two charts per dataset that largely show the same thing.
I also wonder whether there’s a way to handle the percentages not adding up to 100%. For example, you could add or subtract the difference from the “None of the above” category and note this in description_processing. I don’t think that would be difficult to justify, and it would make the chart look a bit cleaner, but it’s up to you.
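The adjustment suggested above could look roughly like the following sketch, in which the "none" category absorbs whatever gap remains between the sum of the shares and 100. The function name `absorb_rounding_residual` and the example percentages are made up for illustration; this is not the code that was merged.

```python
def absorb_rounding_residual(shares, residual_key="none", total=100):
    """Return a copy of `shares` where `residual_key` absorbs the gap to `total`."""
    adjusted = dict(shares)
    residual = total - sum(adjusted.values())
    adjusted[residual_key] += residual
    if adjusted[residual_key] < 0:
        # A large gap points to a data problem rather than rounding.
        raise ValueError("Residual would make the category negative")
    return adjusted


# Made-up percentages that sum to 101 because of rounding.
example = {
    "flexitarian": 14,
    "meat_eater": 72,
    "none": 3,
    "pescetarian": 3,
    "vegan": 3,
    "vegetarian": 6,
}
adjusted = absorb_rounding_residual(example)
```

Any adjustment of this kind changes reported values, so it would need to be documented in description_processing, as the comment notes.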
Thank you @veronikasamborska1994! I've applied all your suggestions.
Summary
Fetch YouGov data on dietary choices of Americans, and create charts.
Context
So far we had data on the dietary choices of Brits. But there seems to be an identical survey from YouGov on the dietary choices of Americans. The only differences I've noticed are:
For simplicity, I've reused the ETL steps from the UK survey for this one; with minor adjustments, they worked out of the box. I've also replicated the charts.
I considered combining the UK and US data and letting the user switch countries (specifically in the discrete bar chart), but that would have taken extra effort for little added value with only two countries, so I've kept them as separate charts.
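For reference, combining the two surveys would mostly amount to tagging each row with its country and stacking the tables. The sketch below assumes each survey is available as a list of row dictionaries; `combine_surveys`, the field names, and the values are hypothetical and only illustrate the shape of the operation.

```python
def combine_surveys(tables):
    """Stack per-country row lists into one list, adding a 'country' field."""
    combined = []
    for country, rows in tables.items():
        for row in rows:
            combined.append({**row, "country": country})
    return combined


# Made-up rows standing in for the UK and US garden tables.
tables = {
    "United Kingdom": [{"group": "All adults", "vegan": 2}],
    "United States": [{"group": "All adults", "vegan": 3}],
}
combined = combine_surveys(tables)
```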