📊 survey: Add data on dietary choices of Americans#5598

Merged
pabloarosado merged 11 commits into master from data-diet-americans
Feb 28, 2026

@pabloarosado pabloarosado commented Feb 2, 2026

Summary

Fetch YouGov data on dietary choices of Americans, and create charts.

Context

So far we only had data on the dietary choices of Brits, but YouGov appears to run an identical survey on the dietary choices of Americans. The only differences I've noticed are:

  • Age groups are slightly different.
  • Sample size is about half (~1000 participants per wave in the US instead of ~2000 in the UK).
  • One of the choices is phrased slightly differently (irrelevant, just spacing).
  • Possibly because of the smaller sample size (and rounding to zero decimals in the original data), the percentages sometimes don't add up to 100%; I mention this in the footnote.
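
A minimal sketch of the rounding effect mentioned in the last point. The shares below are made up for illustration and are not YouGov figures; `rounded_total` is a hypothetical helper, not part of the ETL.

```python
def rounded_total(shares: list[float]) -> int:
    """Round each share to zero decimals independently and sum the results."""
    return sum(round(share) for share in shares)


# Hypothetical exact shares (in percent) for six diet categories.
# They add up to 100 before rounding.
exact_shares = [62.4, 14.4, 5.4, 3.4, 2.4, 12.0]

# Each share is rounded down by 0.4, so the published total drifts below 100.
total_after_rounding = rounded_total(exact_shares)
```

With six categories rounded independently, the accumulated rounding error can move the total by a few percentage points in either direction.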

For simplicity, I've repeated the ETL steps we used for the UK on this survey; with minor adjustments, they worked out of the box. I've also replicated the charts.

I considered combining the UK and US data and letting the user switch countries (specifically in the discrete bar chart), but that would have taken extra effort and, with only two countries, added little value, so I've kept them as separate charts.

owidbot commented Feb 2, 2026

Quick links (staging server):

Site Dev Site Preview Admin Wizard Docs

Login: ssh owid@staging-site-data-diet-americans

chart-diff: ✅
  • 3/3 reviewed charts
  • Modified: 2/2
  • New: 1/1
  • Rejected: 0
  • Data changes: 0
  • Metadata changes: 0
data-diff: ✅ No differences found
+ Dataset garden/survey/2026-02-02/dietary_choices
+ + Table dietary_choices
+   + Column base
+   + Column base_unweighted
+   + Column flexitarian
+   + Column meat_eater
+   + Column none
+   + Column pescetarian
+   + Column vegan
+   + Column vegetarian
- Dataset garden/survey/2026-02-02/dietary_choices_uk


Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2026-02-28 17:44:24 UTC
Execution time: 17.38 seconds

Copilot AI left a comment

Pull request overview

Adds the YouGov “Dietary choices of Americans” survey to the ETL, mirroring the existing UK pipeline, and wires it into the survey DAG to produce Grapher-ready outputs.

Changes:

  • Added a new US survey snapshot (dietary_choices_us.xlsx) and corresponding Meadow/Garden/Grapher ETL steps.
  • Updated the UK snapshot license metadata to CC BY-NC 4.0 and aligned UK garden processing to use map_series.
  • Registered the new US pipeline steps in dag/survey.yml.
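
To illustrate what "stricter mapping checks" can mean for the diet labels, here is a plain-pandas sketch. It is not the ETL's `map_series` implementation; `map_strict` and the example labels are hypothetical.

```python
import pandas as pd


def map_strict(series: pd.Series, mapping: dict) -> pd.Series:
    """Map values like Series.map, but fail loudly instead of producing NaN.

    Hypothetical helper for illustration only.
    """
    # Values present in the data that the mapping does not cover.
    unmapped = set(series.dropna().unique()) - set(mapping)
    if unmapped:
        raise ValueError(f"Values missing from mapping: {sorted(unmapped)}")
    # Mapping keys that never occur in the data (often a sign of a typo).
    unused = set(mapping) - set(series.dropna().unique())
    if unused:
        raise ValueError(f"Mapping keys not found in data: {sorted(unused)}")
    return series.map(mapping)


# Hypothetical raw labels and their column-style names.
labels = pd.Series(["Meat eater", "Vegan", "Meat eater"])
mapped = map_strict(labels, {"Meat eater": "meat_eater", "Vegan": "vegan"})
```

The point of the stricter variant is that a renamed or misspelled survey option raises an error at build time rather than silently becoming a missing value.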

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| snapshots/survey/2026-02-02/dietary_choices_us.xlsx.dvc | Adds snapshot metadata and DVC pointer for the US survey extract. |
| snapshots/survey/2026-02-02/dietary_choices_uk.xlsx.dvc | Updates UK snapshot license metadata. |
| etl/steps/data/meadow/survey/2026-02-02/dietary_choices_us.py | Loads the US snapshot and concatenates all sheets into a Meadow table. |
| etl/steps/data/garden/survey/2026-02-02/dietary_choices_us.py | Cleans/reshapes the Meadow table into a Garden dataset with sanity checks. |
| etl/steps/data/grapher/survey/2026-02-02/dietary_choices_us.py | Converts the Garden table into a Grapher dataset (day-based time axis). |
| etl/steps/data/garden/survey/2026-02-02/dietary_choices_us.meta.yml | Adds variable metadata for the US Garden dataset. |
| etl/steps/data/garden/survey/2026-02-02/dietary_choices_uk.py | Switches diet label mapping to map_series for stricter mapping checks. |
| dag/survey.yml | Adds Meadow/Garden/Grapher pipeline nodes for the US survey dataset. |
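
The "concatenates all sheets" behavior described for the Meadow step can be sketched in plain pandas as follows. This is an assumption about the general approach, not the step's actual code; `concat_sheets`, the `sheet` column, and the sheet names are hypothetical.

```python
import pandas as pd


def concat_sheets(sheets: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack all sheets into one table, recording each row's sheet of origin."""
    frames = [df.assign(sheet=name) for name, df in sheets.items()]
    return pd.concat(frames, ignore_index=True)


# In practice the dictionary could come from
# pd.read_excel(path, sheet_name=None), which returns {sheet name: DataFrame}.
# Here two small hypothetical sheets stand in for the workbook.
sheets = {
    "All adults": pd.DataFrame({"diet": ["Vegan"], "share": [2]}),
    "18-29": pd.DataFrame({"diet": ["Vegan"], "share": [4]}),
}
combined = concat_sheets(sheets)
```

Keeping the sheet name as a column preserves the group each row belongs to once the sheets are stacked.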

@pabloarosado pabloarosado marked this pull request as ready for review February 2, 2026 09:19

@veronikasamborska1994 veronikasamborska1994 left a comment

I didn’t go over the code in much detail since it’s been reviewed before, and overall it looks very good.

I do have a few general comments, which you should feel free to ignore since I don’t know the full context for how these will be used:

I noticed that in both datasets the “All adults” category is missing from the by-age charts. I think it would be fine to include it since it's still technically also an age group.

To me, the two charts in each dataset also look a bit redundant, though I may be missing some context. I’d probably just show the stacked bar chart with all categories (including “All adults”). If you have time, you could also consider combining the UK and US datasets to show the breakdown of eating habits among all adults in both countries. The results seem fairly comparable, especially once you account for the different sample sizes and the slightly odd “None of the above” category. But that still feels more useful than having two charts per dataset that largely show the same thing.

I also wonder whether there’s a way to handle the percentages not adding up to 100%. For example, you could add or subtract the difference from the “None of the above” category and note this in description_processing. I don’t think that would be difficult to justify, and it would make the chart look a bit cleaner, but it’s up to you.
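
A minimal sketch of that adjustment, assuming the Garden table has one column per diet category as listed in the data-diff above. `absorb_residual` and its `max_residual` threshold are hypothetical, and the numbers are made up.

```python
import pandas as pd

# Diet columns as they appear in the data-diff output for this dataset.
DIET_COLUMNS = ["flexitarian", "meat_eater", "none", "pescetarian", "vegan", "vegetarian"]


def absorb_residual(tb: pd.DataFrame, max_residual: float = 3) -> pd.DataFrame:
    """Add (or subtract) the gap to 100% to the "none" column, row by row."""
    tb = tb.copy()
    residual = 100 - tb[DIET_COLUMNS].sum(axis=1)
    # Only small gaps are plausibly rounding artifacts; larger ones need a look.
    if (residual.abs() > max_residual).any():
        raise ValueError("Residual too large to attribute to rounding.")
    tb["none"] = tb["none"] + residual
    # The correction must never push the category below zero.
    if (tb["none"] < 0).any():
        raise ValueError("Adjustment made 'none' negative.")
    return tb


# Hypothetical rows: the first sums to 98, the second to 101.
tb = pd.DataFrame(
    {
        "flexitarian": [14, 15],
        "meat_eater": [62, 63],
        "none": [12, 12],
        "pescetarian": [3, 3],
        "vegan": [2, 3],
        "vegetarian": [5, 5],
    }
)
adjusted = absorb_residual(tb)
```

Any such correction changes published values, so it would need to be documented in description_processing as suggested.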

@pabloarosado
Contributor Author

Thank you @veronikasamborska1994! I've applied all your suggestions.
I've created this new chart; it could be either a line chart or a stacked area chart. Visually, I prefer the stacked area chart, but the advantage of the line chart is that it also gives us slope and bar tabs in one (so we would be able to redirect this old chart to the new one).
For now, I'll keep all charts, since they are cited in different places. We can reassess and redirect later.

@pabloarosado pabloarosado merged commit 1c73803 into master Feb 28, 2026
4 of 5 checks passed
@pabloarosado pabloarosado deleted the data-diet-americans branch February 28, 2026 18:00
4 participants