📊 explore discrepancy in population growth rate #5720

Merged
veronikasamborska1994 merged 6 commits into master from data-populationgrowth-discrepancy
Feb 27, 2026
@lucasrodes lucasrodes commented Feb 26, 2026

Review indicator here: http://staging-site-data-populationgrowth-discre/admin/variables/953919

Summary

  • Use UN WPP growth rates directly for 1950–2100 instead of estimating them from our rounded population series. This fixes minor discrepancies caused by:
    • Population values being rounded to integers (uint64), introducing small errors in log-ratio calculations
    • The UN computing growth rates from more precise mid-year estimates, not from rounded annual figures
  • Keep estimated growth rates for pre-1950, computed from our composite population series using 100 * ln(P_t / P_{t_prev}) / (t - t_prev), where t_prev is the year of the previous observation
  • Generalize format_wpp() to handle both population and growth_rate tables via column_indicator / indicator_dtype parameters
  • Clean up growth rate estimation: remove unused next_population/next_year columns, deduplicate transition-year smoothing logic
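
To illustrate the rounding point in the summary, here is a minimal sketch of the log-ratio growth rate and how rounding populations to integers shifts the result. The function name `log_growth_rate` and the population figures are hypothetical and are not taken from the ETL code.

```python
import math


def log_growth_rate(pop_prev, pop_curr, year_prev, year_curr):
    """Annualized log growth rate in percent: 100 * ln(P_t / P_prev) / (t - t_prev)."""
    if pop_prev <= 0 or pop_curr <= 0:
        raise ValueError("populations must be positive")
    if year_curr <= year_prev:
        raise ValueError("year_curr must be later than year_prev")
    return 100.0 * math.log(pop_curr / pop_prev) / (year_curr - year_prev)


# Hypothetical unrounded populations for a small territory (illustrative only).
exact_prev, exact_curr = 10_000.4, 10_120.6

rate_exact = log_growth_rate(exact_prev, exact_curr, 2000, 2001)
# Rounding to whole persons, as an integer column would store them.
rate_rounded = log_growth_rate(round(exact_prev), round(exact_curr), 2000, 2001)

print(f"exact:   {rate_exact:.6f}")
print(f"rounded: {rate_rounded:.6f}")
print(f"shift:   {rate_rounded - rate_exact:+.6f}")
```

The shift is largest for small populations, where one person is a larger share of the total.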

Test plan

  • Run etlr demography/2024-07-15/population --private and verify output
  • Compare population growth rates against published UN WPP values for spot-check countries
  • Verify pre-1950 growth rates are unchanged
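
A spot-check along these lines could be sketched as follows. The sample rows are copied from the data-diff posted in this conversation (old estimated value, new UN value); the helper name `flag_discrepancies` and the 0.05 percentage-point tolerance are assumptions chosen for illustration only.

```python
# (country, year, previously estimated rate, UN WPP rate), values from the data-diff.
SAMPLE = [
    ("Eswatini", 1964, 2.376292, 2.434),
    ("Gabon", 1986, 2.759287, 2.765),
    ("Botswana", 2002, 1.88034, 1.772),
    ("Switzerland", 1958, 1.283671, 1.292),
]


def flag_discrepancies(rows, tolerance=0.05):
    """Return rows whose absolute difference exceeds `tolerance` (percentage points)."""
    flagged = []
    for country, year, estimated, published in rows:
        diff = published - estimated
        if abs(diff) > tolerance:
            flagged.append((country, year, round(diff, 6)))
    return flagged


for country, year, diff in flag_discrepancies(SAMPLE):
    print(f"{country} {year}: {diff:+.6f}")
```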

🤖 Generated with Claude Code

owidbot commented Feb 26, 2026

Quick links (staging server):

Site | Dev Site | Preview | Admin | Wizard | Docs

Login: ssh owid@staging-site-data-populationgrowth-discre

chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences
= Dataset garden/cdc/latest/measles_cases
  ~ Table measles_cases (changed metadata)
-     -     date_accessed: '2026-02-27'
+     +     date_accessed: '2026-02-26'
= Dataset garden/demography/2024-07-15/population
  = Table population_density
  ~ Table historical (changed metadata)
-     - title: 'Land, Inputs and Sustainability: Land Use'
-     - description: |-
-     -   The FAOSTAT Land Use domain contains data on forty-four categories of land use, irrigation and agricultural practices and five indicators relevant to monitor agriculture, forestry and fisheries activities at national, regional and global level.
-     - 
-     -   Data are available by country and year, with global coverage and annual updates.
    ~ Column growth_rate_historical (changed metadata, changed data)
-       -   - 1800–1949: historical estimates by Gapminder (v7). Growth rate estimated over 50-year periods until 1900, then 10-year periods.
+       +   - 1800–1949: historical estimates by Gapminder (v7). Growth rate estimated over 1-year periods.
-       -   - 1950–2023: population records from the United Nations World Population Prospects (2024 revision). Growth rate estimated over 1-year periods.
+       +   - 1950–2023: population records from the United Nations World Population Prospects (2024 revision). We use the UN's published growth rates directly (based on mid-year population estimates).
+       + 
+       +   ### Display filtering
+       +   To reduce noise in sparse historical data, growth rates are selectively displayed:
+       + 
+       +   - 1700–1799: Only 100-year intervals (1700, 1800)
+       +   - 1800–1899: Only 100-year intervals (1800, 1900)
+       +   - 1900–1949: Only 5-year intervals (1900, 1905, 1910, etc.)
+       +   - 1950 onwards: All years (annual data)

        ~ Changed values: 27409 / 58824 (46.59%)
           country  year  growth_rate_historical -  growth_rate_historical +
          Eswatini  1964                  2.376292                     2.434
           Finland  1927                  0.960922                      <NA>
             Gabon  1986                  2.759287                     2.765
            Panama  1926                  0.559161                      <NA>
              USSR  1947                   0.89191                      <NA>
  = Table population_original
  ~ Table population_growth_rate (changed metadata)
+     + title: World Population Prospects
+     + description: |-
+     +   World Population Prospects 2024 is the 28th edition of the official estimates and projections of the global population that have been published by the United Nations since 1951. The estimates are based on all available sources of data on population size and levels of fertility, mortality and international migration for 237 countries or areas. If you have questions about this dataset, please refer to (https://population.un.org/wpp/faqs). You can also explore (https://population.un.org/wpp/data-sources) for each country or visit (https://population.un.org/wpp/) for more details.
    ~ Dim country
+       + New values: 22533 / 73169 (30.80%)
           year             country
           1846 Antigua and Barbuda
           1838             Bahrain
           1836  Dominican Republic
           1874               Italy
           1810             Myanmar
    ~ Dim year
+       + New values: 22533 / 73169 (30.80%)
                      country  year
          Antigua and Barbuda  1846
                      Bahrain  1838
           Dominican Republic  1836
                        Italy  1874
                      Myanmar  1810
    ~ Column growth_rate (changed metadata, new data, changed data)
-       -   - 1800–1949: historical estimates by Gapminder (v7). Growth rate estimated over 50-year periods until 1900, then 10-year periods.
+       +   - 1800–1949: historical estimates by Gapminder (v7). Growth rate estimated over 1-year periods.
-       -   - 1950–2023: population records from the United Nations World Population Prospects (2024 revision). Growth rate estimated over 1-year periods.
+       +   - 1950–2023: population records from the United Nations World Population Prospects (2024 revision). We use the UN's published growth rates directly (based on mid-year population estimates).
+       + 
+       +   ### Display filtering
+       +   To reduce noise in sparse historical data, growth rates are selectively displayed:
+       + 
+       +   - 1700–1799: Only 100-year intervals (1700, 1800)
+       +   - 1800–1899: Only 100-year intervals (1800, 1900)
+       +   - 1900–1949: Only 5-year intervals (1900, 1905, 1910, etc.)
+       +   - 1950 onwards: All years (annual data)

+       + New values: 22533 / 73169 (30.80%)
                      country  year  growth_rate
          Antigua and Barbuda  1846         <NA>
                      Bahrain  1838         <NA>
           Dominican Republic  1836         <NA>
                        Italy  1874         <NA>
                      Myanmar  1810         <NA>
        ~ Changed values: 46180 / 73169 (63.11%)
              country  year  growth_rate -  growth_rate +
             Botswana  2002        1.88034          1.772
                 Laos  2062       0.155099           0.14
               Malawi  2086       0.632769          0.621
           San Marino  2050      -0.130405         -0.098
          Switzerland  1958       1.283671          1.292
  = Table population
  ~ Table projections (changed metadata)
-     - title: 'Land, Inputs and Sustainability: Land Use'
-     - description: |-
-     -   The FAOSTAT Land Use domain contains data on forty-four categories of land use, irrigation and agricultural practices and five indicators relevant to monitor agriculture, forestry and fisheries activities at national, regional and global level.
-     - 
-     -   Data are available by country and year, with global coverage and annual updates.
    ~ Column growth_rate_projection (changed data)
        ~ Changed values: 18782 / 19712 (95.28%)
                        country  year  growth_rate_projection -  growth_rate_projection +
                          Nauru  2043                  1.219317                      1.18
          Sao Tome and Principe  2034                  1.824028                     1.813
                       Slovenia  2077                 -0.384743                    -0.379
                           Togo  2069                  1.191711                      1.17
              Wallis and Futuna  2076                 -0.580674                    -0.524
= Dataset garden/health/latest/measles_long_run
  ~ Table measles_long_run (changed metadata)
-     -     date_accessed: '2026-02-27'
+     +     date_accessed: '2026-02-26'


Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2026-02-27 10:32:24 UTC
Execution time: 1119.44 seconds

@lucasrodes lucasrodes marked this pull request as ready for review February 26, 2026 18:54
@veronikasamborska1994 veronikasamborska1994 left a comment

@lucasrodes I've made some changes to how we show pre-1900 data in the garden step. We discussed this with Ed and Hannah. I previously did this in the static viz code, but I have now moved it to the garden step to make it easier to work with and more traceable in the future.
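
For readers following along, the "Display filtering" rules listed in the data-diff metadata could be expressed roughly as below. This is a sketch of those stated rules, not the garden step's actual code; the function name `is_displayed_year` is hypothetical, and hiding years before 1700 is an assumption, since the metadata does not cover them.

```python
def is_displayed_year(year: int) -> bool:
    """Return True if a growth rate for `year` is shown under the stated rules."""
    if year >= 1950:
        return True  # 1950 onwards: all years (annual data)
    if 1900 <= year < 1950:
        return year % 5 == 0  # 1900-1949: only 5-year intervals
    if 1700 <= year < 1900:
        return year % 100 == 0  # 1700-1899: only 100-year intervals
    return False  # assumption: nothing is displayed before 1700


years = [1700, 1750, 1800, 1899, 1900, 1903, 1905, 1949, 1950, 2023]
print([y for y in years if is_displayed_year(y)])
```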

@veronikasamborska1994 veronikasamborska1994 merged commit 2af0857 into master Feb 27, 2026
5 checks passed
@veronikasamborska1994 veronikasamborska1994 deleted the data-populationgrowth-discrepancy branch February 27, 2026 11:13