Merged
43 commits
c70f453
📊 agriculture: Update FAOSTAT data
pabloarosado Feb 25, 2026
bd2dedb
Create snapshot and meadow steps
pabloarosado Feb 25, 2026
65b1e3c
Improve scripts to create snapshots
pabloarosado Feb 25, 2026
725d2f9
Fix bug in create_new_steps script
pabloarosado Feb 25, 2026
c57696c
Add garden steps (WIP)
pabloarosado Feb 25, 2026
d1bf8dc
Harmonize countries
pabloarosado Feb 25, 2026
9f2495e
Remove unnecessary failing sanity check
pabloarosado Feb 25, 2026
245b6fc
Fix issue of missing data in latest FBS dataset
pabloarosado Feb 25, 2026
d9aeb8e
Remove anomaly fix that has been removed in the data
pabloarosado Feb 25, 2026
ce032d3
Fix missing indicator in CISP
pabloarosado Feb 25, 2026
8df415e
Fix bug related to legacy food_explorer step
pabloarosado Feb 25, 2026
a40e0b3
Fix renamed indicator in SDGB
pabloarosado Feb 25, 2026
375a3b1
Add grapher steps
pabloarosado Feb 25, 2026
869fee9
Let detect_anomalies inspect anomalies in the browser
pabloarosado Feb 25, 2026
4256610
Fix failing update_custom_metadata script
pabloarosado Feb 25, 2026
7ca3f06
Fix missing dataset description in FBS
pabloarosado Feb 25, 2026
ca689c5
Update custom dataset descriptions
pabloarosado Feb 25, 2026
4b80308
Improve script to update metadata
pabloarosado Feb 25, 2026
c2a19c8
Fix metadata
pabloarosado Feb 25, 2026
0a6615f
Merge branch 'master' of github.com:owid/etl into data-faostat-update
pabloarosado Feb 25, 2026
b3006cf
Improve format of custom elements file
pabloarosado Feb 25, 2026
579ee6b
Revert custom elements file format to the 2025 format
pabloarosado Feb 25, 2026
b3e2d84
Update elements
pabloarosado Feb 25, 2026
2f484e5
Improve script to update metadata
pabloarosado Feb 25, 2026
7259a7f
Update items metadata
pabloarosado Feb 25, 2026
5625080
Improve format
pabloarosado Feb 25, 2026
024ec05
Dummy commit to rerun faostat pipeline
pabloarosado Feb 25, 2026
98ac5b4
Improve format
pabloarosado Feb 25, 2026
4d8910d
Update global food explorer
pabloarosado Feb 25, 2026
dbcfb40
Ensure additional_variables is also updated by create_new_steps
pabloarosado Feb 25, 2026
548d707
Update additional variables steps
pabloarosado Feb 25, 2026
fe083cd
Improve create_new_steps script to handle explorers dependencies
pabloarosado Feb 25, 2026
cbac595
Improve docs
pabloarosado Feb 25, 2026
bc04edf
Update snapshot metadata
pabloarosado Feb 25, 2026
f8f5bd5
Update docs
pabloarosado Feb 25, 2026
e756c0f
Update docs
pabloarosado Feb 25, 2026
cb33a2a
Fix missing data in QV for EU countries
pabloarosado Feb 26, 2026
a4fc61f
Add description_processing
pabloarosado Feb 26, 2026
33a1bc8
Minor fixes in comments and metadata
pabloarosado Feb 26, 2026
867c7d2
Minor change in docstring
pabloarosado Feb 26, 2026
c1554d2
Fix spurious underscore in metadata of additional_variables
pabloarosado Feb 26, 2026
8e22ebb
Improve metadata of additional_variables
pabloarosado Feb 26, 2026
1e058b8
Avoid deprecation warning
pabloarosado Feb 26, 2026
162 changes: 160 additions & 2 deletions dag/faostat.yml
@@ -159,5 +159,163 @@ steps:
# Global food explorer.
#
export://explorers/faostat/latest/global_food:
- data://grapher/faostat/2025-03-17/faostat_qcl
- data://grapher/faostat/2025-03-17/faostat_fbsc
- data://grapher/faostat/2026-02-25/faostat_qcl
- data://grapher/faostat/2026-02-25/faostat_fbsc
#
# FAOSTAT meadow steps for version 2026-02-25
#
data://meadow/faostat/2026-02-25/faostat_cisp:
- snapshot://faostat/2026-02-25/faostat_cisp.zip
##################################################################################################################
# NOTE: The latest version of FBS is missing data for various countries that are currently under review.
# For now, import the data for those missing countries from the previous version:
##################################################################################################################
data://meadow/faostat/2026-02-25/faostat_fbs:
- snapshot://faostat/2026-02-25/faostat_fbs.zip
- snapshot://faostat/2025-03-17/faostat_fbs.zip
data://meadow/faostat/2026-02-25/faostat_fbsh:
- snapshot://faostat/2026-02-25/faostat_fbsh.zip
data://meadow/faostat/2026-02-25/faostat_fs:
- snapshot://faostat/2026-02-25/faostat_fs.zip
data://meadow/faostat/2026-02-25/faostat_lc:
- snapshot://faostat/2026-02-25/faostat_lc.zip
data://meadow/faostat/2026-02-25/faostat_metadata:
- snapshot://faostat/2026-02-25/faostat_metadata.json
data://meadow/faostat/2026-02-25/faostat_qcl:
- snapshot://faostat/2026-02-25/faostat_qcl.zip
data://meadow/faostat/2026-02-25/faostat_qi:
- snapshot://faostat/2026-02-25/faostat_qi.zip
data://meadow/faostat/2026-02-25/faostat_qv:
- snapshot://faostat/2026-02-25/faostat_qv.zip
data://meadow/faostat/2026-02-25/faostat_rfn:
- snapshot://faostat/2026-02-25/faostat_rfn.zip
data://meadow/faostat/2026-02-25/faostat_rl:
- snapshot://faostat/2026-02-25/faostat_rl.zip
data://meadow/faostat/2026-02-25/faostat_rp:
- snapshot://faostat/2026-02-25/faostat_rp.zip
data://meadow/faostat/2026-02-25/faostat_sdgb:
- snapshot://faostat/2026-02-25/faostat_sdgb.zip
#
# FAOSTAT garden steps for version 2026-02-25
#
data://garden/faostat/2026-02-25/faostat_cisp:
- data://meadow/faostat/2026-02-25/faostat_cisp
- data://garden/demography/2024-07-15/population
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_fbsc:
- data://meadow/faostat/2026-02-25/faostat_fbsh
- data://garden/demography/2024-07-15/population
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://meadow/faostat/2026-02-25/faostat_fbs
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_fs:
- data://meadow/faostat/2026-02-25/faostat_fs
- data://garden/demography/2024-07-15/population
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_lc:
- data://garden/demography/2024-07-15/population
- data://meadow/faostat/2026-02-25/faostat_lc
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_metadata:
- data://meadow/faostat/2026-02-25/faostat_rp
- data://meadow/faostat/2026-02-25/faostat_fbsh
- data://meadow/faostat/2026-02-25/faostat_sdgb
- data://meadow/faostat/2026-02-25/faostat_cisp
- data://meadow/faostat/2026-02-25/faostat_metadata
- data://meadow/faostat/2026-02-25/faostat_qi
- data://meadow/faostat/2026-02-25/faostat_fs
- data://meadow/faostat/2026-02-25/faostat_rfn
- data://meadow/faostat/2026-02-25/faostat_rl
- data://meadow/faostat/2026-02-25/faostat_lc
- data://meadow/faostat/2026-02-25/faostat_qcl
- data://meadow/faostat/2026-02-25/faostat_fbs
- data://meadow/faostat/2026-02-25/faostat_qv
data://garden/faostat/2026-02-25/faostat_qcl:
- data://garden/demography/2024-07-15/population
- data://meadow/faostat/2026-02-25/faostat_qcl
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_qi:
- data://meadow/faostat/2026-02-25/faostat_qi
- data://garden/demography/2024-07-15/population
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_qv:
- data://garden/regions/2023-01-01/regions
- data://garden/demography/2024-07-15/population
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://meadow/faostat/2026-02-25/faostat_qv
data://garden/faostat/2026-02-25/faostat_rfn:
- data://garden/demography/2024-07-15/population
- data://meadow/faostat/2026-02-25/faostat_rfn
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_rl:
- data://garden/demography/2024-07-15/population
- data://meadow/faostat/2026-02-25/faostat_rl
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_rp:
- data://meadow/faostat/2026-02-25/faostat_rp
- data://garden/demography/2024-07-15/population
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
data://garden/faostat/2026-02-25/faostat_sdgb:
- data://meadow/faostat/2026-02-25/faostat_sdgb
- data://garden/demography/2024-07-15/population
- data://garden/faostat/2026-02-25/faostat_metadata
- data://garden/wb/2025-07-01/income_groups
- data://garden/regions/2023-01-01/regions
#
# FAOSTAT grapher steps for version 2026-02-25
#
data://grapher/faostat/2026-02-25/faostat_cisp:
- data://garden/faostat/2026-02-25/faostat_cisp
data://grapher/faostat/2026-02-25/faostat_fbsc:
- data://garden/faostat/2026-02-25/faostat_fbsc
data://grapher/faostat/2026-02-25/faostat_fs:
- data://garden/faostat/2026-02-25/faostat_fs
data://grapher/faostat/2026-02-25/faostat_lc:
- data://garden/faostat/2026-02-25/faostat_lc
data://grapher/faostat/2026-02-25/faostat_qcl:
- data://garden/faostat/2026-02-25/faostat_qcl
data://grapher/faostat/2026-02-25/faostat_qi:
- data://garden/faostat/2026-02-25/faostat_qi
data://grapher/faostat/2026-02-25/faostat_qv:
- data://garden/faostat/2026-02-25/faostat_qv
data://grapher/faostat/2026-02-25/faostat_rfn:
- data://garden/faostat/2026-02-25/faostat_rfn
data://grapher/faostat/2026-02-25/faostat_rl:
- data://garden/faostat/2026-02-25/faostat_rl
data://grapher/faostat/2026-02-25/faostat_rp:
- data://garden/faostat/2026-02-25/faostat_rp
data://grapher/faostat/2026-02-25/faostat_sdgb:
- data://garden/faostat/2026-02-25/faostat_sdgb
#
# FAOSTAT garden step for additional variables for version 2026-02-25
#
data://garden/faostat/2026-02-25/additional_variables:
- data://garden/faostat/2026-02-25/faostat_rl
- data://garden/faostat/2026-02-25/faostat_qi
- data://garden/faostat/2026-02-25/faostat_qcl
- data://garden/faostat/2026-02-25/faostat_sdgb
- data://garden/faostat/2026-02-25/faostat_fbsc
- data://garden/faostat/2026-02-25/faostat_rfn
#
# FAOSTAT grapher step for additional variables for version 2026-02-25
#
data://grapher/faostat/2026-02-25/additional_variables:
- data://garden/faostat/2026-02-25/additional_variables
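The layering added above (snapshot, then meadow, then garden, then grapher) can be checked with a short script. The following is only a sketch: the `dag` dictionary is a hand-copied excerpt of a few entries from this diff (non-FAOSTAT dependencies omitted), not the way the ETL itself loads `dag/faostat.yml`, and the variable names are illustrative. It orders the steps topologically and lists any FAOSTAT dependency that still points at an older version.

```python
from graphlib import TopologicalSorter

V = "2026-02-25"

# Hand-copied excerpt of the new DAG entries: step -> set of dependencies.
dag = {
    f"data://grapher/faostat/{V}/faostat_qcl": {
        f"data://garden/faostat/{V}/faostat_qcl",
    },
    f"data://garden/faostat/{V}/faostat_qcl": {
        f"data://meadow/faostat/{V}/faostat_qcl",
        f"data://garden/faostat/{V}/faostat_metadata",
    },
    f"data://meadow/faostat/{V}/faostat_qcl": {
        f"snapshot://faostat/{V}/faostat_qcl.zip",
    },
    f"data://meadow/faostat/{V}/faostat_fbs": {
        f"snapshot://faostat/{V}/faostat_fbs.zip",
        "snapshot://faostat/2025-03-17/faostat_fbs.zip",
    },
}

# Dependencies always come before the steps that need them.
order = list(TopologicalSorter(dag).static_order())

# FAOSTAT dependencies that do not belong to the new version.
stale = sorted(
    dep
    for deps in dag.values()
    for dep in deps
    if "faostat/" in dep and f"/{V}/" not in dep
)
print(stale)
```

In this excerpt the only older dependency reported is the 2025-03-17 FBS snapshot, which the NOTE in the meadow section describes as intentional.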
68 changes: 28 additions & 40 deletions docs/data/faostat.md
@@ -206,12 +206,6 @@ Using data from garden, we create an additional dataset in the `explorers` chann
These are the steps OWID follows to ensure that FAOSTAT data is up-to-date, or to update one or more datasets for
which there is new data (let us call the new dataset version to be created `YYYY-MM-DD`):

0. Activate the etl virtual environment (from the root folder of the etl repository):

```bash
. .venv/bin/activate
```

1. Execute the ingestion script, to fetch data for any dataset that may have been updated in FAOSTAT.
If no dataset requires an update, the workflow stops here.

@@ -220,7 +214,7 @@ which there is new data (let us call the new dataset version to be created `YYYY
This can be executed with the `-r` flag to simply check for updates without writing anything.

```bash
python etl/scripts/faostat/create_new_snapshots.py
python etl/scripts/faostat/create_new_snapshots.py -a
```

!!! note
@@ -233,17 +227,7 @@ which there is new data (let us call the new dataset version to be created `YYYY
downloading this domain, add it to the list `INCLUDED_DATASETS_CODES`. Then replace variables used in those
charts with the new ones.

2. Manually inspect the snapshot metadata files, and fix common issues in dataset descriptions:

- Insert line break after first sentence (which usually is the general description of the dataset).
- Remove spurious symbols.
- Insert spaces where missing (e.g. "end of sentence.Start of next sentence").
- Remove double spaces (e.g. "end of sentence. Start of next sentence").
- Insert line breaks to create paragraphs (by context).
- Remove incomplete sentences (sometimes there are half sentences that may have been added by mistake).
- Remove mentions to links in FAOSTAT page (since they will not be seen from grapher).

3. Create new meadow steps.
2. Create new meadow steps.

!!! note

@@ -253,13 +237,13 @@ which there is new data (let us call the new dataset version to be created `YYYY
python etl/scripts/faostat/create_new_steps.py -c meadow -a
```

4. Run the new etl meadow steps, to generate the meadow datasets.
3. Run the new etl meadow steps, to generate the meadow datasets.

```bash
etl run meadow/faostat/YYYY-MM-DD
```

5. Create new garden steps.
4. Create new garden steps.

```bash
python etl/scripts/faostat/create_new_steps.py -c garden
@@ -272,7 +256,7 @@ which there is new data (let us call the new dataset version to be created `YYYY
This way we can be aware of any unexpected FAO changes in units.
If any changes are made to `custom_*.csv` files, you may need to force-run the garden `faostat_metadata` step to implement those changes.

6. Run the new etl garden steps, to generate the garden datasets.
5. Run the new etl garden steps, to generate the garden datasets.

```bash
etl run garden/faostat/YYYY-MM-DD
@@ -296,7 +280,7 @@ which there is new data (let us call the new dataset version to be created `YYYY

TODO: The descriptions of anomalies used to appear in `description`, but now they are not included in any indicator metadata. Ideally they should appear in `description_processing`. Consider doing this in the next update.

7. Inspect and update any possible changes of dataset/item/element/unit names and descriptions.
6. Inspect and update any possible changes of dataset/item/element/unit names and descriptions.

```bash
python etl/scripts/faostat/update_custom_metadata.py
@@ -308,38 +292,31 @@ which there is new data (let us call the new dataset version to be created `YYYY
etl run garden/faostat/YYYY-MM-DD
```

8. Create new grapher steps.
7. Create new grapher steps.

```bash
python etl/scripts/faostat/create_new_steps.py -c grapher
```

9. Run the new etl grapher steps, to generate the grapher charts.
8. Run the new etl grapher steps, to generate the grapher charts.

```bash
etl run faostat/YYYY-MM-DD --grapher
```

10. From the ETL Wizard, use Indicator Upgrader for each of the grapher datasets to replace variables in charts to their latest versions.
9. Replace variables in charts with their latest versions.

11. Update the versions of the dependencies of the explorers step `export://explorers/faostat/latest/global_food` in the dag (for the moment, this has to be done manually).
```bash
etl indicator-upgrade auto
```

12. Run the explorers step, to update the global food explorer.
10. Run the explorers step, to update the global food explorer.

```bash
etl run explorers/faostat/latest/global_food --export
```

13. From the ETL Wizard, use Chart Diff to visually inspect changes between the old and new versions of updated charts, and
accept or reject changes. Inspect also changes in the global food explorer using Explorer Diff.

14. Manually create a new garden dataset of additional variables `additional_variables` for the new version, and update its metadata. Then create a new grapher dataset too. Manually update all other datasets that use any faostat dataset as a dependency.

!!! note

In the future this could be handled automatically by one of the existing scripts.

15. Update titles and descriptions of snapshot origins (to use the custom dataset titles and descriptions defined in garden). Also, attributions will be added to origins.
11. Update titles and descriptions of snapshot origins (to use the custom dataset titles and descriptions defined in garden). Also, attributions will be added to origins.

```bash
python etl/scripts/faostat/update_snapshots_metadata.py
@@ -349,11 +326,22 @@ which there is new data (let us call the new dataset version to be created `YYYY

The current workflow is a bit convoluted: we fetch snapshots, create meadow and garden steps, and then edit the snapshots again. But for now, this workflow is the safest working solution.

16. Manually update the version of any `faostat` used as dependency in unrelated datasets (`faostat_rl` is used in `weekly_wildfires` and `population`).
12. From the ETL Wizard, use Anomalist to visually inspect potential data issues.

13. From the ETL Wizard, use Anomalist and Chart Diff to visually inspect changes between the old and new versions of updated charts, and
accept or reject changes. Inspect also changes in the global food explorer using Explorer Diff.

14. Manually update the version of any `faostat` dataset used as a dependency in unrelated datasets (`faostat_rl` is used in `weekly_wildfires`).

17. From the ETL dashboard, select archivable, namespace `faostat`, and archive all old steps.
15. Update other steps in the `agriculture` namespace that rely on any `faostat_*` step.

16. Archive old steps.

```bash
etl archive faostat/YYYY-MM-DD --include-usages
```

18. After merging all code and once production is up-to-date, archive unnecessary grapher datasets.
17. After merging all code and once production is up-to-date, archive unnecessary grapher datasets.
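The scripted part of the updated workflow can be summarised as an ordered list of commands. The sketch below is a dry run that only prints them; it is not an existing ETL script. The function name `faostat_update_commands` is hypothetical, the list contains only commands visible in this diff, and the manual steps (Anomalist, Chart Diff, dependency updates) and the archive step are left out.

```python
def faostat_update_commands(version: str) -> list[str]:
    """Return, in order, the scripted commands of the update workflow."""
    scripts = "python etl/scripts/faostat"
    return [
        f"{scripts}/create_new_snapshots.py -a",
        f"{scripts}/create_new_steps.py -c meadow -a",
        f"etl run meadow/faostat/{version}",
        f"{scripts}/create_new_steps.py -c garden",
        f"etl run garden/faostat/{version}",
        f"{scripts}/update_custom_metadata.py",
        # Garden steps are re-run after custom metadata is updated.
        f"etl run garden/faostat/{version}",
        f"{scripts}/create_new_steps.py -c grapher",
        f"etl run faostat/{version} --grapher",
        "etl indicator-upgrade auto",
        "etl run explorers/faostat/latest/global_food --export",
        f"{scripts}/update_snapshots_metadata.py",
    ]


commands = faostat_update_commands("2026-02-25")
for number, command in enumerate(commands, start=1):
    print(f"{number:2d}. {command}")
```

Printing rather than executing keeps the sketch safe to run anywhere; each command should still be run and inspected one at a time, as the numbered steps describe.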

## Workflow to make changes to a dataset

17 changes: 14 additions & 3 deletions etl/scripts/faostat/create_new_snapshots.py
@@ -20,6 +20,10 @@
```
uv run python -m create_new_snapshots
```
* To create new snapshots for all datasets, even if the source data was not updated:
```
uv run python -m create_new_snapshots -a
```

"""

@@ -280,7 +284,7 @@ def to_snapshot(self) -> None:
snap.create_snapshot(filename=f.name, upload=True)


def main(read_only: bool = False) -> None:
def main(read_only: bool = False, include_all_datasets: bool = False) -> None:
# Load list of existing snapshots related to current NAMESPACE.
existing_snapshots = [
snapshot for snapshot in list(snapshot_catalog(match=NAMESPACE)) if "backport/" not in snapshot.uri
@@ -303,7 +307,7 @@ def main(read_only: bool = False) -> None:
dataset_code = description["DatasetCode"].lower()
if dataset_code in INCLUDED_DATASETS_CODES:
faostat_dataset = FAODataset(description)
if is_dataset_already_up_to_date(
if not include_all_datasets and is_dataset_already_up_to_date(
existing_snapshots=existing_snapshots,
source_data_url=faostat_dataset.source_data_url,
source_modification_date=faostat_dataset.modification_date,
@@ -337,5 +341,12 @@ def main(read_only: bool = False) -> None:
action="store_true",
help="If given, simply check for updates without creating snapshots.",
)
argument_parser.add_argument(
"-a",
"--include_all_datasets",
default=False,
action="store_true",
help="If given, create snapshots for all datasets, even if the source data was not updated.",
)
args = argument_parser.parse_args()
main(read_only=args.read_only)
main(read_only=args.read_only, include_all_datasets=args.include_all_datasets)
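The effect of the new `-a` flag can be illustrated in isolation. This is a minimal sketch, not the script itself: `should_skip` is a hypothetical helper that mirrors the new condition `not include_all_datasets and is_dataset_already_up_to_date(...)`, with the up-to-date check replaced by a plain boolean.

```python
import argparse


def should_skip(include_all_datasets: bool, already_up_to_date: bool) -> bool:
    # A dataset is skipped only when -a is absent AND its snapshot is up to date.
    return not include_all_datasets and already_up_to_date


parser = argparse.ArgumentParser()
parser.add_argument(
    "-a",
    "--include_all_datasets",
    default=False,
    action="store_true",
)

# Simulate `create_new_snapshots.py -a` and a call without flags.
args_all = parser.parse_args(["-a"])
args_default = parser.parse_args([])

print(should_skip(args_all.include_all_datasets, already_up_to_date=True))
print(should_skip(args_default.include_all_datasets, already_up_to_date=True))
```

Because `not include_all_datasets` is evaluated first, passing `-a` short-circuits the `and`, so the up-to-date check is not evaluated at all for that dataset.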