Commit d30571b

📊 agriculture: Update FAOSTAT data (#5715)
* 📊 agriculture: Update FAOSTAT data
* Create snapshot and meadow steps
* Improve scripts to create snapshots
* Fix bug in create_new_steps script
* Add garden steps (WIP)
* Harmonize countries
* Remove unnecessary failing sanity check
* Fix issue of missing data in latest FBS dataset
* Remove anomaly fix that has been removed in the data
* Fix missing indicator in CISP
* Fix bug related to legacy food_explorer step
* Fix renamed indicator in SDGB
* Add grapher steps
* Let detect_anomalies inspect anomalies in the browser
* Fix failing update_custom_metadata script
* Fix missing dataset description in FBS
* Update custom dataset descriptions
* Improve script to update metadata
* Fix metadata
* Improve format of custom elements file
* Revert custom elements file format to the 2025 format
* Update elements
* Improve script to update metadata
* Update items metadata
* Improve format
* Dummy commit to rerun faostat pipeline
* Improve format
* Update global food explorer
* Ensure additional_variables is also updated by create_new_steps
* Update additional variables steps
* Improve create_new_steps script to handle explorers dependencies
* Improve docs
* Update snapshot metadata
* Update docs
* Update docs
* Fix missing data in QV for EU countries
* Add description_processing
* Minor fixes in comments and metadata
* Minor change in docstring
* Fix spurious underscore in metadata of additional_variables
* Improve metadata of additional_variables
* Avoid deprecation warning
1 parent 947a842 commit d30571b

77 files changed: +11549 -367 lines changed

dag/faostat.yml

Lines changed: 165 additions & 2 deletions
@@ -159,5 +159,168 @@ steps:
   # Global food explorer.
   #
   export://explorers/faostat/latest/global_food:
-    - data://grapher/faostat/2025-03-17/faostat_qcl
-    - data://grapher/faostat/2025-03-17/faostat_fbsc
+    - data://grapher/faostat/2026-02-25/faostat_qcl
+    - data://grapher/faostat/2026-02-25/faostat_fbsc
+  #
+  # FAOSTAT meadow steps for version 2026-02-25
+  #
+  data://meadow/faostat/2026-02-25/faostat_cisp:
+    - snapshot://faostat/2026-02-25/faostat_cisp.zip
+  ##################################################################################################################
+  # NOTE: The latest version of FBS is missing data for various countries that are currently under review.
+  # For now, import the data for those missing countries from the previous version:
+  ##################################################################################################################
+  data://meadow/faostat/2026-02-25/faostat_fbs:
+    - snapshot://faostat/2026-02-25/faostat_fbs.zip
+    - snapshot://faostat/2025-03-17/faostat_fbs.zip
+  data://meadow/faostat/2026-02-25/faostat_fbsh:
+    - snapshot://faostat/2026-02-25/faostat_fbsh.zip
+  data://meadow/faostat/2026-02-25/faostat_fs:
+    - snapshot://faostat/2026-02-25/faostat_fs.zip
+  data://meadow/faostat/2026-02-25/faostat_lc:
+    - snapshot://faostat/2026-02-25/faostat_lc.zip
+  data://meadow/faostat/2026-02-25/faostat_metadata:
+    - snapshot://faostat/2026-02-25/faostat_metadata.json
+  data://meadow/faostat/2026-02-25/faostat_qcl:
+    - snapshot://faostat/2026-02-25/faostat_qcl.zip
+  data://meadow/faostat/2026-02-25/faostat_qi:
+    - snapshot://faostat/2026-02-25/faostat_qi.zip
+  ##################################################################################################################
+  # NOTE: The latest version of QV is missing post-2017 data for all EU-27 countries.
+  # For now, import the data for those missing country-years from the previous version:
+  ##################################################################################################################
+  data://meadow/faostat/2026-02-25/faostat_qv:
+    - snapshot://faostat/2026-02-25/faostat_qv.zip
+    - snapshot://faostat/2025-03-17/faostat_qv.zip
+  data://meadow/faostat/2026-02-25/faostat_rfn:
+    - snapshot://faostat/2026-02-25/faostat_rfn.zip
+  data://meadow/faostat/2026-02-25/faostat_rl:
+    - snapshot://faostat/2026-02-25/faostat_rl.zip
+  data://meadow/faostat/2026-02-25/faostat_rp:
+    - snapshot://faostat/2026-02-25/faostat_rp.zip
+  data://meadow/faostat/2026-02-25/faostat_sdgb:
+    - snapshot://faostat/2026-02-25/faostat_sdgb.zip
+  #
+  # FAOSTAT garden steps for version 2026-02-25
+  #
+  data://garden/faostat/2026-02-25/faostat_cisp:
+    - data://meadow/faostat/2026-02-25/faostat_cisp
+    - data://garden/demography/2024-07-15/population
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_fbsc:
+    - data://meadow/faostat/2026-02-25/faostat_fbsh
+    - data://garden/demography/2024-07-15/population
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://meadow/faostat/2026-02-25/faostat_fbs
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_fs:
+    - data://meadow/faostat/2026-02-25/faostat_fs
+    - data://garden/demography/2024-07-15/population
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_lc:
+    - data://garden/demography/2024-07-15/population
+    - data://meadow/faostat/2026-02-25/faostat_lc
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_metadata:
+    - data://meadow/faostat/2026-02-25/faostat_rp
+    - data://meadow/faostat/2026-02-25/faostat_fbsh
+    - data://meadow/faostat/2026-02-25/faostat_sdgb
+    - data://meadow/faostat/2026-02-25/faostat_cisp
+    - data://meadow/faostat/2026-02-25/faostat_metadata
+    - data://meadow/faostat/2026-02-25/faostat_qi
+    - data://meadow/faostat/2026-02-25/faostat_fs
+    - data://meadow/faostat/2026-02-25/faostat_rfn
+    - data://meadow/faostat/2026-02-25/faostat_rl
+    - data://meadow/faostat/2026-02-25/faostat_lc
+    - data://meadow/faostat/2026-02-25/faostat_qcl
+    - data://meadow/faostat/2026-02-25/faostat_fbs
+    - data://meadow/faostat/2026-02-25/faostat_qv
+  data://garden/faostat/2026-02-25/faostat_qcl:
+    - data://garden/demography/2024-07-15/population
+    - data://meadow/faostat/2026-02-25/faostat_qcl
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_qi:
+    - data://meadow/faostat/2026-02-25/faostat_qi
+    - data://garden/demography/2024-07-15/population
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_qv:
+    - data://garden/regions/2023-01-01/regions
+    - data://garden/demography/2024-07-15/population
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://meadow/faostat/2026-02-25/faostat_qv
+  data://garden/faostat/2026-02-25/faostat_rfn:
+    - data://garden/demography/2024-07-15/population
+    - data://meadow/faostat/2026-02-25/faostat_rfn
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_rl:
+    - data://garden/demography/2024-07-15/population
+    - data://meadow/faostat/2026-02-25/faostat_rl
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_rp:
+    - data://meadow/faostat/2026-02-25/faostat_rp
+    - data://garden/demography/2024-07-15/population
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  data://garden/faostat/2026-02-25/faostat_sdgb:
+    - data://meadow/faostat/2026-02-25/faostat_sdgb
+    - data://garden/demography/2024-07-15/population
+    - data://garden/faostat/2026-02-25/faostat_metadata
+    - data://garden/wb/2025-07-01/income_groups
+    - data://garden/regions/2023-01-01/regions
+  #
+  # FAOSTAT grapher steps for version 2026-02-25
+  #
+  data://grapher/faostat/2026-02-25/faostat_cisp:
+    - data://garden/faostat/2026-02-25/faostat_cisp
+  data://grapher/faostat/2026-02-25/faostat_fbsc:
+    - data://garden/faostat/2026-02-25/faostat_fbsc
+  data://grapher/faostat/2026-02-25/faostat_fs:
+    - data://garden/faostat/2026-02-25/faostat_fs
+  data://grapher/faostat/2026-02-25/faostat_lc:
+    - data://garden/faostat/2026-02-25/faostat_lc
+  data://grapher/faostat/2026-02-25/faostat_qcl:
+    - data://garden/faostat/2026-02-25/faostat_qcl
+  data://grapher/faostat/2026-02-25/faostat_qi:
+    - data://garden/faostat/2026-02-25/faostat_qi
+  data://grapher/faostat/2026-02-25/faostat_qv:
+    - data://garden/faostat/2026-02-25/faostat_qv
+  data://grapher/faostat/2026-02-25/faostat_rfn:
+    - data://garden/faostat/2026-02-25/faostat_rfn
+  data://grapher/faostat/2026-02-25/faostat_rl:
+    - data://garden/faostat/2026-02-25/faostat_rl
+  data://grapher/faostat/2026-02-25/faostat_rp:
+    - data://garden/faostat/2026-02-25/faostat_rp
+  data://grapher/faostat/2026-02-25/faostat_sdgb:
+    - data://garden/faostat/2026-02-25/faostat_sdgb
+  #
+  # FAOSTAT garden step for additional variables for version 2026-02-25
+  #
+  data://garden/faostat/2026-02-25/additional_variables:
+    - data://garden/faostat/2026-02-25/faostat_rl
+    - data://garden/faostat/2026-02-25/faostat_qi
+    - data://garden/faostat/2026-02-25/faostat_qcl
+    - data://garden/faostat/2026-02-25/faostat_sdgb
+    - data://garden/faostat/2026-02-25/faostat_fbsc
+    - data://garden/faostat/2026-02-25/faostat_rfn
+  #
+  # FAOSTAT grapher step for additional variables for version 2026-02-25
+  #
+  data://grapher/faostat/2026-02-25/additional_variables:
+    - data://garden/faostat/2026-02-25/additional_variables
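
The two NOTE comments in this DAG explain why the `faostat_fbs` and `faostat_qv` meadow steps depend on both the 2026-02-25 and the 2025-03-17 snapshots: rows missing from the latest version are imported from the previous one. The following is only a minimal sketch of that fallback idea, not the code of the actual meadow steps. The function name `fill_missing_from_previous`, the row layout, and the keys `country`, `year` and `value` are assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def fill_missing_from_previous(current: List[Row], previous: List[Row]) -> List[Row]:
    """Return the current rows plus any previous rows whose (country, year) is absent from current."""
    # Country-years already covered by the latest version always take precedence.
    covered = {(row["country"], row["year"]) for row in current}
    # Only country-years missing from the latest version are taken from the previous one.
    fallback = [row for row in previous if (row["country"], row["year"]) not in covered]
    return current + fallback


# Hypothetical data: the latest version lacks France for 2018.
current = [
    {"country": "France", "year": 2017, "value": 10.0},
    {"country": "Kenya", "year": 2018, "value": 3.0},
]
previous = [
    {"country": "France", "year": 2017, "value": 9.5},
    {"country": "France", "year": 2018, "value": 11.0},
]
combined = fill_missing_from_previous(current, previous)
for row in combined:
    print(row)
```

Keying on the (country, year) pair rather than on country alone matches the QV note, which refers to missing country-years; for the FBS case (entire countries missing) the same function still applies, since every year of a missing country is absent from the latest version.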

docs/data/faostat.md

Lines changed: 28 additions & 40 deletions
@@ -206,12 +206,6 @@ Using data from garden, we create an additional dataset in the `explorers` chann
 These are the steps OWID follows to ensure that FAOSTAT data is up-to-date, or to update one or more datasets for
 which there is new data (let us call the new dataset version to be created `YYYY-MM-DD`):

-0. Activate the etl virtual environment (from the root folder of the etl repository):
-
-```bash
-. .venv/bin/activate
-```
-
 1. Execute the ingestion script, to fetch data for any dataset that may have been updated in FAOSTAT.
 If no dataset requires an update, the workflow stops here.

@@ -220,7 +214,7 @@ which there is new data (let us call the new dataset version to be created `YYYY
 This can be executed with the `-r` flag to simply check for updates without writing anything.

 ```bash
-python etl/scripts/faostat/create_new_snapshots.py
+python etl/scripts/faostat/create_new_snapshots.py -a
 ```

 !!! note
@@ -233,17 +227,7 @@ which there is new data (let us call the new dataset version to be created `YYYY
 downloading this domain, add it to the list `INCLUDED_DATASETS_CODES`. Then replace variables used in those
 charts with the new ones.

-2. Manually inspect the snapshot metadata files, and fix common issues in dataset descriptions:
-
-- Insert line break after first sentence (which usually is the general description of the dataset).
-- Remove spurious symbols.
-- Insert spaces where missing (e.g. "end of sentence.Start of next sentence").
-- Remove double spaces (e.g. "end of sentence. Start of next sentence").
-- Insert line breaks to create paragraphs (by context).
-- Remove incomplete sentences (sometimes there are half sentences that may have been added by mistake).
-- Remove mentions to links in FAOSTAT page (since they will not be seen from grapher).
-
-3. Create new meadow steps.
+2. Create new meadow steps.

 !!! note

@@ -253,13 +237,13 @@ which there is new data (let us call the new dataset version to be created `YYYY
 python etl/scripts/faostat/create_new_steps.py -c meadow -a
 ```

-4. Run the new etl meadow steps, to generate the meadow datasets.
+3. Run the new etl meadow steps, to generate the meadow datasets.

 ```bash
 etl run meadow/faostat/YYYY-MM-DD
 ```

-5. Create new garden steps.
+4. Create new garden steps.

 ```bash
 python etl/scripts/faostat/create_new_steps.py -c garden
@@ -272,7 +256,7 @@ which there is new data (let us call the new dataset version to be created `YYYY
 This way we can be aware of any unexpected FAO changes in units.
 If any changes are made to `custom_*.csv` files, you may need to force-run the garden `faostat_metadata` step to implement those changes.

-6. Run the new etl garden steps, to generate the garden datasets.
+5. Run the new etl garden steps, to generate the garden datasets.

 ```bash
 etl run garden/faostat/YYYY-MM-DD
@@ -296,7 +280,7 @@ which there is new data (let us call the new dataset version to be created `YYYY

 TODO: The descriptions of anomalies used to appear in `description`, but now they are not included in any indicator metadata. Ideally they should appear in `description_processing`. Consider doing this in the next update.

-7. Inspect and update any possible changes of dataset/item/element/unit names and descriptions.
+6. Inspect and update any possible changes of dataset/item/element/unit names and descriptions.

 ```bash
 python etl/scripts/faostat/update_custom_metadata.py
@@ -308,38 +292,31 @@ which there is new data (let us call the new dataset version to be created `YYYY
 etl run garden/faostat/YYYY-MM-DD
 ```

-8. Create new grapher steps.
+7. Create new grapher steps.

 ```bash
 python etl/scripts/faostat/create_new_steps.py -c grapher
 ```

-9. Run the new etl grapher steps, to generate the grapher charts.
+8. Run the new etl grapher steps, to generate the grapher charts.

 ```bash
 etl run faostat/YYYY-MM-DD --grapher
 ```

-10. From the ETL Wizard, use Indicator Upgrader for each of the grapher datasets to replace variables in charts to their latest versions.
+9. Replace variables in charts to their latest versions.

-11. Update the versions of the dependencies of the explorers step `export://explorers/faostat/latest/global_food` in the dag (for the moment, this has to be done manually).
+```bash
+etl indicator-upgrade auto
+```

-12. Run the explorers step, to update the global food explorer.
+10. Run the explorers step, to update the global food explorer.

 ```bash
 etl run explorers/faostat/latest/global_food --export
 ```

-13. From the ETL Wizard, use Chart Diff to visually inspect changes between the old and new versions of updated charts, and
-accept or reject changes. Inspect also changes in the global food explorer using Explorer Diff.
-
-14. Manually create a new garden dataset of additional variables `additional_variables` for the new version, and update its metadata. Then create a new grapher dataset too. Manually update all other datasets that use any faostat dataset as a dependency.
-
-!!! note
-
-In the future this could be handled automatically by one of the existing scripts.
-
-15. Update titles and descriptions of snapshot origins (to use the custom dataset titles and descriptions defined in garden). Also, attributions will be added to origins.
+11. Update titles and descriptions of snapshot origins (to use the custom dataset titles and descriptions defined in garden). Also, attributions will be added to origins.

 ```bash
 python etl/scripts/faostat/update_snapshots_metadata.py
@@ -349,11 +326,22 @@ which there is new data (let us call the new dataset version to be created `YYYY

 The current workflow is a bit convoluted: we fetch snapshots, create meadow and garden steps, and the edit snapshots again. But for now, this workflow is the safest working solution.

-16. Manually update the version of any `faostat` used as dependency in unrelated datasets (`faostat_rl` is used in `weekly_wildfires` and `population`).
+12. From the ETL Wizard, use Anomalist to visually inspect potential data issues.
+
+13. From the ETL Wizard, use Anomalist and Chart Diff to visually inspect changes between the old and new versions of updated charts, and
+accept or reject changes. Inspect also changes in the global food explorer using Explorer Diff.
+
+14. Manually update the version of any `faostat` used as dependency in unrelated datasets (`faostat_rl` is used in `weekly_wildfires`).

-17. From the ETL dashboard, select archivable, namespace `faostat`, and archive all old steps.
+15. Update other steps in the `agriculture` namespace that rely on any `faostat_*` step.
+
+16. Archive old steps.
+
+```bash
+etl archive faostat/YYYY-MM-DD --include-usages
+```

-18. After merging all code and once production is up-to-date, archive unnecessary grapher datasets.
+17. After merging all code and once production is up-to-date, archive unnecessary grapher datasets.

 ## Workflow to make changes to a dataset
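
The command-line steps of the updated workflow can be collected into a single ordered list for a given version. The sketch below only prints the commands (a dry run); it does not execute anything. It includes only the commands visible in the hunks above, so any commands in the hidden parts of `docs/data/faostat.md` and all manual steps (Anomalist, Chart Diff, dependency updates) are left out. The names `COMMAND_TEMPLATES` and `workflow_commands` are hypothetical.

```python
from datetime import datetime
from typing import List

# Command templates copied from the visible hunks of docs/data/faostat.md.
# Manual steps and any commands in hidden parts of the file are not included.
COMMAND_TEMPLATES = [
    "python etl/scripts/faostat/create_new_snapshots.py -a",
    "python etl/scripts/faostat/create_new_steps.py -c meadow -a",
    "etl run meadow/faostat/YYYY-MM-DD",
    "python etl/scripts/faostat/create_new_steps.py -c garden",
    "etl run garden/faostat/YYYY-MM-DD",
    "python etl/scripts/faostat/update_custom_metadata.py",
    "etl run garden/faostat/YYYY-MM-DD",
    "python etl/scripts/faostat/create_new_steps.py -c grapher",
    "etl run faostat/YYYY-MM-DD --grapher",
    "etl indicator-upgrade auto",
    "etl run explorers/faostat/latest/global_food --export",
    "python etl/scripts/faostat/update_snapshots_metadata.py",
    "etl archive faostat/YYYY-MM-DD --include-usages",
]


def workflow_commands(version: str) -> List[str]:
    """Return the workflow commands with the YYYY-MM-DD placeholder replaced by `version`."""
    # strptime raises ValueError for non-dates; the round trip also rejects
    # unpadded forms such as "2026-2-5".
    if datetime.strptime(version, "%Y-%m-%d").strftime("%Y-%m-%d") != version:
        raise ValueError(f"Version must be formatted as YYYY-MM-DD, got {version!r}")
    return [template.replace("YYYY-MM-DD", version) for template in COMMAND_TEMPLATES]


# Dry run: print the commands for the version introduced in this commit.
for command in workflow_commands("2026-02-25"):
    print(command)
```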
etl/scripts/faostat/create_new_snapshots.py

Lines changed: 14 additions & 3 deletions
@@ -20,6 +20,10 @@
 ```
 uv run python -m create_new_snapshots
 ```
+* To create new snapshots for all datasets, even if the source data was not updated:
+```
+uv run python -m create_new_snapshots -a
+```

 """

@@ -280,7 +284,7 @@ def to_snapshot(self) -> None:
             snap.create_snapshot(filename=f.name, upload=True)


-def main(read_only: bool = False) -> None:
+def main(read_only: bool = False, include_all_datasets: bool = False) -> None:
     # Load list of existing snapshots related to current NAMESPACE.
     existing_snapshots = [
         snapshot for snapshot in list(snapshot_catalog(match=NAMESPACE)) if "backport/" not in snapshot.uri
@@ -303,7 +307,7 @@ def main(read_only: bool = False) -> None:
         dataset_code = description["DatasetCode"].lower()
         if dataset_code in INCLUDED_DATASETS_CODES:
             faostat_dataset = FAODataset(description)
-            if is_dataset_already_up_to_date(
+            if not include_all_datasets and is_dataset_already_up_to_date(
                 existing_snapshots=existing_snapshots,
                 source_data_url=faostat_dataset.source_data_url,
                 source_modification_date=faostat_dataset.modification_date,
@@ -337,5 +341,12 @@ def main(read_only: bool = False) -> None:
         action="store_true",
         help="If given, simply check for updates without creating snapshots.",
     )
+    argument_parser.add_argument(
+        "-a",
+        "--include_all_datasets",
+        default=False,
+        action="store_true",
+        help="If given, create snapshots for all datasets, even if the source data was not updated.",
+    )
     args = argument_parser.parse_args()
-    main(read_only=args.read_only)
+    main(read_only=args.read_only, include_all_datasets=args.include_all_datasets)
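
The effect of the new `-a` flag can be illustrated in isolation. The sketch below is a simplified stand-in, not the script itself: `select_datasets` and `parse_args` are hypothetical helpers, the up-to-date check is replaced by a plain dictionary of booleans, and it assumes that a dataset found to be up to date is simply skipped (the body of that branch is not visible in the diff). The long option name `--read_only` is inferred from `args.read_only`.

```python
import argparse
from typing import Dict, List


def select_datasets(up_to_date: Dict[str, bool], include_all_datasets: bool = False) -> List[str]:
    """Return the dataset codes for which a new snapshot would be created."""
    selected = []
    for dataset_code, already_up_to_date in up_to_date.items():
        # Same shape as the changed condition: skip only when not forcing
        # an update and the dataset is already up to date.
        if not include_all_datasets and already_up_to_date:
            continue
        selected.append(dataset_code)
    return selected


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse the two boolean flags discussed in this commit."""
    parser = argparse.ArgumentParser(description="Sketch of the snapshot script flags.")
    parser.add_argument("-r", "--read_only", default=False, action="store_true")
    parser.add_argument("-a", "--include_all_datasets", default=False, action="store_true")
    return parser.parse_args(argv)


# Hypothetical state: qcl is already up to date, fbs has new source data.
states = {"qcl": True, "fbs": False}
default_run = select_datasets(states, parse_args([]).include_all_datasets)
forced_run = select_datasets(states, parse_args(["-a"]).include_all_datasets)
print(default_run)  # ['fbs']
print(forced_run)  # ['qcl', 'fbs']
```

Putting `not include_all_datasets` first means the up-to-date check is short-circuited when `-a` is given, so every included dataset gets a new snapshot regardless of its modification date.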
