Data Sources

Track every external dataset here before or when it is added.

| Source | URL | Access date | License note | Redistribution note | Local script/path | Status |
| --- | --- | --- | --- | --- | --- | --- |
| Maddison Project Database 2023 | https://doi.org/10.34894/INZBF2 | 2026-03-08 | CC-BY-4.0 per Dataverse metadata | Raw Stata file is fetched locally into data_raw/; tidy outputs may be redistributed with attribution review | src/geoluck/etl/fetch_maddison.py, data_intermediate/maddison/ | active |
| Natural Earth Admin 0 Countries 110m | https://www.naturalearthdata.com/downloads/110m-cultural-vectors/ | 2026-03-08 | Public domain / Natural Earth terms | Raw zip is fetched locally into data_raw/; derived geometry/reference exports are repo outputs | src/geoluck/etl/fetch_natural_earth.py, src/geoluck/features/build_country_reference.py, data_intermediate/natural_earth/ | active |
| Natural Earth 110m physical vectors | https://www.naturalearthdata.com/downloads/110m-physical-vectors/ | 2026-03-09 | Public domain / Natural Earth terms | Raw zips are fetched locally into data_raw/natural_earth/physical/; normalized physical layers and derived hydro features are exported to parquet | src/geoluck/etl/fetch_natural_earth_physical.py, src/geoluck/features/build_hydro_terrain_features.py, data_intermediate/natural_earth/ | active |
| World Bank WDI | https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation | 2026-03-09 | World Bank open data terms; confirm per series before redistribution | Raw API payloads are fetched locally into data_raw/wdi/; derived feature tables now include land, water, fisheries, resource rents, depletion, and primary-resource export mix | src/geoluck/etl/fetch_wdi.py, src/geoluck/features/build_wdi_features.py, data_intermediate/wdi/ | active |
| World Bank WGI | https://databank.worldbank.org/data/download/WGI_CSV.zip | 2026-03-09 | World Bank terms apply; confirm redistribution expectations for derived governance tables | Raw ZIP is fetched locally into data_raw/wgi/; normalized country-year governance estimates and decade features may be redistributed as derived tables with source citation review | src/geoluck/etl/fetch_wgi.py, src/geoluck/features/build_wgi_features.py, data_intermediate/wgi/ | active |
| Alesina fractionalization (2003) | https://www.anderson.ucla.edu/faculty_pages/romain.wacziarg/downloads/2003_fractionalization.xls | 2026-03-09 | Public academic workbook; cite the Alesina et al. release and verify redistribution expectations for derived aggregates | Raw XLS is fetched locally into data_raw/alesina_fractionalization/; matched country-level ethnic, linguistic, and religious fractionalization features may be redistributed as compact derived tables after citation review | src/geoluck/etl/fetch_alesina_fractionalization.py, src/geoluck/features/build_alesina_fractionalization_features.py, data_intermediate/alesina_fractionalization/ | active |
| WorldClim 2.1 | https://www.worldclim.org/data/worldclim21.html | 2026-03-09 | Free for research and related activities; keep citation and usage terms with ingestion docs | Raw zip archives are fetched locally into data_raw/worldclim/; derived country features are exported to parquet | src/geoluck/etl/fetch_worldclim.py, src/geoluck/features/build_climate_normals.py, data_intermediate/worldclim/ | active |
| WorldClim monthly weather | https://www.worldclim.org/data/monthlywth.html | 2026-03-08 | Free for research and related activities; confirm redistribution for derived summaries | Prefer derived country-year climate summaries | planned raster ETL under src/geoluck/etl/ | planned |
| CRU CY 4.09 Country Averages | https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.09/crucy.2503061057.v4.09/countries | 2026-03-09 | Confirm UEA terms before redistribution of derived tables | Raw country files are fetched locally into data_raw/cru_cy/; derived decade features are exported to parquet | src/geoluck/etl/fetch_cru_cy.py, src/geoluck/features/build_climate_variability.py, data_intermediate/cru_cy/ | active |
| FAO HWSD v2 | https://www.isric.org/sites/default/files/HWSD2.sqlite | 2026-03-10 | HWSD v2 metadata indicates CC BY-NC-SA terms; keep the non-commercial/share-alike restriction explicit until a newer license clarification is verified | Raw SQLite plus the linked HWSD2_RASTER.zip are fetched locally into data_raw/hwsd/; representative-point soil samples and compact country-level soil features may be redistributed as derived outputs with clear source citation and license review | src/geoluck/etl/fetch_hwsd.py, src/geoluck/features/build_hwsd_features.py, data_intermediate/hwsd/ | active |
| USGS Earthquake API | https://earthquake.usgs.gov/fdsnws/event/1/ | 2026-03-10 | USGS event-catalog API is public research data; retain the API citation and review redistribution expectations for republished event subsets | Raw paged CSV is fetched locally into data_raw/usgs_earthquakes/; normalized country-event joins and compact country-level hazard features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_usgs_earthquakes.py, src/geoluck/features/build_usgs_earthquake_features.py, data_intermediate/usgs_earthquakes/ | active |
| NOAA IBTrACS v04r01 | https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r01/access/csv/ibtracs.ALL.list.v04r01.csv | 2026-03-10 | NOAA/NCEI public archive; retain the IBTrACS citation and review redistribution expectations for republished track subsets | Raw CSV is fetched locally into data_raw/ibtracs/; normalized country land-track points and compact country-level cyclone-hazard features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_ibtracs.py, src/geoluck/features/build_ibtracs_features.py, data_intermediate/ibtracs/ | active |
| Marine Regions World EEZ v12 | https://www.marineregions.org/download_file.php?name=World_EEZ_v12_20231025.zip | 2026-03-10 | Marine Regions distributes the World EEZ download behind a click-through form; keep the attribution and reuse terms with the dataset provenance and review redistribution expectations for republished geometry | Raw ZIP is fetched locally into data_raw/eez/; normalized sovereign-claim joins and compact country-level EEZ summary features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_marine_regions_eez.py, src/geoluck/features/build_marine_regions_eez_features.py, data_intermediate/eez/ | active |
| NOAA ERDDAP monthly ocean NPP | https://erddap.marine.usf.edu/erddap/griddap/moda_npp_mo_glob.html | 2026-03-10 | ERDDAP dataset is publicly accessible with attribution/disclaimer expectations from the host page; keep the dataset page and time window in provenance | Raw point-query JSONL is fetched locally into data_raw/ocean_npp/; normalized monthly EEZ-claim point time series and compact country-level maritime-productivity features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_ocean_npp.py, src/geoluck/features/build_ocean_npp_features.py, data_intermediate/ocean_npp/ | active |
| Energy Institute Statistical Review all-data workbook | https://www.energyinst.org/statistical-review/resources-and-data-downloads | 2026-03-10 | Energy Institute workbook is publicly downloadable behind a browser/email-gated page; keep the workbook citation and reserve-methodology notes in provenance | Raw workbook is stored locally in data_raw/energy_institute/; normalized country-year proved-reserve rows and compact decade features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_energy_institute_reserves.py, src/geoluck/features/build_energy_institute_reserves_features.py, data_intermediate/energy_institute/ | active |
| FAO AQUASTAT dams workbooks | https://www.fao.org/aquastat/en/databases/ | 2026-03-09 | FAO terms apply; confirm extract-level redistribution | Raw regional workbooks are fetched locally into data_raw/aquastat/dams/; normalized dam inventories and country features are exported to parquet | src/geoluck/etl/fetch_aquastat_dams.py, src/geoluck/features/build_aquastat_dams_features.py, data_intermediate/aquastat/ | active |
| HydroATLAS / BasinATLAS | https://www.hydrosheds.org/hydroatlas | 2026-03-09 | HydroATLAS as a compendium is CC-BY 4.0; individual attribute columns may carry CC-BY 4.0 or ODbL 1.0 per the technical documentation and catalog sheets | Prefer exporting country-level derived aggregates only; keep the raw BasinATLAS download local because the full package is large, the official mirror may require browser-assisted download, and attribute-level citation requirements vary | src/geoluck/etl/fetch_hydroatlas.py, src/geoluck/features/build_hydroatlas_features.py; local raw archive under data_raw/hydroatlas/; CLI: fetch-hydroatlas, build-hydroatlas-features | active |
| EIA Company Level Imports | https://www.eia.gov/petroleum/imports/companylevel/ | 2026-03-10 | US EIA public workbook downloads; keep year-specific archive URLs in provenance and review redistribution expectations for derived country summaries | Raw XLSX workbooks are fetched locally into data_raw/eia_company_imports/; normalized country-year crude-import quality proxies and a decade-2020-only derived feature table may be redistributed as compact outputs with clear leakage notes | src/geoluck/etl/fetch_eia_company_imports.py, src/geoluck/features/build_eia_oil_quality_features.py, data_intermediate/eia_company_imports/ | active |
| Global Oil and Gas Extraction Tracker March 2026 | https://globalenergymonitor.org/projects/global-oil-gas-extraction-tracker/download-data/ | 2026-03-10 | GEM tracker download is CC BY 4.0 with click-through/manual access; retain the release month and citation in provenance | Raw manual-download workbook is kept locally in data_raw/goget/; normalized country-unit tables and compact country-level unit-share/gas-type features may be redistributed as derived outputs with source citation and scope notes | src/geoluck/etl/fetch_goget.py, src/geoluck/features/build_goget_features.py, data_intermediate/goget/ | active |
| Global Coal Mine Tracker May 2025 | https://globalenergymonitor.org/projects/global-coal-mine-tracker/download-data/ | 2026-03-10 | GEM tracker download is CC BY 4.0 with click-through/manual access; retain the release month and supplement citation in provenance | Raw manual-download workbooks are kept locally under data_raw/gcmt*/; normalized country-mine tables and compact country-level coal-rank, mine-type, and methane features may be redistributed as derived outputs with source citation and scope notes | src/geoluck/etl/fetch_gcmt.py, src/geoluck/features/build_gcmt_features.py, data_intermediate/gcmt/ | active |
| Global Energy Ownership Tracker February 2026 | https://globalenergymonitor.org/projects/global-energy-ownership-tracker/ | 2026-03-10 | GEM tracker download is CC BY 4.0 with click-through/manual access; retain the release month and citation in provenance | Raw manual-download workbook is kept locally under data_raw/geot*/; normalized parent-headquarters ownership rows and compact country-level ownership-structure/sector-footprint features may be redistributed as derived outputs with source citation and modern-ownership scope notes | src/geoluck/etl/fetch_geot.py, src/geoluck/features/build_geot_features.py, data_intermediate/geot/ | active |
| OPEC Annual Statistical Bulletin 2025 | https://www.opec.org/assets/assetdb/asb-2025.pdf | 2026-03-10 | OPEC official publication; keep source citation and review redistribution expectations for extracted country conversion-factor tables | Raw PDF is kept locally in data_raw/opec_asb/; extracted OPEC-member crude conversion factors plus implied density/API features may be redistributed as compact derived outputs with a note that sulfur was not machine-readable in this pass | src/geoluck/etl/fetch_opec_asb.py, src/geoluck/features/build_opec_asb_features.py, data_intermediate/opec_asb/ | active |
| Global Solar Atlas | https://globalsolaratlas.info/ | 2026-03-10 | World Bank Group / Solargis service; site indicates free public use and the backlog notes CC BY 4.0 for the broader atlas outputs, but retain the service citation and review exact redistribution terms for bulk-derived country tables | Raw country-point API responses are fetched locally into data_raw/global_solar_atlas/; normalized representative-point samples and compact country-level solar features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_global_solar_atlas.py, src/geoluck/features/build_global_solar_atlas_features.py, data_intermediate/global_solar_atlas/ | active |
| OpenEI country wind supply curves | https://data.openei.org/submissions/273 | 2026-03-10 | OpenEI-hosted workbook from NREL supply-curve modeling; retain the source page citation and review redistribution expectations for the country tables before publishing bulk extracts | Raw workbook is fetched locally into data_raw/openei_wind/; normalized country-scope onshore/offshore tables and compact country-level wind-potential features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_openei_wind.py, src/geoluck/features/build_openei_wind_features.py, data_intermediate/openei_wind/ | active |
| World Coal Quality Inventory | https://www.usgs.gov/data/world-coal-quality-inventory-version-10 | 2026-03-10 | USGS public research data; keep the direct workbook URL and note the sample-based nature of the inventory in provenance | Raw XLS workbook is fetched locally into data_raw/wocqi/; normalized country-sample coal chemistry rows and compact country-level coal-quality summaries may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_wocqi.py, src/geoluck/features/build_wocqi_features.py, data_intermediate/wocqi/ | active |
| Barro-Lee educational attainment | https://raw.githubusercontent.com/barrolee/BarroLeeDataSet/master/BLData/BL2013_MF1599_v2.2.dta | 2026-03-09 | Public academic mirror; cite the Barro-Lee release and review final redistribution expectations before publishing derived tables | Raw Stata file is fetched locally into data_raw/barro_lee/; derived schooling features may be redistributed as compact aggregates after citation review | src/geoluck/etl/fetch_barro_lee.py, src/geoluck/features/build_barro_lee_features.py, data_intermediate/barro_lee/ | active |
| La Porta legal origins | https://faculty.tuck.dartmouth.edu/images/uploads/faculty/rafael-laporta/EconomicCon_data.xls | 2026-03-09 | Author-posted academic workbook; cite the La Porta source paper and review redistribution expectations for derived extracts | Raw XLS is fetched locally into data_raw/laporta_legal_origins/; compact country-level legal-origin features may be redistributed as derived tables after citation review | src/geoluck/etl/fetch_laporta_legal_origins.py, src/geoluck/features/build_laporta_legal_origins_features.py, data_intermediate/laporta_legal_origins/ | active |
| Freedom House Freedom in the World | https://freedomhouse.org/sites/default/files/2025-10/All_data_FIW_2013-2025.xlsx | 2026-03-09 | Publicly downloadable Freedom House research data; cite the report and confirm redistribution expectations for derived extracts | Raw XLSX is fetched locally into data_raw/freedom_house/; normalized country-year and country-decade democracy/civil-liberties features may be redistributed as compact derived tables after citation review | src/geoluck/etl/fetch_freedom_house.py, src/geoluck/features/build_freedom_house_features.py, data_intermediate/freedom_house/ | active |
| Fragile States Index | https://fragilestatesindex.org/excel/ | 2026-03-10 | Public yearly workbooks from the Fragile States Index site; cite the report series and review redistribution expectations for derived extracts | Raw XLSX workbooks are fetched locally into data_raw/fsi/; normalized country-year fragility scores and compact decade features may be redistributed as derived outputs after citation review | src/geoluck/etl/fetch_fsi.py, src/geoluck/features/build_fsi_features.py, data_intermediate/fsi/ | active |
| V-Dem Core v15 Country-Year | https://www.v-dem.net/data/the-v-dem-dataset/ | 2026-03-10 | V-Dem public research data; cite V-Dem v15 and review redistribution expectations for derived country-year extracts | Raw CSV ZIP is fetched locally into data_raw/vdem/; normalized country-year democracy indicators and compact decade features may be redistributed as derived outputs after citation review | src/geoluck/etl/fetch_vdem.py, src/geoluck/features/build_vdem_features.py, data_intermediate/vdem/ | active |
| UCDP Organized Violence Country-Year 25.1 | https://ucdp.uu.se/downloads/ | 2026-03-10 | UCDP public research downloads; cite UCDP release 25.1 and review redistribution expectations for derived country-year extracts | Raw ZIP is fetched locally into data_raw/ucdp_conflict/; normalized country-year organized-violence panel and compact decade features may be redistributed as derived outputs after citation review | src/geoluck/etl/fetch_ucdp_conflict.py, src/geoluck/features/build_ucdp_conflict_features.py, data_intermediate/ucdp_conflict/ | active |
| Pew Research Center religious composition | https://www.pewresearch.org/wp-content/uploads/sites/20/2025/06/Religious-Composition-2010-2020-dataset.zip | 2026-03-09 | Publicly downloadable Pew research dataset; cite the report and review redistribution expectations for derived extracts | Raw ZIP is fetched locally into data_raw/pew_religion/; normalized country-decade religion-share and diversity tables may be redistributed as compact derived outputs after citation review | src/geoluck/etl/fetch_pew_religion.py, src/geoluck/features/build_pew_religion_features.py, data_intermediate/pew_religion/ | active |
| CEPII GeoDist | http://www.cepii.fr/distance/dist_cepii.zip | 2026-03-09 | Freely downloadable CEPII research data; cite CEPII and review redistribution expectations for derived extracts | Raw ZIP is fetched locally into data_raw/cepii/; normalized bilateral ties and compact country-level distance/colonial-history summaries may be redistributed as derived outputs after citation review | src/geoluck/etl/fetch_cepii_geodist.py, src/geoluck/features/build_cepii_geodist_features.py, data_intermediate/cepii/ | active |
| USGS MRDS | https://mrdata.usgs.gov/mrds/mrds-csv.zip | 2026-03-09 | USGS open/public-domain data; attribution requested | Raw ZIP is fetched locally into data_raw/mrds/; normalized site-level records and compact country-level deposit-presence summaries may be redistributed as derived outputs | src/geoluck/etl/fetch_mrds.py, src/geoluck/features/build_mrds_features.py, data_intermediate/mrds/ | active |
| Open database on global coal and metal mine production | https://zenodo.org/records/7369478 | 2026-03-10 | CC BY 4.0 per the Zenodo record and associated Nature Scientific Data descriptor; the maintained scripted fetch uses the authors' public Fineprint Global GitHub mirror for the workbook and price table when the direct Zenodo file endpoint is bot-protected | Raw workbook plus price CSV are fetched locally into data_raw/open_mine_production/; normalized country-year commodity rows and compact country-level mine-production/value proxy features may be redistributed as derived outputs with source citation and undercoverage notes | src/geoluck/etl/fetch_open_mine_production.py, src/geoluck/features/build_open_mine_production_features.py, data_intermediate/open_mine_production/ | active |
| Kiszewski malaria ecology index | https://www.dropbox.com/s/sj3c3kiqjvuxilc/ME.dta?dl=1 | 2026-03-09 | Author-hosted academic Stata file; cite Kiszewski et al. and review redistribution expectations for derived extracts | Raw .dta is fetched locally into data_raw/kiszewski/; normalized country-level malaria-ecology features may be redistributed as compact derived outputs after citation review | src/geoluck/etl/fetch_kiszewski.py, src/geoluck/features/build_kiszewski_features.py, data_intermediate/kiszewski/ | active |
| EM-DAT | https://doc.emdat.be/docs/data-accessibility/ | 2026-03-08 | Non-commercial and archive licensing varies by product; review before use | Access restrictions mean we may only export downstream aggregates | planned ETL under src/geoluck/etl/ | planned-review |
| SPEIbase | https://spei.csic.es/database.html | 2026-03-08 | Confirm redistribution terms before derived export | Suitable for drought severity aggregates | planned raster ETL under src/geoluck/etl/ | planned |
| Penn World Table 10.01 | https://www.rug.nl/ggdc/productivity/pwt/pwt-releases/pwt1001 | 2026-03-09 | PWT is openly distributed for research with citation; pin 10.01 specifically because later releases revise history | Raw workbook is fetched locally into data_raw/pwt/; normalized country-year human-capital and trade-share tables plus derived decade features may be redistributed as compact derived outputs after citation review | src/geoluck/etl/fetch_pwt.py, src/geoluck/features/build_pwt_features.py, data_intermediate/pwt/ | active |
| Polity 5 | http://www.systemicpeace.org/inscr/p5v2018.xls | 2026-03-11 | Public academic workbook distributed from Systemic Peace; keep the Polity 5 citation and note the manual browser download path because scripted requests currently return HTTP 406 | Raw XLS is kept locally in data_raw/polity/; normalized country-year regime-authority rows and compact trailing-decade governance features may be redistributed as derived outputs after citation review and with explicit notes about excluded dissolved states | src/geoluck/etl/fetch_polity.py, src/geoluck/features/build_polity_features.py, data_intermediate/polity/ | active |
| SWIID 9.91 summary CSV | https://fsolt.org/swiid/swiid_downloads/ | 2026-03-11 | SWIID is distributed for academic use with citation; keep the SWIID version note and source-page / Dataverse references in provenance | Raw summary CSV is fetched locally into data_raw/swiid/; normalized country-year inequality rows and merged country-decade outcome targets may be redistributed as compact derived outputs after citation review | src/geoluck/etl/fetch_swiid.py, src/geoluck/features/build_outcomes_panel.py, data_intermediate/swiid/ | active |
| World Bank Wealth Accounts produced capital per capita | https://api.worldbank.org/v2/indicator/NW.PCA.PC?format=json | 2026-03-11 | World Bank Wealth Accounts / Changing Wealth of Nations indicator; keep the indicator code NW.PCA.PC, source id 59, and CWON citation in provenance | Raw API payloads are fetched locally into data_raw/wealth_accounts/; normalized country-year produced-capital rows and merged country-decade wealth targets may be redistributed as compact derived outputs with source citation review | src/geoluck/etl/fetch_wealth_accounts.py, src/geoluck/features/build_outcomes_panel.py, data_intermediate/wealth_accounts/ | active |
| World Bank female labor-force participation rate (SL.TLF.CACT.FE.ZS) | https://api.worldbank.org/v2/indicator/SL.TLF.CACT.FE.ZS?format=json | 2026-03-13 | World Bank open data terms apply; keep the indicator code and source id in provenance | Raw API payloads are fetched locally into data_raw/female_lfpr/; normalized country-year female labor-force participation rows and merged country-decade outcome targets may be redistributed as compact derived outputs with source citation review | src/geoluck/etl/fetch_female_lfpr.py, src/geoluck/features/build_outcomes_panel.py, data_intermediate/female_lfpr/ | active |
| World Bank Women, Business and the Law (SG.LAW.INDX) | https://api.worldbank.org/v2/indicator/SG.LAW.INDX?format=json | 2026-03-13 | World Bank open data terms apply; keep the indicator code and source id in provenance | Raw API payloads are fetched locally into data_raw/women_business_law/; normalized country-year Women, Business and the Law rows and merged country-decade outcome targets may be redistributed as compact derived outputs with source citation review | src/geoluck/etl/fetch_women_business_law.py, src/geoluck/features/build_outcomes_panel.py, data_intermediate/women_business_law/ | active |
| UN World Population Prospects 2024 | https://population.un.org/wpp/downloads | 2026-03-09 | UN DESA public downloads; cite WPP 2024 and keep note that workbook schemas may change across releases | Raw WPP 2024 workbooks are fetched locally into data_raw/wpp/; normalized country-year demographic tables and compact decade features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_wpp.py, src/geoluck/features/build_wpp_features.py, data_intermediate/wpp/ | active |
| UNDP Gender Inequality Index 2025 | https://hdr.undp.org/sites/default/files/2025_HDR/HDR25_Statistical_Annex_GII_Table.xlsx | 2026-03-09 | UNDP HDR workbook is publicly downloadable; cite HDR 2025 and preserve the mixed component-year note in provenance | Raw workbook is fetched locally into data_raw/undp_gii/; normalized country-level GII and component indicators plus derived gender-gap features may be redistributed as compact derived outputs with source citation review | src/geoluck/etl/fetch_undp_gii.py, src/geoluck/features/build_undp_gii_features.py, data_intermediate/undp_gii/ | active |
| Glottolog CLDF languages | https://raw.githubusercontent.com/glottolog/glottolog-cldf/master/cldf/languages.csv | 2026-03-09 | Open scholarly dataset with attribution; keep note that the raw GitHub branch path is a moving snapshot unless pinned to a release | Raw CSV is fetched locally into data_raw/glottolog/; normalized country-language inventory and compact country-level language-count features may be redistributed as derived outputs with source citation review | src/geoluck/etl/fetch_glottolog.py, src/geoluck/features/build_glottolog_features.py, data_intermediate/glottolog/ | active |
| UNDP HDI |  |  |  |  |  | planned-review |
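
Because every dataset is meant to be recorded here before or when it is added, a small consistency check over the registry rows can catch incomplete entries early. The following is a minimal sketch, not part of the repository: `SourceEntry`, `validate_entry`, and `ALLOWED_STATUSES` are hypothetical names, the status set is only what appears in the rows above, and the rule that only `active` rows need all seven columns filled in is an assumed policy.

```python
from dataclasses import dataclass, fields
from datetime import date

# Status values observed in the registry rows; treating them as the
# complete allowed set is an assumption of this sketch.
ALLOWED_STATUSES = {"active", "planned", "planned-review"}


@dataclass(frozen=True)
class SourceEntry:
    """One registry row; field names mirror the seven columns (hypothetical type)."""
    source: str
    url: str = ""
    access_date: str = ""
    license_note: str = ""
    redistribution_note: str = ""
    local_path: str = ""
    status: str = ""


def validate_entry(entry: SourceEntry) -> list[str]:
    """Return human-readable problems; an empty list means the row passes."""
    problems = []
    if not entry.source.strip():
        problems.append("missing source name")
    if entry.status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {entry.status!r}")
    if entry.url and not entry.url.startswith(("http://", "https://")):
        problems.append(f"URL is not http(s): {entry.url!r}")
    if entry.access_date:
        try:
            date.fromisoformat(entry.access_date)
        except ValueError:
            problems.append(f"access date is not an ISO date: {entry.access_date!r}")
    # Assumed policy: only "active" rows must have every column filled in,
    # so placeholder rows such as a bare planned-review entry still pass.
    if entry.status == "active":
        for f in fields(entry):
            if not getattr(entry, f.name).strip():
                problems.append(f"active row is missing {f.name}")
    return problems


# Example rows: one complete entry copied from the registry, one stub,
# and one deliberately malformed entry.
maddison = SourceEntry(
    source="Maddison Project Database 2023",
    url="https://doi.org/10.34894/INZBF2",
    access_date="2026-03-08",
    license_note="CC-BY-4.0 per Dataverse metadata",
    redistribution_note=(
        "Raw Stata file is fetched locally into data_raw/; tidy outputs "
        "may be redistributed with attribution review"
    ),
    local_path="src/geoluck/etl/fetch_maddison.py, data_intermediate/maddison/",
    status="active",
)
stub = SourceEntry(source="UNDP HDI", status="planned-review")
broken = SourceEntry(source="Example", url="ftp://example.org/file",
                     access_date="03/08/2026", status="done")

for row in (maddison, stub, broken):
    print(row.source, "->", validate_entry(row) or "ok")
```

Returning a list of problems rather than raising on the first one lets a reviewer see everything wrong with a row in a single pass.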