# geoluck <img src="logo/geoduck.png" alt="" width="36" />

**How much of relative country prosperity can be predicted from geography, natural endowments, resource development, and social structure, and who beats their geography?**

Geoluck is an open-source research project that builds a country-decade panel (1900–2020) and trains machine learning models to predict four prosperity outcomes from tiered feature sets. The results are published as an interactive static site.

This is explicitly about **predictive association**, not causal effect.

**[View the live site →](https://smkwray.github.io/geoluck/)**

---

## What the site shows

The static site models four outcome metrics, each converted to within-decade percentile ranks:

| Outcome | Definition | Source |
|---|---|---|
| **Income** | Log GDP per capita rank | Maddison Project Database 2023 |
| **Wealth** | Produced capital per capita rank | World Bank Changing Wealth of Nations |
| **Life expectancy** | Life expectancy at birth rank | World Bank WDI / UN Population Division |
| **Inequality** | Disposable-income Gini rank (higher = more equal) | SWIID |

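
As a rough illustration of what "within-decade percentile rank" can mean, the sketch below ranks made-up values against other countries in the same decade only. The function name, the sample rows, and the tie-handling convention (average rank divided by group size) are assumptions for illustration; the pipeline's actual formula is not specified here.

```python
from collections import defaultdict


def within_decade_percentile_ranks(rows):
    """Map (country, decade) to a percentile rank in (0, 1].

    rows: iterable of (country, decade, value) tuples.
    Assumed convention: average rank for ties, divided by the
    number of countries observed in that decade.
    """
    by_decade = defaultdict(list)
    for country, decade, value in rows:
        by_decade[decade].append((country, value))

    ranks = {}
    for decade, entries in by_decade.items():
        values = [v for _, v in entries]
        n = len(values)
        for country, value in entries:
            below = sum(1 for v in values if v < value)
            equal = sum(1 for v in values if v == value)
            avg_rank = below + (equal + 1) / 2  # tied values share one rank
            ranks[(country, decade)] = avg_rank / n
    return ranks


# Made-up values for four hypothetical countries in two decades.
rows = [
    ("A", 1950, 7.2), ("B", 1950, 8.9), ("C", 1950, 8.1), ("D", 1950, 9.4),
    ("A", 2000, 8.8), ("B", 2000, 9.1), ("C", 2000, 10.2), ("D", 2000, 9.1),
]
ranks = within_decade_percentile_ranks(rows)

# Country A's value rose between decades, but its rank within each decade
# stays at the bottom, which is what a relative measure captures.
print(ranks[("A", 1950)], ranks[("A", 2000)])
```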
Predictor features are organized into three independently toggleable tiers:

- **Nature**: Pure geography, including latitude, climate normals, terrain, soil, malaria ecology, seismic activity, wind/solar potential, ocean productivity, and cyclone exposure.
- **Infrastructure**: Resource development, including dams, irrigation, oil/gas/coal/mineral extraction, agricultural land use, and energy assets.
- **Society**: Social and institutional structure, including governance, democracy, trade openness, colonial history, ethnic/religious fractionalization, gender inequality, and demographics.

All seven non-empty tier combinations are modeled independently for each outcome (28 model bundles). The site supports interactive choropleth maps, model comparison, country-level SHAP feature contributions, country-vs-country comparison, feature exploration by data source, full sortable rankings with CSV export, and shareable deep links.

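
The count of 28 bundles follows from simple enumeration: three tiers give seven non-empty subsets, and each subset is paired with each of the four outcomes. The sketch below shows that arithmetic; the lowercase identifiers are hypothetical labels, not names taken from the pipeline.

```python
from itertools import combinations

# Hypothetical labels for the three tiers and four outcomes.
TIERS = ["nature", "infrastructure", "society"]
OUTCOMES = ["income", "wealth", "life_expectancy", "inequality"]

# Every non-empty subset of the tiers: sizes 1, 2, and 3.
tier_sets = [
    combo
    for size in range(1, len(TIERS) + 1)
    for combo in combinations(TIERS, size)
]

# One model bundle per (outcome, tier subset) pair.
bundles = [(outcome, tiers) for outcome in OUTCOMES for tiers in tier_sets]

print(len(tier_sets), len(bundles))  # prints "7 28"
```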
---

## Repository structure

```
src/               Python pipeline: ETL, feature building, modeling, export
web/               Static frontend: TypeScript, Vite, Leaflet, Chart.js
docs/              Methodology and payload documentation
web/public/data/   Precomputed JSON payloads consumed by the frontend
```

---

## Data policy

Raw and intermediate research data are **not** stored in the public repository. Only compact, precomputed JSON payloads required by the static site are committed under `web/public/data/`. These are generated by the Python pipeline's export commands.

---

## Modeling notes

- Models are evaluated **out of sample** using cross-validated R², RMSE, MAE, and Spearman rank correlation.
- User-facing predictions and residuals use **cross-validated exports**, not in-sample fits.
- Feature contributions use SHAP values from fold-trained estimators.
- Results should be interpreted as **predictive structure**, not causal effects. A high R² for Nature-only features means geography is a strong statistical predictor, likely because it correlates with deeper causal channels. It does not mean that geography *causes* prosperity.

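
To make "out of sample" concrete, here is a minimal standard-library sketch of out-of-fold prediction and the four metrics named above. Everything in it is an assumption for illustration: a one-feature least-squares line stands in for the real models, the data are synthetic, and the interleaved fold assignment is arbitrary (the project's actual fold design is described in its model documentation, not here).

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for the real models)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def out_of_fold_predictions(xs, ys, k=5):
    """Predict each row with a model trained only on the other folds."""
    preds = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):  # held-out rows of this fold
            preds[i] = a + b * xs[i]
    return preds


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def evaluate(ys, preds):
    """R², RMSE, MAE, and Spearman correlation on held-out predictions."""
    n = len(ys)
    mean_y = sum(ys) / n
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return {
        "r2": 1 - sse / sst,
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(y - p) for y, p in zip(ys, preds)) / n,
        # Spearman is the Pearson correlation of the rank vectors.
        "spearman": pearson(average_ranks(ys), average_ranks(preds)),
    }


# Synthetic data: a linear signal plus a small alternating disturbance.
xs = [i / 19 for i in range(20)]
ys = [0.2 + 0.6 * x + (0.05 if i % 2 else -0.05) for i, x in enumerate(xs)]

preds = out_of_fold_predictions(xs, ys, k=5)
metrics = evaluate(ys, preds)
# Residuals computed from held-out predictions, not in-sample fits.
residuals = [y - p for y, p in zip(ys, preds)]
print(metrics)
```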
---

## GitHub Pages deployment

The site is deployed through **GitHub Actions**, not "Deploy from a branch."

In repository Settings → Pages, set the source to **GitHub Actions**. The workflow builds the frontend from `web/` and publishes the contents of `web/dist/`.

---

## Local development

```bash
# Python pipeline
make sync       # Install/sync Python dependencies
make test       # Run tests

# Frontend
make web-build  # Build the static site (output: web/dist/)
```

The frontend expects JSON data under `web/public/data/`. These payloads are committed to the repository and are generated by:

```bash
uv run geoluck export-web-data
```

For frontend development with hot reload:

```bash
cd web && npm run dev
```

---

## Documentation

- [`DATA_SOURCES.md`](DATA_SOURCES.md): Source registry and licensing notes
- [`docs/MODEL_SPECS.md`](docs/MODEL_SPECS.md): Model families, feature-set variants, evaluation design
- [`docs/UI_DATA_PAYLOADS.md`](docs/UI_DATA_PAYLOADS.md): Frontend JSON payload schemas

---

## License

MIT