apoorvalal
diff --git a/.gitignore b/.gitignore
Lines changed: 6 additions & 0 deletions
diff --git a/ECONOMETRICS_ML_ROADMAP.md b/ECONOMETRICS_ML_ROADMAP.md
Lines changed: 180 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 67 additions & 5 deletions
diff --git a/docs/.gitignore b/docs/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/docs/_quarto.yml b/docs/_quarto.yml
Lines changed: 86 additions & 0 deletions
diff --git a/docs/assets/benchmark_conv.png b/docs/assets/benchmark_conv.png
405 KB
diff --git a/docs/assets/benchmark_time.png b/docs/assets/benchmark_time.png
946 KB
@@ -166,3 +166,9 @@ cython_debug/
 .DS_Store
 .claude/settings.local.json
 tmp.md
+ensmallen/
+docs/_site/
+.quarto/
+synthlearners_repo/
+docs/reference/
+docs/objects.json
@@ -0,0 +1,180 @@
+# Econometrics and Supervised Learning Roadmap
+
+This document collects proposed functionality expansions for `pyensmallen`, based on the existing notebooks and current API surface.
+
+## First Tranche
+
+The first set of items to prioritize:
+
+1. Estimator classes for common supervised models
+2. First-class regularization support
+3. Proper stochastic / mini-batch training support
+
+These are the highest-leverage additions for making `pyensmallen` useful beyond optimizer demos and low-level objective wrappers.
+
+## Full Proposal List
+
+### 1. Estimator classes for common supervised models
+
+Add estimator APIs for standard econometrics and ML models:
+
+- `LinearRegression`
+- `LogisticRegression`
+- `PoissonRegression`
+- `MultinomialLogit`
+- `Probit`
+- `NegativeBinomial`
+- optionally `CoxPH`
+
+Each estimator should expose a workflow-level API:
+
+- `fit`
+- `predict`
+- `predict_proba` where applicable
+- `score`
+- fitted coefficients and intercept
+- convergence diagnostics
+- optional standard errors and summaries
+
+Rationale:
+The current API is objective-first. Real workflows usually want model objects, not raw closures.
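The snippet below is a minimal sketch of the workflow-level shape intended here: `fit`, `predict`, `score`, plus fitted `coef_` and `intercept_`. `SketchLinearRegression` and `full_rank_` are hypothetical names, not the package's API. The sketch also substitutes numpy's `lstsq` for an ensmallen optimizer, so its only "diagnostic" is a rank check rather than a true convergence report.

```python
import numpy as np


class SketchLinearRegression:
    """Hypothetical estimator illustrating the fit/predict/score workflow.

    Uses numpy least squares in place of an ensmallen optimizer.
    """

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def _design(self, X):
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            # prepend a column of ones for the intercept
            return np.column_stack([np.ones(X.shape[0]), X])
        return X

    def fit(self, X, y):
        Z = self._design(X)
        y = np.asarray(y, dtype=float)
        beta, _, rank, _ = np.linalg.lstsq(Z, y, rcond=None)
        self.intercept_ = beta[0] if self.fit_intercept else 0.0
        self.coef_ = beta[1:] if self.fit_intercept else beta
        # simple diagnostic: did the design matrix have full column rank?
        self.full_rank_ = bool(rank == Z.shape[1])
        return self

    def predict(self, X):
        return self.intercept_ + np.asarray(X, dtype=float) @ self.coef_

    def score(self, X, y):
        # coefficient of determination (R^2)
        y = np.asarray(y, dtype=float)
        resid = y - self.predict(X)
        return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
```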
+
+### 2. First-class regularization support
+
+Add penalized estimation support across core models:
+
+- L1
+- L2
+- elastic net
+- regularization paths
+- cross-validated penalty selection
+
+This should build on the constrained optimization already in the package: the Frank-Wolfe bindings support lp-ball (lasso, ridge, elastic-net) and simplex constraints.
+
+Rationale:
+This is central to both supervised learning and modern econometrics, especially in high-dimensional settings.
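The L1 term is not differentiable at zero, so a smooth optimizer such as L-BFGS is not well suited to minimizing the penalized form directly. The following is a minimal sketch of the penalized (rather than constrained) route. It assumes squared-error loss and centered data with no intercept, and it uses proximal gradient descent (ISTA) as a swapped-in solver, not anything ensmallen provides. `elastic_net_ista` and `elastic_net_path` are hypothetical names. The path function warm-starts each penalty from the previous, larger penalty's solution.

```python
import numpy as np


def soft_threshold(z, t):
    # proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def elastic_net_ista(X, y, lam, alpha=0.5, beta0=None, n_iter=2000, tol=1e-10):
    """Minimize (1/2n)||y - X b||^2 + lam * (alpha ||b||_1 + (1 - alpha)/2 ||b||^2)
    by proximal gradient descent (ISTA). No intercept: center X and y first."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    # Lipschitz constant of the gradient of the smooth part (loss + ridge term)
    L = np.linalg.norm(X, 2) ** 2 / n + lam * (1.0 - alpha)
    step = 1.0 / L
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta) / n + lam * (1.0 - alpha) * beta
        new = soft_threshold(beta - step * grad, step * lam * alpha)
        if np.max(np.abs(new - beta)) < tol:
            beta = new
            break
        beta = new
    return beta


def elastic_net_path(X, y, lams, alpha=0.5):
    # fit from the largest penalty down, warm-starting each fit
    betas, beta = [], None
    for lam in sorted(lams, reverse=True):
        beta = elastic_net_ista(X, y, lam, alpha, beta0=beta)
        betas.append(beta)
    return np.array(betas)
```

At `lam = max|X'y| / (n * alpha)` every coefficient is zero, which gives a natural starting point for a path. Cross-validated penalty selection would then score each path entry on held-out folds.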
+
+### 3. Productized JAX bridge
+
+Turn the current notebook pattern into a supported API:
+
+- `JaxObjective`
+- `AutoDiffObjective`
+- or `AutoDiffEstimator`
+
+The wrapper should accept a JAX loss function and automatically provide:
+
+- objective evaluation
+- gradients
+- shape handling
+- low-boilerplate integration with ensmallen optimizers
+
+Rationale:
+The multinomial logit notebook already shows this is useful. It should be library functionality, not notebook glue code.
+
+### 4. Proper stochastic / mini-batch training support
+
+Expose true separable-objective support for first-order optimizers:
+
+- mini-batch iteration
+- batch indexing
+- data shuffling
+- epoch-level callbacks
+- objective tracking
+- early stopping hooks
+
+This is especially important for:
+
+- large supervised-learning problems
+- neural-style differentiable objectives
+- scalable generalized linear models
+
+Rationale:
+The Adam-family bindings exist, but the current wrapper behaves like full-batch optimization. That limits the ML use case substantially.
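The sketch below shows a separable-objective training loop. It assumes plain mini-batch SGD rather than the Adam-family updates. `minibatch_indices` and `sgd_separable` are hypothetical helpers, and the `grad_fn(theta, idx)` / `callback(epoch, theta, loss)` signatures are illustrative only. The loop reshuffles the data every epoch, tracks the full-data objective once per epoch, and lets the callback stop training early.

```python
import numpy as np


def minibatch_indices(n, batch_size, rng):
    # shuffle once per epoch, then yield contiguous slices of the permutation
    perm = rng.permutation(n)
    for start in range(0, n, batch_size):
        yield perm[start:start + batch_size]


def sgd_separable(grad_fn, loss_fn, theta0, n, batch_size=32, lr=0.1,
                  max_epochs=100, callback=None, seed=0):
    """Plain mini-batch SGD on a separable objective f(theta) = mean_i f_i(theta).

    grad_fn(theta, idx) / loss_fn(theta, idx): gradient / loss averaged over rows idx.
    callback(epoch, theta, loss) may return True to stop early.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    history = []
    all_idx = np.arange(n)
    for epoch in range(max_epochs):
        for idx in minibatch_indices(n, batch_size, rng):
            theta -= lr * grad_fn(theta, idx)
        loss = loss_fn(theta, all_idx)  # full-data objective, tracked once per epoch
        history.append(loss)
        if callback is not None and callback(epoch, theta, loss):
            break
    return theta, history
```

For least squares, `grad_fn` could be `lambda th, idx: -X[idx].T @ (y[idx] - X[idx] @ th) / len(idx)`.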
+
+### 5. Inference utilities beyond point estimation
+
+Expand the econometrics side with reusable inference tools:
+
+- sandwich covariance
+- HC0-HC3 robust standard errors
+- clustered standard errors
+- HAC / Newey-West
+- Wald, likelihood-ratio, and score tests
+- delta method
+- marginal effects
+- bootstrap helpers for MLE models
+
+Rationale:
+The package already provides this kind of inference for GMM. Extending it to MLE models would make it much more useful for empirical work.
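For OLS, the HC0-HC3 estimators share one sandwich form, (X'X)^-1 X' diag(w) X (X'X)^-1. They differ only in how each squared residual is rescaled: HC1 by n/(n-k), HC2 by 1/(1-h_ii), and HC3 by 1/(1-h_ii)^2, where h_ii is the leverage. The sketch below covers OLS only; `ols_robust_se` is a hypothetical name. For an MLE model, the bread would become the inverse Hessian of the log-likelihood and the meat the sum of outer products of per-observation scores.

```python
import numpy as np


def ols_robust_se(X, y, kind="HC1"):
    """OLS coefficients and heteroskedasticity-robust (sandwich) standard errors.

    X should already include an intercept column if one is wanted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    # leverages h_ii: diagonal of the hat matrix X (X'X)^{-1} X'
    h = np.einsum("ij,jk,ik->i", X, bread, X)
    u2 = resid ** 2
    if kind == "HC0":
        w = u2
    elif kind == "HC1":
        w = u2 * n / (n - k)
    elif kind == "HC2":
        w = u2 / (1.0 - h)
    elif kind == "HC3":
        w = u2 / (1.0 - h) ** 2
    else:
        raise ValueError(f"unknown kind: {kind}")
    meat = (X * w[:, None]).T @ X
    vcov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))
```

As a sanity check, in an intercept-only model HC1 (and HC2) reduce to the familiar standard error of the mean, `np.std(y, ddof=1) / sqrt(n)`.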
+
+### 6. Model selection and evaluation tools
+
+Add workflow-level evaluation and tuning utilities:
+
+- train / validation splitting
+- K-fold cross-validation
+- time-series cross-validation
+- standard supervised metrics
+- calibration diagnostics
+- hyperparameter search
+- early stopping support
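The sketch below covers two splitting schemes; both helper names are hypothetical. Shuffled K-fold places each observation in exactly one test fold. Expanding-window time-series splits place every test block strictly after its training window, so no fold looks ahead. In this sketch, observations left over at the end of the series are simply never used as test data.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    # shuffle once, then split into k nearly equal folds
    perm = np.random.default_rng(seed).permutation(n)
    for test in np.array_split(perm, k):
        train = np.setdiff1d(perm, test)
        yield train, test


def time_series_splits(n, n_splits, min_train):
    # expanding window: train on [0, t), test on the next block of equal size
    fold = (n - min_train) // n_splits
    for s in range(n_splits):
        end_train = min_train + s * fold
        yield np.arange(end_train), np.arange(end_train, end_train + fold)
```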
+
+Metrics should include at least:
+
+- RMSE
+- MAE
+- log loss
+- AUC
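Here are dependency-free sketches of the four metrics; the function names are illustrative. The AUC uses its rank interpretation: the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half. The pairwise comparison is quadratic in memory, so a production version would use a sort-based computation instead.

```python
import numpy as np


def rmse(y, yhat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))


def mae(y, yhat):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))


def log_loss(y, p, eps=1e-15):
    # clip probabilities so log(0) never occurs
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def auc(y, score):
    # P(score of random positive > score of random negative), ties count 1/2
    y = np.asarray(y, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y], score[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))
```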
+
+Rationale:
+Several notebooks currently hand-roll comparison and tuning logic that should live in the library.
+
+### 7. Higher-level causal and panel estimators
+
+Potential estimator layer additions include:
+
+- `SyntheticControl`
+- balancing weights estimators
+- ridge-augmented synthetic control
+- matrix-completion synthetic control
+- DiD and event-study estimators
+- IV / 2SLS / LIML
+- doubly robust or orthogonal-score estimators
+
+Rationale:
+This is a natural applied econometrics extension, though a substantial part of this already exists in the sibling `synthlearners` repository.
+
+### 8. Formula and DataFrame ergonomics
+
+Improve usability for empirical workflows:
+
+- formula interface
+- automatic intercept handling
+- categorical encoding
+- missing-data policy
+- sample weights
+- grouped / clustered identifiers
+- pandas-friendly summaries
+
+Rationale:
+Econometrics users often work from tabular data first, not prebuilt dense matrices.
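The following minimal numpy sketch covers two of these items: automatic intercept handling and categorical encoding. It assumes dummy coding with the first sorted level as the reference category, which keeps the design full rank alongside the intercept. `build_design` and its arguments are hypothetical. A real formula interface would more likely build on an existing library such as `formulaic` or `patsy`.

```python
import numpy as np


def build_design(numeric, categorical=None, intercept=True):
    """Assemble a dense design matrix and column names.

    numeric: non-empty dict of name -> 1-D values.
    categorical: optional (name, values) pair, dummy-coded with the first
    sorted level dropped as the reference category.
    """
    cols, names = [], []
    n = len(next(iter(numeric.values())))
    if intercept:
        cols.append(np.ones(n))
        names.append("Intercept")
    for name, values in numeric.items():
        cols.append(np.asarray(values, dtype=float))
        names.append(name)
    if categorical is not None:
        name, values = categorical
        levels, codes = np.unique(np.asarray(values), return_inverse=True)
        # one indicator column per non-reference level
        for j, level in enumerate(levels[1:], start=1):
            cols.append((codes == j).astype(float))
            names.append(f"{name}[{level}]")
    return np.column_stack(cols), names
```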
+
+## Suggested Implementation Order
+
+1. Estimator classes for core GLMs
+2. Regularization support
+3. True separable-objective and mini-batch support
+4. Inference utilities for MLE models
+5. Productized JAX autodiff bridge
+6. Evaluation and model-selection utilities
+7. Selective integration points with `synthlearners`
+8. Additional causal and panel estimators only where they belong in this repo
+
+## Repo Boundary
+
+Current working assumption:
+
+- `pyensmallen` should focus on optimization primitives, reusable objectives, supervised estimators, autodiff integration, and inference utilities.
+- `synthlearners` should remain the home for most panel and synthetic-control estimators, while depending on `pyensmallen` where useful.
+
@@ -8,6 +8,7 @@ Lightweight python bindings for `ensmallen` library. Currently supports
   - constraints are either lp-ball (lasso, ridge, elastic-net) or simplex
 + (Generalized) Method of Moments estimation with ensmallen optimizers.
   - this uses ensmallen for optimization [and relies on `jax` for automatic differentiation to get gradients and jacobians]. This is the main use case for `pyensmallen` and is the reason for the bindings.
++ Estimator classes for linear, logistic, and Poisson regression with classical and robust inference for unregularized fits
 
 See [ensmallen docs](https://ensmallen.org/docs.html) for details. The `notebooks/` directory walks through several statistical examples.
 
@@ -25,15 +26,76 @@ Then,
 __from pypi__
 
 ```
-pip install pyensmallen
+uv pip install pyensmallen
 ```
 
 __from source__
 1. Install `armadillo` and `ensmallen` for your system (build from source, or via conda-forge; I went with the latter)
 2. git clone this repository
-3. `pip install -e .`
-4. Profit? Or at least minimize loss?
+3. If you are using `uv`:
+   - `uv pip install --python .venv/bin/python meson meson-python ninja pybind11`
+   - `uv pip install --python .venv/bin/python --no-build-isolation -e .`
+4. If you are using vanilla `pip` in an activated environment:
+   - `python -m pip install meson meson-python ninja pybind11`
+   - `python -m pip install --no-build-isolation -e .`
+5. Profit? Or at least minimize loss?
+
+__full development environment__
+
+To install everything required to run tests and notebooks:
+
+```bash
+uv pip install --python .venv/bin/python meson meson-python ninja pybind11
+uv pip install --python .venv/bin/python --no-build-isolation -e ".[full]"
+```
+
+Vanilla `pip` equivalent:
+
+```bash
+python -m pip install meson meson-python ninja pybind11
+python -m pip install --no-build-isolation -e ".[full]"
+```
+
+The `full` extra includes the Python dependencies used by:
+
+- the test suite
+- GMM and autodiff examples
+- benchmark notebooks
+- plotting and notebook tooling
+
+__documentation__
+
+### doc-generation
+
+The repository includes a Quarto documentation site in `docs/`. The docs are built from three sources:
+
+- hand-written Quarto pages in `docs/*.qmd`
+- generated API reference pages in `docs/reference/*.qmd`, built from Python and pybind11 docstrings with `quartodoc`
+- executed notebook pages in `docs/notebooks/*.ipynb`
+
+Use the render script instead of calling `quarto render` directly:
+
+```bash
+scripts/render_docs.sh
+```
+
+The script does the following:
+
+- uses the repository `.venv` as the Quarto Python runtime
+- forces JAX onto CPU so notebook execution is stable during docs builds
+- copies the tracked notebooks from `notebooks/` into `docs/notebooks/`
+- runs `quartodoc` to regenerate the API reference pages from docstrings
+- runs `quarto render docs` to execute the notebooks and build the site
+
+If you need the full docs toolchain first:
+
+```bash
+uv pip install --python .venv/bin/python meson meson-python ninja pybind11
+uv pip install --python .venv/bin/python --no-build-isolation -e ".[full]"
+```
+
+The rendered site lands in `docs/_site/`. The generated API source pages land in `docs/reference/`.
 
 __from wheel__
-- download the appropriate `.whl` for your system from the more recent release listed in `Releases` and run `pip install ./pyensmallen...` OR
-- copy the download url and run `pip install https://github.com/apoorvalal/pyensmallen/releases/download/<version>/pyensmallen-<version>-<pyversion>-linux_x86_64.whl`
+- download the appropriate `.whl` for your system from the most recent release listed under `Releases` and run `uv pip install ./pyensmallen...` OR
+- copy the download URL and run `uv pip install https://github.com/apoorvalal/pyensmallen/releases/download/<version>/pyensmallen-<version>-<pyversion>-linux_x86_64.whl`
@@ -0,0 +1 @@
+/.quarto/
@@ -0,0 +1,86 @@
+project:
+  type: website
+  output-dir: _site
+  render:
+    - index.qmd
+    - benchmarks.qmd
+    - optimizers.qmd
+    - estimators.qmd
+    - notebooks.qmd
+    - reference/*.qmd
+    - notebooks/example.ipynb
+    - notebooks/banana.ipynb
+    - notebooks/gmm.ipynb
+    - notebooks/autodiff_mnl.ipynb
+    - notebooks/regularization_comparison.ipynb
+
+metadata-files:
+  - reference/_sidebar.yml
+
+website:
+  title: pyensmallen
+  navbar:
+    left:
+      - href: index.qmd
+        text: Home
+      - href: optimizers.qmd
+        text: Optimizers
+      - href: benchmarks.qmd
+        text: Benchmarks
+      - href: estimators.qmd
+        text: Estimators
+      - href: reference/index.qmd
+        text: API
+      - href: notebooks.qmd
+        text: Notebooks
+  page-footer:
+    left: "pyensmallen documentation"
+
+jupyter: python3
+
+execute:
+  enabled: true
+  warning: false
+  error: false
+
+format:
+  html:
+    theme: cosmo
+    css:
+      - styles.css
+      - reference/_styles-quartodoc.css
+    toc: true
+
+quartodoc:
+  package: pyensmallen
+  dir: reference
+  title: API Reference
+  sidebar: reference/_sidebar.yml
+  css: reference/_styles-quartodoc.css
+  parser: numpy
+  dynamic: true
+  sections:
+    - title: Estimators
+      desc: Supervised estimator classes with inference helpers.
+      contents:
+        - LinearRegression
+        - LogisticRegression
+        - PoissonRegression
+    - title: Optimizers
+      desc: Low-level optimizer bindings exposed from ensmallen.
+      contents:
+        - L_BFGS
+        - FrankWolfe
+        - SimplexFrankWolfe
+        - Adam
+        - AdaMax
+        - AMSGrad
+        - OptimisticAdam
+        - Nadam
+    - title: Objectives and GMM
+      desc: Low-level objectives and the GMM estimator interface.
+      contents:
+        - linear_obj
+        - logistic_obj
+        - poisson_obj
+        - EnsmallenEstimator