docs/roadmap.qmd
Lines changed: 49 additions & 2 deletions
@@ -1,7 +1,31 @@
+---
+title: Roadmap
+---
+
 # Econometrics and Supervised Learning Roadmap

 This document collects proposed functionality expansions for `pyensmallen`, based on the existing notebooks and current API surface.

+## Current Status
+
+Implemented on `master`:
+
+- Estimator classes for `LinearRegression`, `LogisticRegression`, and `PoissonRegression`
+- fitted attributes including `coef_`, `intercept_`, covariance estimates, confidence intervals, and `summary()`
+- classical and robust sandwich standard errors for unregularized OLS, logit, and Poisson
+- exact L1 and L2 regularization for the core estimator classes via backend solver switching
+- Quarto docs, generated API reference, benchmark page, and executed notebook pages
+- macOS wheel repair for vendored BLAS linkage and post-patch ad-hoc codesigning
+
+Still outstanding from the original roadmap:
+
+- true separable-objective and mini-batch training support
+- productized JAX objective bridge
+- richer inference utilities beyond the current robust covariance path
+- workflow-level evaluation and model-selection helpers
+- formula and DataFrame ergonomics
+- additional estimator classes beyond the current linear / logit / Poisson set
+
 ## First Tranche

 The first set of items to prioritize:
@@ -10,12 +34,15 @@ The first set of items to prioritize:
 2. First-class regularization support
 3. Proper stochastic / mini-batch training support

-These are the highest-leverage additions for making `pyensmallen` useful beyond optimizer demos and low-level objective wrappers.
+The first two are now in place. The remaining item in this tranche is proper stochastic / mini-batch training support.

 ## Full Proposal List

 ### 1. Estimator classes for common supervised models

+Status:
+Partially complete. `LinearRegression`, `LogisticRegression`, and `PoissonRegression` now exist. Multinomial and other nonlinear estimators remain open.
+
 Add estimator APIs for standard econometrics and ML models:

 - `LinearRegression`
@@ -41,6 +68,9 @@ The current API is objective-first. Real workflows usually want model objects, n
 
 ### 2. First-class regularization support

+Status:
+Partially complete. Exact L1 and L2 support is implemented for the core estimator classes. Mixed elastic net, regularization paths, and CV selection remain open.
+
 Add penalized estimation support across core models:

 - L1
@@ -56,6 +86,9 @@ This is central to both supervised learning and modern econometrics, especially
 
 ### 3. Productized JAX bridge

+Status:
+Not started as library surface. The notebook pattern exists, but there is still no supported wrapper API.
+
 Turn the current notebook pattern into a supported API:

 - `JaxObjective`
@@ -74,6 +107,9 @@ The multinomial logit notebook already shows this is useful. It should be librar
 
 ### 4. Proper stochastic / mini-batch training support

+Status:
+Not started. This remains the next major ML-side gap.
+
 Expose true separable-objective support for first-order optimizers:

 - mini-batch iteration
@@ -94,6 +130,9 @@ The Adam-family bindings exist, but the current wrapper behaves like full-batch
 
 ### 5. Inference utilities beyond point estimation

+Status:
+Partially complete. Classical and robust sandwich covariance are available for unregularized OLS, logit, and Poisson. Clustered, HAC, tests, marginal effects, and bootstrap helpers remain open.
+
 Expand the econometrics side with reusable inference tools:

 - sandwich covariance
@@ -110,6 +149,9 @@ The package already goes in this direction for GMM. Extending it to MLE models w
 
 ### 6. Model selection and evaluation tools

+Status:
+Not started as library functionality.
+
 Add workflow-level evaluation and tuning utilities:

 - train / validation splitting
@@ -132,6 +174,9 @@ Several notebooks currently hand-roll comparison and tuning logic that should li
 
 ### 7. Higher-level causal and panel estimators

+Status:
+Still mostly out of scope for this repo; the sibling `synthlearners` repository remains the main home for panel estimators.
+
 Potential estimator layer additions include:

 - `SyntheticControl`
@@ -147,6 +192,9 @@ This is a natural applied econometrics extension, though a substantial part of t
 
 ### 8. Formula and DataFrame ergonomics

+Status:
+Not started.
+
 Improve usability for empirical workflows:

 - formula interface
@@ -177,4 +225,3 @@ Current working assumption:
 
 - `pyensmallen` should focus on optimization primitives, reusable objectives, supervised estimators, autodiff integration, and inference utilities.
 - `synthlearners` should remain the home for most panel and synthetic-control estimators, while depending on `pyensmallen` where useful.
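
For readers unfamiliar with the "classical and robust sandwich" covariance mentioned under item 5, here is a minimal NumPy sketch of the textbook OLS case. It does not use or describe the `pyensmallen` API: the function name `ols_with_robust_se` and the simulated data are assumptions made for illustration, and the robust estimator shown is the plain HC0 variant, which may differ from what the library implements.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """OLS coefficients with classical and HC0 sandwich standard errors.

    Hypothetical helper for illustration only; not part of pyensmallen.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Classical covariance: sigma^2 (X'X)^{-1}, assuming homoskedastic errors.
    sigma2 = resid @ resid / (n - k)
    cov_classical = sigma2 * XtX_inv

    # HC0 sandwich: (X'X)^{-1} [X' diag(e_i^2) X] (X'X)^{-1}.
    meat = (X * resid[:, None] ** 2).T @ X
    cov_robust = XtX_inv @ meat @ XtX_inv

    return beta, np.sqrt(np.diag(cov_classical)), np.sqrt(np.diag(cov_robust))


# Simulated data with heteroskedastic noise (assumed for the example).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))

beta, se_classical, se_robust = ols_with_robust_se(X, y)
print("coef:", beta)
print("classical se:", se_classical)
print("robust se:", se_robust)
```

When the error variance depends on the regressors, as in the simulated data here, the two sets of standard errors generally disagree, which is the motivation for exposing both.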
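
Item 4 asks for true separable-objective and mini-batch support. As a rough illustration of what "separable" means in practice, the sketch below writes a least-squares gradient that can be evaluated on any contiguous slice of observations and drives it with plain mini-batch SGD in NumPy. Everything here is an assumption for illustration: `batch_gradient`, `minibatch_sgd`, and the `(start, batch_size)` calling convention are hypothetical names and are not `pyensmallen` API, and the optimizer is plain SGD rather than an Adam-family method.

```python
import numpy as np


def batch_gradient(params, X, y, start, batch_size):
    """Gradient of mean squared error over rows [start, start + batch_size).

    Hypothetical separable-objective signature; not pyensmallen API.
    """
    Xb = X[start:start + batch_size]
    yb = y[start:start + batch_size]
    resid = Xb @ params - yb
    return 2.0 * Xb.T @ resid / len(yb)


def minibatch_sgd(X, y, batch_size=32, epochs=50, lr=0.05, seed=0):
    """Plain mini-batch SGD: reshuffle each epoch, then step batch by batch."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    params = np.zeros(k)
    for _ in range(epochs):
        perm = rng.permutation(n)
        Xs, ys = X[perm], y[perm]
        for start in range(0, n, batch_size):
            params -= lr * batch_gradient(params, Xs, ys, start, batch_size)
    return params


# Simulated linear-regression data (assumed for the example).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
true_params = np.array([0.5, -1.0, 2.0])
y = X @ true_params + 0.1 * rng.normal(size=1000)

est = minibatch_sgd(X, y)
print("estimated:", est)
print("true:     ", true_params)
```

The point of the design is that the optimizer only ever sees a slice of the data per step, in contrast to the full-batch behavior the roadmap attributes to the current wrapper.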