generated from catalyst-cooperative/cheshire
Open
Description
Overview
I've settled on splink model parameters in a notebook for linking SEC to EIA. Now it's time to get this working in our pipeline and create an output table that's usable by the masses.
Success Criteria
How will we know that we're done?
- Record linkage of SEC to EIA runs in production
- Output table links SEC filers and their subsidiaries to EIA
- Validation metrics are logged in MLflow
### Next steps
- [x] Create a module that formats the SEC table into a denormalized, flattened output table
- [x] Create a preprocessing module for EIA
- [ ] Create a record linkage module that runs the splink notebook and integrate the notebook into Dagster
- [ ] Add a column that links the SEC table to EIA utilities
- [ ] Open a PR with improvements to the PUDL name cleaner
- [ ] Integrate MLflow to log model metrics
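
To make the "validation metrics" item concrete, here is a minimal sketch of one way link quality could be scored before logging. It assumes a small hand-labeled set of known SEC-to-EIA matches exists and that precision and recall are the metrics of interest; neither is confirmed in this issue. The function name `link_validation_metrics` and all IDs below are hypothetical.

```python
from typing import Iterable

# A link is a (SEC identifier, EIA identifier) pair. The ID formats are made up.
Pair = tuple[str, str]


def link_validation_metrics(
    predicted: Iterable[Pair], labeled: Iterable[Pair]
) -> dict[str, float]:
    """Score predicted links against hand-labeled true links (hypothetical helper)."""
    predicted_set = set(predicted)
    labeled_set = set(labeled)
    true_positives = len(predicted_set & labeled_set)
    # Guard against empty inputs so the ratios never divide by zero.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(labeled_set) if labeled_set else 0.0
    return {"precision": precision, "recall": recall}


# Illustrative data only, not real SEC or EIA identifiers.
predicted = [("sec_1", "eia_10"), ("sec_2", "eia_20"), ("sec_3", "eia_99")]
labeled = [("sec_1", "eia_10"), ("sec_2", "eia_20"), ("sec_4", "eia_40")]
metrics = link_validation_metrics(predicted, labeled)
```

A dictionary shaped like `metrics` could then be handed to `mlflow.log_metrics` inside an active run, which would cover the MLflow item above.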
Status
In progress