Integrate record linkage into production #119

@katie-lamb

Overview

I've settled on splink model parameters in a notebook for linking SEC to EIA. Now it's time to get this working in our pipeline and create an output table that's usable by the masses.

Success Criteria

How will we know that we're done?

  • Record linkage of SEC to EIA runs in production
  • An output table links SEC filers and their subsidiaries to EIA utilities
  • Validation metrics are logged in MLflow

### Next steps
- [x] Create a module that formats the SEC table into a denormalized, flattened output table
- [x] Create a preprocessing module for EIA
- [ ] Create a record linkage module that runs the splink notebook and integrate the notebook into Dagster
- [ ] Add a column that links the SEC table to EIA utilities
- [ ] Open a PR with improvements to the PUDL name cleaner
- [ ] Integrate MLflow to log model metrics
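
For the validation metrics, here is a minimal sketch of what could be computed before logging, assuming predicted links and a hand-labeled set are both available as sets of ID pairs. The function name `link_validation_metrics` and the ID values are hypothetical and not taken from the notebook; the choice of precision, recall, and F1 is also an assumption.

```python
from typing import Dict, Set, Tuple

# A link is a (SEC filer ID, EIA utility ID) pair. Both ID formats are hypothetical.
Link = Tuple[str, str]


def link_validation_metrics(predicted: Set[Link], labeled: Set[Link]) -> Dict[str, float]:
    """Compare predicted links against a labeled set of known-correct links."""
    true_positives = len(predicted & labeled)
    # Guard against empty inputs so the metrics are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up example pairs, for illustration only.
    predicted = {("sec_1", "eia_10"), ("sec_2", "eia_20"), ("sec_3", "eia_99")}
    labeled = {("sec_1", "eia_10"), ("sec_2", "eia_20"), ("sec_4", "eia_40")}
    print(link_validation_metrics(predicted, labeled))
```

The returned dict has the shape that `mlflow.log_metrics` accepts, so inside an active MLflow run it could be passed along directly.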

Status

In progress
