
Ingest NGED primary live data using Dagster #12

@JackKelly

Description

  • Maybe NgedCkanClient doesn't need to be a class?
  • Replace the Sensor with an Asset that checks the NGED CKAN API for new data and saves the JSON response (see the CKAN asset sketch below).
  • Don't de-dupe all rows. Instead, crop the new dataframe to the start of the old dataframe, de-dupe the cropped new dataframe, then append (see the merge sketch below).
  • PAUSE to see if we can learn more about NGED's planned new data structures for S3.
  • Convert from "raw" Parquet to Delta Lake (see the Delta Lake sketch below).
  • Set the job to run once a day by defining a Schedule (see the schedule sketch below).
  • Implement a Dagster pipeline to download and merge substation locations. Move the logic out of packages/dashboard/main.py.
  • Maybe the Dagster substation partition should be keyed on substation number? (See the partitioning sketch below.)
  • Parquet files should use Hive partitioning and be partitioned by month and substation number (also covered in the partitioning sketch below).
  • Use a separate function for checking data, with an asset-check decorator? (See the asset-check sketch below.)
  • Configure where CSVs and Parquet files go, perhaps using an IOManager. We'll want different paths in dev, prod, and local, and all the code in nged-substation-forecast needs access to these paths. Maybe just use .env? Maybe create a .env.template? (See the config sketch below.)
  • Write unit tests that run Dagster (see the test sketch at the end).
  • Retry any network GETs (see the retry sketch below).
  • Define a Dagster asset spec for NGED's API??
  • Use dependency injection (using Resources?) to separate the CKAN client, so it's easy to mock in tests (see the resource sketch below).
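
CKAN asset sketch. A minimal version of the Asset that polls CKAN and saves the JSON; the base URL and dataset id are placeholders, and the standard CKAN `package_show` endpoint is assumed:

```python
import json
from pathlib import Path

import requests
from dagster import AssetExecutionContext, asset

# Placeholder values: the real CKAN base URL and dataset id need checking.
CKAN_BASE_URL = "https://connecteddata.nationalgrid.co.uk"
DATASET_ID = "example-primary-live-dataset"


@asset
def nged_ckan_metadata(context: AssetExecutionContext) -> dict:
    """Poll the CKAN package_show endpoint and save the raw JSON response."""
    response = requests.get(
        f"{CKAN_BASE_URL}/api/3/action/package_show",
        params={"id": DATASET_ID},
        timeout=30,
    )
    response.raise_for_status()
    metadata = response.json()

    out_path = Path("data/raw") / f"{DATASET_ID}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(metadata, indent=2))

    context.log.info(f"Saved CKAN metadata to {out_path}")
    return metadata
```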
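
Merge sketch. One reading of the crop-and-append task, assuming both dataframes share a sorted DatetimeIndex; the exact crop boundary still needs deciding:

```python
import pandas as pd


def append_new_rows(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Append only genuinely new rows, without de-duping the whole history.

    Crop `new` to rows beyond the data we already hold, de-dupe just that
    cropped frame, then append it to `old`.
    """
    if old.empty:
        return new.sort_index()
    cropped = new[new.index > old.index.max()]
    cropped = cropped[~cropped.index.duplicated(keep="first")]
    return pd.concat([old, cropped])
```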
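
Delta Lake sketch, assuming the `deltalake` package (delta-rs) is the conversion route:

```python
import pandas as pd
from deltalake import write_deltalake


def raw_parquet_to_delta(parquet_path: str, delta_path: str) -> None:
    """Append the rows of a "raw" Parquet file to a Delta Lake table."""
    df = pd.read_parquet(parquet_path)
    # mode="append" adds rows to an existing table, and creates the table
    # on the first write.
    write_deltalake(delta_path, df, mode="append")
```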
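
Schedule sketch. The 06:00 run time and the job/asset names are placeholders:

```python
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job


@asset
def nged_primary_readings() -> None:
    """Stand-in for the real ingest asset."""


daily_ingest_job = define_asset_job("daily_ingest_job", selection="*")

# Cron: run once a day at 06:00 (UTC by default); the time is a guess.
daily_schedule = ScheduleDefinition(job=daily_ingest_job, cron_schedule="0 6 * * *")

defs = Definitions(
    assets=[nged_primary_readings],
    jobs=[daily_ingest_job],
    schedules=[daily_schedule],
)
```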
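
Partitioning sketch, covering both partition bullets: a Dagster partition per substation number, and Hive-partitioned Parquet by month and substation number. The substation numbers and the loader are placeholders:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from dagster import AssetExecutionContext, StaticPartitionsDefinition, asset

# Placeholder substation numbers; the real list would come from the
# substation-locations data (or a DynamicPartitionsDefinition).
substation_partitions = StaticPartitionsDefinition(["123456", "234567"])


def load_substation_readings(substation: str) -> pd.DataFrame:
    """Placeholder loader; the real one would read NGED's raw data."""
    index = pd.date_range("2024-01-01", periods=3, freq="30min")
    return pd.DataFrame({"load_mw": [1.0, 2.0, 3.0]}, index=index)


@asset(partitions_def=substation_partitions)
def substation_parquet(context: AssetExecutionContext) -> None:
    substation = context.partition_key
    df = load_substation_readings(substation)
    df["month"] = df.index.to_period("M").astype(str)
    df["substation"] = substation
    # partition_cols yields Hive-style key=value directories, e.g.
    # data/parquet/month=2024-01/substation=123456/<file>.parquet
    pq.write_to_dataset(
        pa.Table.from_pandas(df),
        root_path="data/parquet",
        partition_cols=["month", "substation"],
    )
```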
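
Asset-check sketch, with the check living in its own function under the `@asset_check` decorator. The duplicate-timestamp check is just an example of what we might verify:

```python
import pandas as pd
from dagster import AssetCheckResult, asset, asset_check


@asset
def primary_readings() -> pd.DataFrame:
    """Stand-in asset body for illustration."""
    index = pd.date_range("2024-01-01", periods=3, freq="30min")
    return pd.DataFrame({"load_mw": [1.0, 2.0, 3.0]}, index=index)


@asset_check(asset=primary_readings)
def no_duplicate_timestamps(primary_readings: pd.DataFrame) -> AssetCheckResult:
    """Separate checking function, attached via the asset-check decorator."""
    n_dupes = int(primary_readings.index.duplicated().sum())
    return AssetCheckResult(
        passed=(n_dupes == 0),
        metadata={"n_duplicate_timestamps": n_dupes},
    )
```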
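
Config sketch using a `.env` file via python-dotenv; the variable names and default paths are made up. `dagster dev` also loads a local `.env` automatically, and an IOManager reading these variables could replace the module-level constants:

```python
import os
from pathlib import Path

from dotenv import load_dotenv  # pip install python-dotenv

# .env (and a .env.template checked into the repo) might contain, e.g.:
#   RAW_DATA_DIR=/data/nged/raw
#   PARQUET_DIR=/data/nged/parquet
load_dotenv()

RAW_DATA_DIR = Path(os.environ.get("RAW_DATA_DIR", "data/raw"))
PARQUET_DIR = Path(os.environ.get("PARQUET_DIR", "data/parquet"))
```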
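
Retry sketch using tenacity for per-request backoff. Dagster's own `RetryPolicy` (e.g. `@asset(retry_policy=RetryPolicy(max_retries=3))`) retries the whole asset, whereas this retries individual GETs:

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, max=60),
    reraise=True,
)
def get_with_retries(url: str, **kwargs) -> requests.Response:
    """GET with exponential backoff; re-raises after the final attempt."""
    response = requests.get(url, timeout=30, **kwargs)
    response.raise_for_status()
    return response
```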
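
Resource sketch: the CKAN client as a `ConfigurableResource`, so tests can inject a stub. The base URL default and the method shape are assumptions:

```python
import requests
from dagster import ConfigurableResource, asset


class NgedCkanResource(ConfigurableResource):
    """CKAN client as a Dagster resource, so tests can swap in a stub."""

    base_url: str = "https://connecteddata.nationalgrid.co.uk"  # assumed URL

    def package_show(self, dataset_id: str) -> dict:
        response = requests.get(
            f"{self.base_url}/api/3/action/package_show",
            params={"id": dataset_id},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()


@asset
def ckan_metadata(ckan: NgedCkanResource) -> dict:
    return ckan.package_show("example-primary-live-dataset")  # placeholder id
```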
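
Test sketch, reusing `NgedCkanResource` and `ckan_metadata` from the resource sketch above: `materialize()` runs the asset in-process with a fake client, so no network access is needed:

```python
from dagster import materialize


class FakeCkanResource(NgedCkanResource):
    """Stub that returns canned metadata instead of hitting the network."""

    def package_show(self, dataset_id: str) -> dict:
        return {"result": {"resources": []}}


def test_ckan_metadata_asset() -> None:
    result = materialize([ckan_metadata], resources={"ckan": FakeCkanResource()})
    assert result.success
```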
