generated from openclimatefix/ocf-template
Description
- Maybe `NgedCkanClient` doesn't need to be a class?
- Replace `Sensor` with an `Asset` that checks the NGED CKAN API for new data and saves the JSON.
- Don't de-dupe all rows. Instead, crop the new dataframe to the start of the old dataframe, de-dupe the cropped new dataframe, then append it.
- PAUSE to see if we can learn more about NGED's new data structures planned for S3.
- Convert from "raw" Parquet to Delta Lake
- Set the job to run once a day using a `Schedule`.
- Implement a Dagster pipeline to download and merge substation locations, moving the logic out of `packages/dashboard/main.py`.
- Maybe the Dagster substation partition should be keyed on substation number?
- Parquet files should use Hive partitioning, partitioned by month and substation number.
- Use a separate function for checking data, wrapped with an asset check decorator?
- Configure where CSVs and Parquet files go. Perhaps using an `IOManager`? We'll want different paths in dev, prod, and local, and we want all the code in `nged-substation-forecast` to have access to these paths. Maybe just use `.env`? Maybe create a `.env.template`?
- Write unit tests that run Dagster.
- Retry any network GETs.
- Define a Dagster asset spec for NGED's API?
- Use dependency injection (using Resources?) to separate the CKAN client, so it's easy to mock in tests
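A minimal sketch of the "check CKAN for new data and save the JSON" step written as plain functions rather than an `NgedCkanClient` class, with the HTTP call injected so tests can pass a fake instead of touching the network. It assumes NGED's portal exposes the standard CKAN action API (`/api/3/action/package_show`) and uses the package's `metadata_modified` field to decide whether anything changed. `save_package_if_updated`, `fetch_json_over_http`, and the output filename scheme are hypothetical. In Dagster, the fetcher could be supplied as a resource and this function called from inside an `@asset`.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# A fetcher takes a URL and returns the parsed JSON body. Passing it in as an
# argument (rather than calling urlopen directly) lets tests inject a fake.
FetchJson = Callable[[str], Dict[str, Any]]


def fetch_json_over_http(url: str) -> Dict[str, Any]:
    """Default fetcher: a plain HTTP GET using only the standard library."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def save_package_if_updated(
    base_url: str,
    dataset_id: str,
    output_dir: Path,
    last_seen_modified: Optional[str],
    fetch: FetchJson = fetch_json_over_http,
) -> Optional[str]:
    """Ask CKAN's package_show action for a dataset; save its JSON if it changed.

    Returns the dataset's `metadata_modified` value when new JSON was saved,
    or None when nothing changed since `last_seen_modified`.
    """
    query = urlencode({"id": dataset_id})
    payload = fetch(f"{base_url}/api/3/action/package_show?{query}")
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")

    package = payload["result"]
    modified = package["metadata_modified"]
    if modified == last_seen_modified:
        return None  # Nothing new since the last run.

    output_dir.mkdir(parents=True, exist_ok=True)
    # Colons are not allowed in Windows filenames, so swap them out.
    out_path = output_dir / f"{dataset_id}_{modified.replace(':', '-')}.json"
    out_path.write_text(json.dumps(package, indent=2), encoding="utf-8")
    return modified
```

In a test, `fetch` can be a lambda returning a canned CKAN response, which covers the "easy to mock" goal without needing a class.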
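A sketch of the crop-then-de-dupe append in pandas. "Crop to the start of the old dataframe" is open to interpretation; this sketch assumes it means keeping only rows of the new dataframe whose timestamp is strictly after the latest timestamp already stored, so only that small slice gets de-duplicated. The `timestamp` column name and `append_new_rows` are hypothetical.

```python
import pandas as pd


def append_new_rows(
    old: pd.DataFrame, new: pd.DataFrame, time_col: str = "timestamp"
) -> pd.DataFrame:
    """Append rows from `new` that are newer than anything already in `old`.

    Assumption: "crop" means dropping every row of `new` whose timestamp is not
    strictly after the latest timestamp in `old`. Only the cropped slice is
    de-duplicated, so the cost does not grow with the size of the history.
    """
    if old.empty:
        return new.drop_duplicates().reset_index(drop=True)
    latest = old[time_col].max()
    cropped = new[new[time_col] > latest].drop_duplicates()
    return pd.concat([old, cropped], ignore_index=True)
```

One trade-off of this reading: any corrections published for timestamps that are already stored would be discarded by the crop.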
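Hive partitioning encodes each partition key as a `key=value` directory level. A minimal sketch of the layout for month and substation number, assuming partition columns named `month` and `substation_number` (both hypothetical) and the month formatted as `YYYY-MM`:

```python
from datetime import datetime
from pathlib import PurePosixPath


def hive_partition_dir(
    root: str, timestamp: datetime, substation_number: int
) -> PurePosixPath:
    """Return <root>/month=YYYY-MM/substation_number=N for one reading."""
    month = timestamp.strftime("%Y-%m")
    return PurePosixPath(root) / f"month={month}" / f"substation_number={substation_number}"
```

In practice the directories need not be built by hand: with the pyarrow engine, pandas' `DataFrame.to_parquet(root, partition_cols=["month", "substation_number"])` writes this structure directly, and Hive-aware readers can then skip partitions that a query filters out.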
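A sketch of keeping a data check as a plain function, so it can be unit-tested without Dagster and then wrapped by Dagster's `@asset_check` decorator, whose body would return an `AssetCheckResult` with `passed` taken from this function. The uniqueness rule on `(substation_number, timestamp)` is an assumed example of what to check, not a confirmed requirement.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# One reading is identified by (substation_number, timestamp).
ReadingKey = Tuple[int, datetime]


def check_no_duplicate_readings(keys: Iterable[ReadingKey]) -> Tuple[bool, str]:
    """Return (passed, description): each reading key must appear exactly once."""
    counts = Counter(keys)
    duplicated = [key for key, n in counts.items() if n > 1]
    if duplicated:
        return False, f"{len(duplicated)} duplicated (substation_number, timestamp) keys"
    return True, "no duplicated (substation_number, timestamp) keys"
```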
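One low-tech option for configuring paths: read them from environment variables, which a `.env` file (loaded with, for example, python-dotenv's `load_dotenv()`) could set differently in dev, prod, and local. The variable names `NGED_DATA_ROOT`, `NGED_CSV_DIR`, and `NGED_PARQUET_DIR` and the defaults are hypothetical; a `.env.template` would list them with placeholder values.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class DataPaths:
    """Where raw CSVs and Parquet files live in the current environment."""

    csv_dir: Path
    parquet_dir: Path


def load_data_paths(env: Optional[Mapping[str, str]] = None) -> DataPaths:
    """Resolve data paths from environment variables, with local defaults.

    All variable names here are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    root = Path(env.get("NGED_DATA_ROOT", "data"))
    return DataPaths(
        # An explicit per-directory variable wins; otherwise derive from root.
        csv_dir=Path(env.get("NGED_CSV_DIR", root / "csv")),
        parquet_dir=Path(env.get("NGED_PARQUET_DIR", root / "parquet")),
    )
```

Any module in `nged-substation-forecast` could then call `load_data_paths()` instead of hard-coding paths, and tests can pass an explicit mapping rather than mutating `os.environ`.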
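A standard-library-only sketch of retrying network GETs with exponential backoff. By default it retries on `OSError`, which covers `urllib.error.URLError` and socket timeouts. Note that `HTTPError` subclasses `URLError`, so HTTP 4xx responses would be retried too, which may not be wanted. The `retry` decorator, `http_get`, and the default values are hypothetical; a library such as tenacity offers the same idea with more options.

```python
import functools
import time
from typing import Callable, Tuple, Type, TypeVar
from urllib.request import urlopen

T = TypeVar("T")


def retry(
    attempts: int = 3,
    base_delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (OSError,),
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry the decorated call on `exceptions`, backing off exponentially."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(attempts - 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Sleep base_delay, 2 * base_delay, 4 * base_delay, ...
                    time.sleep(base_delay * 2**attempt)
            # Final attempt: let any exception propagate to the caller.
            return func(*args, **kwargs)

        return wrapper

    return decorator


@retry(attempts=5, base_delay=2.0)
def http_get(url: str) -> bytes:
    """Example usage: a GET that is retried on transient network errors."""
    with urlopen(url, timeout=30) as response:
        return response.read()
```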