Overview
- Once upon a time we accidentally deployed all of the pre-consolidation EPA CEMS parquet files to our S3 bucket, and there are more than 1000 of them.
- The paths to these objects now show up as possible parquet tables in the usage metrics, which clogs up the dashboard display/legend.
- These paths should never have appeared and, I think, in most cases were never downloaded, so we can remove them from the logging data during the ETL and have a cleaner, simpler output to work with.
- Currently these paths are being filtered out of the dashboard display because they have 0 downloads, but if we go back to showing all valid paths, they'll reappear.
- They do appear in the dropdown on the left-hand side of the User Metrics dashboard (which is why there are 1600+ tables).
- It's possible that there was an intentional deployment of a state-year partitioned version of the data at some point way back when, before we settled on the current output format, but there shouldn't be any under `nightly` or `stable` or any of the existing versioned releases.
- We should also keep an eye out for other accidentally deployed files. I think it was just the partitioned parquet files that were a problem, but it's possible there were others.
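
A rough sketch of what the ETL filter described above might look like. Everything specific here is an assumption for illustration: the `epacems-<year>-<state>.parquet` file naming, the `object_path` record key, and the function names are hypothetical and would need to be checked against the actual S3 object keys and log schema.

```python
import re

# ASSUMPTION: the accidentally deployed files are named like
# "epacems-2019-tx.parquet" (year, then two-letter state code).
# Replace this pattern with whatever the real object keys look like.
ZOMBIE_EPACEMS = re.compile(r"epacems-\d{4}-[a-z]{2}\.parquet$", re.IGNORECASE)


def is_zombie_path(path: str) -> bool:
    """Return True if the path looks like a partitioned EPA CEMS file."""
    return ZOMBIE_EPACEMS.search(path) is not None


def drop_zombie_paths(records: list[dict], key: str = "object_path") -> list[dict]:
    """Return only the log records whose path is not a zombie path.

    ASSUMPTION: each record is a dict holding the S3 object path under `key`.
    """
    return [record for record in records if not is_zombie_path(record[key])]


# Illustrative usage with made-up log records.
logs = [
    {"object_path": "nightly/epacems-2019-tx.parquet", "downloads": 0},
    {"object_path": "nightly/core_epacems__hourly_emissions.parquet", "downloads": 12},
]
kept = drop_zombie_paths(logs)
```

Filtering at this stage, rather than in the dashboard, would keep the paths out of the post-ETL data, the sidebar dropdowns, and the visualizations at once. The same predicate could also be reused to list any matching objects that turn up under versioned releases.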
Success Criteria
No more zombie EPA CEMS parquet paths show up where they shouldn't.
- post-ETL data
- the dropdowns in the sidebar
- the data visualizations
### Next steps
* [ ] ...