forked from USF-COT/glider_utils
Open
Description
There are several steps involved in migration to parquet for intermediate processing of slocum glider data.
- ensure dbdreader reproduces similar results to slocum binaries (Feature: optionally include first record in data payload · smerckel/dbdreader#18)
- replacement of convertDbds.sh with dbdreader/parquet
- desired storage pattern for parquet (just using tables for now)
Is there a particular storage pattern or design desired for the parquet data structures?
REF: https://arrow.apache.org/docs/python/parquet.html#parquet-file-writing-options
- Enforce version 2.4? 2.6?
- Ensure the structure is queryable by time for speedy subsetting?
- If enforcing 2.6, timestamp units become less of an issue (2.6 can store nanosecond timestamps, while 2.4 casts them to microseconds)
- Partitioning? Glider ID, Deployment ID, Process method (rt vs. delayed), QC'd (Level 0, 1, ...)
- Local file system, plug directly into duckdb (https://duckdb.org/docs/guides/python/filesystems.html), or allow both?
Have to nail down a potential dbdreader issue first.