feat: Introduce functionality to read and write typed parquets #66
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff            @@
##               main       #66   +/-  ##
=========================================
  Coverage    100.00%   100.00%
=========================================
  Files            41        41
  Lines          2216      2255   +39
=========================================
+ Hits           2216      2255   +39

☔ View full report in Codecov by Sentry.
AndreasAlbertQC left a comment:
Super nice and clear, thanks @borchero!
delsner left a comment:
Nice!
Sorry, I didn't get to work on this again, but I still mean to add proper support for partitioned datasets, which also introduces some additional challenges at read time 👀 I'll mark this as draft for now.
Motivation
In #57, we added support for serializing schemas to JSON. This PR now adds utility functions to serialize the schema as parquet metadata and leverage that schema when reading files.
Changes
- Add `{read,write,scan,sink}_parquet` methods to the `Schema` class