feat: add MEDS Parquet support and test #26
Merged
tompollard merged 2 commits into main on Nov 26, 2025
Collaborator
Author
FYI: the MIMIC-IV MEDS demo was taken from here: https://physionet.org/content/mimic-iv-demo-meds/0.0.1/
tompollard
reviewed
Nov 26, 2025
if not file_path.exists():
    raise FileNotFoundError(f"Parquet file not found: {file_path}")

try:
Member
I'd just move these to the top.
(If pyarrow is a dependency of the package it will be installed anyway. If it is not a dependency, we probably want it to be).
Collaborator
Author
good point, fixed!
tompollard
reviewed
Nov 26, 2025
"num_rows": num_rows,
"num_columns": len(columns),
"columns": columns,
"sample_data": [],  # Avoid reading data; schema-only for lean operation
Member
Might be a good idea to load sample_data here for consistency with other handlers?
Collaborator
Author
I need it in the other handlers, but it's not needed here, so I removed it for now. I may refactor in a separate PR and try to remove it from the other handlers as well, to reduce the memory load and make things more consistent.
tompollard
approved these changes
Nov 26, 2025
feat: Add MEDS Parquet support
Adds Parquet file handler to support MEDS (Medical Event Data Standard) datasets and other Parquet-based formats.
Changes
- ParquetHandler using pyarrow for schema-based type inference
- pyarrow dependency
Value
Enables automatic Croissant metadata generation from local Parquet datasets (e.g., MEDS event streams)
Spec Compliance
Closes #20