Edawson/scdl schema#1030

Merged
polinabinder1 merged 44 commits into main from edawson/scdl-schema on Aug 20, 2025

@edawson (Collaborator) commented Aug 8, 2025

Description

This MR implements a strict, schema-defined header for SCDL archives. The header stores metadata about the archive and its composite arrays, including a version, the array lengths and data types, and information about the RowFeatureIndexes. It adds the features needed to fix #999 and to implement simple bit-packing of the rowptr, colptr, and data arrays. It should also make SCDL more secure, enable strict compatibility checking, and open the door to further performance improvements.
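To illustrate the general idea of a versioned binary header that records array names, data types, and lengths, here is a minimal sketch. It is not the SCDL format: the magic bytes, field widths, dtype codes, and the names `Header`, `ArrayInfo`, `pack_header`, and `unpack_header` are all assumptions made for this example. The actual layout is described in the schema directory in SCDL's source.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

# Everything below is hypothetical and only illustrates the technique.
MAGIC = b"SCDL"                        # assumed magic bytes
_PREFIX = struct.Struct("<4sHHHI")     # magic, major, minor, patch, array count
_ARRAY = struct.Struct("<16sBQ")       # NUL-padded name, dtype code, length
DTYPE_CODES = {"uint8": 0, "uint16": 1, "uint32": 2, "uint64": 3, "float32": 4}
DTYPE_NAMES = {code: name for name, code in DTYPE_CODES.items()}


@dataclass(frozen=True)
class ArrayInfo:
    name: str
    dtype: str
    length: int


@dataclass(frozen=True)
class Header:
    version: tuple[int, int, int]
    arrays: tuple[ArrayInfo, ...]


def pack_header(header: Header) -> bytes:
    """Serialize the header into a fixed-layout, little-endian byte string."""
    parts = [_PREFIX.pack(MAGIC, *header.version, len(header.arrays))]
    for info in header.arrays:
        name = info.name.encode("ascii")
        if len(name) > 16:
            raise ValueError(f"array name too long: {info.name!r}")
        parts.append(_ARRAY.pack(name, DTYPE_CODES[info.dtype], info.length))
    return b"".join(parts)


def unpack_header(buf: bytes, supported_major: int = 1) -> Header:
    """Parse and validate a header, rejecting anything unexpected."""
    if len(buf) < _PREFIX.size:
        raise ValueError("truncated header")
    magic, major, minor, patch, count = _PREFIX.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a recognized archive header")
    if major != supported_major:
        raise ValueError(f"unsupported major version {major}")
    if len(buf) < _PREFIX.size + count * _ARRAY.size:
        raise ValueError("truncated array table")
    arrays = []
    offset = _PREFIX.size
    for _ in range(count):
        raw_name, code, length = _ARRAY.unpack_from(buf, offset)
        if code not in DTYPE_NAMES:
            raise ValueError(f"unknown dtype code {code}")
        name = raw_name.rstrip(b"\0").decode("ascii")
        arrays.append(ArrayInfo(name, DTYPE_NAMES[code], length))
        offset += _ARRAY.size
    return Header((major, minor, patch), tuple(arrays))
```

In this sketch the reader refuses any header whose magic bytes, major version, or dtype codes it does not recognize, which is one simple way to get strict compatibility checking. Recording each array's dtype in the header is also what would let a reader open an array stored at a narrower integer width.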

Note: I am still wiring up the header to the archive. I will make a note here when the MR is ready.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Usage

This change is transparent to the user: the headers are not human-readable on disk, and no user action is required. For a full description of the format and how to interact with it, see the schema directory in SCDL's source directory.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

@copy-pr-bot (bot) commented Aug 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Eric T. Dawson added 2 commits August 14, 2025 15:10
Signed-off-by: Eric T. Dawson <edawson@nvidia.com>
Signed-off-by: Eric T. Dawson <edawson@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
@yzhang123 removed this pull request from the merge queue due to a manual request on Aug 19, 2025
@polinabinder1 (Collaborator) commented

/ok to test a720862

@polinabinder1 (Collaborator) commented

/ok to test 9719cee

@polinabinder1 added this pull request to the merge queue on Aug 19, 2025
@yzhang123 removed this pull request from the merge queue due to a manual request on Aug 19, 2025
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
@polinabinder1 (Collaborator) commented

/ok to test 6698414

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
@yzhang123 (Collaborator) commented

/ok to test 33d6fa8

@yzhang123 removed the SKIP_CI label on Aug 19, 2025
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
@polinabinder1 (Collaborator) commented

/ok to test 74af02e

@polinabinder1 (Collaborator) commented

/ok to test e4acd0f

@polinabinder1 added this pull request to the merge queue on Aug 20, 2025
Merged via the queue into main with commit 2956b15 on Aug 20, 2025
14 checks passed
@polinabinder1 deleted the edawson/scdl-schema branch on August 20, 2025 04:59
@polinabinder1 restored the edawson/scdl-schema branch on August 20, 2025 05:01

Development

Successfully merging this pull request may close these issues.

[BUG] Duplicate parquet files when concatenating 10 or more h5ads

6 participants