Clarify relationship between Quilt versioning and S3 versioning in docs #4545

@drernie

Description

The documentation currently does not clearly explain how Quilt versioning integrates with Amazon S3 versioning. This lack of clarity makes it difficult for users to understand the guarantees Quilt provides around package updates, reproducibility, and storage.

Clarifications Needed

  1. S3 Versioned Buckets

    • State explicitly that Quilt requires S3 buckets with versioning enabled.
    • Without this, users may mistakenly assume Quilt handles all versioning internally.
  2. Package Revisions and S3 Version IDs

    • Document that each Quilt package revision records a specific S3 version ID for each of the underlying objects.
    • Clarify that a revision “locks” the package to those object versions for reproducibility.
  3. Updating Packages

    • Provide a concrete description of how updates work.
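    • Example: when a user runs aws s3 sync (or the Quilt equivalent), files are overwritten in place and a versioned bucket stores each overwrite as a new object version, while existing package revisions remain tied to the S3 version IDs they already recorded.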
  4. General Applicability

    • Emphasize that this workflow applies broadly to all datasets managed in Quilt on S3, not just specific examples.
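
To illustrate the model the docs should describe in items 2 and 3, here is a toy in-memory sketch. It is not Quilt's implementation: VersionedBucket, create_revision, and read_revision are hypothetical names made up for this example, and the "v1", "v2" IDs only stand in for real S3 version IDs.

```python
import itertools


class VersionedBucket:
    """Toy stand-in for an S3 bucket with versioning enabled (hypothetical)."""

    def __init__(self):
        self._versions = {}   # (key, version_id) -> bytes
        self._latest = {}     # key -> most recent version_id
        self._counter = itertools.count(1)

    def put(self, key, body):
        # Every write creates a new version; older versions are kept.
        version_id = f"v{next(self._counter)}"
        self._versions[(key, version_id)] = body
        self._latest[key] = version_id
        return version_id

    def latest_version(self, key):
        return self._latest[key]

    def get(self, key, version_id=None):
        # Without a version ID, return the most recent version.
        if version_id is None:
            version_id = self._latest[key]
        return self._versions[(key, version_id)]


def create_revision(bucket, keys):
    """Pin each key to the version ID that is current right now."""
    return {key: bucket.latest_version(key) for key in keys}


def read_revision(bucket, revision):
    """Read exactly the object versions a revision was pinned to."""
    return {key: bucket.get(key, vid) for key, vid in revision.items()}


bucket = VersionedBucket()
bucket.put("data.csv", b"a,b\n1,2\n")
rev1 = create_revision(bucket, ["data.csv"])

bucket.put("data.csv", b"a,b\n3,4\n")   # "update in place"
rev2 = create_revision(bucket, ["data.csv"])

old = read_revision(bucket, rev1)["data.csv"]   # still the first upload
new = read_revision(bucket, rev2)["data.csv"]   # the updated content
```

The point for the docs: overwriting data.csv changes what an unpinned read returns, but rev1 keeps resolving to the bytes it was created against.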

Suggested Improvement

Add a dedicated section (or expand an existing one) to explicitly walk through:

  • Enabling versioning on the S3 bucket.
  • Creating a new Quilt package revision.
  • Showing how the revision ties back to S3 version IDs.
  • Demonstrating how updates preserve reproducibility while allowing incremental changes.

This would make the versioning model much clearer and reduce confusion for new users.
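
As a starting point for the walkthrough, the S3 side of these steps could look roughly like the sketch below. It assumes a boto3 S3 client is passed in (for example boto3.client("s3")); the helper names are hypothetical, and the Quilt-specific step of creating a package revision is deliberately left out, since how Quilt records version IDs is exactly what the docs need to spell out.

```python
def enable_versioning(s3_client, bucket):
    """Turn on versioning for the bucket."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def versioning_enabled(s3_client, bucket):
    """Return True if the bucket reports versioning as Enabled."""
    response = s3_client.get_bucket_versioning(Bucket=bucket)
    # "Status" is absent if versioning was never configured.
    return response.get("Status") == "Enabled"


def upload_and_pin(s3_client, bucket, key, body):
    """Upload an object and return the version ID S3 assigned to it."""
    response = s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    # "VersionId" is only present when versioning is enabled.
    return response.get("VersionId")


def read_pinned(s3_client, bucket, key, version_id):
    """Fetch one specific version of an object, ignoring later overwrites."""
    response = s3_client.get_object(
        Bucket=bucket, Key=key, VersionId=version_id
    )
    return response["Body"].read()
```

Uploading the same key twice with upload_and_pin should return two different version IDs on a versioned bucket, and read_pinned with the first ID should still return the original bytes.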
