
feat: support build the model artifact by raw file format #188

Merged
chlins merged 1 commit into main from feat/raw on Jun 4, 2025

Conversation

@chlins (Member) commented May 27, 2025

This pull request introduces several enhancements and fixes across multiple components, including improved metadata handling, better file permission management, and the addition of a new "raw" build mode. The changes also include updates to dependencies and test coverage improvements.

Build process enhancements:

  • Added a new Raw flag to the Build configuration, allowing model artifact layers to be built in raw format. The flag is exposed on the CLI through cmd/build.go. (cmd/build.go, pkg/config/build.go)
  • Updated getProcessors to accept the build configuration and select media types based on the state of the Raw flag. (pkg/backend/build.go)
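A minimal sketch of the Raw-driven media type switch, assuming a single weight processor. The media type strings below are placeholders, not the constants from github.com/CloudNativeAI/model-spec, and Build is trimmed to the one field that matters here; the PR applies the same if/else pattern to the config, model, code, and doc processors.

```go
package main

import "fmt"

// Placeholder media types for illustration only; the real values come from
// the model-spec module and are not reproduced here.
const (
	mediaTypeWeightTar = "application/vnd.example.model.weight.v1.tar"
	mediaTypeWeightRaw = "application/vnd.example.model.weight.v1.raw"
)

// Build is a trimmed stand-in for config.Build with only the Raw field.
type Build struct {
	Raw bool
}

// weightMediaType returns the raw variant when cfg.Raw is set and the
// default tar variant otherwise.
func weightMediaType(cfg *Build) string {
	if cfg.Raw {
		return mediaTypeWeightRaw
	}
	return mediaTypeWeightTar
}

func main() {
	fmt.Println(weightMediaType(&Build{Raw: true}))
	fmt.Println(weightMediaType(&Build{}))
}
```

Keeping Raw as a plain boolean with a false zero value means existing callers that never set it keep producing tar-format layers.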

Metadata handling improvements:

  • Introduced a getFileMetadata function that collects file metadata (permissions, size, modification time, UID/GID) during the build process and serializes it into annotations on the OCI layer descriptors. (pkg/backend/build/builder.go)
  • Enhanced the raw codec to restore this metadata during decoding, so that file attributes are preserved on extraction. (pkg/codec/raw.go)

File permission and directory handling:

  • Improved directory handling in the Untar function by explicitly setting permissions and modification times after directory creation. (pkg/archiver/archiver.go, L164-R174)
  • Fixed a potential file-handle leak by deferring file.Close() immediately after each extracted file is created. (pkg/archiver/archiver.go, R185-R197)

Dependency updates:

  • Upgraded the github.com/CloudNativeAI/model-spec dependency from v0.0.3 to v0.0.5 to support new media types and annotations. (go.mod, L6-R6)

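Test coverage improvements:

  • Added tests and helpers for getFileMetadata, including UID/GID checks on Unix, and updated TestGetProcessors to pass the build configuration. (pkg/backend/build/builder_test.go, pkg/backend/build_test.go)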

Summary by CodeRabbit

  • New Features

    • Added a new option to build model artifacts in raw format.
    • Build layer descriptors now include detailed file metadata, such as permissions, size, modification time, and ownership.
  • Improvements

    • Enhanced extraction to preserve file permissions and timestamps for both files and directories.
    • Updated build command defaults to reflect current configuration values.
    • Refined archive extraction by removing symlink support for improved security.
  • Dependency Updates

    • Upgraded github.com/CloudNativeAI/model-spec from v0.0.3 to v0.0.5 for the new media types and annotations.
  • Bug Fixes

    • Corrected a typo in test data ("puantization" to "quantization") to ensure accurate model inspection.
  • Tests

    • Added comprehensive tests for file metadata extraction and validation.

@coderabbitai bot commented May 27, 2025

Walkthrough

The changes introduce a new "raw" build mode, reflected in configuration, command-line flags, and processing logic. File metadata is now collected during build layer creation and stored in descriptor annotations. The decoding interfaces and implementations are updated to accept and utilize this metadata to restore file attributes. Tests are added for the metadata extraction logic. Symlink extraction was removed from untar logic. Dependency versions were updated and a typo in test data was fixed.

Changes

File(s) and change summary:

  • cmd/build.go: Default flag values for "target" and "modelfile" now use buildConfig fields; added a new "raw" boolean flag to the build command, bound to buildConfig.Raw.
  • go.mod: Updated github.com/CloudNativeAI/model-spec dependency from v0.0.3 to v0.0.5.
  • pkg/archiver/archiver.go: Modified Untar to set permissions and timestamps on files/directories; removed symlink extraction and the related helper function.
  • pkg/backend/build.go: getProcessors now accepts the build config and conditionally selects media types based on the new Raw flag.
  • pkg/backend/build/builder.go: BuildLayer collects file metadata and stores it in descriptor annotations; added getFileMetadata helper function.
  • pkg/backend/build/builder_test.go: Added tests and helpers for getFileMetadata, verifying metadata extraction including UID/GID on Unix.
  • pkg/backend/build_test.go: Updated TestGetProcessors to pass the build config to getProcessors.
  • pkg/backend/extract.go: Updated the codec.Decode call: reordered parameters and added the descriptor argument.
  • pkg/codec/codec.go: Changed the Codec.Decode interface: reordered parameters, added a descriptor argument.
  • pkg/codec/raw.go: Updated Decode signature; restores file permissions and timestamps from descriptor metadata after extraction.
  • pkg/codec/tar.go: Updated Decode signature to match the new interface; implementation unchanged.
  • pkg/config/build.go: Added Raw field to the Build struct; the constructor initializes Raw and explicitly sets SourceURL and SourceRevision.
  • pkg/backend/inspect_test.go: Fixed typo in JSON test data key from "puantization" to "quantization".

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant BuildConfig
    participant Backend
    participant Builder
    participant Descriptor
    participant Codec
    participant FileSystem

    User->>CLI: Run build command (with --raw flag)
    CLI->>BuildConfig: Set Raw = true/false
    CLI->>Backend: Start Build(cfg)
    Backend->>Builder: BuildLayer(..., cfg)
    Builder->>Descriptor: Collect file metadata
    Builder->>Descriptor: Annotate with metadata
    Backend->>Codec: Decode(outputDir, filePath, reader, desc)
    Codec->>Descriptor: Read metadata
    Codec->>FileSystem: Restore permissions/timestamps

Suggested reviewers

  • gaius-qi

Poem

In burrows deep, where models grow,
A "raw" new flag now helps us know—
File modes and times, UID, GID,
Are tucked in layers, never hid.
With metadata snug and neat,
Our builds are now complete!
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (1.64.8)

Error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2
Failed executing command with error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d7f5a7 and 5b65065.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • cmd/build.go (1 hunks)
  • go.mod (1 hunks)
  • pkg/archiver/archiver.go (2 hunks)
  • pkg/backend/build.go (2 hunks)
  • pkg/backend/build/builder.go (3 hunks)
  • pkg/backend/build/builder_test.go (3 hunks)
  • pkg/backend/build_test.go (2 hunks)
  • pkg/backend/extract.go (1 hunks)
  • pkg/backend/inspect_test.go (1 hunks)
  • pkg/codec/codec.go (2 hunks)
  • pkg/codec/raw.go (3 hunks)
  • pkg/codec/tar.go (2 hunks)
  • pkg/config/build.go (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • go.mod
  • pkg/backend/inspect_test.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • pkg/backend/extract.go
  • pkg/config/build.go
  • pkg/backend/build_test.go
  • pkg/codec/codec.go
  • pkg/codec/raw.go
  • pkg/codec/tar.go
  • cmd/build.go
  • pkg/backend/build/builder.go
  • pkg/backend/build.go
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Analyze (go)
  • GitHub Check: Lint
  • GitHub Check: Test
🔇 Additional comments (4)
pkg/backend/build/builder_test.go (4)

25-25: LGTM: Appropriate imports for platform-specific testing.

The added runtime and syscall imports are necessary for the new cross-platform file metadata testing functionality.

Also applies to: 28-28


131-133: Good fix for test assertion reliability.

Comparing individual fields instead of the entire struct helps avoid potential flaky test behavior due to unexported fields or implementation details that might differ between test runs.


294-310: Well-implemented test helper functions.

The helper functions follow Go testing best practices:

  • Proper use of t.Helper() for better error reporting
  • Comprehensive error checking
  • Clear and focused functionality

312-371: Excellent comprehensive test coverage for getFileMetadata.

The test implementation demonstrates several strengths:

  • Tests both regular files and directories
  • Validates all metadata fields (name, size, mode, modification time, type flag)
  • Includes platform-specific testing for UID/GID with appropriate fallbacks for Windows
  • Uses proper time comparison with reasonable tolerance
  • Clear test structure and good error messages
@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
pkg/codec/raw.go (1)

67-73: Consider adding error context for JSON unmarshaling.

The JSON unmarshaling error could benefit from additional context to help with debugging metadata issues.

-		if err := json.Unmarshal([]byte(fm), &fileMetadata); err != nil {
-			return err
-		}
+		if err := json.Unmarshal([]byte(fm), &fileMetadata); err != nil {
+			return fmt.Errorf("failed to unmarshal file metadata: %w", err)
+		}

You'll need to import fmt if not already imported.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0be2a48 and 2d9aab0.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (12)
  • cmd/build.go (1 hunks)
  • go.mod (1 hunks)
  • pkg/archiver/archiver.go (2 hunks)
  • pkg/backend/build.go (2 hunks)
  • pkg/backend/build/builder.go (3 hunks)
  • pkg/backend/build/builder_test.go (2 hunks)
  • pkg/backend/build_test.go (2 hunks)
  • pkg/backend/extract.go (1 hunks)
  • pkg/codec/codec.go (2 hunks)
  • pkg/codec/raw.go (3 hunks)
  • pkg/codec/tar.go (2 hunks)
  • pkg/config/build.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
cmd/build.go (1)
pkg/codec/codec.go (1)
  • Raw (31-31)
pkg/config/build.go (1)
pkg/codec/codec.go (1)
  • Raw (31-31)
pkg/backend/build.go (7)
pkg/modelfile/modelfile.go (1)
  • Modelfile (37-86)
pkg/config/build.go (1)
  • Build (26-37)
pkg/codec/codec.go (1)
  • Raw (31-31)
pkg/backend/processor/model_config.go (1)
  • NewModelConfigProcessor (33-42)
pkg/backend/processor/model.go (1)
  • NewModelProcessor (33-42)
pkg/backend/processor/code.go (1)
  • NewCodeProcessor (33-42)
pkg/backend/processor/doc.go (1)
  • NewDocProcessor (33-42)
pkg/backend/build_test.go (1)
pkg/config/build.go (1)
  • Build (26-37)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Analyze (go)
  • GitHub Check: Test
  • GitHub Check: Lint
🔇 Additional comments (22)
go.mod (1)

6-6:


❓ Verification inconclusive

Verify the dependency upgrade supports the new features.

The version bump from v0.0.3 to v0.0.5 aligns with the PR objectives to support new media types and annotations for metadata handling.


🏁 Script executed:

#!/bin/bash
# Description: Verify the model-spec v0.0.5 exists and check for any security advisories

# Check if the version exists on GitHub
gh api repos/CloudNativeAI/model-spec/releases | jq '.[] | select(.tag_name == "v0.0.5")'

# Check for any recent security advisories
gh api graphql -f query='
{
  repository(owner: "CloudNativeAI", name: "model-spec") {
    vulnerabilityAlerts(first: 10) {
      nodes {
        createdAt
        vulnerableManifestPath
        securityVulnerability {
          advisory {
            summary
            severity
          }
        }
      }
    }
  }
}'

Length of output: 478


#!/bin/bash
# Description: Verify the v0.0.5 tag exists and retrieve its release notes

# Check for the existence of the v0.0.5 tag
gh api repos/CloudNativeAI/model-spec/git/refs/tags/v0.0.5

# Fetch release notes for v0.0.5 (if a release was created)
gh api repos/CloudNativeAI/model-spec/releases/tags/v0.0.5 --jq .body


🏁 Script executed:

#!/bin/bash
# Description: Confirm the v0.0.5 tag exists and retrieve its release notes for CloudNativeAI/model-spec

# Check if the tag v0.0.5 exists
echo "=== Checking for git tag v0.0.5 ==="
gh api repos/CloudNativeAI/model-spec/git/refs/tags/v0.0.5

# If the tag exists, fetch the associated release notes (if any)
echo -e "\n=== Fetching release notes for v0.0.5 ==="
gh api repos/CloudNativeAI/model-spec/releases/tags/v0.0.5 --jq '.body // "No release notes found."'

Length of output: 850


Missing release notes for model-spec v0.0.5

The tag v0.0.5 exists and there are no outstanding vulnerability alerts, but GitHub returns a 404 for the release notes. Please manually verify that this version includes the new media‐type support and Raw-flag functionality—e.g. by reviewing the CHANGELOG.md or the commit diff between v0.0.3 and v0.0.5 in the CloudNativeAI/model-spec repo—before approving the upgrade.

pkg/backend/build_test.go (1)

22-22: Import addition looks good.

The config package import is correctly added to support the new function signature.

pkg/backend/extract.go (1)

101-101:


✅ Verification successful

Interface update correctly implemented.

The addition of the desc parameter to codec.Decode enables metadata restoration from descriptor annotations, which aligns with the PR's metadata handling improvements.

Verify that all codec implementations have been updated to match this interface:


🏁 Script executed:

#!/bin/bash
# Description: Verify all codec implementations have the updated Decode signature
# Expected: All Decode methods should accept (outputDir, filepath string, reader io.Reader, desc ocispec.Descriptor)

ast-grep --pattern $'func ($_, $_) Decode($$$) error {
  $$$
}'

Length of output: 64


🏁 Script executed:

#!/bin/bash
# Search for all Decode method implementations to confirm updated signature
rg -n --context 2 "^func .* Decode\(" -n pkg/backend/codec

Length of output: 118


🏁 Script executed:

#!/bin/bash
# Search the entire repo for Decode method implementations to verify the signature update
rg -n "^func .*Decode\(" -n .

Length of output: 275


All Decode implementations are updated with the new desc parameter.

Verified that both codec implementations use the updated signature:

  • pkg/codec/raw.go:48
  • pkg/codec/tar.go:46

No further action required.

cmd/build.go (2)

55-56: Good improvement using config values instead of hardcoded strings.

Using buildConfig.Target and buildConfig.Modelfile instead of hardcoded strings ensures consistency between the flag defaults and the actual configuration values.


64-64: Raw flag correctly implemented.

The new --raw flag is properly integrated with the build configuration and includes a clear description. The default value of false maintains backward compatibility.

pkg/config/build.go (2)

36-36: LGTM: Clean addition of Raw flag to configuration.

The new Raw boolean field is appropriately added to the Build struct and follows the existing naming conventions.


39-52: LGTM: Explicit field initialization improves code clarity.

The explicit initialization of all fields in the NewBuild constructor is a good practice that improves code readability and maintainability. The Raw field defaulting to false ensures backward compatibility.

pkg/codec/raw.go (2)

48-48: LGTM: Interface signature update is consistent.

The method signature change aligns with the updated Codec interface, maintaining consistency across implementations.


86-90: LGTM: Modification time restoration is correctly implemented.

The restoration of modification time using os.Chtimes is appropriate and handles the zero-time check correctly.

pkg/codec/tar.go (1)

46-50: LGTM: Interface signature update maintains consistency.

The method signature change correctly aligns with the updated Codec interface. It's appropriate that the tar implementation ignores the desc parameter since tar files contain their own metadata internally.

pkg/codec/codec.go (1)

46-46:


✅ Verification successful

LGTM: Interface signature enhancement supports metadata handling.

The updated Decode method signature appropriately adds the OCI descriptor parameter to enable metadata-aware decoding, and the parameter reordering is logical.

Please verify that all implementations of the Codec interface have been updated to match this new signature:


🏁 Script executed:

#!/bin/bash
# Description: Verify all Codec interface implementations match the new Decode signature
# Expected: All implementations should have the new signature with desc parameter

echo "Searching for Decode method implementations..."
ast-grep --pattern $'func ($_ *$_) Decode($$$) error {
  $$$
}'

echo -e "\nSearching for potential old signature usage..."
rg -A 2 "Decode.*io\.Reader.*string" --type go

Length of output: 2780


All Codec implementations now use the updated Decode signature

  • pkg/codec/tar.go
  • pkg/codec/raw.go

No occurrences of the old method signature remain. Ready to merge.

pkg/backend/build.go (3)

93-93: LGTM: Method call updated correctly.

The call to getProcessors has been properly updated to pass the build configuration parameter, which aligns with the updated method signature.


157-157: LGTM: Method signature updated appropriately.

The function signature has been correctly updated to accept the build configuration parameter, enabling the conditional media type selection logic.


161-165: LGTM: Raw media type selection implemented consistently.

The conditional logic for selecting raw vs standard media types is well-implemented and follows a consistent pattern across all processor types (config, model, code, doc). The logic correctly checks the cfg.Raw flag and selects the appropriate media type variant.

Also applies to: 169-173, 177-181, 185-189

pkg/archiver/archiver.go (2)

164-174: LGTM: Directory metadata preservation implemented correctly.

The code properly creates the directory with initial permissions and then explicitly sets the correct permissions and modification time from the tar header. This ensures directory attributes are preserved during extraction.


185-185: LGTM: Improved resource management.

Adding defer file.Close() immediately after file creation ensures proper resource cleanup even if subsequent operations fail.

pkg/backend/build/builder_test.go (3)

25-25: LGTM: Required imports added.

The runtime and syscall imports are appropriately added to support platform-specific testing logic for UID/GID handling.

Also applies to: 28-28


292-308: LGTM: Well-designed helper functions.

The helper functions createTempFile and createTempDir are well-implemented with proper error handling and use of t.Helper() for better test failure reporting.


310-369: LGTM: Comprehensive test coverage for file metadata.

The test thoroughly validates the getFileMetadata function for both regular files and directories. Key strengths:

  1. Tests all metadata fields (name, size, mode, typeflag, modtime)
  2. Handles platform-specific UID/GID testing appropriately
  3. Uses proper time comparison with tolerance
  4. Compares against ground truth from os.Stat()
  5. Includes helpful logging for debugging on non-Windows systems

The test design ensures robust validation of the metadata extraction functionality.

pkg/backend/build/builder.go (3)

23-23: LGTM: Required imports added.

The errors and syscall imports are appropriately added to support the new file metadata extraction functionality.

Also applies to: 29-29


209-225: LGTM: File metadata annotation implementation.

The metadata collection and annotation logic is well-implemented:

  1. Metadata is retrieved after successful layer processing
  2. Proper error handling for metadata extraction and JSON marshaling
  3. Annotations map is safely initialized if nil
  4. Uses the standard annotation key from modelspec

The metadata will be available for downstream processing like extraction and decoding.


329-361: LGTM: Robust file metadata extraction function.

The getFileMetadata function is well-designed with several strengths:

  1. Comprehensive metadata collection: Captures name, permissions, size, modification time, and type
  2. Proper type handling: Correctly identifies regular files ('0'), directories ('5'), and symlinks ('2') using the standard tar typeflag characters
  3. Platform awareness: Safely handles Unix-specific UID/GID extraction with type assertion
  4. Error handling: Returns appropriate errors for stat failures and unknown file types
  5. Security consideration: Only handles known file types, rejecting unknown ones

The implementation aligns well with tar header semantics and provides the necessary metadata for file attribute preservation during extraction.

@chlins added the enhancement (New feature or request) label May 27, 2025
@chlins force-pushed the feat/raw branch 2 times, most recently from dca0901 to 5430592, May 29, 2025 06:22
@aftersnow previously approved these changes Jun 3, 2025

@aftersnow (Contributor) left a comment

lgtm

Signed-off-by: chlins <chlins.zhang@gmail.com>
@BraveY (Contributor) left a comment

LGTM!

@chlins merged commit 9adffa9 into main Jun 4, 2025
6 checks passed
@chlins deleted the feat/raw branch June 4, 2025 03:59

Labels

enhancement New feature or request


4 participants