Skip to content

Conversation

@ilopezluna
Copy link
Contributor

@ilopezluna ilopezluna commented Oct 16, 2025

This pull request adds support for including arbitrary directories as tar archives when packaging a model, allowing users to specify one or more directories to be preserved and included in the package. It introduces a new CLI flag, implements the logic for creating and handling these tar archives, and ensures they are properly unpacked on extraction.

MODEL_RUNNER_HOST=http://localhost:13434 docker model package --safetensors-dir /Users/ilopezluna/Desktop/models/gemma-3-300m-embedding/ \
  --dir-tar 1_Pooling \
  --dir-tar 2_Dense \
  ignaciolopezluna020/embedding:dir-test --push

Summary by Sourcery

Enable packaging and unpacking of arbitrary subfolders by archiving specified directories into tar layers, integrating support end-to-end in the CLI, builder, unpack logic, documentation, and tests

New Features:

  • Add CLI flag --dir-tar to include directories as tar archives in model packaging
  • Implement CreateDirectoryTarArchive to generate tar archives of specified directories
  • Introduce Builder.WithDirTar to append directory tar layers to model artifacts
  • Extend unpack logic to extract directory tar archives during bundle unpacking

Enhancements:

  • Refactor extractTarArchive to use a streaming reader implementation

Documentation:

  • Document the new --dir-tar option in CLI reference and markdown docs

Tests:

  • Add unit tests for CreateDirectoryTarArchive covering normal, non-existent, and invalid directory cases

Copilot AI review requested due to automatic review settings October 16, 2025 12:03
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 16, 2025

Reviewer's Guide

This PR implements support for including arbitrary directories as tar archives in model packages by adding a new CLI flag, extending the packaging builder and media types, providing directory tar creation and extraction logic, updating unpack routines, and adding corresponding documentation and tests.

Sequence diagram for packaging model with subfolders using --dir-tar

sequenceDiagram
    actor User
    participant CLI
    participant Packaging
    User->>CLI: Run 'model package' with --dir-tar options
    CLI->>Packaging: CreateDirectoryTarArchive for each specified directory
    Packaging-->>CLI: Return path to tar archive
    CLI->>Packaging: WithDirTar to add tar archive as layer
    Packaging-->>CLI: Model package includes directory tar layers
    CLI-->>User: Model package created
Loading

Sequence diagram for unpacking model with subfolder tar archives

sequenceDiagram
    actor User
    participant Unpack
    participant Bundle
    User->>Unpack: Unpack model package
    Unpack->>Bundle: unpackDirTarArchives
    Bundle->>Unpack: Provide directory tar layers
    Unpack->>Bundle: extractTarArchiveFromReader for each tar layer
    Bundle-->>Unpack: Subfolders restored in model directory
    Unpack-->>User: Model unpacked with subfolders
Loading

Entity relationship diagram for model package layers including directory tar archives

erDiagram
    MODEL_PACKAGE ||--o{ LAYER : contains
    LAYER }|--|| MEDIA_TYPE : has
    MEDIA_TYPE ||--|| MEDIA_TYPE_DIR_TAR : is
    MODEL_PACKAGE {
        string id
        string name
    }
    LAYER {
        string path
        MediaType mediaType
    }
    MEDIA_TYPE {
        string type
    }
    MEDIA_TYPE_DIR_TAR {
        string type
    }
Loading

Class diagram for new and updated packaging and builder types

classDiagram
    class Builder {
        +WithConfigArchive(path string) (*Builder, error)
        +WithDirTar(path string) (*Builder, error)
    }
    class packaging {
        +CreateDirectoryTarArchive(dirPath string) (string, error)
    }
    class ModelArtifact {
        +Layers() ([]Layer, error)
    }
    class Layer {
        +MediaType() (MediaType, error)
        +Uncompressed() (io.ReadCloser, error)
    }
    class MediaTypeDirTar
    Builder --> packaging : uses
    Builder --> ModelArtifact : appends Layer
    ModelArtifact --> Layer : contains
    Layer --> MediaTypeDirTar : MediaType
    MediaTypeDirTar : application/vnd.docker.ai.dir.tar
Loading

File-Level Changes

Change Details Files
Add CLI flag and processing for directory tars
  • Register --dir-tar option in the package command
  • Resolve relative paths against the base model directory
  • Call CreateDirectoryTarArchive for each specified directory
  • Attach tar archives to the package via WithDirTar and defer temp file cleanup
cmd/cli/commands/package.go
Extend builder and media types to support directory tar layers
  • Introduce WithDirTar method to append DirTar layers
  • Define new MediaTypeDirTar constant for directory tar archives
pkg/distribution/builder/builder.go
pkg/distribution/types/config.go
Implement directory tar creation and add tests
  • Create CreateDirectoryTarArchive to walk a directory and write tar entries
  • Add unit tests for valid packaging and error cases (non-existent path, non-directory)
pkg/distribution/packaging/dirtar.go
pkg/distribution/packaging/dirtar_test.go
Extend unpack logic to extract directory tar layers
  • Add unpackDirTarArchives to iterate and extract DirTar layers
  • Implement streaming extraction via extractTarArchiveFromReader
  • Refactor extractTarArchive to delegate to the reader-based version
pkg/distribution/internal/bundle/unpack.go
Update CLI documentation for the new option
  • Document --dir-tar in the Docker model package YAML reference
  • Add entry for --dir-tar in the Markdown CLI guide
cmd/cli/docs/reference/docker_model_package.yaml
cmd/cli/docs/reference/model_package.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds support for packaging arbitrary subdirectories as tar archives and including them as layers in the model artifact, then unpacking them into the bundle on extraction.

  • Introduces a new media type for directory tar layers.
  • Adds CLI flag --dir-tar to include one or more directories (relative to the model base) as tars.
  • Implements creation, addition, and unpacking of these tar layers, plus unit tests for tar creation.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
pkg/distribution/types/config.go Adds MediaTypeDirTar to represent directory tar layers.
pkg/distribution/packaging/safetensors.go Adds CreateDirectoryTarArchive utility to create tars from directories.
pkg/distribution/packaging/dirtar_test.go Tests the creation and contents of directory tar archives and error cases.
pkg/distribution/internal/bundle/unpack.go Adds unpackDirTarArchives and calls it during bundle unpack to extract dir tar layers.
pkg/distribution/builder/builder.go Adds Builder.WithDirTar to append dir tar layers to the model artifact.
cmd/cli/commands/package.go Adds --dir-tar flag and logic to create tars and attach them to the build, with temp-file cleanup.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copilot AI review requested due to automatic review settings October 16, 2025 12:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@ilopezluna ilopezluna changed the title [WIP] Package subfolders Package subfolders Oct 16, 2025
@ilopezluna ilopezluna marked this pull request as ready for review October 16, 2025 12:35
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Add unit or integration tests for unpackDirTarArchives to verify that directory tar layers are correctly extracted during model unpacking.
  • In unpackDirTarArchives, defer closing the uncompressed reader right after calling layer.Uncompressed() to simplify cleanup and avoid potential resource leaks.
  • Consolidate extractTarArchive and extractTarArchiveFromReader to remove duplicate code and clarify the extraction API.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Add unit or integration tests for unpackDirTarArchives to verify that directory tar layers are correctly extracted during model unpacking.
- In unpackDirTarArchives, defer closing the uncompressed reader right after calling layer.Uncompressed() to simplify cleanup and avoid potential resource leaks.
- Consolidate extractTarArchive and extractTarArchiveFromReader to remove duplicate code and clarify the extraction API.

## Individual Comments

### Comment 1
<location> `pkg/distribution/packaging/dirtar.go:43-44` </location>
<code_context>
+	tw := tar.NewWriter(tmpFile)
+
+	// Walk the directory tree
+	err = filepath.Walk(dirPath, func(path string, info os.FileInfo, err error) error {
+		// Skip symlinks - they're not needed for model distribution and are
+		// skipped during extraction for security reasons
+		if info.Mode()&os.ModeSymlink != 0 {
</code_context>

<issue_to_address>
**issue (bug_risk):** Symlink skipping logic may not handle broken symlinks correctly.

Check for errors before accessing info to prevent panics with broken symlinks, and handle them explicitly to ensure robust packaging.
</issue_to_address>

### Comment 2
<location> `pkg/distribution/packaging/dirtar.go:74-75` </location>
<code_context>
+		}
+
+		// If it's a file, write its contents
+		if !info.IsDir() {
+			file, err := os.Open(path)
+			if err != nil {
+				return fmt.Errorf("open file %s: %w", path, err)
</code_context>

<issue_to_address>
**issue (bug_risk):** Deferring file.Close() in a loop can lead to resource exhaustion.

Explicitly close each file after processing to avoid exhausting file descriptors when handling many files.
</issue_to_address>

### Comment 3
<location> `pkg/distribution/internal/bundle/unpack.go:254-259` </location>
<code_context>
+
+	// Iterate through layers and extract directory tar archives
+	for _, layer := range layers {
+		mediaType, err := layer.MediaType()
+		if err != nil {
+			continue
+		}
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Silent continue on MediaType error may mask underlying issues.

Consider logging or reporting the error from layer.MediaType() to aid in debugging malformed layers.

```suggestion
	// Iterate through layers and extract directory tar archives
	for _, layer := range layers {
		mediaType, err := layer.MediaType()
		if err != nil {
			log.Printf("error getting media type for layer: %v", err)
			continue
		}
```
</issue_to_address>

### Comment 4
<location> `pkg/distribution/internal/bundle/unpack.go:273-274` </location>
<code_context>
+		}
+
+		// Stream directly to tar extraction - no temp file needed
+		if err := extractTarArchiveFromReader(uncompressed, modelDir); err != nil {
+			uncompressed.Close()
+			return fmt.Errorf("extract directory tar archive: %w", err)
+		}
</code_context>

<issue_to_address>
**issue (bug_risk):** Potential resource leak if extractTarArchiveFromReader returns nil error.

Consider using defer to close uncompressed, ensuring the resource is released even if an early return or panic occurs.
</issue_to_address>

### Comment 5
<location> `pkg/distribution/packaging/dirtar_test.go:11` </location>
<code_context>
+	"testing"
+)
+
+func TestCreateDirTarArchive(t *testing.T) {
+	// Create a temporary directory with some test files
+	tempDir, err := os.MkdirTemp("", "dirtar-test-*")
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test case for empty directories.

Please add a test to confirm that empty directories are archived and extracted correctly, preserving their structure.

Suggested implementation:

```golang
func TestCreateDirTarArchive(t *testing.T) {
	// Create a temporary directory with some test files
	tempDir, err := os.MkdirTemp("", "dirtar-test-*")
	if err != nil {
		t.Fatalf("Failed to create temp dir: %v", err)
	}
	defer os.RemoveAll(tempDir)

	// Create test directory structure
	testDir := filepath.Join(tempDir, "test_directory")
	if err := os.MkdirAll(filepath.Join(testDir, "subdir"), 0755); err != nil {
		t.Fatalf("Failed to create test directory: %v", err)
	}

	// Test archiving and extracting an empty directory
	t.Run("empty directory is archived and extracted correctly", func(t *testing.T) {
		emptyDir := filepath.Join(tempDir, "empty_dir")
		if err := os.Mkdir(emptyDir, 0755); err != nil {
			t.Fatalf("Failed to create empty directory: %v", err)
		}

		// Create tar archive of the empty directory
		tarPath := filepath.Join(tempDir, "empty_dir.tar")
		tarFile, err := os.Create(tarPath)
		if err != nil {
			t.Fatalf("Failed to create tar file: %v", err)
		}
		defer tarFile.Close()

		if err := CreateDirTarArchive(emptyDir, tarFile); err != nil {
			t.Fatalf("Failed to archive empty directory: %v", err)
		}
		tarFile.Close()

		// Extract the tar archive to a new location
		extractDir := filepath.Join(tempDir, "extracted_empty_dir")
		if err := os.Mkdir(extractDir, 0755); err != nil {
			t.Fatalf("Failed to create extraction directory: %v", err)
		}

		tarFileRead, err := os.Open(tarPath)
		if err != nil {
			t.Fatalf("Failed to open tar file for reading: %v", err)
		}
		defer tarFileRead.Close()

		if err := ExtractTarArchive(tarFileRead, extractDir); err != nil {
			t.Fatalf("Failed to extract tar archive: %v", err)
		}

		// Check that the empty directory exists and is empty
		extractedEmptyDir := filepath.Join(extractDir, filepath.Base(emptyDir))
		info, err := os.Stat(extractedEmptyDir)
		if err != nil {
			t.Fatalf("Extracted empty directory does not exist: %v", err)
		}
		if !info.IsDir() {
			t.Fatalf("Extracted path is not a directory")
		}
		entries, err := os.ReadDir(extractedEmptyDir)
		if err != nil {
			t.Fatalf("Failed to read extracted empty directory: %v", err)
		}
		if len(entries) != 0 {
			t.Fatalf("Extracted empty directory is not empty")
		}
	})

```

This change assumes the existence of `CreateDirTarArchive` and `ExtractTarArchive` functions in your codebase. If their signatures differ or they do not exist, you will need to adjust the calls accordingly.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@ilopezluna ilopezluna requested a review from p1-0tr October 16, 2025 13:17
Copilot AI review requested due to automatic review settings October 16, 2025 17:20
ilopezluna and others added 2 commits October 16, 2025 19:28
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings October 16, 2025 17:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copilot AI review requested due to automatic review settings October 17, 2025 07:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +110 to +119
// Validate dir-tar paths are relative (not absolute)
for _, dirPath := range opts.dirTarPaths {
if filepath.IsAbs(dirPath) {
return fmt.Errorf(
"dir-tar path must be relative, got absolute path: %s\n\n"+
"See 'docker model package --help' for more information",
dirPath,
)
}
}
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Absolute-path validation for --dir-tar is duplicated here and again during processing. Consider centralizing this check (e.g., in PreRunE only or a small helper) to avoid drift and keep validation in one place.

Copilot uses AI. Check for mistakes.
Comment on lines +249 to +253
for _, relDirPath := range opts.dirTarPaths {
// Reject absolute paths
if filepath.IsAbs(relDirPath) {
return fmt.Errorf("dir-tar path must be relative: %s", relDirPath)
}
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] This re-enforces the absolute-path check already performed in PreRunE. To reduce duplication, either rely on the earlier validation or extract a shared helper used by both call sites.

Copilot uses AI. Check for mistakes.
Comment on lines +260 to +266
relPath, err := filepath.Rel(baseDir, fullDirPath)
if err != nil {
return fmt.Errorf("dir-tar path %q could not be validated: %w", relDirPath, err)
}
// Check if the relative path tries to escape the base directory
if relPath == ".." || len(relPath) >= 3 && relPath[:3] == ".."+string(filepath.Separator) {
return fmt.Errorf("dir-tar path %q escapes base directory", relDirPath)
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The manual prefix check for '..' is a bit brittle to read. Consider using strings.HasPrefix(relPath, ".."+string(filepath.Separator)) or extracting a small helper for clarity and consistency with similar validations.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings October 17, 2025 07:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

c.Flags().StringVar(&opts.safetensorsDir, "safetensors-dir", "", "absolute path to directory containing safetensors files and config")
c.Flags().StringVar(&opts.chatTemplatePath, "chat-template", "", "absolute path to chat template file (must be Jinja format)")
c.Flags().StringArrayVarP(&opts.licensePaths, "license", "l", nil, "absolute path to a license file")
c.Flags().StringArrayVar(&opts.dirTarPaths, "dir-tar", nil, "relative path to directory to package as tar (can be specified multiple times)")
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify what 'relative' is relative to. Suggest: 'relative path (from --safetensors-dir, or from the GGUF file's directory) to a directory to package as a tar archive; can be specified multiple times'.

Suggested change
c.Flags().StringArrayVar(&opts.dirTarPaths, "dir-tar", nil, "relative path to directory to package as tar (can be specified multiple times)")
c.Flags().StringArrayVar(&opts.dirTarPaths, "dir-tar", nil, "relative path (from --safetensors-dir, or from the GGUF file's directory) to a directory to package as a tar archive; can be specified multiple times")

Copilot uses AI. Check for mistakes.
|:--------------------|:--------------|:--------|:---------------------------------------------------------------------------------------|
| `--chat-template` | `string` | | absolute path to chat template file (must be Jinja format) |
| `--context-size` | `uint64` | `0` | context size in tokens |
| `--dir-tar` | `stringArray` | | relative path to directory to package as tar (can be specified multiple times) |
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help text doesn't specify what the path is relative to. Suggest: 'relative (to --safetensors-dir, or to the directory containing --gguf) path to a directory to package as a tar archive; can be specified multiple times'.

Suggested change
| `--dir-tar` | `stringArray` | | relative path to directory to package as tar (can be specified multiple times) |
| `--dir-tar` | `stringArray` | | relative (to --safetensors-dir, or to the directory containing --gguf) path to a directory to package as a tar archive; can be specified multiple times |

Copilot uses AI. Check for mistakes.
value_type: stringArray
default_value: '[]'
description: |
relative path to directory to package as tar (can be specified multiple times)
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify the base for the relative path. Suggest:\n'relative path (from --safetensors-dir, or from the GGUF file's directory) to a directory to package as a tar archive; can be specified multiple times'.

Suggested change
relative path to directory to package as tar (can be specified multiple times)
relative path (from --safetensors-dir, or from the GGUF file's directory) to a directory to package as a tar archive; can be specified multiple times

Copilot uses AI. Check for mistakes.
Comment on lines +105 to +116
// WithDirTar adds a directory tar archive to the artifact.
// Multiple directory tar archives can be added by calling this method multiple times.
func (b *Builder) WithDirTar(path string) (*Builder, error) {
dirTarLayer, err := partial.NewLayer(path, types.MediaTypeDirTar)
if err != nil {
return nil, fmt.Errorf("dir tar layer from %q: %w", path, err)
}
return &Builder{
model: mutate.AppendLayers(b.model, dirTarLayer),
}, nil
}

Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The method name WithDirTar suggests it accepts a directory path, but it actually expects a tar file path. To avoid confusion for API consumers, either rename to WithDirTarArchive(path string) or accept a directory path and perform the archiving internally.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@doringeman doringeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ilopezluna ilopezluna merged commit bc843ca into main Oct 17, 2025
9 checks passed
@ilopezluna ilopezluna deleted the package-subfolders branch October 17, 2025 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants