-
Notifications
You must be signed in to change notification settings - Fork 79
Package subfolders #249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Package subfolders #249
Conversation
Reviewer's GuideThis PR implements support for including arbitrary directories as tar archives in model packages by adding a new CLI flag, extending the packaging builder and media types, providing directory tar creation and extraction logic, updating unpack routines, and adding corresponding documentation and tests. Sequence diagram for packaging model with subfolders using --dir-tarsequenceDiagram
actor User
participant CLI
participant Packaging
User->>CLI: Run 'model package' with --dir-tar options
CLI->>Packaging: CreateDirectoryTarArchive for each specified directory
Packaging-->>CLI: Return path to tar archive
CLI->>Packaging: WithDirTar to add tar archive as layer
Packaging-->>CLI: Model package includes directory tar layers
CLI-->>User: Model package created
Sequence diagram for unpacking model with subfolder tar archivessequenceDiagram
actor User
participant Unpack
participant Bundle
User->>Unpack: Unpack model package
Unpack->>Bundle: unpackDirTarArchives
Bundle->>Unpack: Provide directory tar layers
Unpack->>Bundle: extractTarArchiveFromReader for each tar layer
Bundle-->>Unpack: Subfolders restored in model directory
Unpack-->>User: Model unpacked with subfolders
Entity relationship diagram for model package layers including directory tar archiveserDiagram
MODEL_PACKAGE ||--o{ LAYER : contains
LAYER }|--|| MEDIA_TYPE : has
MEDIA_TYPE ||--|| MEDIA_TYPE_DIR_TAR : is
MODEL_PACKAGE {
string id
string name
}
LAYER {
string path
MediaType mediaType
}
MEDIA_TYPE {
string type
}
MEDIA_TYPE_DIR_TAR {
string type
}
Class diagram for new and updated packaging and builder typesclassDiagram
class Builder {
+WithConfigArchive(path string) (*Builder, error)
+WithDirTar(path string) (*Builder, error)
}
class packaging {
+CreateDirectoryTarArchive(dirPath string) (string, error)
}
class ModelArtifact {
+Layers() ([]Layer, error)
}
class Layer {
+MediaType() (MediaType, error)
+Uncompressed() (io.ReadCloser, error)
}
class MediaTypeDirTar
Builder --> packaging : uses
Builder --> ModelArtifact : appends Layer
ModelArtifact --> Layer : contains
Layer --> MediaTypeDirTar : MediaType
MediaTypeDirTar : application/vnd.docker.ai.dir.tar
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds support for packaging arbitrary subdirectories as tar archives and including them as layers in the model artifact, then unpacking them into the bundle on extraction.
- Introduces a new media type for directory tar layers.
- Adds CLI flag --dir-tar to include one or more directories (relative to the model base) as tars.
- Implements creation, addition, and unpacking of these tar layers, plus unit tests for tar creation.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/distribution/types/config.go | Adds MediaTypeDirTar to represent directory tar layers. |
| pkg/distribution/packaging/safetensors.go | Adds CreateDirectoryTarArchive utility to create tars from directories. |
| pkg/distribution/packaging/dirtar_test.go | Tests the creation and contents of directory tar archives and error cases. |
| pkg/distribution/internal/bundle/unpack.go | Adds unpackDirTarArchives and calls it during bundle unpack to extract dir tar layers. |
| pkg/distribution/builder/builder.go | Adds Builder.WithDirTar to append dir tar layers to the model artifact. |
| cmd/cli/commands/package.go | Adds --dir-tar flag and logic to create tars and attach them to the build, with temp-file cleanup. |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
…ating tar archives from directories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Add unit or integration tests for unpackDirTarArchives to verify that directory tar layers are correctly extracted during model unpacking.
- In unpackDirTarArchives, defer closing the uncompressed reader right after calling layer.Uncompressed() to simplify cleanup and avoid potential resource leaks.
- Consolidate extractTarArchive and extractTarArchiveFromReader to remove duplicate code and clarify the extraction API.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Add unit or integration tests for unpackDirTarArchives to verify that directory tar layers are correctly extracted during model unpacking.
- In unpackDirTarArchives, defer closing the uncompressed reader right after calling layer.Uncompressed() to simplify cleanup and avoid potential resource leaks.
- Consolidate extractTarArchive and extractTarArchiveFromReader to remove duplicate code and clarify the extraction API.
## Individual Comments
### Comment 1
<location> `pkg/distribution/packaging/dirtar.go:43-44` </location>
<code_context>
+ tw := tar.NewWriter(tmpFile)
+
+ // Walk the directory tree
+ err = filepath.Walk(dirPath, func(path string, info os.FileInfo, err error) error {
+ // Skip symlinks - they're not needed for model distribution and are
+ // skipped during extraction for security reasons
+ if info.Mode()&os.ModeSymlink != 0 {
</code_context>
<issue_to_address>
**issue (bug_risk):** Symlink skipping logic may not handle broken symlinks correctly.
Check for errors before accessing info to prevent panics with broken symlinks, and handle them explicitly to ensure robust packaging.
</issue_to_address>
### Comment 2
<location> `pkg/distribution/packaging/dirtar.go:74-75` </location>
<code_context>
+ }
+
+ // If it's a file, write its contents
+ if !info.IsDir() {
+ file, err := os.Open(path)
+ if err != nil {
+ return fmt.Errorf("open file %s: %w", path, err)
</code_context>
<issue_to_address>
**issue (bug_risk):** Deferring file.Close() in a loop can lead to resource exhaustion.
Explicitly close each file after processing to avoid exhausting file descriptors when handling many files.
</issue_to_address>
### Comment 3
<location> `pkg/distribution/internal/bundle/unpack.go:254-259` </location>
<code_context>
+
+ // Iterate through layers and extract directory tar archives
+ for _, layer := range layers {
+ mediaType, err := layer.MediaType()
+ if err != nil {
+ continue
+ }
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Silent continue on MediaType error may mask underlying issues.
Consider logging or reporting the error from layer.MediaType() to aid in debugging malformed layers.
```suggestion
// Iterate through layers and extract directory tar archives
for _, layer := range layers {
mediaType, err := layer.MediaType()
if err != nil {
log.Printf("error getting media type for layer: %v", err)
continue
}
```
</issue_to_address>
### Comment 4
<location> `pkg/distribution/internal/bundle/unpack.go:273-274` </location>
<code_context>
+ }
+
+ // Stream directly to tar extraction - no temp file needed
+ if err := extractTarArchiveFromReader(uncompressed, modelDir); err != nil {
+ uncompressed.Close()
+ return fmt.Errorf("extract directory tar archive: %w", err)
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential resource leak if extractTarArchiveFromReader returns nil error.
Consider using defer to close uncompressed, ensuring the resource is released even if an early return or panic occurs.
</issue_to_address>
### Comment 5
<location> `pkg/distribution/packaging/dirtar_test.go:11` </location>
<code_context>
+ "testing"
+)
+
+func TestCreateDirTarArchive(t *testing.T) {
+ // Create a temporary directory with some test files
+ tempDir, err := os.MkdirTemp("", "dirtar-test-*")
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test case for empty directories.
Please add a test to confirm that empty directories are archived and extracted correctly, preserving their structure.
Suggested implementation:
```golang
func TestCreateDirTarArchive(t *testing.T) {
// Create a temporary directory with some test files
tempDir, err := os.MkdirTemp("", "dirtar-test-*")
if err != nil {
t.Fatalf("Failed to create temp dir: %v", err)
}
defer os.RemoveAll(tempDir)
// Create test directory structure
testDir := filepath.Join(tempDir, "test_directory")
if err := os.MkdirAll(filepath.Join(testDir, "subdir"), 0755); err != nil {
t.Fatalf("Failed to create test directory: %v", err)
}
// Test archiving and extracting an empty directory
t.Run("empty directory is archived and extracted correctly", func(t *testing.T) {
emptyDir := filepath.Join(tempDir, "empty_dir")
if err := os.Mkdir(emptyDir, 0755); err != nil {
t.Fatalf("Failed to create empty directory: %v", err)
}
// Create tar archive of the empty directory
tarPath := filepath.Join(tempDir, "empty_dir.tar")
tarFile, err := os.Create(tarPath)
if err != nil {
t.Fatalf("Failed to create tar file: %v", err)
}
defer tarFile.Close()
if err := CreateDirTarArchive(emptyDir, tarFile); err != nil {
t.Fatalf("Failed to archive empty directory: %v", err)
}
tarFile.Close()
// Extract the tar archive to a new location
extractDir := filepath.Join(tempDir, "extracted_empty_dir")
if err := os.Mkdir(extractDir, 0755); err != nil {
t.Fatalf("Failed to create extraction directory: %v", err)
}
tarFileRead, err := os.Open(tarPath)
if err != nil {
t.Fatalf("Failed to open tar file for reading: %v", err)
}
defer tarFileRead.Close()
if err := ExtractTarArchive(tarFileRead, extractDir); err != nil {
t.Fatalf("Failed to extract tar archive: %v", err)
}
// Check that the empty directory exists and is empty
extractedEmptyDir := filepath.Join(extractDir, filepath.Base(emptyDir))
info, err := os.Stat(extractedEmptyDir)
if err != nil {
t.Fatalf("Extracted empty directory does not exist: %v", err)
}
if !info.IsDir() {
t.Fatalf("Extracted path is not a directory")
}
entries, err := os.ReadDir(extractedEmptyDir)
if err != nil {
t.Fatalf("Failed to read extracted empty directory: %v", err)
}
if len(entries) != 0 {
t.Fatalf("Extracted empty directory is not empty")
}
})
```
This change assumes the existence of `CreateDirTarArchive` and `ExtractTarArchive` functions in your codebase. If their signatures differ or they do not exist, you will need to adjust the calls accordingly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
…ation, it's already done via defer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| // Validate dir-tar paths are relative (not absolute) | ||
| for _, dirPath := range opts.dirTarPaths { | ||
| if filepath.IsAbs(dirPath) { | ||
| return fmt.Errorf( | ||
| "dir-tar path must be relative, got absolute path: %s\n\n"+ | ||
| "See 'docker model package --help' for more information", | ||
| dirPath, | ||
| ) | ||
| } | ||
| } |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Absolute-path validation for --dir-tar is duplicated here and again during processing. Consider centralizing this check (e.g., in PreRunE only or a small helper) to avoid drift and keep validation in one place.
| for _, relDirPath := range opts.dirTarPaths { | ||
| // Reject absolute paths | ||
| if filepath.IsAbs(relDirPath) { | ||
| return fmt.Errorf("dir-tar path must be relative: %s", relDirPath) | ||
| } |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] This re-enforces the absolute-path check already performed in PreRunE. To reduce duplication, either rely on the earlier validation or extract a shared helper used by both call sites.
| relPath, err := filepath.Rel(baseDir, fullDirPath) | ||
| if err != nil { | ||
| return fmt.Errorf("dir-tar path %q could not be validated: %w", relDirPath, err) | ||
| } | ||
| // Check if the relative path tries to escape the base directory | ||
| if relPath == ".." || len(relPath) >= 3 && relPath[:3] == ".."+string(filepath.Separator) { | ||
| return fmt.Errorf("dir-tar path %q escapes base directory", relDirPath) |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The manual prefix check for '..' is a bit brittle to read. Consider using strings.HasPrefix(relPath, ".."+string(filepath.Separator)) or extracting a small helper for clarity and consistency with similar validations.
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| c.Flags().StringVar(&opts.safetensorsDir, "safetensors-dir", "", "absolute path to directory containing safetensors files and config") | ||
| c.Flags().StringVar(&opts.chatTemplatePath, "chat-template", "", "absolute path to chat template file (must be Jinja format)") | ||
| c.Flags().StringArrayVarP(&opts.licensePaths, "license", "l", nil, "absolute path to a license file") | ||
| c.Flags().StringArrayVar(&opts.dirTarPaths, "dir-tar", nil, "relative path to directory to package as tar (can be specified multiple times)") |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify what 'relative' is relative to. Suggest: 'relative path (from --safetensors-dir, or from the GGUF file's directory) to a directory to package as a tar archive; can be specified multiple times'.
| c.Flags().StringArrayVar(&opts.dirTarPaths, "dir-tar", nil, "relative path to directory to package as tar (can be specified multiple times)") | |
| c.Flags().StringArrayVar(&opts.dirTarPaths, "dir-tar", nil, "relative path (from --safetensors-dir, or from the GGUF file's directory) to a directory to package as a tar archive; can be specified multiple times") |
| |:--------------------|:--------------|:--------|:---------------------------------------------------------------------------------------| | ||
| | `--chat-template` | `string` | | absolute path to chat template file (must be Jinja format) | | ||
| | `--context-size` | `uint64` | `0` | context size in tokens | | ||
| | `--dir-tar` | `stringArray` | | relative path to directory to package as tar (can be specified multiple times) | |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The help text doesn't specify what the path is relative to. Suggest: 'relative (to --safetensors-dir, or to the directory containing --gguf) path to a directory to package as a tar archive; can be specified multiple times'.
| | `--dir-tar` | `stringArray` | | relative path to directory to package as tar (can be specified multiple times) | | |
| | `--dir-tar` | `stringArray` | | relative (to --safetensors-dir, or to the directory containing --gguf) path to a directory to package as a tar archive; can be specified multiple times | |
| value_type: stringArray | ||
| default_value: '[]' | ||
| description: | | ||
| relative path to directory to package as tar (can be specified multiple times) |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify the base for the relative path. Suggest:\n'relative path (from --safetensors-dir, or from the GGUF file's directory) to a directory to package as a tar archive; can be specified multiple times'.
| relative path to directory to package as tar (can be specified multiple times) | |
| relative path (from --safetensors-dir, or from the GGUF file's directory) to a directory to package as a tar archive; can be specified multiple times |
| // WithDirTar adds a directory tar archive to the artifact. | ||
| // Multiple directory tar archives can be added by calling this method multiple times. | ||
| func (b *Builder) WithDirTar(path string) (*Builder, error) { | ||
| dirTarLayer, err := partial.NewLayer(path, types.MediaTypeDirTar) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("dir tar layer from %q: %w", path, err) | ||
| } | ||
| return &Builder{ | ||
| model: mutate.AppendLayers(b.model, dirTarLayer), | ||
| }, nil | ||
| } | ||
|
|
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The method name WithDirTar suggests it accepts a directory path, but it actually expects a tar file path. To avoid confusion for API consumers, either rename to WithDirTarArchive(path string) or accept a directory path and perform the archiving internally.
doringeman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This pull request adds support for including arbitrary directories as tar archives when packaging a model, allowing users to specify one or more directories to be preserved and included in the package. It introduces a new CLI flag, implements the logic for creating and handling these tar archives, and ensures they are properly unpacked on extraction.
Summary by Sourcery
Enable packaging and unpacking of arbitrary subfolders by archiving specified directories into tar layers, integrating support end-to-end in the CLI, builder, unpack logic, documentation, and tests
New Features:
Enhancements:
Documentation:
Tests: