allow writing and assembling unzipped directory structure#73
allow writing and assembling unzipped directory structure#73a2800276 wants to merge 2 commits intogo-shiori:mainfrom
Conversation
Added `WriteDirectory` and `AssembleDirectory` methods to allow working with unzipped ePub directory structures.
There was a problem hiding this comment.
Pull request overview
This pull request adds functionality to write and assemble unzipped EPUB directory structures, addressing issue #72. The changes allow users to export EPUBs as exploded directory structures for inspection and manual editing, and to reassemble those directories back into zipped EPUB files.
Changes:
- Refactored common EPUB writing logic into a new
writeContentmethod shared byWriteToandWriteDirectory - Added
WriteDirectorymethod to export EPUB content as an unzipped directory structure - Added
AssembleDirectoryfunction to zip an exploded EPUB directory back into an EPUB file - Added comprehensive test coverage for the new functionality including error cases and round-trip testing
- Updated README to document the new capabilities
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| write.go | Core implementation: refactored writing logic, added WriteDirectory method, AssembleDirectory function, and helper functions |
| write_test.go | Test coverage for new functionality including happy path, error cases, and round-trip validation |
| README.md | Documentation updates to list new features |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| srcFile, err := filesystem.Open(path) | ||
| if err != nil { | ||
| return fmt.Errorf("Error opening source file %s: %w", path, err) | ||
| } | ||
| defer srcFile.Close() | ||
|
|
||
| destFile, err := os.Create(destPath) | ||
| if err != nil { | ||
| return fmt.Errorf("Error creating destination file %s: %w", destPath, err) | ||
| } | ||
| defer destFile.Close() |
There was a problem hiding this comment.
Resource leak risk in walk function: The deferred Close() calls for srcFile and destFile inside the walk function closure may not execute until the entire walk completes, potentially causing resource exhaustion when processing directories with many files. Consider explicitly closing files after the copy operation or restructuring to ensure timely resource cleanup.
There was a problem hiding this comment.
This is not correct, defer executes when the surrounding function returns.
| return fmt.Errorf("error creating zip writer: %w", err) | ||
| } | ||
|
|
||
| r, err := os.Open(path) | ||
| if err != nil { | ||
| return fmt.Errorf("error opening file %v being added to EPUB: %w", path, err) | ||
| } | ||
| defer func() { | ||
| if err := r.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| }() | ||
|
|
||
| _, err = io.Copy(w, r) | ||
| if err != nil { | ||
| return fmt.Errorf("error copying contents of file being added EPUB: %w", err) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| // Add the mimetype file first | ||
| mimetypeFilePath := filepath.Join(rootEpubDir, mimetypeFilename) | ||
| mimetypeInfo, err := os.Stat(mimetypeFilePath) | ||
| if err != nil { | ||
| if err := z.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| return counter.Total, fmt.Errorf("unable to get FileInfo for mimetype file: %w", err) | ||
| } | ||
| err = addFileToZip(mimetypeFilePath, fileInfoToDirEntry(mimetypeInfo), nil) | ||
| if err != nil { | ||
| if err := z.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| return counter.Total, fmt.Errorf("unable to add mimetype file to EPUB: %w", err) | ||
| } | ||
|
|
||
| skipMimetypeFile = true | ||
|
|
||
| err = filepath.WalkDir(rootEpubDir, addFileToZip) | ||
| if err != nil { | ||
| if err := z.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| return counter.Total, fmt.Errorf("unable to add file to EPUB: %w", err) |
There was a problem hiding this comment.
Inconsistent error message style: The error messages in this function use lowercase initial letters (lines 398, 403, 413, 425, 432, 442) which is inconsistent with the codebase convention of capitalizing error messages (e.g., "Error creating temp directory" at lines 116, 145, 185). Additionally, line 403 has a grammatical error: "being added to EPUB" should be "being added to the EPUB" or "being added to EPUB". Line 413 has the same issue: "being added EPUB" should be "being added to the EPUB".
| // writeEpubFromDirectory is a helper that zips an exploded EPUB directory to an io.Writer. | ||
| // It's similar to writeEpub but works with an existing directory instead of a temp directory. | ||
| func writeEpubFromDirectory(rootEpubDir string, dst io.Writer) (int64, error) { | ||
| counter := &writeCounter{} | ||
| teeWriter := io.MultiWriter(counter, dst) | ||
|
|
||
| z := zip.NewWriter(teeWriter) | ||
|
|
||
| skipMimetypeFile := false | ||
|
|
||
| // addFileToZip adds the file present at path to the zip archive | ||
| addFileToZip := func(path string, d fs.DirEntry, err error) error { | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| // Get the path of the file relative to the folder we're zipping | ||
| relativePath, err := filepath.Rel(rootEpubDir, path) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| relativePath = filepath.ToSlash(relativePath) | ||
|
|
||
| // Only include regular files, not directories | ||
| info, err := d.Info() | ||
| if err != nil { | ||
| return err | ||
| } | ||
| if !info.Mode().IsRegular() { | ||
| return nil | ||
| } | ||
|
|
||
| var w io.Writer | ||
| if filepath.FromSlash(path) == filepath.Join(rootEpubDir, mimetypeFilename) { | ||
| // Skip the mimetype file if it's already been written | ||
| if skipMimetypeFile { | ||
| return nil | ||
| } | ||
| // The mimetype file must be uncompressed according to the EPUB spec | ||
| w, err = z.CreateHeader(&zip.FileHeader{ | ||
| Name: relativePath, | ||
| Method: zip.Store, | ||
| }) | ||
| } else { | ||
| w, err = z.Create(relativePath) | ||
| } | ||
| if err != nil { | ||
| return fmt.Errorf("error creating zip writer: %w", err) | ||
| } | ||
|
|
||
| r, err := os.Open(path) | ||
| if err != nil { | ||
| return fmt.Errorf("error opening file %v being added to EPUB: %w", path, err) | ||
| } | ||
| defer func() { | ||
| if err := r.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| }() | ||
|
|
||
| _, err = io.Copy(w, r) | ||
| if err != nil { | ||
| return fmt.Errorf("error copying contents of file being added EPUB: %w", err) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| // Add the mimetype file first | ||
| mimetypeFilePath := filepath.Join(rootEpubDir, mimetypeFilename) | ||
| mimetypeInfo, err := os.Stat(mimetypeFilePath) | ||
| if err != nil { | ||
| if err := z.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| return counter.Total, fmt.Errorf("unable to get FileInfo for mimetype file: %w", err) | ||
| } | ||
| err = addFileToZip(mimetypeFilePath, fileInfoToDirEntry(mimetypeInfo), nil) | ||
| if err != nil { | ||
| if err := z.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| return counter.Total, fmt.Errorf("unable to add mimetype file to EPUB: %w", err) | ||
| } | ||
|
|
||
| skipMimetypeFile = true | ||
|
|
||
| err = filepath.WalkDir(rootEpubDir, addFileToZip) | ||
| if err != nil { | ||
| if err := z.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| return counter.Total, fmt.Errorf("unable to add file to EPUB: %w", err) | ||
| } | ||
|
|
||
| err = z.Close() | ||
| return counter.Total, err | ||
| } |
There was a problem hiding this comment.
Code duplication: The writeEpubFromDirectory function contains significant code duplication with the existing writeEpub function (lines 451-546). The main difference is that writeEpubFromDirectory uses os.Open and filepath.WalkDir while writeEpub uses filesystem.Open and fs.WalkDir(filesystem, ...). Consider refactoring to extract common logic or have writeEpubFromDirectory accept a filesystem parameter to reduce duplication and improve maintainability.
|
|
||
| // The order in which the file is assembled matters for some files. | ||
| // Containers, CSS, media, and sections can only be written after the | ||
| // necessary folders are created (createEPubFolders). |
There was a problem hiding this comment.
Typo in comment: "createEPubFolders" should be "createEpubFolders" to match the actual function name (note the lowercase 'p' in 'Epub').
| // necessary folders are created (createEPubFolders). | |
| // necessary folders are created (createEpubFolders). |
| defer func() { | ||
| if err := r.Close(); err != nil { | ||
| log.Println(err) | ||
| } | ||
| }() | ||
|
|
||
| _, err = io.Copy(w, r) |
There was a problem hiding this comment.
Resource leak risk in walk function: The deferred Close() call for the file reader inside the walk function closure may not execute until the entire walk completes, potentially causing resource exhaustion when processing directories with many files. Consider explicitly closing the file after the copy operation or restructuring to ensure timely resource cleanup.
| defer func() { | |
| if err := r.Close(); err != nil { | |
| log.Println(err) | |
| } | |
| }() | |
| _, err = io.Copy(w, r) | |
| _, err = io.Copy(w, r) | |
| if cerr := r.Close(); cerr != nil { | |
| log.Println(cerr) | |
| } |
There was a problem hiding this comment.
This seems to be a hallucination, or I am not understanding correctly. defer executes when the immediate enclosing scope is left, so there should be no risk of resource exhaustion.
| func TestWriteDirectory(t *testing.T) { | ||
| e, err := NewEpub("Test WriteDirectory") | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| e.SetAuthor("Test Author") | ||
| _, err = e.AddSection("<h1>Chapter 1</h1><p>Content here.</p>", "Chapter 1", "", "") | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| tempDir := t.TempDir() | ||
| println(tempDir) | ||
| epubDir := filepath.Join(tempDir, "test-epub") | ||
|
|
||
| err = e.WriteDirectory(epubDir) | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| // Verify the directory structure was created | ||
| expectedPaths := []string{ | ||
| filepath.Join(epubDir, "mimetype"), | ||
| filepath.Join(epubDir, "META-INF", "container.xml"), | ||
| filepath.Join(epubDir, "EPUB", "package.opf"), | ||
| filepath.Join(epubDir, "EPUB", "xhtml"), | ||
| } | ||
|
|
||
| for _, path := range expectedPaths { | ||
| if _, err := os.Stat(path); os.IsNotExist(err) { | ||
| t.Errorf("Expected path does not exist: %s", path) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| func TestAssembleDirectory(t *testing.T) { | ||
| // First create an exploded EPUB directory | ||
| e, err := NewEpub("Test AssembleDirectory") | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| e.SetAuthor("Test Author") | ||
| _, err = e.AddSection("<h1>Chapter 1</h1><p>Content here.</p>", "Chapter 1", "", "") | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| tempDir := t.TempDir() | ||
| epubDir := filepath.Join(tempDir, "test-epub") | ||
|
|
||
| err = e.WriteDirectory(epubDir) | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| // Now assemble it into a zipped EPUB | ||
| epubFile := filepath.Join(tempDir, "test.epub") | ||
| err = AssembleDirectory(epubDir, epubFile) | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| // Verify the EPUB file was created | ||
| info, err := os.Stat(epubFile) |
There was a problem hiding this comment.
Missing test coverage for error cases: The WriteDirectory function lacks test coverage for error scenarios. Consider adding tests for cases such as: inability to create the destination directory due to permission issues, failure during the copy operation, or when writeContent fails. This would improve robustness and match the error testing pattern used for other write methods.
This turned up a small bug in the WriteTo chain. Writing an epub several times readds existing entries to ToC and spine. This is also fixed. Write calls now reset ToC, etc. prior to writing.
Added
WriteDirectoryandAssembleDirectorymethods to allow working with unzipped ePub directory structures.As described in #72
The changes are backwards compatible and tests are included for the functionality.