
Desktop: Importing from OneNote: Support large attachments #15195

Open
personalizedrefrigerator wants to merge 19 commits into laurent22:dev from personalizedrefrigerator:pr/desktop/onenote-import/improve-large-attachment-support

Conversation

@personalizedrefrigerator
Collaborator

@personalizedrefrigerator personalizedrefrigerator commented Apr 24, 2026

Problem

.one files with large (e.g. multi-gigabyte) attachments failed to import. Previously, the onenote-converter package loaded full attachments into memory before writing them to disk.

Solution

Read attachments in chunks.

Follow-up to #15117.
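The chunked approach can be sketched as follows. This is a minimal, hypothetical illustration, not the converter's actual code; `CHUNK_SIZE` and the function name are made up for the example:

```rust
use std::io::{Read, Write};

// Illustrative chunk size; the real converter may use a different value.
const CHUNK_SIZE: usize = 64 * 1024;

/// Copies an attachment from `reader` to `writer` in fixed-size chunks, so
/// memory use stays bounded by CHUNK_SIZE regardless of attachment size.
fn copy_in_chunks(reader: &mut dyn Read, writer: &mut dyn Write) -> std::io::Result<u64> {
    let mut buffer = vec![0u8; CHUNK_SIZE];
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut buffer)?;
        if n == 0 {
            break; // EOF reached
        }
        writer.write_all(&buffer[..n])?;
        total += n as u64;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // In-memory stand-ins for the attachment stream and the destination file.
    let data = vec![42u8; 200_000];
    let mut src: &[u8] = &data;
    let mut dst = Vec::new();
    let copied = copy_in_chunks(&mut src, &mut dst)?;
    assert_eq!(copied, 200_000);
    assert_eq!(dst, data);
    println!("copied {copied} bytes");
    Ok(())
}
```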

Testing

  1. Attach a large (2.5-ish GiB) file to a note within OneNote.
  2. Export the page containing the note to a .one file.
  3. Import the .one file from Joplin.
  4. Verify that Joplin imports the note successfully.
Before: Screencast.from.2026-04-24.12-25-29.webm
After: Screencast.from.2026-04-24.12-37-22.webm

Comment on lines +156 to +159
let png_data = fs::read(output_dir.join("Printout").join("test4_1.pdf.png"))
.expect("should read the PNG");
assert_eq!(png_data[0..4], [0x89, 0x50, 0x4E, 0x47], "PNG should have the correct initial bytes");
assert_eq!(png_data.len(), 14_005, "PNG should have the correct byte length");
Collaborator Author


I've verified that these values are consistent with the output before this pull request.

@coderabbitai coderabbitai Bot added bug It's a bug desktop All desktop platforms renderer About the note renderer import Related to importing files such as ENEX, JEX, etc. performance Performance issues labels Apr 24, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (4)
packages/onenote-converter/renderer/src/page/image.rs (1)

95-101: Optional: accept &mut dyn Read instead of &mut Box<dyn Read>.

Minor nit — read_file_start doesn't need the ownership shape of the Box; taking &mut dyn Read would make it trivially reusable with other readers (e.g. in tests) without boxing. No behaviour change.

♻️ Optional refactor
-fn read_file_start(reader: &mut Box<dyn Read>) -> Result<Vec<u8>> {
+fn read_file_start(reader: &mut dyn Read) -> Result<Vec<u8>> {
     let size: usize = 1024;
     let mut sub_reader = reader.by_ref().take(size as u64);
     let mut bytes = Vec::with_capacity(size);
     sub_reader.read_to_end(&mut bytes)?;
     Ok(bytes)
 }

Call site would become read_file_start(&mut *reader).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/renderer/src/page/image.rs` around lines 95 - 101,
Change the function signature of read_file_start from taking &mut Box<dyn Read>
to &mut dyn Read and update any call sites to pass the inner reader (e.g.
read_file_start(&mut *reader) when you have a Box). Inside read_file_start keep
the existing logic (use reader.by_ref().take(...), read_to_end, etc.) but
operate on the &mut dyn Read parameter; this removes unnecessary boxing and
makes the function usable with any mutable reader type without ownership
changes.
packages/onenote-converter/parser-utils/src/reader.rs (2)

338-355: Nice regression test.

Good coverage for the two invariants that matter here: the data ref reads from the offset captured at as_data_ref time, and consuming it doesn't perturb the parent Reader's position. Worth adding a sibling case for the ReaderData::File branch later (e.g. via Reader::try_from(Box<dyn FileHandle>)) since the BufferRef and File paths are implemented quite differently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser-utils/src/reader.rs` around lines 338 -
355, Add a sibling test that exercises the ReaderData::File branch to assert the
same two invariants (as_data_ref captures offset and reading the DataRef does
not advance the parent Reader): construct a Reader from a boxed FileHandle via
Reader::try_from(Box<dyn FileHandle>) (or a test FileHandle impl), call
reader.advance(...) then let data_ref = reader.as_data_ref(n).unwrap().read(),
consume data_ref into a buffer and assert its contents match the captured slice,
then assert reader.get_u8() returns the next byte; target symbols: Reader,
Reader::try_from, Reader::as_data_ref, ReaderData::File and BufferRef.

280-309: Save/seek/restore pattern preserves the reader invariant; one tiny nit.

Wrapping the read in a closure so the restore-seek always runs is the right shape for keeping the ReaderData::File invariant (file cursor == Reader.data_offset) intact.

Nit: the early-out if offset > self.end_offset uses strict >, so the exact-EOF case (offset == end_offset) falls through into a no-op read that still performs two seeks on the underlying file handle. Using >= would short-circuit cleanly at the boundary without changing any observable behaviour.

Suggested micro-tweak
-        let offset = self.start_offset;
-        if offset > self.end_offset {
-            return Ok(0);
-        }
+        let offset = self.start_offset;
+        if offset >= self.end_offset {
+            return Ok(0);
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser-utils/src/reader.rs` around lines 280 -
309, In ReaderFilePointer's Read impl (the read method), change the early-out
comparison from offset > self.end_offset to offset >= self.end_offset so the
exact-EOF case (start_offset == end_offset) returns immediately and avoids the
extra seek/restore; update the check that computes remaining accordingly so the
rest of the logic (the closure that seeks, reads, and restores file position and
the update of self.start_offset) remains unchanged.
packages/onenote-converter/parser-utils/src/file_api/wasm_driver.rs (1)

151-176: Streaming writer looks correct; a couple of optional robustness notes.

The truncate-then-append loop, the EOF termination, and the gradual buffer growth (capped at 50 MiB) all look right and correctly address the multi-GiB attachment case.

Two small, optional refinements to consider (not blocking):

  • data.read(...)? propagates ErrorKind::Interrupted as a hard error. For long streams this is worth retrying — you can get the same effect for free by delegating the read loop to std::io::copy through a small Write adapter that forwards to append_file, since std::io::copy retries Interrupted internally.
  • The destination file is truncated up front and then grown in place via appends, so a mid-stream failure leaves a partially-written attachment on disk. If that matters for imports, consider streaming into a sibling temp path and renaming on success. This mirrors the pre-existing write_file behaviour, so it's not a regression introduced here.
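The first suggestion can be sketched like this — a hypothetical adapter, with a `Vec<u8>` standing in for the side effect of the driver's real `append_file`:

```rust
use std::io::{self, Write};

/// Hypothetical adapter: forwards each write to an append-style file API so
/// that `std::io::copy` can drive the loop (it retries ErrorKind::Interrupted
/// internally). `sink` stands in for the file that `append_file` would grow.
struct AppendWriter<'a> {
    path: &'a str,
    sink: &'a mut Vec<u8>,
}

impl Write for AppendWriter<'_> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // In the real driver this would call append_file(self.path, buf)
        // and map failures through handle_error.
        self.sink.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let data = b"attachment bytes".repeat(1000);
    let mut reader: &[u8] = &data;
    let mut file_contents = Vec::new();
    {
        let mut writer = AppendWriter { path: "out.bin", sink: &mut file_contents };
        let copied = io::copy(&mut reader, &mut writer)?;
        println!("streamed {copied} bytes toward {}", writer.path);
        assert_eq!(copied as usize, data.len());
    }
    assert_eq!(file_contents, data);
    Ok(())
}
```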
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser-utils/src/file_api/wasm_driver.rs` around
lines 151 - 176, stream_to_file currently truncates the target and appends
chunks directly; implement two optional robustness improvements: 1) stream into
a sibling temporary path (e.g., path + ".tmp") instead of truncating the final
file, using write_file(temp_path, &[]) to create/clear it, then rename the temp
to the final path on success and remove the temp on error; 2) replace the manual
read loop with std::io::copy by implementing a small Write adapter that forwards
writes to append_file(temp_path, buf) and maps errors through handle_error so
Interrupted is handled consistently; keep ApiResult return semantics and
reference stream_to_file, write_file, append_file and handle_error when making
changes.
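The temp-path-and-rename idea can be sketched with std::fs standing in for the driver's file API (the function name and ".tmp" convention are illustrative, not the driver's actual interface):

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

/// Streams into a sibling temp file and renames it over the final path only
/// on success, so a mid-stream failure never leaves a partial attachment at
/// the destination. Note: with_extension replaces any existing extension; a
/// real implementation might append ".tmp" instead.
fn stream_to_file_atomic(path: &Path, stream: &mut dyn Read) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let result = (|| -> io::Result<()> {
        let mut file = fs::File::create(&tmp)?;
        io::copy(stream, &mut file)?;
        file.sync_all()?; // flush to disk before the rename makes it visible
        fs::rename(&tmp, path)
    })();
    if result.is_err() {
        let _ = fs::remove_file(&tmp); // best-effort cleanup of the partial file
    }
    result
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("attachment_demo.bin");
    let data = vec![7u8; 100_000];
    let mut reader: &[u8] = &data;
    stream_to_file_atomic(&path, &mut reader)?;
    assert_eq!(fs::read(&path)?.len(), 100_000);
    fs::remove_file(&path)?;
    println!("atomic write ok");
    Ok(())
}
```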

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d020cb4d-6ab0-4450-8e3c-449c8dbb0b28

📥 Commits

Reviewing files that changed from the base of the PR and between 85044ee and 84453b0.

📒 Files selected for processing (11)
  • packages/onenote-converter/parser-utils/node_functions.js
  • packages/onenote-converter/parser-utils/src/file_api/api.rs
  • packages/onenote-converter/parser-utils/src/file_api/native_driver.rs
  • packages/onenote-converter/parser-utils/src/file_api/wasm_driver.rs
  • packages/onenote-converter/parser-utils/src/reader.rs
  • packages/onenote-converter/parser/src/onenote/embedded_file.rs
  • packages/onenote-converter/parser/src/onenote/image.rs
  • packages/onenote-converter/parser/src/shared/file_data_ref.rs
  • packages/onenote-converter/renderer/src/page/embedded_file.rs
  • packages/onenote-converter/renderer/src/page/image.rs
  • packages/onenote-converter/renderer/tests/convert.rs

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/onenote-converter/parser-utils/src/file_api/api.rs (1)

19-21: Optional: document truncate/create semantics in the trait doc.

The doc comment captures the partial-write caveat nicely, but it leaves the create/truncate contract implicit. Since both implementations (native and WASM) truncate the destination before streaming, it would help future implementers to state that explicitly on the trait so alternative backends don't diverge (e.g. appending instead of overwriting).

📝 Proposed doc tweak
-    /// Writes data from `stream` to the file at `path`.
-    /// Note: If `stream.read` fails, the file may be left in a partially-written state.
+    /// Writes data from `stream` to the file at `path`, creating the file if it
+    /// does not exist and truncating it if it does.
+    /// Note: If `stream.read` fails, the file may be left in a partially-written state.
     fn stream_to_file(&self, path: &str, stream: &mut dyn Read) -> ApiResult<()>;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser-utils/src/file_api/api.rs` around lines 19
- 21, Update the trait documentation for stream_to_file in the Api trait to
explicitly state its create/truncate semantics: document that implementations
must create the destination file if it does not exist and truncate/overwrite any
existing file before writing the incoming stream (so partial-write caveat still
applies), and mention that implementations should not append. Reference the
stream_to_file method signature and ensure this contract is clear so native and
WASM backends (and future backends) follow the same overwrite behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6156c400-2983-49ce-974c-2d9e0963fd04

📥 Commits

Reviewing files that changed from the base of the PR and between 26908c2 and f5fad1e.

📒 Files selected for processing (1)
  • packages/onenote-converter/parser-utils/src/file_api/api.rs
