Desktop: Importing from OneNote: Support large attachments #15195
…mport/improve-large-attachment-support
```rust
let png_data = fs::read(output_dir.join("Printout").join("test4_1.pdf.png"))
    .expect("should read the PNG");
assert_eq!(png_data[0..4], [0x89, 0x50, 0x4E, 0x47], "PNG should have the correct initial bytes");
assert_eq!(png_data.len(), 14_005, "PNG should have the correct byte length");
```
I've verified that these values are consistent with the output before this pull request.
🧹 Nitpick comments (4)
packages/onenote-converter/renderer/src/page/image.rs (1)
95-101: Optional: accept `&mut dyn Read` instead of `&mut Box<dyn Read>`.

Minor nit: `read_file_start` doesn't need the ownership shape of the `Box`; taking `&mut dyn Read` would make it trivially reusable with other readers (e.g. tests) without boxing. No behaviour change.

♻️ Optional refactor

```diff
-fn read_file_start(reader: &mut Box<dyn Read>) -> Result<Vec<u8>> {
+fn read_file_start(reader: &mut dyn Read) -> Result<Vec<u8>> {
     let size: usize = 1024;
     let mut sub_reader = reader.by_ref().take(size as u64);
     let mut bytes = Vec::with_capacity(size);
     sub_reader.read_to_end(&mut bytes)?;
     Ok(bytes)
 }
```

Call site would become `read_file_start(&mut *reader)`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/onenote-converter/renderer/src/page/image.rs` around lines 95 - 101: change the function signature of `read_file_start` from taking `&mut Box<dyn Read>` to `&mut dyn Read` and update any call sites to pass the inner reader (e.g. `read_file_start(&mut *reader)` when you have a `Box`). Inside `read_file_start` keep the existing logic (use `reader.by_ref().take(...)`, `read_to_end`, etc.) but operate on the `&mut dyn Read` parameter; this removes unnecessary boxing and makes the function usable with any mutable reader type without ownership changes.

packages/onenote-converter/parser-utils/src/reader.rs (2)
338-355: Nice regression test.

Good coverage for the two invariants that matter here: the data ref reads from the offset captured at `as_data_ref` time, and consuming it doesn't perturb the parent `Reader`'s position. Worth adding a sibling case for the `ReaderData::File` branch later (e.g. via `Reader::try_from(Box<dyn FileHandle>)`) since the `BufferRef` and `File` paths are implemented quite differently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/onenote-converter/parser-utils/src/reader.rs` around lines 338 - 355, Add a sibling test that exercises the ReaderData::File branch to assert the same two invariants (as_data_ref captures offset and reading the DataRef does not advance the parent Reader): construct a Reader from a boxed FileHandle via Reader::try_from(Box<dyn FileHandle>) (or a test FileHandle impl), call reader.advance(...) then let data_ref = reader.as_data_ref(n).unwrap().read(), consume data_ref into a buffer and assert its contents match the captured slice, then assert reader.get_u8() returns the next byte; target symbols: Reader, Reader::try_from, Reader::as_data_ref, ReaderData::File and BufferRef.
280-309: Save/seek/restore pattern preserves the reader invariant; one tiny nit.

Wrapping the read in a closure so the restore-seek always runs is the right shape for keeping the `ReaderData::File` invariant (file cursor == `Reader.data_offset`) intact.

Nit: the early-out `if offset > self.end_offset` uses strict `>`, so the exact-EOF case (`offset == end_offset`) falls through into a no-op read that still performs two seeks on the underlying file handle. Using `>=` would short-circuit cleanly at the boundary without changing any observable behaviour.

Suggested micro-tweak

```diff
-        let offset = self.start_offset;
-        if offset > self.end_offset {
-            return Ok(0);
-        }
+        let offset = self.start_offset;
+        if offset >= self.end_offset {
+            return Ok(0);
+        }
```
Verify each finding against the current code and only fix it if needed. In `@packages/onenote-converter/parser-utils/src/reader.rs` around lines 280 - 309, In ReaderFilePointer's Read impl (the read method), change the early-out comparison from offset > self.end_offset to offset >= self.end_offset so the exact-EOF case (start_offset == end_offset) returns immediately and avoids the extra seek/restore; update the check that computes remaining accordingly so the rest of the logic (the closure that seeks, reads, and restores file position and the update of self.start_offset) remains unchanged.packages/onenote-converter/parser-utils/src/file_api/wasm_driver.rs (1)
151-176: Streaming writer looks correct; a couple of optional robustness notes.

The truncate-then-append loop, the EOF termination, and the gradual buffer growth (capped at 50 MiB) all look right and correctly address the multi-GiB attachment case.

Two small, optional refinements to consider (not blocking):

- `data.read(...)?` propagates `ErrorKind::Interrupted` as a hard error. For long streams this is worth retrying, and you can get the same effect for free by delegating the read side to `std::io::copy` into a small adapter `Write` that forwards to `append_file`, which handles `Interrupted` internally.
- The destination file is truncated up front and then grown in place via appends, so a mid-stream failure leaves a partially-written attachment on disk. If that matters for imports, consider streaming into a sibling temp path and renaming on success. This mirrors the pre-existing `write_file` behaviour, so it's not a regression introduced here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/onenote-converter/parser-utils/src/file_api/wasm_driver.rs` around lines 151 - 176, stream_to_file currently truncates the target and appends chunks directly; implement two optional robustness improvements: 1) stream into a sibling temporary path (e.g., path + ".tmp") instead of truncating the final file, using write_file(temp_path, &[]) to create/clear it, then rename the temp to the final path on success and remove the temp on error; 2) replace the manual read loop with std::io::copy by implementing a small Write adapter that forwards writes to append_file(temp_path, buf) and maps errors through handle_error so Interrupted is handled consistently; keep ApiResult return semantics and reference stream_to_file, write_file, append_file and handle_error when making changes.
📒 Files selected for processing (11)
- packages/onenote-converter/parser-utils/node_functions.js
- packages/onenote-converter/parser-utils/src/file_api/api.rs
- packages/onenote-converter/parser-utils/src/file_api/native_driver.rs
- packages/onenote-converter/parser-utils/src/file_api/wasm_driver.rs
- packages/onenote-converter/parser-utils/src/reader.rs
- packages/onenote-converter/parser/src/onenote/embedded_file.rs
- packages/onenote-converter/parser/src/onenote/image.rs
- packages/onenote-converter/parser/src/shared/file_data_ref.rs
- packages/onenote-converter/renderer/src/page/embedded_file.rs
- packages/onenote-converter/renderer/src/page/image.rs
- packages/onenote-converter/renderer/tests/convert.rs
According to the documentation, `ErrorKind::Interrupted` can "typically be retried". (https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.Interrupted)
🧹 Nitpick comments (1)
packages/onenote-converter/parser-utils/src/file_api/api.rs (1)
19-21: Optional: document truncate/create semantics in the trait doc.

The doc comment captures the partial-write caveat nicely, but it leaves the create/truncate contract implicit. Since both implementations (native and WASM) truncate the destination before streaming, it would help future implementers to state that explicitly on the trait so alternative backends don't diverge (e.g. appending instead of overwriting).
📝 Proposed doc tweak
```diff
-    /// Writes data from `stream` to the file at `path`.
-    /// Note: If `stream.read` fails, the file may be left in a partially-written state.
+    /// Writes data from `stream` to the file at `path`, creating the file if it
+    /// does not exist and truncating it if it does.
+    /// Note: If `stream.read` fails, the file may be left in a partially-written state.
     fn stream_to_file(&self, path: &str, stream: &mut dyn Read) -> ApiResult<()>;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/onenote-converter/parser-utils/src/file_api/api.rs` around lines 19 - 21, Update the trait documentation for stream_to_file in the Api trait to explicitly state its create/truncate semantics: document that implementations must create the destination file if it does not exist and truncate/overwrite any existing file before writing the incoming stream (so partial-write caveat still applies), and mention that implementations should not append. Reference the stream_to_file method signature and ensure this contract is clear so native and WASM backends (and future backends) follow the same overwrite behavior.
📒 Files selected for processing (1)
packages/onenote-converter/parser-utils/src/file_api/api.rs
Problem
`.one` files with large (e.g. multi-gigabyte) attachments failed to import. Previously, the `onenote-converter` package loaded full attachments into memory before writing them to disk.

Solution
Read attachments in chunks.
Follow-up to #15117.
Testing
- `.one` file.
- `.one` file from Joplin.

Screencast.from.2026-04-24.12-25-29.webm
Screencast.from.2026-04-24.12-37-22.webm