LibPDF-js
diff --git a/.agents/ARCHITECTURE.md b/.agents/ARCHITECTURE.md
Lines changed: 146 additions & 129 deletions
diff --git a/.agents/GOALS.md b/.agents/GOALS.md
Lines changed: 23 additions & 1 deletion
diff --git a/.agents/README.md b/.agents/README.md
Lines changed: 6 additions & 0 deletions
diff --git a/.agents/plans/001-scanner.md b/.agents/plans/001-scanner.md
Lines changed: 14 additions & 4 deletions
diff --git a/.agents/plans/002-pdf-objects.md b/.agents/plans/002-pdf-objects.md
Lines changed: 19 additions & 11 deletions
@@ -5,52 +5,61 @@ This document captures the high-level goals for @libpdf/core. Use this to steer
 ## Core Capabilities
 
 ### 1. Encryption & Security
+
 - [x] **Load encrypted PDFs** — Support password-protected documents (user password, owner password)
 - [x] **Decrypt on load** — Handle all standard encryption handlers (RC4, AES-128, AES-256)
 - [ ] **Encrypt on save** — Apply encryption when writing PDFs (encryption logic done, needs writer)
 
 ### 2. Digital Signatures
+
 - [x] **Add digital signatures** — Sign PDFs with certificates (P12, CryptoKey signers)
 - [ ] **Verify signatures** — Validate existing signatures
 - [x] **LTV (Long-Term Validation)** — Embed CRLs, OCSP responses for long-term validity
 - [x] **DSS (Document Security Store)** — Full DSS support for archival signatures
 - [x] **PAdES compliance** — Support PAdES B-B, B-T, B-LT, B-LTA profiles
 
 ### 3. Modification
+
 - [x] **Add/remove pages** — Insert, delete, reorder pages
 - [x] **Add/remove content** — Draw text, images, graphics on pages
 - [ ] **Add/remove annotations** — Comments, highlights, stamps, etc.
 - [x] **Add/remove form fields** — Text fields, checkboxes, dropdowns, etc.
 - [x] **Incremental updates** — Append changes without rewriting (critical for signatures)
 
 ### 4. Forms
+
 - [x] **Complete form filling** — Fill all field types (text, checkbox, radio, dropdown, etc.)
 - [x] **Read form data** — Extract current field values
 - [x] **Flatten forms** — Convert form fields to static content
 - [ ] **Calculate fields** — Support JavaScript calculations (stretch)
 
 ### 5. Flattening
+
 - [x] **Flatten forms** — Bake form field appearances into page content
 - [ ] **Flatten annotations** — Bake annotation appearances into page content
 - [x] **Flatten layers** — Merge optional content groups (required before signing to prevent hidden content attacks)
 
 ### 6. Attachments
+
 - [x] **Extract attachments** — Get embedded files from PDF
 - [x] **Embed attachments** — Add files to PDF
 - [x] **File specifications** — Proper /EmbeddedFiles handling
 
 ### 7. Merging & Splitting
+
 - [x] **Merge PDFs** — Combine pages from multiple documents
 - [x] **Split PDFs** — Extract page ranges into new documents
 - [x] **Page embedding** — Embed pages as Form XObjects for overlays/watermarks
 - [ ] **Page imposition** — N-up, booklet layouts (stretch)
 
-### 8. Text Extraction *(stretch)*
+### 8. Text Extraction _(stretch)_
+
 - [ ] **Extract text** — Get text content from pages
 - [ ] **Preserve reading order** — Handle multi-column layouts
 - [ ] **Extract from annotations** — Include comment text, form values
 
 ### 9. Creation
+
 - [x] **Create from scratch** — Build PDFs programmatically
 - [x] **Add pages** — Create blank or content-filled pages
 - [x] **Draw content** — Text, images, paths, shapes
@@ -63,27 +72,35 @@ This document captures the high-level goals for @libpdf/core. Use this to steer
 ## Priority Tiers
 
 ### Tier 1: Foundation
+
 These enable most other features:
+
 1. **Encryption/Decryption** — Many real-world PDFs are encrypted ✓
 2. **Incremental Updates** — Required for signature preservation ✓
 3. **Object Modification** — Infrastructure for all write operations ✓
 
 ### Tier 2: High Value
+
 Most commonly requested features:
+
 1. **Form Filling** — Very common use case ✓
 2. **Digital Signatures** — Enterprise requirement ✓ (signing done, verification pending)
 3. **Merge/Split** — Common document workflows ✓
 4. **Attachments** — Common for invoices, contracts ✓
 5. **Layer Flattening** — Required before signing (security)
 
 ### Tier 3: Complete Solution
+
 Full-featured library:
+
 1. **Flattening** — Print-ready documents ✓ (forms and layers done, annotations pending)
 2. **Annotation Modification** — Review workflows
 3. **Text Extraction** — Search, indexing, accessibility
 
 ### Tier 4: Stretch
+
 Nice to have:
+
 1. **JavaScript Support** — Complex form calculations
 2. **Page Imposition** — Print production
 
@@ -92,26 +109,31 @@ Nice to have:
 ## Architectural Implications
 
 ### Encryption
+
 - Must integrate early in parsing pipeline
 - Affects object reading and stream decoding
 - Need to track encryption state throughout document
 
 ### Incremental Updates
+
 - Object graph must track modifications
 - Writer needs to serialize only changed objects
 - XRef must support appending new sections
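A minimal sketch of the append step, assuming each changed object has already been serialized to an ASCII string (the `Change` shape and `appendChangedObjects` name are hypothetical). It writes only the modified objects after the original bytes and records each object's absolute byte offset, which a new xref section would then reference:

```typescript
// Hypothetical input shape: one pre-serialized object per modification.
interface Change {
  objNum: number;
  gen: number;
  body: string; // assumed ASCII, so string length equals byte length
}

interface AppendResult {
  text: string; // content to append after the original file bytes
  offsets: Map<number, number>; // object number -> absolute byte offset
}

function appendChangedObjects(originalLength: number, changes: Change[]): AppendResult {
  const offsets = new Map<number, number>();
  // Start on a fresh line in case the original file lacks a trailing newline.
  let text = "\n";
  for (const c of changes) {
    offsets.set(c.objNum, originalLength + text.length);
    text += `${c.objNum} ${c.gen} obj\n${c.body}\nendobj\n`;
  }
  return { text, offsets };
}
```

Unchanged objects are never re-emitted; the appended xref section would list only these offsets and point back to the previous section via the trailer's `/Prev` entry.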
 
 ### Digital Signatures
+
 - Depends on incremental updates (can't rewrite signed content)
 - Need access to raw byte ranges for signature computation
 - Must preserve exact bytes of signed regions
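The byte-range requirement can be sketched as follows, assuming the writer already knows where the `/Contents` placeholder sits in the serialized file (`computeByteRange` and `coveredBytes` are hypothetical names). The signature covers everything except the placeholder, which in common practice spans the hex string including its angle brackets:

```typescript
type ByteRange = [number, number, number, number];

// contentsStart: offset of the '<' that opens the /Contents hex string.
// contentsEnd: offset just past the closing '>'.
function computeByteRange(fileLength: number, contentsStart: number, contentsEnd: number): ByteRange {
  return [0, contentsStart, contentsEnd, fileLength - contentsEnd];
}

// Concatenate the two covered regions: these are the bytes that get hashed and signed.
function coveredBytes(pdf: Uint8Array, [start1, len1, start2, len2]: ByteRange): Uint8Array {
  const out = new Uint8Array(len1 + len2);
  out.set(pdf.subarray(start1, start1 + len1), 0);
  out.set(pdf.subarray(start2, start2 + len2), len1);
  return out;
}
```

Because the `/ByteRange` values are themselves inside the signed region, writers typically reserve fixed-width space for them before computing offsets; any later change to those bytes would invalidate the signature.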
 
 ### Form Filling
+
 - Need appearance stream generation or AP dictionary handling
 - Font subsetting for text fields
 - Widget annotation management
 
 ### Merging
+
 - Object renumbering to avoid conflicts
 - Resource dictionary merging
 - Page tree restructuring
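Object renumbering can be sketched as a mapping from the source document's object numbers to fresh numbers in the destination, assuming the destination's next unused object number is known (`buildRenumberMap`, `remapRef`, and the `RefLike` shape are hypothetical):

```typescript
interface RefLike {
  num: number;
  gen: number;
}

// Assign destination numbers in order of first appearance, starting at firstFree.
function buildRenumberMap(sourceObjectNumbers: number[], firstFree: number): Map<number, number> {
  const map = new Map<number, number>();
  let next = firstFree;
  for (const n of sourceObjectNumbers) {
    if (!map.has(n)) map.set(n, next++);
  }
  return map;
}

// Rewrite a reference from the source document into the destination's numbering.
// Generation resets to 0 because the imported object is new in the destination.
function remapRef(ref: RefLike, map: Map<number, number>): RefLike {
  const num = map.get(ref.num);
  if (num === undefined) throw new Error(`object ${ref.num} was not included in the renumber map`);
  return { num, gen: 0 };
}
```

Every reference reachable from the copied pages would pass through `remapRef`, so imported objects can never collide with objects already in the destination.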
 
@@ -5,17 +5,21 @@ This directory is used by AI agents to track their work, planning, and decision-
 ## Top-Level Files
 
 ### GOALS.md
+
 High-level goals and priorities for the library. Check this before starting new features to ensure work aligns with project direction.
 
 ### ARCHITECTURE.md
+
 Current architecture documentation. Review before making architectural changes; update after significant changes to keep it accurate.
 
 ## Directories
 
 ### plans/
+
 Contains planning documents created during planning mode. These help track the approach and steps for implementing features or solving problems.
 
 **Naming convention**: Use sequential numbering with a descriptive name:
+
 ```
 001-scanner.md
 002-pdf-objects.md
@@ -27,7 +31,9 @@ Contains planning documents created during planning mode. These help track the a
 To find the next number, check the existing files and increment.
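The numbering rule is mechanical enough to express as a small helper. This is a hypothetical sketch (`nextPlanFilename` is not part of the repository), assuming zero-padded three-digit prefixes and a lowercase, hyphenated description:

```typescript
function nextPlanFilename(existing: string[], description: string): string {
  // Find the highest numeric prefix among existing plan files.
  let max = 0;
  for (const name of existing) {
    const m = /^(\d+)-/.exec(name);
    if (m) max = Math.max(max, Number(m[1]));
  }
  // Collapse anything non-alphanumeric into single hyphens and trim stray ones.
  const slug = description
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${String(max + 1).padStart(3, "0")}-${slug}.md`;
}
```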
 
 ### justifications/
+
 Contains documents explaining why the agent made specific decisions. This provides transparency and helps with future reference when understanding past choices.
 
 ### scratch/
+
 Temporary workspace for notes, drafts, and work-in-progress content that doesn't need to be preserved long-term.
@@ -5,6 +5,7 @@ The scanner is the lowest layer — it reads bytes and provides primitives for t
 ## Goal
 
 Create a `Scanner` that wraps a `Uint8Array` and provides:
+
 - Position tracking with save/restore for backtracking
 - Peeking and advancing through bytes
 - Minimal API — lexer builds higher-level patterns on top
@@ -62,6 +63,7 @@ class Scanner {
 ## Design Decisions
 
 ### EOF Handling: -1 Sentinel
+
 `peek()` and `advance()` return -1 at end of input. This is the classic C-style approach — simple to check and avoids undefined/null complexity.
 
 ```typescript
@@ -71,30 +73,35 @@ while (scanner.peek() !== -1) {
 ```
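A minimal sketch of a `Scanner` consistent with these decisions: -1 at end of input, a readable `position`, and a clamping `moveTo()`. The member names follow the usage in this plan (`peek`, `advance`, `match`, `moveTo`, `position`, `bytes`); the bodies are an illustration, not the shipped implementation.

```typescript
class Scanner {
  readonly bytes: Uint8Array;
  private pos = 0;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  get position(): number {
    return this.pos;
  }

  // Current byte without consuming it, or -1 at end of input.
  peek(): number {
    return this.pos < this.bytes.length ? this.bytes[this.pos] : -1;
  }

  // Consume and return the current byte; at end of input return -1 and stay put.
  advance(): number {
    if (this.pos >= this.bytes.length) return -1;
    return this.bytes[this.pos++];
  }

  // Consume the current byte only if it equals `byte`.
  match(byte: number): boolean {
    if (this.peek() !== byte) return false;
    this.pos++;
    return true;
  }

  // Clamp into [0, length] instead of throwing (lenient parsing).
  moveTo(offset: number): void {
    this.pos = Math.max(0, Math.min(offset, this.bytes.length));
  }
}
```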
 
 ### Backtracking: Save/Restore Position
+
 Following pdf-lib's pattern, backtracking is done by saving and restoring `position`:
 
 ```typescript
 const mark = scanner.position;
 // try to parse something
 if (failed) {
-  scanner.moveTo(mark);  // restore
+  scanner.moveTo(mark); // restore
 }
 ```
 
 No mark stack or dedicated mark/reset API — just use the position property directly.
 
 ### Boundary: Bytes Only
+
 Scanner handles only byte-level operations. PDF-specific concepts (whitespace, delimiters, tokens) belong in the lexer. This keeps Scanner simple and reusable.
 
 ### Newlines: No Normalization
+
 Scanner sees raw bytes. CR (0x0D), LF (0x0A), and CRLF sequences are passed through unchanged. The lexer handles newline semantics.
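Since the scanner passes CR, LF, and CRLF through untouched, the lexer needs its own end-of-line check. A minimal sketch, assuming a hypothetical `skipEol` helper that works on raw bytes and treats CRLF as a single marker:

```typescript
const CR = 0x0d;
const LF = 0x0a;

// Return the offset just past an end-of-line marker at `pos`, or `pos` if there is none.
function skipEol(bytes: Uint8Array, pos: number): number {
  if (bytes[pos] === CR) {
    // CRLF counts as one marker; a lone CR is also a valid end of line.
    return bytes[pos + 1] === LF ? pos + 2 : pos + 1;
  }
  if (bytes[pos] === LF) return pos + 1;
  return pos;
}
```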
 
 ### Error Behavior: Return Indicators
+
 - `advance()` returns -1 if at end (does not advance)
 - `moveTo()` clamps to valid range instead of throwing
 - Matches lenient parsing philosophy — don't crash on edge cases
 
 ### No slice()
+
 YAGNI. If the lexer needs a byte range, it can use `scanner.bytes.subarray(start, end)` directly.
 
 ## Usage Example
@@ -103,13 +110,15 @@ YAGNI. If the lexer needs a byte range, it can use `scanner.bytes.subarray(start
 const scanner = new Scanner(bytes);
 
 // Read PDF header: %PDF-1.x
-if (scanner.match(0x25)) {  // %
+if (scanner.match(0x25)) {
+  // %
   const mark = scanner.position;
 
-  if (scanner.match(0x50) && scanner.match(0x44) && scanner.match(0x46)) {  // PDF
+  if (scanner.match(0x50) && scanner.match(0x44) && scanner.match(0x46)) {
+    // PDF
     // valid header start
   } else {
-    scanner.moveTo(mark);  // backtrack
+    scanner.moveTo(mark); // backtrack
   }
 }
 
@@ -141,5 +150,6 @@ const header = scanner.bytes.subarray(0, 8);
 ## Next Steps
 
 After scanner is complete:
+
 1. Build lexer on top — `nextToken()` returns typed tokens
 2. Token types: Number, Name, String, HexString, Keyword, Delimiter, Comment, EOF
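One way to model those token types is a discriminated union; the `kind` field and payload shapes below are assumptions for illustration, with a hypothetical `describe` helper showing exhaustive handling:

```typescript
type Token =
  | { kind: "Number"; value: number }
  | { kind: "Name"; value: string }
  | { kind: "String"; bytes: Uint8Array }
  | { kind: "HexString"; bytes: Uint8Array }
  | { kind: "Keyword"; value: string }
  | { kind: "Delimiter"; value: string }
  | { kind: "Comment"; text: string }
  | { kind: "EOF" };

function describe(token: Token): string {
  switch (token.kind) {
    case "Number":
      return `Number(${token.value})`;
    case "Name":
      return `Name(/${token.value})`;
    case "String":
    case "HexString":
      return `${token.kind}(${token.bytes.length} bytes)`;
    case "Keyword":
    case "Delimiter":
      return `${token.kind}(${token.value})`;
    case "Comment":
      return `Comment(${token.text})`;
    case "EOF":
      return "EOF";
    default: {
      // Compile-time exhaustiveness check: a new kind without a case fails to type-check here.
      const unreachable: never = token;
      return unreachable;
    }
  }
}
```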
@@ -6,43 +6,51 @@ Define PDF's low-level object primitives in `src/objects/`. These form the found
 
 ## Types
 
-| Type | Description | Example |
-|------|-------------|---------|
-| `PdfNull` | Null value | `null` |
-| `PdfBool` | Boolean | `true`, `false` |
-| `PdfNumber` | Integer or real | `42`, `-3.14` |
-| `PdfName` | Name token | `/Type`, `/Page` |
-| `PdfString` | Literal or hex string | `(Hello)`, `<48656C6C6F>` |
-| `PdfRef` | Indirect reference | `1 0 R` |
-| `PdfArray` | Array of objects | `[1 2 3]` |
-| `PdfDict` | Dictionary | `<< /Type /Page >>` |
-| `PdfStream` | Dict + binary data | `<< /Length 5 >> stream...` |
+| Type        | Description           | Example                     |
+| ----------- | --------------------- | --------------------------- |
+| `PdfNull`   | Null value            | `null`                      |
+| `PdfBool`   | Boolean               | `true`, `false`             |
+| `PdfNumber` | Integer or real       | `42`, `-3.14`               |
+| `PdfName`   | Name token            | `/Type`, `/Page`            |
+| `PdfString` | Literal or hex string | `(Hello)`, `<48656C6C6F>`   |
+| `PdfRef`    | Indirect reference    | `1 0 R`                     |
+| `PdfArray`  | Array of objects      | `[1 2 3]`                   |
+| `PdfDict`   | Dictionary            | `<< /Type /Page >>`         |
+| `PdfStream` | Dict + binary data    | `<< /Length 5 >> stream...` |
 
 ## Design Decisions
 
 ### Discriminated Union
+
 All types share a `type` field for runtime discrimination. Enables type guards and switch statements.
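A reduced sketch of the idea, covering only four of the nine types and assuming lowercase string tags (the actual tag values in `src/objects/` may differ):

```typescript
type PdfObject =
  | { type: "null" }
  | { type: "bool"; value: boolean }
  | { type: "number"; value: number }
  | { type: "name"; value: string };

// Type guard: narrows a PdfObject to the number variant.
function isPdfNumber(obj: PdfObject): obj is Extract<PdfObject, { type: "number" }> {
  return obj.type === "number";
}

// The switch narrows `obj` in each branch, so `obj.value` has the right type.
// Simplified: names are written without #xx escaping.
function serialize(obj: PdfObject): string {
  switch (obj.type) {
    case "null":
      return "null";
    case "bool":
      return obj.value ? "true" : "false";
    case "number":
      return String(obj.value);
    case "name":
      return `/${obj.value}`;
  }
}
```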
 
 ### Interning for PdfName and PdfRef
+
 These repeat constantly in PDFs. Interning via a static `.of()` factory:
+
 - Saves memory (one instance per unique value)
 - Enables fast equality via `===`
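A minimal sketch of `.of()` interning, assuming a plain `Map` cache keyed by the name string (and by `"num gen"` for references); field names such as `objectNumber` are illustrative. A private constructor forces all creation through the factory, which is what makes `===` a valid equality check.

```typescript
class PdfName {
  private static readonly cache = new Map<string, PdfName>();
  readonly type = "name" as const;
  readonly value: string;

  private constructor(value: string) {
    this.value = value;
  }

  static of(value: string): PdfName {
    let name = PdfName.cache.get(value);
    if (name === undefined) {
      name = new PdfName(value);
      PdfName.cache.set(value, name);
    }
    return name;
  }
}

class PdfRef {
  private static readonly cache = new Map<string, PdfRef>();
  readonly type = "ref" as const;
  readonly objectNumber: number;
  readonly generation: number;

  private constructor(objectNumber: number, generation: number) {
    this.objectNumber = objectNumber;
    this.generation = generation;
  }

  static of(objectNumber: number, generation = 0): PdfRef {
    const key = `${objectNumber} ${generation}`;
    let ref = PdfRef.cache.get(key);
    if (ref === undefined) {
      ref = new PdfRef(objectNumber, generation);
      PdfRef.cache.set(key, ref);
    }
    return ref;
  }
}
```

A plain `Map` keeps every interned value alive for the lifetime of the process; that trade-off is worth keeping in mind for long-running services that open many documents.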
 
 ### Mutable Containers with Mutation Hook
+
 `PdfArray` and `PdfDict` are mutable — PDF modification requires changing entries. They support an optional `onMutate` callback that fires on changes (`set`, `push`, `delete`, etc.). The document layer wires this up for automatic dirty tracking during incremental saves.
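A minimal sketch of the hook, assuming string keys and `unknown` values for brevity (the real containers presumably key on `PdfName`); `trackDirty` is a hypothetical stand-in for the document layer's wiring:

```typescript
class PdfDict {
  readonly type = "dict" as const;
  onMutate?: () => void;
  private readonly entries = new Map<string, unknown>();

  get(key: string): unknown {
    return this.entries.get(key);
  }

  set(key: string, value: unknown): void {
    this.entries.set(key, value);
    this.onMutate?.();
  }

  delete(key: string): boolean {
    const removed = this.entries.delete(key);
    // Only notify when something actually changed.
    if (removed) this.onMutate?.();
    return removed;
  }
}

// Hypothetical document-layer wiring: collect containers touched since the last save.
function trackDirty(dicts: PdfDict[]): Set<PdfDict> {
  const dirty = new Set<PdfDict>();
  for (const d of dicts) d.onMutate = () => dirty.add(d);
  return dirty;
}
```

Keeping the hook optional means standalone containers (for example in tests) work without any document attached.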
 
 ### PdfString Stores Raw Bytes
+
 PDF strings can contain binary data, and encoding is context-dependent (PDFDocEncoding, UTF-16BE, etc.). Store raw bytes; decode on demand.
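Decoding on demand might look like the following sketch of a hypothetical `decodeTextString`. It detects the UTF-16BE byte-order mark `FE FF`; everything else falls back to Latin-1, which is a simplification: PDFDocEncoding agrees with Latin-1 for printable ASCII but not for every byte, so a faithful decoder needs the full PDFDocEncoding table.

```typescript
function decodeTextString(bytes: Uint8Array): string {
  let out = "";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    // UTF-16BE: combine byte pairs into code units; surrogate pairs join naturally in JS strings.
    // A trailing odd byte, if any, is ignored.
    for (let i = 2; i + 1 < bytes.length; i += 2) {
      out += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
    }
    return out;
  }
  // Simplified fallback: treat each byte as a Latin-1 code point (not exact PDFDocEncoding).
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}
```

Because the raw bytes are kept, the same `PdfString` can also be handed unchanged to binary consumers such as a hash or a font's glyph lookup.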
 
 ### No Auto-Dereferencing
+
 Containers store `PdfRef` as-is. Callers dereference via xref when needed. Matches PDF structure and enables lazy loading.
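Explicit dereferencing pairs naturally with lazy loading: an object is parsed only the first time a reference to it is followed. A minimal sketch, assuming a hypothetical `parseAt` callback that stands in for xref lookup plus parsing:

```typescript
interface RefLike {
  readonly type: "ref";
  readonly num: number;
  readonly gen: number;
}

function isRef(value: unknown): value is RefLike {
  return typeof value === "object" && value !== null && (value as { type?: unknown }).type === "ref";
}

class LazyResolver {
  private readonly cache = new Map<string, unknown>();
  private readonly parseAt: (num: number, gen: number) => unknown;
  parseCount = 0; // how many objects have actually been loaded

  constructor(parseAt: (num: number, gen: number) => unknown) {
    this.parseAt = parseAt;
  }

  // Non-references pass through unchanged; references are parsed once, then served from cache.
  resolve(value: unknown): unknown {
    if (!isRef(value)) return value;
    const key = `${value.num} ${value.gen}`;
    if (!this.cache.has(key)) {
      this.cache.set(key, this.parseAt(value.num, value.gen));
      this.parseCount++;
    }
    return this.cache.get(key);
  }
}
```

Containers never call this themselves, so holding an array of references costs nothing until a caller resolves an element.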
 
 ### PdfStream Extends PdfDict
+
 Streams are dictionaries with attached binary data. Inheritance avoids duplication.
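A minimal sketch of the inheritance, assuming string keys, assuming `data` holds the stream's bytes as stored (after any filters), and keeping `/Length` in sync with them; `setData` is a hypothetical method:

```typescript
class PdfDict {
  readonly type: "dict" | "stream" = "dict";
  protected readonly entries = new Map<string, unknown>();

  get(key: string): unknown {
    return this.entries.get(key);
  }

  set(key: string, value: unknown): void {
    this.entries.set(key, value);
  }
}

class PdfStream extends PdfDict {
  readonly type = "stream" as const;
  private payload: Uint8Array;

  constructor(data: Uint8Array) {
    super();
    this.payload = data;
    this.set("Length", data.length);
  }

  get data(): Uint8Array {
    return this.payload;
  }

  // Replace the payload and keep the dictionary's /Length entry consistent with it.
  setData(data: Uint8Array): void {
    this.payload = data;
    this.set("Length", data.length);
  }
}
```

Any code that accepts a `PdfDict` can read a stream's dictionary entries without special-casing, while the `type` tag still distinguishes the two at runtime.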
 
 ## File Structure
 
 One file per type in `src/objects/`, plus:
+
 - `object.ts` — Union type and type guards
 
 ## Implementation Order