
Commit 61a72dd

build: migrate from biome to oxc (oxlint + oxfmt) (#1)
Replace Biome with the Oxc toolchain for linting and formatting:

- oxlint: 50-100x faster than ESLint, with TypeScript and Vitest plugins
- oxfmt: 30x faster than Prettier, Prettier-compatible output

Config files:

- Add .oxlintrc.json with correctness/suspicious rules
- Add .oxfmtrc.json matching previous Biome formatting settings
- Update .vscode/settings.json for editor integration
- Update .lintstagedrc.json for pre-commit hooks

Bug fix included:

- Add error cause preservation in src/signatures/aia.ts

Note: oxfmt reformatted many files (markdown tables, trailing whitespace, import sorting), which accounts for the large diff.
1 parent 5bb7c82 commit 61a72dd
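The diff for `src/signatures/aia.ts` is not rendered on this page, so the exact fix is unknown. The sketch below shows one way to preserve an error cause. It assumes the fix uses the standard ES2022 `cause` option of the `Error` constructor, and it needs an ES2022 `lib` setting in TypeScript. `withErrorContext` is a hypothetical helper, not a function from the repository.

```typescript
// Hypothetical helper: the actual change in src/signatures/aia.ts is not shown here.
// Requires ES2022 (runtime and TypeScript lib) for the `cause` option of `Error`.
function withErrorContext<T>(message: string, operation: () => T): T {
  try {
    return operation();
  } catch (error) {
    // Rethrow with added context while keeping the original error reachable as `cause`.
    throw new Error(message, { cause: error });
  }
}

// Usage: the wrapped error carries the original SyntaxError as its cause.
try {
  withErrorContext("Failed to parse AIA response", () => JSON.parse("{not json"));
} catch (error) {
  const wrapped = error as Error;
  console.log(wrapped.message); // prints: Failed to parse AIA response
  console.log(wrapped.cause instanceof SyntaxError); // prints: true
}
```

Keeping the low-level error as `cause` means callers see a message with context while logs and debuggers can still walk back to the original failure.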


347 files changed (+5231, -4269 lines)


.agents/ARCHITECTURE.md

Lines changed: 146 additions & 129 deletions
Large diffs are not rendered by default.

.agents/GOALS.md

Lines changed: 23 additions & 1 deletion
@@ -5,52 +5,61 @@ This document captures the high-level goals for @libpdf/core. Use this to steer
 ## Core Capabilities
 
 ### 1. Encryption & Security
+
 - [x] **Load encrypted PDFs** — Support password-protected documents (user password, owner password)
 - [x] **Decrypt on load** — Handle all standard encryption handlers (RC4, AES-128, AES-256)
 - [ ] **Encrypt on save** — Apply encryption when writing PDFs (encryption logic done, needs writer)
 
 ### 2. Digital Signatures
+
 - [x] **Add digital signatures** — Sign PDFs with certificates (P12, CryptoKey signers)
 - [ ] **Verify signatures** — Validate existing signatures
 - [x] **LTV (Long-Term Validation)** — Embed CRLs, OCSP responses for long-term validity
 - [x] **DSS (Document Security Store)** — Full DSS support for archival signatures
 - [x] **PAdES compliance** — Support PAdES B-B, B-T, B-LT, B-LTA profiles
 
 ### 3. Modification
+
 - [x] **Add/remove pages** — Insert, delete, reorder pages
 - [x] **Add/remove content** — Draw text, images, graphics on pages
 - [ ] **Add/remove annotations** — Comments, highlights, stamps, etc.
 - [x] **Add/remove form fields** — Text fields, checkboxes, dropdowns, etc.
 - [x] **Incremental updates** — Append changes without rewriting (critical for signatures)
 
 ### 4. Forms
+
 - [x] **Complete form filling** — Fill all field types (text, checkbox, radio, dropdown, etc.)
 - [x] **Read form data** — Extract current field values
 - [x] **Flatten forms** — Convert form fields to static content
 - [ ] **Calculate fields** — Support JavaScript calculations (stretch)
 
 ### 5. Flattening
+
 - [x] **Flatten forms** — Bake form field appearances into page content
 - [ ] **Flatten annotations** — Bake annotation appearances into page content
 - [x] **Flatten layers** — Merge optional content groups (required before signing to prevent hidden content attacks)
 
 ### 6. Attachments
+
 - [x] **Extract attachments** — Get embedded files from PDF
 - [x] **Embed attachments** — Add files to PDF
 - [x] **File specifications** — Proper /EmbeddedFiles handling
 
 ### 7. Merging & Splitting
+
 - [x] **Merge PDFs** — Combine pages from multiple documents
 - [x] **Split PDFs** — Extract page ranges into new documents
 - [x] **Page embedding** — Embed pages as Form XObjects for overlays/watermarks
 - [ ] **Page imposition** — N-up, booklet layouts (stretch)
 
-### 8. Text Extraction *(stretch)*
+### 8. Text Extraction _(stretch)_
+
 - [ ] **Extract text** — Get text content from pages
 - [ ] **Preserve reading order** — Handle multi-column layouts
 - [ ] **Extract from annotations** — Include comment text, form values
 
 ### 9. Creation
+
 - [x] **Create from scratch** — Build PDFs programmatically
 - [x] **Add pages** — Create blank or content-filled pages
 - [x] **Draw content** — Text, images, paths, shapes
@@ -63,27 +72,35 @@ This document captures the high-level goals for @libpdf/core. Use this to steer
 ## Priority Tiers
 
 ### Tier 1: Foundation
+
 These enable most other features:
+
 1. **Encryption/Decryption** — Many real-world PDFs are encrypted ✓
 2. **Incremental Updates** — Required for signature preservation ✓
 3. **Object Modification** — Infrastructure for all write operations ✓
 
 ### Tier 2: High Value
+
 Most commonly requested features:
+
 1. **Form Filling** — Very common use case ✓
 2. **Digital Signatures** — Enterprise requirement ✓ (signing done, verification pending)
 3. **Merge/Split** — Common document workflows ✓
 4. **Attachments** — Common for invoices, contracts ✓
 5. **Layer Flattening** — Required before signing (security)
 
 ### Tier 3: Complete Solution
+
 Full-featured library:
+
 1. **Flattening** — Print-ready documents ✓ (forms and layers done, annotations pending)
 2. **Annotation Modification** — Review workflows
 3. **Text Extraction** — Search, indexing, accessibility
 
 ### Tier 4: Stretch
+
 Nice to have:
+
 1. **JavaScript Support** — Complex form calculations
 2. **Page Imposition** — Print production
 
@@ -92,26 +109,31 @@ Nice to have:
 ## Architectural Implications
 
 ### Encryption
+
 - Must integrate early in parsing pipeline
 - Affects object reading and stream decoding
 - Need to track encryption state throughout document
 
 ### Incremental Updates
+
 - Object graph must track modifications
 - Writer needs to serialize only changed objects
 - XRef must support appending new sections
 
 ### Digital Signatures
+
 - Depends on incremental updates (can't rewrite signed content)
 - Need access to raw byte ranges for signature computation
 - Must preserve exact bytes of signed regions
 
 ### Form Filling
+
 - Need appearance stream generation or AP dictionary handling
 - Font subsetting for text fields
 - Widget annotation management
 
 ### Merging
+
 - Object renumbering to avoid conflicts
 - Resource dictionary merging
 - Page tree restructuring
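GOALS.md notes that digital signatures need access to raw byte ranges. In a PDF signature, the `/ByteRange` array `[offset1 length1 offset2 length2]` covers the whole file except the `/Contents` hex string that holds the signature value. The sketch below assumes the excluded span includes the `<` and `>` delimiters, which is the common convention. `computeByteRange` and `collectSignedBytes` are hypothetical helpers and are not part of @libpdf/core as far as this page shows.

```typescript
// Hypothetical helpers, not taken from @libpdf/core.
type ByteRange = [number, number, number, number];

// /ByteRange for a file of `fileLength` bytes whose /Contents hex string
// (including the < and > delimiters) occupies [contentsStart, contentsEnd).
function computeByteRange(fileLength: number, contentsStart: number, contentsEnd: number): ByteRange {
  if (contentsStart < 0 || contentsEnd < contentsStart || contentsEnd > fileLength) {
    throw new RangeError("Invalid /Contents span");
  }
  return [0, contentsStart, contentsEnd, fileLength - contentsEnd];
}

// Concatenate the two covered regions: these exact bytes are what gets hashed and signed.
function collectSignedBytes(pdf: Uint8Array, [offset1, length1, offset2, length2]: ByteRange): Uint8Array {
  const signed = new Uint8Array(length1 + length2);
  signed.set(pdf.subarray(offset1, offset1 + length1), 0);
  signed.set(pdf.subarray(offset2, offset2 + length2), length1);
  return signed;
}

// Usage with a toy 10-byte "file" whose placeholder occupies bytes 3 through 6.
const toyFile = Uint8Array.from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
const toyRange = computeByteRange(toyFile.length, 3, 7);
console.log(toyRange); // [ 0, 3, 7, 3 ]
console.log(Array.from(collectSignedBytes(toyFile, toyRange))); // [ 0, 1, 2, 7, 8, 9 ]
```

This also shows why signing depends on incremental updates. Once those two regions are hashed, any later rewrite that shifts their bytes invalidates the signature.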

.agents/README.md

Lines changed: 6 additions & 0 deletions
@@ -5,17 +5,21 @@ This directory is used by AI agents to track their work, planning, and decision-
 ## Top-Level Files
 
 ### GOALS.md
+
 High-level goals and priorities for the library. Check this before starting new features to ensure work aligns with project direction.
 
 ### ARCHITECTURE.md
+
 Current architecture documentation. Review before making architectural changes; update after significant changes to keep it accurate.
 
 ## Directories
 
 ### plans/
+
 Contains planning documents created during planning mode. These help track the approach and steps for implementing features or solving problems.
 
 **Naming convention**: Use sequential numbering with a descriptive name:
+
 ```
 001-scanner.md
 002-pdf-objects.md
@@ -27,7 +31,9 @@ Contains planning documents created during planning mode. These help track the a
 To find the next number, check the existing files and increment.
 
 ### justifications/
+
 Contains documents explaining why the agent made specific decisions. This provides transparency and helps with future reference when understanding past choices.
 
 ### scratch/
+
 Temporary workspace for notes, drafts, and work-in-progress content that doesn't need to be preserved long-term.
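The README's naming rule (check the existing files and increment) is simple to automate. The sketch below assumes plan files always start with a number followed by a hyphen and end in `.md`. `nextPlanPrefix` is a hypothetical helper, not a script in this repository.

```typescript
// Hypothetical helper: derive the next plan number from existing file names
// such as "001-scanner.md" and "002-pdf-objects.md".
function nextPlanPrefix(fileNames: readonly string[]): string {
  let highest = 0;
  for (const name of fileNames) {
    // Only count names that follow the "NNN-description.md" convention.
    const match = /^(\d+)-.+\.md$/.exec(name);
    if (match) {
      highest = Math.max(highest, Number(match[1]));
    }
  }
  // Zero-pad to three digits so directory listings stay in order.
  return String(highest + 1).padStart(3, "0");
}

console.log(nextPlanPrefix(["001-scanner.md", "002-pdf-objects.md", "notes.txt"])); // 003
```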

.agents/plans/001-scanner.md

Lines changed: 14 additions & 4 deletions
@@ -5,6 +5,7 @@ The scanner is the lowest layer — it reads bytes and provides primitives for t
 ## Goal
 
 Create a `Scanner` that wraps a `Uint8Array` and provides:
+
 - Position tracking with save/restore for backtracking
 - Peeking and advancing through bytes
 - Minimal API — lexer builds higher-level patterns on top
@@ -62,6 +63,7 @@ class Scanner {
 ## Design Decisions
 
 ### EOF Handling: -1 Sentinel
+
 `peek()` and `advance()` return -1 at end of input. This is the classic C-style approach — simple to check and avoids undefined/null complexity.
 
 ```typescript
@@ -71,30 +73,35 @@ while (scanner.peek() !== -1) {
 ```
 
 ### Backtracking: Save/Restore Position
+
 Following pdf-lib's pattern, backtracking is done by saving and restoring `position`:
 
 ```typescript
 const mark = scanner.position;
 // try to parse something
 if (failed) {
-  scanner.moveTo(mark); // restore
+  scanner.moveTo(mark); // restore
 }
 ```
 
 No mark stack or dedicated mark/reset API — just use the position property directly.
 
 ### Boundary: Bytes Only
+
 Scanner handles only byte-level operations. PDF-specific concepts (whitespace, delimiters, tokens) belong in the lexer. This keeps Scanner simple and reusable.
 
 ### Newlines: No Normalization
+
 Scanner sees raw bytes. CR (0x0D), LF (0x0A), and CRLF sequences are passed through unchanged. The lexer handles newline semantics.
 
 ### Error Behavior: Return Indicators
+
 - `advance()` returns -1 if at end (does not advance)
 - `moveTo()` clamps to valid range instead of throwing
 - Matches lenient parsing philosophy — don't crash on edge cases
 
 ### No slice()
+
 YAGNI. If the lexer needs a byte range, it can use `scanner.bytes.subarray(start, end)` directly.
 
 ## Usage Example
@@ -103,13 +110,15 @@ YAGNI. If the lexer needs a byte range, it can use `scanner.bytes.subarray(start
 const scanner = new Scanner(bytes);
 
 // Read PDF header: %PDF-1.x
-if (scanner.match(0x25)) { // %
+if (scanner.match(0x25)) {
+  // %
   const mark = scanner.position;
 
-  if (scanner.match(0x50) && scanner.match(0x44) && scanner.match(0x46)) { // PDF
+  if (scanner.match(0x50) && scanner.match(0x44) && scanner.match(0x46)) {
+    // PDF
     // valid header start
   } else {
-    scanner.moveTo(mark); // backtrack
+    scanner.moveTo(mark); // backtrack
   }
 }
 
@@ -141,5 +150,6 @@ const header = scanner.bytes.subarray(0, 8);
 ## Next Steps
 
 After scanner is complete:
+
 1. Build lexer on top — `nextToken()` returns typed tokens
 2. Token types: Number, Name, String, HexString, Keyword, Delimiter, Comment, EOF
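The design decisions in 001-scanner.md fit together in a small class:

- a `-1` EOF sentinel
- backtracking by saving and restoring `position`
- a clamping `moveTo()`
- `match()`, as used in the usage example

This is a sketch built only from those stated decisions. The repository's actual `Scanner` body is not shown in this diff. The semantics of `match()` (consume one byte only when it equals the expected value) are an assumption.

```typescript
// Sketch of the Scanner described in plans/001-scanner.md, not the repository's implementation.
class Scanner {
  readonly bytes: Uint8Array;
  position = 0;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  // Current byte without consuming it, or the -1 sentinel at end of input.
  peek(): number {
    return this.bytes[this.position] ?? -1;
  }

  // Consume and return the current byte; at end of input return -1 without advancing.
  advance(): number {
    const byte = this.peek();
    if (byte !== -1) {
      this.position++;
    }
    return byte;
  }

  // Assumed semantics: consume one byte only if it equals `expected` (never matches EOF).
  match(expected: number): boolean {
    if (expected < 0 || this.peek() !== expected) {
      return false;
    }
    this.position++;
    return true;
  }

  // Clamp to [0, bytes.length] instead of throwing, per the lenient-parsing decision.
  moveTo(offset: number): void {
    this.position = Math.min(Math.max(offset, 0), this.bytes.length);
  }
}

// Usage: check for the "%PDF" signature and backtrack if it is absent.
const pdfHeader = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]); // "%PDF-1.7"
const headerScanner = new Scanner(pdfHeader);
const start = headerScanner.position;
const isPdf =
  headerScanner.match(0x25) && headerScanner.match(0x50) && headerScanner.match(0x44) && headerScanner.match(0x46);
if (!isPdf) {
  headerScanner.moveTo(start);
}
console.log(isPdf, headerScanner.position); // true 4
```

Keeping the API to `peek`, `advance`, `match`, and `moveTo` leaves whitespace, delimiters, and newline handling to the lexer, as the plan intends.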

.agents/plans/002-pdf-objects.md

Lines changed: 19 additions & 11 deletions
@@ -6,43 +6,51 @@ Define PDF's low-level object primitives in `src/objects/`. These form the found
 
 ## Types
 
-| Type | Description | Example |
-|------|-------------|---------|
-| `PdfNull` | Null value | `null` |
-| `PdfBool` | Boolean | `true`, `false` |
-| `PdfNumber` | Integer or real | `42`, `-3.14` |
-| `PdfName` | Name token | `/Type`, `/Page` |
-| `PdfString` | Literal or hex string | `(Hello)`, `<48656C6C6F>` |
-| `PdfRef` | Indirect reference | `1 0 R` |
-| `PdfArray` | Array of objects | `[1 2 3]` |
-| `PdfDict` | Dictionary | `<< /Type /Page >>` |
-| `PdfStream` | Dict + binary data | `<< /Length 5 >> stream...` |
+| Type        | Description           | Example                     |
+| ----------- | --------------------- | --------------------------- |
+| `PdfNull`   | Null value            | `null`                      |
+| `PdfBool`   | Boolean               | `true`, `false`             |
+| `PdfNumber` | Integer or real       | `42`, `-3.14`               |
+| `PdfName`   | Name token            | `/Type`, `/Page`            |
+| `PdfString` | Literal or hex string | `(Hello)`, `<48656C6C6F>`   |
+| `PdfRef`    | Indirect reference    | `1 0 R`                     |
+| `PdfArray`  | Array of objects      | `[1 2 3]`                   |
+| `PdfDict`   | Dictionary            | `<< /Type /Page >>`         |
+| `PdfStream` | Dict + binary data    | `<< /Length 5 >> stream...` |
 
 ## Design Decisions
 
 ### Discriminated Union
+
 All types share a `type` field for runtime discrimination. Enables type guards and switch statements.
 
 ### Interning for PdfName and PdfRef
+
 These repeat constantly in PDFs. Interning via a static `.of()` factory:
+
 - Saves memory (one instance per unique value)
 - Enables fast equality via `===`
 
 ### Mutable Containers with Mutation Hook
+
 `PdfArray` and `PdfDict` are mutable — PDF modification requires changing entries. They support an optional `onMutate` callback that fires on changes (`set`, `push`, `delete`, etc.). The document layer wires this up for automatic dirty tracking during incremental saves.
 
 ### PdfString Stores Raw Bytes
+
 PDF strings can contain binary data, and encoding is context-dependent (PDFDocEncoding, UTF-16BE, etc.). Store raw bytes; decode on demand.
 
 ### No Auto-Dereferencing
+
 Containers store `PdfRef` as-is. Callers dereference via xref when needed. Matches PDF structure and enables lazy loading.
 
 ### PdfStream Extends PdfDict
+
 Streams are dictionaries with attached binary data. Inheritance avoids duplication.
 
 ## File Structure
 
 One file per type in `src/objects/`, plus:
+
 - `object.ts` — Union type and type guards
 
 ## Implementation Order
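Three of the decisions in 002-pdf-objects.md can be shown together:

- a shared `type` discriminant
- interning through a static `.of()` factory
- an optional `onMutate` hook on mutable containers

The classes below are a simplified sketch under those assumptions, not the code in `src/objects/`. Field names other than `type`, `.of()`, and `onMutate` are illustrative. `PdfNumber`, `PdfArray`, `PdfString`, and `PdfStream` are omitted to keep it short. The `describe` function is a hypothetical helper.

```typescript
// Simplified sketch of the object model in plans/002-pdf-objects.md; not the code in src/objects/.
class PdfName {
  readonly type = "name";
  readonly value: string;
  private static readonly cache = new Map<string, PdfName>();

  private constructor(value: string) {
    this.value = value;
  }

  // Interning: one instance per distinct name, so equality is a plain `===` check.
  static of(value: string): PdfName {
    let name = PdfName.cache.get(value);
    if (name === undefined) {
      name = new PdfName(value);
      PdfName.cache.set(value, name);
    }
    return name;
  }
}

class PdfRef {
  readonly type = "ref";
  readonly objectNumber: number;
  readonly generation: number;
  private static readonly cache = new Map<string, PdfRef>();

  private constructor(objectNumber: number, generation: number) {
    this.objectNumber = objectNumber;
    this.generation = generation;
  }

  // Interned by "objectNumber generation", mirroring the `1 0 R` syntax.
  static of(objectNumber: number, generation = 0): PdfRef {
    const key = `${objectNumber} ${generation}`;
    let ref = PdfRef.cache.get(key);
    if (ref === undefined) {
      ref = new PdfRef(objectNumber, generation);
      PdfRef.cache.set(key, ref);
    }
    return ref;
  }
}

// Discriminated union over the shared `type` field (other object types omitted).
type PdfObject = PdfName | PdfRef | PdfDict;

class PdfDict {
  readonly type = "dict";
  // Interned PdfName instances are safe to use directly as Map keys.
  private readonly entries = new Map<PdfName, PdfObject>();
  // Optional hook; a document layer could wire this to dirty tracking.
  onMutate?: () => void;

  get(key: PdfName): PdfObject | undefined {
    return this.entries.get(key);
  }

  set(key: PdfName, value: PdfObject): void {
    this.entries.set(key, value);
    this.onMutate?.();
  }

  delete(key: PdfName): boolean {
    const removed = this.entries.delete(key);
    if (removed) {
      this.onMutate?.();
    }
    return removed;
  }
}

// A switch on `type` narrows each case to the matching class.
function describe(obj: PdfObject): string {
  switch (obj.type) {
    case "name":
      return `/${obj.value}`;
    case "ref":
      return `${obj.objectNumber} ${obj.generation} R`;
    case "dict":
      return "<< ... >>";
  }
}

let dirty = false;
const page = new PdfDict();
page.onMutate = () => {
  dirty = true;
};
page.set(PdfName.of("Type"), PdfName.of("Page"));
console.log(PdfName.of("Type") === PdfName.of("Type")); // true
console.log(describe(page.get(PdfName.of("Type"))!)); // /Page
console.log(dirty); // true
```

A plain `Map` cache keeps every interned value alive for the life of the process. Whether the real implementation accepts that trade-off is not stated in this plan.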
