Commit 0ac60da

Add implementation plan, test fixtures, and test utilities

- Create phased implementation plan (.agents/plans/0001-implementation.md) covering 12 phases from io through pdfwriter (~665 classes)
- Add 21 PDF test fixtures (254KB) from PDFBox test suite organized by:
  - basic/: minimal PDFs for core parsing tests
  - xref/: XRef table and stream parsing (incl. object streams)
  - filter/: compression tests (Flate, LZW, ASCII85)
  - encryption/: password-protected PDFs (RC4, AES 40/128/256-bit)
  - malformed/: error handling and recovery tests
  - text/: text extraction tests
- Add test utilities (src/test-utils.ts) for fixture loading, byte array helpers, and PDF header validation
- Enhance vitest.config.ts with path aliases, coverage config, and longer timeout for PDF parsing tests
- Expand ARCHITECTURE.md with detailed porting patterns for numeric types, method overloading, equals/hashCode, and more
- Add test:coverage script to package.json

1 parent 03cbf09 commit 0ac60da

30 files changed: +1588 -3 lines changed

.agents/ARCHITECTURE.md

Lines changed: 403 additions & 1 deletion
Large diffs are not rendered by default.

.agents/plans/0001-implementation.md

Lines changed: 509 additions & 0 deletions
Large diffs are not rendered by default.

.github/workflows/ci.yml

Lines changed: 32 additions & 2 deletions
@@ -6,10 +6,14 @@ on:
   pull_request:
     branches: [main]

+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
-  ci:
+  lint:
+    name: Lint
     runs-on: ubuntu-latest
-
     steps:
       - name: Checkout
         uses: actions/checkout@v4
@@ -23,8 +27,34 @@ jobs:
       - name: Lint
         run: bun run lint

+  typecheck:
+    name: Type Check
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Bun
+        uses: oven-sh/setup-bun@v2
+
+      - name: Install dependencies
+        run: bun install --frozen-lockfile
+
       - name: Type check
         run: bun run typecheck

+  test:
+    name: Test
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Bun
+        uses: oven-sh/setup-bun@v2
+
+      - name: Install dependencies
+        run: bun install --frozen-lockfile
+
       - name: Test
         run: bun run test:run

AGENTS.md

Lines changed: 240 additions & 0 deletions
@@ -1,5 +1,14 @@
 # AGENTS.md

+This file provides instructions for AI agents porting Apache PDFBox from Java to TypeScript.
+
+## Quick Start
+
+1. Read `.agents/ARCHITECTURE.md` for type mappings and patterns
+2. Check `.agents/PROGRESS.md` for what to port next
+3. Follow the porting order: io → cos → pdfparser → pdmodel → fontbox
+4. After porting a class, update PROGRESS.md and commit
+
 ## Commands
 - `bun install` - Install dependencies
 - `bun run test` - Run tests in watch mode
@@ -9,6 +18,13 @@
 - `bun run lint` - Check linting/formatting
 - `bun run lint:fix` - Auto-fix linting/formatting

+## Validation Checklist
+Before committing, ensure:
+1. `bun run typecheck` passes
+2. `bun run lint` passes (or run `lint:fix`)
+3. `bun run test:run` passes
+4. Update `.agents/PROGRESS.md` to mark class as ✅
+
 ## Code Style
 - **Formatting**: Tabs for indentation, double quotes for strings (enforced by Biome)
 - **Imports**: Auto-organized by Biome; use `import type` for type-only imports (verbatimModuleSyntax)
@@ -36,3 +52,227 @@ For packages not listed, follow the same pattern and add to `package.json` impor
 ## Project Context
 This is a 1:1 TypeScript port of Apache PDFBox. Reference implementation is in `checkouts/pdfbox/`.
 See `.agents/ARCHITECTURE.md` for detailed architecture overview and `.agents/PROGRESS.md` for porting status.
+
+## Workflow Guidelines
+
+### Commits
+- Commit often with meaningful messages
+- Each commit should represent a logical unit of work
+- Use conventional commit style when appropriate (e.g., `feat:`, `fix:`, `refactor:`)
+
+### Planning & Documentation
+Write plans and notes to the `.agents/` directory:
+
+| Directory | Purpose | Format |
+|-----------|---------|--------|
+| `.agents/plans/` | Implementation plans before starting work | `[NNNN]-[filename].md` (e.g., `0001-cos-objects.md`) |
+| `.agents/scratch/` | Temporary notes, things to remember between sessions/compaction | Free-form |
+| `.agents/justifications/` | Decisions that deviate from 1:1 port or significant design choices | `[NNNN]-[topic].md` |
+
+### Justifications
+Document in `.agents/justifications/` when you:
+- Deviate from the Java implementation
+- Choose a different algorithm or data structure
+- Use TypeScript idioms instead of direct Java translation
+- Make performance trade-offs
+- Skip or significantly modify functionality
+
+### External Dependencies
+Consider using external dependencies when they make the code significantly cleaner or provide functionality that would be complex to implement. Examples:
+- **Graphics**: e.g., `skia-canvas` or `@napi-rs/canvas` for rendering operations
+- **Compression**: e.g., native libraries for zlib/deflate if needed
+- **Crypto**: e.g., Node's built-in `crypto` module
+
+Always document dependency decisions in `.agents/justifications/`.
+
+## Porting Workflow
+
+### Step-by-Step Process
+1. **Pick a class** from `.agents/PROGRESS.md` (follow dependency order)
+2. **Read the Java source** in `checkouts/pdfbox/`
+3. **Check for Java tests** in `src/test/java/...` — port these too
+4. **Create the TypeScript file** following package mapping
+5. **Port the code** using patterns from `.agents/ARCHITECTURE.md`
+6. **Write/port tests** as `*.test.ts` alongside the source
+7. **Run validation** (typecheck, lint, test)
+8. **Update PROGRESS.md** to mark as ✅
+9. **Commit** with meaningful message
+
+### Dependency Order
+Port classes in order of dependencies. A class cannot be ported until its dependencies exist:
+
+```
+1. io/ → Foundation (RandomAccessRead, etc.)
+2. cos/ → Depends on io (COSBase, COSName, COSDictionary, etc.)
+3. pdfparser/ → Depends on cos, io
+4. pdmodel/ → Depends on cos, pdfparser
+5. fontbox/ → Can be ported in parallel after io
+6. filter/ → Depends on cos, io
+7. text/ → Depends on pdmodel, contentstream
+8. rendering/ → Depends on pdmodel, graphics
+```
+
+### Handling Circular Dependencies
+PDFBox has some circular dependencies. Strategies:
+1. **Forward declarations**: Use `type` imports for circular type references
+2. **Lazy initialization**: Defer imports using dynamic `import()`
+3. **Interface extraction**: Create interfaces in separate files
+4. Document any circular dependency solutions in `.agents/justifications/`
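
The interface-extraction strategy can be sketched as follows. This is a minimal single-file illustration, not code from the port: `DocumentLike`, `PageLike`, `Page`, and `PdfDocument` are hypothetical names, and the comments mark where each part would live if it were split into separate modules.

```typescript
// Hypothetical names throughout; in a real port each section would be its own file.

// --- types.ts: interfaces only, no runtime imports, so no import cycle can form ---
interface DocumentLike {
  getPageCount(): number;
}

interface PageLike {
  getDocument(): DocumentLike;
}

// --- Page.ts: depends on the interface, not on the concrete document class ---
class Page implements PageLike {
  constructor(private readonly document: DocumentLike) {}

  getDocument(): DocumentLike {
    return this.document;
  }
}

// --- PdfDocument.ts: free to import Page at runtime ---
class PdfDocument implements DocumentLike {
  private readonly pages: Page[] = [];

  addPage(): Page {
    const page = new Page(this);
    this.pages.push(page);
    return page;
  }

  getPageCount(): number {
    return this.pages.length;
  }
}

const doc = new PdfDocument();
const firstPage = doc.addPage();
doc.addPage();
```

Because `Page` refers to its owner only through `DocumentLike`, the page module would need nothing more than an `import type` from the interfaces file, which is erased at compile time.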
+
+### Stub Dependencies
+If you need a class that isn't ported yet:
+1. Create a minimal stub with `// TODO: implement` comments
+2. Mark it as 🟡 (in progress) in PROGRESS.md
+3. Implement just enough to unblock current work
+4. Add a note in `.agents/scratch/` about what needs completion
+
+## Common Patterns
+
+### Static Factory Methods
+```typescript
+// Java: COSInteger.get(long val)
+// TypeScript: Use static method with caching
+class COSInteger {
+  private static readonly CACHE: COSInteger[] = [];
+
+  static get(val: number): COSInteger {
+    // Check cache first
+    if (val >= -100 && val <= 256) {
+      return COSInteger.CACHE[val + 100] ??= new COSInteger(val);
+    }
+    return new COSInteger(val);
+  }
+
+  private constructor(private readonly value: number) {}
+}
+```
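
The effect of the cache is easiest to see through object identity. The sketch below restates the factory so that it runs on its own. The `Number.isInteger` guard, the `LOW`/`HIGH` constants, and the `intValue()` accessor are assumptions added for illustration and are not part of the snippet above.

```typescript
class COSInteger {
  // Assumed constants mirroring the -100..256 range used above.
  private static readonly LOW = -100;
  private static readonly HIGH = 256;
  private static readonly CACHE: COSInteger[] = [];

  private constructor(private readonly value: number) {}

  static get(val: number): COSInteger {
    // Assumption: only integral values are cached; a fractional input
    // would otherwise be used as a non-index array key.
    if (Number.isInteger(val) && val >= COSInteger.LOW && val <= COSInteger.HIGH) {
      return (COSInteger.CACHE[val - COSInteger.LOW] ??= new COSInteger(val));
    }
    return new COSInteger(val);
  }

  // Hypothetical accessor, added so the example can inspect the value.
  intValue(): number {
    return this.value;
  }
}

const cachedA = COSInteger.get(42);
const cachedB = COSInteger.get(42);
const largeA = COSInteger.get(1000);
const largeB = COSInteger.get(1000);

console.log(cachedA === cachedB); // same instance: 42 is inside the cached range
console.log(largeA === largeB); // distinct instances: 1000 is outside it
```

Code that compares `COSInteger` objects should therefore go through a value comparison instead of `===`, since identity only holds inside the cached range.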
+
+### Visitor Pattern
+```typescript
+// Java uses accept(ICOSVisitor)
+// TypeScript: Same pattern with interface
+interface ICOSVisitor<T> {
+  visitFromArray(obj: COSArray): T;
+  visitFromBoolean(obj: COSBoolean): T;
+  visitFromDictionary(obj: COSDictionary): T;
+  // ...
+}
+
+abstract class COSBase {
+  abstract accept<T>(visitor: ICOSVisitor<T>): T;
+}
+```
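
A concrete visitor makes the double dispatch visible. The following is a reduced sketch, assuming only two node types: the `COSBoolean` and `COSArray` classes are simplified stand-ins, and `SyntaxPrinter` is a hypothetical visitor, not a PDFBox class.

```typescript
interface ICOSVisitor<T> {
  visitFromArray(obj: COSArray): T;
  visitFromBoolean(obj: COSBoolean): T;
}

abstract class COSBase {
  abstract accept<T>(visitor: ICOSVisitor<T>): T;
}

// Simplified stand-ins for the real COS classes.
class COSBoolean extends COSBase {
  constructor(readonly value: boolean) {
    super();
  }

  accept<T>(visitor: ICOSVisitor<T>): T {
    return visitor.visitFromBoolean(this);
  }
}

class COSArray extends COSBase {
  constructor(readonly items: readonly COSBase[]) {
    super();
  }

  accept<T>(visitor: ICOSVisitor<T>): T {
    return visitor.visitFromArray(this);
  }
}

// Hypothetical visitor that renders objects in PDF-like syntax.
class SyntaxPrinter implements ICOSVisitor<string> {
  visitFromBoolean(obj: COSBoolean): string {
    return obj.value ? "true" : "false";
  }

  visitFromArray(obj: COSArray): string {
    // Each element dispatches back into this visitor through accept().
    return `[${obj.items.map((item) => item.accept(this)).join(" ")}]`;
  }
}

const tree = new COSArray([new COSBoolean(true), new COSArray([new COSBoolean(false)])]);
const printed = tree.accept(new SyntaxPrinter());
console.log(printed); // [true [false]]
```

The generic parameter on `accept` lets each visitor choose its own return type, so a printer can return `string` while a validator could return `boolean` without changing the COS classes.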
165+
166+
### Resource Management
167+
```typescript
168+
// Java: try-with-resources / Closeable
169+
// TypeScript: Use Symbol.dispose (requires TS 5.2+) or explicit close()
170+
171+
interface Closeable {
172+
close(): void;
173+
}
174+
175+
// Or with using declaration (TS 5.2+)
176+
class RandomAccessReadBuffer implements Disposable {
177+
[Symbol.dispose](): void {
178+
this.close();
179+
}
180+
}
181+
```
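
As written, the class above calls `this.close()` without defining it, so a real implementation needs a `close()` method as well. The sketch below shows the explicit `close()` route with a `try`/`finally` helper standing in for Java's try-with-resources. `withResource`, `isClosed()`, and `length()` are hypothetical names introduced for this example.

```typescript
interface Closeable {
  close(): void;
}

// Hypothetical helper: runs `body`, then always closes the resource,
// mirroring the guarantee of Java's try-with-resources.
function withResource<R extends Closeable, T>(resource: R, body: (resource: R) => T): T {
  try {
    return body(resource);
  } finally {
    resource.close();
  }
}

class RandomAccessReadBuffer implements Closeable {
  private closed = false;

  constructor(private readonly data: Uint8Array) {}

  length(): number {
    if (this.closed) throw new Error("buffer is closed");
    return this.data.length;
  }

  isClosed(): boolean {
    return this.closed;
  }

  // Idempotent, so calling close() twice is harmless.
  close(): void {
    this.closed = true;
  }
}

const buffer = new RandomAccessReadBuffer(new Uint8Array([1, 2, 3]));
const size = withResource(buffer, (b) => b.length());
```

On TypeScript 5.2+ with a runtime that provides `Symbol.dispose`, the same class could additionally implement `[Symbol.dispose]()` by delegating to `close()`, which allows callers to write `using buffer = ...` instead of calling the helper.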
182+
183+
### Streams and Iteration
184+
```typescript
185+
// Java: InputStream with read()
186+
// TypeScript: Use Uint8Array and views
187+
188+
class RandomAccessReadBuffer {
189+
private buffer: Uint8Array;
190+
private position: number = 0;
191+
192+
read(): number {
193+
if (this.position >= this.buffer.length) return -1;
194+
return this.buffer[this.position++];
195+
}
196+
197+
readBytes(length: number): Uint8Array {
198+
const result = this.buffer.slice(this.position, this.position + length);
199+
this.position += result.length;
200+
return result;
201+
}
202+
}
203+
```
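
A short usage sketch shows the two behaviours worth testing when porting stream code: the `-1` end-of-data sentinel and short reads. The constructor is an assumption, since the snippet above does not show how `buffer` is initialised, and the `?? -1` fallback is only there to satisfy stricter index-access compiler settings.

```typescript
class RandomAccessReadBuffer {
  private position = 0;

  // Assumed constructor; the original snippet leaves initialisation out.
  constructor(private readonly buffer: Uint8Array) {}

  /** Returns the next byte (0-255), or -1 at end of data, like InputStream.read(). */
  read(): number {
    if (this.position >= this.buffer.length) return -1;
    const byte = this.buffer[this.position];
    this.position += 1;
    return byte ?? -1;
  }

  /** Returns up to `length` bytes; fewer if the data runs out. */
  readBytes(length: number): Uint8Array {
    const result = this.buffer.slice(this.position, this.position + length);
    this.position += result.length;
    return result;
  }
}

// The bytes of "%PDF-1.7".
const header = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
const reader = new RandomAccessReadBuffer(header);

const firstByte = reader.read(); // 0x25, the "%" character
const tag = Array.from(reader.readBytes(3), (b) => String.fromCharCode(b)).join(""); // "PDF"
const rest = reader.readBytes(100); // short read: only 4 bytes remain
const atEnd = reader.read(); // -1
```

`slice` copies the bytes. Where a copy is not needed, `subarray` returns a view over the same memory, which may matter for large files, at the cost of aliasing the underlying buffer.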
204+
205+
### Synchronized/Thread Safety
206+
```typescript
207+
// Java: synchronized keyword
208+
// TypeScript: JavaScript is single-threaded, usually not needed
209+
// If porting synchronized code, just remove the keyword
210+
// Document in justifications if there are concerns about Worker threads
211+
```
212+
213+
## Error Handling
214+
215+
### Custom Exceptions
216+
```typescript
217+
// Create error classes matching Java exceptions
218+
export class IOException extends Error {
219+
constructor(message: string, public readonly cause?: Error) {
220+
super(message);
221+
this.name = "IOException";
222+
}
223+
}
224+
225+
// Usage
226+
throw new IOException("Failed to read PDF", originalError);
227+
```
228+
229+
### Checked vs Unchecked
230+
Java has checked exceptions; TypeScript does not. Strategies:
231+
1. **Document throws** in JSDoc comments
232+
2. **Use Result types** for recoverable errors (optional)
233+
3. **Let errors propagate** for unrecoverable errors
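
The Result-type strategy could look like the sketch below. The `Result` shape and the `parseHeaderVersion` function are hypothetical and exist only to illustrate the idea; `IOException` is reduced to a message-only constructor to keep the example short.

```typescript
class IOException extends Error {
  constructor(message: string) {
    super(message);
    this.name = "IOException";
  }
}

// A discriminated union: callers must check `ok` before touching `value`.
type Result<T, E extends Error = Error> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

/**
 * Hypothetical parser for a "%PDF-x.y" header line.
 * A missing header is a recoverable condition, so it is returned, not thrown.
 */
function parseHeaderVersion(header: string): Result<string, IOException> {
  const version = /^%PDF-(\d\.\d)/.exec(header)?.[1];
  if (version === undefined) {
    return { ok: false, error: new IOException("Missing %PDF header") };
  }
  return { ok: true, value: version };
}

const good = parseHeaderVersion("%PDF-1.7");
const bad = parseHeaderVersion("not a pdf");

if (good.ok) {
  console.log(good.value); // 1.7
}
if (!bad.ok) {
  console.log(bad.error.message); // Missing %PDF header
}
```

The compiler enforces the `ok` check through narrowing, which recovers some of what checked exceptions provide in Java. Unrecoverable failures can still be thrown and documented with a JSDoc `@throws` tag.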
234+
235+
## Testing
236+
237+
### Test File Location
238+
```
239+
src/core/cos/COSName.ts → src/core/cos/COSName.test.ts
240+
src/io/RandomAccessRead.ts → src/io/RandomAccessRead.test.ts
241+
```
242+
243+
### Test Structure
244+
```typescript
245+
import { describe, it, expect } from "vitest";
246+
import { COSName } from "./COSName.ts";
247+
248+
describe("COSName", () => {
249+
describe("getPDFName", () => {
250+
it("should return name for standard names", () => {
251+
const name = COSName.getPDFName("Type");
252+
expect(name.getName()).toBe("Type");
253+
});
254+
});
255+
});
256+
```
257+
258+
### Porting Java Tests
259+
1. Find test in `checkouts/pdfbox/pdfbox/src/test/java/org/apache/pdfbox/...`
260+
2. Convert JUnit assertions to Vitest:
261+
- `assertEquals(expected, actual)``expect(actual).toBe(expected)`
262+
- `assertTrue(condition)``expect(condition).toBe(true)`
263+
- `assertNotNull(obj)``expect(obj).not.toBeNull()`
264+
- `assertThrows(Exception.class, () -> ...)``expect(() => ...).toThrow()`
265+
266+
## Debugging Tips
267+
268+
### When Stuck
269+
1. Check if there's a similar pattern already ported
270+
2. Read the Java tests for expected behavior
271+
3. Look at PDFBox JIRA for context on complex code
272+
4. Write a note in `.agents/scratch/` and move on
273+
274+
### Common Issues
275+
- **Import cycles**: Use `import type` or restructure
276+
- **Null vs undefined**: Java null → TypeScript `null | undefined`
277+
- **Generics**: TypeScript generics are erased; may need runtime checks
278+
- **Reflection**: PDFBox uses some reflection; may need alternative approaches
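
The null/undefined and erased-generics points often show up together when reading typed values out of a dictionary. The sketch below is one possible approach, not an API from the port: `getAs` is a hypothetical helper, and the COS classes are reduced to stand-ins.

```typescript
// Reduced stand-ins for the real COS classes.
class COSBase {}

class COSName extends COSBase {
  constructor(readonly name: string) {
    super();
  }
}

class COSInteger extends COSBase {
  constructor(readonly value: number) {
    super();
  }
}

// Hypothetical helper: the class is passed as a value because the type
// parameter T is erased and cannot be inspected at runtime.
function getAs<T extends COSBase>(
  value: COSBase | null | undefined,
  type: new (...args: never[]) => T,
): T | null {
  // instanceof is false for both null and undefined, so the two
  // "absent" cases collapse into a single null result.
  return value instanceof type ? value : null;
}

const dict = new Map<string, COSBase>([
  ["Type", new COSName("Page")],
  ["Count", new COSInteger(3)],
]);

const typeName = getAs(dict.get("Type"), COSName); // a COSName
const wrongType = getAs(dict.get("Count"), COSName); // null: present, but not a COSName
const missing = getAs(dict.get("Missing"), COSName); // null: Map.get returned undefined
```

Normalising to `null` at the boundary keeps ported call sites close to the Java originals, where a missing or mistyped entry is typically reported as `null`.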
