|
1 | 1 | # AGENTS.md |
2 | 2 |
|
| 3 | +This file provides instructions for AI agents porting Apache PDFBox from Java to TypeScript. |
| 4 | + |
| 5 | +## Quick Start |
| 6 | + |
| 7 | +1. Read `.agents/ARCHITECTURE.md` for type mappings and patterns |
| 8 | +2. Check `.agents/PROGRESS.md` for what to port next |
| 9 | +3. Follow the porting order: io → cos → pdfparser → pdmodel → fontbox |
| 10 | +4. After porting a class, update PROGRESS.md and commit |
| 11 | + |
3 | 12 | ## Commands |
4 | 13 | - `bun install` - Install dependencies |
5 | 14 | - `bun run test` - Run tests in watch mode |
|
9 | 18 | - `bun run lint` - Check linting/formatting |
10 | 19 | - `bun run lint:fix` - Auto-fix linting/formatting |
11 | 20 |
|
| 21 | +## Validation Checklist |
| 22 | +Before committing, ensure: |
| 23 | +1. `bun run typecheck` passes |
| 24 | +2. `bun run lint` passes (run `bun run lint:fix` to auto-fix) |
| 25 | +3. `bun run test:run` passes |
| 26 | +4. `.agents/PROGRESS.md` marks the ported class as ✅ |
| 27 | + |
12 | 28 | ## Code Style |
13 | 29 | - **Formatting**: Tabs for indentation, double quotes for strings (enforced by Biome) |
14 | 30 | - **Imports**: Auto-organized by Biome; use `import type` for type-only imports (verbatimModuleSyntax) |
@@ -36,3 +52,227 @@ For packages not listed, follow the same pattern and add to `package.json` impor |
36 | 52 | ## Project Context |
37 | 53 | This is a 1:1 TypeScript port of Apache PDFBox. Reference implementation is in `checkouts/pdfbox/`. |
38 | 54 | See `.agents/ARCHITECTURE.md` for detailed architecture overview and `.agents/PROGRESS.md` for porting status. |
| 55 | + |
| 56 | +## Workflow Guidelines |
| 57 | + |
| 58 | +### Commits |
| 59 | +- Commit often with meaningful messages |
| 60 | +- Each commit should represent a logical unit of work |
| 61 | +- Use conventional commit style when appropriate (e.g., `feat:`, `fix:`, `refactor:`) |
| 62 | + |
| 63 | +### Planning & Documentation |
| 64 | +Write plans and notes to the `.agents/` directory: |
| 65 | + |
| 66 | +| Directory | Purpose | Format | |
| 67 | +|-----------|---------|--------| |
| 68 | +| `.agents/plans/` | Implementation plans before starting work | `[NNNN]-[filename].md` (e.g., `0001-cos-objects.md`) | |
| 69 | +| `.agents/scratch/` | Temporary notes, things to remember between sessions/compaction | Free-form | |
| 70 | +| `.agents/justifications/` | Decisions that deviate from 1:1 port or significant design choices | `[NNNN]-[topic].md` | |
| 71 | + |
| 72 | +### Justifications |
| 73 | +Document in `.agents/justifications/` when you: |
| 74 | +- Deviate from the Java implementation |
| 75 | +- Choose a different algorithm or data structure |
| 76 | +- Use TypeScript idioms instead of direct Java translation |
| 77 | +- Make performance trade-offs |
| 78 | +- Skip or significantly modify functionality |
| 79 | + |
| 80 | +### External Dependencies |
| 81 | +Consider using external dependencies when they make the code significantly cleaner or provide functionality that would be complex to implement. Examples: |
| 82 | +- **Graphics**: e.g., `skia-canvas` or `@napi-rs/canvas` for rendering operations |
| 83 | +- **Compression**: e.g., native libraries for zlib/deflate if needed |
| 84 | +- **Crypto**: e.g., Node's built-in `crypto` module |
| 85 | + |
| 86 | +Always document dependency decisions in `.agents/justifications/`. |
| 87 | + |
| 88 | +## Porting Workflow |
| 89 | + |
| 90 | +### Step-by-Step Process |
| 91 | +1. **Pick a class** from `.agents/PROGRESS.md` (follow dependency order) |
| 92 | +2. **Read the Java source** in `checkouts/pdfbox/` |
| 93 | +3. **Check for Java tests** in `src/test/java/...` — port these too |
| 94 | +4. **Create the TypeScript file** following package mapping |
| 95 | +5. **Port the code** using patterns from `.agents/ARCHITECTURE.md` |
| 96 | +6. **Write/port tests** as `*.test.ts` alongside the source |
| 97 | +7. **Run validation** (typecheck, lint, test) |
| 98 | +8. **Update PROGRESS.md** to mark the class as ✅ |
| 99 | +9. **Commit** with a meaningful message |
| 100 | + |
| 101 | +### Dependency Order |
| 102 | +Port classes in order of dependencies. A class cannot be ported until its dependencies exist: |
| 103 | + |
| 104 | +``` |
| 105 | +1. io/ → Foundation (RandomAccessRead, etc.) |
| 106 | +2. cos/ → Depends on io (COSBase, COSName, COSDictionary, etc.) |
| 107 | +3. pdfparser/ → Depends on cos, io |
| 108 | +4. pdmodel/ → Depends on cos, pdfparser |
| 109 | +5. fontbox/ → Can be ported in parallel after io |
| 110 | +6. filter/ → Depends on cos, io |
| 111 | +7. text/ → Depends on pdmodel, contentstream |
| 112 | +8. rendering/ → Depends on pdmodel, graphics |
| 113 | +``` |
| 114 | + |
| 115 | +### Handling Circular Dependencies |
| 116 | +PDFBox has some circular dependencies. Strategies: |
| 117 | +1. **Forward declarations**: Use `type` imports for circular type references |
| 118 | +2. **Lazy initialization**: Defer imports using dynamic `import()` |
| 119 | +3. **Interface extraction**: Create interfaces in separate files |
| 120 | +4. Document any circular dependency solutions in `.agents/justifications/` |
| 121 | + |
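A minimal sketch of strategy 3 (interface extraction), collapsed into one file for illustration. The names `NodeOwner`, `TreeNode`, and `Tree` are hypothetical and not taken from PDFBox; in a real port each section would be its own file, and the middle one would pull in the interface with `import type`.

```typescript
// Hypothetical names throughout; each section stands in for a separate file.

// --- owner.ts: the extracted interface imports nothing, so it cannot form a cycle ---
interface NodeOwner {
	nextId(): number;
}

// --- node.ts: depends only on the interface (would use `import type { NodeOwner }`) ---
class TreeNode {
	readonly id: number;

	constructor(owner: NodeOwner) {
		this.id = owner.nextId();
	}
}

// --- tree.ts: the concrete class imports TreeNode; the reverse edge is gone ---
class Tree implements NodeOwner {
	private counter = 0;

	nextId(): number {
		this.counter += 1;
		return this.counter;
	}

	createNode(): TreeNode {
		return new TreeNode(this);
	}
}
```

Because `TreeNode` only sees the interface, the runtime import graph has a single direction (tree → node), even though the two classes still refer to each other conceptually.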
| 122 | +### Stub Dependencies |
| 123 | +If you need a class that isn't ported yet: |
| 124 | +1. Create a minimal stub with `// TODO: implement` comments |
| 125 | +2. Mark it as 🟡 (in progress) in PROGRESS.md |
| 126 | +3. Implement just enough to unblock current work |
| 127 | +4. Add a note in `.agents/scratch/` about what needs completion |
| 128 | + |
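As a rough illustration of steps 1 and 3, a stub might implement only the member the current work needs and fail loudly everywhere else. `COSStreamStub`, its members, and `NotImplementedError` are hypothetical names chosen for this sketch, not PDFBox's API.

```typescript
// Hypothetical error type so unfinished members are easy to spot in test output.
class NotImplementedError extends Error {
	constructor(member: string) {
		super(`${member} is not implemented yet`);
		this.name = "NotImplementedError";
	}
}

// Hypothetical stub: just enough surface to unblock the class being ported.
class COSStreamStub {
	constructor(private readonly data: Uint8Array) {}

	// Implemented: the only member the current work depends on.
	getLength(): number {
		return this.data.length;
	}

	// TODO: implement
	createInputStream(): never {
		throw new NotImplementedError("COSStreamStub.createInputStream");
	}
}
```

Throwing rather than returning a placeholder value keeps an unfinished stub from silently passing tests.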
| 129 | +## Common Patterns |
| 130 | + |
| 131 | +### Static Factory Methods |
| 132 | +```typescript |
| 133 | +// Java: COSInteger.get(long val) |
| 134 | +// TypeScript: Use static method with caching |
| 135 | +class COSInteger { |
| 136 | + private static readonly CACHE: COSInteger[] = []; |
| 137 | + |
| 138 | + static get(val: number): COSInteger { |
| 139 | + // Check cache first |
| 140 | + if (val >= -100 && val <= 256) { |
| 141 | + return COSInteger.CACHE[val + 100] ??= new COSInteger(val); |
| 142 | + } |
| 143 | + return new COSInteger(val); |
| 144 | + } |
| 145 | + |
| 146 | + private constructor(private readonly value: number) {} |
| 147 | +} |
| 148 | +``` |
| 149 | + |
| 150 | +### Visitor Pattern |
| 151 | +```typescript |
| 152 | +// Java uses accept(ICOSVisitor) |
| 153 | +// TypeScript: Same pattern with interface |
| 154 | +interface ICOSVisitor<T> { |
| 155 | + visitFromArray(obj: COSArray): T; |
| 156 | + visitFromBoolean(obj: COSBoolean): T; |
| 157 | + visitFromDictionary(obj: COSDictionary): T; |
| 158 | + // ... |
| 159 | +} |
| 160 | + |
| 161 | +abstract class COSBase { |
| 162 | + abstract accept<T>(visitor: ICOSVisitor<T>): T; |
| 163 | +} |
| 164 | +``` |
| 165 | + |
| 166 | +### Resource Management |
| 167 | +```typescript |
| 168 | +// Java: try-with-resources / Closeable |
| 169 | +// TypeScript: Use Symbol.dispose (requires TS 5.2+) or explicit close() |
| 170 | + |
| 171 | +interface Closeable { |
| 172 | + close(): void; |
| 173 | +} |
| 174 | + |
| 175 | +// Or with using declaration (TS 5.2+) |
| 176 | +class RandomAccessReadBuffer implements Closeable, Disposable { |
| 177 | + close(): void { /* release the underlying buffer */ } |
| 178 | + |
| 179 | + [Symbol.dispose](): void { this.close(); } |
| 180 | +} |
| 181 | +``` |
| 182 | + |
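Where `using` is not available, an explicit `close()` can be wrapped in `try`/`finally` to mirror Java's try-with-resources. The sketch below assumes a hypothetical helper named `withResource` and a throwaway `FakeReader` class; neither exists in PDFBox.

```typescript
interface Closeable {
	close(): void;
}

// Hypothetical helper: runs `body`, then closes the resource even if `body` throws.
function withResource<R extends Closeable, T>(resource: R, body: (r: R) => T): T {
	try {
		return body(resource);
	} finally {
		resource.close();
	}
}

// Throwaway resource used only to demonstrate the helper.
class FakeReader implements Closeable {
	closed = false;

	read(): number {
		return 42;
	}

	close(): void {
		this.closed = true;
	}
}

const reader = new FakeReader();
const value = withResource(reader, (r) => r.read());
```

This version is synchronous only. If `body` returned a promise, the resource would be closed before the promise settled, so an async variant would need to `await` the body inside the `try`.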
| 183 | +### Streams and Iteration |
| 184 | +```typescript |
| 185 | +// Java: InputStream with read() |
| 186 | +// TypeScript: Use Uint8Array (`slice` copies bytes; `subarray` returns a view) |
| 187 | + |
| 188 | +class RandomAccessReadBuffer { |
| 189 | + constructor(private readonly buffer: Uint8Array) {} |
| 190 | + private position: number = 0; |
| 191 | + |
| 192 | + read(): number { |
| 193 | + if (this.position >= this.buffer.length) return -1; |
| 194 | + return this.buffer[this.position++]; |
| 195 | + } |
| 196 | + |
| 197 | + readBytes(length: number): Uint8Array { |
| 198 | + const result = this.buffer.slice(this.position, this.position + length); |
| 199 | + this.position += result.length; |
| 200 | + return result; |
| 201 | + } |
| 202 | +} |
| 203 | +``` |
| 204 | + |
| 205 | +### Synchronized/Thread Safety |
| 206 | +```typescript |
| 207 | +// Java: synchronized keyword |
| 208 | +// TypeScript: JavaScript runs on a single thread, so synchronous code needs no locking |
| 209 | +// When porting synchronized code, remove the keyword; async methods can still interleave at each await |
| 210 | +// Document in justifications if there are concerns about Worker threads |
| 211 | +``` |
| 212 | + |
| 213 | +## Error Handling |
| 214 | + |
| 215 | +### Custom Exceptions |
| 216 | +```typescript |
| 217 | +// Create error classes matching Java exceptions |
| 218 | +export class IOException extends Error { |
| 219 | + constructor(message: string, public readonly cause?: Error) { |
| 220 | + super(message); |
| 221 | + this.name = "IOException"; |
| 222 | + } |
| 223 | +} |
| 224 | + |
| 225 | +// Usage |
| 226 | +throw new IOException("Failed to read PDF", originalError); |
| 227 | +``` |
| 228 | + |
| 229 | +### Checked vs Unchecked |
| 230 | +Java has checked exceptions; TypeScript does not. Strategies: |
| 231 | +1. **Document throws** in JSDoc comments |
| 232 | +2. **Use Result types** for recoverable errors (optional) |
| 233 | +3. **Let errors propagate** for unrecoverable errors |
| 234 | + |
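Strategies 1 and 2 can be combined. The following is a minimal sketch, assuming a hand-rolled `Result` type and two hypothetical functions (`parseIntStrict`, `requireInt`); nothing here comes from PDFBox or an existing library.

```typescript
// Hypothetical Result type: a discriminated union on the `ok` flag.
type Result<T, E extends Error = Error> =
	| { ok: true; value: T }
	| { ok: false; error: E };

// Recoverable failure is returned as a value instead of thrown.
function parseIntStrict(text: string): Result<number> {
	if (!/^[+-]?\d+$/.test(text)) {
		return { ok: false, error: new Error(`Not an integer: ${text}`) };
	}
	return { ok: true, value: Number.parseInt(text, 10) };
}

/**
 * Throwing wrapper for callers that cannot recover.
 *
 * @throws {Error} if `text` is not an integer
 */
function requireInt(text: string): number {
	const result = parseIntStrict(text);
	if (!result.ok) {
		throw result.error;
	}
	return result.value;
}
```

Checking `result.ok` narrows the union, so the compiler forces callers to handle the failure branch before touching `value`, which is roughly the guarantee a checked exception gives in Java.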
| 235 | +## Testing |
| 236 | + |
| 237 | +### Test File Location |
| 238 | +``` |
| 239 | +src/core/cos/COSName.ts → src/core/cos/COSName.test.ts |
| 240 | +src/io/RandomAccessRead.ts → src/io/RandomAccessRead.test.ts |
| 241 | +``` |
| 242 | + |
| 243 | +### Test Structure |
| 244 | +```typescript |
| 245 | +import { describe, it, expect } from "vitest"; |
| 246 | +import { COSName } from "./COSName.ts"; |
| 247 | + |
| 248 | +describe("COSName", () => { |
| 249 | + describe("getPDFName", () => { |
| 250 | + it("should return name for standard names", () => { |
| 251 | + const name = COSName.getPDFName("Type"); |
| 252 | + expect(name.getName()).toBe("Type"); |
| 253 | + }); |
| 254 | + }); |
| 255 | +}); |
| 256 | +``` |
| 257 | + |
| 258 | +### Porting Java Tests |
| 259 | +1. Find test in `checkouts/pdfbox/pdfbox/src/test/java/org/apache/pdfbox/...` |
| 260 | +2. Convert JUnit assertions to Vitest: |
| 261 | + - `assertEquals(expected, actual)` → `expect(actual).toBe(expected)` (use `toEqual` for objects and arrays compared by value) |
| 262 | + - `assertTrue(condition)` → `expect(condition).toBe(true)` |
| 263 | + - `assertNotNull(obj)` → `expect(obj).not.toBeNull()` |
| 264 | + - `assertThrows(Exception.class, () -> ...)` → `expect(() => ...).toThrow()` |
| 265 | + |
| 266 | +## Debugging Tips |
| 267 | + |
| 268 | +### When Stuck |
| 269 | +1. Check if there's a similar pattern already ported |
| 270 | +2. Read the Java tests for expected behavior |
| 271 | +3. Look at PDFBox JIRA for context on complex code |
| 272 | +4. Write a note in `.agents/scratch/` and move on |
| 273 | + |
| 274 | +### Common Issues |
| 275 | +- **Import cycles**: Use `import type` or restructure |
| 276 | +- **Null vs undefined**: Java null → TypeScript `null | undefined` |
| 277 | +- **Generics**: TypeScript generics are erased; may need runtime checks |
| 278 | +- **Reflection**: PDFBox uses some reflection; may need alternative approaches |
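
For the **Generics** item, one common workaround is to pass the constructor explicitly so the check can happen at runtime. This is a sketch under assumptions: `BaseObject`, `NameObject`, `IntObject`, and `getAs` are hypothetical stand-ins, not the ported COS classes.

```typescript
// Hypothetical classes standing in for ported types.
class BaseObject {
	readonly tag: string = "base";
}

class NameObject extends BaseObject {
	constructor(readonly name: string) {
		super();
	}
}

class IntObject extends BaseObject {
	constructor(readonly value: number) {
		super();
	}
}

// The type parameter T is erased, so the constructor carries the runtime information.
function getAs<T extends BaseObject>(
	value: BaseObject | null | undefined,
	type: new (...args: never[]) => T,
): T | null {
	return value instanceof type ? value : null;
}

const asName = getAs(new NameObject("Type"), NameObject);
const mismatch = getAs(new IntObject(1), NameObject);
```

Returning `null` for both a missing value and a type mismatch also collapses the `null | undefined` distinction at one boundary, which may or may not suit a given call site.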