Commit 5d4c10a

Initial project setup for pdfbox-ts
- TypeScript project with Biome linting/formatting, Vitest testing
- Husky pre-commit hooks with lint-staged
- GitHub Actions CI pipeline
- PDFBox submodule in checkouts/ for reference
- Agent documentation (AGENTS.md, .agents/) for LLM-assisted development
- Example port: Vector class with tests
- 1:1 file path mapping convention established
0 parents  commit 5d4c10a

27 files changed: +1016 -0 lines changed

.agents/ARCHITECTURE.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+# PDFBox Architecture Overview
+
+This document provides a high-level overview of Apache PDFBox's architecture for porting to TypeScript.
+
+## Module Structure
+
+| Java Module | TypeScript Path | Import Alias |
+|-------------|-----------------|--------------|
+| **pdfbox** (org.apache.pdfbox) | `src/core/` | `#core/*` |
+| **fontbox** (org.apache.fontbox) | `src/fontbox/` | `#fontbox/*` |
+| **io** (org.apache.pdfbox.io) | `src/io/` | `#io/*` |
+| **tools** | - | Skip |
+| **debugger** | - | Skip |
+| **examples** | - | Skip |
+
+## Core Package Structure (pdfbox)
+
+```
+org.apache.pdfbox/
+├── Loader              # Main entry point for loading PDFs
+├── cos/                # COS (Carousel Object Structure) - low-level PDF objects
+│   ├── COSBase         # Base class for all COS objects
+│   ├── COSDocument     # Represents the COS document
+│   ├── COSArray, COSBoolean, COSDictionary, COSFloat, COSInteger, COSName...
+│   └── COSObject       # Indirect object reference
+├── pdmodel/            # High-level PDF document model
+│   ├── PDDocument      # Main document class
+│   ├── PDPage          # Page object
+│   └── graphics/       # Colors, images, patterns
+├── pdfparser/          # PDF parsing
+│   ├── PDFParser       # Main parser
+│   └── COSParser       # COS object parsing
+├── pdfwriter/          # PDF writing/serialization
+├── contentstream/      # Content stream processing
+├── filter/             # Stream filters (Flate, ASCII85, etc.)
+├── text/               # Text extraction (PDFTextStripper)
+└── util/               # Utilities (Matrix, Vector, etc.)
+```
+
+## Key Entry Points
+
+1. **`Loader`** - Static factory for loading PDF documents
+2. **`PDDocument`** - Main document class for working with PDFs
+3. **`PDFParser`** - Low-level PDF parsing
+4. **`PDFTextStripper`** - Text extraction
+5. **`PDFRenderer`** - Rendering pages to images
+
+## Porting Strategy
+
+1. Start with **io** module (foundation for file reading)
+2. Port **cos** package (low-level PDF object model)
+3. Port **pdfparser** (parsing COS objects)
+4. Port **pdmodel** (high-level API)
+5. Port **fontbox** as needed
+6. Add text extraction, rendering, etc.
+
+## TypeScript Mapping Conventions
+
+| Java | TypeScript |
+|------|------------|
+| `float`/`double` | `number` |
+| `int`/`long` | `number` (or `bigint` for large values) |
+| `byte[]` | `Uint8Array` |
+| `List<T>` | `T[]` |
+| `Map<K,V>` | `Map<K,V>` |
+| `InputStream` | `ReadableStream` or custom |
+| `RandomAccessRead` | Custom interface with `Uint8Array` backing |
+| checked exceptions | Return types or thrown `Error` subclasses |
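
The mapping conventions in ARCHITECTURE.md could look roughly like the sketch below when applied to the io module. This is only an illustration under stated assumptions: `PdfIoError`, `RandomAccessReadSketch`, `BufferReadSketch` and `toSafeNumber` are hypothetical names that do not appear in this commit, and the method set is a small subset loosely modeled on PDFBox's `RandomAccessRead`, not the port's actual interface.

```typescript
// Hypothetical sketch of the mapping table; none of these names come from pdfbox-ts.

/** Stands in for a Java checked exception: a thrown Error subclass. */
class PdfIoError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PdfIoError";
  }
}

/** Assumed shape for a ported RandomAccessRead; the real interface may differ. */
interface RandomAccessReadSketch {
  length(): number;
  getPosition(): number;
  seek(position: number): void;
  /** Returns the next byte (0-255), or -1 at end of data, like Java's read(). */
  read(): number;
}

/** Java byte[] becomes Uint8Array, used here as the backing store. */
class BufferReadSketch implements RandomAccessReadSketch {
  private readonly data: Uint8Array;
  private position = 0;

  constructor(data: Uint8Array) {
    this.data = data;
  }

  length(): number {
    return this.data.length;
  }

  getPosition(): number {
    return this.position;
  }

  seek(position: number): void {
    if (!Number.isInteger(position) || position < 0) {
      throw new PdfIoError(`Invalid position: ${position}`);
    }
    // Clamp to the end of the buffer so a later read() reports end of data.
    this.position = Math.min(position, this.data.length);
  }

  read(): number {
    if (this.position >= this.data.length) {
      return -1;
    }
    const byte = this.data[this.position];
    this.position += 1;
    return byte ?? -1;
  }
}

/** A Java long only fits in number up to Number.MAX_SAFE_INTEGER; beyond that keep bigint. */
function toSafeNumber(value: bigint): number {
  if (
    value > BigInt(Number.MAX_SAFE_INTEGER) ||
    value < BigInt(Number.MIN_SAFE_INTEGER)
  ) {
    throw new RangeError(`${value} does not fit in a number without precision loss`);
  }
  return Number(value);
}
```

Throwing an `Error` subclass is one of the two options the table lists for checked exceptions; returning a result type would be the other, and the sketch does not try to settle which one the port should prefer.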

.agents/PROGRESS.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# Porting Progress
+
+Track which PDFBox packages/classes have been ported to TypeScript.
+
+## Legend
+- ⬜ Not started
+- 🟡 In progress
+- ✅ Complete
+- ⏭️ Skipped (not needed)
+
+## Core (`src/core/` ← org.apache.pdfbox)
+
+### util/
+- ✅ Vector
+
+### cos/
+- ⬜ COSBase
+- ⬜ COSObject
+- ⬜ COSDocument
+- ⬜ COSArray
+- ⬜ COSBoolean
+- ⬜ COSDictionary
+- ⬜ COSFloat
+- ⬜ COSInteger
+- ⬜ COSName
+- ⬜ COSNull
+- ⬜ COSString
+- ⬜ COSStream
+
+### pdmodel/
+- ⬜ PDDocument
+- ⬜ PDPage
+- ⬜ PDPageTree
+
+### pdfparser/
+- ⬜ PDFParser
+- ⬜ COSParser
+- ⬜ BaseParser
+
+## IO (`src/io/` ← org.apache.pdfbox.io)
+
+- ⬜ RandomAccessRead
+- ⬜ RandomAccessReadBuffer
+
+## Fontbox (`src/fontbox/` ← org.apache.fontbox)
+
+### ttf/
+- ⬜ TrueTypeFont
+- ⬜ TTFParser
+
+### cff/
+- ⬜ CFFFont
+- ⬜ CFFParser
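
The `←` headings in PROGRESS.md pair each TypeScript directory with a Java package, which is the 1:1 file path mapping the commit message mentions. A minimal sketch of that mapping follows. `toTypeScriptPath`, `PACKAGE_ROOTS` and `SKIPPED_ROOTS` are hypothetical names, the skipped package prefixes are assumed from the module names, and the rule that a file keeps its class name and only swaps `.java` for `.ts` is an assumption rather than something this diff confirms.

```typescript
// Hypothetical helper; the directory prefixes come from the module table,
// everything else is an assumption made for illustration.

// Assumed Java packages of the modules marked "Skip" in ARCHITECTURE.md.
const SKIPPED_ROOTS: readonly string[] = [
  "org/apache/pdfbox/tools/",
  "org/apache/pdfbox/debugger/",
  "org/apache/pdfbox/examples/",
];

// More specific packages come first, so org.apache.pdfbox.io wins over org.apache.pdfbox.
const PACKAGE_ROOTS: ReadonlyArray<readonly [string, string]> = [
  ["org/apache/pdfbox/io/", "src/io/"],
  ["org/apache/pdfbox/", "src/core/"],
  ["org/apache/fontbox/", "src/fontbox/"],
];

/**
 * Maps a Java source path (relative to the Java source root) to a TypeScript path.
 * Returns undefined for skipped modules and for packages outside the table.
 */
function toTypeScriptPath(javaPath: string): string | undefined {
  if (SKIPPED_ROOTS.some((root) => javaPath.startsWith(root))) {
    return undefined;
  }
  for (const [javaRoot, tsRoot] of PACKAGE_ROOTS) {
    if (javaPath.startsWith(javaRoot)) {
      const rest = javaPath.slice(javaRoot.length);
      return tsRoot + rest.replace(/\.java$/, ".ts");
    }
  }
  return undefined;
}
```

Under these assumptions, `toTypeScriptPath("org/apache/pdfbox/util/Vector.java")` returns `"src/core/util/Vector.ts"`, which matches where the ✅ Vector entry sits in the progress list.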

.agents/README.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# .agents
+
+This directory is used by AI agents to track their work and decision-making process.
+
+## Directories
+
+### plans/
+Contains planning documents created during planning mode. These help track the approach and steps for implementing features or solving problems.
+
+### justifications/
+Contains documents explaining why the agent made specific decisions. This provides transparency and helps with future reference when understanding past choices.
+
+### scratch/
+Temporary workspace for notes, drafts, and work-in-progress content that doesn't need to be preserved long-term.

.agents/justifications/.gitkeep

Whitespace-only changes.

.agents/plans/.gitkeep

Whitespace-only changes.

.agents/scratch/.gitkeep

Whitespace-only changes.

.github/workflows/ci.yml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  ci:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Bun
+        uses: oven-sh/setup-bun@v2
+
+      - name: Install dependencies
+        run: bun install --frozen-lockfile
+
+      - name: Lint
+        run: bun run lint
+
+      - name: Type check
+        run: bun run typecheck
+
+      - name: Test
+        run: bun run test:run

.gitignore

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# dependencies (bun install)
+node_modules
+
+# output
+out
+dist
+*.tgz
+
+# code coverage
+coverage
+*.lcov
+
+# logs
+logs
+_.log
+report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json
+
+# dotenv environment variable files
+.env
+.env.development.local
+.env.test.local
+.env.production.local
+.env.local
+
+# caches
+.eslintcache
+.cache
+*.tsbuildinfo
+
+# IntelliJ based IDEs
+.idea
+
+# Finder (MacOS) folder config
+.DS_Store

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[submodule "checkouts/pdfbox"]
+	path = checkouts/pdfbox
+	url = https://github.com/apache/pdfbox.git

.husky/pre-commit

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+bunx lint-staged

0 commit comments