|
| 1 | +# Synchronous Object Resolution Spec |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +Change the `ObjectRegistry.resolve()` method and related infrastructure from async to sync. This enables code patterns like: |
| 6 | + |
| 7 | +```typescript |
| 8 | +if (value instanceof PdfRef) { |
| 9 | + value = ctx.resolve(value); |
| 10 | +} |
| 11 | + |
| 12 | +if (value instanceof PdfDict) { |
| 13 | + doSomething(value); |
| 14 | +} |
| 15 | +``` |
| 16 | + |
| 17 | +Without making the containing method async. |
| 18 | + |
| 19 | +## Current State |
| 20 | + |
| 21 | +### Filters: Already Sync |
| 22 | + |
| 23 | +The filter system is **already synchronous**: |
| 24 | + |
| 25 | +```typescript |
| 26 | +// src/filters/filter.ts |
| 27 | +interface Filter { |
| 28 | + decode(data: Uint8Array, params?: PdfDict): Uint8Array; // sync |
| 29 | + encode(data: Uint8Array, params?: PdfDict): Uint8Array; // sync |
| 30 | +} |
| 31 | + |
| 32 | +// src/filters/filter-pipeline.ts |
| 33 | +static decode(data: Uint8Array, filters: FilterSpec[]): Uint8Array; // sync |
| 34 | +static encode(data: Uint8Array, filters: FilterSpec[]): Uint8Array; // sync |
| 35 | + |
| 36 | +// src/objects/pdf-stream.ts |
| 37 | +getDecodedData(): Uint8Array; // sync |
| 38 | +getEncodedData(): Uint8Array; // sync |
| 39 | +``` |
| 40 | + |
| 41 | +However, call sites often use `await` unnecessarily (which works but is misleading). |
| 42 | + |
| 43 | +### Resolver: Async Interface, Sync Implementation |
| 44 | + |
| 45 | +The `ObjectResolver` type is async: |
| 46 | + |
| 47 | +```typescript |
| 48 | +// src/document/object-registry.ts |
| 49 | +type ObjectResolver = (ref: PdfRef) => Promise<PdfObject | null>; |
| 50 | + |
| 51 | +async resolve(ref: PdfRef): Promise<PdfObject | null> { |
| 52 | + const existing = this.getObject(ref); |
| 53 | + if (existing !== null) return existing; |
| 54 | + |
| 55 | + if (this.resolver) { |
| 56 | + const obj = await this.resolver(ref); // async call |
| 57 | + if (obj !== null) this.addLoaded(ref, obj); |
| 58 | + return obj; |
| 59 | + } |
| 60 | + return null; |
| 61 | +} |
| 62 | +``` |
| 63 | + |
| 64 | +But the actual resolver implementation in `DocumentParser.buildDocument()` is **synchronous**: |
| 65 | + |
| 66 | +```typescript |
| 67 | +// src/parser/document-parser.ts (line 573) |
| 68 | +const getObject = (ref: PdfRef): PdfObject | null => { |
| 69 | + // ... entirely synchronous: xref lookup, parsing, decryption |
| 70 | + return obj; |
| 71 | +}; |
| 72 | +``` |
| 73 | + |
| 74 | +The async wrapper exists for historical reasons (possibly anticipating streaming/lazy file I/O), but all current operations are synchronous in-memory operations. |
| 75 | + |
| 76 | +### Call Site Count |
| 77 | + |
| 78 | +- ~60 `await ctx.resolve(...)` calls across 15 source files |
| 79 | +- ~96 `instanceof PdfRef` checks that often precede resolution |
| 80 | +- Many methods are async solely because they call `resolve()` |
| 81 | + |
| 82 | +## Proposed Changes |
| 83 | + |
| 84 | +### 1. Change `ObjectResolver` Type to Sync |
| 85 | + |
| 86 | +```typescript |
| 87 | +// src/document/object-registry.ts |
| 88 | +export type ObjectResolver = (ref: PdfRef) => PdfObject | null; // was Promise<...> |
| 89 | +``` |
| 90 | + |
| 91 | +### 2. Change `ObjectRegistry.resolve()` to Sync |
| 92 | + |
| 93 | +```typescript |
| 94 | +// src/document/object-registry.ts |
| 95 | +resolve(ref: PdfRef): PdfObject | null { // was async, returned Promise |
| 96 | + const existing = this.getObject(ref); |
| 97 | + if (existing !== null) return existing; |
| 98 | + |
| 99 | + if (this.resolver) { |
| 100 | + const obj = this.resolver(ref); // sync call |
| 101 | + if (obj !== null) this.addLoaded(ref, obj); |
| 102 | + return obj; |
| 103 | + } |
| 104 | + return null; |
| 105 | +} |
| 106 | +``` |
| 107 | + |
| 108 | +### 3. Update `PDFContext.resolve()` to Sync |
| 109 | + |
| 110 | +```typescript |
| 111 | +// src/api/pdf-context.ts |
| 112 | +resolve(ref: PdfRef): PdfObject | null { // was async |
| 113 | + return this.registry.resolve(ref); |
| 114 | +} |
| 115 | +``` |
| 116 | + |
| 117 | +### 4. Remove Unnecessary `await` from Call Sites |
| 118 | + |
| 119 | +Transform ~60 call sites from: |
| 120 | + |
| 121 | +```typescript |
| 122 | +const resolved = await ctx.resolve(ref); |
| 123 | +``` |
| 124 | + |
| 125 | +To: |
| 126 | + |
| 127 | +```typescript |
| 128 | +const resolved = ctx.resolve(ref); |
| 129 | +``` |
| 130 | + |
| 131 | +### 5. Remove `async` from Methods That Only Awaited `resolve()` |
| 132 | + |
| 133 | +Many methods are async only because they called `resolve()`. These can become sync: |
| 134 | + |
| 135 | +```typescript |
| 136 | +// Before |
| 137 | +async getResources(): Promise<PdfDict | null> { |
| 138 | + const resources = await this.ctx.resolve(this.dict.getRef("Resources")); |
| 139 | + return resources instanceof PdfDict ? resources : null; |
| 140 | +} |
| 141 | + |
| 142 | +// After |
| 143 | +getResources(): PdfDict | null { |
| 144 | + const resources = this.ctx.resolve(this.dict.getRef("Resources")); |
| 145 | + return resources instanceof PdfDict ? resources : null; |
| 146 | +} |
| 147 | +``` |
| 148 | + |
| 149 | +### 6. Update `NameTree` Resolver Type |
| 150 | + |
| 151 | +```typescript |
| 152 | +// src/document/name-tree.ts |
| 153 | +export type Resolver = (ref: PdfRef) => PdfObject | null; // was Promise<...> |
| 154 | +``` |
| 155 | + |
| 156 | +### 7. Remove Unnecessary `await` from Filter Calls |
| 157 | + |
| 158 | +While filters are already sync, many call sites use `await`: |
| 159 | + |
| 160 | +```typescript |
| 161 | +// Before (works but misleading) |
| 162 | +const decoded = await stream.getDecodedData(); |
| 163 | + |
| 164 | +// After (explicit about sync nature) |
| 165 | +const decoded = stream.getDecodedData(); |
| 166 | +``` |
| 167 | + |
| 168 | +## Files to Modify |
| 169 | + |
| 170 | +### Core Infrastructure |
| 171 | + |
| 172 | +| File | Changes | |
| 173 | +| --------------------------------- | ----------------------------------------------------------- | |
| 174 | +| `src/document/object-registry.ts` | Change `ObjectResolver` type and `resolve()` method to sync | |
| 175 | +| `src/api/pdf-context.ts` | Change `resolve()` to sync | |
| 176 | +| `src/document/name-tree.ts` | Change `Resolver` type and methods to sync | |
| 177 | + |
| 178 | +### API Layer (~15 files) |
| 179 | + |
| 180 | +| File | Estimated Changes | |
| 181 | +| ---------------------------- | --------------------------------------------------- | |
| 182 | +| `src/api/pdf.ts` | ~10 await removals, several methods can become sync | |
| 183 | +| `src/api/pdf-page.ts` | ~15 await removals, most methods can become sync | |
| 184 | +| `src/api/pdf-catalog.ts` | ~5 await removals | |
| 185 | +| `src/api/pdf-attachments.ts` | ~5 await removals | |
| 186 | +| `src/api/pdf-context.ts` | ~2 changes | |
| 187 | + |
| 188 | +### Document Layer |
| 189 | + |
| 190 | +| File | Estimated Changes | |
| 191 | +| ----------------------------------------- | ----------------- | |
| 192 | +| `src/document/forms/acro-form.ts` | ~8 await removals | |
| 193 | +| `src/document/forms/field-tree.ts` | ~5 await removals | |
| 194 | +| `src/document/forms/fields/base.ts` | ~3 await removals | |
| 195 | +| `src/document/forms/form-flattener.ts` | ~5 await removals | |
| 196 | +| `src/document/forms/widget-annotation.ts` | ~3 await removals | |
| 197 | +| `src/document/object-copier.ts` | ~5 await removals | |
| 198 | + |
| 199 | +### Other |
| 200 | + |
| 201 | +| File | Estimated Changes | |
| 202 | +| ----------------------------------- | ----------------- | |
| 203 | +| `src/layers/layers.ts` | ~6 await removals | |
| 204 | +| `src/annotations/*.ts` | ~5 await removals | |
| 205 | +| `src/signatures/ltv/dss-builder.ts` | ~3 await removals | |
| 206 | + |
| 207 | +## Migration Strategy |
| 208 | + |
| 209 | +### Phase 1: Core Changes |
| 210 | + |
| 211 | +1. Update `ObjectResolver` type to sync |
| 212 | +2. Update `ObjectRegistry.resolve()` to sync |
| 213 | +3. Update `PDFContext.resolve()` to sync |
| 214 | +4. Update `NameTree` resolver |
| 215 | + |
| 216 | +### Phase 2: Call Site Updates (can be parallelized) |
| 217 | + |
| 218 | +For each file: |
| 219 | + |
| 220 | +1. Remove `await` from `resolve()` calls |
| 221 | +2. Remove `await` from filter calls (optional but good cleanup) |
| 222 | +3. Check if method can become sync (no other awaits) |
| 223 | +4. If yes, change signature and update callers |
| 224 | + |
| 225 | +### Phase 3: Test Updates |
| 226 | + |
| 227 | +1. Update tests to use sync patterns |
| 228 | +2. Remove unnecessary `await` in test code |
| 229 | + |
| 230 | +## Benefits |
| 231 | + |
| 232 | +1. **Simpler code patterns** - No async/await gymnastics for simple type checks |
| 233 | +2. **Better TypeScript inference** - Sync narrowing works immediately |
| 234 | +3. **Performance** - Eliminates microtask queue overhead from Promise wrapping |
| 235 | +4. **Consistency** - Filters and resolution both sync; easier mental model |
| 236 | +5. **Enables sync APIs** - Can offer sync versions of common operations |
| 237 | + |
| 238 | +## Risks and Mitigations |
| 239 | + |
| 240 | +### Risk: Future Streaming Support |
| 241 | + |
| 242 | +If we later want streaming/lazy file I/O, resolution would need to be async. |
| 243 | + |
| 244 | +**Mitigation**: We can re-introduce async at the `PDF.load()` boundary. The resolver callback could do eager loading of the entire xref at load time, keeping `resolve()` sync. Alternatively, provide both sync and async APIs. |
| 245 | + |
| 246 | +### Risk: Large Refactor |
| 247 | + |
| 248 | +~60+ call sites need updating. |
| 249 | + |
| 250 | +**Mitigation**: This is mechanical and can be done systematically. TypeScript will catch any missed sites (type errors from awaiting non-Promises or assigning `PdfObject | null` where `Promise<...>` was expected). |
| 251 | + |
| 252 | +## Non-Goals |
| 253 | + |
| 254 | +- Changing the public `PDF.load()` API (remains async for file I/O) |
| 255 | +- Changing the save/write pipeline (has legitimate async needs) |
| 256 | +- Making everything sync (some operations genuinely need async) |
| 257 | + |
| 258 | +## Success Criteria |
| 259 | + |
| 260 | +1. All tests pass |
| 261 | +2. No `async` methods that only awaited `resolve()` |
| 262 | +3. TypeScript compiles without errors |
| 263 | +4. `resolve()` calls no longer use `await` |
0 commit comments