PSPDFKit
diff --git a/github_issues/06_extract_pages.md b/github_issues/06_extract_pages.md
Lines changed: 78 additions & 0 deletions
diff --git a/github_issues/06_convert_to_pdfa.md b/github_issues/07_convert_to_pdfa.md
rename from github_issues/06_convert_to_pdfa.md
rename to github_issues/07_convert_to_pdfa.md
diff --git a/github_issues/07_convert_to_images.md b/github_issues/08_convert_to_images.md
rename from github_issues/07_convert_to_images.md
rename to github_issues/08_convert_to_images.md
diff --git a/github_issues/08_extract_content.md b/github_issues/09_extract_content.md
rename from github_issues/08_extract_content.md
rename to github_issues/09_extract_content.md
diff --git a/github_issues/10_convert_to_office.md b/github_issues/10_convert_to_office.md
Lines changed: 95 additions & 0 deletions
diff --git a/github_issues/09_ai_redact.md b/github_issues/11_ai_redact.md
rename from github_issues/09_ai_redact.md
rename to github_issues/11_ai_redact.md
diff --git a/github_issues/10_digital_signature.md b/github_issues/12_digital_signature.md
rename from github_issues/10_digital_signature.md
rename to github_issues/12_digital_signature.md
diff --git a/github_issues/13_batch_processing.md b/github_issues/13_batch_processing.md
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,78 @@
+# Feature: Extract Page Range Method
+
+## Summary
+Implement `extract_pages()` as a simpler alternative to `split_pdf()` for extracting a single contiguous range of pages.
+
+## Proposed Implementation
+```python
+def extract_pages(
+    self,
+    input_file: FileInput,
+    start_page: int,
+    end_page: Optional[int] = None,  # Exclusive; None means through the last page
+    output_path: Optional[str] = None,
+) -> Optional[bytes]:
+```
+
+## Benefits
+- Simpler API than `split_pdf` for the common single-range case
+- More intuitive when extracting one range
+- Clear intent at the call site
+- Smaller responses for large documents, since only the requested pages are returned
+
+## Implementation Details
+- Use the Build API with a single FilePart and a page range
+- Support negative indexing, Python-style (`start_page=-1` selects the last page)
+- Treat `end_page=None` as "through the last page"
+- Raise clear errors for invalid ranges
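A minimal sketch of the range handling described above. It assumes the page count is already known (the issue does not say how it would be obtained) and assumes the Build API's `pages.end` is inclusive, so the method's end-exclusive `end_page` is converted to `end_page - 1`. `normalize_page_range` and `build_extract_instructions` are hypothetical helpers, and the `{"parts": [{"file": ..., "pages": ...}]}` payload shape and the `"file"` part name are assumptions rather than a confirmed schema.

```python
from typing import Any, Dict, Optional, Tuple


def normalize_page_range(
    start_page: int, end_page: Optional[int], page_count: int
) -> Tuple[int, int]:
    """Resolve Python-style indexes (negative allowed, end exclusive, None = to end)
    into a zero-based, inclusive (start, end) pair, or raise ValueError."""
    if page_count <= 0:
        raise ValueError("document has no pages")
    # Negative indexes count back from the end, as in Python slicing.
    start = start_page + page_count if start_page < 0 else start_page
    stop = page_count if end_page is None else end_page
    if stop < 0:
        stop += page_count
    if not 0 <= start < page_count:
        raise ValueError(f"start_page={start_page} is out of bounds for {page_count} pages")
    if not 0 <= stop <= page_count:
        raise ValueError(f"end_page={end_page} is out of bounds for {page_count} pages")
    if stop <= start:
        raise ValueError(f"empty range: start_page={start_page}, end_page={end_page}")
    return start, stop - 1  # exclusive stop -> inclusive end (assumed API convention)


def build_extract_instructions(
    start_page: int, end_page: Optional[int], page_count: int
) -> Dict[str, Any]:
    """Build an assumed Build API payload with a single FilePart and page range."""
    start, end = normalize_page_range(start_page, end_page, page_count)
    return {"parts": [{"file": "file", "pages": {"start": start, "end": end}}]}
```

With a 100-page document, `start_page=-1` resolves to `(99, 99)`, and `start_page=50` with `end_page=None` resolves to `(50, 99)`.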
+
+## Testing Requirements
+- [ ] Test single page extraction
+- [ ] Test range extraction
+- [ ] Test "to end" extraction (end_page=None)
+- [ ] Test negative page indexes
+- [ ] Test invalid ranges (start_page >= end_page)
+- [ ] Test out-of-bounds page indexes
+
+## OpenAPI Reference
+- Uses a FilePart with the `pages` parameter
+- Page ranges use the `start`/`end` format
+- Build API request with a single part
+
+## Use Case Example
+```python
+# Extract the first 10 pages (indexes 0-9)
+first_chapter = client.extract_pages(
+    "book.pdf",
+    start_page=0,
+    end_page=10
+)
+
+# Extract from page index 50 (the 51st page) through the last page
+appendix = client.extract_pages(
+    "book.pdf",
+    start_page=50
+    # end_page defaults to None, meaning through the last page
+)
+
+# Extract a single page (the cover)
+cover = client.extract_pages(
+    "book.pdf",
+    start_page=0,
+    end_page=1
+)
+```
+
+## Relationship to split_pdf
+- `split_pdf`: Multiple ranges, multiple outputs
+- `extract_pages`: Single range, single output
+- This method is essentially `split_pdf` with a single range
+
+## Priority
+🟢 Priority 2 - Core missing method
+
+## Labels
+- feature
+- pdf-manipulation
+- pages
+- openapi-compliance
@@ -0,0 +1,95 @@
+# Feature: Convert to Office Formats Method
+
+## Summary
+Implement `convert_to_office()` to export PDFs to Microsoft Office formats (DOCX, XLSX, PPTX).
+
+## Proposed Implementation
+```python
+def convert_to_office(
+    self,
+    input_file: FileInput,
+    output_path: Optional[str] = None,
+    format: Literal["docx", "xlsx", "pptx"] = "docx",
+    ocr_language: Optional[Union[str, List[str]]] = None,  # Auto-OCR if needed
+) -> Optional[bytes]:
+```
+
+## Benefits
+- Edit PDFs in familiar Office applications
+- Preserve formatting and layout where possible
+- Automatic OCR for scanned documents
+- Integration with Office 365 workflows
+- Accessibility improvements
+
+## Implementation Details
+- Use the Build API with output type `docx`, `xlsx`, or `pptx`
+- Automatic format detection based on content
+- OCR integration for scanned PDFs
+- Handle complex layouts gracefully
+
+## Testing Requirements
+- [ ] Test DOCX conversion (text documents)
+- [ ] Test XLSX conversion (tables/data)
+- [ ] Test PPTX conversion (presentations)
+- [ ] Test with scanned documents (OCR)
+- [ ] Test formatting preservation
+- [ ] Test with complex layouts
+- [ ] Test with forms and tables
+
+## OpenAPI Reference
+- Output types: `docx`, `xlsx`, `pptx`
+- Part of BuildOutput options
+- Supports OCR language parameter
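A minimal sketch of how `convert_to_office()` might assemble its Build API instructions, assuming an `output.type` field for the target format and an `ocr` action that accepts a list of languages. `build_office_instructions` is a hypothetical helper, and the payload keys (including the `"file"` part name) are assumptions rather than a confirmed schema. OCR is added only when `ocr_language` is given; the automatic detection of scanned documents described in the issue is not modeled here.

```python
from typing import Any, Dict, List, Optional, Union

OFFICE_FORMATS = ("docx", "xlsx", "pptx")


def build_office_instructions(
    format: str = "docx",  # mirrors the proposed parameter name (shadows the builtin)
    ocr_language: Optional[Union[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Build a hypothetical Build API payload for PDF-to-Office conversion."""
    if format not in OFFICE_FORMATS:
        raise ValueError(f"format must be one of {OFFICE_FORMATS}, got {format!r}")
    instructions: Dict[str, Any] = {
        "parts": [{"file": "file"}],
        "output": {"type": format},
    }
    if ocr_language is not None:
        # Accept a single language or a list, as in the proposed signature.
        languages = [ocr_language] if isinstance(ocr_language, str) else list(ocr_language)
        if not languages:
            raise ValueError("ocr_language must not be empty")
        instructions["actions"] = [{"type": "ocr", "language": languages}]
    return instructions
```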
+
+## Use Case Example
+```python
+# Convert PDF to Word for editing
+word_doc = client.convert_to_office(
+    "report.pdf",
+    format="docx",
+    output_path="report.docx"
+)
+
+# Convert scanned document with OCR
+editable_doc = client.convert_to_office(
+    "scanned_contract.pdf",
+    format="docx",
+    ocr_language=["english", "spanish"]
+)
+
+# Convert data PDF to Excel
+spreadsheet = client.convert_to_office(
+    "financial_data.pdf",
+    format="xlsx",
+    output_path="data.xlsx"
+)
+
+# Convert to PowerPoint
+presentation = client.convert_to_office(
+    "slides.pdf",
+    format="pptx"
+)
+```
+
+## Format Selection Guide
+- **DOCX**: Text-heavy documents, reports, contracts
+- **XLSX**: Data tables, financial reports, lists
+- **PPTX**: Presentations, slide decks
+
+## Known Limitations
+- Complex layouts may not convert perfectly
+- Some PDF features have no Office equivalent
+- Font substitution may occur
+- Interactive elements may be lost
+
+## Priority
+🟡 Priority 3 - Format conversion method
+
+## Labels
+- feature
+- conversion
+- office
+- docx
+- xlsx
+- pptx
+- openapi-compliance
@@ -0,0 +1,139 @@
+# Feature: Batch Processing Method
+
+## Summary
+Implement `batch_process()` for efficient processing of multiple files with the same operations.
+
+## Proposed Implementation
+```python
+def batch_process(
+    self,
+    input_files: List[FileInput],
+    operations: List[Dict[str, Any]],  # List of operations to apply
+    output_dir: Optional[str] = None,
+    output_format: str = "{name}_{index}{ext}",  # Output filename pattern
+    parallel: bool = True,
+    max_workers: int = 4,
+    continue_on_error: bool = True,
+    progress_callback: Optional[Callable[[int, int], None]] = None,  # Called with (current, total)
+) -> BatchResult:
+```
+
+## Benefits
+- Process hundreds of files efficiently
+- Parallel processing for performance
+- Consistent operations across files
+- Progress tracking and reporting
+- Error recovery and partial results
+- Memory-efficient: outputs can be written to `output_dir` instead of held in memory
+
+## Implementation Details
+- Client-side enhancement (not part of the OpenAPI spec)
+- Use `ThreadPoolExecutor` for parallel processing
+- Implement retry logic for transient failures
+- Write results to `output_dir` as they complete instead of holding every output in memory
+- Provide detailed per-file error reporting
+
+## BatchResult Structure
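+```python
+from dataclasses import dataclass
+from typing import List, Tuple, Union
+
+
+@dataclass
+class BatchResult:
+    successful: List[Tuple[str, Union[bytes, str]]]  # (input_file, output bytes or output path)
+    failed: List[Tuple[str, Exception]]  # (input_file, error)
+    total_processed: int
+    processing_time: float
+
+    @property
+    def success_rate(self) -> float:
+        """Percentage (0-100) of processed files that succeeded; 0.0 for an empty batch."""
+        if self.total_processed == 0:
+            return 0.0
+        return len(self.successful) / self.total_processed * 100
+```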
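A minimal sketch of the core loop, built on the standard library's `concurrent.futures.ThreadPoolExecutor` and `as_completed`. `run_batch` and its `process_one` callable (standing in for "apply every operation to one file") are hypothetical, and the snippet re-declares a trimmed `BatchResult` without `success_rate` so it runs on its own. Results are recorded only on the calling thread, so no lock is needed; in parallel mode, `successful` is in completion order rather than input order. With `continue_on_error=False`, it cancels files that have not started yet and re-raises the first error. Retries and writing to `output_dir` are left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple, Union

Output = Union[bytes, str]


@dataclass
class BatchResult:
    # Trimmed copy of the proposed structure (success_rate omitted).
    successful: List[Tuple[str, Output]] = field(default_factory=list)
    failed: List[Tuple[str, Exception]] = field(default_factory=list)
    total_processed: int = 0
    processing_time: float = 0.0


def run_batch(
    input_files: List[str],
    process_one: Callable[[str], Output],  # hypothetical: runs all operations on one file
    parallel: bool = True,
    max_workers: int = 4,
    continue_on_error: bool = True,
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> BatchResult:
    result = BatchResult()
    total = len(input_files)
    started = time.perf_counter()

    def record(path: str, get_output: Callable[[], Output]) -> None:
        # get_output runs the work (sequential mode) or fetches a future's result.
        try:
            output = get_output()
        except Exception as exc:
            if not continue_on_error:
                raise
            result.failed.append((path, exc))
        else:
            result.successful.append((path, output))
        result.total_processed += 1
        if progress_callback is not None:
            progress_callback(result.total_processed, total)

    if parallel:
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = {executor.submit(process_one, path): path for path in input_files}
            try:
                for future in as_completed(futures):
                    record(futures[future], future.result)
            except Exception:
                # Stop-on-error mode: cancel files not yet started, then re-raise.
                for future in futures:
                    future.cancel()
                raise
    else:
        for path in input_files:
            record(path, lambda path=path: process_one(path))

    result.processing_time = time.perf_counter() - started
    return result
```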
+
+## Testing Requirements
+- [ ] Test sequential processing
+- [ ] Test parallel processing
+- [ ] Test error handling and recovery
+- [ ] Test progress callback
+- [ ] Test memory usage with large batches
+- [ ] Test interruption and resume
+- [ ] Test various operation combinations
+
+## Use Case Example
+```python
+import glob
+
+# Add watermark to all PDFs in directory
+files = glob.glob("documents/*.pdf")
+result = client.batch_process(
+    input_files=files,
+    operations=[
+        {"method": "watermark_pdf", "params": {"text": "CONFIDENTIAL"}}
+    ],
+    output_dir="watermarked/",
+    parallel=True,
+    max_workers=8
+)
+
+print(f"Processed {result.total_processed} files")
+print(f"Success rate: {result.success_rate:.1f}%")
+
+# OCR and flatten multiple documents
+operations = [
+    {"method": "ocr_pdf", "params": {"language": "english"}},
+    {"method": "flatten_annotations", "params": {}}
+]
+
+def progress_update(current, total):
+    print(f"Processing {current}/{total}...")
+
+result = client.batch_process(
+    input_files=["scan1.pdf", "scan2.pdf", "scan3.pdf"],
+    operations=operations,
+    output_dir="processed/",
+    progress_callback=progress_update
+)
+
+# Complex workflow with error handling
+result = client.batch_process(
+    input_files=large_file_list,  # any list of input file paths
+    operations=[
+        {"method": "rotate_pages", "params": {"degrees": 90, "page_indexes": [0]}},
+        {"method": "ocr_pdf", "params": {"language": ["english", "spanish"]}},
+        {"method": "convert_to_pdfa", "params": {"conformance": "pdfa-2b"}}
+    ],
+    continue_on_error=True,  # Don't stop on individual failures
+    output_format="processed_{name}_{index}{ext}"
+)
+
+# Review failures
+for file, error in result.failed:
+    print(f"Failed to process {file}: {error}")
+```
+
+## Operation Format
+```python
+{
+    "method": "method_name",  # Name of a client method, e.g. "watermark_pdf"
+    "params": {               # Keyword arguments passed to that method
+        "param1": "value1",
+        "param2": "value2",
+    },
+}
+```
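A minimal sketch of how the operation list could be dispatched and how the naming pattern could be expanded. It assumes each listed method accepts the previous step's output bytes as its input file and returns bytes when no `output_path` is given; that matches the `-> Optional[bytes]` signatures in the other proposals but is not confirmed for every method. `apply_operations` and `output_name` are hypothetical helpers, and treating `{index}` as the file's zero-based position in `input_files` is an assumption.

```python
from pathlib import Path
from typing import Any, Dict, List


def apply_operations(client: Any, input_file: Any, operations: List[Dict[str, Any]]) -> Any:
    """Run each operation in order, feeding each step's output into the next.

    Assumes (hypothetically) that every method takes the previous output as its
    first argument and returns bytes when no output_path is passed.
    """
    current = input_file
    for op in operations:
        name = op["method"]
        # Refuse private/dunder names so an operation list cannot reach internals.
        method = None if name.startswith("_") else getattr(client, name, None)
        if not callable(method):
            raise ValueError(f"unknown operation: {name!r}")
        current = method(current, **op.get("params", {}))
    return current


def output_name(input_path: str, index: int, pattern: str = "{name}_{index}{ext}") -> str:
    """Expand the naming pattern; index is assumed to be the zero-based position in input_files."""
    path = Path(input_path)
    return pattern.format(name=path.stem, index=index, ext=path.suffix)
```

For example, `output_name("documents/scan1.pdf", 0)` returns `"scan1_0.pdf"`, and the `"processed_{name}_{index}{ext}"` pattern from the use case example would give `"processed_scan1_0.pdf"`.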
+
+## Performance Considerations
+- Default of 4 workers balances speed against API limits
+- Automatic retry with exponential backoff
+- Large outputs streamed to disk rather than held in memory
+- Progress callback adds negligible overhead, provided the callback itself is fast
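A minimal sketch of retry with exponential backoff plus a little random jitter. Which exceptions count as transient is an assumption: the built-in `ConnectionError` and `TimeoutError` are placeholders, and a real client would more likely key off the HTTP status (for example 429 or 5xx) exposed by its HTTP library. `retry_with_backoff` is a hypothetical helper; the injectable `sleep` argument exists only to make it testable.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exception types with exponential backoff.

    The delay before retry n (1-based) is min(max_delay, base_delay * 2 ** (n - 1)),
    plus up to 10% random jitter so parallel workers do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")
```

Inside the batch loop, each file's work could be wrapped as `retry_with_backoff(lambda: process_one(path))`; non-transient errors such as a `ValueError` propagate immediately without a retry.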
+
+## Error Handling
+- Individual file failures don't stop the batch (default `continue_on_error=True`)
+- Detailed error information per file
+- Automatic retry for transient errors
+- Optional stop-on-error mode (`continue_on_error=False`)
+
+## Priority
+🟠 Priority 4 - Advanced feature
+
+## Labels
+- feature
+- performance
+- batch-processing
+- client-enhancement