docs: comprehensive future enhancement plan with GitHub issue templates

jdrhyne · jdrhyne · commit 063770658224 · 2025-06-24T22:27:14.000-04:00
Created detailed enhancement roadmap based on OpenAPI v1.9.0 analysis:

📋 Enhancement Plan:
- 13 proposed enhancements across 4 priority levels
- Detailed implementation specifications
- Testing requirements and use cases
- Recommended 4-phase implementation timeline

📁 GitHub Issue Templates:
- Individual issue template for each enhancement
- Consistent format with implementation details
- OpenAPI references and code examples
- Priority levels and labels

🎯 Goals:
- Increase API coverage from ~30% to ~80%
- Maintain backward compatibility
- Add most requested features
- Follow OpenAPI specification precisely

This provides a clear roadmap for community contributions and
systematic feature development.
diff --git a/github_issues/06_convert_to_pdfa.md b/github_issues/06_convert_to_pdfa.md
@@ -0,0 +1,76 @@
+# Feature: Convert to PDF/A Method
+
+## Summary
+Implement `convert_to_pdfa()` to convert PDFs to PDF/A archival format for long-term preservation and compliance.
+
+## Proposed Implementation
+```python
+def convert_to_pdfa(
+    self,
+    input_file: FileInput,
+    output_path: Optional[str] = None,
+    conformance: Literal["pdfa-1a", "pdfa-1b", "pdfa-2a", "pdfa-2u", "pdfa-2b", "pdfa-3a", "pdfa-3u"] = "pdfa-2b",
+    vectorization: bool = True,
+    rasterization: bool = True,
+) -> Optional[bytes]:
+```
+
+## Benefits
+- Long-term archival compliance (ISO 19005)
+- Legal and regulatory requirement fulfillment
+- Guaranteed font embedding
+- Self-contained documents
+- Multiple conformance levels for different needs
+
+## Implementation Details
+- Use Build API with output type: `pdfa`
+- Support all PDF/A conformance levels
+- Provide sensible defaults (PDF/A-2b most common)
+- Handle vectorization/rasterization options
+- Clear error messages for conversion failures
+
+## Testing Requirements
+- [ ] Test each conformance level
+- [ ] Test vectorization on/off
+- [ ] Test rasterization on/off
+- [ ] Test with complex PDFs (forms, multimedia)
+- [ ] Verify output is valid PDF/A
+- [ ] Test that conversion failures are handled gracefully
+
+## OpenAPI Reference
+- Output type: `pdfa`
+- Conformance levels: pdfa-1a, pdfa-1b, pdfa-2a, pdfa-2u, pdfa-2b, pdfa-3a, pdfa-3u
+- Options: vectorization (default: true), rasterization (default: true)
+
+## Use Case Example
+```python
+# Convert for long-term archival (Level B: least strict, preserves visual appearance)
+archived_pdf = client.convert_to_pdfa(
+    "document.pdf",
+    conformance="pdfa-2b"
+)
+
+# Convert for accessibility compliance (strictest)
+accessible_pdf = client.convert_to_pdfa(
+    "document.pdf",
+    conformance="pdfa-2a",
+    output_path="archived_accessible.pdf"
+)
+```
+
+## Conformance Level Guide
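+- **PDF/A-1a**: Based on PDF 1.4; Level A conformance, requires tagged structure for accessibility
+- **PDF/A-1b**: Based on PDF 1.4; Level B conformance, preserves visual appearance only
+- **PDF/A-2a/2b**: Based on PDF 1.7; allows more features (e.g., transparency, JPEG 2000 compression)
+- **PDF/A-2u**: Level B plus a required Unicode mapping for all text
+- **PDF/A-3a/3u**: Same as PDF/A-2, but allows embedded files of any type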
+
+## Priority
+🟡 Priority 3 - Format conversion method
+
+## Labels
+- feature
+- conversion
+- compliance
+- archival
+- openapi-compliance
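
Note on `06_convert_to_pdfa.md`: the "Implementation Details" section maps the method's arguments onto a Build API output of type `pdfa`. A minimal sketch of that mapping follows. The helper name `build_pdfa_instructions` is hypothetical, and the `parts`/`output` payload layout and the part name `"file"` are assumptions that should be checked against the OpenAPI spec before use.

```python
from typing import Any, Dict

# Conformance levels listed in the template's "OpenAPI Reference" section.
PDFA_CONFORMANCE_LEVELS = (
    "pdfa-1a", "pdfa-1b", "pdfa-2a", "pdfa-2u", "pdfa-2b", "pdfa-3a", "pdfa-3u",
)


def build_pdfa_instructions(
    conformance: str = "pdfa-2b",
    vectorization: bool = True,
    rasterization: bool = True,
) -> Dict[str, Any]:
    """Build an instructions payload for PDF/A output (hypothetical helper)."""
    if conformance not in PDFA_CONFORMANCE_LEVELS:
        # Fail early with a clear message instead of waiting for an API error.
        raise ValueError(
            f"Unsupported conformance level {conformance!r}; "
            f"expected one of: {', '.join(PDFA_CONFORMANCE_LEVELS)}"
        )
    return {
        # Assumption: the uploaded input is referenced by the part name "file".
        "parts": [{"file": "file"}],
        "output": {
            "type": "pdfa",
            "conformance": conformance,
            "vectorization": vectorization,
            "rasterization": rasterization,
        },
    }
```

Validating the conformance level on the client side is one way to meet the template's "clear error messages" requirement without a round trip to the API.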
diff --git a/github_issues/07_convert_to_images.md b/github_issues/07_convert_to_images.md
@@ -0,0 +1,88 @@
+# Feature: Convert PDF to Images Method
+
+## Summary
+Implement `convert_to_images()` to extract PDF pages as image files in various formats.
+
+## Proposed Implementation
+```python
+def convert_to_images(
+    self,
+    input_file: FileInput,
+    output_dir: Optional[str] = None,  # Directory for multiple images
+    format: Literal["png", "jpeg", "webp"] = "png",
+    pages: Optional[List[int]] = None,  # None means all pages
+    width: Optional[int] = None,
+    height: Optional[int] = None,
+    dpi: int = 150,
+) -> Optional[List[bytes]]:  # Returns a list of image bytes, or None if saved to output_dir
+```
+
+## Benefits
+- Generate thumbnails and previews
+- Web-friendly image formats
+- Flexible resolution control
+- Selective page extraction
+- Batch image generation
+
+## Implementation Details
+- Use Build API with output type: `image`
+- Support PNG, JPEG, and WebP formats
+- Handle multi-page extraction (returns list)
+- Automatic file naming when saving to directory
+- Resolution control via width/height/DPI
+
+## Testing Requirements
+- [ ] Test PNG format extraction
+- [ ] Test JPEG format extraction
+- [ ] Test WebP format extraction
+- [ ] Test single page extraction
+- [ ] Test multi-page extraction
+- [ ] Test resolution options (width, height, DPI)
+- [ ] Test file saving vs bytes return
+
+## OpenAPI Reference
+- Output type: `image`
+- Formats: png, jpeg, jpg, webp
+- Parameters: width, height, dpi, pages (range)
+
+## Use Case Example
+```python
+# Extract all pages as PNG thumbnails
+thumbnails = client.convert_to_images(
+    "document.pdf",
+    format="png",
+    width=200  # Fixed width, height auto-calculated
+)
+
+# Extract specific pages as high-res JPEGs
+client.convert_to_images(
+    "document.pdf",
+    output_dir="./page_images",
+    format="jpeg",
+    pages=[0, 1, 2],  # First 3 pages
+    dpi=300  # High resolution
+)
+
+# Generate web-optimized previews
+web_images = client.convert_to_images(
+    "document.pdf",
+    format="webp",
+    width=800,
+    height=600
+)
+```
+
+## File Naming Convention
+When saving to directory:
+- Single page: `{original_name}.{format}`
+- Multiple pages: `{original_name}_page_{n}.{format}`
+
+## Priority
+🟡 Priority 3 - Format conversion method
+
+## Labels
+- feature
+- conversion
+- images
+- thumbnails
+- openapi-compliance
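
Note on `07_convert_to_images.md`: the "File Naming Convention" section can be sketched as a small path helper. The function name `image_output_paths` is hypothetical, and the template does not say whether `{n}` is zero- or one-based, so this sketch simply reuses the page numbers as passed in `pages`.

```python
from pathlib import Path
from typing import List


def image_output_paths(
    input_path: str,
    output_dir: str,
    fmt: str,
    page_numbers: List[int],
) -> List[Path]:
    """Return one output path per page, following the template's naming rule."""
    stem = Path(input_path).stem  # "docs/report.pdf" -> "report"
    out = Path(output_dir)
    if len(page_numbers) == 1:
        # Single page: {original_name}.{format}
        return [out / f"{stem}.{fmt}"]
    # Multiple pages: {original_name}_page_{n}.{format}
    # Assumption: n is the page number exactly as the caller supplied it.
    return [out / f"{stem}_page_{n}.{fmt}" for n in page_numbers]
```

Keeping the naming logic separate from the API call would make the "file saving vs bytes return" test straightforward to write without network access.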
diff --git a/github_issues/08_extract_content.md b/github_issues/08_extract_content.md
@@ -0,0 +1,107 @@
+# Feature: Extract Content as JSON Method
+
+## Summary
+Implement `extract_content()` to extract text, tables, and metadata from PDFs as structured JSON data.
+
+## Proposed Implementation
+```python
+def extract_content(
+    self,
+    input_file: FileInput,
+    extract_text: bool = True,
+    extract_tables: bool = True,
+    extract_metadata: bool = True,
+    extract_structure: bool = False,
+    language: Union[str, List[str]] = "english",
+    output_path: Optional[str] = None,
+) -> Optional[Dict[str, Any]]:
+```
+
+## Benefits
+- Structured data extraction for analysis
+- Table detection and extraction
+- Metadata parsing
+- Search indexing support
+- Machine learning data preparation
+- Multi-language text extraction
+
+## Implementation Details
+- Use Build API with output type: `json-content`
+- Map parameters to OpenAPI options:
+  - `plainText`: extract_text
+  - `tables`: extract_tables
+  - `structuredText`: extract_structure
+- Include document metadata in the response when `extract_metadata` is true
+- Support OCR for scanned documents
+
+## Testing Requirements
+- [ ] Test plain text extraction
+- [ ] Test table extraction
+- [ ] Test metadata extraction
+- [ ] Test structured text extraction
+- [ ] Test with multi-language documents
+- [ ] Test with scanned documents (OCR)
+- [ ] Validate JSON structure
+
+## OpenAPI Reference
+- Output type: `json-content`
+- Options: plainText, structuredText, tables, keyValuePairs
+- Language support for OCR
+- Returns structured JSON
+
+## Use Case Example
+```python
+# Extract everything from a document
+content = client.extract_content(
+    "report.pdf",
+    extract_text=True,
+    extract_tables=True,
+    extract_metadata=True
+)
+
+# Access extracted data
+print(content["metadata"]["title"])
+print(content["text"])
+for table in content["tables"]:
+    print(table["data"])
+
+# Extract for multilingual search indexing
+search_data = client.extract_content(
+    "multilingual.pdf",
+    language=["english", "spanish", "french"],
+    extract_structure=True
+)
+```
+
+## Expected JSON Structure
+```json
+{
+  "metadata": {
+    "title": "Document Title",
+    "author": "Author Name",
+    "created": "2024-01-01T00:00:00Z",
+    "pages": 10
+  },
+  "text": "Extracted plain text...",
+  "structured_text": {
+    "paragraphs": [...],
+    "headings": [...]
+  },
+  "tables": [
+    {
+      "page": 1,
+      "data": [["Header1", "Header2"], ["Row1Col1", "Row1Col2"]]
+    }
+  ]
+}
+```
+
+## Priority
+🟡 Priority 3 - Format conversion method
+
+## Labels
+- feature
+- extraction
+- data-processing
+- json
+- openapi-compliance
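
Note on `08_extract_content.md`: the parameter mapping in "Implementation Details" could look roughly like the sketch below. The helper name `build_json_content_output` is hypothetical. The `plainText`, `tables`, and `structuredText` keys come from the template, while the `language` key is an assumption. The template lists no OpenAPI option for `extract_metadata`, so it is left out here.

```python
from typing import Any, Dict, List, Union


def build_json_content_output(
    extract_text: bool = True,
    extract_tables: bool = True,
    extract_structure: bool = False,
    language: Union[str, List[str]] = "english",
) -> Dict[str, Any]:
    """Translate method arguments into a `json-content` output definition."""
    return {
        "type": "json-content",
        "plainText": extract_text,
        "tables": extract_tables,
        "structuredText": extract_structure,
        # Assumption: the OCR language option is passed under the key "language".
        "language": language,
    }
```

For example, `build_json_content_output(extract_structure=True)` would request plain text, tables, and structured text in a single call.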
diff --git a/github_issues/09_ai_redact.md b/github_issues/09_ai_redact.md
@@ -0,0 +1,84 @@
+# Feature: AI-Powered Redaction Method
+
+## Summary
+Implement `ai_redact()` to use Nutrient's AI capabilities for automatic detection and redaction of sensitive information.
+
+## Proposed Implementation
+```python
+def ai_redact(
+    self,
+    input_file: FileInput,
+    output_path: Optional[str] = None,
+    sensitivity_level: Literal["low", "medium", "high"] = "medium",
+    entity_types: Optional[List[str]] = None,  # ["email", "ssn", "phone", etc.]
+    review_mode: bool = False,  # Create redactions without applying
+    confidence_threshold: float = 0.8,
+) -> Optional[bytes]:
+```
+
+## Benefits
+- Automated redaction to support GDPR/CCPA compliance
+- Reduce manual review time by 90%
+- Consistent redaction across documents
+- Multiple entity type detection
+- Configurable sensitivity levels
+- Review mode for human verification
+
+## Implementation Details
+- Use dedicated `/ai/redact` endpoint
+- Distinct from the rule-based `create_redactions` methods
+- Support confidence thresholds
+- Allow entity type filtering
+- Option to review before applying
+
+## Testing Requirements
+- [ ] Test sensitivity levels (low/medium/high)
+- [ ] Test specific entity detection
+- [ ] Test review mode
+- [ ] Test confidence thresholds
+- [ ] Compare with manual redaction
+- [ ] Test on various document types
+
+## OpenAPI Reference
+- Endpoint: `/ai/redact`
+- Separate from Build API
+- AI-powered detection
+- Returns processed document
+
+## Use Case Example
+```python
+# Automatic GDPR compliance
+gdpr_safe = client.ai_redact(
+    "customer_data.pdf",
+    entity_types=["email", "phone", "name", "address"],
+    sensitivity_level="high"
+)
+
+# Review before applying
+review_pdf = client.ai_redact(
+    "contract.pdf",
+    entity_types=["ssn", "bank_account", "credit_card"],
+    review_mode=True,  # Creates redaction annotations only
+    confidence_threshold=0.9
+)
+
+# Then manually review and apply
+final = client.apply_redactions(review_pdf)
+```
+
+## Supported Entity Types
+- Personal: name, email, phone, address
+- Financial: ssn, credit_card, bank_account, routing_number
+- Medical: medical_record, diagnosis, prescription
+- Custom: (API may support additional types)
+
+## Priority
+🟠 Priority 4 - Advanced feature
+
+## Labels
+- feature
+- ai
+- redaction
+- compliance
+- gdpr
+- openapi-compliance
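
Note on `09_ai_redact.md`: because the template does not describe the `/ai/redact` request format, the sketch below covers only client-side argument checks for the proposed signature. The helper name `validate_ai_redact_args` is hypothetical, and treating `confidence_threshold` as a value between 0.0 and 1.0 is an assumption based on the default of `0.8`.

```python
from typing import List, Optional

# Sensitivity levels from the proposed ai_redact() signature.
SENSITIVITY_LEVELS = ("low", "medium", "high")


def validate_ai_redact_args(
    sensitivity_level: str = "medium",
    entity_types: Optional[List[str]] = None,
    confidence_threshold: float = 0.8,
) -> None:
    """Raise ValueError if the arguments cannot be valid (hypothetical helper)."""
    if sensitivity_level not in SENSITIVITY_LEVELS:
        raise ValueError(
            f"sensitivity_level must be one of {SENSITIVITY_LEVELS}, "
            f"got {sensitivity_level!r}"
        )
    # Assumption: the threshold is a probability-like score in [0.0, 1.0].
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError(
            f"confidence_threshold must be between 0.0 and 1.0, "
            f"got {confidence_threshold}"
        )
    # None means "no filtering"; an empty list is most likely a caller mistake.
    if entity_types is not None and len(entity_types) == 0:
        raise ValueError("entity_types must be None or a non-empty list")
```

Checks like these would let the "Test sensitivity levels" and "Test confidence thresholds" items run partly as fast unit tests, separate from the integration tests that call the API.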
diff --git a/github_issues/10_digital_signature.md b/github_issues/10_digital_signature.md