Commit d3afe37

docs: comprehensive future enhancement plan with GitHub issue templates
Created detailed enhancement roadmap based on OpenAPI v1.9.0 analysis:

📋 Enhancement Plan:
- 13 proposed enhancements across 4 priority levels
- Detailed implementation specifications
- Testing requirements and use cases
- Recommended 4-phase implementation timeline

📁 GitHub Issue Templates:
- Individual issue template for each enhancement
- Consistent format with implementation details
- OpenAPI references and code examples
- Priority levels and labels

🎯 Goals:
- Increase API coverage from ~30% to ~80%
- Maintain backward compatibility
- Add most requested features
- Follow OpenAPI specification precisely

This provides a clear roadmap for community contributions and systematic feature development.
1 parent bdb654b commit d3afe37

12 files changed: 1401 additions, 0 deletions

FUTURE_ENHANCEMENTS_PLAN.md

Lines changed: 528 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Enhancement Roadmap: Nutrient DWS Python Client

## Overview
This issue tracks the comprehensive enhancement plan for the Nutrient DWS Python Client based on OpenAPI specification v1.9.0 analysis. The goal is to expand from ~30% to ~80% API coverage while maintaining our high standards for code quality and backward compatibility.

## Enhancement Categories

### 🔵 Priority 1: Enhanced Existing Methods
*Improve current methods with additional OpenAPI capabilities*

- [ ] #1 **Multi-Language OCR Support** - Support multiple languages in `ocr_pdf()`
- [ ] #2 **Image Watermark Support** - Add image watermarks to `watermark_pdf()`
- [ ] #3 **Selective Annotation Flattening** - Add annotation ID filtering to `flatten_annotations()`

### 🟢 Priority 2: Core Missing Methods
*Add commonly requested document operations*

- [ ] #4 **Create Redactions** - Implement `create_redactions()` with text/regex/preset strategies
- [ ] #5 **Import Annotations** - Implement `import_annotations()` for Instant JSON/XFDF
- [ ] #6 **Extract Page Range** - Simple `extract_pages()` method (simpler than `split_pdf()`)

### 🟡 Priority 3: Format Conversion Methods
*Enable output format flexibility*

- [ ] #7 **Convert to PDF/A** - Implement `convert_to_pdfa()` for archival compliance
- [ ] #8 **Convert to Images** - Implement `convert_to_images()` for PNG/JPEG/WebP
- [ ] #9 **Extract Content as JSON** - Implement `extract_content()` for structured data
- [ ] #10 **Convert to Office Formats** - Implement `convert_to_office()` for DOCX/XLSX/PPTX

### 🟠 Priority 4: Advanced Features
*Sophisticated document processing capabilities*

- [ ] #11 **AI-Powered Redaction** - Implement `ai_redact()` using AI entity detection
- [ ] #12 **Digital Signatures** - Implement `sign_pdf()` with visual signatures
- [ ] #13 **Batch Processing** - Client-side `batch_process()` for bulk operations

## Implementation Timeline

### Phase 1 (Weeks 1-4)
Focus on Priority 1 enhancements that improve existing methods:
- Multi-language OCR
- Image watermarks
- Selective flattening

### Phase 2 (Weeks 5-8)
Add Priority 2 core methods:
- Create redactions
- Import annotations
- Extract page range

### Phase 3 (Weeks 9-12)
Implement Priority 3 format conversions:
- PDF/A conversion
- Image conversion
- Content extraction
- Office format export

### Phase 4 (Weeks 13-16)
Advanced features for Priority 4:
- AI redaction
- Digital signatures
- Batch processing
63+
## Success Metrics
64+
65+
- **API Coverage**: Increase from ~30% to ~80%
66+
- **Test Coverage**: Maintain 95%+ coverage
67+
- **Documentation**: 100% method documentation with examples
68+
- **Performance**: Sub-second operations for common tasks
69+
- **Backward Compatibility**: Zero breaking changes
70+
71+
## Implementation Guidelines
72+
73+
For each enhancement:
74+
1. Review OpenAPI specification for exact requirements
75+
2. Implement with backward compatibility in mind
76+
3. Add comprehensive unit and integration tests
77+
4. Include detailed docstrings with examples
78+
5. Update documentation and changelog
79+
6. Consider performance implications
80+
81+
## Related Documents
82+
83+
- [FUTURE_ENHANCEMENTS_PLAN.md](../FUTURE_ENHANCEMENTS_PLAN.md) - Detailed enhancement specifications
84+
- [OPENAPI_COMPLIANCE_REVIEW.md](../OPENAPI_COMPLIANCE_REVIEW.md) - Current compliance status
85+
- [openapi_spec.yml](../openapi_spec.yml) - Official API specification v1.9.0
86+
87+
## Contributing
88+
89+
We welcome contributions! Please:
90+
1. Comment on the specific issue you'd like to work on
91+
2. Follow the implementation template in each issue
92+
3. Ensure all tests pass
93+
4. Update documentation
94+
5. Submit PR referencing the issue number
95+
96+
## Questions?
97+
98+
Feel free to ask questions in the comments or open a discussion for broader topics.
99+
100+
---
101+
102+
**Labels**: roadmap, enhancement, meta-issue
103+
**Milestone**: v2.0.0
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# Enhancement: Multi-Language OCR Support

## Summary
Enhance the `ocr_pdf()` method to support multiple languages simultaneously, as supported by the OpenAPI specification.

## Current Behavior
- `ocr_pdf()` accepts only a single language string
- Limited to one language per document

## Proposed Enhancement
```python
def ocr_pdf(
    self,
    input_file: FileInput,
    output_path: Optional[str] = None,
    language: Union[str, List[str]] = "english",  # Now accepts list
    enable_structure: bool = False,  # New parameter
) -> Optional[bytes]:
    ...
```

## Benefits
- Process multilingual documents accurately
- Better OCR accuracy with proper language hints
- Optional structured text extraction
- Backward compatible with existing single-language usage

## Implementation Details
- Modify `_map_tool_to_action()` in builder.py to handle language arrays
- Update parameter validation to accept both string and list
- Add `enable_structure` parameter for structured output
- Extend language mapping to support all 30+ OpenAPI languages
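
The parameter validation described above could look roughly like the sketch below. It is an illustration only, not the client's actual code: `normalize_ocr_languages` and `SUPPORTED_LANGUAGES` are hypothetical names, and the language set contains only the languages named in this issue rather than the full OpenAPI list.

```python
from typing import List, Union

# Hypothetical subset: only the languages named in this issue.
# The authoritative list is the OcrLanguage enum in the OpenAPI spec.
SUPPORTED_LANGUAGES = {
    "english", "spanish", "french", "german", "italian", "portuguese",
    "chinese", "japanese", "korean", "russian", "arabic", "hindi",
}


def normalize_ocr_languages(language: Union[str, List[str]]) -> Union[str, List[str]]:
    """Validate `language` and return it in the shape it was given.

    A single string is returned unchanged (existing behavior); a list is
    validated item by item and de-duplicated while keeping its order.
    """
    candidates = [language] if isinstance(language, str) else list(language)
    if not candidates:
        raise ValueError("at least one OCR language is required")

    unknown = [lang for lang in candidates if lang not in SUPPORTED_LANGUAGES]
    if unknown:
        raise ValueError(f"unsupported OCR language(s): {unknown}")

    if isinstance(language, str):
        return language
    # dict.fromkeys preserves insertion order, so duplicates are dropped in place.
    return list(dict.fromkeys(candidates))
```

Returning the string unchanged for single-language calls keeps the request body identical to what existing callers already produce, which is one way to satisfy the backward-compatibility requirement.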

## Testing Requirements
- [ ] Test single language string (backward compatibility)
- [ ] Test multiple languages as list
- [ ] Test structured output option
- [ ] Test all supported language codes
- [ ] Update integration tests

## OpenAPI Reference
- BuildAction type: `ocr`
- Parameter: `language` - can be single OcrLanguage or array
- Supports: english, spanish, french, german, italian, portuguese, chinese, japanese, korean, russian, arabic, hindi, and more

## Priority
🔵 Priority 1 - Enhancement to existing method

## Labels
- enhancement
- ocr
- openapi-compliance
- backward-compatible
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# Enhancement: Image Watermark Support

## Summary
Extend `watermark_pdf()` to support image watermarks in addition to text watermarks, as specified in the OpenAPI ImageWatermarkAction.

## Current Behavior
- Only supports text watermarks
- No image watermark capability

## Proposed Enhancement
```python
def watermark_pdf(
    self,
    input_file: FileInput,
    output_path: Optional[str] = None,
    # Text watermark parameters (existing)
    text: Optional[str] = None,
    # Image watermark parameters (new)
    image_file: Optional[FileInput] = None,
    image_url: Optional[str] = None,
    # Common parameters
    width: int = 200,
    height: int = 100,
    opacity: float = 1.0,
    position: str = "center",
    rotation: int = 0,  # New parameter
) -> Optional[bytes]:
    ...
```

## Benefits
- Logo and branding watermarks
- Complex visual watermarks
- Rotation support for both text and image watermarks
- Maintains backward compatibility

## Implementation Details
- Extend `_map_tool_to_action()` to handle image watermarks
- Add validation for `image_file`/`image_url` parameters
- Support rotation parameter for all watermark types
- Handle image file upload in multipart request
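
One possible shape for the source validation is sketched below. Everything here is an assumption made for illustration: `build_watermark_action` is a hypothetical helper, `image_part` stands for the name of an uploaded multipart part, and the way a URL is embedded under `image` is a guess that should be checked against the OpenAPI spec. The `position` parameter is left out to keep the sketch short.

```python
from typing import Any, Dict, Optional


def build_watermark_action(
    text: Optional[str] = None,
    image_part: Optional[str] = None,  # hypothetical: name of the uploaded multipart part
    image_url: Optional[str] = None,
    *,
    width: int = 200,
    height: int = 100,
    opacity: float = 1.0,
    rotation: int = 0,
) -> Dict[str, Any]:
    """Build a watermark action dict, requiring exactly one watermark source."""
    sources = {"text": text, "image_file": image_part, "image_url": image_url}
    provided = [name for name, value in sources.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"exactly one of text, image_file, image_url is required, got {provided}"
        )
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")

    action: Dict[str, Any] = {
        "type": "watermark",
        "width": width,
        "height": height,
        "opacity": opacity,
        "rotation": rotation,
    }
    if text is not None:
        action["text"] = text
    else:
        # Assumption: an uploaded image is referenced by its part name and a
        # remote image by a {"url": ...} object. Verify against the spec.
        action["image"] = image_part if image_part is not None else {"url": image_url}
    return action
```

Rejecting calls that pass both `text` and an image keeps the existing text-only behavior unambiguous, since a text-only call produces the same action as before apart from the added `rotation` field.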

## Testing Requirements
- [ ] Test with image file input (PNG, JPEG)
- [ ] Test with image URL
- [ ] Test rotation parameter (0, 90, 180, 270)
- [ ] Test opacity with images
- [ ] Test all position options
- [ ] Verify backward compatibility with text watermarks

## OpenAPI Reference
- BuildAction type: `watermark`
- Subtypes: TextWatermarkAction, ImageWatermarkAction
- Image parameter: `image` (FileHandle)
- New parameter: `rotation`

## Priority
🔵 Priority 1 - Enhancement to existing method

## Labels
- enhancement
- watermark
- openapi-compliance
- backward-compatible
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
# Enhancement: Selective Annotation Flattening

## Summary
Enhance `flatten_annotations()` to support selective flattening by annotation IDs, as supported by the OpenAPI FlattenAction.

## Current Behavior
- Flattens all annotations and form fields
- No selective control

## Proposed Enhancement
```python
def flatten_annotations(
    self,
    input_file: FileInput,
    output_path: Optional[str] = None,
    annotation_ids: Optional[List[Union[str, int]]] = None,  # New parameter
) -> Optional[bytes]:
    ...
```

## Benefits
- Preserve specific annotations while flattening others
- More granular control over document processing
- Better support for complex form workflows
- Backward compatible (None = flatten all)

## Implementation Details
- Modify BuildAction to include `annotationIds` when provided
- Support both string and integer IDs
- Handle empty list (flatten none) vs None (flatten all)
- Update parameter documentation
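
The `None` versus empty-list distinction above is easy to lose with a truthiness check, so a small sketch may help. `build_flatten_action` is a hypothetical helper name; the `flatten` type and the `annotationIds` key come from the OpenAPI reference in this issue, and the "empty list flattens nothing" behavior is taken from the note above rather than verified against the API.

```python
from typing import Any, Dict, List, Optional, Union


def build_flatten_action(
    annotation_ids: Optional[List[Union[str, int]]] = None,
) -> Dict[str, Any]:
    """Build a flatten action, omitting `annotationIds` only when it is None."""
    action: Dict[str, Any] = {"type": "flatten"}

    # Compare against None explicitly: `if annotation_ids:` would treat an
    # empty list like None and silently flatten everything.
    if annotation_ids is not None:
        for annotation_id in annotation_ids:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(annotation_id, bool) or not isinstance(annotation_id, (str, int)):
                raise TypeError(f"annotation ID must be str or int, got {annotation_id!r}")
        action["annotationIds"] = list(annotation_ids)
    return action
```

With this shape, `build_flatten_action()` yields the same action the current method sends, while `build_flatten_action([])` sends an explicit empty `annotationIds` array.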

## Testing Requirements
- [ ] Test with None (flatten all - current behavior)
- [ ] Test with specific annotation IDs
- [ ] Test with mix of valid and invalid IDs
- [ ] Test with empty list
- [ ] Test with different annotation types

## OpenAPI Reference
- BuildAction type: `flatten`
- Parameter: `annotationIds` (optional array of string/integer)
- Behavior: If not specified, flattens all annotations

## Priority
🔵 Priority 1 - Enhancement to existing method

## Labels
- enhancement
- annotations
- openapi-compliance
- backward-compatible
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
# Feature: Create Redactions Method

## Summary
Implement `create_redactions()` method to programmatically create redaction annotations using text search, regex patterns, or presets.

## Proposed Implementation
```python
def create_redactions(
    self,
    input_file: FileInput,
    output_path: Optional[str] = None,
    strategy: Literal["text", "regex", "preset"] = "text",
    search_text: Optional[str] = None,  # For text strategy
    regex_pattern: Optional[str] = None,  # For regex strategy
    preset_type: Optional[str] = None,  # For preset strategy
    case_sensitive: bool = False,
    whole_words_only: bool = False,
    # Redaction appearance
    fill_color: Optional[str] = "#000000",
    outline_color: Optional[str] = "#000000",
    overlay_text: Optional[str] = None,
) -> Optional[bytes]:
    ...
```

## Benefits
- Automated redaction creation for compliance workflows
- Multiple search strategies (text, regex, presets)
- Customizable redaction appearance
- Preview redactions before permanently applying
- Works with existing `apply_redactions()` method

## Implementation Details
- Use BuildAction type: `createRedactions`
- Support three strategies:
  - `text`: Simple text search
  - `regex`: Regular expression patterns
  - `preset`: Common patterns (SSN, email, phone, etc.)
- Include appearance customization options
- Return PDF with redaction annotations (not yet applied)
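
A rough sketch of how the three strategies might map onto a single action follows. Only the `createRedactions` type and the strategy names come from this issue; the helper name `build_create_redactions_action` and the keys `strategyOptions`, `caseSensitive`, `content`, `fillColor`, `outlineColor`, and `overlayText` are assumptions that need to be checked against the OpenAPI spec.

```python
import re
from typing import Any, Dict, Optional


def build_create_redactions_action(
    strategy: str = "text",
    search_text: Optional[str] = None,
    regex_pattern: Optional[str] = None,
    preset_type: Optional[str] = None,
    case_sensitive: bool = False,
    fill_color: Optional[str] = "#000000",
    outline_color: Optional[str] = "#000000",
    overlay_text: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a createRedactions action (key names are assumed, see lead-in)."""
    values = {"text": search_text, "regex": regex_pattern, "preset": preset_type}
    if strategy not in values:
        raise ValueError(f"unknown strategy {strategy!r}; expected one of {sorted(values)}")

    value = values[strategy]
    if not value:
        raise ValueError(f"strategy {strategy!r} requires its matching argument")
    if strategy == "regex":
        # Fail early on a malformed pattern; raises re.error if it is invalid.
        re.compile(value)

    options: Dict[str, Any] = {strategy: value}
    if strategy != "preset":
        options["caseSensitive"] = case_sensitive

    # Drop appearance fields that were explicitly set to None.
    appearance = {
        "fillColor": fill_color,
        "outlineColor": outline_color,
        "overlayText": overlay_text,
    }
    content = {key: val for key, val in appearance.items() if val is not None}

    return {
        "type": "createRedactions",
        "strategy": strategy,
        "strategyOptions": options,
        "content": content,
    }
```

Validating that the argument matching the chosen strategy is present catches calls such as `strategy="regex"` with only `search_text` set before any request is sent.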

## Testing Requirements
- [ ] Test text search strategy
- [ ] Test regex patterns (email, SSN, phone)
- [ ] Test preset types
- [ ] Test case sensitivity options
- [ ] Test appearance customization
- [ ] Integration test with `apply_redactions()`

## OpenAPI Reference
- BuildAction type: `createRedactions`
- Strategies: text, regex, preset
- Strategy options vary by type
- Includes content appearance configuration

## Use Case Example
```python
# Create redactions for all SSNs
pdf_with_redactions = client.create_redactions(
    "document.pdf",
    strategy="regex",
    regex_pattern=r"\b\d{3}-\d{2}-\d{4}\b",
    overlay_text="[REDACTED]"
)

# Review and then apply
final_pdf = client.apply_redactions(pdf_with_redactions)
```
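
Before sending a pattern like the one above to the API, it can be checked locally against sample text. The snippet below uses Python's standard `re` module with made-up sample data; the server-side regex engine may differ from Python's, so this is a sanity check rather than a guarantee of identical matches.

```python
import re

# Same SSN pattern as in the use case above.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Made-up sample: one SSN-shaped value plus two near misses.
sample = "Applicant SSN: 123-45-6789, phone: 555-123-4567, ref 1234-56-7890."

matches = SSN_PATTERN.findall(sample)
print(matches)  # ['123-45-6789']
```

The word boundaries keep the pattern from matching inside the longer reference number, and the phone number fails because its middle group has three digits.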

## Priority
🟢 Priority 2 - Core missing method

## Labels
- feature
- redaction
- security
- openapi-compliance
