
Commit 72aafc3

docs: add comprehensive README with examples and documentation

- Add detailed installation and quick start guide
- Include examples for both Direct API and Builder API
- Document all available tools and their usage
- Add error handling examples
- Include development setup instructions
- Add contribution guidelines

1 parent 07b711c commit 72aafc3

1 file changed, +294 -13 lines changed

README.md

# Nutrient DWS Python Client

A Python client library for the [Nutrient Document Web Services (DWS) API](https://www.nutrient.io/). It provides a Pythonic interface to Nutrient's document processing services and supports both Direct API calls and Builder API workflows.

## Features

- 🚀 **Two API styles**: Direct API for single operations, Builder API for complex workflows
- 📄 **Comprehensive document tools**: Convert, merge, rotate, OCR, watermark, and more
- 🔄 **Automatic retries**: Built-in retry logic for transient failures
- 📁 **Flexible file handling**: Support for file paths, bytes, and file-like objects
- 🔒 **Type-safe**: Full type hints for better IDE support
- **Streaming support**: Memory-efficient processing of large files
- 🧪 **Well-tested**: Comprehensive test suite with high coverage
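
The retry logic is built into the client, so you do not need to write it yourself. The sketch below only illustrates the general technique, retrying a callable with exponential backoff on a transient failure. The `retry_with_backoff` helper, the `TransientError` class, and the default delays are hypothetical and are not part of the `nutrient` package.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as a dropped connection."""


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on TransientError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: re-raise the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # The loop always returns or raises
```

Injecting `sleep` keeps the helper testable: passing `sleep=delays.append` records the schedule without waiting, so a call that fails twice and then succeeds with `base_delay=0.5` records delays of 0.5 and 1.0 seconds.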

## Installation

```bash
pip install nutrient-dws
```

## Quick Start

```python
from nutrient import NutrientClient

# Initialize the client
client = NutrientClient(api_key="your-api-key")

# Direct API - Convert Office document to PDF
pdf = client.convert_to_pdf(
    input_file="document.docx",
    output_path="converted.pdf"
)

# Builder API - Chain multiple operations
client.build(input_file="document.pdf") \
    .add_step("rotate-pages", {"degrees": 90}) \
    .add_step("ocr-pdf", {"language": "en"}) \
    .add_step("watermark-pdf", {"text": "CONFIDENTIAL"}) \
    .execute(output_path="processed.pdf")
```

## Authentication

The client authenticates with an API key, which you can supply in either of two ways:

```python
from nutrient import NutrientClient

# 1. Pass the key directly to the client
client = NutrientClient(api_key="your-api-key")

# 2. Set the NUTRIENT_API_KEY environment variable
#    export NUTRIENT_API_KEY=your-api-key
client = NutrientClient()  # Reads the key from NUTRIENT_API_KEY

# With either method, the client can be used as a context manager for automatic cleanup
with NutrientClient(api_key="your-api-key") as client:
    client.convert_to_pdf("document.docx")
```
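
The precedence between the two methods can be sketched as follows. This assumes an explicitly passed `api_key` wins over the `NUTRIENT_API_KEY` environment variable, which is the usual convention but is not stated above; `resolve_api_key` is a hypothetical helper, not part of the `nutrient` package.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, else NUTRIENT_API_KEY from the environment."""
    if api_key:
        return api_key
    env_key = environ.get("NUTRIENT_API_KEY")
    if env_key:
        return env_key
    # Neither source provided a key: fail early with an actionable message
    raise ValueError("No API key provided. Pass api_key=... or set NUTRIENT_API_KEY.")
```

Accepting `environ` as a parameter lets tests pass a plain dictionary instead of modifying the real process environment.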

## Direct API Examples

### Convert to PDF

```python
# Convert Office document to PDF
client.convert_to_pdf(
    input_file="presentation.pptx",
    output_path="presentation.pdf"
)

# Convert with options
client.convert_to_pdf(
    input_file="spreadsheet.xlsx",
    output_path="spreadsheet.pdf",
    page_range="1-3"
)
```

### Merge PDFs

```python
# Merge multiple PDFs
client.merge_pdfs(
    input_files=["doc1.pdf", "doc2.pdf", "doc3.pdf"],
    output_path="merged.pdf"
)
```
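
When the files to merge live in one directory, the `input_files` list can be built programmatically. This sketch returns plain path strings, matching the filenames passed to `merge_pdfs` in the example above, and sorts by filename so the merge order is predictable. `collect_pdfs` is a hypothetical helper, not part of the library.

```python
from pathlib import Path
from typing import List, Union


def collect_pdfs(directory: Union[str, Path]) -> List[str]:
    """Return the .pdf files in directory, sorted by filename, as path strings."""
    folder = Path(directory)
    pdfs = sorted(folder.glob("*.pdf"), key=lambda p: p.name)
    # Plain strings match the README's merge_pdfs example, which passes filenames
    return [str(p) for p in pdfs]
```

Usage would look like `client.merge_pdfs(input_files=collect_pdfs("scans"), output_path="merged.pdf")`. Sorting is lexicographic, so zero-pad numbered filenames (`scan-01.pdf`, `scan-02.pdf`, ...) to keep them in order.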

### OCR PDF

```python
# Add OCR layer to scanned PDF
client.ocr_pdf(
    input_file="scanned.pdf",
    output_path="searchable.pdf",
    language="en"
)
```

### Rotate Pages

```python
# Rotate all pages
client.rotate_pages(
    input_file="document.pdf",
    output_path="rotated.pdf",
    degrees=180
)

# Rotate specific pages
client.rotate_pages(
    input_file="document.pdf",
    output_path="rotated.pdf",
    degrees=90,
    page_indexes=[0, 2, 4]  # Pages 1, 3, and 5
)
```
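
As the comment in the rotation example shows, `page_indexes` are zero-based: index 0 is page 1. If you work from printed page numbers, a small conversion helper avoids off-by-one mistakes. `page_numbers_to_indexes` is a hypothetical helper, not part of the library.

```python
from typing import Iterable, List


def page_numbers_to_indexes(page_numbers: Iterable[int]) -> List[int]:
    """Convert 1-based page numbers (as printed) to 0-based indexes for page_indexes."""
    indexes = []
    for number in page_numbers:
        if number < 1:
            raise ValueError(f"Page numbers start at 1, got {number}")
        indexes.append(number - 1)
    return indexes


# Pages 1, 3, and 5 -> [0, 2, 4], the page_indexes used in the example above
print(page_numbers_to_indexes([1, 3, 5]))
```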

### Watermark PDF

```python
# Add text watermark
client.watermark_pdf(
    input_file="document.pdf",
    output_path="watermarked.pdf",
    text="DRAFT",
    opacity=0.5
)

# Add image watermark
client.watermark_pdf(
    input_file="document.pdf",
    output_path="watermarked.pdf",
    image_url="https://example.com/logo.png",
    position="center"
)
```

## Builder API Examples

The Builder API allows you to chain multiple operations in a single workflow:

```python
# Complex document processing pipeline
result = client.build(input_file="raw-scan.pdf") \
    .add_step("ocr-pdf", {"language": "en"}) \
    .add_step("rotate-pages", {"degrees": -90, "page_indexes": [0]}) \
    .add_step("watermark-pdf", {
        "text": "PROCESSED",
        "opacity": 0.3,
        "position": "top-right"
    }) \
    .add_step("flatten-annotations") \
    .set_output_options(
        metadata={"title": "Processed Document", "author": "DWS Client"},
        optimize=True
    ) \
    .execute(output_path="final.pdf")
```
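
Conceptually, a fluent builder records each chained call and does the real work only at the end. The sketch below illustrates that pattern with a hypothetical `WorkflowSketch` class that collects steps into a plain dictionary. The dictionary layout (`input`, `steps`, `output`) is an assumption for illustration and is not the request format the DWS API actually receives.

```python
from typing import Any, Dict, List, Optional


class WorkflowSketch:
    """Hypothetical fluent builder: each method records state and returns self."""

    def __init__(self, input_file: str) -> None:
        self.input_file = input_file
        self.steps: List[Dict[str, Any]] = []
        self.output_options: Dict[str, Any] = {}

    def add_step(
        self, tool: str, options: Optional[Dict[str, Any]] = None
    ) -> "WorkflowSketch":
        # Shallow-copy options so a caller reusing the dict cannot alter a recorded step
        self.steps.append({"tool": tool, "options": dict(options or {})})
        return self

    def set_output_options(self, **options: Any) -> "WorkflowSketch":
        self.output_options.update(options)
        return self

    def to_payload(self) -> Dict[str, Any]:
        """Return the recorded workflow; a real client would send something like this."""
        return {
            "input": self.input_file,
            "steps": [dict(step) for step in self.steps],
            "output": dict(self.output_options),
        }
```

Because every method returns `self`, calls chain the same way as in the Builder API example, for instance `WorkflowSketch("raw-scan.pdf").add_step("ocr-pdf", {"language": "en"}).add_step("flatten-annotations").to_payload()`.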

## File Input Options

The library supports multiple ways to provide input files:

```python
from pathlib import Path

# File path (string or Path object)
client.convert_to_pdf("document.docx")
client.convert_to_pdf(Path("document.docx"))

# Bytes
with open("document.docx", "rb") as f:
    file_bytes = f.read()
client.convert_to_pdf(file_bytes)

# File-like object
with open("document.docx", "rb") as f:
    client.convert_to_pdf(f)

# URL (for supported operations)
client.import_from_url("https://example.com/document.pdf")
```
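
Accepting paths, bytes, and file-like objects usually comes down to normalizing them to one form before upload. This sketch reads every supported input into `bytes`. The `read_input` helper is hypothetical, and the real library may stream large inputs instead of reading them fully.

```python
from pathlib import Path
from typing import BinaryIO, Union

FileInput = Union[str, Path, bytes, BinaryIO]


def read_input(file: FileInput) -> bytes:
    """Normalize a path, raw bytes, or binary file-like object to bytes."""
    if isinstance(file, bytes):
        return file
    if isinstance(file, (str, Path)):
        return Path(file).read_bytes()
    if hasattr(file, "read"):
        data = file.read()
        # Text-mode files return str, which cannot be uploaded as-is
        if not isinstance(data, bytes):
            raise TypeError("File-like objects must be opened in binary mode ('rb')")
        return data
    raise TypeError(f"Unsupported input type: {type(file).__name__}")
```

Checking `bytes` before `str` matters little here, but checking the result of `read()` catches the common mistake of opening a file in text mode.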

## Error Handling

The library provides specific exceptions for different error scenarios:

```python
from nutrient import (
    NutrientClient,
    NutrientError,
    AuthenticationError,
    APIError,
    ValidationError,
    TimeoutError as NutrientTimeoutError,  # Alias avoids shadowing Python's built-in TimeoutError
    FileProcessingError,
)

client = NutrientClient()  # Reads the key from NUTRIENT_API_KEY

try:
    client.convert_to_pdf("document.docx")
except AuthenticationError:
    print("Invalid API key")
except ValidationError as e:
    print(f"Invalid parameters: {e.errors}")
except APIError as e:
    print(f"API error: {e.status_code} - {e.message}")
except NutrientTimeoutError:
    print("Request timed out")
except FileProcessingError as e:
    print(f"File processing failed: {e}")
except NutrientError as e:
    # Any other NutrientError not matched above
    print(f"Unexpected client error: {e}")
```
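
Python runs the first `except` clause whose class matches, so more specific exceptions must be listed before their base classes. The exact hierarchy of the `nutrient` exceptions is not documented here. The sketch below uses hypothetical stand-in classes, assuming an authentication error derives from an API error and everything derives from one library-wide base, purely to show the effect of ordering.

```python
class NutrientErrorSketch(Exception):
    """Stand-in for a library-wide base exception."""


class APIErrorSketch(NutrientErrorSketch):
    """Stand-in for an HTTP-level API error."""


class AuthenticationErrorSketch(APIErrorSketch):
    """Stand-in for an authentication failure (assumed to subclass the API error)."""


def classify(exc: Exception) -> str:
    """Return which handler would run, with the most specific clause first."""
    try:
        raise exc
    except AuthenticationErrorSketch:
        return "authentication"
    except APIErrorSketch:
        return "api"
    except NutrientErrorSketch:
        return "other library error"
```

If the `APIErrorSketch` clause came first, `classify(AuthenticationErrorSketch())` would return `"api"` and the authentication branch would never run.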

## Advanced Configuration

### Custom Timeout

```python
# Set the timeout to 600 seconds (10 minutes) for large files
client = NutrientClient(api_key="your-api-key", timeout=600)
```

### Streaming Large Files

Files larger than 10 MB are streamed automatically instead of being loaded into memory all at once:

```python
# Assuming the file exceeds 10 MB, it is streamed instead of loaded into memory
client.convert_to_pdf("large-presentation.pptx")
```
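
The size check and chunked reading behind this behavior can be sketched as below. The 10 MB threshold comes from the text above, but whether the library counts it as `10 * 1024 * 1024` or `10_000_000` bytes is an assumption (this sketch uses the former), and `should_stream` and `iter_chunks` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Iterator, Union

# Assumed threshold: 10 MB interpreted as 10 MiB (10 * 1024 * 1024 bytes)
STREAMING_THRESHOLD_BYTES = 10 * 1024 * 1024


def should_stream(
    path: Union[str, Path], threshold: int = STREAMING_THRESHOLD_BYTES
) -> bool:
    """Return True if the file is larger than the streaming threshold."""
    return os.path.getsize(path) > threshold


def iter_chunks(path: Union[str, Path], chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield the file in fixed-size chunks so only one chunk is held in memory at a time."""
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        yield from iter(lambda: f.read(chunk_size), b"")
```

HTTP libraries that accept an iterable body can upload such a generator directly; for example, `requests` sends a generator passed as `data` with chunked transfer encoding. This is not a claim about which HTTP library the `nutrient` client uses.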

## Available Tools

### Document Conversion
- `convert_to_pdf` - Convert Office documents to PDF
- `convert_from_pdf` - Convert PDF to Office formats
- `convert_pdf_page_to_image` - Convert PDF pages to images
- `import_from_url` - Import documents from URLs

### PDF Manipulation
- `merge_pdfs` - Merge multiple PDFs
- `split_pdf` - Split PDF into multiple files
- `rotate_pages` - Rotate PDF pages
- `delete_pages` - Remove pages from PDF
- `duplicate_pages` - Duplicate pages in PDF
- `move_pages` - Reorder pages in PDF

### PDF Enhancement
- `ocr_pdf` - Add searchable text layer
- `watermark_pdf` - Add text or image watermarks
- `flatten_annotations` - Flatten form fields and annotations
- `linearize_pdf` - Optimize for web viewing

### PDF Security
- `apply_redactions` - Permanently remove sensitive content
- `create_redactions` - Mark content for redaction
- `sanitize_pdf` - Remove potentially harmful content

### Annotations and Forms
- `apply_instant_json` - Apply Nutrient Instant JSON annotations
- `export_instant_json` - Export annotations as Instant JSON
- `apply_xfdf` - Apply XFDF annotations
- `export_xfdf` - Export annotations as XFDF
- `export_pdf_info` - Extract PDF metadata and structure
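
In this README's examples, the Direct API method names line up with the kebab-case step names passed to `add_step` in the Builder API (`ocr_pdf` and `"ocr-pdf"`, `rotate_pages` and `"rotate-pages"`). Assuming that convention holds for every tool, which the README does not state, the mapping is a simple string transformation. `to_tool_name`, `to_method_name`, and `KNOWN_STEPS` are hypothetical names for illustration.

```python
# Builder step names that appear in this README's examples
KNOWN_STEPS = {"ocr-pdf", "rotate-pages", "watermark-pdf", "flatten-annotations"}


def to_tool_name(method_name: str) -> str:
    """Map a snake_case Direct API method name to a kebab-case step name (assumed convention)."""
    return method_name.replace("_", "-")


def to_method_name(tool_name: str) -> str:
    """Inverse mapping: kebab-case step name back to a snake_case method name."""
    return tool_name.replace("-", "_")
```

For example, `to_tool_name("flatten_annotations")` gives `"flatten-annotations"`, the step name used in the Builder API example. Check the API documentation before relying on this for tools not shown here.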

## Development

### Setup

```bash
# Clone the repository
git clone https://github.com/jdrhyne/nutrient-dws-client-python.git
cd nutrient-dws-client-python

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check .

# Run type checking
mypy src tests
```

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=nutrient --cov-report=html

# Run specific test file
pytest tests/unit/test_client.py
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

- 📧 Email: [email protected]
- 📚 Documentation: https://www.nutrient.io/docs/
- 🐛 Issues: https://github.com/jdrhyne/nutrient-dws-client-python/issues
