Merged
3 changes: 2 additions & 1 deletion .github/workflows/ci.yml
@@ -35,12 +35,13 @@ jobs:
pip install -e ".[dev]"

- name: Run linting with ruff
if: matrix.python-version == '3.8'
run: |
python -m ruff check .
python -m ruff format --check .

- name: Run type checking with mypy
run: python -m mypy src tests
run: python -m mypy --python-version=${{ matrix.python-version }} src tests

- name: Run tests with pytest
run: python -m pytest -v --cov=nutrient_dws --cov-report=xml --cov-report=term
8 changes: 7 additions & 1 deletion .gitignore
@@ -149,4 +149,10 @@ Thumbs.db

# Project specific
openapi_spec.yml
.ruff_cache/
.ruff_cache/

.pixi
.claude/settings.local.json

# Integration test configuration
tests/integration/integration_config.py
3 changes: 2 additions & 1 deletion CLAUDE.md
@@ -42,4 +42,5 @@ Always run the quality checks above to ensure code meets standards.
2. Implement features incrementally
3. Write tests alongside implementation
4. Update documentation/docstrings
5. Run quality checks before marking tasks complete
5. Run quality checks before marking tasks complete
6. Use `gh` cli tool
160 changes: 75 additions & 85 deletions SPECIFICATION.md
@@ -1,44 +1,43 @@
# Software Design Specification: Nutrient DWS Python Client
Version: 1.1
Date: June 18, 2024
Version: 1.2
Date: December 19, 2024

## 1. Introduction
### 1.1. Project Overview
This document outlines the software design specification for a new Python client library for the Nutrient Document Web Services (DWS) API. The goal of this project is to create a high-quality, lightweight, and intuitive Python package that simplifies interaction with the Nutrient DWS API for developers.

The library will provide two primary modes of interaction:
- A **Direct API** for executing single, discrete document processing tasks (e.g., converting a single file, rotating a page) by wrapping the `/process/{tool}` endpoints.
- A **Direct API** for executing single, discrete document processing tasks (e.g., converting a single file, rotating a page).
- A **Builder API** that offers a fluent, chainable interface for composing and executing complex, multi-step document processing workflows, abstracting the `POST /build` endpoint of the Nutrient API.

The final product will be a distributable package suitable for publishing on PyPI, with comprehensive documentation. The design prioritizes ease of use, adherence to Python best practices, and clear documentation consumable by both humans and LLMs.

### 1.2. Scope
This specification covers the design and architecture of the Python client library itself. The scope includes:
- Client authentication and configuration.
- Implementation of static wrappers for individual document processing tools.
- Implementation of the Builder API for multi-step workflows.
- A refined error handling and reporting strategy.
- Flexible file input/output handling.
- Packaging structure for PyPI distribution.
- Documentation generation strategy.

Out of scope for this version are:
- A command-line interface (CLI).
- Support for asynchronous job polling or webhooks. All API calls will be synchronous, holding the HTTP connection open until a final result is returned.
- Implementations in languages other than Python.
This specification covers the implemented Python client library:
- Client authentication and configuration
- Direct API methods for common document operations
- Builder API for multi-step workflows
- Comprehensive error handling with custom exceptions
- Optimized file input/output handling
- Standard Python package structure

Out of scope:
- Command-line interface (CLI)
- Asynchronous operations (all calls are synchronous)
- Non-Python implementations

### 1.3. References
- **Nutrient DWS OpenAPI Specification**: https://dashboard.nutrient.io/assets/specs/[email protected]
- **Nutrient DWS API Documentation & Guides**: https://www.nutrient.io/api/documentation/
- **Target API Endpoint Base**: https://www.nutrient.io/api/processor-api/
- **Nutrient DWS API Documentation**: https://www.nutrient.io/api/reference/public/
- **Nutrient DWS List of Tools**: https://www.nutrient.io/api/tools-overview/
- **Target API Endpoint**: https://api.pspdfkit.com

## 2. Goals and Objectives
- **Simplicity**: Provide a clean, Pythonic interface that abstracts the complexities of direct HTTP requests, authentication, and file handling.
- **Flexibility**: Offer both a simple, direct API for single tasks and a powerful, fluent Builder API for complex workflows.
- **Lightweight**: The library will have one primary external dependency: the `requests` library for synchronous HTTP communication.
- **Discoverability**: The API design and documentation will be clear and predictable, enabling developers (and LLMs) to easily understand and use its capabilities.
- **Distribution-Ready**: The project will be structured as a standard Python package, complete with a `pyproject.toml` file, ready for publication to PyPI.
- **Well-Documented**: Produce high-quality, auto-generated API documentation from docstrings, supplemented with tutorials and usage examples.
- **Simplicity**: Clean, Pythonic interface abstracting HTTP requests, authentication, and file handling
- **Flexibility**: Direct API for single operations and Builder API for complex workflows
- **Lightweight**: Single external dependency on `requests` library
- **Performance**: Optimized file handling with streaming for large files (>10MB)
- **Distribution-Ready**: Standard Python package structure with `pyproject.toml`

## 3. High-Level Architecture
The library is architected around a central `NutrientClient` class, which is the main entry point for all interactions.
@@ -62,57 +61,46 @@ The library is architected around a central `NutrientClient` class, which is the

### 3.2. Data Flow
**Direct API Call:**
1. User instantiates `NutrientClient`.
2. User calls a method, e.g., `client.rotate_pages(input_file='path/to/doc.pdf', degrees=90)`.
3. The method prepares the input file and parameters.
4. It constructs a `multipart/form-data` POST request to `/process/rotate-pages`.
5. It receives the processed file in the HTTP response and returns it.
1. User calls method like `client.rotate_pages(input_file='path/to/doc.pdf', degrees=90)`
2. Method internally uses Builder API with single step
3. File is processed via `/build` endpoint
4. Returns processed file bytes or saves to `output_path`

**Builder API Call:**
1. User instantiates `NutrientClient`.
2. User starts a build chain: `builder = client.build(input_file='path/to/doc.docx')`.
3. User chains operations: `builder.add_step(tool='convert-to-pdf').add_step(tool='rotate-pages', options={'degrees': 90})`.
4. User calls `builder.execute()`.
5. The `execute()` method constructs the `multipart/form-data` request, sending the file(s) and a JSON payload describing the sequence of actions to the `/build` endpoint.
6. It receives the final processed file and returns it.

## 4. Detailed API Design
### 4.1. Client Initialization
The client will be initialized with an optional API key and timeout. It will follow modern Python library best practices for configuration.
1. User chains operations: `client.build(input_file='doc.docx').add_step(tool='rotate-pages', options={'degrees': 90})`
2. `execute()` sends `multipart/form-data` request to `/build` endpoint
3. Returns processed file bytes or saves to `output_path`

## 4. API Design
### 4.1. Client Initialization
```python
from nutrient_dws import NutrientClient, AuthenticationError

# Option 1: API key passed directly (takes precedence)
client = NutrientClient(api_key="YOUR_DWS_API_KEY", timeout=300)
# API key from parameter (takes precedence) or NUTRIENT_API_KEY env var
client = NutrientClient(api_key="YOUR_DWS_API_KEY", timeout=300)

# Option 2: API key read from NUTRIENT_API_KEY environment variable
# client = NutrientClient()

# No error is raised on init if no key is found.
# An AuthenticationError will be raised on the first API call.
try:
# This call will fail if the key is invalid or missing.
client.some_api_call(...)
except AuthenticationError as e:
print(f"Authentication failed: {e}")
# Context manager support
with NutrientClient() as client:
result = client.convert_to_pdf("document.docx")
```

- **Precedence**: The `api_key` argument in the constructor takes priority over the `NUTRIENT_API_KEY` environment variable.
- **Timeout**: The `timeout` argument (in seconds) is passed to the underlying `requests` calls.
- **API Key**: Parameter takes precedence over `NUTRIENT_API_KEY` environment variable
- **Timeout**: Default 300 seconds, configurable per client
- **Error Handling**: `AuthenticationError` raised on first API call if key invalid
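
The key-precedence rule above can be sketched as follows. This is a minimal illustration, not the library's actual code: the helper name `resolve_api_key` and its `environ` parameter are hypothetical, introduced only to make the rule testable.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the API key the client would use.

    An explicit ``api_key`` wins; otherwise fall back to the
    NUTRIENT_API_KEY environment variable. Returns None when neither is
    set, so that the failure can surface on the first API call instead
    of at construction time.
    """
    env = os.environ if environ is None else environ
    if api_key:
        return api_key
    # Treat an empty variable the same as an unset one.
    return env.get("NUTRIENT_API_KEY") or None
```

Returning `None` rather than raising keeps construction cheap and matches the stated behavior that `AuthenticationError` appears only on the first API call.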

### 4.2. File Handling
**Input Types**: Methods that accept file inputs will support:
- A `str` representing a local file path.
- A raw `bytes` object.
- A file-like object that supports reading in binary mode (an instance of `io.IOBase`).
**Input Types**:
- `str` or `Path` for local file paths
- `bytes` objects
- File-like objects (`io.IOBase`)

**Output Behavior**: Methods that return a file will:
- Return a `bytes` object by default.
- If an `output_path` string argument is provided, the method will save the file directly to that path and return `None` to conserve memory.
**Output Behavior**:
- Returns `bytes` by default
- Saves to `output_path` and returns `None` when path provided
- Large files (>10MB) use streaming to optimize memory usage

### 4.3. Direct API Design
Method names will be snake_case versions of the tool identifiers from the OpenAPI specification. All tool-specific parameters will be keyword-only arguments.
Method names are snake_case versions of operations. Tool-specific parameters are keyword-only arguments.

**Example Usage:**
```python
@@ -124,22 +112,22 @@ pdf_bytes = client.convert_to_pdf(
)

# Step 2: Rotate the newly created PDF from memory
# The 'degrees' parameter is a required, keyword-only argument for this tool.
client.rotate_pages(
input_file=pdf_bytes,
output_path="path/to/rotated_document.pdf", # Save the final result
degrees=90
output_path="path/to/rotated_document.pdf",
degrees=90 # keyword-only argument
)

print("File saved to path/to/rotated_document.pdf")
```

### 4.4. Builder API Design
The Builder API provides a more elegant and efficient solution for multi-step workflows by making a single API call.
Fluent interface for multi-step workflows with single API call:

- `client.build(input_file)`: Starts a new build workflow.
- `.add_step(tool: str, options: dict = None)`: Adds a processing step. `tool` is the string identifier from the API. `options` is a dictionary of parameters for that tool.
- `.execute(output_path: str = None)`: Finalizes the chain, sends the request to the `/build` endpoint, and returns the result.
- `client.build(input_file)`: Starts workflow
- `.add_step(tool, options=None)`: Adds processing step
- `.execute(output_path=None)`: Executes workflow
- `.set_output_options(**options)`: Sets output metadata/optimization

**Example Usage:**
```python
@@ -148,7 +136,6 @@ from nutrient_dws import APIError
# User Story: Convert a DOCX to PDF and rotate it (Builder version)
try:
client.build(input_file="path/to/document.docx") \
.add_step(tool="convert-to-pdf") \
.add_step(tool="rotate-pages", options={"degrees": 90}) \
.execute(output_path="path/to/final_document.pdf")

@@ -159,22 +146,25 @@ except APIError as e:
```

### 4.5. Error Handling
The library will use a specific set of custom exceptions for clear error feedback.
The library provides a comprehensive set of custom exceptions for clear error feedback:

- `NutrientError(Exception)`: The base exception for all library-specific errors.
- `AuthenticationError(NutrientError)`: Raised on 401/403 HTTP errors, indicating an invalid or missing API key.
- `APIError(NutrientError)`: Raised for all other general API errors (e.g., 400, 422, 5xx status codes). It will contain the `status_code` and the raw `response_body` from the API for debugging.
- `FileNotFoundError` (Built-in): This standard Python exception will be allowed to propagate if a string path provided as `input_file` does not exist.

## 5. Packaging and Distribution
- **Structure**: The project will follow the standard `src` layout for Python packages.
- **Configuration**: A `pyproject.toml` file will manage project metadata, build configurations, and dependencies.
- **Dependencies**: `requests` will be the only primary runtime dependency.
- **Versioning**: The project will use semantic versioning (e.g., `1.0.0`).
- **Publication**: The package will be configured for easy building (`python -m build`) and uploading to PyPI using `twine`.

## 6. Documentation
- **Tool**: Sphinx or MkDocs will be used to generate the documentation website.
- **API Reference**: An "API Reference" section will be generated automatically from the Python docstrings using `sphinx.ext.autodoc`. Docstrings will be written in a clear, structured format (e.g., Google Style).
- **Tutorials/Guides**: The documentation will include a "Quickstart" guide, a detailed page explaining the Direct vs. Builder APIs, and code examples for common use cases.
- **Deployment**: Documentation will be automatically built and deployed to GitHub Pages via GitHub Actions on merges to the `main` branch.
- `APIError(NutrientError)`: Raised for general API errors (e.g., 400, 422, 5xx status codes). Contains `status_code`, `response_body`, and optional `request_id` attributes.
- `ValidationError(NutrientError)`: Raised when request validation fails, with optional `errors` dictionary.
- `NutrientTimeoutError(NutrientError)`: Raised when requests timeout.
- `FileProcessingError(NutrientError)`: Raised when file processing operations fail.
- `FileNotFoundError` (Built-in): Standard Python exception for missing file paths.
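
One way the hierarchy listed above could look in code is sketched here. The class names and attributes come from the list; the constructor signatures and the `error_for_status` helper are assumptions made for illustration and may differ from the real package.

```python
from typing import Any, Dict, Optional


class NutrientError(Exception):
    """Base class for all library-specific errors."""


class AuthenticationError(NutrientError):
    """Invalid or missing API key (HTTP 401/403)."""


class APIError(NutrientError):
    """General API failure carrying details for debugging."""

    def __init__(
        self,
        message: str,
        status_code: Optional[int] = None,
        response_body: Optional[str] = None,
        request_id: Optional[str] = None,
    ) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.response_body = response_body
        self.request_id = request_id


class ValidationError(NutrientError):
    """Request validation failed; ``errors`` holds optional details."""

    def __init__(self, message: str, errors: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.errors = errors or {}


class NutrientTimeoutError(NutrientError):
    """A request exceeded the configured timeout."""


class FileProcessingError(NutrientError):
    """A file processing operation failed."""


def error_for_status(status_code: int, body: str) -> Optional[NutrientError]:
    """Hypothetical mapping from an HTTP status to an exception instance."""
    if status_code in (401, 403):
        return AuthenticationError("Invalid or missing API key")
    if status_code >= 400:
        return APIError(
            f"API request failed with status {status_code}",
            status_code=status_code,
            response_body=body,
        )
    return None  # 2xx/3xx: nothing to raise
```

Because every class derives from `NutrientError`, callers can catch the base class to handle any library failure, or a subclass when they need the extra attributes.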

## 5. Implementation Details

### 5.1. Package Structure
- **Layout**: Standard `src` layout with `nutrient_dws` package
- **Configuration**: `pyproject.toml` for project metadata and dependencies
- **Dependencies**: `requests` as sole runtime dependency
- **Versioning**: Semantic versioning starting at `1.0.0`

### 5.2. File Handling Optimizations
- **Large Files**: Files >10MB are streamed rather than loaded into memory
- **Input Types**: Support for `str` paths, `bytes`, `Path` objects, and file-like objects
- **Output**: Returns `bytes` by default, or saves to `output_path` when provided
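
The input handling and the 10MB streaming rule described above might be implemented along these lines. This is a sketch under stated assumptions: `prepare_input` and `STREAM_THRESHOLD` are hypothetical names, and passing file-like objects through unread is one possible reading of the spec, not confirmed behavior.

```python
import io
from pathlib import Path
from typing import BinaryIO, Tuple, Union

# Threshold taken from the spec: files larger than 10MB are streamed.
STREAM_THRESHOLD = 10 * 1024 * 1024

FileInput = Union[str, Path, bytes, BinaryIO]


def prepare_input(file_input: FileInput) -> Tuple[Union[bytes, BinaryIO], bool]:
    """Hypothetical helper: normalize an input and report whether it is streamed.

    Returns ``(payload, streamed)``. When ``streamed`` is True the payload
    is an open binary file object that the caller must close.
    """
    if isinstance(file_input, bytes):
        return file_input, False
    if isinstance(file_input, (str, Path)):
        path = Path(file_input)
        if not path.is_file():
            raise FileNotFoundError(f"File not found: {path}")
        if path.stat().st_size > STREAM_THRESHOLD:
            # Large file: hand back an open handle instead of reading it all.
            return path.open("rb"), True
        return path.read_bytes(), False
    if isinstance(file_input, io.IOBase) or hasattr(file_input, "read"):
        # Assumption: file-like objects are passed through unread.
        return file_input, True
    raise TypeError(f"Unsupported input type: {type(file_input).__name__}")
```

A payload returned with `streamed=True` can be passed directly as the file object in a `requests` multipart upload, which avoids building a second in-memory copy of the source file.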