Closed
20 commits
fb96026
feat: enhance integration test workflow
jdrhyne Jun 20, 2025
7e6e9d1
Add basic smoke test for integration setup
jdrhyne Jun 20, 2025
cd9ba08
feat: enhance integration test workflow
jdrhyne Jun 20, 2025
82b4db3
Merge pull request #5 from PSPDFKit/add-integration-tests-ci
jdrhyne Jun 20, 2025
6c09942
release: bump version to 1.0.1
jdrhyne Jun 20, 2025
c29a248
fix: update license format to modern SPDX expression
jdrhyne Jun 20, 2025
670a273
fix: comprehensive test coverage and CI pipeline fixes
jdrhyne Jun 20, 2025
6571053
fix: use compatible license format for Python 3.8
jdrhyne Jun 20, 2025
2370092
fix: resolve mypy type checking errors for Python 3.11
jdrhyne Jun 20, 2025
bd3a791
docs: add PyPI badges and changelog
jdrhyne Jun 20, 2025
eca9aa8
fix: update GitHub repository URLs from jdrhyne to PSPDFKit
jdrhyne Jun 20, 2025
77456bd
fix: resolve linting errors in test files
jdrhyne Jun 20, 2025
571f638
Merge pull request #6 from PSPDFKit/update-readme-badges
jdrhyne Jun 20, 2025
1118243
build(deps): bump codecov/codecov-action from 4 to 5 (#2)
dependabot[bot] Jun 21, 2025
ab8e3e3
Review openapi compliance (#23)
jdrhyne Jun 21, 2025
882cf6b
feat: integrate fork features with comprehensive Direct API methods a…
jdrhyne Jun 22, 2025
0e58954
Update CLAUDE.md
msch-nutrient Jun 23, 2025
63fea08
fix: update Codecov action configuration
jdrhyne Jun 23, 2025
cf534d6
feat: improve CI integration test strategy and fix Codecov configuration
jdrhyne Jun 23, 2025
0804070
test: add missing test data and update integration tests for multi-pa…
jdrhyne Jun 23, 2025
101 changes: 95 additions & 6 deletions .github/workflows/ci.yml
@@ -1,10 +1,20 @@
 name: CI

+# Integration Test Strategy:
+# - Fork PRs: Cannot access secrets, so integration tests are skipped with informative feedback
+# - Same-repo PRs: Have access to secrets, integration tests run normally
+# - Push to main/develop: Integration tests always run to catch any issues after merge
+# - Manual trigger: Allows maintainers to run integration tests on demand
+#
+# This ensures security while still validating integration tests before release
+
 on:
   push:
     branches: [ main, develop ]
   pull_request:
     branches: [ main, develop ]
+  # Run integration tests after PR is merged
+  workflow_dispatch: # Allow manual trigger for integration tests

 jobs:
   test:
@@ -47,25 +57,32 @@ jobs:
         run: python -m pytest tests/unit/ -v --cov=nutrient_dws --cov-report=xml --cov-report=term

       - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v5
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
-          file: ./coverage.xml
+          files: ./coverage.xml
           flags: unittests
           name: codecov-umbrella
           fail_ci_if_error: false

   integration-test:
     runs-on: ubuntu-latest
-    if: github.event_name == 'pull_request'
+    # Run on: pushes to main/develop, PRs from same repo, and manual triggers
+    if: |
+      github.event_name == 'push' ||
+      github.event_name == 'workflow_dispatch' ||
+      (github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository)
+    strategy:
+      matrix:
+        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']

     steps:
       - uses: actions/checkout@v4

-      - name: Set up Python 3.12
+      - name: Set up Python ${{ matrix.python-version }}
         uses: actions/setup-python@v5
         with:
-          python-version: '3.12'
+          python-version: ${{ matrix.python-version }}

       - name: Cache pip dependencies
         uses: actions/cache@v4
@@ -80,7 +97,29 @@ jobs:
           python -m pip install --upgrade pip
           pip install -e ".[dev]"

+      - name: Check for API key availability
+        run: |
+          if [ -z "${{ secrets.NUTRIENT_DWS_API_KEY }}" ]; then
+            echo "::warning::NUTRIENT_DWS_API_KEY secret not found, skipping integration tests"
+            echo "skip_tests=true" >> $GITHUB_ENV
+
+            # Provide context about why this might be happening
+            if [ "${{ github.event_name }}" == "pull_request" ]; then
+              if [ "${{ github.event.pull_request.head.repo.full_name }}" != "${{ github.repository }}" ]; then
+                echo "::notice::This appears to be a PR from a fork. Secrets are not available for security reasons."
+              else
+                echo "::error::This is a PR from the same repository but the API key is missing. Please check repository secrets configuration."
+              fi
+            else
+              echo "::error::Running on ${{ github.event_name }} event but API key is missing. Please configure NUTRIENT_DWS_API_KEY secret."
+            fi
+          else
+            echo "::notice::API key found, integration tests will run"
+            echo "skip_tests=false" >> $GITHUB_ENV
+          fi
+
       - name: Create integration config with API key
+        if: env.skip_tests != 'true'
         run: |
           python -c "
           import os
@@ -91,8 +130,58 @@ jobs:
           NUTRIENT_DWS_API_KEY: ${{ secrets.NUTRIENT_DWS_API_KEY }}

       - name: Run integration tests
+        if: env.skip_tests != 'true'
         run: python -m pytest tests/integration/ -v

+      - name: Cleanup integration config
+        if: always()
+        run: rm -f tests/integration/integration_config.py
+
+  # Provide feedback for fork PRs where integration tests can't run
+  integration-test-fork-feedback:
+    runs-on: ubuntu-latest
+    if: |
+      github.event_name == 'pull_request' &&
+      github.event.pull_request.head.repo.full_name != github.repository
+    steps:
+      - name: Comment on PR about integration tests
+        uses: actions/github-script@v7
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            const issue_number = context.issue.number;
+            const owner = context.repo.owner;
+            const repo = context.repo.repo;
+
+            // Check if we've already commented
+            const comments = await github.rest.issues.listComments({
+              owner,
+              repo,
+              issue_number,
+            });
+
+            const botComment = comments.data.find(comment =>
+              comment.user.type === 'Bot' &&
+              comment.body.includes('Integration tests are skipped for pull requests from forks')
+            );
+
+            if (!botComment) {
+              await github.rest.issues.createComment({
+                owner,
+                repo,
+                issue_number,
+                body: `## Integration Tests Status\n\n` +
+                      `Integration tests are skipped for pull requests from forks due to security restrictions. ` +
+                      `These tests will run automatically after the PR is merged.\n\n` +
+                      `**What this means:**\n` +
+                      `- Unit tests, linting, and type checking have passed ✅\n` +
+                      `- Integration tests require API credentials that aren't available to fork PRs\n` +
+                      `- A maintainer will review your changes and merge if appropriate\n` +
+                      `- Integration tests will run on the main branch after merge\n\n` +
+                      `Thank you for your contribution! 🙏`
+              });
+            }
+
   build:
     runs-on: ubuntu-latest
     needs: test
@@ -120,4 +209,4 @@ jobs:
         uses: actions/upload-artifact@v4
         with:
           name: dist
-          path: dist/
+          path: dist/
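
As a reading aid for the workflow change above, the two job-level `if:` conditions can be restated as plain predicates. This is a minimal Python sketch, not code from the PR: the function names and the example repository names are hypothetical, and the logic only mirrors the expressions shown in the diff (run on `push`, on `workflow_dispatch`, or on a `pull_request` whose head repository equals the base repository).

```python
def should_run_integration_tests(event_name: str, head_repo: str, base_repo: str) -> bool:
    """Mirror of the `integration-test` job condition from the diff (hypothetical helper)."""
    # Pushes and manual triggers always run the integration job.
    if event_name in ("push", "workflow_dispatch"):
        return True
    # Pull requests run it only when the head branch lives in the same repository.
    return event_name == "pull_request" and head_repo == base_repo


def needs_fork_feedback(event_name: str, head_repo: str, base_repo: str) -> bool:
    """Mirror of the `integration-test-fork-feedback` job condition (hypothetical helper)."""
    return event_name == "pull_request" and head_repo != base_repo


if __name__ == "__main__":
    base = "example-org/example-repo"    # placeholder repository names
    fork = "contributor/example-repo"
    for event, head in [("push", base), ("pull_request", base), ("pull_request", fork)]:
        print(event, head,
              should_run_integration_tests(event, head, base),
              needs_fork_feedback(event, head, base))
```

Under these conditions the two jobs are mutually exclusive for pull requests: a given PR triggers either the integration job or the feedback comment, never both.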
107 changes: 41 additions & 66 deletions CHANGELOG.md
@@ -1,79 +1,54 @@
# Changelog

All notable changes to the nutrient-dws Python client library will be documented in this file.
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2024-06-17
## [1.0.1] - 2024-06-20

### Added
- 🎉 First stable release on PyPI
- Comprehensive test suite with 94% coverage (154 tests)
- Full support for Python 3.8 through 3.12
- Type hints for all public APIs
- PyPI package publication

#### Core Features
- **NutrientClient**: Main client class with support for both Direct API and Builder API patterns
- **Direct API Methods**: Convenient methods for single operations:
- `convert_to_pdf()` - Convert Office documents to PDF (uses implicit conversion)
- `flatten_annotations()` - Flatten PDF annotations and form fields
- `rotate_pages()` - Rotate specific or all pages
- `ocr_pdf()` - Apply OCR to make PDFs searchable
- `watermark_pdf()` - Add text or image watermarks
- `apply_redactions()` - Apply existing redaction annotations
- `merge_pdfs()` - Merge multiple PDFs and Office documents

- **Builder API**: Fluent interface for chaining multiple operations:
```python
client.build(input_file="document.docx") \
.add_step("rotate-pages", {"degrees": 90}) \
.add_step("ocr-pdf", {"language": "english"}) \
.execute(output_path="processed.pdf")
```

#### Infrastructure
- **HTTP Client**:
- Connection pooling for performance
- Automatic retry logic with exponential backoff
- Bearer token authentication
- Comprehensive error handling

- **File Handling**:
- Support for multiple input types (paths, Path objects, bytes, file-like objects)
- Automatic streaming for large files (>10MB)
- Memory-efficient processing

- **Exception Hierarchy**:
- `NutrientError` - Base exception
- `AuthenticationError` - API key issues
- `APIError` - General API errors with status codes
- `ValidationError` - Request validation failures
- `TimeoutError` - Request timeouts
- `FileProcessingError` - File operation failures

#### Development Tools
- **Testing**: 82 unit tests with 92.46% code coverage
- **Type Safety**: Full mypy type checking support
- **Linting**: Configured with ruff
- **Pre-commit Hooks**: Automated code quality checks
- **CI/CD**: GitHub Actions for testing, linting, and releases
- **Documentation**: Comprehensive README with examples
### Fixed
- CI pipeline compatibility for all Python versions
- Package metadata format for older setuptools versions
- Type checking errors with mypy strict mode
- File handler edge cases with BytesIO objects

### Changed
- Package name updated from `nutrient` to `nutrient-dws` for PyPI
- Source directory renamed from `src/nutrient` to `src/nutrient_dws`
- API endpoint updated to https://api.pspdfkit.com
- Authentication changed from X-Api-Key header to Bearer token

### Discovered
- **Implicit Document Conversion**: The API automatically converts Office documents (DOCX, XLSX, PPTX) to PDF when processing, eliminating the need for explicit conversion steps
- Improved error messages for better debugging
- Enhanced file handling with proper position restoration
- Updated coverage from 92% to 94%

### Fixed
- Watermark operation now correctly requires width/height parameters
- OCR language codes properly mapped (e.g., "en" → "english")
- All API operations updated to use the Build API endpoint
- Type annotations corrected throughout the codebase
## [1.0.0] - 2024-06-19

### Security
- API keys are never logged or exposed
- Support for environment variable configuration
- Secure handling of authentication tokens

[1.0.0]: https://github.com/jdrhyne/nutrient-dws-client-python/releases/tag/v1.0.0
### Added
- Initial implementation of Direct API with 7 methods:
- `convert_to_pdf` - Convert documents to PDF
- `convert_from_pdf` - Convert PDFs to other formats
- `ocr_pdf` - Perform OCR on PDFs
- `watermark_pdf` - Add watermarks to PDFs
- `flatten_annotations` - Flatten PDF annotations
- `rotate_pages` - Rotate PDF pages
- `merge_pdfs` - Merge multiple PDFs
- Builder API for complex document workflows
- Comprehensive error handling with custom exceptions
- Automatic retry logic with exponential backoff
- File streaming support for large documents
- Full type hints and py.typed marker
- Extensive documentation and examples
- MIT License

### Technical Details
- Built on `requests` library (only dependency)
- Supports file inputs as paths, bytes, or file-like objects
- Memory-efficient processing with streaming
- Connection pooling for better performance

[1.0.1]: https://github.com/PSPDFKit/nutrient-dws-client-python/compare/v1.0.0...v1.0.1
[1.0.0]: https://github.com/PSPDFKit/nutrient-dws-client-python/releases/tag/v1.0.0
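
The 1.0.1 entries mention "proper position restoration" and BytesIO edge cases without showing what that involves. The following is a minimal sketch of the general technique, assuming a seekable binary stream; `read_preserving_position` is a hypothetical helper written for illustration and is not taken from the library's file handler.

```python
import io
from typing import BinaryIO


def read_preserving_position(stream: BinaryIO) -> bytes:
    """Read the full contents of a seekable stream, then restore its cursor.

    Hypothetical helper: illustrates the idea only, not the library's code.
    """
    start = stream.tell()      # remember where the caller left the cursor
    try:
        stream.seek(0)         # read from the beginning regardless of cursor
        return stream.read()
    finally:
        stream.seek(start)     # put the cursor back even if read() raises


if __name__ == "__main__":
    buf = io.BytesIO(b"%PDF-1.7 demo")
    buf.seek(5)
    data = read_preserving_position(buf)
    print(len(data), buf.tell())
```

Restoring the cursor in a `finally` block means a caller who passes the same stream to two operations sees the same bytes both times.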
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -1,5 +1,6 @@
# Claude Development Guide for Nutrient DWS Python Client


## Critical Reference
**ALWAYS** refer to `SPECIFICATION.md` before implementing any features. This document contains the complete design specification for the Nutrient DWS Python Client library.

83 changes: 83 additions & 0 deletions CREATE_GITHUB_ISSUES_MANUALLY.md
@@ -0,0 +1,83 @@
# Manual GitHub Issue Creation Guide

Since automatic issue creation requires PSPDFKit organization permissions, please follow these steps to manually create the issues:

## Prerequisites
1. Ensure you have write access to the PSPDFKit/nutrient-dws-client-python repository
2. Or request someone with appropriate permissions to create these issues

## Issue Templates Location
All issue templates are in the `github_issues/` directory with the following structure:
- `00_roadmap.md` - Overall enhancement roadmap (create this first)
- `01_multi_language_ocr.md` - Multi-language OCR support
- `02_image_watermark.md` - Image watermark support
- `03_selective_flattening.md` - Selective annotation flattening
- `04_create_redactions.md` - Create redactions method
- `05_import_annotations.md` - Import annotations feature
- `06_extract_pages.md` - Extract page range method
- `07_convert_to_pdfa.md` - PDF/A conversion
- `08_convert_to_images.md` - Image extraction
- `09_extract_content_json.md` - JSON content extraction
- `10_convert_to_office.md` - Office format conversion
- `11_ai_redaction.md` - AI-powered redaction
- `12_digital_signature.md` - Digital signature support
- `13_batch_processing.md` - Batch processing method

## Steps to Create Issues

### Option 1: Using GitHub Web Interface
1. Go to https://github.com/PSPDFKit/nutrient-dws-client-python/issues
2. Click "New issue"
3. For each template file:
- Copy the title from the first line (after the #)
- Copy the entire content into the issue body
- Add the labels listed at the bottom of each template
- Click "Submit new issue"

### Option 2: Using GitHub CLI (if you have permissions)
If you get appropriate permissions, you can run:

```bash
cd /Users/admin/Projects/nutrient-dws-client-python

# Create the roadmap issue first
gh issue create \
--title "Enhancement Roadmap: Comprehensive Feature Plan" \
--body-file github_issues/00_roadmap.md \
--label "roadmap,enhancement,documentation"

# Then create individual feature issues
for i in {01..13}; do
title=$(head -n 1 github_issues/${i}_*.md | sed 's/# //')
labels=$(tail -n 1 github_issues/${i}_*.md | sed 's/- //')
gh issue create \
--title "$title" \
--body-file github_issues/${i}_*.md \
--label "$labels"
done
```

### Option 3: Request Organization Access
1. Contact the PSPDFKit organization administrators
2. Request contributor access to the nutrient-dws-client-python repository
3. Once granted, use the GitHub CLI commands above

## Issue Organization

### Priority Labels
- 🔵 `priority-1`: Enhanced existing methods
- 🟢 `priority-2`: Core missing methods
- 🟡 `priority-3`: Format conversion methods
- 🟠 `priority-4`: Advanced features

### Implementation Phases
- **Phase 1** (1-2 months): Issues 01, 02, 04
- **Phase 2** (2-3 months): Issues 07, 08, 05
- **Phase 3** (3-4 months): Issues 09, 10, 11
- **Phase 4** (4-6 months): Issues 12, 13

## Notes
- Create the roadmap issue (00) first as it provides context for all others
- Each issue is self-contained with implementation details, testing requirements, and examples
- Issues are numbered in suggested implementation order within their priority groups
- All issues follow the same format for consistency
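
The title and label extraction that the GitHub CLI option performs with `head`, `tail`, and `sed` can also be sketched in Python, which makes the assumed template layout explicit. This is an illustrative sketch only: `parse_issue_template` is a hypothetical helper, and it assumes each template starts with a `# Title` line and ends with a line of comma-separated labels, optionally prefixed with `- `. The sample template text is invented for the example.

```python
from typing import List, Tuple


def parse_issue_template(text: str) -> Tuple[str, List[str]]:
    """Return (title, labels) from an issue template (hypothetical helper).

    Assumes the first non-blank line is a Markdown heading and the last
    non-blank line lists the labels, separated by commas.
    """
    # Unlike `head -n 1` / `tail -n 1`, blank lines at either end are skipped.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("template is empty")
    title = lines[0].lstrip("#").strip()
    raw_labels = lines[-1].strip()
    if raw_labels.startswith("- "):
        raw_labels = raw_labels[2:]   # same effect as sed 's/- //' on a leading marker
    labels = [label.strip() for label in raw_labels.split(",") if label.strip()]
    return title, labels


if __name__ == "__main__":
    sample = "# Feature: Example Title\n\nBody text.\n\n- enhancement, priority-1\n"
    print(parse_issue_template(sample))
```

Parsing each file individually also avoids relying on a shell glob such as `${i}_*.md` matching exactly one file.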