PDF processing reports success but returns empty content

# Bug Report: PDF Processing Returns Empty Content

## Summary
DocStrange reports that a PDF was processed successfully ("1 successful") but returns empty content in every output format. The PDF itself is valid and readable by other tools.

## Environment
- **docstrange version**: 1.1.5
- **OS**: macOS (Darwin)
- **Python**: (from mise/pipx installation)
- **Authentication**: Authenticated cloud mode (10k/month free calls)
- **PDF details**:
  - File: `2512.14012.pdf` (likely arXiv paper)
  - Size: 1.4 MB
  - Pages: 11 (confirmed with `file` command)
  - Format: PDF 1.7

## Steps to Reproduce

1. **Basic markdown conversion (fails)**:
```bash
docstrange ~/Downloads/2512.14012.pdf --output markdown --verbose
```

**Output**:
```
Processing: /Users/ramarivera/Downloads/2512.14012.pdf

Summary: 1 successful, 0 failed
Initialized extractor in cloud mode:
  - Output format: markdown
  - Auth: authenticated (10k/month) free calls

[empty output]
```

2. **JSON with field extraction (fails)**:
```bash
docstrange ~/Downloads/2512.14012.pdf --output json --extract-fields title abstract authors
```

**Output**:
```json
{
  "document": {
    "raw_content": ""
  },
  "format": "json_parse_error",
  "error": "Expecting value: line 1 column 1 (char 0)"
}
```
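This error string is the one Python's standard `json` module raises when it is asked to parse an empty string. That suggests the `json_parse_error` is a downstream symptom of the same empty extraction rather than a separate bug, though this is an inference and has not been checked against the DocStrange source. The message can be reproduced with any Python 3 interpreter:

```shell
# Parse an empty string with Python's standard json module and keep only
# the last line of the traceback, which holds the exception message.
# It ends with: Expecting value: line 1 column 1 (char 0)
python3 -c 'import json; json.loads("")' 2>&1 | tail -n 1
```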

3. **With OCR enabled (fails)**:
```bash
docstrange ~/Downloads/2512.14012.pdf --output markdown --ocr-enabled --verbose
```
Still returns empty content.

4. **With Gemini model (fails)**:
```bash
docstrange ~/Downloads/2512.14012.pdf --model gemini --output markdown --verbose
```
Still returns empty content.

5. **Saving to file (fails)**:
```bash
docstrange ~/Downloads/2512.14012.pdf --output markdown --output-file output.md
```
Creates `output.md` with 0 bytes.

## Expected Behavior
- Should extract text content from the PDF
- Should return markdown/JSON with the document content
- Should not report "successful" if content extraction failed

## Actual Behavior
- Reports "1 successful" in summary
- Returns completely empty content (`raw_content: ""`)
- No error messages or warnings about why extraction failed
- Output is empty in every form: `--output-file` writes a 0-byte markdown file, and JSON output contains only a parse-error object
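Until the CLI reports this case itself, a caller can guard against the silent failure by checking the size of the output file. This is a minimal sketch, assuming bash. `convert_or_fail` is a hypothetical helper name, not part of DocStrange, and the flags are the ones already used in step 5:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of DocStrange): convert a file to markdown
# and treat a missing or zero-byte output file as a failure, even when the
# CLI summary reports "1 successful".
convert_or_fail() {
  local input="$1" out="$2"
  # Same flags as step 5; a non-zero exit from the CLI makes this return 1.
  docstrange "$input" --output markdown --output-file "$out" || return 1
  # -s is true only if the file exists and is larger than zero bytes.
  if [ ! -s "$out" ]; then
    echo "error: no content extracted from $input ($out is empty)" >&2
    return 2
  fi
}
```

Usage: `convert_or_fail ~/Downloads/2512.14012.pdf output.md || echo "extraction failed"`. Checking the file rather than the summary line avoids depending on the exact wording of DocStrange's console output.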

## Additional Testing

### ✅ DocStrange works with simple text files:
```bash
echo "Test document" > test.txt
docstrange test.txt --output markdown
```
Returns:
```markdown
# Text Document

Test document
```

### ❌ This specific PDF fails consistently
Tried all combinations of:
- Output formats: `markdown`, `json`, `text`, `html`
- Models: default (nanonets), `gemini`
- Flags: `--ocr-enabled`, `--include-images`, `--preserve-layout`

All combinations produce empty content.
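The format and model combinations can be rerun in one pass and summarized by output size, which gives a compact result table to attach to this issue. This is a minimal sketch in bash. `run_matrix` and its output directory layout are hypothetical, and it covers only the format and model axes; the extra flags could be added as a third loop:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DocStrange): run each output format with
# the default model and with --model gemini, writing each result to its own
# file, then print "format<TAB>model<TAB>bytes" so empty results stand out.
run_matrix() {
  local pdf="$1" outdir="$2" fmt model out bytes
  mkdir -p "$outdir"
  for fmt in markdown json text html; do
    for model in default gemini; do
      out="$outdir/$fmt-$model.out"
      # Console output of each run is kept in a .log next to the result file.
      if [ "$model" = default ]; then
        docstrange "$pdf" --output "$fmt" --output-file "$out" >"$out.log" 2>&1
      else
        docstrange "$pdf" --model "$model" --output "$fmt" --output-file "$out" >"$out.log" 2>&1
      fi
      # A missing output file is reported as 0 bytes.
      bytes=0
      if [ -f "$out" ]; then
        bytes=$(wc -c < "$out" | tr -d ' ')
      fi
      printf '%s\t%s\t%s\n' "$fmt" "$model" "$bytes"
    done
  done
}
```

Usage: `run_matrix ~/Downloads/2512.14012.pdf ./docstrange-matrix`. Eight lines of output are expected, one per combination.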

## Possible Causes
1. Silent failure in PDF text extraction (no error logged)
2. Cloud API returning empty response without error
3. PDF might have embedded text that's not being detected
4. PDF might need OCR but OCR isn't triggering properly

## Related Issues
- Issue #48 mentions "No content available" for JSON extraction
- Issue #35 mentions accuracy differences between hosted vs local

## Diagnostic Commands
```bash
# Verify PDF is valid
file 2512.14012.pdf
# Output: PDF document, version 1.7, 11 pages

# Check credentials
ls -la ~/.docstrange/credentials.json
# Exists and was created during authentication

# Test with verbose mode
docstrange 2512.14012.pdf --output json --verbose
# Shows successful authentication but empty content
```

## Request
- Could you investigate why PDFs report "successful" but return empty content?
- Should there be more detailed error logging when extraction silently fails?
- Is there a way to get detailed debug logs to see what's happening during processing?

## Sample File
The PDF that fails: `2512.14012.pdf` (arXiv paper, 1.4MB, 11 pages)
I can provide the file if needed for debugging.

---

*Note: Similar issues were searched before filing. This report was created with AI assistance.*

PDF processing reports success but returns empty content #51