Claude reads PDF using Visual Mode even if citations are disabled

I want Anthropic to handle PDF parsing for my agent, so I followed the [official docs](https://docs.claude.com/en/docs/build-with-claude/pdf-support#document-processing-modes).

I want to use text extraction only ("Converse Document Chat (Original mode - Text extraction only)"), which according to the docs is `Automatically used when citations are not enabled`, and I do not want visual mode ("Claude PDF Chat (New mode - Full visual understanding)").

So I disabled citations in each document object. This is what my tracing looks like:
```json
{
    "role": "user",
    "content": [
        {
            "type": "document",
            "source": {
                "type": "base64",
                "media_type": "application/pdf",
                "data": "@@@langfuseMedia:type=application/pdf|id=HX8_5o4Ap6brS1uXGy89pK|source=base64_data_uri@@@"
            },
            "title": "root/DE/nature/nature_park/natureparkdata.pdf:1-1",
            "context": "You have access to this file until a system-reminder tells you otherwise.",
            "citations": {
                "enabled": false
            }
        },
        {
            "type": "text",
            "text": "what is this pdf about? use text extraction only. not visual mode."
        }
    ]
}
```
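For reference, here is a minimal Python sketch of how a message like the one above could be assembled. The helper name `build_pdf_message` is hypothetical and not part of any SDK. It only builds the payload dictionary and does not send a request. The `context` field from the trace is left out for brevity.

```python
import base64


def build_pdf_message(pdf_bytes: bytes, title: str, question: str) -> dict:
    """Build a user message with one base64 PDF document block.

    Citations are explicitly disabled, mirroring the trace above.
    """
    return {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    # The API expects the raw PDF bytes as a base64 string.
                    "data": base64.standard_b64encode(pdf_bytes).decode("ascii"),
                },
                "title": title,
                "citations": {"enabled": False},
            },
            {"type": "text", "text": question},
        ],
    }
```

The returned dictionary can be passed as one entry of the `messages` list in a Messages API call.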

However, with a fully scanned PDF (a single image, no OCR text layer), I get this response:

> **The user is asking me to extract text from a PDF file. Looking at the document content provided, I can see the text content that was already extracted from the PDF**

The trace shows that all text was extracted from the image in the PDF:

```json
[
    {
        "id": "toolu_01CW9ShJRYNpPzR7424jPjHv",
        "input": {},
        "name": "think",
        "type": "tool_use",
        "index": 0,
        "partial_json": {
            "thought": "The user is asking me to extract text from a PDF file. Looking at the document content provided, I can see the text content that was already extracted from the PDF. Let me analyze what's in there:\n\nFrom the PDF content:\n- Title: \"Map Preview\"\n- Text: \"No files available\"\n"
        }
    }
]
```

It seems that Anthropic extracted all of this text from the image using visual mode. This matches the token usage, which is far higher than I would expect for text extraction alone. How can I prevent this? Disabling citations does not appear to do the trick.
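One possible workaround, offered as an assumption and not a confirmed fix, is to extract the text on the client side and send it as a plain-text document block in place of the PDF. The sketch below assumes the text has already been extracted by a local tool of your choice. The helper name `build_text_only_message` is hypothetical. Refusing empty input is a design choice of this sketch, not documented API behaviour.

```python
def build_text_only_message(extracted_text: str, title: str, question: str) -> dict:
    """Build a user message that carries pre-extracted text, not the PDF itself.

    No PDF bytes are included in the payload, so the model receives
    no page images.
    """
    if not extracted_text.strip():
        # A scanned PDF without an OCR layer yields no text; fail loudly
        # here so an empty document is never sent by accident.
        raise ValueError("no extractable text; the PDF may be a scanned image")
    return {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": extracted_text,
                },
                "title": title,
                "citations": {"enabled": False},
            },
            {"type": "text", "text": question},
        ],
    }
```

With this approach, a scanned PDF like the one above would raise an error locally instead of being read visually, which makes the cost of image-only files explicit.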
