Description
I wanted Anthropic to handle PDF parsing for my agent, and I followed the official docs.
I wanted to use text extraction only ("Converse Document Chat (Original mode - Text extraction only)"), which is used automatically when citations are not enabled, rather than visual mode ("Claude PDF Chat (New mode - Full visual understanding)").
So I disabled citations in each document object. This is what my tracing looks like:
{
  "role": "user",
  "content": [
    {
      "type": "document",
      "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "data": "@@@langfuseMedia:type=application/pdf|id=HX8_5o4Ap6brS1uXGy89pK|source=base64_data_uri@@@"
      },
      "title": "root/DE/nature/nature_park/natureparkdata.pdf:1-1",
      "context": "You have access to this file until a system-reminder tells you otherwise.",
      "citations": {
        "enabled": false
      }
    },
    {
      "type": "text",
      "text": "what is this pdf about? use text extraction only. not visual mode."
    }
  ]
}
However, with a fully scanned PDF (a single image, with no OCR text layer), I get this response:
The user is asking me to extract text from a PDF file. Looking at the document content provided, I can see the text content that was already extracted from the PDF
The tracing shows that all text was extracted from the image in the PDF:
[
  {
    "id": "toolu_01CW9ShJRYNpPzR7424jPjHv",
    "input": {},
    "name": "think",
    "type": "tool_use",
    "index": 0,
    "partial_json": {
      "thought": "The user is asking me to extract text from a PDF file. Looking at the document content provided, I can see the text content that was already extracted from the PDF. Let me analyze what's in there:\n\nFrom the PDF content:\n- Title: \"Map Preview\"\n- Text: \"No files available\"\n"
    }
  }
]
It seems like Anthropic extracted all data from the image using visual mode. This is also consistent with the token usage, which is quite high (too high for just reading text). How can I prevent this from happening? Disabling citations doesn't seem to do the trick.
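
For anyone trying to reproduce this, here is a minimal sketch of how a message shaped like the one in the trace could be assembled. The helper name `build_pdf_message` and the placeholder bytes are hypothetical and are not taken from my agent code; the sketch only builds the payload and does not call the API.

```python
import base64
import json


def build_pdf_message(pdf_bytes: bytes, title: str, question: str) -> dict:
    """Build a user message with one base64 PDF document block and citations disabled.

    Hypothetical helper: it mirrors the structure shown in the trace above.
    """
    encoded = base64.standard_b64encode(pdf_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": encoded,
                },
                "title": title,
                "context": "You have access to this file until a system-reminder tells you otherwise.",
                # Citations are explicitly disabled, as in the trace.
                "citations": {"enabled": False},
            },
            {"type": "text", "text": question},
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real scanned PDF (assumption for illustration).
    message = build_pdf_message(
        b"%PDF-1.4 placeholder",
        "example.pdf",
        "what is this pdf about? use text extraction only. not visual mode.",
    )
    # Print only the keys of the document block to keep the output short.
    print(json.dumps(sorted(message["content"][0].keys())))
```

Sending this payload with a scanned, image-only PDF should be enough to check whether the response still contains text that could only have come from the image.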