Open
Labels
bug (Something isn't working)
Issue Description
When attempting to use the file_read tool with PDF files in the Ollama model lab (https://github.com/strands-agents/samples/tree/main/01-tutorials/01-fundamentals/02-model-providers/01-ollama-model), the agent fails to properly read the PDF file.
Steps to Reproduce
- Run the following code in the notebook:
local_agent("Read the file in the path `sample_file/Amazon-com-Inc-2023-Shareholder-Letter.pdf` and summarize it in 5 bullet points.")
Error Output
Tool #1: file_read
[GIN] 2025/08/03 - 16:48:52 | 200 | 9.716882005s | 127.0.0.1 | POST "/api/chat"
The error message indicates that the file is in a format that Python's standard library can't handle, specifically the UTF-8 encoding. This could be due to the file containing non-ASCII characters.
To fix this issue, I would recommend using a different method to read the file, such as:
import PyPDF2
# Open the file in read-binary mode
with open(file_path, 'rb') as f:
    # Create a PDF reader object
    pdf_reader = PyPDF2.PdfFileReader(f)
    # Get the number of pages in the PDF document
    num_pages = pdf_reader.numPages
    # Initialize an empty list to store the text from each page
    text = []
Expected Behavior
The agent should be able to read PDF files and extract their content properly using the file_read tool.
Actual Behavior
The agent fails to read the PDF file. Instead of returning the file's content, it suggests using the PyPDF2 library but never actually applies that approach.
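The model's reply above points at a UTF-8 decoding failure, which is what you would expect if a PDF (a binary format) is opened as plain text. The following is a minimal sketch of that failure mode and one possible guard, using only the Python standard library. It is not the actual `file_read` implementation, which this report does not show; `is_pdf` and `read_text_or_flag_binary` are hypothetical helper names introduced here for illustration.

```python
from pathlib import Path

# PDF files begin with this signature, followed by a version such as "1.7".
PDF_MAGIC = b"%PDF-"


def is_pdf(path):
    """Return True if the file starts with the PDF signature bytes."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


def read_text_or_flag_binary(path):
    """Hypothetical helper: read a file as UTF-8 text, or report why it cannot be.

    Returns a dict with a "kind" of "pdf", "text", or "binary". Text content
    is only returned for files that decode cleanly as UTF-8.
    """
    if is_pdf(path):
        # A PDF needs a dedicated parser; decoding it as text would fail
        # or produce garbage.
        return {"kind": "pdf", "text": None}
    try:
        return {"kind": "text", "text": Path(path).read_text(encoding="utf-8")}
    except UnicodeDecodeError:
        # Some other binary format that is not valid UTF-8.
        return {"kind": "binary", "text": None}
```

A tool built along these lines could hand files flagged as "pdf" to a PDF parsing library instead of surfacing a decoding error to the model. Which library to use is left open here.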
Environment Details
- Date tested: 2025/08/03
- Lab notebook: 01-ollama-model