
Commit 3d5fbbe (1 parent: d790b51)

changes after the review: organize OCR files, change to gpt-5.1, update README, refactor sync/async parts

16 files changed: +610 −399 lines

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -43,3 +43,5 @@ build/
 # Custom
 *.db
 .ruff_cache
+ocr_parsing/files/results
+ocr_parsing/files/temp_files

README.md

Lines changed: 45 additions & 15 deletions
@@ -19,7 +19,20 @@ uv sync
 echo "OPENAI_API_KEY=your-key-here" > .env
 ```
 
-**Note**: Most examples use OpenAI's GPT-4o. Ensure your API key has appropriate permissions and sufficient quota.
+**Note**: Most examples use OpenAI's GPT-5.1. Ensure your API key has appropriate permissions and sufficient quota.
+
+## Learning Path
+
+**Recommended order for learning PydanticAI**:
+
+1. **[Direct Model Requests](direct_model_request/)** - Understand basic LLM API calls
+2. **[Temperature](temperature/)** - Understand model parameters
+3. **[Reasoning Effort](reasoning_effort/)** - See how reasoning effort can change the model's output
+4. **[Basic Sentiment](basic_sentiment/)** - Learn structured outputs with Pydantic
+5. **[Dynamic Classification](dynamic_classification/)** - Runtime schema generation
+6. **[Bielik](bielik_example/)** - Local models and tools
+7. **[History Processor](history_processor/)** - Multi-turn conversations
+8. **[OCR Parsing](ocr_parsing_demo/)** - Complex real-world document processing
 
 ## Examples Overview
 
@@ -194,6 +207,36 @@ Most examples use PydanticAI's `Agent` class, which wraps an LLM with:
 - Output type schemas for structured responses
 - Async/await support for concurrent requests
 
+### Tools
+
+Since these are examples, most of them are fairly basic. However, it is easy to add a tool to a given agent. Let's look at the **[OCR Parsing](ocr_parsing/)** code.
+
+Currently the agent does all the work itself - it classifies the document, runs the OCR, and parses the output in the same way for every document. But what if we'd like different behavior based on the document type?
+
+```python
+from pydantic_ai import Agent, RunContext
+
+# MyDeps is the dependencies dataclass defined elsewhere in the example
+from my_schemas import OCRInvoiceOutput, ReportOcrOutput
+
+# The Agent acts as a router, deciding which tool to call
+# based on the document's visual or textual cues.
+agent = Agent(
+    'openai:gpt-5.1',
+    system_prompt="Analyze the document and use the appropriate tool for parsing.",
+)
+
+@agent.tool
+async def parse_invoice(ctx: RunContext[MyDeps], data: bytes) -> OCRInvoiceOutput:
+    """Use this tool when the document is identified as an Invoice."""
+    # Your specialized OCR & validation logic here
+    return await ctx.deps.ocr_service.process(data, schema=OCRInvoiceOutput)
+
+@agent.tool
+async def parse_report(ctx: RunContext[MyDeps], data: bytes) -> ReportOcrOutput:
+    """Use this tool when the document is a multi-page Annual Report."""
+    # Custom logic for complex reports
+    return await ctx.deps.ocr_service.process(data, schema=ReportOcrOutput)
+```
+
 ### Structured Outputs
 
 Examples show how to enforce type safety using Pydantic `BaseModel`:
@@ -265,7 +308,6 @@ Bielik example shows alternative to cloud APIs:
 │   ├── 1_basic_ocr_demo.py
 │   ├── 2_ocr_with_structured_output.py
 │   ├── 3_ocr_validation.py
-│   ├── shared_fns.py
 │   ├── README.md
 │   ├── files/
 │   │   ├── samples/   # Sample PDF documents
@@ -275,18 +317,6 @@ Bielik example shows alternative to cloud APIs:
 └── README.md
 ```
 
-## Learning Path
-
-**Recommended order for learning PydanticAI**:
-
-1. **[Direct Model Requests](direct_model_request/)** - Understand basic LLM API calls
-2. **[Basic Sentiment](basic_sentiment/)** - Learn structured outputs with Pydantic
-3. **[Temperature](temperature/)** - Understand model parameters
-4. **[Dynamic Classification](dynamic_classification/)** - Runtime schema generation
-5. **[Bielik](bielik_example/)** - Local models and tools
-6. **[History Processor](history_processor/)** - Multi-turn conversations
-7. **[OCR Parsing](ocr_parsing_demo/)** - Complex real-world document processing
-
 ## Common Issues & Troubleshooting
 
 ### API Key Issues
@@ -309,7 +339,7 @@ Bielik example shows alternative to cloud APIs:
 
 - **poppler not found**: Install via your package manager (brew/apt/choco)
 - **PDF conversion fails**: Ensure PDF is valid and readable
-- **Rate limiting**: Reduce semaphore value in `shared_fns.py`
+- **Rate limiting**: Reduce semaphore value in `ocr_parsing/shared_fns.py`
 
 See individual example READMEs for specific setup requirements.

history_processor/1_basic_history_handling.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 def main() -> None:
     """Run basic history inspection example."""
     # Create a basic agent
-    agent = Agent(model="openai:gpt-4o", system_prompt="Be a helpful assistant")
+    agent = Agent(model="openai:gpt-5.1", system_prompt="Be a helpful assistant")
 
     # Run a single inference
     prompt = "Tell me a funny joke. Respond in plain text."

history_processor/2_continuous_history.py

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@
 def main() -> None:
     """Run multi-turn conversation example."""
     # Create agent
-    agent = Agent(model="openai:gpt-4o", system_prompt="Be a helpful assistant")
+    agent = Agent(model="openai:gpt-5.1", system_prompt="Be a helpful assistant")
 
     # First turn: Agent generates a joke
     prompt_1 = "Provide a really, really funny joke. Respond in plain text."

history_processor/3_history_usage.py

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@
 def main() -> None:
     """Run multi-turn conversation with persistence example."""
     # Create agent
-    agent = Agent(model="openai:gpt-4o", system_prompt="Be a helpful assistant")
+    agent = Agent(model="openai:gpt-5.1", system_prompt="Be a helpful assistant")
 
     # Turn 1: Get initial motto
     log.info("\n=== Turn 1 ===")

history_processor/4_history_filtering.py

Lines changed: 2 additions & 2 deletions
@@ -59,13 +59,13 @@ def main() -> None:
 
     # Example 1: Summarize only user messages
     log.info("\n=== Filtering: User Messages Only ===")
-    agent_user = Agent("openai:gpt-4o", history_processors=[user_message_filter])
+    agent_user = Agent("openai:gpt-5.1", history_processors=[user_message_filter])
     result_1 = agent_user.run_sync("Please summarize the whole chat history until now.", message_history=history)
     log.info(f"Summary (user messages only):\n{result_1.output}")
 
     # Example 2: Attempt to filter only model messages (will fail)
     log.info("\n=== Filtering: Model Messages Only ===")
-    agent_model = Agent("openai:gpt-4o", history_processors=[model_message_filter])
+    agent_model = Agent("openai:gpt-5.1", history_processors=[model_message_filter])
     try:
         result_2 = agent_model.run_sync("Please summarize the whole chat history until now.", message_history=history)
         log.info(f"Summary (model messages only):\n{result_2.output}")
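The `user_message_filter` passed as a history processor above can be sketched in isolation. This is a hedged approximation: plain dicts stand in for PydanticAI's `ModelMessage` objects, which is what the real processor receives.

```python
# Sketch of a history processor like user_message_filter in
# 4_history_filtering.py. Plain dicts stand in for ModelMessage objects.

def user_message_filter(history: list[dict]) -> list[dict]:
    """Keep only user-authored messages before they reach the model."""
    return [msg for msg in history if msg["role"] == "user"]

history = [
    {"role": "user", "content": "Tell me a joke."},
    {"role": "model", "content": "Why did the chicken cross the road?"},
    {"role": "user", "content": "Another one, please."},
]

filtered = user_message_filter(history)
# filtered keeps only the two user messages, in their original order
```

A processor that keeps only model messages fails in the real example because the remaining history no longer starts with a user request, which the API rejects.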

history_processor/5a_history_length_fixed.py

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ def main() -> None:
 
     # Create agent with message count limiter
     log.info("\n=== Agent with Fixed Message Limit (last 3) ===")
-    agent_1 = Agent("openai:gpt-4o", history_processors=[keep_last_messages])
+    agent_1 = Agent("openai:gpt-5.1", history_processors=[keep_last_messages])
     result_1 = agent_1.run_sync("What were we talking about?", message_history=history)
     log.info(f"Answer (with truncated history):\n{result_1.output}")
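The fixed-limit truncation that `keep_last_messages` performs can be sketched as a plain slice. Again a hedged stand-in: strings replace PydanticAI's `ModelMessage` objects, and the `limit` default is illustrative, not taken from the example.

```python
# Sketch of a fixed-size history limiter like keep_last_messages in
# 5a_history_length_fixed.py. Strings stand in for ModelMessage objects.

def keep_last_messages(messages: list[str], limit: int = 3) -> list[str]:
    """Return only the most recent `limit` messages."""
    return messages[-limit:]

history = ["msg 1", "msg 2", "msg 3", "msg 4", "msg 5"]
trimmed = keep_last_messages(history)
# trimmed holds the last three messages
```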

history_processor/5b_history_length_dynamic.py

Lines changed: 3 additions & 3 deletions
@@ -23,7 +23,7 @@
 # `tiktoken` is used for OpenAI models, therefore if you're going to
 # use a different model provider, this bit will need to be changed
 # to a different tokenizer corresponding to the model used
-tokenizer = tiktoken.encoding_for_model("gpt-4o")
+tokenizer = tiktoken.encoding_for_model("gpt-5.1")
 
 
 @dataclass
@@ -58,7 +58,7 @@ def estimate_tokens(messages: list[ModelMessage]) -> int:
 # of this example, the threshold is set low for the logic to trigger. Usually,
 # this value is much bigger and corresponds to the used model's context
 # window size. To fully utilize model processing capabilities it is best to
-# set this value close to the context size. For the `gpt-4o` model this value is
+# set this value close to the context size. For the `gpt-5.1` model this value is
 # equal to 128_000 tokens
 
 
@@ -100,7 +100,7 @@ def main() -> None:
 
     log.info("\n=== Agent with Dynamic Token-Based Context Guard ===")
     agent_2 = Agent(
-        "openai:gpt-4o",
+        "openai:gpt-5.1",
         deps_type=MemoryState,
         history_processors=[context_guard],
         system_prompt="You are a helpful and concise assistant.",
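The token-budget guard this file builds can be sketched without tiktoken. This is a rough approximation: a whitespace tokenizer stands in for `tiktoken.encoding_for_model(...)`, strings stand in for `ModelMessage` objects, and the low `max_tokens` default exists only so the trimming logic triggers, mirroring the example's deliberately low threshold.

```python
# Rough sketch of the context guard from 5b_history_length_dynamic.py.
# A crude whitespace tokenizer replaces tiktoken for self-containment.

def estimate_tokens(messages: list[str]) -> int:
    # Word count as a stand-in for real token counting.
    return sum(len(m.split()) for m in messages)

def context_guard(messages: list[str], max_tokens: int = 6) -> list[str]:
    """Drop the oldest messages until the history fits the token budget."""
    trimmed = list(messages)
    while trimmed and estimate_tokens(trimmed) > max_tokens:
        trimmed.pop(0)  # discard the oldest message first
    return trimmed

history = ["first long message here", "second message", "third short one"]
kept = context_guard(history)
# the 4-word first message is dropped; the remaining 5 words fit the budget
```

In practice `max_tokens` is set close to the model's context window rather than to a tiny demo value.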

history_processor/5c_history_with_tools.py

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ def run_conversation_with_history_processor(history_processor: Callable[..., lis
     log.info(f"\n=== Running with history processor: {processor_name} ===")
 
     # Create agent with history processor
-    agent = Agent("openai:gpt-4o", system_prompt="You are a helpful and playful assistant", history_processors=[history_processor])
+    agent = Agent("openai:gpt-5.1", system_prompt="You are a helpful and playful assistant", history_processors=[history_processor])
 
     # Add basic tool
     @agent.tool

history_processor/6_persistent_history.py

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ class ConversationRecord(Base):
         id: Unique identifier for the record
         question: User prompt/question
         answer: Agent response
-        model_used: Model identifier (e.g., "gpt-4o")
+        model_used: Model identifier (e.g., "gpt-5.1")
         usage: Token usage metadata (input, output, total tokens)
     """
 
@@ -110,7 +110,7 @@ def main() -> None:
     """Run database persistence example."""
     # Initialize agent
     log.info("=== Initializing Agent ===")
-    agent = Agent("openai:gpt-4o", system_prompt=("You are a helpful assistant. Respond concisely and clearly."))
+    agent = Agent("openai:gpt-5.1", system_prompt=("You are a helpful assistant. Respond concisely and clearly."))
 
     # Run conversation and save to database
     log.info("\n=== Running Conversation ===")
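The persistence idea behind `ConversationRecord` can be sketched with stdlib `sqlite3`. This is not the example's code (which uses SQLAlchemy): the table name, `save_turn` helper, and in-memory database are illustrative, with only the column names mirroring the record's documented fields.

```python
import sqlite3

# Hypothetical stdlib version of the ConversationRecord persistence:
# each turn stores question, answer, and the model identifier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversation_record ("
    "id INTEGER PRIMARY KEY, question TEXT, answer TEXT, model_used TEXT)"
)

def save_turn(question: str, answer: str, model_used: str) -> None:
    # Persist one question/answer pair along with the model used.
    conn.execute(
        "INSERT INTO conversation_record (question, answer, model_used) "
        "VALUES (?, ?, ?)",
        (question, answer, model_used),
    )
    conn.commit()

save_turn("What is PydanticAI?", "An agent framework.", "gpt-5.1")
rows = conn.execute(
    "SELECT question, model_used FROM conversation_record"
).fetchall()
```

The real example additionally stores token-usage metadata per turn; an extra `usage` column would cover that.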
