Hello,
I’m working on processing a large number of loosely related PDF files, primarily financial statements such as balance sheets, income statements, and similar documents. For this project, I’m not defining a fixed ontology upfront; instead, I’m relying on the LLM to decide how to interpret and extract information from each document.
Given this use case, I’d like to know: what are the best chunking configurations for this kind of unstructured, heterogeneous input?
Additionally, is there any documentation or best-practice guide that explains the trade-offs between using larger vs. smaller chunk sizes? I’m particularly interested in how chunk size impacts context retention, accuracy of entity/relation extraction, and overall performance when using LLMs for knowledge graph construction.
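To make the question concrete, here is the kind of naive baseline I’m starting from: a fixed-size character chunker with overlap (a minimal sketch; the size and overlap values are arbitrary placeholders I’m experimenting with, not recommendations):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Placeholder values: chunk_size and overlap are just what I'm
    currently trying, not tuned settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# Example: a 2500-character document with the defaults above
doc = "x" * 2500
print([len(c) for c in chunk_text(doc)])  # → [1000, 1000, 900]
```

My worry is that a fixed-size splitter like this cuts across table rows and statement sections, which is exactly why I’m asking about chunk-size trade-offs for entity/relation extraction.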
Any advice or references would be greatly appreciated!
Thanks in advance.