Skip to content

Commit 5ee8ee9

Browse files
committed
create readme.md
created readme.md
1 parent 8a454d6 commit 5ee8ee9

File tree

1 file changed

+19
-0
lines changed
  • supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag

1 file changed

+19
-0
lines changed
Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
2+
# PDF Parsing - Table Extraction
3+
4+
Python notebook demonstrates an alternative approach to parsing PDFs, particularly focusing on extracting and converting tables into a format suitable for search applications such as Retrieval-Augmented Generation (RAG). The notebook leverages Azure OpenAI to process and convert table data from PDFs into plain text for better searchability and indexing.
5+
6+
## Features
7+
- **PDF Table Extraction**: The notebook identifies and parses tables from PDFs.
8+
- **LLM Integration**: Calls Azure OpenAI models to provide a text representation of the extracted tables.
9+
- **Search Optimization**: The parsed table data is processed into a format that can be more easily indexed and searched in Elasticsearch or other vector-based search systems.
10+
11+
## Getting Started
12+
13+
### Prerequisites
14+
- Python 3.x
15+
16+
17+
## Example Use Case
18+
This notebook is ideal for use cases where PDFs contain structured tables that need to be converted into plain text for indexing and search applications in environments like Elasticsearch or similar search systems.
19+

0 commit comments

Comments
 (0)