You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Python notebook demonstrates an alternative approach to parsing PDFs, particularly focusing on extracting and converting tables into a format suitable for search applications such as Retrieval-Augmented Generation (RAG). The notebook leverages Azure OpenAI to process and convert table data from PDFs into plain text for better searchability and indexing.
5
+
6
+
## Features
7
+
-**PDF Table Extraction**: The notebook identifies and parses tables from PDFs.
8
+
-**LLM Integration**: Calls Azure OpenAI models to provide a text representation of the extracted tables.
9
+
-**Search Optimization**: The parsed table data is processed into a format that can be more easily indexed and searched in Elasticsearch or other vector-based search systems.
10
+
11
+
## Getting Started
12
+
13
+
### Prerequisites
14
+
- Python 3.x
15
+
16
+
17
+
## Example Use Case
18
+
This notebook is ideal for use cases where PDFs contain structured tables that need to be converted into plain text for indexing and search applications in environments like Elasticsearch or similar search systems.
0 commit comments