Commit 8d43c23

New asset for analyzing and talking to financial documents with graphs and images
1 parent 990ac80 commit 8d43c23

4 files changed: +413 -0 lines changed

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
# 📊 Document Analysis with Graphs
A Streamlit-based application for extracting insights from financial documents by combining text and visual (chart/image) content using Oracle Generative AI.

This tool enables semantic search, summarization, and financial Q&A by leveraging OCI GenAI services, providing rich, context-aware answers grounded in both OCR-extracted text and chart images.

Author: **Ali Ottoman**

---

## 🔧 Features

### Multimodal Financial Document Processing
- Upload PDFs or images of corporate financial documents.
- Extract both **textual data** and **visual elements** (charts, tables, graphs).

### Oracle GenAI-Powered Search & QA
- Embed documents using **Cohere Embed v4.0** via OCI Generative AI.
- Use **Llama 4 Maverick** to answer questions with visual + textual reasoning.
- "Super Searcher" mode rewrites your query with **Command A** for enhanced semantic search.

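As a rough illustration of the "Super Searcher" idea, the sketch below rewrites a query before handing it to search. The names `super_search`, `rewrite_fn`, `search_fn`, and the prompt wording are hypothetical and not taken from the app; in the app, the rewrite step would be a call to Command A and the search step a vector lookup.

```python
from typing import Callable, List

# Hypothetical prompt wording; the app's actual prompt is not shown in this README.
REWRITE_PROMPT = (
    "Rewrite the following question as a concise search query "
    "for financial documents:\n{query}"
)


def super_search(
    query: str,
    search_fn: Callable[[str], List[str]],
    rewrite_fn: Callable[[str], str],
    enabled: bool = True,
) -> List[str]:
    """Optionally rewrite the query with an LLM, then run the search."""
    if enabled:
        rewritten = rewrite_fn(REWRITE_PROMPT.format(query=query)).strip()
        # Fall back to the original query if the model returns nothing.
        query = rewritten or query
    return search_fn(query)


# Stand-ins so the sketch runs without OCI access.
def fake_rewrite(prompt: str) -> str:
    return "return on assets change 1990 2000"


def fake_search(q: str) -> List[str]:
    return [f"chunk matching: {q}"]


results = super_search(
    "What is the change in ROA from 1990 to 2000?", fake_search, fake_rewrite
)
print(results)
```

Passing the rewrite and search steps as callables keeps the toggle logic testable without any model access; setting `enabled=False` sends the user's original wording straight to search.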
### Semantic Memory & Chat Interface
- Context-aware responses based on prior conversation.
- Semantic search across vectorized chunks using Qdrant.
- Responses grounded in document context + image evidence.

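To make "semantic search across vectorized chunks" concrete, here is a minimal pure-Python sketch of top-k retrieval by cosine similarity, one of the distance metrics Qdrant supports. It is a stand-in for illustration only: the app delegates this step to Qdrant, and the helper names and toy vectors below are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors are much longer.
chunks = [
    ("Debt ratios by firm size", [0.9, 0.1, 0.0]),
    ("Liquidity trends in small firms", [0.1, 0.9, 0.0]),
    ("Chart 2a: debt ratios", [0.8, 0.2, 0.1]),
]
hits = top_k([1.0, 0.0, 0.0], chunks, k=2)
print([text for text, _ in hits])
```

A vector database performs the same ranking with approximate indexes so that it scales beyond a linear scan.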
### Summary & Analytics View
- Summarizes uploaded financial reports into key highlights.
- Highlights KPIs, trends, and performance across firm sizes and time periods.

---

## 👥 Who Can Use This

**Finance & Strategy Teams**
→ Analyze trends, ratios, and balance sheet insights across time with chart references.

**Business Analysts**
→ Automate exploration of complex PDF documents and balance sheets.

**Developers & AI Engineers**
→ Explore multimodal document Q&A using OCI’s latest GenAI capabilities.

**Anyone using OCI AI Services**
→ Seamlessly integrate this workflow into larger OCI-based analytics pipelines.

---

## 🗂️ Files & Structure

```
.
├── doc_analysis_with_graphs.py   # Main Streamlit app
├── config.py                     # OCI config & model IDs (user-provided)
├── requirements.txt              # Python dependencies
└── README.md                     # You're reading it
```

---

## ⚙️ Setup & Installation

### 1. Clone the Repository

```bash
git clone https://github.com/your-username/your-repo.git
cd your-repo
```

### 2. Configure OCI Credentials

Fill in the `config.py` file:

```python
# config.py
COMPARTMENT_ID = "<your OCI Compartment OCID>"
```

Ensure you also have an OCI config file (usually at `~/.oci/config`) with valid credentials.

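Before launching the app, it can help to confirm that the OCI config file is present and complete. The sketch below is one way to do that with the standard library; the helper name `missing_oci_keys` is hypothetical, and the key list assumes an API-key based profile.

```python
import configparser
import tempfile
from pathlib import Path
from typing import List

# Keys normally present in an API-key based OCI profile (assumed auth method).
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def missing_oci_keys(config_path: str = "~/.oci/config", profile: str = "DEFAULT") -> List[str]:
    """Return the required keys that are absent or empty in the given profile."""
    path = Path(config_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"OCI config file not found: {path}")
    # Disable interpolation so a literal '%' in a value is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if profile != parser.default_section and not parser.has_section(profile):
        raise KeyError(f"Profile [{profile}] not found in {path}")
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key)]


# Demo against a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "config"
    sample.write_text("[DEFAULT]\nuser=ocid1.user.oc1..example\nregion=us-chicago-1\n")
    print(missing_oci_keys(str(sample)))
```

Calling `missing_oci_keys()` with no arguments checks the default location mentioned above; an empty list means every expected key has a value. It does not verify that the credentials themselves are accepted by OCI.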
### 3. Install Requirements

```bash
pip install -r requirements.txt
```

---

## 🚀 Run the App

```bash
streamlit run doc_analysis_with_graphs.py
```

---

## 📝 How to Use

### 1. Upload your documents
→ PDFs or images containing **financial reports, charts, balance sheets**

### 2. Ask your question
→ Examples:
- “What is the change in ROA from 1990 to 2000?”
- “Summarize the key liquidity trends in small firms.”
- “Explain the data in Chart 2a on debt ratios.”

### 3. View responses
→ AI replies with:
- Financially grounded insights
- Visual chart references (axes, values)
- Source document images
- `NULL` if the data is unavailable

---

## 🛠️ Customization

- **Enable/disable Super Searcher** to use Command A for rephrased queries.
- **Change model temperature or token limits** in the `ChatOCIGenAI` constructor.
- **Add custom logic** to extend analysis for ratios, ROE, gearing, sector comparison, etc.

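For the temperature and token-limit setting, a minimal sketch might look like the following. `ChatOCIGenAI` accepts generation settings through its `model_kwargs` argument; the helper names, the default values, and the `service_endpoint` parameter value you pass in are assumptions, not the app's actual code.

```python
from typing import Any, Dict


def build_model_kwargs(temperature: float = 0.0, max_tokens: int = 1024) -> Dict[str, Any]:
    """Validate and package generation settings for ChatOCIGenAI's `model_kwargs`."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {"temperature": temperature, "max_tokens": max_tokens}


def make_chat_model(model_id: str, compartment_id: str, service_endpoint: str, **settings: Any):
    """Hypothetical factory; needs langchain-community, the OCI SDK, and valid credentials."""
    # Imported lazily so build_model_kwargs works without LangChain installed.
    from langchain_community.chat_models.oci_generative_ai import ChatOCIGenAI

    return ChatOCIGenAI(
        model_id=model_id,
        service_endpoint=service_endpoint,
        compartment_id=compartment_id,
        model_kwargs=build_model_kwargs(**settings),
    )


print(build_model_kwargs(temperature=0.2, max_tokens=600))
```

Lower temperatures tend to suit financial Q&A, where reproducible answers matter more than varied phrasing.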
---

## 🧠 Example Chat

> **You**: What is the debt-to-assets ratio trend from 1990 to 2000?
>
> **AI**:
> - Debt-to-assets ratio declined from **47% in 1990** to **39% in 2000**.
> - As per **Chart 1a**, small firms saw the sharpest drop post-1997.
> - The Y-axis shows the ratio (%) and the X-axis is the year.

---

## 🔧 OCI Services Used

### 1. **OCI Generative AI – Embeddings**
- Used for vector search on document content.
```python
from langchain_community.embeddings.oci_generative_ai import OCIGenAIEmbeddings
```

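A hedged sketch of how document text might be prepared and embedded with this class: the chunking helper, its window sizes, the `service_endpoint` argument, and the model ID string `cohere.embed-v4.0` are all assumptions for illustration, and the app may split and embed documents differently.

```python
from typing import List


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def embed_chunks(
    chunks: List[str],
    compartment_id: str,
    service_endpoint: str,
    model_id: str = "cohere.embed-v4.0",  # assumed model ID string
) -> List[List[float]]:
    """Embed chunks via OCI; needs langchain-community, the OCI SDK, and valid credentials."""
    # Imported lazily so chunk_text works without LangChain installed.
    from langchain_community.embeddings.oci_generative_ai import OCIGenAIEmbeddings

    embedder = OCIGenAIEmbeddings(
        model_id=model_id,
        service_endpoint=service_endpoint,
        compartment_id=compartment_id,
    )
    return embedder.embed_documents(chunks)


print(chunk_text("abcdefghij", size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of a few duplicated characters per chunk.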
### 2. **OCI Generative AI – LLM (Llama 4 Maverick)**
- Used to extract structured insights from text + images.
```python
from langchain_community.chat_models.oci_generative_ai import ChatOCIGenAI
```

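To show what "text + images" can look like at the message level, the sketch below packs a question and a chart image into the list-of-parts content format that LangChain chat messages use for image input. The helper name is hypothetical, and whether the installed `ChatOCIGenAI` version accepts this format for a given model is an assumption to verify.

```python
import base64
from typing import Any, Dict, List


def build_multimodal_content(
    question: str, image_bytes: bytes, mime_type: str = "image/png"
) -> List[Dict[str, Any]]:
    """Combine a text question and one image (as a base64 data URL) into message content."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": question},
        {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
    ]


# Four placeholder bytes stand in for a real chart image.
content = build_multimodal_content("Explain the data in Chart 2a on debt ratios.", b"\x89PNG")
print(content[0]["text"])
print(content[1]["image_url"]["url"])
```

The resulting list can be passed as `HumanMessage(content=content)` (from `langchain_core.messages`) to a chat model's `invoke` method.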
---

## 🔗 Docs & References

- 📘 [OCI Generative AI Overview](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm)
- 📘 [OCI Document Understanding](https://docs.oracle.com/en-us/iaas/Content/document-understanding/using/home.htm)

---

## 📄 License

MIT License. See [LICENSE](LICENSE) for details.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
"""
config file

- Add the OCID of your compartment
"""

COMPARTMENT_ID = ""

OTHER_MODELS = [
    "meta.llama-4-maverick-17b-128e-instruct-fp8",
    "meta.llama-4-scout-17b-16e-instruct"]

MODEL_ID = "cohere.command-a-03-2025"
