28 changes: 21 additions & 7 deletions ai/generative-ai-service/smart-invoice-extraction/README.md
@@ -2,11 +2,16 @@

An intelligent invoice data extractor built with **OCI Generative AI**, **LangChain**, and **Streamlit**. Upload an invoice PDF and the app extracts structured fields such as REF. NO., POLICY NO., and dates using multimodal LLMs.

Reviewed date: 22.09.2025

<img src="./image.png">
</img>

---

## 🚀 Features

- 🔍 Automatically identifies key invoice headers using OCI Vision LLM (LLaMA 3.2 90B Vision)
- 🔍 Automatically identifies key invoice headers using OCI Vision LLM (LLaMA 3.2 90B Vision or Llama 4 Maverick)
- 🤖 Lets you choose what elements to extract (with type selection)
- 🧠 Leverages a text-based LLM (Cohere Command R+) for context-aware value extraction
- 🧪 Outputs data in clean **JSON** and saves to **CSV**
@@ -32,16 +37,16 @@ An intelligent invoice data extractor built with **OCI Generative AI**, **LangCh
1. **User Uploads Invoice PDF**
The file is uploaded and converted into an image using `pdf2image`. Upload single-page documents only.

2. **Initial Header Detection (LLaMA-3.2 Vision)**
2. **Initial Header Detection (LLaMA-3.2 Vision or Llama 4 Maverick)**
The first page is passed to the multimodal LLM, which returns a list of fields likely to be useful (e.g., "Policy No.", "Amount", "Underwriter").

3. **User Selects Fields and Types**
The UI lets the user pick three fields from the detected list and specify each field's data type (Text, Number, etc.).

4. **Prompt Generation (Cohere Command R+)**
4. **Prompt Generation (Cohere Command A)**
The second LLM generates a custom system prompt to extract those fields as JSON.

5. **Full Invoice Extraction (LLaMA-3.2 Vision)**
5. **Full Invoice Extraction (LLaMA-3.2 Vision or Llama 4 Maverick)**
Each page image is passed into the multimodal LLM using the custom prompt, returning JSON values for the requested fields.

6. **Data Saving & Display**
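
Steps 1, 2, and 5 hand page images to the multimodal LLM. A minimal Python sketch of that hand-off follows, assuming the content-block layout (`text` plus `image_url` carrying a base64 data URL) that LangChain chat models commonly accept. The helper names `first_page_png`, `image_to_data_url`, and `build_vision_content` are hypothetical and are not taken from `app.py`.

```python
import base64
import io


def first_page_png(pdf_bytes: bytes) -> bytes:
    """Render page 1 of a PDF to PNG bytes (requires pdf2image and Poppler)."""
    # Imported lazily so the pure helpers below work without pdf2image installed.
    from pdf2image import convert_from_bytes

    page = convert_from_bytes(pdf_bytes, first_page=1, last_page=1)[0]
    buffer = io.BytesIO()
    page.save(buffer, format="PNG")
    return buffer.getvalue()


def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_vision_content(prompt: str, image_bytes: bytes) -> list:
    """Pair a text prompt with one image in a single message content list."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": image_to_data_url(image_bytes)}},
    ]


# Placeholder bytes stand in for a real rendered page.
content = build_vision_content("List the header fields on this invoice.", b"\x89PNG-demo")
print(content[0]["text"])
print(content[1]["image_url"]["url"][:22])
```

Under the same assumption, the resulting list could be passed as `HumanMessage(content=content)` from `langchain_core.messages` to the vision model.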
@@ -86,8 +91,8 @@ streamlit run app.py
> - Replace all instances of `<YOUR_COMPARTMENT_OCID_HERE>` with your actual **OCI Compartment OCID**
> - Ensure you have access to **OCI Generative AI Services** with correct permissions
> - Update model IDs in the code if needed:
> - Vision model: `meta.llama-3.2-90b-vision-instruct`
> - Text model: `cohere.command-r-plus-08-2024`
> - Vision model: `meta.llama-3.2-90b-vision-instruct` or `meta.llama-4-maverick-17b-128e-instruct-fp8`
> - Text model: `cohere.command-a-03-2025`

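The checklist above asks for manual edits to the code. One alternative, sketched here as an assumption rather than a description of how `app.py` works, reads the compartment OCID and model IDs from environment variables. The variable names `OCI_COMPARTMENT_OCID`, `OCI_VISION_MODEL_ID`, and `OCI_TEXT_MODEL_ID` and the `GenAIConfig` class are hypothetical; the default model IDs are the ones listed above.

```python
import os
from dataclasses import dataclass

PLACEHOLDER = "<YOUR_COMPARTMENT_OCID_HERE>"
DEFAULT_VISION_MODEL = "meta.llama-4-maverick-17b-128e-instruct-fp8"
DEFAULT_TEXT_MODEL = "cohere.command-a-03-2025"


@dataclass(frozen=True)
class GenAIConfig:
    """Settings the app needs before it can call OCI Generative AI."""
    compartment_id: str
    vision_model_id: str = DEFAULT_VISION_MODEL
    text_model_id: str = DEFAULT_TEXT_MODEL


def load_config(env=None) -> GenAIConfig:
    """Build the config from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    compartment_id = env.get("OCI_COMPARTMENT_OCID", "").strip()
    # Fail early if the README placeholder was never replaced.
    if not compartment_id or compartment_id == PLACEHOLDER:
        raise ValueError("Set OCI_COMPARTMENT_OCID to your compartment OCID.")
    return GenAIConfig(
        compartment_id=compartment_id,
        vision_model_id=env.get("OCI_VISION_MODEL_ID", DEFAULT_VISION_MODEL),
        text_model_id=env.get("OCI_TEXT_MODEL_ID", DEFAULT_TEXT_MODEL),
    )


# Illustrative value only; a real OCID comes from your OCI tenancy.
config = load_config({"OCI_COMPARTMENT_OCID": "ocid1.compartment.oc1..example"})
print(config.vision_model_id, config.text_model_id)
```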
---

@@ -104,4 +109,13 @@ streamlit run app.py
},
...
]
```
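
The JSON output above is a list of objects. A minimal sketch of persisting such a list to CSV with the Python standard library follows. The function name `append_rows_to_csv` and the sample values are illustrative assumptions, not the app's actual code; the field names echo the examples given in the workflow steps.

```python
import csv
import json
import tempfile
from pathlib import Path


def append_rows_to_csv(rows: list, csv_path: Path) -> int:
    """Append extracted rows to a CSV file, writing the header only once."""
    if not rows:
        return 0
    fieldnames = list(rows[0].keys())
    needs_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        # restval fills fields a row is missing; unexpected extra keys are ignored.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Made-up values for demonstration.
sample = json.loads(
    '[{"Policy No.": "P-001", "Amount": 120.5, "Underwriter": "Example Co"}]'
)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "invoices.csv"
    append_rows_to_csv(sample, target)  # first call writes header + row
    append_rows_to_csv(sample, target)  # second call appends the row only
    lines = target.read_text(encoding="utf-8").splitlines()

print(lines)
```

Appending rather than overwriting keeps earlier extractions in the file, which suits a Streamlit app that processes one invoice per upload.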

## License
Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](LICENSE.txt) for more details.

ORACLE AND ITS AFFILIATES DO NOT PROVIDE ANY WARRANTY WHATSOEVER, EXPRESS OR IMPLIED, FOR ANY SOFTWARE, MATERIAL OR CONTENT OF ANY KIND CONTAINED OR PRODUCED WITHIN THIS REPOSITORY, AND IN PARTICULAR SPECIFICALLY DISCLAIM ANY AND ALL IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. FURTHERMORE, ORACLE AND ITS AFFILIATES DO NOT REPRESENT THAT ANY CUSTOMARY SECURITY REVIEW HAS BEEN PERFORMED WITH RESPECT TO ANY SOFTWARE, MATERIAL OR CONTENT CONTAINED OR PRODUCED WITHIN THIS REPOSITORY. IN ADDITION, AND WITHOUT LIMITING THE FOREGOING, THIRD PARTIES MAY HAVE POSTED SOFTWARE, MATERIAL OR CONTENT TO THIS REPOSITORY WITHOUT ANY REVIEW. USE AT YOUR OWN RISK.