Commit 00d899e

Merge branch 'main' into hyperion_udate_july25
2 parents 55afb08 + 20105e6 commit 00d899e

271 files changed: +34718 -2102 lines changed

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -39,4 +39,8 @@ terraform.rc
 .DS_Store
 
 #VSC files
-.vscode
+.vscode
+
+# Exclude cached Python binary files
+*.pyc
+__pycache__

README.md

Lines changed: 42 additions & 7 deletions
@@ -1,22 +1,57 @@
-# Welcome to the Oracle Technology Specialists GitHub Repository
+# Technology Engineering GitHub Repository
 
-## Who are we?
+## Introduction
 
-We are a team of Oracle specialists focussing on technology cloud and software products. We answer questions, create demos and workshops, do hands-on guides, create code snippets and examples, and do health checks on existing solutions. We help our customers to find solutions to business challenges and to move workloads to Oracle cloud.
+### Who are we?
 
-## Why do we have a Git Repository?
+We are a team of Oracle specialists focusing on cloud and software products. We answer questions, create demos and workshops, provide hands-on guides, develop code snippets and examples, and perform health checks on existing solutions. We help our customers find solutions to business challenges and migrate workloads to the Oracle Cloud.
 
-We create all kinds of interesting assets in our line of work, which we believe should be open-source and accessible to everyone. We want to share our architecture patterns for solutions, technical step-by-step guides, and code examples. We believe this will simplify how we work internally in Oracle, while also sharing access to the same assets with our customers, implementation partners, and everyone interested. We want to be transparent on how we work and also grow the quality of our assets and best practices with the contribution coming from the community.
+### Why do we have a Git Repository?
 
-## How to use this Repository?
+We create various interesting assets in our line of work, which we believe should be open-source and accessible to everyone. We want to share our architecture patterns for solutions, technical step-by-step guides, and code examples. We simplify how we work internally in Oracle, while also sharing access to the same assets with our customers, implementation partners, and everyone interested. We want to be transparent on how we work and also grow the quality of our assets and best practices, with contributions coming from the community.
+
+## Installation
+
+This is not a single software product repository, but a collection of various assets (code or not) in one single 'larger' repository. Thus, you won't 'install' this repository; rather, search it to find various interesting assets.
 
 We structure our assets by our internal product areas and reflect this in the folder structure of this repository. The repository will have at least four levels of folders, starting with the first representing a wider product area (like Infrastructure Cloud), followed by a specific product area (like Compute), followed by a product or cloud service (like Bare Metal Compute), and finally by an asset for that product (like How-to-Guides).
 
+As there is a lot of individual content within this repository and under a wider folder hierarchy, discovering that content is not as easy. Here's how you can find content or navigate the folder structure:
+
+- Manually search via navigation through the folder hierarchy
+- Use the search feature of GitHub on the top right-hand side of the browser - it works better if you are logged into your GitHub account. You can search specific code languages and filter content that way. Also, you can search for keywords such as 'Workshop'. Lastly, if you enter the search from a subfolder, you can choose to only search in that folder, hopefully resulting in more useful search results.
+- Use direct links to share a specific asset or folder of the repository.
+
+Some of our assets are code-based and can also be installed. There is always a README file within the asset folder, explaining the installation process.
+
+## Documentation
+
+As per the previous chapter on how to install, we have various assets in this one larger repository. If you find a code asset, it will have a README file for installation, documentation, and example purposes.
+
+## Examples
+
+As per the previous chapter on how to install, we have various assets in this one larger repository. If you find a code asset, it will have a README file for installation, documentation, and example purposes.
+
+See the bullet points from the chapter *Installation* list as an example on how to navigate the repository.
+
+## Help
+
+If you find an error, can't find what you were looking for, or would like to suggest a new asset, please use the Issues feature of GitHub (Account login required).
+
+## Contributing
+
+This project welcomes contributions from the community. Before submitting a pull request, please [review our contribution guide](./CONTRIBUTING.md).
+
+## Security
+
+Please consult the [security guide](./SECURITY.md) for our responsible security vulnerability disclosure process.
+
 ## License
+
 Copyright (c) 2025 Oracle and/or its affiliates.
 
 Licensed under the Universal Permissive License (UPL), Version 1.0.
 
 See [LICENSE](LICENSE) for more details.
 
-ORACLE AND ITS AFFILIATES DO NOT PROVIDE ANY WARRANTY WHATSOEVER, EXPRESS OR IMPLIED, FOR ANY SOFTWARE, MATERIAL OR CONTENT OF ANY KIND CONTAINED OR PRODUCED WITHIN THIS REPOSITORY, AND IN PARTICULAR SPECIFICALLY DISCLAIM ANY AND ALL IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. FURTHERMORE, ORACLE AND ITS AFFILIATES DO NOT REPRESENT THAT ANY CUSTOMARY SECURITY REVIEW HAS BEEN PERFORMED WITH RESPECT TO ANY SOFTWARE, MATERIAL OR CONTENT CONTAINED OR PRODUCED WITHIN THIS REPOSITORY. IN ADDITION, AND WITHOUT LIMITING THE FOREGOING, THIRD PARTIES MAY HAVE POSTED SOFTWARE, MATERIAL OR CONTENT TO THIS REPOSITORY WITHOUT ANY REVIEW. USE AT YOUR OWN RISK.
+ORACLE AND ITS AFFILIATES DO NOT PROVIDE ANY WARRANTY WHATSOEVER, EXPRESS OR IMPLIED, FOR ANY SOFTWARE, MATERIAL OR CONTENT OF ANY KIND CONTAINED OR PRODUCED WITHIN THIS REPOSITORY, AND IN PARTICULAR SPECIFICALLY DISCLAIM ANY AND ALL IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.  FURTHERMORE, ORACLE AND ITS AFFILIATES DO NOT REPRESENT THAT ANY CUSTOMARY SECURITY REVIEW HAS BEEN PERFORMED WITH RESPECT TO ANY SOFTWARE, MATERIAL OR CONTENT CONTAINED OR PRODUCED WITHIN THIS REPOSITORY. IN ADDITION, AND WITHOUT LIMITING THE FOREGOING, THIRD PARTIES MAY HAVE POSTED SOFTWARE, MATERIAL OR CONTENT TO THIS REPOSITORY WITHOUT ANY REVIEW. USE AT YOUR OWN RISK.
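The four-level folder convention described in the README hunk above (wider product area, specific product area, product or cloud service, asset) can be illustrated with a small sketch. This is only an illustration under assumptions: the function name `describe_asset_path`, the level labels, and the example path are hypothetical and are not taken from the repository.

```python
from pathlib import PurePosixPath

# Level labels paraphrase the README's description of the folder hierarchy;
# the labels themselves are hypothetical names chosen for this sketch.
LEVELS = ("product_area", "specific_product_area", "product_or_service", "asset")


def describe_asset_path(path: str) -> dict:
    """Map the first four folders of a repository-relative path to level labels."""
    parts = PurePosixPath(path).parts
    if len(parts) < len(LEVELS):
        raise ValueError(
            f"expected at least {len(LEVELS)} folder levels, got {len(parts)}"
        )
    described = dict(zip(LEVELS, parts))
    # Anything below the fourth level belongs to the asset itself.
    described["remainder"] = "/".join(parts[len(LEVELS):])
    return described


# Hypothetical path assembled from the README's own examples.
example = describe_asset_path(
    "infrastructure-cloud/compute/bare-metal-compute/how-to-guides/README.md"
)
print(example["asset"])
```

A helper like this could be useful when building direct links to an asset folder, since it makes explicit which part of a path identifies the asset and which part is content inside it.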
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+Copyright (c) 2025 Oracle and/or its affiliates.
+
+The Universal Permissive License (UPL), Version 1.0
+
+Subject to the condition set forth below, permission is hereby granted to any
+person obtaining a copy of this software, associated documentation and/or data
+(collectively the "Software"), free of charge and under any and all copyright
+rights in the Software, and any and all patent rights owned or freely
+licensable by each licensor hereunder covering either (i) the unmodified
+Software as contributed to or provided by such licensor, or (ii) the Larger
+Works (as defined below), to deal in both
+
+(a) the Software, and
+(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
+one is included with the Software (each a "Larger Work" to which the Software
+is contributed by such licensors),
+
+without restriction, including without limitation the rights to copy, create
+derivative works of, display, perform, and distribute the Software and make,
+use, sell, offer for sale, import, export, have made, and have sold the
+Software and the Larger Work(s), and to sublicense the foregoing rights on
+either these or other terms.
+
+This license is subject to the following condition:
+The above copyright notice and either this complete permission notice or at
+a minimum a reference to the UPL must be included in all copies or
+substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
+# MCP Document Understanding Invoice Agent
+
+The **Document Understanding Agent** is an AI-powered assistant designed to extract and understand text from documents (e.g., PDFs, images) using Oracle Cloud Infrastructure (OCI) Generative AI Agents and Document Understanding services.
+
+This tool demonstrates an end-to-end workflow involving:
+
+- File upload (via React frontend)
+- File storage in OCI Object Storage
+- Text extraction with OCI Document Understanding
+- Summary and reasoning via a GenAI Agent powered by MCP (Model Context Protocol)
+
+The architecture is modular and can be easily extended by adding tools directly from the OCI Console, such as a RAG (Retrieval-Augmented Generation) tool or any other custom MCP-compatible tool, enabling more advanced workflows beyond document extraction—such as contextual question answering, validation, enrichment, or classification
+
+---
+
+## When to use this asset?
+
+Use this assistant when you want to:
+
+- Automatically extract text from scanned documents (images, PDFs)
+- Invoke OCI Document Understanding tools through an AI agent
+- Demonstrate AI-based document orchestration on OCI and Validation
+
+### Ideal for:
+
+- AI developers building document understanding pipelines
+- Oracle Cloud users integrating Generative AI with Object Storage
+- Showing document AI capabilities
+
+---
+
+## How to use this asset?
+
+### Start the Backend
+
+Navigate to the backend directory and run:
+
+```bash
+cd backend
+python mcp_server_docunderstandingobjectextract.py
+(In a different terminal)
+uvicorn apiserverdocunderstandingobjectextract:app --reload --port 8001
+```
+
+This does the following:
+
+- Starts a local MCP server with a tool (`ocr_extract_from_object_storage2`) that wraps OCI Document Understanding.
+- Starts a FastAPI server that handles file uploads and routes them to Object Storage + the agent.
+
+### Start the Frontend
+
+In a separate terminal:
+
+```bash
+cd oci-genai-agent-llama-react-frontend
+npm install
+npm run dev
+```
+
+You will see a chat interface at [http://localhost:3000](http://localhost:3000), with support for file uploads (PDF, PNG, etc).
+
+When you send a file:
+
+- It's shown as a preview
+- Uploaded to the backend
+- Saved in OCI Object Storage
+- Routed to the GenAI Agent with an instruction like:
+
+```
+Extract text from object storage. Namespace: <namespace>, Bucket: <bucket>, Name: <filename>
+```
+
+---
+
+## ⚙️ Setup Instructions
+
+### 1. OCI Config
+
+Set the following in `~/.oci/config`:
+
+```ini
+[DEFAULT]
+user=ocid1.user.oc1..exampleuniqueID
+fingerprint=xx:xx:xx:xx
+key_file=~/.oci/oci_api_key.pem
+tenancy=ocid1.tenancy.oc1..exampleuniqueID
+region=us-chicago-1
+```
+
+### 2. Object Storage Setup
+
+- Create a bucket (e.g., `bucket-20250714-1419`)
+- Make sure the user has permission to `put_object` and `get_namespace`
+
+### 3. MCP Tooling
+
+- `mcp_server_docunderstandingobjectextract.py` exposes a tool `ocr_extract_from_object_storage2`
+- This is picked up by the agent automatically on FastAPI startup
+
+---
+
+## ✨ Key Features
+
+| Feature | Description |
+| ---------------------- | ----------------------------------------------------------------- |
+| File Upload | Upload images or PDFs through the React UI |
+| OCI Object Storage | All files are stored securely in your OCI bucket |
+| OCR Extraction | Uses OCI Document Understanding to extract text from scanned docs |
+| GenAI Agent | Routes and responds to user requests intelligently |
+| Tool Orchestration | Agent can invoke tools dynamically (via MCP) |
+| Natural Language Reply | AI explains extracted results in human-readable format |
+
+---
+
+## Prompt Customization
+
+The main agent prompt is set as:
+
+```
+If the user wants to extract text from a document in Object Storage,
+call the `ocr_extract_from_object_storage2` tool with the `namespace`, `bucket`, and `name`.
+```
+
+You can modify this in `apiserverdocunderstandingobjectextract.py` under `Agent(...)` setup.
+
+---
+
+## Useful for:
+
+- Demos of OCI Document Understanding + GenAI Agents
+- Building your own document processing pipeline
+- AI chatbots that take file input and analyze content
+
+---
+
+## Directory Structure
+
+```bash
+backend/
+├── apiserverdocunderstandingobjectextract.py # FastAPI app
+├── mcp_server_docunderstandingobjectextract.py # MCP server with document OCR tool
+
+oci-genai-agent-llama-react-frontend/
+├── src/
+│   └── app/
+│       └── contexts/
+│           └── ChatContext.js # Hooks into backend API
+```
+
+---
+
+## License
+
+Copyright (c) 2025 Oracle and/or its affiliates.
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+
+See LICENSE for more details.
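The README above says an uploaded file is routed to the agent with an instruction of the form `Extract text from object storage. Namespace: <namespace>, Bucket: <bucket>, Name: <filename>`. A minimal sketch of building that instruction in one place might look like the following. The helper name `build_extract_instruction`, the empty-value check, and the sample namespace and file name are assumptions made for illustration; only the message wording and the bucket name come from the commit.

```python
def build_extract_instruction(namespace: str, bucket: str, name: str) -> str:
    """Return the instruction text that the README shows being sent to the agent."""
    # Hypothetical guard: reject empty values so the agent never receives
    # an instruction with a missing namespace, bucket, or object name.
    for label, value in (("namespace", namespace), ("bucket", bucket), ("name", name)):
        if not value or not value.strip():
            raise ValueError(f"{label} must be a non-empty string")
    return (
        "Extract text from object storage. "
        f"Namespace: {namespace}, Bucket: {bucket}, Name: {name}"
    )


# "mynamespace" and "invoice.pdf" are placeholder values for this sketch.
print(build_extract_instruction("mynamespace", "bucket-20250714-1419", "invoice.pdf"))
```

Keeping the wording in a single function makes it easier to keep the instruction text and the agent prompt in step if either one changes.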
@@ -0,0 +1,86 @@
+# backend/apiserver.py
+import logging
+from fastapi import FastAPI, UploadFile, File, Form
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import JSONResponse
+from rich.logging import RichHandler
+from rich.markup import escape
+
+import oci
+from mcp.client.session_group import StreamableHttpParameters
+from oci.addons.adk import Agent, AgentClient
+from oci.addons.adk.mcp import MCPClientStreamableHttp
+
+# — Logging —
+logging.basicConfig(level=logging.INFO, format="%(message)s", handlers=[RichHandler()])
+logger = logging.getLogger(__name__)
+
+# — FastAPI Setup —
+app = FastAPI()
+app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])
+
+BUCKET_NAME = "bucket-20250714-1419"
+
+@app.on_event("startup")
+async def startup_event():
+    logger.info("Connecting to MCP…")
+    mcp_params = StreamableHttpParameters(url="http://localhost:8000/mcp")
+    mcp_client = await MCPClientStreamableHttp(params=mcp_params, name="Smart Toolbox MCP").__aenter__()
+
+    client = AgentClient(auth_type="api_key", debug=False, region="us-chicago-1")
+    agent = Agent(
+        client=client,
+        agent_endpoint_id="ocid1.genaiagentendpoint",
+        instructions=(
+            "If the user wants to extract text from a document in Object Storage, "
+            "call the `ocr_extract_from_object_storage2` tool with the `namespace`, `bucket`, and `name`."
+
+        ),
+        tools=[await mcp_client.as_toolkit()],
+    )
+    agent.setup()
+    app.state.agent = agent
+    logger.info(" Agent ready.")
+
+@app.post("/chat")
+async def chat(message: str = Form(...), file: UploadFile = File(None)):
+    if file:
+        try:
+            file_bytes = await file.read()
+            file_name = file.filename
+
+            config = oci.config.from_file()
+            obj_client = oci.object_storage.ObjectStorageClient(config)
+            namespace = obj_client.get_namespace().data
+
+            # Upload file
+            obj_client.put_object(namespace, BUCKET_NAME, file_name, file_bytes)
+            logger.info(f" Uploaded {file_name} to {BUCKET_NAME}")
+
+            # Inject the correct call instruction to the agent
+            message = f"Extract text from object storage. Namespace: {namespace}, Bucket: {BUCKET_NAME}, Name: {file_name}"
+
+        except Exception as e:
+            logger.exception(" Upload failed")
+            return JSONResponse({"error": f"Upload failed: {str(e)}"}, status_code=500)
+
+    try:
+        result = await app.state.agent.run_async(message)
+
+        if isinstance(result.output, dict) and result.output.get("type") == "function":
+            name = result.output["name"]
+            args = result.output["parameters"]
+            logger.info(" Agent calls tool: %s(%r)", name, args)
+
+            tool_out = await app.state.agent.invoke_tool(name, args)
+            followup = await app.state.agent.run_async({"type": "tool", "name": name, "output": tool_out})
+            out = followup.output if not isinstance(followup.output, dict) else followup.output.get("text", "")
+        else:
+            out = result.output if not isinstance(result.output, dict) else result.output.get("text", "")
+
+        logger.info(" Replying: %s", escape(out))
+        return JSONResponse({"text": out})
+
+    except Exception:
+        logger.exception(" Chat error")
+        return JSONResponse({"error": "internal error"}, status_code=500)
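The `/chat` handler above reduces the agent output to reply text with the same conditional in two places, and then passes the result to `escape`, which appears to expect a string. One possible refactoring, offered only as a sketch, is a small helper that always returns a string. The name `output_to_text` and the handling of `None` and non-string values are assumptions and are not part of the commit.

```python
from typing import Any


def output_to_text(output: Any) -> str:
    """Collapse an agent output (plain value or dict with a "text" key) to a string."""
    if isinstance(output, dict):
        # Mirrors the handler's `.get("text", "")` fallback for dict outputs.
        text = output.get("text", "")
    else:
        text = output
    # Assumption for this sketch: treat None as an empty reply and coerce
    # any other non-string value so later string handling cannot fail.
    return "" if text is None else str(text)


print(output_to_text({"text": "Invoice total: 120.00"}))
```

With a helper like this, both branches of the handler could assign `out = output_to_text(...)`, so the logging call and the JSON response always receive a string.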
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+[DEFAULT]
+user=ocid1.user....
+fingerprint=c6:4f:6
+tenancy=ocid1.tenan...
+region=eu-frankfurt-1
+key_file=~/.oc
+
+
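Because the backend reads this profile through `oci.config.from_file()`, a missing entry would only surface when the first request arrives. A hedged sketch of a pre-flight check using only the standard library follows. The function name `missing_config_keys` is hypothetical, and the list of required keys is an assumption taken from the entries shown in the config example in this commit, not from the OCI SDK itself.

```python
import configparser

# Assumed from the keys that appear in the config example in this commit.
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def missing_config_keys(config_text: str, profile: str = "DEFAULT") -> list:
    """Return the required keys that are absent or empty in the given profile."""
    # Interpolation is disabled so a literal "%" in a value is not interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile == "DEFAULT":
        section = parser.defaults()
    elif parser.has_section(profile):
        section = parser[profile]
    else:
        # An unknown profile is reported as missing every required key.
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]


# Placeholder values; key_file is left out on purpose to show the check.
sample = """
[DEFAULT]
user=ocid1.user.oc1..exampleuniqueID
fingerprint=xx:xx:xx:xx
tenancy=ocid1.tenancy.oc1..exampleuniqueID
region=eu-frankfurt-1
"""
print(missing_config_keys(sample))
```

Running a check like this at startup would report an incomplete profile before any upload is attempted.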
