
Commit 3d532d3

jesusbrasero authored and ppaolucc-it committed
anshuman image to text code
1 parent d4cb6ac commit 3d532d3

File tree: 5 files changed (+250, −0 lines)

Lines changed: 35 additions & 0 deletions
Copyright (c) 2024 Oracle and/or its affiliates.

The Universal Permissive License (UPL), Version 1.0

Subject to the condition set forth below, permission is hereby granted to any
person obtaining a copy of this software, associated documentation and/or data
(collectively the "Software"), free of charge and under any and all copyright
rights in the Software, and any and all patent rights owned or freely
licensable by each licensor hereunder covering either (i) the unmodified
Software as contributed to or provided by such licensor, or (ii) the Larger
Works (as defined below), to deal in both

(a) the Software, and
(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
one is included with the Software (each a "Larger Work" to which the Software
is contributed by such licensors),

without restriction, including without limitation the rights to copy, create
derivative works of, display, perform, and distribute the Software and make,
use, sell, offer for sale, import, export, have made, and have sold the
Software and the Larger Work(s), and to sublicense the foregoing rights on
either these or other terms.

This license is subject to the following condition:
The above copyright notice and either this complete permission notice or at
a minimum a reference to the UPL must be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Lines changed: 116 additions & 0 deletions
# Image-to-Text with Oracle OCI Gen AI

This application is built using **Streamlit** and **Oracle OCI Generative AI**. It lets users upload an image, enter a prompt, and receive a text response generated by the AI model, using Oracle's Gen AI Inference API to process multimodal data (text and image).

Reviewed: 19.11.2024

<img src="./image1.png" alt="Application screenshot" />

---

## Features

- Upload an image file (`.png`, `.jpg`, `.jpeg`).
- Provide a natural-language prompt describing your query about the image.
- Get a detailed response generated by Oracle's Generative AI model.
- Easy-to-use interface built with Streamlit.

---

## Prerequisites

1. **Oracle OCI Configuration**
   - Set up your Oracle Cloud Infrastructure (OCI) account.
   - Obtain the following:
     - **Compartment OCID**
     - **Generative AI Service Endpoint**
     - **Model ID** (e.g., `meta.llama-3.2-90b-vision-instruct`)
   - Configure your `~/.oci/config` file with your profile details.

2. **Python Environment**
   - Install Python 3.8 or later.
   - Install the required dependencies (see below).
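For reference, a `~/.oci/config` file typically looks like the sketch below. Every value is a placeholder; substitute your own user and tenancy OCIDs, API-key fingerprint, key path, and region:

```ini
[DEFAULT]
user=ocid1.user.oc1..<your_user_ocid>
fingerprint=<your_api_key_fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<your_tenancy_ocid>
region=us-chicago-1
```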
---

## Installation

1. Clone the repository:

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Configure OCI:
   Ensure your `~/.oci/config` file is set up with the correct credentials and profile.

---

## Usage

1. Run the application:
   ```bash
   streamlit run app.py
   ```

2. Open the web application in your browser at `http://localhost:8501`.

3. Upload an image and provide a prompt in the text input field. Click **Generate Response** to receive the AI-generated output.
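Conceptually, that last step sends the model one multimodal user message combining the prompt text and the image as a base64 data URL. The plain-dict sketch below is illustrative only (the field names are hypothetical); `app.py` builds the equivalent with `oci` SDK model classes (`UserMessage`, `TextContent`, `ImageContent`):

```python
def build_message(prompt: str, encoded_image: str) -> dict:
    # Conceptual shape only; the app uses oci.generative_ai_inference.models
    # classes rather than raw dicts.
    return {
        "role": "USER",
        "content": [
            {"type": "TEXT", "text": prompt},
            {"type": "IMAGE",
             "imageUrl": {"url": f"data:image/jpeg;base64,{encoded_image}"}},
        ],
    }

msg = build_message("Tell me about this image.", "aGVsbG8=")
```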
---

## File Structure

```plaintext
.
├── app.py             # Main application file
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation
```

---

## Dependencies

Dependencies (listed in `requirements.txt`):

- **Streamlit**: for the web UI.
- **oci**: Oracle Cloud Infrastructure SDK.
- **Pillow**: image handling. (The `base64` module used for encoding images is part of the Python standard library and needs no installation.)

Install them with:

```bash
pip install -r requirements.txt
```
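The standard-library `base64` module mentioned above is what turns the uploaded image into a data URL the inference API can accept. A minimal sketch of that step:

```python
import base64

def encode_image_bytes(data: bytes) -> str:
    # Base64-encode raw image bytes and return an ASCII string.
    return base64.b64encode(data).decode("utf-8")

# Any bytes work for illustration; in the app these come from the uploaded file.
encoded = encode_image_bytes(b"fake-image-bytes")
data_url = f"data:image/jpeg;base64,{encoded}"
```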
---

## Notes

- Ensure your OCI credentials and Compartment OCID are correct in the script.
- Check the image format and size for compatibility.
- Use the appropriate Generative AI service endpoint for your region.

---

## Troubleshooting

- **Error: `oci.exceptions.ServiceError`**
  - Check that your compartment OCID and API keys are configured correctly.

- **Streamlit does not load:**
  - Verify that Streamlit is installed and that the application is running on the correct port.

---

## Acknowledgments

- [Oracle Cloud Infrastructure (OCI)](https://www.oracle.com/cloud/)
- [Streamlit Documentation](https://docs.streamlit.io/)

For questions or feedback, please contact [[email protected]].
Lines changed: 96 additions & 0 deletions
# Author: Ansh
import streamlit as st
import oci
import base64
from PIL import Image

# OCI configuration (put your compartment OCID below)
compartmentId = "ocid1.compartment.oc1..***************************"
llm_service_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"

# Define helper functions
def encode_image(image_path):
    """Read an image file and return its base64-encoded contents as a string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def get_message(encoded_image, user_prompt):
    """Build a multimodal user message combining the text prompt and the image."""
    content1 = oci.generative_ai_inference.models.TextContent()
    content1.text = user_prompt

    content2 = oci.generative_ai_inference.models.ImageContent()
    image_url = oci.generative_ai_inference.models.ImageUrl()
    image_url.url = f"data:image/jpeg;base64,{encoded_image}"
    content2.image_url = image_url

    message = oci.generative_ai_inference.models.UserMessage()
    message.content = [content1, content2]
    return message

def get_chat_request(encoded_image, user_prompt):
    """Assemble a generic chat request with the sampling parameters."""
    chat_request = oci.generative_ai_inference.models.GenericChatRequest()
    chat_request.messages = [get_message(encoded_image, user_prompt)]
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC
    chat_request.num_generations = 1
    chat_request.is_stream = False
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = -1  # -1 disables top-k filtering
    chat_request.frequency_penalty = 1.0
    return chat_request

def get_chat_detail(chat_request):
    """Wrap the chat request with the serving mode and compartment details."""
    chat_detail = oci.generative_ai_inference.models.ChatDetails()
    chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(
        model_id="meta.llama-3.2-90b-vision-instruct"
    )
    chat_detail.compartment_id = compartmentId
    chat_detail.chat_request = chat_request
    return chat_detail

# Streamlit UI
st.title("Image to Text with OCI Gen AI")
st.write("Upload an image, provide a prompt, and get a response from OCI Gen AI.")

# Upload image
uploaded_file = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])

# Prompt input
user_prompt = st.text_input("Enter your prompt for the image:", value="Tell me about this image.")

if uploaded_file:
    # Save the uploaded image temporarily
    temp_image_path = "temp_uploaded_image.jpg"
    with open(temp_image_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

    # Display the uploaded image
    st.image(temp_image_path, caption="Uploaded Image", use_column_width=True)

    # Process and call the model
    if st.button("Generate Response"):
        with st.spinner("Please wait while the model processes the image..."):
            try:
                # Encode the image
                encoded_image = encode_image(temp_image_path)

                # Set up the OCI client
                CONFIG_PROFILE = "DEFAULT"
                config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)

                llm_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
                    config=config,
                    service_endpoint=llm_service_endpoint,
                    retry_strategy=oci.retry.NoneRetryStrategy(),
                    timeout=(10, 240)
                )

                # Build the chat request payload and call the model
                llm_request = get_chat_request(encoded_image, user_prompt)
                llm_payload = get_chat_detail(llm_request)
                llm_response = llm_client.chat(llm_payload)

                # Extract and display the response text
                llm_text = llm_response.data.chat_response.choices[0].message.content[0].text
                st.success("Model Response:")
                st.write(llm_text)
            except Exception as e:
                st.error(f"An error occurred: {str(e)}")
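The deeply nested attribute chain used to pull out the model text can be unintuitive. A stand-in built with `types.SimpleNamespace` (the response value here is hypothetical; the real objects come back from `llm_client.chat`) shows the path the script walks:

```python
from types import SimpleNamespace as NS

# Mimic the shape of llm_response that the script expects from the SDK.
llm_response = NS(data=NS(chat_response=NS(choices=[
    NS(message=NS(content=[NS(text="A photo of a cat on a sofa.")]))
])))

# Same extraction expression as in app.py above.
llm_text = llm_response.data.chat_response.choices[0].message.content[0].text
```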
Binary image file (50.7 KB) not shown.
Lines changed: 3 additions & 0 deletions
streamlit==1.33.0
oci==3.50.1
Pillow
