Commit 492966f

Merge branch 'main' into mfioramo-patch-3
2 parents 05f588e + 478e536 commit 492966f

21 files changed

+377
-10
lines changed

ai/generative-ai-service/README.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ Reviewed: 13.11.2024

 ## Reusable Assets Overview
 - [Podcast Generator](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/ai-speech/podcast-generator)
-
+- [Decode Images and Videos with OCI GenAI](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/generative-ai-service/decode-Images-and-Videos-with-OCI-GenAI)

 # Useful Links

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@

# Decode Images and Videos with OCI GenAI

This AI-powered application uses the Oracle Cloud Infrastructure (OCI) Generative AI services to analyze images and videos and generate detailed summaries in multiple languages. Whether you are a content creator, researcher, or media enthusiast, the app helps you interpret visual content with ease.

<img src="./image.png">
</img>
---

## Features

### 🌍 **Multi-Language Support**
- Receive summaries in your preferred language, including:
  - English, French, Arabic, Spanish, Italian, German, Portuguese, Japanese, Korean, and Chinese.

### 🎥 **Customizable Frame Processing for Videos**
- Extract video frames at user-defined intervals.
- Analyze specific frame ranges to focus the results on the part of the video you care about.

### **Parallel Processing**
- Analyzes the extracted frames concurrently using a thread pool, which shortens the total processing time.

### 🖼️ **Image Analysis**
- Upload images to generate detailed summaries based on your input prompt.

### 🧠 **Cohesive Summaries**
- Combines individual frame insights into a single, cohesive summary of the video’s overall theme, events, and key details.

---

## Technologies Used
- **[Streamlit](https://streamlit.io/):** For building an interactive user interface.
- **[Oracle Cloud Infrastructure (OCI) Generative AI](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm):** For image and video content analysis.
- **[OpenCV](https://opencv.org/):** For video frame extraction and processing.
- **[Pillow (PIL)](https://pillow.readthedocs.io/):** For image handling and processing.
- **[tqdm](https://tqdm.github.io/):** For progress visualization in parallel processing.

---

## Installation

1. **Clone the repository:**


2. **Install dependencies:**
   Make sure you have Python 3.8+ installed. Then, install the required libraries:
   ```bash
   pip install -r requirements.txt
   ```

3. **Configure OCI:**
   - Set up your OCI configuration by creating or updating the `~/.oci/config` file with your credentials and profile.
   - Replace the placeholder values of `compartmentId`, `llm_service_endpoint`, and `visionModel` in `app.py` with your actual values.
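
As a rough illustration of the first bullet in step 3, an OCI configuration file is an INI-style file with one section per profile. The sketch below writes a sample file to a temporary directory and reads the `DEFAULT` profile back with Python's standard `configparser`. Every value in `SAMPLE_CONFIG` is a placeholder, and the helper name `read_profile` is hypothetical; the application itself loads the file through the OCI SDK rather than through this helper.

```python
import configparser
import tempfile
from pathlib import Path

# Placeholder values only; substitute your own OCIDs, fingerprint, and key path.
SAMPLE_CONFIG = """\
[DEFAULT]
user=ocid1.user.oc1..exampleuser
fingerprint=aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99
tenancy=ocid1.tenancy.oc1..exampletenancy
region=us-chicago-1
key_file=~/.oci/oci_api_key.pem
"""

REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def read_profile(path, profile="DEFAULT"):
    """Return the key/value pairs of one profile, checking the usual keys exist."""
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    if profile != parser.default_section and not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found in {path}")
    values = dict(parser[profile])
    missing = [key for key in REQUIRED_KEYS if key not in values]
    if missing:
        raise ValueError(f"profile {profile!r} is missing: {', '.join(missing)}")
    return values


# Write the sample to a throwaway location and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config"
    config_path.write_text(SAMPLE_CONFIG, encoding="utf-8")
    loaded_profile = read_profile(config_path)
    print(loaded_profile["region"])
```

A check like this can catch a missing key before the app starts, but it does not validate the credentials themselves.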

---

## Usage

1. **Run the application:**
   ```bash
   streamlit run app.py
   ```

2. **Upload a file:**
   - Use the sidebar to upload an image (`.png`, `.jpg`, `.jpeg`) or a video (`.mp4`, `.avi`, `.mov`).

3. **Set parameters:**
   - For videos, adjust the frame extraction interval and select specific frame ranges for analysis.

4. **Analyze and summarize:**
   - Enter a custom prompt to guide the AI in generating a meaningful summary.
   - Choose the output language from the sidebar.

5. **Get results:**
   - View detailed image summaries or cohesive video summaries directly in the app.
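
To make the video parameters in step 3 concrete, here is a minimal sketch of how an extraction interval and a frame range could narrow down what gets analyzed. It assumes one frame is kept every `interval_s` seconds based on the integer frame rate; the function names `sampled_frame_indices` and `select_range` are hypothetical and exist only for this illustration.

```python
def sampled_frame_indices(total_frames, fps, interval_s=1):
    """Indices of the frames kept when sampling once every `interval_s` seconds."""
    # Guard against a reported frame rate of 0 so the step is never zero.
    step = max(1, int(fps) * interval_s)
    return list(range(0, total_frames, step))


def select_range(samples, first, last):
    """Inclusive slice of the sampled frames, as a range slider would pick."""
    return samples[first:last + 1]


# A 10-second clip at 30 fps has 300 frames; a 2-second interval keeps 5 of them.
kept = sampled_frame_indices(total_frames=300, fps=30.0, interval_s=2)
print(kept)                      # [0, 60, 120, 180, 240]
print(select_range(kept, 1, 3))  # [60, 120, 180]
```

A larger interval or a narrower range means fewer frames sent for analysis, which trades detail for speed.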

---

## Screenshots
### Image Analysis
<img src="./image2.png">
</img>

### Video Analysis
<img src="./image3.png">
</img>

---


## Acknowledgments
- Oracle Cloud Infrastructure Generative AI for enabling state-of-the-art visual content analysis.
- Open-source libraries like OpenCV, Pillow, and Streamlit for providing powerful tools to build this application.

---

## Contact
If you have questions or feedback, feel free to reach out via [[email protected]](mailto:[email protected]).
Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
# Author: Ansh
import streamlit as st
import oci
import base64
import cv2
from PIL import Image
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# OCI Configuration
compartmentId = "ocid1.compartment.oc1..XXXXXXXXXXXXXxxxxxxxxxxxxxxxxxxxxxxxxxxx"
llm_service_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
CONFIG_PROFILE = "DEFAULT"
visionModel = "meta.llama-3.2-90b-vision-instruct"
summarizeModel = "cohere.command-r-plus-08-2024"
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)
llm_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=llm_service_endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240)
)
24+
# Functions for Image Analysis
25+
def encode_image(image_path):
26+
with open(image_path, "rb") as image_file:
27+
return base64.b64encode(image_file.read()).decode("utf-8")
28+
29+
# Functions for Video Analysis
30+
def encode_cv2_image(frame):
31+
_, buffer = cv2.imencode('.jpg', frame)
32+
return base64.b64encode(buffer).decode("utf-8")
33+
34+
# Common Functions
35+
def get_message(encoded_image=None, user_prompt=None):
36+
content1 = oci.generative_ai_inference.models.TextContent()
37+
content1.text = user_prompt
38+
39+
message = oci.generative_ai_inference.models.UserMessage()
40+
message.content = [content1]
41+
42+
if encoded_image:
43+
content2 = oci.generative_ai_inference.models.ImageContent()
44+
image_url = oci.generative_ai_inference.models.ImageUrl()
45+
image_url.url = f"data:image/jpeg;base64,{encoded_image}"
46+
content2.image_url = image_url
47+
message.content.append(content2)
48+
return message

def get_chat_request(encoded_image=None, user_prompt=None):
    chat_request = oci.generative_ai_inference.models.GenericChatRequest()
    chat_request.messages = [get_message(encoded_image, user_prompt)]
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC
    chat_request.num_generations = 1
    chat_request.is_stream = False
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = -1
    chat_request.frequency_penalty = 1.0
    return chat_request

def cohere_chat_request(encoded_image=None, user_prompt=None):
    chat_request = oci.generative_ai_inference.models.CohereChatRequest()
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_COHERE
    message = get_message(encoded_image, user_prompt)
    # Only the text part of the message is sent to the Cohere model.
    chat_request.message = message.content[0].text
    chat_request.is_stream = False
    # lang_type is the module-level value set by the sidebar selectbox below.
    chat_request.preamble_override = "Make sure you answer only in " + lang_type
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = 0
    chat_request.frequency_penalty = 1.0
    return chat_request


def get_chat_detail(chat_request, model):
    chat_detail = oci.generative_ai_inference.models.ChatDetails()
    chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id=model)
    chat_detail.compartment_id = compartmentId
    chat_detail.chat_request = chat_request
    return chat_detail

def extract_frames(video_path, interval=1):
    frames = []
    cap = cv2.VideoCapture(video_path)
    frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
    # Keep one frame every `interval` seconds; guard against a reported
    # frame rate of 0, which would otherwise cause a division by zero.
    step = max(1, frame_rate * interval)
    success, frame = cap.read()
    count = 0

    while success:
        if count % step == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        success, frame = cap.read()
        count += 1
    cap.release()
    return frames

def process_frame(llm_client, frame, prompt):
    encoded_image = encode_cv2_image(frame)
    try:
        llm_request = get_chat_request(encoded_image, prompt)
        llm_payload = get_chat_detail(llm_request, visionModel)
        llm_response = llm_client.chat(llm_payload)
        return llm_response.data.chat_response.choices[0].message.content[0].text
    except Exception as e:
        return f"Error processing frame: {str(e)}"

def process_frames_parallel(llm_client, frames, prompt):
    # executor.map preserves input order, so summaries line up with frames.
    with ThreadPoolExecutor() as executor:
        results = list(tqdm(
            executor.map(lambda frame: process_frame(llm_client, frame, prompt), frames),
            total=len(frames),
            desc="Processing frames"
        ))
    return results

def generate_final_summary(llm_client, frame_summaries):
    combined_summaries = "\n".join(frame_summaries)
    final_prompt = (
        "You are a video content summarizer. Below are summaries of individual frames extracted from a video. "
        "Using these frame summaries, create a cohesive and concise summary that describes the content of the video as a whole. "
        "Focus on providing insights about the overall theme, events, or key details present in the video, and avoid referring to individual frames or images explicitly.\n\n"
        f"{combined_summaries}"
    )
    try:
        llm_request = cohere_chat_request(user_prompt=final_prompt)
        llm_payload = get_chat_detail(llm_request, summarizeModel)
        llm_response = llm_client.chat(llm_payload)
        return llm_response.data.chat_response.text
    except Exception as e:
        return f"Error generating final summary: {str(e)}"

# Streamlit UI
st.title("Decode Images and Videos with OCI GenAI")
uploaded_file = st.sidebar.file_uploader("Upload an image or video", type=["png", "jpg", "jpeg", "mp4", "avi", "mov"])
user_prompt = st.text_input("Enter your prompt for analysis:", value="Describe the content of this image.")
lang_type = st.sidebar.selectbox("Output Language", ["English", "French", "Arabic", "Spanish", "Italian", "German", "Portuguese", "Japanese", "Korean", "Chinese"])

if uploaded_file:
    if uploaded_file.name.split('.')[-1].lower() in ["png", "jpg", "jpeg"]:
        # Image Analysis
        temp_image_path = "temp_uploaded_image.jpg"
        with open(temp_image_path, "wb") as f:
            f.write(uploaded_file.getbuffer())

        st.image(temp_image_path, caption="Uploaded Image", width=500)

        if st.button("Generate image Summary"):
            with st.spinner("Analyzing the image..."):
                try:
                    encoded_image = encode_image(temp_image_path)
                    llm_request = get_chat_request(encoded_image, user_prompt)
                    llm_payload = get_chat_detail(llm_request, visionModel)
                    llm_response = llm_client.chat(llm_payload)
                    llm_text = llm_response.data.chat_response.choices[0].message.content[0].text
                    st.success("OCI gen AI Response:")
                    st.write(llm_text)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")
    elif uploaded_file.name.split('.')[-1].lower() in ["mp4", "avi", "mov"]:

        # Video Analysis
        temp_video_path = "temp_uploaded_video.mp4"
        # Save the upload first; the preview below reads this file back.
        with open(temp_video_path, "wb") as f:
            f.write(uploaded_file.getbuffer())

        with open(temp_video_path, "rb") as f:
            encoded_video = base64.b64encode(f.read()).decode()
        video_html = (
            '<video width="600" height="300" controls>'
            f'<source src="data:video/mp4;base64,{encoded_video}" type="video/mp4">'
            'Your browser does not support the video tag.'
            '</video>'
        )
        st.markdown(video_html, unsafe_allow_html=True)

        # st.video(temp_video_path)
        st.write("Processing the video...")

        frame_interval = st.sidebar.slider("Frame extraction interval (seconds)", 1, 10, 1)
        frames = extract_frames(temp_video_path, interval=frame_interval)
        num_frames = len(frames)
        st.write(f"Total frames extracted: {num_frames}")
        if num_frames == 0:
            # The range slider below needs at least one extracted frame.
            st.error("No frames could be extracted from this video.")
            st.stop()

        frame_range = st.sidebar.slider("Select frame range for analysis", 0, num_frames - 1, (0, num_frames - 1))

        if st.button("Generate Video Summary"):
            with st.spinner("Analyzing selected frames..."):
                try:
                    selected_frames = frames[frame_range[0]:frame_range[1] + 1]
                    waiting_message = st.empty()
                    waiting_message.write(f"Selected {len(selected_frames)} frames for processing.")
                    # st.write(f"Selected {len(selected_frames)} frames for processing.")
                    frame_summaries = process_frames_parallel(llm_client, selected_frames, user_prompt)
                    # st.write("Generating final video summary...")
                    waiting_message.empty()
                    waiting_message.write("Generating final video summary...")
                    final_summary = generate_final_summary(llm_client, frame_summaries)
                    waiting_message.empty()
                    st.success("Video Summary:")
                    st.write(final_summary)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")
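
The parallel frame processing in the file above can be sketched in isolation, without any OCI calls. This is a minimal illustration, assuming each frame is handled by a worker function whose failures are turned into error strings; `safe_call`, `map_in_order`, and `fake_describe` are hypothetical names, and `fake_describe` merely stands in for the real vision-model request.

```python
from concurrent.futures import ThreadPoolExecutor


def safe_call(worker, item):
    """Run the worker on one item and turn any failure into an error string."""
    try:
        return worker(item)
    except Exception as exc:
        return f"Error processing frame: {exc}"


def map_in_order(worker, items, max_workers=4):
    """Process items on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda item: safe_call(worker, item), items))


def fake_describe(frame_id):
    """Stand-in for a model call; frame 2 simulates a service error."""
    if frame_id == 2:
        raise ValueError("simulated service error")
    return f"summary of frame {frame_id}"


results = map_in_order(fake_describe, [0, 1, 2, 3])
for line in results:
    print(line)
```

Because a failed frame yields an error string rather than an exception, one bad request does not abort the batch. A caller may still want to filter those strings out before building the final summary prompt.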
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
streamlit==1.33.0
oci==3.50.1
Pillow
opencv-python-headless==4.10.0.84
tqdm==4.66.1
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
Copyright (c) 2024 Oracle and/or its affiliates.

The Universal Permissive License (UPL), Version 1.0

Subject to the condition set forth below, permission is hereby granted to any
person obtaining a copy of this software, associated documentation and/or data
(collectively the "Software"), free of charge and under any and all copyright
rights in the Software, and any and all patent rights owned or freely
licensable by each licensor hereunder covering either (i) the unmodified
Software as contributed to or provided by such licensor, or (ii) the Larger
Works (as defined below), to deal in both

(a) the Software, and
(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
one is included with the Software (each a "Larger Work" to which the Software
is contributed by such licensors),

without restriction, including without limitation the rights to copy, create
derivative works of, display, perform, and distribute the Software and make,
use, sell, offer for sale, import, export, have made, and have sold the
Software and the Larger Work(s), and to sublicense the foregoing rights on
either these or other terms.

This license is subject to the following condition:
The above copyright notice and either this complete permission notice or at
a minimum a reference to the UPL must be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# OIC Adapters clickthrough presentations

Assets that cover OIC adapter configuration and implementation practices for the following adapters:
- EPM Adapter
- ERP Adapter
- HCM Adapter
- Kafka Adapter
- EBS Adapter

Review Date: 28.11.2024

# When to use these assets?

Use these assets when designing solutions that integrate with the applications and resources listed above.

# How to use these assets?

The information is generic in nature and not specific to a particular customer.

# License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
