
Commit 984c6b8

anshuman committed decode-Images-and-Videos-with-OCI-GenAI
1 parent 0487189 commit 984c6b8


7 files changed, +307 -1 lines changed


ai/generative-ai-service/README.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ Reviewed: 13.11.2024
 
 ## Reusable Assets Overview
 - [Podcast Generator](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/ai-speech/podcast-generator)
-
+- [Decode Images and Videos with OCI GenAI](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/generative-ai-service/decode-Images-and-Videos-with-OCI-GenAI)
 
 # Useful Links
 

ai/generative-ai-service/decode-Images-and-Videos-with-OCI-GenAI/README.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
# Decode Images and Videos with OCI GenAI

This AI-powered application is designed to unlock insights hidden within media files using Oracle Cloud Infrastructure (OCI) Generative AI services. It enables users to analyze images and videos and generate detailed summaries in multiple languages. Whether you are a content creator, researcher, or media enthusiast, this app helps you interpret visual content with ease.

<img src="./image.png">

---

## Features

### 🌍 **Multi-Language Support**
- Receive summaries in your preferred language, including:
  - English, French, Arabic, Spanish, Italian, German, Portuguese, Japanese, Korean, and Chinese.

### 🎥 **Customizable Frame Processing for Videos**
- Extract video frames at user-defined intervals.
- Analyze specific frame ranges to tailor your results for precision.

### **Parallel Processing**
- Uses efficient parallel computation for quick and accurate frame analysis (see the sketch at the end of this section).

### 🖼️ **Image Analysis**
- Upload images to generate detailed summaries based on your input prompt.

### 🧠 **Cohesive Summaries**
- Combines individual frame insights to create a seamless, cohesive summary of the video's overall theme, events, and key details.

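For orientation, here is a condensed sketch of the frame-sampling and parallel fan-out approach, adapted from the `extract_frames` and `process_frames_parallel` helpers in `app.py`. The `analyze_frames` wrapper and its `analyze_frame` callable are illustrative stand-ins for the per-frame OCI GenAI vision call.

```python
import cv2
from concurrent.futures import ThreadPoolExecutor

def extract_frames(video_path, interval=1):
    """Sample one frame every `interval` seconds, based on the video's frame rate."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
    success, frame = cap.read()
    count = 0
    while success:
        if count % (frame_rate * interval) == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        success, frame = cap.read()
        count += 1
    cap.release()
    return frames

def analyze_frames(frames, analyze_frame):
    """Fan per-frame analysis out over a thread pool (stand-in for process_frames_parallel)."""
    with ThreadPoolExecutor() as executor:
        return list(executor.map(analyze_frame, frames))
```

The per-frame results are then merged into a single cohesive summary by a second summarization call (see `generate_final_summary` in `app.py`).
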
---

## Technologies Used
- **[Streamlit](https://streamlit.io/):** For building an interactive user interface.
- **[Oracle Cloud Infrastructure (OCI) Generative AI](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm):** For powerful image and video content analysis.
- **[OpenCV](https://opencv.org/):** For video frame extraction and processing.
- **[Pillow (PIL)](https://pillow.readthedocs.io/):** For image handling and processing.
- **[tqdm](https://tqdm.github.io/):** For progress visualization in parallel processing.

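To make the division of labor concrete, the two small encoding helpers from `app.py` are reproduced below: image files and in-memory OpenCV frames are both base64-encoded so they can be sent to the vision model as data URLs.

```python
import base64
import cv2

def encode_image(image_path):
    # Read an image file and return its base64-encoded bytes as a UTF-8 string.
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def encode_cv2_image(frame):
    # JPEG-encode an in-memory OpenCV frame, then base64-encode it.
    _, buffer = cv2.imencode('.jpg', frame)
    return base64.b64encode(buffer).decode("utf-8")
```
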
---

## Installation

1. **Clone the repository:**

2. **Install dependencies:**
   Make sure you have Python 3.8+ installed. Then, install the required libraries:
   ```bash
   pip install -r requirements.txt
   ```

3. **Configure OCI:**
   - Set up your OCI configuration by creating or updating the `~/.oci/config` file with your credentials and profile.
   - Replace placeholders like `compartmentId`, `llm_service_endpoint`, and `visionModel` in the code with your actual values (a sketch of these settings follows below).

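As a reference for this step, the configuration block at the top of `app.py` looks roughly like the sketch below; the compartment OCID shown is a placeholder, and the endpoint and model IDs should match your own region and subscribed models.

```python
import oci

# Placeholder values -- replace with your own compartment OCID, region endpoint, and model IDs.
compartmentId = "ocid1.compartment.oc1..XXXXXXXXXXXX"
llm_service_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
CONFIG_PROFILE = "DEFAULT"
visionModel = "meta.llama-3.2-90b-vision-instruct"
summarizeModel = "cohere.command-r-plus-08-2024"

# Credentials are read from ~/.oci/config using the selected profile.
config = oci.config.from_file("~/.oci/config", CONFIG_PROFILE)
llm_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=llm_service_endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240),
)
```
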
---

## Usage

1. **Run the application:**
   ```bash
   streamlit run app.py
   ```

2. **Upload a file:**
   - Use the sidebar to upload an image (`.png`, `.jpg`, `.jpeg`) or a video (`.mp4`, `.avi`, `.mov`).

3. **Set parameters:**
   - For videos, adjust the frame extraction interval and select specific frame ranges for analysis.

4. **Analyze and summarize:**
   - Enter a custom prompt to guide the AI in generating a meaningful summary.
   - Choose the output language from the sidebar (the sketch after these steps shows how the prompt and language are passed to the model).

5. **Get results:**
   - View detailed image summaries or cohesive video summaries directly in the app.

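Under the hood, the custom prompt and the selected output language end up in a Cohere summarization request roughly like the sketch below, trimmed from `cohere_chat_request` in `app.py` and wrapped here in a hypothetical helper for readability; `final_prompt` and `lang_type` are the values collected from the UI.

```python
import oci

def build_summary_request(final_prompt, lang_type):
    # The selected output language is enforced through a preamble override.
    chat_request = oci.generative_ai_inference.models.CohereChatRequest()
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_COHERE
    chat_request.message = final_prompt
    chat_request.preamble_override = "Make sure you answer only in " + lang_type
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    return chat_request
```
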
---

## Screenshots

### Image Analysis
<img src="./image2.png">

### Video Analysis
<img src="./image3.png">

---

## Acknowledgments
- Oracle Cloud Infrastructure Generative AI for enabling state-of-the-art visual content analysis.
- Open-source libraries like OpenCV, Pillow, and Streamlit for providing powerful tools to build this application.

---

## Contact
If you have questions or feedback, feel free to reach out via [[email protected]](mailto:[email protected]).

ai/generative-ai-service/decode-Images-and-Videos-with-OCI-GenAI/app.py

Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
# Author: Ansh
import streamlit as st
import oci
import base64
import cv2
from PIL import Image
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# OCI Configuration
compartmentId = "ocid1.compartment.oc1..XXXXXXXXXXXXXxxxxxxxxxxxxxxxxxxxxxxxxxxx"
llm_service_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
CONFIG_PROFILE = "DEFAULT"
visionModel = "meta.llama-3.2-90b-vision-instruct"
summarizeModel = "cohere.command-r-plus-08-2024"
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)
llm_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=llm_service_endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240)
)

# Functions for Image Analysis
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Functions for Video Analysis
def encode_cv2_image(frame):
    _, buffer = cv2.imencode('.jpg', frame)
    return base64.b64encode(buffer).decode("utf-8")

# Common Functions
def get_message(encoded_image=None, user_prompt=None):
    # Build a user message with the text prompt and, optionally, the image as a data URL.
    content1 = oci.generative_ai_inference.models.TextContent()
    content1.text = user_prompt

    message = oci.generative_ai_inference.models.UserMessage()
    message.content = [content1]

    if encoded_image:
        content2 = oci.generative_ai_inference.models.ImageContent()
        image_url = oci.generative_ai_inference.models.ImageUrl()
        image_url.url = f"data:image/jpeg;base64,{encoded_image}"
        content2.image_url = image_url
        message.content.append(content2)
    return message

def get_chat_request(encoded_image=None, user_prompt=None):
    # Generic (Llama vision) chat request used for per-image / per-frame analysis.
    chat_request = oci.generative_ai_inference.models.GenericChatRequest()
    chat_request.messages = [get_message(encoded_image, user_prompt)]
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC
    chat_request.num_generations = 1
    chat_request.is_stream = False
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = -1
    chat_request.frequency_penalty = 1.0
    return chat_request

def cohere_chat_request(encoded_image=None, user_prompt=None):
    # Cohere chat request used for the final video summary; the output language is
    # enforced via a preamble override built from the sidebar selection (lang_type).
    chat_request = oci.generative_ai_inference.models.CohereChatRequest()
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_COHERE
    message = get_message(encoded_image, user_prompt)
    chat_request.message = message.content[0].text
    chat_request.is_stream = False
    chat_request.preamble_override = "Make sure you answer only in " + lang_type
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = 0
    chat_request.frequency_penalty = 1.0
    return chat_request

def get_chat_detail(chat_request, model):
    # Wrap a chat request with the serving mode (model) and compartment.
    chat_detail = oci.generative_ai_inference.models.ChatDetails()
    chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id=model)
    chat_detail.compartment_id = compartmentId
    chat_detail.chat_request = chat_request
    return chat_detail

def extract_frames(video_path, interval=1):
    # Sample one frame every `interval` seconds based on the video's frame rate.
    frames = []
    cap = cv2.VideoCapture(video_path)
    frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
    success, frame = cap.read()
    count = 0

    while success:
        if count % (frame_rate * interval) == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        success, frame = cap.read()
        count += 1
    cap.release()
    return frames

def process_frame(llm_client, frame, prompt):
    # Analyze a single frame with the vision model; return the text or an error string.
    encoded_image = encode_cv2_image(frame)
    try:
        llm_request = get_chat_request(encoded_image, prompt)
        llm_payload = get_chat_detail(llm_request, visionModel)
        llm_response = llm_client.chat(llm_payload)
        return llm_response.data.chat_response.choices[0].message.content[0].text
    except Exception as e:
        return f"Error processing frame: {str(e)}"

def process_frames_parallel(llm_client, frames, prompt):
    # Fan per-frame analysis out over a thread pool; tqdm shows progress in the console.
    with ThreadPoolExecutor() as executor:
        results = list(tqdm(
            executor.map(lambda frame: process_frame(llm_client, frame, prompt), frames),
            total=len(frames),
            desc="Processing frames"
        ))
    return results

def generate_final_summary(llm_client, frame_summaries):
    # Merge the per-frame summaries into one cohesive video summary using the Cohere model.
    combined_summaries = "\n".join(frame_summaries)
    final_prompt = (
        "You are a video content summarizer. Below are summaries of individual frames extracted from a video. "
        "Using these frame summaries, create a cohesive and concise summary that describes the content of the video as a whole. "
        "Focus on providing insights about the overall theme, events, or key details present in the video, and avoid referring to individual frames or images explicitly.\n\n"
        f"{combined_summaries}"
    )
    try:
        llm_request = cohere_chat_request(user_prompt=final_prompt)
        llm_payload = get_chat_detail(llm_request, summarizeModel)
        llm_response = llm_client.chat(llm_payload)
        return llm_response.data.chat_response.text
    except Exception as e:
        return f"Error generating final summary: {str(e)}"

# Streamlit UI
st.title("Decode Images and Videos with OCI GenAI")
uploaded_file = st.sidebar.file_uploader("Upload an image or video", type=["png", "jpg", "jpeg", "mp4", "avi", "mov"])
user_prompt = st.text_input("Enter your prompt for analysis:", value="Describe the content of this image.")
lang_type = st.sidebar.selectbox("Output Language", ["English", "French", "Arabic", "Spanish", "Italian", "German", "Portuguese", "Japanese", "Korean", "Chinese"])

if uploaded_file:
    if uploaded_file.name.split('.')[-1].lower() in ["png", "jpg", "jpeg"]:
        # Image Analysis: save the upload to a temporary file, then send it to the vision model.
        temp_image_path = "temp_uploaded_image.jpg"
        with open(temp_image_path, "wb") as f:
            f.write(uploaded_file.getbuffer())

        st.image(temp_image_path, caption="Uploaded Image", width=500)

        if st.button("Generate Image Summary"):
            with st.spinner("Analyzing the image..."):
                try:
                    encoded_image = encode_image(temp_image_path)
                    llm_request = get_chat_request(encoded_image, user_prompt)
                    llm_payload = get_chat_detail(llm_request, visionModel)
                    llm_response = llm_client.chat(llm_payload)
                    llm_text = llm_response.data.chat_response.choices[0].message.content[0].text
                    st.success("OCI GenAI Response:")
                    st.write(llm_text)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")
    elif uploaded_file.name.split('.')[-1].lower() in ["mp4", "avi", "mov"]:

        # Video Analysis: persist the upload first so it can be embedded and processed.
        temp_video_path = "temp_uploaded_video.mp4"
        with open(temp_video_path, "wb") as f:
            f.write(uploaded_file.getbuffer())

        # Embed the saved video in the page as a base64 data URL.
        video_html = f"""
        <video width="600" height="300" controls>
            <source src="data:video/mp4;base64,{base64.b64encode(open(temp_video_path, 'rb').read()).decode()}" type="video/mp4">
            Your browser does not support the video tag.
        </video>
        """
        st.markdown(video_html, unsafe_allow_html=True)

        # st.video(temp_video_path)
        st.write("Processing the video...")

        frame_interval = st.sidebar.slider("Frame extraction interval (seconds)", 1, 10, 1)
        frames = extract_frames(temp_video_path, interval=frame_interval)
        num_frames = len(frames)
        st.write(f"Total frames extracted: {num_frames}")

        frame_range = st.sidebar.slider("Select frame range for analysis", 0, num_frames - 1, (0, num_frames - 1))

        if st.button("Generate Video Summary"):
            with st.spinner("Analyzing selected frames..."):
                try:
                    selected_frames = frames[frame_range[0]:frame_range[1] + 1]
                    waiting_message = st.empty()
                    waiting_message.write(f"Selected {len(selected_frames)} frames for processing.")
                    frame_summaries = process_frames_parallel(llm_client, selected_frames, user_prompt)
                    waiting_message.empty()
                    waiting_message.write("Generating final video summary...")
                    final_summary = generate_final_summary(llm_client, frame_summaries)
                    waiting_message.empty()
                    st.success("Video Summary:")
                    st.write(final_summary)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")

Binary image files added (343 KB, 455 KB, 619 KB); previews not rendered.

ai/generative-ai-service/decode-Images-and-Videos-with-OCI-GenAI/requirements.txt

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
streamlit==1.33.0
oci==3.50.1
Pillow
opencv-python-headless==4.10.0.84
tqdm==4.66.1

0 commit comments
