
Commit 2fc79b2

Merge branch 'recodehive:main' into main
2 parents: 23e5ccb + 797b396

42 files changed: +73193 / -546 lines
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# NaviBot-Voice-Assistant

To get started with NaviBot-Voice-Assistant, follow these steps:

1. Navigate to the project directory:

```bash
cd NaviBot-Voice-Assistant
```

2. Create and activate a virtual environment (optional):

```bash
conda create -n venv python=3.10
conda activate venv
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Configure environment variables:

Rename `.env-sample` to `.env` and replace the placeholder with your Google API key. Refer to [this site](https://ai.google.dev/tutorials/setup) to get your own key.
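
For reference, a minimal `.env` holds a single line; the key name `GOOGLE_API_KEY` is what the app code reads via `os.getenv`, and the value below is a placeholder:

```
GOOGLE_API_KEY=your-api-key-here
```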

5. Run the chatbot:

```bash
streamlit run app.py
```
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
import streamlit as st
import pyttsx3
import speech_recognition as sr
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import google.generativeai as genai
from langchain_community.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv

load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Initialize pyttsx3 for voice output
engine = pyttsx3.init()

# Function to speak the text
def speak(text):
    engine.say(text)
    engine.runAndWait()

# Function to listen to voice input
def listen():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        st.write("Listening...")
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)

    try:
        user_input = r.recognize_google(audio)
        st.write(f"You said: {user_input}")
        return user_input
    except sr.UnknownValueError:
        st.write("Sorry, I could not understand what you said.")
        return None
    except sr.RequestError as e:
        st.write(f"Could not request results from Google Speech Recognition service; {e}")
        return None


def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            # extract_text() can return None for pages with no text layer
            text += page.extract_text() or ""
    return text


def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    chunks = text_splitter.split_text(text)
    return chunks


def get_vector_store(text_chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
    vector_store.save_local("faiss_index")


def get_conversational_chain():
    prompt_template = """
    Answer the question as detailed as possible from the provided context, make sure to provide all the details, if the answer is not in
    provided context just say, "answer is not available in the context", don't provide the wrong answer\n\n
    Context:\n {context}?\n
    Question: \n{question}\n

    Answer:
    """

    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)

    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)

    return chain


def user_input(user_question):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

    # Load the local FAISS index with dangerous deserialization allowed
    new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
    docs = new_db.similarity_search(user_question)

    chain = get_conversational_chain()

    response = chain(
        {"input_documents": docs, "question": user_question},
        return_only_outputs=True
    )

    speak(response["output_text"])  # Speak the response
    st.write("Reply: ", response["output_text"])


def main():
    st.set_page_config("Beyond GPS Navigation")
    st.header("Beyond GPS Navigator for Blind")

    user_question = st.text_input("Ask your query")
    voice_input_button = st.button("Voice Input")

    if voice_input_button:
        user_question = listen()  # Listen to voice input

    # Answer exactly once, whether the question was typed or spoken
    if user_question:
        user_input(user_question)

    with st.sidebar:
        st.title("Menu:")
        pdf_docs = st.file_uploader("Upload your route data and Click on the Submit & Process Button", accept_multiple_files=True)
        if st.button("Submit & Process"):
            with st.spinner("Processing..."):
                raw_text = get_pdf_text(pdf_docs)
                text_chunks = get_text_chunks(raw_text)
                get_vector_store(text_chunks)
                st.success("Done")


if __name__ == "__main__":
    main()
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
import os
import streamlit as st
from dotenv import load_dotenv
import google.generativeai as gen_ai
import speech_recognition as sr
from gtts import gTTS
from tempfile import NamedTemporaryFile

# Load environment variables
load_dotenv()

# Configure Streamlit page settings
st.set_page_config(
    page_title="Chat with Gemini-Pro!",
    page_icon=":brain:",  # Favicon emoji
    layout="centered",  # Page layout option
)

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# Set up Google Gemini-Pro AI model
gen_ai.configure(api_key=GOOGLE_API_KEY)
model = gen_ai.GenerativeModel('gemini-pro')

# Function to translate roles between Gemini-Pro and Streamlit terminology
def translate_role_for_streamlit(user_role):
    if user_role == "model":
        return "assistant"
    else:
        return user_role

# Function to recognize speech input
def recognize_speech():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        st.write("Speak now...")
        audio = r.listen(source)

    try:
        user_prompt = r.recognize_google(audio)
        st.write("You said:", user_prompt)
        return user_prompt
    except sr.UnknownValueError:
        st.write("Sorry, I could not understand your audio.")
        return ""
    except sr.RequestError as e:
        st.write("Could not request results from Google Speech Recognition service; {0}".format(e))
        return ""

# Function to output voice; gTTS produces MP3 audio, so write to a named
# .mp3 temp file that st.audio can read back by filename
def speak(text):
    tts = gTTS(text=text, lang='en')
    with NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        tts.write_to_fp(f)
        filename = f.name
    st.audio(filename, format='audio/mp3')


# Initialize chat session in Streamlit if not already present
if "chat_session" not in st.session_state:
    st.session_state.chat_session = model.start_chat(history=[])

# Display the chatbot's title on the page
st.title("🤖 Gemini Pro - ChatBot")

# Display the chat history
for message in st.session_state.chat_session.history:
    with st.chat_message(translate_role_for_streamlit(message.role)):
        st.markdown(message.parts[0].text)

# Input field for user's message
voice_input = st.checkbox("Voice Input")
if voice_input:
    user_prompt = recognize_speech()
else:
    user_prompt = st.text_input("Ask Gemini-Pro...")

if user_prompt:
    # Add user's message to chat and display it
    st.chat_message("user").markdown(user_prompt)

    # Send user's message to Gemini-Pro and get the response
    gemini_response = st.session_state.chat_session.send_message(user_prompt)

    # Display Gemini-Pro's response
    with st.chat_message("assistant"):
        st.markdown(gemini_response.text)
        speak(gemini_response.text)
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
import os
import streamlit as st
from dotenv import load_dotenv
import google.generativeai as gen_ai
import speech_recognition as sr
from gtts import gTTS
from tempfile import NamedTemporaryFile
import webbrowser
from urllib.parse import quote_plus

# Load environment variables
load_dotenv()

# Configure Streamlit page settings
st.set_page_config(
    page_title="Beyond GPS Navigator!",
    page_icon=":brain:",  # Favicon emoji
    layout="centered",  # Page layout option
)

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# Set up Google Gemini-Pro AI model
gen_ai.configure(api_key=GOOGLE_API_KEY)
model = gen_ai.GenerativeModel('gemini-pro')

# Function to translate roles between Gemini-Pro and Streamlit terminology
def translate_role_for_streamlit(user_role):
    if user_role == "model":
        return "assistant"
    else:
        return user_role

# Function to recognize speech input
def recognize_speech():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        st.write("Speak now...")
        audio = r.listen(source)

    try:
        user_prompt = r.recognize_google(audio)
        st.write("You said:", user_prompt)
        return user_prompt
    except sr.UnknownValueError:
        st.write("Sorry, I could not understand your audio.")
        return ""
    except sr.RequestError as e:
        st.write("Could not request results from Google Speech Recognition service; {0}".format(e))
        return ""

# Function to output voice; gTTS produces MP3 audio, so write to a named
# .mp3 temp file that st.audio can read back by filename
def speak(text):
    tts = gTTS(text=text, lang='en')
    with NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        tts.write_to_fp(f)
        filename = f.name
    st.audio(filename, format='audio/mp3')

# Get user's current location
current_location = st.text_input("What is your current location?")

# Ask the user for their destination once and cache it, so the app does not
# block on the microphone again on every Streamlit rerun
if "destination" not in st.session_state:
    st.session_state.destination = recognize_speech()
destination = st.session_state.destination

# Initialize chat session in Streamlit if not already present
if "chat_session" not in st.session_state:
    st.session_state.chat_session = model.start_chat(history=[])

# Display the chatbot's title on the page
st.title("🤖 Gemini Pro - ChatBot")

# Display the chat history
for message in st.session_state.chat_session.history:
    with st.chat_message(translate_role_for_streamlit(message.role)):
        st.markdown(message.parts[0].text)

# Input field for user's message
voice_input = st.checkbox("Voice Input")
if voice_input:
    user_prompt = recognize_speech()
else:
    user_prompt = st.text_input("Ask Gemini-Pro...")

if user_prompt:
    # Add user's message to chat and display it
    st.chat_message("user").markdown(user_prompt)

    # Send user's message to Gemini-Pro and get the response
    gemini_response = st.session_state.chat_session.send_message(user_prompt)

    # Display Gemini-Pro's response
    with st.chat_message("assistant"):
        st.markdown(gemini_response.text)
        speak(gemini_response.text)

    # If the response contains directions, open them in Google Maps
    if "directions" in gemini_response.text:
        # URL-encode origin and destination so spaces survive the query string
        directions_url = ("https://www.google.com/maps/dir/?api=1"
                          "&origin=" + quote_plus(current_location) +
                          "&destination=" + quote_plus(destination))
        webbrowser.open(directions_url)
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
streamlit
python-dotenv
google-generativeai
SpeechRecognition
gtts
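
Note: the PDF route-assistant app added above also imports pyttsx3, PyPDF2, langchain, langchain-google-genai, langchain-community, and FAISS, and `sr.Microphone` needs PyAudio; none of these are listed here. A sketch of the additional entries this file would likely need (package names inferred from the imports, not taken from the original commit):

```
pyttsx3
PyPDF2
PyAudio
langchain
langchain-google-genai
langchain-community
faiss-cpu
```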
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
## Dataset
Dataset link: https://www.kaggle.com/datasets/quadeer15sh/amur-tiger-reidentification?select=Amur+Tigers

### Description
The Amur tiger population is concentrated in the Far East, particularly the Russian Far East and Northeast China. The remaining wild population is estimated at 600 individuals, so conservation is of crucial importance, and re-identification of individuals plays an important role in wildlife conservation. With the help of WWF, a third-party company (MakerCollider) collected more than 8,000 Amur tiger video clips of 92 individuals from ~10 zoos in China. Bounding-box, keypoint-based pose, and identity annotations were then made for sampled video frames to formulate the ATRW (Amur Tiger Re-identification in the Wild) dataset. Figure 1 of the dataset paper illustrates example bounding-box and pose keypoint annotations, and Table 1 of the paper compares current wildlife re-ID datasets, of which ATRW is the largest to date. The dataset is divided into training, validation, and testing subsets: the training/validation subsets are released to the public along with their annotations, while the annotations for the test subset are withheld by the organizers. The dataset paper is available on arXiv: 1906.05586.

The dataset contains cropped images with manually annotated IDs and keypoints. As in most existing re-ID tasks, the plain re-ID task requires building models on the training set and evaluating them on the test set. During testing, each image is taken in turn as the query image, with all remaining images in the test set serving as the "gallery" (or "database"); the expected result for each query is a ranked list of the gallery images.
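
To make the query/gallery protocol concrete, here is a minimal sketch of rank-list evaluation; the `features` and `ids` arrays are random toy stand-ins for a real model's embeddings and the ATRW labels, not part of the dataset's official toolkit:

```python
# Plain re-ID protocol: each test image serves as the query, the remaining
# images form the gallery, and the output is a ranked list of gallery images
# by embedding similarity. Rank-1 accuracy counts how often the top match
# shares the query's identity.
import numpy as np

rng = np.random.default_rng(0)
num_images, dim = 20, 128
features = rng.normal(size=(num_images, dim))   # one embedding per test image
ids = rng.integers(0, 5, size=num_images)       # tiger identity labels

# L2-normalize so dot products are cosine similarities
features /= np.linalg.norm(features, axis=1, keepdims=True)

rank1_hits = 0
for q in range(num_images):
    gallery = np.delete(np.arange(num_images), q)   # all remaining images
    sims = features[gallery] @ features[q]          # query vs. gallery
    rank_list = gallery[np.argsort(-sims)]          # ranked list, best first
    rank1_hits += int(ids[rank_list[0]] == ids[q])

print(f"rank-1 accuracy: {rank1_hits / num_images:.2%}")
```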
[4 image attachments: 141 KB, 137 KB, 100 KB, 136 KB]
