Commit 1927be4

Merge branch 'master' into sg-turtle-game-space-invaders
2 parents 47f2dc7 + ae17768 commit 1927be4

127 files changed: +112026 -1 lines changed

langchain-rag-app/README.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
# Build an LLM RAG Chatbot With LangChain

This repo contains the source code for [Build an LLM RAG Chatbot With LangChain](https://realpython.com/build-llm-rag-chatbot-with-langchain/).

To run the final application that you'll build in this tutorial, you can use the code provided in `source_code_final/`.

## Setup

Create a `.env` file in the root directory and add the following environment variables:

```.env
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>

NEO4J_URI=<YOUR_NEO4J_URI>
NEO4J_USERNAME=<YOUR_NEO4J_USERNAME>
NEO4J_PASSWORD=<YOUR_NEO4J_PASSWORD>

HOSPITALS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/hospitals.csv
PAYERS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/payers.csv
PHYSICIANS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/physicians.csv
PATIENTS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/patients.csv
VISITS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/visits.csv
REVIEWS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/reviews.csv

HOSPITAL_AGENT_MODEL=gpt-3.5-turbo-1106
HOSPITAL_CYPHER_MODEL=gpt-3.5-turbo-1106
HOSPITAL_QA_MODEL=gpt-3.5-turbo-0125

CHATBOT_URL=http://host.docker.internal:8000/hospital-rag-agent
```

The chatbot uses OpenAI LLMs, so you'll need to create an [OpenAI API key](https://realpython.com/generate-images-with-dalle-openai-api/#get-your-openai-api-key) and store it as `OPENAI_API_KEY`.

The three `NEO4J_` variables are used to connect to your Neo4j AuraDB instance. Follow the directions [here](https://neo4j.com/cloud/platform/aura-graph-database/?ref=docs-nav-get-started) to create a free instance.

Once you have a running Neo4j instance and have filled out all the environment variables in `.env`, you can run the entire project with [Docker Compose](https://docs.docker.com/compose/). You can install Docker Compose by following [these directions](https://docs.docker.com/compose/install/).

With the environment variables filled in, your Neo4j AuraDB instance running, and Docker Compose installed, open a terminal and run:

```console
$ docker-compose up --build
```

After each container finishes building, you'll be able to access the chatbot API at `http://localhost:8000/docs` and the Streamlit app at `http://localhost:8501/`.
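
Note on the README setup: the services read their configuration from the variables listed in `.env`, so a missing entry tends to surface only at runtime. The following is a minimal sketch of a pre-flight check, assuming the variables have already been loaded into the process environment (for example by Docker Compose). The helper name `missing_env_vars` and the choice of which variables count as required are assumptions for illustration, not part of the commit.

```python
import os

# Variable names taken from the README's .env listing (CSV paths omitted here).
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "HOSPITAL_AGENT_MODEL",
    "HOSPITAL_CYPHER_MODEL",
    "HOSPITAL_QA_MODEL",
    "CHATBOT_URL",
]


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the required names that are unset or blank in `env`."""
    return [name for name in required if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running a check like this before `docker-compose up --build` would flag an incomplete `.env` early instead of failing inside a container.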
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# chatbot_api/Dockerfile

FROM python:3.11-slim

WORKDIR /app
COPY ./src/ /app

COPY ./pyproject.toml /code/pyproject.toml
RUN pip install /code/.

EXPOSE 8000
CMD ["sh", "entrypoint.sh"]
Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
[project]
2+
name = "chatbot_api"
3+
version = "0.1"
4+
dependencies = [
5+
"asyncio==3.4.3",
6+
"fastapi==0.109.0",
7+
"langchain==0.1.0",
8+
"langchain-openai==0.0.2",
9+
"langchainhub==0.1.14",
10+
"neo4j==5.14.1",
11+
"numpy==1.26.2",
12+
"openai==1.7.2",
13+
"opentelemetry-api==1.22.0",
14+
"pydantic==2.5.1",
15+
"uvicorn==0.25.0"
16+
]
17+
18+
[project.optional-dependencies]
19+
dev = ["black", "flake8"]
Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
import os
2+
3+
from chains.hospital_cypher_chain import hospital_cypher_chain
4+
from chains.hospital_review_chain import reviews_vector_chain
5+
from langchain import hub
6+
from langchain.agents import AgentExecutor, Tool, create_openai_functions_agent
7+
from langchain_openai import ChatOpenAI
8+
from tools.wait_times import (
9+
get_current_wait_times,
10+
get_most_available_hospital,
11+
)
12+
13+
HOSPITAL_AGENT_MODEL = os.getenv("HOSPITAL_AGENT_MODEL")
14+
15+
hospital_agent_prompt = hub.pull("hwchase17/openai-functions-agent")
16+
17+
tools = [
18+
Tool(
19+
name="Experiences",
20+
func=reviews_vector_chain.invoke,
21+
description="""Useful when you need to answer questions
22+
about patient experiences, feelings, or any other qualitative
23+
question that could be answered about a patient using semantic
24+
search. Not useful for answering objective questions that involve
25+
counting, percentages, aggregations, or listing facts. Use the
26+
entire prompt as input to the tool. For instance, if the prompt is
27+
"Are patients satisfied with their care?", the input should be
28+
"Are patients satisfied with their care?".
29+
""",
30+
),
31+
Tool(
32+
name="Graph",
33+
func=hospital_cypher_chain.invoke,
34+
description="""Useful for answering questions about patients,
35+
physicians, hospitals, insurance payers, patient review
36+
statistics, and hospital visit details. Use the entire prompt as
37+
input to the tool. For instance, if the prompt is "How many visits
38+
have there been?", the input should be "How many visits have
39+
there been?".
40+
""",
41+
),
42+
Tool(
43+
name="Waits",
44+
func=get_current_wait_times,
45+
description="""Use when asked about current wait times
46+
at a specific hospital. This tool can only get the current
47+
wait time at a hospital and does not have any information about
48+
aggregate or historical wait times. Do not pass the word "hospital"
49+
as input, only the hospital name itself. For example, if the prompt
50+
is "What is the current wait time at Jordan Inc Hospital?", the
51+
input should be "Jordan Inc".
52+
""",
53+
),
54+
Tool(
55+
name="Availability",
56+
func=get_most_available_hospital,
57+
description="""
58+
Use when you need to find out which hospital has the shortest
59+
wait time. This tool does not have any information about aggregate
60+
or historical wait times. This tool returns a dictionary with the
61+
hospital name as the key and the wait time in minutes as the value.
62+
""",
63+
),
64+
]
65+
66+
chat_model = ChatOpenAI(
67+
model=HOSPITAL_AGENT_MODEL,
68+
temperature=0,
69+
)
70+
71+
hospital_rag_agent = create_openai_functions_agent(
72+
llm=chat_model,
73+
prompt=hospital_agent_prompt,
74+
tools=tools,
75+
)
76+
77+
hospital_rag_agent_executor = AgentExecutor(
78+
agent=hospital_rag_agent,
79+
tools=tools,
80+
return_intermediate_steps=True,
81+
verbose=True,
82+
)
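
Note on usage: the executor defined above is a LangChain runnable, so it can be called with `invoke` and a dictionary holding the user's question under `"input"`. Because `return_intermediate_steps=True` is set, the response should also carry the tool calls the agent made. The sketch below wraps that call; the helper name `ask_agent` and the shape of its return value are assumptions for illustration, and the function accepts any object with a compatible `invoke` method.

```python
from typing import Any, Dict


def ask_agent(executor: Any, question: str) -> Dict[str, Any]:
    """Send one question to an agent executor and unpack the response."""
    # AgentExecutor expects the user's text under the "input" key.
    response = executor.invoke({"input": question})
    return {
        # Final natural-language answer produced by the agent.
        "answer": response["output"],
        # Tool invocations, present when return_intermediate_steps=True.
        "steps": response.get("intermediate_steps", []),
    }
```

With the real executor this might be called as `ask_agent(hospital_rag_agent_executor, "What is the current wait time at Jordan Inc Hospital?")`, which requires valid OpenAI and Neo4j credentials.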
Lines changed: 157 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,157 @@
1+
import os
2+
3+
from langchain.chains import GraphCypherQAChain
4+
from langchain.prompts import PromptTemplate
5+
from langchain_community.graphs import Neo4jGraph
6+
from langchain_openai import ChatOpenAI
7+
8+
HOSPITAL_QA_MODEL = os.getenv("HOSPITAL_QA_MODEL")
9+
HOSPITAL_CYPHER_MODEL = os.getenv("HOSPITAL_CYPHER_MODEL")
10+
11+
graph = Neo4jGraph(
12+
url=os.getenv("NEO4J_URI"),
13+
username=os.getenv("NEO4J_USERNAME"),
14+
password=os.getenv("NEO4J_PASSWORD"),
15+
)
16+
17+
graph.refresh_schema()
18+
19+
cypher_generation_template = """
20+
Task:
21+
Generate Cypher query for a Neo4j graph database.
22+
23+
Instructions:
24+
Use only the provided relationship types and properties in the schema.
25+
Do not use any other relationship types or properties that are not provided.
26+
27+
Schema:
28+
{schema}
29+
30+
Note:
31+
Do not include any explanations or apologies in your responses.
32+
Do not respond to any questions that might ask anything other than
33+
for you to construct a Cypher statement. Do not include any text except
34+
the generated Cypher statement. Make sure the direction of the relationship is
35+
correct in your queries. Make sure you alias both entities and relationships
36+
properly. Do not run any queries that would add to or delete from
37+
the database. Make sure to alias all statements that follow as with
38+
statement (e.g. WITH v as visit, c.billing_amount as billing_amount)
39+
If you need to divide numbers, make sure to
40+
filter the denominator to be non zero.
41+
42+
Examples:
43+
# Who is the oldest patient and how old are they?
44+
MATCH (p:Patient)
45+
RETURN p.name AS oldest_patient,
46+
duration.between(date(p.dob), date()).years AS age
47+
ORDER BY age DESC
48+
LIMIT 1
49+
50+
# Which physician has billed the least to Cigna
51+
MATCH (p:Payer)<-[c:COVERED_BY]-(v:Visit)-[t:TREATS]-(phy:Physician)
52+
WHERE p.name = 'Cigna'
53+
RETURN phy.name AS physician_name, SUM(c.billing_amount) AS total_billed
54+
ORDER BY total_billed
55+
LIMIT 1
56+
57+
# Which state had the largest percent increase in Cigna visits
58+
# from 2022 to 2023?
59+
MATCH (h:Hospital)<-[:AT]-(v:Visit)-[:COVERED_BY]->(p:Payer)
60+
WHERE p.name = 'Cigna' AND v.admission_date >= '2022-01-01' AND
61+
v.admission_date < '2024-01-01'
62+
WITH h.state_name AS state, COUNT(v) AS visit_count,
63+
SUM(CASE WHEN v.admission_date >= '2022-01-01' AND
64+
v.admission_date < '2023-01-01' THEN 1 ELSE 0 END) AS count_2022,
65+
SUM(CASE WHEN v.admission_date >= '2023-01-01' AND
66+
v.admission_date < '2024-01-01' THEN 1 ELSE 0 END) AS count_2023
67+
WITH state, visit_count, count_2022, count_2023,
68+
(toFloat(count_2023) - toFloat(count_2022)) / toFloat(count_2022) * 100
69+
AS percent_increase
70+
RETURN state, percent_increase
71+
ORDER BY percent_increase DESC
72+
LIMIT 1
73+
74+
# How many non-emergency patients in North Carolina have written reviews?
75+
match (r:Review)<-[:WRITES]-(v:Visit)-[:AT]->(h:Hospital)
76+
where h.state_name = 'NC' and v.admission_type <> 'Emergency'
77+
return count(*)
78+
79+
String category values:
80+
Test results are one of: 'Inconclusive', 'Normal', 'Abnormal'
81+
Visit statuses are one of: 'OPEN', 'DISCHARGED'
82+
Admission Types are one of: 'Elective', 'Emergency', 'Urgent'
83+
Payer names are one of: 'Cigna', 'Blue Cross', 'UnitedHealthcare', 'Medicare',
84+
'Aetna'
85+
86+
A visit is considered open if its status is 'OPEN' and the discharge date is
87+
missing.
88+
Use abbreviations when
89+
filtering on hospital states (e.g. "Texas" is "TX",
90+
"Colorado" is "CO", "North Carolina" is "NC",
91+
"Florida" is "FL", "Georgia" is "GA, etc.)
92+
93+
Make sure to use IS NULL or IS NOT NULL when analyzing missing properties.
94+
Never return embedding properties in your queries. You must never include the
95+
statement "GROUP BY" in your query. Make sure to alias all statements that
96+
follow as with statement (e.g. WITH v as visit, c.billing_amount as
97+
billing_amount)
98+
If you need to divide numbers, make sure to filter the denominator to be non
99+
zero.
100+
101+
The question is:
102+
{question}
103+
"""
104+
105+
cypher_generation_prompt = PromptTemplate(
106+
input_variables=["schema", "question"], template=cypher_generation_template
107+
)
108+
109+
qa_generation_template = """You are an assistant that takes the results
110+
from a Neo4j Cypher query and forms a human-readable response. The
111+
query results section contains the results of a Cypher query that was
112+
generated based on a users natural language question. The provided
113+
information is authoritative, you must never doubt it or try to use
114+
your internal knowledge to correct it. Make the answer sound like a
115+
response to the question.
116+
117+
Query Results:
118+
{context}
119+
120+
Question:
121+
{question}
122+
123+
If the provided information is empty, say you don't know the answer.
124+
Empty information looks like this: []
125+
126+
If the information is not empty, you must provide an answer using the
127+
results. If the question involves a time duration, assume the query
128+
results are in units of days unless otherwise specified.
129+
130+
When names are provided in the query results, such as hospital names,
131+
beware of any names that have commas or other punctuation in them.
132+
For instance, 'Jones, Brown and Murray' is a single hospital name,
133+
not multiple hospitals. Make sure you return any list of names in
134+
a way that isn't ambiguous and allows someone to tell what the full
135+
names are.
136+
137+
Never say you don't have the right information if there is data in
138+
the query results. Make sure to show all the relevant query results
139+
if you're asked.
140+
141+
Helpful Answer:
142+
"""
143+
144+
qa_generation_prompt = PromptTemplate(
145+
input_variables=["context", "question"], template=qa_generation_template
146+
)
147+
148+
hospital_cypher_chain = GraphCypherQAChain.from_llm(
149+
cypher_llm=ChatOpenAI(model=HOSPITAL_CYPHER_MODEL, temperature=0),
150+
qa_llm=ChatOpenAI(model=HOSPITAL_QA_MODEL, temperature=0),
151+
graph=graph,
152+
verbose=True,
153+
qa_prompt=qa_generation_prompt,
154+
cypher_prompt=cypher_generation_prompt,
155+
validate_cypher=True,
156+
top_k=100,
157+
)
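
Note on the percent-increase example: the Cypher generation prompt asks the model to filter out zero denominators before dividing, and its third example computes `(count_2023 - count_2022) / count_2022 * 100`. The same arithmetic is sketched below in plain Python as a worked illustration. The function name `percent_increase` and the choice to return `None` for a zero baseline are assumptions, not behavior taken from the commit.

```python
from typing import Optional


def percent_increase(count_before: int, count_after: int) -> Optional[float]:
    """Percent change from count_before to count_after.

    Returns None when the baseline is zero, mirroring the prompt's
    instruction to filter out zero denominators before dividing.
    """
    if count_before == 0:
        return None
    return (count_after - count_before) / count_before * 100


# A state with 40 Cigna visits in 2022 and 50 in 2023.
print(percent_increase(40, 50))  # 25.0
# A state with no 2022 visits has no defined percent increase.
print(percent_increase(0, 5))  # None
```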
Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
import os
2+
3+
from langchain.chains import RetrievalQA
4+
from langchain.prompts import (
5+
ChatPromptTemplate,
6+
HumanMessagePromptTemplate,
7+
PromptTemplate,
8+
SystemMessagePromptTemplate,
9+
)
10+
from langchain.vectorstores.neo4j_vector import Neo4jVector
11+
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
12+
13+
HOSPITAL_QA_MODEL = os.getenv("HOSPITAL_QA_MODEL")
14+
15+
neo4j_vector_index = Neo4jVector.from_existing_graph(
16+
embedding=OpenAIEmbeddings(),
17+
url=os.getenv("NEO4J_URI"),
18+
username=os.getenv("NEO4J_USERNAME"),
19+
password=os.getenv("NEO4J_PASSWORD"),
20+
index_name="reviews",
21+
node_label="Review",
22+
text_node_properties=[
23+
"physician_name",
24+
"patient_name",
25+
"text",
26+
"hospital_name",
27+
],
28+
embedding_node_property="embedding",
29+
)
30+
31+
review_template = """Your job is to use patient
32+
reviews to answer questions about their experience at
33+
a hospital. Use the following context to answer questions.
34+
Be as detailed as possible, but don't make up any information
35+
that's not from the context. If you don't know an answer,
36+
say you don't know.
37+
{context}
38+
"""
39+
40+
review_system_prompt = SystemMessagePromptTemplate(
41+
prompt=PromptTemplate(
42+
input_variables=["context"], template=review_template
43+
)
44+
)
45+
46+
review_human_prompt = HumanMessagePromptTemplate(
47+
prompt=PromptTemplate(input_variables=["question"], template="{question}")
48+
)
49+
messages = [review_system_prompt, review_human_prompt]
50+
51+
review_prompt = ChatPromptTemplate(
52+
input_variables=["context", "question"], messages=messages
53+
)
54+
55+
reviews_vector_chain = RetrievalQA.from_chain_type(
56+
llm=ChatOpenAI(model=HOSPITAL_QA_MODEL, temperature=0),
57+
chain_type="stuff",
58+
retriever=neo4j_vector_index.as_retriever(k=12),
59+
)
60+
reviews_vector_chain.combine_documents_chain.llm_chain.prompt = review_prompt
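
Note on `chain_type="stuff"`: this chain type places all retrieved review documents into the single `{context}` slot of the system prompt before calling the model. The plain-Python sketch below illustrates that idea only; it is not LangChain's implementation. The function name `stuff_reviews`, the shortened template, and the blank-line separator are assumptions for illustration.

```python
from typing import List

# Shortened stand-in for the review prompt; only the {context} slot matters here.
REVIEW_TEMPLATE = (
    "Your job is to use patient reviews to answer questions about their "
    "experience at a hospital. Use the following context to answer "
    "questions.\n{context}\n"
)


def stuff_reviews(
    reviews: List[str],
    template: str = REVIEW_TEMPLATE,
    separator: str = "\n\n",
) -> str:
    """Join every retrieved review into one context string and fill the template."""
    context = separator.join(reviews)
    return template.format(context=context)


if __name__ == "__main__":
    print(stuff_reviews(["The staff were kind.", "The wait was long."]))
```

Because every retrieved document lands in one prompt, the number of reviews retrieved directly affects prompt length, which is worth keeping in mind when choosing how many documents the retriever returns.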
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
#!/bin/bash
2+
3+
# Run any setup steps or pre-processing tasks here
4+
echo "Starting hospital RAG FastAPI service..."
5+
6+
# Start the main application
7+
uvicorn main:app --host 0.0.0.0 --port 8000
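
Note on calling the service: once uvicorn is listening on port 8000, the `CHATBOT_URL` value in the README suggests the agent is served at `/hospital-rag-agent`. The sketch below builds a JSON request for that route using only the standard library. The FastAPI app in `main.py` is not shown in this part of the diff, so the HTTP method (POST) and the payload field name (`"text"`) are assumptions that should be checked against the generated docs at `http://localhost:8000/docs`.

```python
import json
import urllib.request


def build_chat_request(
    question: str,
    base_url: str = "http://localhost:8000",
    field: str = "text",  # hypothetical payload key; confirm against /docs
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the agent endpoint."""
    payload = json.dumps({field: question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/hospital-rag-agent",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the containers running, the request could be sent with `urllib.request.urlopen(build_chat_request("How many visits have there been?"))`.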
