
Commit e4d8c16

Example langchain update (#7108)
* minor updates to langchain demo
* update langchain example
1 parent b1f3c7c commit e4d8c16

File tree: 8 files changed, +157 -197 lines changed


examples/langchain/.env.example

Lines changed: 4 additions & 7 deletions

@@ -1,7 +1,4 @@
-OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-CUBE_API_URL=https://anonymous-colstrip.gcp-us-central1.cubecloudapp.dev/cubejs-api/v1
-CUBE_API_SECRET=SECRET
-DATABASE_URL=postgresql://cube:[email protected]:5432/anonymous-colstrip
-LANGCHAIN_TRACING_V2=true
-LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
-LANGCHAIN_API_KEY=ls__XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXXXXXX
+CUBE_API_URL=https://example-url.gcp-us-central1.cubecloudapp.dev/cubejs-api/v1
+CUBE_API_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+DATABASE_URL=postgresql://cube:[email protected]:5432/example
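These variables are read at startup with python-dotenv and used to sign the Cube API token. A minimal sketch of that flow, assuming a filled-in `.env` (the empty security context mirrors `ingest_cube_meta` in `streamlit_app.py` below):

```python
import os

import jwt  # PyJWT, as pinned in requirements.txt
from dotenv import load_dotenv

# Pull OPENAI_API_KEY, CUBE_API_URL, CUBE_API_SECRET, and DATABASE_URL
# from .env into the process environment.
load_dotenv()

# Cube's API expects a JWT signed with the deployment's API secret;
# an empty security context is enough for this demo.
token = jwt.encode({}, os.environ["CUBE_API_SECRET"], algorithm="HS256")
```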

examples/langchain/.gitignore

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@
+.env
+__pycache__
+vectorstore.pkl

examples/langchain/README.md

Lines changed: 23 additions & 11 deletions

@@ -1,18 +1,30 @@
-# Cube and Langchain demo app
+# Tabular Data Retrieval
 
-This is an example of a chatbot built with Cube, Langchain, and Streamlit.
+This is an example of a chatbot built with Cube, Langchain, Snowflake and Streamlit.
 
-[Why use a semantic layer with LLM for chatbots?](https://cube.dev/blog/semantic-layer-the-backbone-of-ai-powered-data-experiences)
+[Check this app deployed on Streamlit Cloud.](https://cube-langchain.streamlit.app/)
 
-## Pre-requisites
+## Why Semantic Layer for LLM-powered apps?
 
-- Valid Cube Cloud deployment. Your data model should have at least one view.
-- This example uses OpenAI API, so you'll need an OpenAI API key.
-- Python version `>=` 3.8
+When building text-to-SQL applications, it is crucial to provide the LLM with rich context about the underlying data model. Without enough context, even humans find data hard to comprehend, and the LLM will simply compound that confusion into wrong answers.
 
-## How to run
+In many cases it is not enough to feed the LLM a database schema and expect it to generate correct SQL. To operate correctly and execute trustworthy actions, it needs enough context and semantics about the data it consumes; it must understand the metrics, dimensions, entities, and relational aspects of the data by which it's powered. Basically, the LLM needs a semantic layer.
 
+![architecture](https://ucarecdn.com/32e98c8b-a920-4620-a8d2-05d57618db8e/)
+
+[Read more on why to use a semantic layer with LLM-powered apps.](https://cube.dev/blog/semantic-layer-the-backbone-of-ai-powered-data-experiences)
+
+## Getting Started
+
+- **Cube project**. If you don't have a Cube project already, you can follow [this tutorial](https://cube.dev/docs/product/getting-started/cloud) to get started with the sample e-commerce data model.
+- **OpenAI API**. This example uses the OpenAI API, so you'll need an OpenAI API key.
+- Make sure you have Python version >= 3.8
 - Install dependencies: `pip install -r requirements.txt`
-- Copy `.env.example` as `.env` and fill it in with your credentials
-- Run `python ingest.py`. It will use the `CubeSemanticLoader` Langchain library to load metadata and save it in a vectorstore
-- Run `streamlit run main.py`
+- Copy `.env.example` as `.env` and fill it in with your credentials. You need an OpenAI API key and credentials to access your Cube deployment.
+- Run `streamlit run streamlit_app.py`
+
+## Community
+If you have any questions or need help, please [join our Slack community](https://slack.cube.dev/?ref=langchain-example-readme) of amazing developers and data engineers.
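The Getting Started steps above boil down to loading credentials and letting `CubeSemanticLoader` pull the data model's metadata. A minimal smoke test of that setup, a sketch assuming the `.env` file is filled in (it mirrors the ingestion code in `streamlit_app.py` below):

```python
import os

import jwt
from dotenv import load_dotenv
from langchain.document_loaders import CubeSemanticLoader

load_dotenv()

token = jwt.encode({}, os.environ["CUBE_API_SECRET"], algorithm="HS256")
loader = CubeSemanticLoader(os.environ["CUBE_API_URL"], token)

# Each document describes one column of a Cube view; the app later
# embeds these and retrieves them to build the LLM prompt.
documents = loader.load()
print(f"Loaded {len(documents)} metadata documents from Cube")
```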

examples/langchain/ingest.py

Lines changed: 0 additions & 32 deletions
This file was deleted.

examples/langchain/main.py

Lines changed: 0 additions & 142 deletions
This file was deleted.

examples/langchain/requirements.txt

Lines changed: 3 additions & 3 deletions

@@ -2,9 +2,9 @@ streamlit
 pandas
 python-dotenv
 langchain
-psycopg2
 pathlib
 PyJWT
-faiss-cpu
 openai
-tiktoken
+tiktoken
+faiss-cpu
+psycopg2-binary
examples/langchain/streamlit_app.py

Lines changed: 123 additions & 0 deletions

@@ -0,0 +1,123 @@
+import streamlit as st
+import pandas as pd
+import os
+import re
+import pickle
+import jwt
+
+from dotenv import load_dotenv
+from langchain import OpenAI
+from langchain.embeddings import OpenAIEmbeddings
+from langchain.vectorstores.faiss import FAISS
+from langchain.document_loaders import CubeSemanticLoader
+from pathlib import Path
+
+from utils import (
+    create_docs_from_values,
+    create_vectorstore,
+    init_vectorstore,
+    check_input,
+    log,
+    call_sql_api,
+    CUBE_SQL_API_PROMPT,
+    _NO_ANSWER_TEXT,
+    PROMPT_POSTFIX,
+)
+
+load_dotenv()
+
+def ingest_cube_meta():
+    security_context = {}
+    token = jwt.encode(security_context, os.environ["CUBE_API_SECRET"], algorithm="HS256")
+
+    loader = CubeSemanticLoader(os.environ["CUBE_API_URL"], token)
+    documents = loader.load()
+
+    embeddings = OpenAIEmbeddings()
+    vectorstore = FAISS.from_documents(documents, embeddings)
+
+    # Save the vectorstore to disk so subsequent runs can skip ingestion
+    with open("vectorstore.pkl", "wb") as f:
+        pickle.dump(vectorstore, f)
+
+if not Path("vectorstore.pkl").exists():
+    with st.spinner('Loading context from Cube API...'):
+        ingest_cube_meta()
+
+llm = OpenAI(
+    temperature=0, openai_api_key=os.environ.get("OPENAI_API_KEY"), verbose=True
+)
+
+st.title("Cube and LangChain demo 🤖🚀")
+
+multi = '''
+Follow [this tutorial on Github](https://github.com/cube-js/cube/tree/master/examples/langchain) to clone this project and run it locally.
+
+You can use these sample questions to quickly test the demo --
+* How many orders?
+* How many completed orders?
+* What are top selling product categories?
+* What product category drives the highest average order value?
+'''
+st.markdown(multi)
+
+question = st.text_input(
+    "Your question: ", placeholder="Ask me anything ...", key="input"
+)
+
+if st.button("Submit", type="primary"):
+    check_input(question)
+    vectorstore = init_vectorstore()
+
+    # log("Querying vectorstore and building the prompt...")
+
+    docs = vectorstore.similarity_search(question)
+    # Take the first document as the best guess for the relevant view
+    table_name = docs[0].metadata["table_name"]
+
+    # Columns
+    columns_question = "All available columns"
+    column_docs = vectorstore.similarity_search(
+        columns_question, filter=dict(table_name=table_name), k=15
+    )
+
+    lines = []
+    for column_doc in column_docs:
+        column_title = column_doc.metadata["column_title"]
+        column_name = column_doc.metadata["column_name"]
+        column_data_type = column_doc.metadata["column_data_type"]
+        print(column_name)
+        lines.append(
+            f"title: {column_title}, column name: {column_name}, datatype: {column_data_type}, member type: {column_doc.metadata['column_member_type']}"
+        )
+    columns = "\n\n".join(lines)
+
+    # Construct the prompt
+    prompt = CUBE_SQL_API_PROMPT.format(
+        input_question=question,
+        table_info=table_name,
+        columns_info=columns,
+        top_k=1000,
+        no_answer_text=_NO_ANSWER_TEXT,
+    )
+
+    # Call the LLM API to get the SQL query
+    log("Calling LLM API to generate SQL query...")
+    llm_answer = llm(prompt)
+    bare_llm_answer = re.sub(r"(?i)Answer:\s*", "", llm_answer)
+
+    if llm_answer.strip() == _NO_ANSWER_TEXT:
+        st.stop()
+
+    sql_query = llm_answer
+
+    log("Query generated by LLM:")
+    st.info(sql_query)
+
+    # Call the Cube SQL API
+    log("Sending the above query to Cube...")
+    columns, rows = call_sql_api(sql_query)
+
+    # Display the result
+    df = pd.DataFrame(rows, columns=columns)
+    st.table(df)
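Note that `CUBE_SQL_API_PROMPT` lives in `utils.py` and is not shown in this diff; only its input variables are visible from the `.format()` call above. A hypothetical sketch of what such a template could look like (the wording is an assumption, not the repo's actual prompt):

```python
from langchain.prompts import PromptTemplate

# Hypothetical stand-in for utils.CUBE_SQL_API_PROMPT; the input
# variables match the .format() call in streamlit_app.py, but the
# template text itself is assumed.
CUBE_SQL_API_PROMPT = PromptTemplate(
    input_variables=[
        "input_question",
        "table_info",
        "columns_info",
        "top_k",
        "no_answer_text",
    ],
    template="""Given the table "{table_info}" with these columns:

{columns_info}

Write a SQL query (at most {top_k} rows) that answers: {input_question}
If the question cannot be answered with this table, reply "{no_answer_text}".
Answer:""",
)
```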

examples/langchain/utils.py

Lines changed: 1 addition & 2 deletions

@@ -106,8 +106,7 @@ def call_sql_api(sql_query: str):
 
     # Initializing Cube SQL API connection
     connection = psycopg2.connect(CONN_STR)
-
-    log("Running query...")
+
     cursor = connection.cursor()
     cursor.execute(sql_query)
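The diff shows only a fragment of `call_sql_api`. For context, a sketch of what the full function plausibly does around these lines, assuming `CONN_STR` is derived from `DATABASE_URL` and that the function returns the `(columns, rows)` pair unpacked in `streamlit_app.py`:

```python
import psycopg2

def call_sql_api(sql_query: str):
    # Cube's SQL API speaks the Postgres wire protocol, so a plain
    # psycopg2 connection works; CONN_STR comes from DATABASE_URL.
    connection = psycopg2.connect(CONN_STR)

    cursor = connection.cursor()
    cursor.execute(sql_query)

    # Column names come from the cursor metadata, rows from fetchall();
    # streamlit_app.py turns these into a DataFrame for display.
    columns = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()

    cursor.close()
    connection.close()
    return columns, rows
```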
