
Commit cd31614

LlamaIndex Workflows with routing and multisource RAG (#222)

* Multisource + routing initial version
* Running black
* Removing unnecessary prompt details
* Updating starters
* Remove unnecessary styling

1 parent b2482e1 · commit cd31614

21 files changed: +946 −0 lines changed
.chainlit/config.toml
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
[project]
# Whether to enable telemetry (default: true). No personal data is collected.
enable_telemetry = true

# List of environment variables to be provided by each user to use the app.
user_env = []

# Duration (in seconds) during which the session is saved when the connection is lost
session_timeout = 3600

# Enable third-party caching (e.g. LangChain cache)
cache = false

# Authorized origins
allow_origins = ["*"]

# Follow symlink for asset mount (see https://github.com/Chainlit/chainlit/issues/317)
# follow_symlink = false

[features]
# Process and display HTML in messages. This can be a security risk (see https://stackoverflow.com/questions/19603097/why-is-it-dangerous-to-render-user-generated-html-or-javascript)
unsafe_allow_html = false

# Process and display mathematical expressions. This can clash with "$" characters in messages.
latex = false

# Automatically tag threads with the current chat profile (if a chat profile is used)
auto_tag_thread = true

# Allow users to edit their own messages
edit_message = true

# Authorize users to spontaneously upload files with messages
[features.spontaneous_file_upload]
enabled = true
accept = ["*/*"]
max_files = 20
max_size_mb = 500

[features.audio]
# Threshold for audio recording
min_decibels = -45
# Delay for the user to start speaking in MS
initial_silence_timeout = 3000
# Delay for the user to continue speaking in MS. If the user stops speaking for this duration, the recording will stop.
silence_timeout = 1500
# Above this duration (MS), the recording will forcefully stop.
max_duration = 15000
# Duration of the audio chunks in MS
chunk_duration = 1000
# Sample rate of the audio
sample_rate = 44100

[UI]
# Name of the assistant.
name = "Assistant"

# Description of the assistant. This is used for HTML tags.
# description = ""

# Large-size content is collapsed by default for a cleaner UI
default_collapse_content = true

# Chain of Thought (CoT) display mode. Can be "hidden", "tool_call" or "full".
cot = "full"

# Link to your github repo. This will add a github button in the UI's header.
# github = ""

# Specify a CSS file that can be used to customize the user interface.
# The CSS file can be served from the public directory or via an external link.
# custom_css = "/public/test.css"

# Specify a JavaScript file that can be used to customize the user interface.
# The JavaScript file can be served from the public directory.
# custom_js = "/public/test.js"

# Specify a custom font url.
# custom_font = "https://fonts.googleapis.com/css2?family=Inter:wght@400;500;700&display=swap"

# Specify a custom meta image url.
# custom_meta_image_url = "https://chainlit-cloud.s3.eu-west-3.amazonaws.com/logo/chainlit_banner.png"

# Specify a custom build directory for the frontend.
# This can be used to customize the frontend code.
# Be careful: If this is a relative path, it should not start with a slash.
# custom_build = "./public/build"

[UI.theme]
default = "dark"
#layout = "wide"
#font_family = "Inter, sans-serif"
# Override default MUI light theme. (Check theme.ts)
[UI.theme.light]
#background = "#FAFAFA"
#paper = "#FFFFFF"

[UI.theme.light.primary]
#main = "#F80061"
#dark = "#980039"
#light = "#FFE7EB"
[UI.theme.light.text]
#primary = "#212121"
#secondary = "#616161"

# Override default MUI dark theme. (Check theme.ts)
[UI.theme.dark]
#background = "#FAFAFA"
#paper = "#FFFFFF"

[UI.theme.dark.primary]
#main = "#F80061"
#dark = "#980039"
#light = "#FFE7EB"
[UI.theme.dark.text]
#primary = "#EEEEEE"
#secondary = "#BDBDBD"

[meta]
generated_by = "1.2.0"
.gitignore
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
.DS_Store
*.db
*.db.lock
.venv
.env

.chainlit/translations
README.md
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# LlamaIndex + NVIDIA
This project shows how to use NVIDIA's APIs for large language models with LlamaIndex Workflow and ingestion functionality. Chainlit provides the chat UI.

### Project highlights:
- Interfaces with chat completion and embedding models from [build.nvidia.com](https://build.nvidia.com)
- Routes queries based on whether they require access to a document database
- Answers queries using the [Perplexity API for web search](https://docs.perplexity.ai/home)
- Performs vector lookup [using Milvus Lite](https://milvus.io/docs/milvus_lite.md) (WIP)
- Stores user chat history with a local SQLite database (WIP)

### Technologies used:
- **Frontend**: Chainlit
- **Web search**: Perplexity API
- **LLM**: Llama 3.1 8B and Mistral Large 2
- **Database**: Milvus Lite
- **Chat application**: LlamaIndex Workflows

![System architecture diagram](architecture.png)

### Getting started
To run this code, make sure you have environment variables set for the following:
- `NVIDIA_API_KEY` for access to NVIDIA LLM APIs (required). You can set this by running `export NVIDIA_API_KEY="nvapi-*******************"`. If you don't have an API key, follow [these instructions](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/docs/api-catalog.md#get-an-api-key-for-the-accessing-models-on-the-api-catalog) to sign up for an NVIDIA AI Foundation developer account and obtain access.
- `PERPLEXITY_API_KEY` (optional) if you are interested in using Perplexity to answer queries using the web.

Then, clone this project and (optionally) create a new virtual environment in Python. Run `pip install -r requirements.txt` for the dependencies and begin the application using `chainlit run app.py` from this directory. The application should then be available at http://localhost:8000.
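Before launching, it can help to confirm the required key is visible to Python; the application performs an equivalent check at startup and will refuse to start without `NVIDIA_API_KEY`:

```python
# Quick environment sanity check (mirrors the startup validation in the
# project's configuration module).
import os

if not os.getenv("NVIDIA_API_KEY"):
    raise SystemExit("NVIDIA_API_KEY is not set; export it before `chainlit run app.py`.")
print("Optional PERPLEXITY_API_KEY set:", bool(os.getenv("PERPLEXITY_API_KEY")))
```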
### Design
This project uses Chainlit to host a combined frontend and backend. The chat logic is implemented as a LlamaIndex Workflow class, which runs the user's query through the following steps (sketched in code after the diagram below):
- Decide whether the user query warrants usage of the LLM with or without RAG (`QueryFlow.route_query`)
- If using RAG, the query is transformed (`QueryFlow.rewrite_query`) into a format better suited for web search and document retrieval, and a vector embedding is produced (`QueryFlow.embed_query`)
- Documents are retrieved from the document database (`QueryFlow.milvus_retrieve`)
- An answer is solicited from the Perplexity API (`QueryFlow.pplx_retrieve`)
- The results are combined and used to generate a final response, which is streamed to the user (`QueryFlow.synthesize_response`)

![Workflow diagram](diagram.png)
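
For orientation, here is a minimal sketch of how steps like these are wired together with LlamaIndex Workflows. The step names mirror the list above, but the event type and bodies are illustrative placeholders, not the committed `workflow.py` implementation:

```python
# Illustrative skeleton only; routing logic and retrieval are stubbed out.
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)


class RAGEvent(Event):
    """Carries a rewritten query into the retrieval steps (hypothetical)."""

    query: str


class QueryFlowSketch(Workflow):
    @step
    async def route_query(self, ev: StartEvent) -> RAGEvent | StopEvent:
        # Decide whether the query needs document/web retrieval.
        query = str(ev.get("query"))
        if "nvidia" in query.lower():  # placeholder routing rule
            return RAGEvent(query=query)
        return StopEvent(result=f"Direct LLM answer to: {query}")

    @step
    async def synthesize_response(self, ev: RAGEvent) -> StopEvent:
        # In the real workflow, Milvus and Perplexity results are combined
        # here and the answer is streamed back to the user.
        return StopEvent(result=f"RAG answer to: {ev.query}")
```

Workflows dispatch each `@step` based on its event-type annotations, which is how `route_query` can send a query either down the RAG path or directly to a final `StopEvent`.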
app.py
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# SPDX-FileCopyrightText: Copyright (c) 2023-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import time

import chainlit as cl
from dotenv import load_dotenv

from workflow import QueryFlow

load_dotenv()


@cl.on_chat_start
async def on_chat_start():
    # Each session gets its own message history and workflow instance.
    cl.user_session.set("message_history", [])
    workflow = QueryFlow(timeout=90, verbose=False)
    cl.user_session.set("workflow", workflow)


@cl.set_starters
async def set_starters():
    return [
        cl.Starter(
            label="Write a haiku about CPUs",
            message="Write a haiku about CPUs.",
            icon="/avatars/servers",
        ),
        cl.Starter(
            label="Write Docker Compose",
            message="Write a Docker Compose file for deploying a web app with a Redis cache and Postgres database",
            icon="/avatars/screen",
        ),
        cl.Starter(
            label="What NIMs are available?",
            message="Summarize the different large language models that have NVIDIA inference microservices (NIMs) available for them. List as many as you can.",
            icon="/avatars/container",
        ),
        cl.Starter(
            label="Summarize BioNeMo use cases",
            message="Write a table summarizing how customers are using BioNeMo. Use one sentence per customer and include columns for customer, industry, and use case. Make the table between 5 to 10 rows and relatively narrow.",
            icon="/avatars/dna",
        ),
    ]


@cl.on_chat_end
def end():
    logging.info("Chat ended.")


@cl.on_message
async def main(user_message: cl.Message):
    """
    Executes when a user sends a message. The message is sent to the LlamaIndex
    workflow for a streaming answer; the workflow also returns the source nodes
    it used, which can be surfaced alongside the response.
    """
    msg_start_time = time.time()
    logging.info(f"Received message: <{user_message.content[0:50]}...> ")
    message_history = cl.user_session.get("message_history", [])

    # Use the per-session workflow created in on_chat_start.
    workflow: QueryFlow = cl.user_session.get("workflow")

    assistant_message = cl.Message(content="")

    token_count = 0
    with cl.Step(name="Mistral Large 2", type="tool"):
        response, source_nodes = await workflow.run(
            query=user_message.content,
            chat_messages=message_history,
        )

    # Stream the response to the user token by token.
    async for chunk in response:
        token_count += 1
        await assistant_message.stream_token(chunk.delta)

    msg_time = time.time() - msg_start_time
    logging.info(f"Message of {token_count} tokens generated in {msg_time:.1f} seconds.")

    message_history += [
        {"role": "user", "content": user_message.content},
        {"role": "assistant", "content": assistant_message.content},
    ]
    cl.user_session.set("message_history", message_history)

    await assistant_message.send()
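

# Hedged smoke-test sketch: run `python app.py` directly (instead of
# `chainlit run app.py`) to exercise the workflow without the UI. This
# assumes the same QueryFlow.run() signature used in main() above.
if __name__ == "__main__":
    import asyncio

    async def smoke_test() -> None:
        flow = QueryFlow(timeout=45, verbose=False)
        response, _source_nodes = await flow.run(
            query="What is a NIM?", chat_messages=[]
        )
        async for chunk in response:
            print(chunk.delta, end="")
        print()

    asyncio.run(smoke_test())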
Binary file (52 KB) not shown.
chainlit.md
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Welcome to Chainlit! 🚀🤖

Hi there, Developer! 👋 We're excited to have you on board. Chainlit is a powerful tool designed to help you prototype, debug and share applications built on top of LLMs.

## Useful Links 🔗

- **Documentation:** Get started with our comprehensive [Chainlit Documentation](https://docs.chainlit.io) 📚
- **Discord Community:** Join our friendly [Chainlit Discord](https://discord.gg/k73SQ3FyUh) to ask questions, share your projects, and connect with other developers! 💬

We can't wait to see what you create with Chainlit! Happy coding! 💻😊

## Welcome screen

To modify the welcome screen, edit the `chainlit.md` file at the root of your project. If you do not want a welcome screen, just leave this file empty.
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
# SPDX-FileCopyrightText: Copyright (c) 2023-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

from pydantic import BaseModel, Field


class WorkflowConfig(BaseModel):
    perplexity_timeout: int = Field(
        default=20, description="Timeout in seconds for Perplexity API call"
    )
    source_field_name: str = Field(
        default="source_uri",
        description="Field name for source URI in document metadata",
    )
    display_field_name: str = Field(
        default="display_name",
        description="Field name for display name in document metadata",
    )
    n_messages_in_history: int = Field(
        default=6, description="Number of messages to include in chat history"
    )
    max_tokens_generated: int = Field(
        default=1024, description="Maximum number of tokens to generate in response"
    )
    context_window: int = Field(
        default=128_000, description="Size of the context window for the LLM"
    )

    chat_model_name: str = Field(
        default="mistralai/mistral-large-2-instruct",
        description="Model for final response synthesis.",
    )
    routing_model_name: str = Field(
        default="meta/llama-3.1-8b-instruct",
        description="Model for performing query routing; a smaller model suffices.",
    )
    perplexity_model_name: str = Field(
        default="llama-3.1-sonar-large-128k-online",
        description="Name of the Perplexity model; huge and small variants are also available.",
    )
    embedding_model_name: str = Field(
        default="nvidia/nv-embed-v1", description="Name of the embedding model"
    )
    embedding_model_dim: int = Field(
        default=4096, description="Dimension of the embedding model"
    )
    similarity_top_k: int = Field(
        default=5,
        description="Number of similar documents to return from vector search",
    )

    nvidia_api_key: str = Field(
        default=os.getenv("NVIDIA_API_KEY"), description="NVIDIA API key"
    )
    perplexity_api_key: str = Field(
        default=os.getenv("PERPLEXITY_API_KEY"),
        description="Perplexity API key (optional)",
    )

    data_dir: str = Field(
        default="data", description="Directory containing the documents to be indexed"
    )
    milvus_path: str = Field(
        default="db/milvus_lite.db", description="Path to the Milvus database"
    )

    def __init__(self, **data):
        super().__init__(**data)
        if not self.nvidia_api_key:
            raise ValueError("NVIDIA_API_KEY is required and must not be null")


config = WorkflowConfig()
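

# Hedged usage sketch: because WorkflowConfig is a pydantic model, any default
# above can be overridden at construction time. The committed code only uses
# the module-level `config` instance; the overrides below are illustrative.
if __name__ == "__main__":
    custom_config = WorkflowConfig(
        similarity_top_k=8,  # retrieve more candidates per query
        n_messages_in_history=10,  # carry a longer chat history
    )
    print(custom_config.milvus_path)  # unchanged default: "db/milvus_lite.db"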

community/routing-multisource-rag/data/.gitkeep

Whitespace-only changes.
Binary file not shown.
Binary file not shown.

0 commit comments