Retrieval augmented generation chat bot

This folder contains the source code for a RAG chatbot that uses Amazon Kendra and Amazon Bedrock.

RAG architecture

The RAG design pattern is an extension of in-context learning (ICL) in which you connect a model to a knowledge base. Refer to the original paper, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, for a technical overview.

A user asks a question in a chatbot application
The chatbot sends the question to a retriever component
The retriever prepares and sends a search query to a knowledge base, which is an information retrieval (IR) engine. The IR engine can be implemented using any technology; very often it is a semantic search engine. Refer to How to Build an Open-Domain Question Answering System? for a deep dive on different IR approaches
The IR engine returns search results with document excerpts and links to relevant documents
The retriever sends the response to the chatbot/orchestrator
The chatbot/orchestrator sends the user question concatenated with the search results as a prompt to an LLM of your choice. Here it is important to understand the context length limitation and the prompt engineering approaches for the specific LLM
The LLM summarizes the information and uses the context in the prompt to provide a factual response
The chatbot sends the response back to the user
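
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the workshop implementation: `SearchResult`, `build_prompt`, `answer`, the prompt wording, and the character budget are all hypothetical names and values, and the retriever and the LLM are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchResult:
    """One search hit: a document excerpt plus a link to the document."""
    excerpt: str
    link: str


def build_prompt(question: str, results: List[SearchResult], max_chars: int = 2000) -> str:
    """Concatenate search results and the question within a size budget.

    max_chars is a crude, hypothetical stand-in for the LLM context length limit.
    """
    context_parts = []
    used = 0
    for result in results:
        entry = f"{result.excerpt} (source: {result.link})"
        if used + len(entry) > max_chars:
            break  # stop adding context once the budget is exhausted
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str], List[SearchResult]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context for the question, build the prompt, and call the LLM."""
    results = retrieve(question)
    prompt = build_prompt(question, results)
    return generate(prompt)
```

In a real application `retrieve` would call the IR engine and `generate` would call the LLM endpoint; a token-based budget would replace the character count.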

The following exhibit shows a RAG example:

Implementation

This section contains step-by-step instructions and all details needed to implement your first RAG-based generative AI application.

Architecture overview

Armed with the theoretical knowledge, you're about to implement the following architecture. You use AWS services as building blocks to implement a scalable, secure, and reliable solution.

Knowledge base

In this section you create and populate the knowledge base that you later connect to the chatbot.

Navigate to the AWS Cloud9 environment.

If you'd like to ingest some documents from an Amazon S3 bucket, you can create a dedicated bucket to be connected to Amazon Kendra:

aws s3 mb s3://document-storage-<your alias>-<account-id>-<region>
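
If you prefer Python over the AWS CLI, the same step could look roughly like the sketch below. The helper names `bucket_name` and `create_document_bucket` are hypothetical; the only AWS call used is the boto3 S3 `create_bucket` operation, and the client is passed in so that you create it yourself, for example with `boto3.client("s3", region_name=region)`.

```python
def bucket_name(alias: str, account_id: str, region: str) -> str:
    """Build the bucket name following the pattern used in the CLI command."""
    return f"document-storage-{alias}-{account_id}-{region}".lower()


def create_document_bucket(s3_client, alias: str, account_id: str, region: str) -> str:
    """Create the document bucket and return its name.

    s3_client is expected to be a boto3 S3 client for the target Region.
    """
    name = bucket_name(alias, account_id, region)
    if region == "us-east-1":
        # us-east-1 is the default location and must not be passed as a constraint
        s3_client.create_bucket(Bucket=name)
    else:
        s3_client.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return name
```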

Amazon Kendra

Amazon Kendra is an enterprise search service that uses semantic and contextual understanding to return relevant documents for a natural language search query.

In this workshop you use Amazon Kendra to implement the retriever part of the RAG chatbot application.

Create a Kendra index:

Navigate to the Amazon Kendra console
Choose Create an index
Provide a name for the index
Choose Create a new role, provide the role name suffix, choose Next
Choose No for token access control, choose Next
Choose Developer edition, choose Next
Review and choose Create

Wait until the Amazon Kendra index is created and ready:
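
As an alternative to the console, index creation and the wait can be scripted. The following is a rough sketch under assumptions, not the workshop's own tooling: the function names are hypothetical, the IAM role must already exist (the console creates it for you, this sketch does not), and the client is expected to be `boto3.client("kendra")`. It relies on the boto3 Kendra `create_index` and `describe_index` operations.

```python
import time


def create_developer_index(kendra_client, name: str, role_arn: str) -> str:
    """Create a Developer edition index and return its ID.

    role_arn is assumed to point to an existing IAM role that Kendra can assume.
    """
    response = kendra_client.create_index(
        Name=name,
        Edition="DEVELOPER_EDITION",
        RoleArn=role_arn,
    )
    return response["Id"]


def wait_for_index(kendra_client, index_id: str, poll_seconds: float = 30.0,
                   max_polls: int = 120, sleep=time.sleep) -> str:
    """Poll the index status until it is ACTIVE; raise on FAILED or timeout."""
    for _ in range(max_polls):
        status = kendra_client.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            raise RuntimeError(f"Kendra index {index_id} creation failed")
        sleep(poll_seconds)  # still CREATING (or another transitional state)
    raise TimeoutError(f"Kendra index {index_id} not ACTIVE after {max_polls} polls")
```

The polling interval and the maximum number of polls are arbitrary defaults; adjust them to your environment.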

TODO: add boto3/AWS CLI Kendra index creation

Amazon OpenSearch

🚧 Available in the next version of the workshop!

Ingestion - Amazon Kendra

You're going to ingest public press releases from the Swiss Government website https://www.admin.ch/ using the built-in Amazon Kendra Web Crawler connector v2.0.

To create a web crawler data source and ingest documents into the index:

Navigate to the created index in the Amazon Kendra console
Choose Add data source
Choose the Web Crawler v2.0 connector and click Add connector. You can choose any other connector to connect to a data source of your choice and ingest documents from that data source

The following instructions assume you use the Web Crawler to ingest documents from the site https://www.admin.ch/. If you use another data source or another web URL, configure the Amazon Kendra connector accordingly.

In the Add data source pane:

Provide a name for the data source, e.g. admin-ch-public, choose Next
Choose Source URLs and enter the following seed URLs:

https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=0
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=1
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=2
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=3
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=4
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=5
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=6
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=7
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=8
https://www.admin.ch/gov/en/start/documentation/media-releases.html?dyn_pageIndex=9

These URLs contain about 200 of the most recent press releases in English.
Choose No authentication
Choose Create a new role, provide a role suffix, choose Next
Provide the following settings on the Configure sync settings page:
- Sync domains with subdomains only
- Crawl depth: 1
- Check Include the files that has links to web pages
- Sync mode: Full sync
- Frequency: Run on demand
Leave the default settings on Set field mappings
Review and create

Choose Sync now

Crawling and document indexing take about 15 minutes. You don't need to wait until the sync finishes and can move on to the next task.
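
The sync can also be started and monitored from Python. The sketch below is an illustration only: the function names are hypothetical, the data source is assumed to exist already, and the client is expected to be `boto3.client("kendra")`. It uses the boto3 Kendra `start_data_source_sync_job` and `list_data_source_sync_jobs` operations.

```python
from typing import Optional


def start_sync(kendra_client, index_id: str, data_source_id: str) -> str:
    """Start an on-demand sync job and return its execution ID."""
    response = kendra_client.start_data_source_sync_job(
        Id=data_source_id,
        IndexId=index_id,
    )
    return response["ExecutionId"]


def latest_sync_status(kendra_client, index_id: str, data_source_id: str) -> Optional[str]:
    """Return the status of the most recently started sync job, or None if there is none."""
    response = kendra_client.list_data_source_sync_jobs(
        Id=data_source_id,
        IndexId=index_id,
    )
    history = response.get("History", [])
    if not history:
        return None
    # Pick the job with the latest start time rather than relying on list order
    latest = max(history, key=lambda job: job["StartTime"])
    return latest["Status"]
```

A status such as `SYNCING` or `SYNCING_INDEXING` means the job is still running; `SUCCEEDED` means the documents are indexed.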

TODO: add boto3 or AWS CLI kendra index creation and ingestion

Ingestion - OpenSearch

🚧 Available in the next version of the workshop!

Generator

Create a SageMaker inference LLM endpoint only if you'd like to experiment with SageMaker endpoints and JumpStart, otherwise move on to the Chatbot app section.

You use an LLM as a generator to generate answers to questions using the retrieved context. Navigate to SageMaker Studio and open the content/notebooks/llm-endpoints.ipynb notebook. Follow the instructions in the notebook to create an LLM real-time endpoint.

The deployment of an LLM real-time endpoint takes about 15 minutes.

Chatbot app

In this section you create a front-end app container to run on AWS Fargate.

Navigate to the Cloud9 environment.

The chatbot application source code is in the chatbot folder of the workshop. This folder contains the following files:

app.py: the frontend, built with the popular Streamlit framework
Dockerfile: the script for building the Docker image
requirements.txt: the dependencies that must be installed to host the frontend application
setup.sh: a setup script containing all the steps needed to create an ECR repository, build the Docker image, and push it to the repository you created

In Cloud9 terminal:

cd ~/environment/generative-ai-on-aws-architecture-patterns/content/lab-02/chatbot/
bash setup.sh

Make sure the container build and push finished successfully and the new image has been pushed to Amazon ECR.

Retriever and orchestration

The final and most complex part of the application connects all components, such as the LLM, the UX, and the retriever, and implements orchestration of the data flow.

The retriever in the RAG design pattern is responsible for sending search requests to the knowledge base or information retrieval engine and retrieving search results.

You use the LangChain framework to implement both the retriever and the orchestration layer.

The main components for the LangChain-based orchestrator are:

AmazonKendraRetriever
You use a built-in Amazon Kendra retriever in LangChain. This class provides an abstraction of a retriever component and allows LangChain to interact with Amazon Kendra as part of a conversation chain.

You also have a custom implementation of an Amazon Kendra retriever, the KendraIndexRetriever class in the orchestration/rag-app/kendra folder. This class is not used in the workshop. The implementation is provided for your reference; you can use your own retriever if you have specific requirements.
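
To make the retriever's job concrete, here is a minimal sketch that queries Amazon Kendra directly through boto3. It is neither the LangChain AmazonKendraRetriever nor the workshop's KendraIndexRetriever; `retrieve_passages` and the shape of the returned dictionaries are hypothetical. It assumes the Kendra `retrieve` operation and a client created with `boto3.client("kendra")`.

```python
from typing import Dict, List


def retrieve_passages(kendra_client, index_id: str, query: str, top_k: int = 3) -> List[Dict[str, str]]:
    """Send a natural language query to Kendra and return simplified passages."""
    response = kendra_client.retrieve(
        IndexId=index_id,
        QueryText=query,
        PageSize=top_k,
    )
    passages = []
    for item in response.get("ResultItems", []):
        passages.append({
            "title": item.get("DocumentTitle", ""),
            "excerpt": item.get("Content", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return passages
```

Each returned dictionary carries the excerpt that goes into the LLM prompt and the URI that the chatbot can show as a source link.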

ConversationBufferWindowMemory
This built-in LangChain class implements chat memory. There are two types of memory:

short-term memory: related to one chain of a conversation. The workshop uses this type of memory
long-term memory: related to all conversations between a user and a model. Long-term memory is useful for data analytics, validation, and model fine-tuning.
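
The idea behind window-based short-term memory can be shown without LangChain. The class below is a plain-Python illustration, not the LangChain implementation: `WindowMemory`, its methods, and the `Human:`/`AI:` formatting are hypothetical. It keeps only the last `k` question-answer turns.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent k conversation turns."""

    def __init__(self, k: int = 3):
        # A deque with maxlen silently drops the oldest turn when full
        self._turns = deque(maxlen=k)

    def add_turn(self, user_message: str, ai_message: str) -> None:
        """Record one completed question-answer exchange."""
        self._turns.append((user_message, ai_message))

    def as_text(self) -> str:
        """Render the remembered turns as text that can be placed in a prompt."""
        return "\n".join(f"Human: {user}\nAI: {ai}" for user, ai in self._turns)
```

Limiting the window keeps the chat history from consuming the LLM context that is needed for the search results.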

DynamoDBChatMessageHistory
Since you use Lambda as a stateless serverless microservice for the orchestration layer, you use Amazon DynamoDB to persist conversation memory to a DynamoDB table. The workshop also uses a built-in LangChain class to minimize the implementation effort.

LLM endpoint
You use either the SageMaker real-time endpoint created in the Generator section or the Amazon Bedrock API. The workshop uses a built-in LangChain class to abstract the LLM.

PromptTemplate
LangChain provides prompt templates for specific use cases and LLMs.

ConversationalRetrievalChain
Here the workshop again uses the existing LangChain class to implement a more complex flow for multi-hop conversation between an LLM, a user, and a retriever.

The conversational chain has two steps.

The chain condenses the current question and the chat history into a standalone question, which is sent to the retriever
After retrieving the search results, the chain sends the question and the search results to an LLM

With the declarative nature of LangChain you can easily use a separate language model for each step. For example, you can use a cheaper and faster model for the question condensing step, and a larger, more advanced, and more expensive model for answering the question. In this workshop you use one model for both steps.
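
The two steps, and the option of using a different model for each, can be sketched as follows. This is a simplified illustration and not the LangChain ConversationalRetrievalChain code: the function name, both prompt templates, and the choice to skip condensing when there is no history are assumptions made for the example. The retriever and both models are plain callables.

```python
from typing import Callable, List, Tuple

# Hypothetical prompt templates, written for this example only
CONDENSE_TEMPLATE = (
    "Given the conversation and a follow-up question, rephrase the follow-up "
    "question to be a standalone question.\n\n"
    "Conversation:\n{history}\n\nFollow-up question: {question}\nStandalone question:"
)
ANSWER_TEMPLATE = (
    "Use the context to answer the question.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def conversational_rag(
    question: str,
    chat_history: List[Tuple[str, str]],
    retrieve: Callable[[str], List[str]],
    condense_llm: Callable[[str], str],
    answer_llm: Callable[[str], str],
) -> str:
    """Run the two-step chain: condense the question, then retrieve and answer."""
    # Step 1: turn the follow-up question into a standalone question
    if chat_history:
        history = "\n".join(f"Human: {q}\nAI: {a}" for q, a in chat_history)
        prompt = CONDENSE_TEMPLATE.format(history=history, question=question)
        standalone = condense_llm(prompt).strip()
    else:
        standalone = question  # nothing to condense on the first turn

    # Step 2: retrieve context for the standalone question and ask the LLM
    context = "\n\n".join(retrieve(standalone))
    return answer_llm(ANSWER_TEMPLATE.format(context=context, question=standalone))
```

Passing the same callable as `condense_llm` and `answer_llm` corresponds to the single-model setup used in the workshop.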

To understand how the end-to-end orchestration works and how the components are linked together, look into the orchestration implementation in the Lambda function content/lab-02/orchestration/rag-app/rag_app.py.

Orchestration layer deployment

In this section you're going to deploy the end-to-end application stack, including UX, the backend API, and the serverless orchestration layer implemented as a Lambda function.

Navigate to the AWS Cloud9 environment.

You're going to use AWS Serverless Application Model (AWS SAM) to deploy the RAG chatbot application.

The SAM CloudFormation template deploys the following resources:

Network infrastructure including VPC, two public subnets, and an Internet Gateway
IAM execution roles for AWS Lambda and ECS task
ECS cluster for hosting the front-end
Application Load Balancer for public access of the front-end
Amazon API Gateway API exposing the orchestration layer to the front-end via a REST API
AWS Lambda function with the orchestration layer implementation
Amazon DynamoDB table for conversation history persistence

Look at the template.yaml CloudFormation template and the content/lab-02/orchestration/rag-app/rag_app.py Lambda function code to understand how the main components are connected and how the serverless backend works.

Now deploy the SAM application.

Make sure you're in the workshop folder:

cd ~/environment/generative-ai-on-aws-architecture-patterns/content/lab-02/

Build the AWS SAM application:

sam build

Deploy the application:

sam deploy --guided

You need to provide the following parameters for the SAM CloudFormation template:

LLMContextLength: use the default 2048 if you use a Falcon 40B endpoint, otherwise set it according to your LLM of choice
ECRImageURI: use the ECR URI of the rag-app image you built in the Chatbot app step
KendraRegion: provide the AWS Region name if the Kendra index you created is not in the same Region as the app, otherwise leave empty
KendraIndexId: use the ID of the Amazon Kendra index
SageMakerLLMEndpointName: use the name of the endpoint you created if you use a SageMaker endpoint, otherwise leave it empty to use Amazon Bedrock
CreateVPC: choose YES if you want to create a new VPC to host the app, otherwise leave the default NO, in which case the default VPC in the AWS account is used
VPCCIDR: CIDR block for the new VPC if you choose to create one; you can leave the default if there are no conflicts with existing VPCs in your account

Note that if you don't deploy a SageMaker LLM endpoint, you can use only the Amazon Bedrock API.

Provide the configuration parameters and wait until the CloudFormation stack deployment succeeds.

Print the stack output (provide your stack name):

aws cloudformation describe-stacks \
    --stack-name <sam stack name>  \
    --output table \
    --query "Stacks[0].Outputs[*].[OutputKey, OutputValue]"
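
The same outputs can be read from Python, for example in a notebook. A minimal sketch, assuming a client created with `boto3.client("cloudformation")`; the helper name `stack_outputs` is hypothetical.

```python
from typing import Dict


def stack_outputs(cfn_client, stack_name: str) -> Dict[str, str]:
    """Return the outputs of a CloudFormation stack as a key-to-value dictionary."""
    stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
    # A stack without outputs has no "Outputs" key, so default to an empty list
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stacks[0].get("Outputs", [])
    }
```

With this helper, `stack_outputs(client, "<sam stack name>")["RAGChatBotUrl"]` gives the chatbot URL.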

Copy the value of RAGChatBotUrl into a browser and start the chatbot.

If everything works fine, you should see the chatbot user interface:

Experimentation

Now ask some questions about Switzerland or about any other topic, for example: What is the usage of fossil fuels in Switzerland? or What is the inflation forecast in Switzerland in 2023?

Try out various Amazon Bedrock LLMs in console

Optional activities, if time permits:

Use Amazon Kendra search functionality in the console
Use Amazon Bedrock playground to try:
- zero-shot prompt without search context
- zero-shot prompt with search context
- Engineered prompt with search context. For a prompt example see here
- Conversational chain with search context

Conclusion

Congratulations, you just built your first RAG-based generative AI application on AWS!

Clean up

If you use your own AWS account, you must delete the provisioned resources to avoid unnecessary charges. You don't need to clean up if you use an account provided by the workshop instructor.

Remove the application CloudFormation stack:

Execute in the Cloud9 terminal: sam delete. Wait until the stacks are deleted

If you used a SageMaker LLM endpoint, remove it:

Navigate to SageMaker Studio
Execute the Clean up section of the content/notebooks/llm-endpoints.ipynb notebook

Delete the AWS Cloud9 environment if you don't need it anymore.

Delete the Amazon Kendra data source and Amazon Kendra index.

Resources

The following is a collection of useful links to related resources.

A Survey on In-context Learning – a paper on in-context learning for LLMs
Deploy self-service question answering with the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra and large language models – in-depth example of RAG chatbot with Amazon Kendra
How to Build an Open-Domain Question Answering System? – a good overview of information retrieval (IR) approaches
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks – the original paper on the RAG approach
Retrieval augmented generation (RAG) – Amazon SageMaker Developer Guide
Implementing Generative AI on AWS workshop – a public workshop
Large Language Model - Query Disambiguation for Conversational Retrieval, and Generative Question Answering
QnABot on AWS - a public solution in AWS Solutions Library
Building (and Breaking) WebLangChain
