
Commit 4c15624

Create aws_ques
1 parent db0b39a commit 4c15624

File tree

  • docs/chatbot_comparison/results/gemini_cloud_assist

1 file changed: +48, -0 lines

AI: How can I create and test a RAG system on AWS?

You can create and test a Retrieval Augmented Generation (RAG) system on AWS using a combination of services. A common approach involves:

Storage and Retrieval (Vector Database/Embeddings):

Amazon OpenSearch Service: Store and search vector embeddings of your knowledge base documents using its k-NN index support.
Amazon Aurora PostgreSQL with pgvector: The pgvector extension enables vector similarity search directly within the database.
Other Vector Databases: Services such as Pinecone, Weaviate, or Chroma, which are often integrated with AWS deployments. (A retrieval sketch follows this list.)
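
As a minimal sketch of the retrieval side, the following assumes the opensearch-py client, a hypothetical "docs" index with a knn_vector field named "embedding", and a pre-computed query embedding; the endpoint is a placeholder, and SigV4 authentication is omitted for brevity.

```python
from opensearchpy import OpenSearch

# Placeholder endpoint; real AWS domains also need SigV4 or basic auth.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

# One-time setup: an index with a k-NN vector field for document embeddings.
client.indices.create(index="docs", body={
    "settings": {"index": {"knn": True}},
    "mappings": {"properties": {
        "text": {"type": "text"},
        "embedding": {"type": "knn_vector", "dimension": 1536},  # Titan embedding size
    }},
})

def retrieve(query_embedding, k=3):
    """Return the text of the k documents nearest to the query embedding."""
    response = client.search(index="docs", body={
        "size": k,
        "query": {"knn": {"embedding": {"vector": query_embedding, "k": k}}},
    })
    return [hit["_source"]["text"] for hit in response["hits"]["hits"]]
```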

Large Language Model (LLM):

Amazon Bedrock: Provides API access to foundation models (FMs) such as AI21's Jurassic and Amazon's Titan, suitable for the generation step in RAG. (A generation sketch follows this list.)
Amazon SageMaker JumpStart: Deploy and fine-tune open-source LLMs from a catalog of pre-trained models.
Hugging Face on SageMaker: Another option for deploying open-source LLMs.
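
As a hedged sketch of the generation step, the following calls Bedrock's InvokeModel API via boto3 with the Amazon Titan Text Express model; the model ID and request/response shapes are Titan-specific and differ for other FMs.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate(prompt: str) -> str:
    """Send a prompt to Amazon Titan Text and return the generated text."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",
        body=json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.2},
        }),
    )
    result = json.loads(response["body"].read())
    return result["results"][0]["outputText"]
```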

Workflow Orchestration:

AWS Lambda: Serverless functions to handle retrieval, prompt engineering, and response generation. (A handler sketch follows this list.)
AWS Step Functions: Orchestrate complex RAG workflows, including error handling and retries.
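
A minimal Lambda handler sketch; retrieve() and generate() are the illustrative helpers from the sketches above, assumed to be bundled in the function's deployment package.

```python
import json

# Hypothetical module bundling the retrieve()/generate() sketches above.
from rag_helpers import generate, retrieve

def handler(event, context):
    """Lambda entry point: retrieve context, build a prompt, generate an answer."""
    query = event["query"]
    # The query embedding would come from an embedding model; see the
    # end-to-end sketch under "Example Workflow" below.
    passages = retrieve(event["query_embedding"])
    prompt = ("Answer using only this context:\n"
              + "\n".join(passages)
              + f"\n\nQuestion: {query}")
    return {"statusCode": 200, "body": json.dumps({"answer": generate(prompt)})}
```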

Testing and Evaluation:

Manual Testing: Start with a curated set of queries and inspect the responses by hand.
Automated Evaluation: Use metrics like ROUGE, BLEU, or METEOR to assess the quality of generated responses against reference answers. Build a test harness using Lambda or other compute services. (A harness sketch follows this list.)
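
A small evaluation-harness sketch, assuming the rouge-score package and a hypothetical list of (query, reference answer) pairs:

```python
from rouge_score import rouge_scorer

# Hypothetical test set: (query, reference answer) pairs.
TEST_CASES = [
    ("What is Amazon Bedrock?",
     "Amazon Bedrock provides API access to foundation models."),
]

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def evaluate(generate_fn):
    """Score each generated answer against its reference with ROUGE F-measures."""
    for query, reference in TEST_CASES:
        candidate = generate_fn(query)
        scores = scorer.score(reference, candidate)
        print(query, {name: round(s.fmeasure, 3) for name, s in scores.items()})
```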

Example Workflow (an end-to-end sketch follows the steps):

1. A user query is received.
2. An embedding of the query is generated (using an embedding model on Bedrock or one deployed on SageMaker).
3. The embedding is used to query the vector database (OpenSearch, Aurora/pgvector, etc.) to retrieve relevant documents.
4. The retrieved documents and the original query are combined into a prompt for the LLM (Bedrock, JumpStart, etc.).
5. The LLM generates a response, augmented with information from the retrieved documents.
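
Tying the steps together, a minimal end-to-end sketch; embed() uses Amazon Titan Embeddings for step 2, while retrieve() and generate() are the illustrative helpers sketched earlier in this answer.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    """Step 2: embed the query with Amazon Titan Embeddings (1536 dimensions)."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def answer(query: str) -> str:
    """Steps 3-5: retrieve context, assemble a prompt, and generate a response."""
    passages = retrieve(embed(query))  # retrieve() from the OpenSearch sketch
    prompt = ("Use the following context to answer.\n"
              + "\n".join(passages)
              + f"\n\nQuestion: {query}")
    return generate(prompt)            # generate() from the Bedrock sketch
```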

Data and Analytics: What are the key features of Amazon SageMaker for data science workflows?

Amazon SageMaker offers a comprehensive suite of tools for data science workflows (a training-and-deployment sketch follows this list):

Data Preparation: SageMaker Data Wrangler for data exploration, cleaning, and feature engineering. SageMaker Feature Store for managing and sharing features.
Model Building: SageMaker Studio Notebooks, pre-built containers for popular frameworks (TensorFlow, PyTorch, etc.), and bring-your-own-container options. Automated model tuning (SageMaker Automatic Model Tuning).
Model Training: Distributed training, managed spot instances for cost optimization, and experiment tracking.
Model Deployment: Easy deployment to endpoints for real-time inference or batch transforms. Support for A/B testing, autoscaling, and model monitoring.
MLOps: SageMaker Pipelines for building and managing ML workflows. Model Registry for model versioning and lineage.
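
As a hedged illustration of managed training and deployment with the SageMaker Python SDK, the following assumes a PyTorch training script train.py, an existing execution role, and an S3 training channel; instance types and framework versions are placeholders.

```python
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role

# Managed training job using a pre-built PyTorch container.
estimator = PyTorch(
    entry_point="train.py",       # assumed training script
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.1",      # placeholder framework/Python versions
    py_version="py310",
)
estimator.fit({"train": "s3://my-bucket/train"})  # placeholder S3 channel

# Deploy the trained model to a real-time inference endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```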

Compute Services: How do you configure Auto Scaling for EC2 instances in AWS?

Configuring EC2 Auto Scaling involves the following steps (a boto3 sketch follows this list):

Launch Template/Configuration: Define a launch template (or a legacy launch configuration) specifying the AMI, instance type, storage, and other settings for your EC2 instances.
Auto Scaling Group: Create an Auto Scaling group and associate the launch template/configuration with it.
Scaling Policies: Configure scaling policies to define how the Auto Scaling group should scale in response to changes in demand:
  Dynamic Scaling: Scale based on metrics like CPU utilization, request count, or queue length.
  Scheduled Scaling: Scale at specific times or intervals.
  Manual Scaling: Adjust the desired capacity of the Auto Scaling group manually.
Health Checks: Configure health checks so that Auto Scaling replaces unhealthy instances.
Load Balancing: Integrate your Auto Scaling group with a load balancer (e.g., Elastic Load Balancing) to distribute traffic across your instances.
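
A hedged boto3 sketch of the group-creation and scaling-policy steps; the launch template name, subnets, and CPU target are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Create the Auto Scaling group from an existing launch template (assumed).
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-template", "Version": "$Latest"},
    MinSize=1,
    MaxSize=4,
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-0abc,subnet-0def",  # placeholder subnet IDs
    HealthCheckType="ELB",                        # replace instances failing LB checks
    HealthCheckGracePeriod=300,
)

# Dynamic scaling: track average CPU utilization toward a 50% target.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```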
