Last Updated: February 5, 2026 | Status: Production Ready | Difficulty: Intermediate ⭐⭐
Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) with external knowledge retrieval:
User Query → [Retrieve Relevant Docs] → [Generate Answer with Context] → Response
Why RAG?
- Up-to-date: Uses current documentation rather than knowledge frozen at the model's training cutoff
- Accurate: Grounds responses in factual sources
- Transparent: Shows sources for answers
- Customizable: Works with any knowledge base
The Challenge:
"RAG is powerful, but 70% of the work is data preparation: scraping, chunking, cleaning, structuring, and maintaining documentation. This preprocessing is tedious, error-prone, and time-consuming."
Skill Seekers automates the hardest part of RAG: documentation preparation.
┌─────────────────────────────────────────────────────────────────┐
│ Documentation Sources │
│ • Websites • GitHub • PDFs • Local codebases │
└───────────────────┬─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Skill Seekers (Preprocessing Engine) │
│ • Smart scraping • Categorization • Pattern extraction │
│ • Multi-source merging • Quality checks • Format conversion │
└───────────────────┬─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Universal Output Formats │
│ • LangChain Documents • LlamaIndex Nodes • Generic Markdown │
└───────────────────┬─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Your RAG Pipeline │
│ • Pinecone • Weaviate • Chroma • FAISS • Custom │
└─────────────────────────────────────────────────────────────────┘
Key Value Proposition:
- 15-45 minutes → Complete documentation preprocessing
- 300+ tests → Production-quality reliability
- 24+ presets → Popular frameworks ready to use
- Multi-source → Combine docs + code + PDFs
- Platform-agnostic → Works with any vector store or RAG framework
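The pipelines in this guide load the packaged output as a JSON list of records with `page_content` and `metadata` keys. A minimal sketch of a sanity check that could run before any embedding calls are made; the helper name `validate_documents` and the sample records are hypothetical, not part of Skill Seekers:

```python
from typing import Any


def validate_documents(documents: list[dict[str, Any]]) -> list[int]:
    """Return the indexes of records that lack usable text or metadata."""
    bad = []
    for i, doc in enumerate(documents):
        text = doc.get("page_content")
        meta = doc.get("metadata")
        # A usable record has non-empty text and a metadata dictionary
        if not isinstance(text, str) or not text.strip() or not isinstance(meta, dict):
            bad.append(i)
    return bad


# Hypothetical sample records in the shape the later examples read
sample = [
    {"page_content": "Components let you split the UI.", "metadata": {"category": "components"}},
    {"page_content": "   ", "metadata": {"category": "empty"}},
    {"metadata": {"category": "missing-text"}},
]
print(validate_documents(sample))
```

Catching empty or malformed records at this stage avoids paying for embeddings that would never retrieve anything useful.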
"""
Basic RAG Pipeline Architecture
Components:
1. Data Ingestion (Skill Seekers)
2. Vector Storage (Pinecone/Chroma/FAISS)
3. Retrieval (Semantic search)
4. Generation (OpenAI/Claude/Local LLM)
"""
from skill_seekers import package_docs
from pinecone import Pinecone
from openai import OpenAI
import json
# ============================================================
# STEP 1: PREPROCESSING (Skill Seekers)
# ============================================================
# One-time setup: Generate structured docs
# $ skill-seekers scrape --config configs/react.json
# $ skill-seekers package output/react --target langchain
# Load preprocessed documents
with open("output/react-langchain.json") as f:
    documents = json.load(f)
print(f"Loaded {len(documents)} preprocessed documents")
# ============================================================
# STEP 2: VECTOR STORAGE (Pinecone)
# ============================================================
pc = Pinecone(api_key="your-key")
index = pc.Index("react-docs")
# Create embeddings and upsert
openai_client = OpenAI()
for i, doc in enumerate(documents):
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=doc["page_content"]
    )
    index.upsert(vectors=[{
        "id": f"doc_{i}",
        "values": response.data[0].embedding,
        "metadata": {
            "text": doc["page_content"][:1000],
            **doc["metadata"]  # Skill Seekers metadata preserved
        }
    }])
# ============================================================
# STEP 3: RETRIEVAL (Semantic Search)
# ============================================================
def retrieve_context(query: str, top_k: int = 3) -> list:
    """Retrieve relevant documents for query."""
    # Create query embedding
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    )
    query_embedding = response.data[0].embedding
    # Search vector store
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    return results["matches"]
# ============================================================
# STEP 4: GENERATION (OpenAI)
# ============================================================
def rag_answer(question: str) -> dict:
    """Generate answer using RAG."""
    # Retrieve relevant docs
    relevant_docs = retrieve_context(question)
    # Build context
    context = "\n\n".join([
        doc["metadata"]["text"] for doc in relevant_docs
    ])
    # Generate answer
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "Answer based on the provided context. If you don't know, say so."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return {
        "answer": response.choices[0].message.content,
        "sources": [
            {
                "category": doc["metadata"]["category"],
                "score": doc["score"]
            }
            for doc in relevant_docs
        ]
    }
# Usage
result = rag_answer("How do I create a React component?")
print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")
Use Case: Customer support, internal documentation Q&A
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.schema import Document
import json
# Load Skill Seekers documents
with open("output/product-docs-langchain.json") as f:
    docs_data = json.load(f)
documents = [
    Document(
        page_content=doc["page_content"],
        metadata=doc["metadata"]
    )
    for doc in docs_data
]
# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)
# Query
result = qa_chain({"query": "How do I reset my password?"})
print(f"Answer: {result['result']}")
print(f"Sources: {[doc.metadata['file'] for doc in result['source_documents']]}")
Skill Seekers Value:
- Structured documents with categories → Better retrieval accuracy
- Metadata preserved → Source attribution automatic
- Pattern extraction → Consistent answer format
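Because the metadata travels with each document, source attribution can be reduced to a small formatting step. A minimal sketch, assuming each retrieved record exposes `category` and `file` metadata keys as in the example above; plain dictionaries stand in for LangChain `Document` objects so the snippet runs on its own, and `format_sources` is a hypothetical helper:

```python
def format_sources(source_documents: list[dict]) -> list[str]:
    """Build de-duplicated citation strings from retrieved document metadata."""
    seen = set()
    citations = []
    for doc in source_documents:
        meta = doc.get("metadata", {})
        # Fall back to placeholders when a key is missing from the metadata
        label = f"{meta.get('category', 'uncategorized')}: {meta.get('file', 'unknown')}"
        if label not in seen:
            seen.add(label)
            citations.append(label)
    return citations


# Hypothetical retrieval results; two chunks come from the same file
retrieved = [
    {"metadata": {"category": "account", "file": "password-reset.md"}},
    {"metadata": {"category": "account", "file": "password-reset.md"}},
    {"metadata": {"category": "security", "file": "two-factor.md"}},
]
print(format_sources(retrieved))
```

De-duplicating by label keeps the citation list short when several retrieved chunks come from the same page.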
Use Case: Combining official docs + community knowledge + internal notes
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
import json
# Load multiple sources (all preprocessed by Skill Seekers)
sources = {
    "official_docs": "output/fastapi-llama-index.json",
    "github_issues": "output/fastapi-issues-llama-index.json",
    "internal_wiki": "output/company-wiki-llama-index.json"
}
all_nodes = []
for source_name, path in sources.items():
    with open(path) as f:
        nodes_data = json.load(f)
    for node_data in nodes_data:
        # Add source marker to metadata
        node_data["metadata"]["source_type"] = source_name
        all_nodes.append(TextNode(
            text=node_data["text"],
            metadata=node_data["metadata"],
            id_=node_data["id_"]
        ))
print(f"Combined {len(all_nodes)} nodes from {len(sources)} sources")
# Create unified index
index = VectorStoreIndex(all_nodes)
# Query with source filtering
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
# Only query official docs
official_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="source_type", value="official_docs")]
    )
)
# Query all sources (community + official)
all_sources_query_engine = index.as_query_engine()
# Compare results
official_answer = official_query_engine.query("How to deploy FastAPI?")
community_answer = all_sources_query_engine.query("How to deploy FastAPI?")
Skill Seekers Value:
- `unified` command merges multiple sources automatically
- Conflict detection identifies discrepancies
- Consistent formatting across all sources
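When several sources are merged into one index, node IDs from different files can collide. A minimal sketch of one way to guard against that, prefixing each ID with its source name while tagging `source_type`; the function `tag_and_merge` and the sample records are hypothetical, and plain dictionaries stand in for `TextNode` objects:

```python
def tag_and_merge(sources: dict[str, list[dict]]) -> list[dict]:
    """Merge records from several sources, tagging each with its origin."""
    merged = []
    seen_ids = set()
    for source_name, records in sources.items():
        for record in records:
            # Prefix the ID so identical IDs in different sources stay distinct
            node_id = f"{source_name}:{record['id_']}"
            if node_id in seen_ids:
                continue  # Skip exact duplicates within one source
            seen_ids.add(node_id)
            merged.append({
                "id_": node_id,
                "text": record["text"],
                "metadata": {**record["metadata"], "source_type": source_name},
            })
    return merged


# Hypothetical inputs: both sources happen to reuse the ID "0"
example = {
    "official_docs": [{"id_": "0", "text": "Deploy with uvicorn.", "metadata": {}}],
    "github_issues": [
        {"id_": "0", "text": "Workers crash on startup.", "metadata": {}},
        {"id_": "0", "text": "Workers crash on startup.", "metadata": {}},
    ],
}
merged = tag_and_merge(example)
print([node["id_"] for node in merged])
```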
Use Case: Technical documentation with specific terminology
from pinecone import Pinecone
from pinecone_text.sparse import BM25Encoder
from openai import OpenAI
import json
# Load Skill Seekers documents
with open("output/django-langchain.json") as f:
    documents = json.load(f)
# Initialize clients
pc = Pinecone(api_key="your-key")
openai_client = OpenAI()
# Create BM25 encoder (keyword search)
bm25 = BM25Encoder()
bm25.fit([doc["page_content"] for doc in documents])
# Connect to an existing index
# (sparse-dense hybrid search requires an index created with the dotproduct metric)
index_name = "django-hybrid"
index = pc.Index(index_name)
# Upsert with both dense and sparse vectors
for i, doc in enumerate(documents):
    # Dense embedding (semantic)
    dense_response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=doc["page_content"]
    )
    dense_vector = dense_response.data[0].embedding
    # Sparse embedding (keyword)
    sparse_vector = bm25.encode_documents(doc["page_content"])
    # Upsert with both
    index.upsert(vectors=[{
        "id": f"doc_{i}",
        "values": dense_vector,
        "sparse_values": sparse_vector,
        "metadata": {
            "text": doc["page_content"][:1000],
            **doc["metadata"]
        }
    }])
# Query with hybrid search
def hybrid_search(query: str, alpha: float = 0.5):
    """
    Hybrid search combining semantic and keyword.
    Args:
        query: Search query
        alpha: Weight for semantic search (0=keyword only, 1=semantic only)
    """
    # Dense query embedding
    dense_response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    )
    dense_query = dense_response.data[0].embedding
    # Sparse query embedding
    sparse_query = bm25.encode_queries(query)
    # Apply alpha as a convex weighting: scale dense values by alpha
    # and sparse values by (1 - alpha) before querying
    dense_query = [v * alpha for v in dense_query]
    sparse_query = {
        "indices": sparse_query["indices"],
        "values": [v * (1 - alpha) for v in sparse_query["values"]],
    }
    # Hybrid query
    results = index.query(
        vector=dense_query,
        sparse_vector=sparse_query,
        top_k=5,
        include_metadata=True
    )
    return results["matches"]
# Test
results = hybrid_search("Django model relationships foreign key")
for match in results:
    print(f"Score: {match['score']:.3f}")
    print(f"Category: {match['metadata']['category']}")
    print(f"Text: {match['metadata']['text'][:150]}...")
    print()
Skill Seekers Value:
- Pattern extraction identifies technical terminology
- Category tags improve keyword targeting
- Code examples preserved with syntax highlighting
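The `alpha` weight in hybrid search is commonly applied as a convex combination: dense values are scaled by `alpha` and sparse values by `1 - alpha` before the query is sent. A minimal, dependency-free sketch of that weighting; the helper name `hybrid_scale` is hypothetical, and the sparse vector is assumed to use the `indices`/`values` dictionary shape:

```python
def hybrid_scale(dense: list[float], sparse: dict, alpha: float) -> tuple[list[float], dict]:
    """Weight dense and sparse query vectors by alpha and (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Toy vectors for illustration only
dense_q = [1.0, 2.0]
sparse_q = {"indices": [7, 42], "values": [4.0, 2.0]}
balanced = hybrid_scale(dense_q, sparse_q, alpha=0.5)
semantic_only = hybrid_scale(dense_q, sparse_q, alpha=1.0)
print(balanced)
print(semantic_only)
```

With `alpha=1.0` the sparse values become zero, so only the semantic signal contributes to the score; with `alpha=0.0` the reverse holds.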
Use Case: Interactive documentation assistant
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.core.memory import ChatMemoryBuffer
import json
# Load documents
with open("output/react-llama-index.json") as f:
    nodes_data = json.load(f)
nodes = [
    TextNode(
        text=node["text"],
        metadata=node["metadata"],
        id_=node["id_"]
    )
    for node in nodes_data
]
# Create index
index = VectorStoreIndex(nodes)
# Create chat engine with memory
chat_engine = index.as_chat_engine(
    chat_mode="condense_question",
    memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
    verbose=True
)
# Multi-turn conversation
print("React Documentation Assistant\n")
conversations = [
    "What is React?",
    "How do I create components?",  # Remembers context from previous question
    "What about state management?",  # Continues conversation
    "Show me an example",  # Contextual follow-up
]
for user_msg in conversations:
    print(f"\nUser: {user_msg}")
    response = chat_engine.chat(user_msg)
    print(f"Assistant: {response}")
    # Show sources
    if hasattr(response, 'source_nodes'):
        print(f"Sources: {[n.metadata['file'] for n in response.source_nodes[:3]]}")
Skill Seekers Value:
- Hierarchical structure (overview → details) helps conversational flow
- Cross-references enable contextual follow-ups
- Examples with context improve chat quality
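The `token_limit` on the chat memory bounds how much history is carried into each turn. The idea can be sketched without any framework: keep the most recent messages that fit a budget and drop the oldest. This is an illustration only, not LlamaIndex's implementation; `trim_history` is a hypothetical helper, and tokens are approximated by whitespace-separated words rather than a real tokenizer:

```python
def trim_history(messages: list[dict], token_limit: int) -> list[dict]:
    """Keep the newest messages whose combined (approximate) size fits the limit."""
    kept = []
    used = 0
    # Walk backwards so the most recent turns are kept first
    for msg in reversed(messages):
        cost = len(msg["content"].split())  # Rough stand-in for a token count
        if used + cost > token_limit:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))


history = [
    {"role": "user", "content": "What is React?"},
    {"role": "assistant", "content": "A UI library."},
    {"role": "user", "content": "Show example"},
]
trimmed = trim_history(history, token_limit=5)
print([m["content"] for m in trimmed])
```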
Use Case: Multi-tenant SaaS, per-user documentation
from pinecone import Pinecone
from openai import OpenAI
import json
pc = Pinecone(api_key="your-key")
openai_client = OpenAI()
# Use namespaces for multi-tenancy
customers = ["customer_a", "customer_b", "customer_c"]
for customer in customers:
    # Load customer-specific docs (generated by Skill Seekers)
    with open(f"output/{customer}-docs-langchain.json") as f:
        documents = json.load(f)
    index = pc.Index("saas-docs")
    # Upsert to customer namespace
    vectors = []
    for i, doc in enumerate(documents):
        response = openai_client.embeddings.create(
            model="text-embedding-ada-002",
            input=doc["page_content"]
        )
        vectors.append({
            "id": f"{customer}_doc_{i}",
            "values": response.data[0].embedding,
            "metadata": {
                "text": doc["page_content"][:1000],
                "customer": customer,  # Additional metadata
                **doc["metadata"]
            }
        })
    index.upsert(vectors=vectors, namespace=customer)
    print(f"✅ Upserted {len(documents)} docs for {customer}")
# Query customer-specific namespace
def query_customer_docs(customer: str, query: str):
    """Query only specific customer's documentation."""
    index = pc.Index("saas-docs")
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    )
    query_embedding = response.data[0].embedding
    results = index.query(
        vector=query_embedding,
        namespace=customer,  # Isolated per customer
        top_k=3,
        include_metadata=True
    )
    return results["matches"]
# Usage
results = query_customer_docs("customer_a", "How do I configure X?")
Skill Seekers Value:
- Custom configs per customer/project
- Consistent processing across all tenants
- Easy updates: regenerate + re-upsert
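For larger tenants, sending every vector in one upsert call can run into request-size limits, so re-upserts are usually done in batches. A minimal sketch of a batching helper; the name `batched` and the default of 100 per batch are assumptions chosen for illustration, not values from the Pinecone or Skill Seekers documentation:

```python
from typing import Iterator


def batched(items: list, batch_size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in for a list of vector dictionaries
vectors = list(range(250))
sizes = [len(batch) for batch in batched(vectors, batch_size=100)]
print(sizes)

# In the multi-tenant loop, the single upsert call could then become:
# for batch in batched(vectors):
#     index.upsert(vectors=batch, namespace=customer)
```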
# lambda_function.py
import json
from pinecone import Pinecone
from openai import OpenAI
import os
# Initialize clients (reuse across invocations)
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
index = pc.Index("production-docs")
def lambda_handler(event, context):
    """
    API Gateway → Lambda → Pinecone RAG → Response
    """
    body = json.loads(event["body"])
    query = body["query"]
    # Create embedding
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    )
    query_embedding = response.data[0].embedding
    # Retrieve
    results = index.query(
        vector=query_embedding,
        top_k=3,
        include_metadata=True
    )
    # Build context
    context = "\n\n".join([m["metadata"]["text"] for m in results["matches"]])
    # Generate
    completion = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer based on provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQ: {query}"}
        ]
    )
    return {
        "statusCode": 200,
        "body": json.dumps({
            "answer": completion.choices[0].message.content,
            "sources": [m["metadata"]["category"] for m in results["matches"]]
        })
    }
Deployment:
# 1. Preprocess docs with Skill Seekers
skill-seekers scrape --config configs/product-docs.json
skill-seekers package output/product-docs --target langchain
# 2. One-time: Upsert to Pinecone (can be separate Lambda or script)
python upsert_to_pinecone.py
# 3. Deploy Lambda
# (the pinecone and openai packages must also be bundled in the zip or provided as a Lambda layer)
zip -r function.zip lambda_function.py
# --role is required; replace the placeholder with your Lambda execution role ARN
aws lambda create-function \
  --function-name rag-api \
  --zip-file fileb://function.zip \
  --handler lambda_function.lambda_handler \
  --runtime python3.11 \
  --role arn:aws:iam::<account-id>:role/<lambda-execution-role> \
  --environment "Variables={PINECONE_API_KEY=xxx,OPENAI_API_KEY=xxx}"
# app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.schema import Document
import json
app = FastAPI()
# Load documents on startup (from Skill Seekers output)
@app.on_event("startup")
async def load_documents():
    global qa_chain
    with open("data/docs-langchain.json") as f:
        docs_data = json.load(f)
    documents = [
        Document(page_content=d["page_content"], metadata=d["metadata"])
        for d in docs_data
    ]
    embeddings = OpenAIEmbeddings()
    vectorstore = Chroma.from_documents(
        documents=documents,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )
    qa_chain = RetrievalQA.from_chain_type(
        llm=OpenAI(temperature=0),
        retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True
    )
class Query(BaseModel):
    question: str
@app.post("/query")
async def query_docs(query: Query):
    """RAG endpoint."""
    result = qa_chain({"query": query.question})
    return {
        "answer": result["result"],
        "sources": [
            {
                "category": doc.metadata["category"],
                "file": doc.metadata["file"]
            }
            for doc in result["source_documents"]
        ]
    }
@app.get("/health")
async def health():
    return {"status": "healthy"}
Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
COPY data/ ./data/
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Deploy:
# Build
docker build -t rag-api .
# Run
docker run -p 8000:8000 \
  -e OPENAI_API_KEY=sk-... \
  rag-api
# Test
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I...?"}'
Skill Seekers provides smart chunking based on content type:
# Skill Seekers automatically:
# - Chunks by sections for documentation
# - Preserves code blocks intact
# - Maintains context with metadata
# If you need custom chunking:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)
# Apply to Skill Seekers output
chunks = text_splitter.split_documents(documents)
# Pinecone: Choose right index type
from pinecone import ServerlessSpec, PodSpec
# Serverless (recommended for most cases)
spec = ServerlessSpec(cloud="aws", region="us-east-1")
# Pod-based (for high throughput)
spec = PodSpec(environment="us-east1-gcp", pod_type="p1.x2")
# Chroma: Use persistent directory
vectorstore = Chroma(
    embedding_function=embeddings,
    persist_directory="./chroma_db"  # Reuse across restarts
)
from functools import lru_cache
import hashlib
@lru_cache(maxsize=1000)
def get_cached_embedding(text: str) -> list[float]:
    """Cache embeddings to avoid redundant API calls."""
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding
# Use in retrieval
query_embedding = get_cached_embedding(query)
# Track retrieval quality
import time
def retrieve_with_metrics(query: str):
    # Embed the query inside the function (uses the cached embedding helper)
    query_embedding = get_cached_embedding(query)
    start = time.time()
    results = index.query(
        vector=query_embedding,
        top_k=5,
        include_metadata=True
    )
    latency = time.time() - start
    # Log metrics
    matches = results["matches"]
    print(f"Query latency: {latency*1000:.2f}ms")
    if matches:  # Guard against empty result sets
        print(f"Top score: {matches[0]['score']:.3f}")
        print(f"Avg score: {sum(m['score'] for m in matches)/len(matches):.3f}")
    return results
# Evaluate answer quality (LLM-as-judge)
def evaluate_answer(question: str, answer: str, context: str) -> float:
    """Use LLM to evaluate RAG answer quality."""
    eval_prompt = f"""
    Evaluate the quality of this RAG answer on a scale of 1-10.
    Question: {question}
    Answer: {answer}
    Context: {context[:500]}...
    Criteria:
    - Relevance to question
    - Accuracy based on context
    - Completeness
    Return only a number 1-10.
    """
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": eval_prompt}]
    )
    return float(response.choices[0].message.content.strip())
# Set up automation (GitHub Actions example)
# .github/workflows/update-docs.yml
name: Update RAG Documentation
on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly on Sunday
  workflow_dispatch:  # Manual trigger
jobs:
  update-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Skill Seekers
        run: pip install skill-seekers
      - name: Regenerate documentation
        run: |
          skill-seekers scrape --config configs/product-docs.json
          skill-seekers package output/product-docs --target langchain
      - name: Upload to S3 (for Lambda to pick up)
        run: |
          aws s3 cp output/product-docs-langchain.json \
            s3://my-bucket/rag-docs/latest.json
      - name: Trigger re-index
        run: |
          curl -X POST https://api.example.com/reindex \
            -H "Authorization: Bearer ${{ secrets.API_TOKEN }}"

| Documentation Size | Pages | Skill Seekers Time | Manual Time (Est.) |
|---|---|---|---|
| Small (React Core) | 150 | 5 min | 2-3 hours |
| Medium (Django) | 500 | 15 min | 5-8 hours |
| Large (AWS SDK) | 2000+ | 45 min | 20+ hours |

| Vector Store | Avg Latency | Throughput | Cost |
|---|---|---|---|
| Pinecone (Serverless) | 50-100ms | 100 QPS | ~$0.025/100k |
| Pinecone (Pod p1.x1) | 20-50ms | 100 QPS | ~$70/month |
| Chroma (Local) | 10-30ms | Unlimited | Free |
| FAISS (Local) | 5-20ms | Unlimited | Free |

| Setup | Answer Quality (1-10) | Source Attribution |
|---|---|---|
| Raw LLM (no RAG) | 6.5 | None |
| Manual RAG | 8.0 | 60% accurate |
| Skill Seekers RAG | 9.2 | 95% accurate |
Company: SaaS startup with 5 product lines
Requirements:
- Unified search across all products
- Fast updates (weekly releases)
- Multi-language support
- Cost-effective
Solution:
# 1. Preprocess all product docs
skill-seekers scrape --config configs/product-a.json
skill-seekers scrape --config configs/product-b.json
# ... repeat for all products
# 2. Package for LangChain
for product in product-a product-b product-c product-d product-e; do
  skill-seekers package output/$product --target langchain
done
# 3. Combine into single Chroma vector store
python scripts/build_unified_index.py
# 4. Deploy FastAPI + Chroma (see Deployment 2)
docker-compose up -d
# 5. Update weekly via GitHub Actions
Results:
- 99% answer accuracy
- <100ms query latency
- $0 vector store costs (Chroma local)
- 5-minute update time (weekly)
Company: E-commerce platform
Requirements:
- 24/7 availability
- Handle 10k queries/day
- Multi-tenant (per merchant)
- Source attribution for compliance
Solution:
# 1. Generate merchant-specific docs
for merchant in merchants/*; do
  skill-seekers analyze --directory $merchant/docs
  skill-seekers package output/$merchant --target langchain
done
# 2. Deploy to Pinecone with namespaces (see Pattern 5)
python scripts/upsert_multi_tenant.py
# 3. Deploy serverless API (see Deployment 1)
serverless deploy
# 4. Connect to Slack/Discord/Web widget
Results:
- 85% query deflection rate
- $200/month total cost (Pinecone + OpenAI)
- <2s end-to-end response time
- 100% source attribution accuracy
Company: 500-person engineering org
Requirements:
- Combine docs + internal wikis + Slack knowledge
- Secure (on-premise vector store)
- No external API calls (compliance)
- Low maintenance
Solution:
# 1. Scrape all sources
skill-seekers scrape --config configs/docs.json
skill-seekers unified --docs-config configs/docs.json \
  --github internal/repo \
  --name internal-kb
# 2. Package for LlamaIndex
skill-seekers package output/internal-kb --target llama-index
# 3. Deploy with local models
# - Use SentenceTransformers for embeddings (no API)
# - Use Ollama/LM Studio for generation (no API)
# - Store in FAISS (local vector store)
python scripts/build_private_rag.py
# 4. Deploy on internal Kubernetes cluster
kubectl apply -f k8s/
Results:
- Zero external API calls
- Full GDPR/SOC2 compliance
- <50ms average latency
- 2-hour setup, zero ongoing maintenance
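The retrieval step in a fully local setup comes down to a nearest-neighbor search over in-memory vectors. A dependency-free sketch of brute-force cosine-similarity search, shown only to illustrate what a local vector store does; it is a stand-in, not FAISS, and the toy two-dimensional vectors and the helper names `cosine` and `top_k` are hypothetical (real embeddings would come from a local embedding model):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Score every document against the query and return the k best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration only
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
best = top_k([1.0, 0.0], docs, k=2)
print(best)
```

Brute-force search scales linearly with the number of documents, which is why a dedicated index is used once the corpus grows.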
- Questions: GitHub Discussions
- Issues: GitHub Issues
- Documentation: https://skillseekersweb.com/
- LangChain Integration - Build QA chains and agents
- LlamaIndex Integration - Create query engines
- Pinecone Integration - Production vector storage
- Cursor Integration - IDE AI assistance
- Start simple - Try Pattern 1 (Simple QA Bot) first
- Measure baseline - Track accuracy and latency
- Iterate - Add hybrid search, caching, filters as needed
- Deploy - Choose deployment pattern based on scale
- Monitor - Track metrics and user feedback
- Update regularly - Automate doc refresh with Skill Seekers
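Measuring a baseline can start with a small labeled set of questions and the document each one should retrieve. A minimal sketch of a hit-rate-at-k calculation; the function name `hit_rate_at_k`, the question set, and the document IDs are all hypothetical:

```python
def hit_rate_at_k(results: dict[str, list[str]], expected: dict[str, str], k: int = 3) -> float:
    """Fraction of questions whose expected document appears in the top k results."""
    if not expected:
        return 0.0
    hits = sum(
        1
        for question, doc_id in expected.items()
        if doc_id in results.get(question, [])[:k]
    )
    return hits / len(expected)


# Hypothetical labeled questions and ranked retrieval output
expected = {"reset password": "doc_12", "create component": "doc_3", "deploy app": "doc_40"}
results = {
    "reset password": ["doc_12", "doc_7", "doc_1"],
    "create component": ["doc_9", "doc_3", "doc_5"],
    "deploy app": ["doc_2", "doc_8", "doc_6"],
}
rate = hit_rate_at_k(results, expected, k=3)
print(f"Hit rate@3: {rate:.2f}")
```

Re-running the same labeled set after each change (hybrid search, caching, filters) shows whether the change actually improved retrieval.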
Last Updated: February 5, 2026 | Tested With: LangChain 0.1.0+, LlamaIndex 0.10.0+, Pinecone 3.0+ | Skill Seekers Version: v2.9.0+