Learn to build modern AI systems from the ground up through hands-on implementation
Master the most in-demand AI engineering skills: RAG (Retrieval-Augmented Generation)
This is a learner-focused project where you'll build a complete research assistant system that automatically fetches academic papers, understands their content, and answers your research questions using advanced RAG techniques.
The arXiv Paper Curator will teach you to build a production-grade RAG system using industry best practices. Unlike tutorials that jump straight to vector search, we follow the professional path: master keyword search foundations first, then enhance with vectors for hybrid retrieval.
🎯 The Professional Difference: We build RAG systems the way successful companies do - solid search foundations enhanced with AI, not AI-first approaches that ignore search fundamentals.
By the end of this course, you'll have your own AI research assistant and the deep technical skills to build production RAG systems for any domain.
- Week 1: Complete infrastructure with Docker, FastAPI, PostgreSQL, OpenSearch, and Airflow
- Week 2: Automated data pipeline fetching and parsing academic papers from arXiv
- Week 3: Production BM25 keyword search with filtering and relevance scoring
- Week 4: Intelligent chunking + hybrid search combining keywords with semantic understanding
- Week 5: Complete RAG pipeline with local LLM, streaming responses, and Gradio interface
- Week 6: Evaluation system to measure and improve RAG performance
- Docker Desktop (with Docker Compose)
- Python 3.12+
- UV Package Manager (Install Guide)
- 8GB+ RAM and 20GB+ free disk space
# 1. Clone and set up
git clone <repository-url>
cd arxiv-paper-curator
# 2. Configure environment (IMPORTANT!)
cp .env.example .env
# The .env file contains all necessary configuration for OpenSearch,
# arXiv API, and service connections. Defaults work out of the box.
# For Week 4: Add JINA_API_KEY=your_key_here for hybrid search
# 3. Install dependencies
uv sync
# 4. Start all services
docker compose up --build -d
# 5. Verify everything works
curl http://localhost:8000/health
Week | Topic | Blog Post | Code Release |
---|---|---|---|
Week 0 | The Mother of AI project - 6 phases | The Mother of AI project | - |
Week 1 | Infrastructure Foundation | The Infrastructure That Powers RAG Systems | week1.0 |
Week 2 | Data Ingestion Pipeline | Building Data Ingestion Pipelines for RAG | week2.0 |
Week 3 | OpenSearch ingestion & BM25 retrieval | The Search Foundation Every RAG System Needs | week3.0 |
Week 4 | Chunking & Hybrid Search | The Chunking Strategy That Makes Hybrid Search Work | week4.0 |
Week 5 | Complete RAG system | The Complete RAG System | week5.0 |
Week 6 | Setting up evals | Coming Soon | Coming Soon |
Clone a specific week's release:
# Clone a specific week's code
git clone --branch <WEEK_TAG> https://github.com/jamwithai/arxiv-paper-curator
cd arxiv-paper-curator
uv sync
docker compose down -v
docker compose up --build -d
# Replace <WEEK_TAG> with: week1.0, week2.0, etc.
Service | URL | Purpose |
---|---|---|
API Documentation | http://localhost:8000/docs | Interactive API testing |
Gradio RAG Interface | http://localhost:7861 | User-friendly chat interface |
Airflow Dashboard | http://localhost:8080 | Workflow management |
OpenSearch Dashboards | http://localhost:5601 | Hybrid search engine UI |
Start here! Master the infrastructure that powers modern RAG systems.
- Complete infrastructure setup with Docker Compose
- FastAPI development with automatic documentation and health checks
- PostgreSQL database configuration and management
- OpenSearch hybrid search engine setup
- Ollama local LLM service configuration
- Service orchestration and health monitoring
- Professional development environment with code quality tools
Infrastructure Components:
- FastAPI: REST endpoints with async support (Port 8000)
- PostgreSQL 16: Paper metadata storage (Port 5432)
- OpenSearch 2.19: Search engine with dashboards (Ports 9200, 5601)
- Apache Airflow 3.0: Workflow orchestration (Port 8080)
- Ollama: Local LLM server (Port 11434)
# Launch the Week 1 notebook
uv run jupyter notebook notebooks/week1/week1_setup.ipynb
Complete when you can:
- Start all services with docker compose up -d
- Access API docs at http://localhost:8000/docs
- Log in to Airflow at http://localhost:8080
- Browse OpenSearch Dashboards at http://localhost:5601
- Pass all tests with uv run pytest
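For a scripted version of this checklist, the sketch below probes each port from the Infrastructure Components list. It is a minimal, assumption-laden helper (the SERVICES map and function names are hypothetical, and it assumes everything is published on localhost): it only confirms that something accepts TCP connections, not that the service behind it is healthy.

```python
import socket

# Default ports from the Week 1 infrastructure list (assumes services are published on localhost)
SERVICES = {
    "FastAPI": 8000,
    "PostgreSQL": 5432,
    "OpenSearch": 9200,
    "OpenSearch Dashboards": 5601,
    "Airflow": 8080,
    "Ollama": 11434,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:<22} {'up' if ok else 'DOWN'}")
```

For real health information, the API's own health endpoint (curl http://localhost:8000/health) remains the authoritative check.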
Blog Post: The Infrastructure That Powers RAG Systems - Detailed walkthrough and production insights
Building on Week 1 infrastructure: Learn to fetch, process, and store academic papers automatically.
- arXiv API integration with rate limiting and retry logic
- Scientific PDF parsing using Docling
- Automated data ingestion pipelines with Apache Airflow
- Metadata extraction and storage workflows
- Complete paper processing from API to database
Data Pipeline Components:
- MetadataFetcher: 🎯 Main orchestrator coordinating the entire pipeline
- ArxivClient: Rate-limited paper fetching with retry logic
- PDFParserService: Docling-powered scientific document processing
- Airflow DAGs: Automated daily paper ingestion workflows
- PostgreSQL Storage: Structured paper metadata and content
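The "rate-limited fetching with retry logic" idea behind ArxivClient can be illustrated with a small asyncio sketch. This is not the project's ArxivClient: RateLimiter and call_with_retry are hypothetical names, the 3-second default mirrors ARXIV__RATE_LIMIT_DELAY from the configuration section, and the exponential backoff policy is an assumption.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Hypothetical helper: enforce a minimum delay between consecutive calls."""

    def __init__(self, min_interval: float = 3.0) -> None:
        self.min_interval = min_interval
        self._last_call = float("-inf")
        self._lock = asyncio.Lock()  # serialize callers so spacing holds under concurrency

    async def wait(self) -> None:
        async with self._lock:
            elapsed = time.monotonic() - self._last_call
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
            self._last_call = time.monotonic()


async def call_with_retry(
    func: Callable[[], Awaitable[T]],
    limiter: RateLimiter,
    max_retries: int = 3,
    base_backoff: float = 1.0,
) -> T:
    """Call func under the rate limiter, retrying with exponential backoff on failure."""
    for attempt in range(max_retries + 1):
        await limiter.wait()
        try:
            return await func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            await asyncio.sleep(base_backoff * 2**attempt)
    raise RuntimeError("unreachable")
```

Keeping the limiter separate from the retry loop means retries also respect the arXiv spacing, so a burst of failures cannot hammer the API.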
# Launch the Week 2 notebook
uv run jupyter notebook notebooks/week2/week2_arxiv_integration.ipynb
arXiv API Integration:
# Example: Fetch papers with rate limiting
from src.services.arxiv.factory import make_arxiv_client

async def fetch_recent_papers():
    client = make_arxiv_client()
    papers = await client.search_papers(
        query="cat:cs.AI",
        max_results=10,
        from_date="20240801",
        to_date="20240807",
    )
    return papers
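Under the hood, the public arXiv API is a plain HTTP query against export.arxiv.org. The sketch below builds such a request for the same category and date window as the example above. It illustrates the public API's query syntax (search_query, submittedDate ranges, sortBy, sortOrder) and is not a copy of the project's client; build_arxiv_query_url is a hypothetical helper.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "https://export.arxiv.org/api/query"


def build_arxiv_query_url(
    category: str,
    from_date: str,
    to_date: str,
    max_results: int = 10,
    start: int = 0,
) -> str:
    """Build an arXiv API URL for papers in `category` submitted between two YYYYMMDD dates."""
    # submittedDate ranges use YYYYMMDDHHMM timestamps (GMT); cover the whole end day
    date_range = f"submittedDate:[{from_date}0000 TO {to_date}2359]"
    params = {
        "search_query": f"cat:{category} AND {date_range}",
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    # urlencode percent-encodes ':' and '[' and turns spaces into '+'
    return f"{ARXIV_API_URL}?{urlencode(params)}"
```

The response is an Atom XML feed; paging works by increasing start in steps of max_results while keeping the requests spaced out.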
PDF Processing Pipeline:
# Example: Parse PDF with Docling
from src.services.pdf_parser.factory import make_pdf_parser_service

async def process_paper_pdf(pdf_url: str):
    parser = make_pdf_parser_service()
    parsed_content = await parser.parse_pdf_from_url(pdf_url)
    return parsed_content  # Structured content with text, tables, figures
Complete Ingestion Workflow:
# Example: Full paper ingestion pipeline
from src.services.metadata_fetcher import make_metadata_fetcher

async def ingest_papers():
    fetcher = make_metadata_fetcher()
    results = await fetcher.fetch_and_store_papers(
        query="cat:cs.AI",
        max_results=5,
        from_date="20240807",
    )
    return results  # Papers stored in database with full content
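Re-running the daily DAG should not create duplicate rows. One common way to guarantee that is an upsert keyed on the arXiv ID. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL (both support INSERT ... ON CONFLICT ... DO UPDATE); the table layout, column names, and store_papers helper are assumptions, not the project's SQLAlchemy models.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    arxiv_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    raw_text TEXT
)
"""

# Insert new papers; if the arxiv_id already exists, overwrite it with the fresh values
UPSERT = """
INSERT INTO papers (arxiv_id, title, raw_text)
VALUES (:arxiv_id, :title, :raw_text)
ON CONFLICT (arxiv_id) DO UPDATE SET
    title = excluded.title,
    raw_text = excluded.raw_text
"""


def store_papers(conn: sqlite3.Connection, papers: list[dict]) -> int:
    """Insert or update papers keyed by arxiv_id; return the total row count afterwards."""
    conn.execute(SCHEMA)
    with conn:  # commit on success, roll back on error
        conn.executemany(UPSERT, papers)
    return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]
```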
Complete when you can:
- Fetch papers from the arXiv API: Test in the Week 2 notebook
- Parse PDF content with Docling: View extracted structured content
- Run the Airflow DAG: arxiv_paper_ingestion executes successfully
- Verify database storage: Papers appear in PostgreSQL with full content
- API endpoints work: /api/v1/papers returns stored papers with metadata
Blog Post: Building Data Ingestion Pipelines for RAG - arXiv API integration and PDF processing
🚨 The 90% Problem: Most RAG systems jump straight to vector search and miss the foundation that powers the best retrieval systems. We're doing it right!
Building on Weeks 1-2 foundation: Implement the keyword search foundation that professional RAG systems rely on.
The Reality Check: Vector search alone is not enough. The most effective RAG systems use hybrid retrieval - combining keyword search (BM25) with vector search. Here's why we start with keywords:
- Exact Match Power: Keywords excel at finding specific terms, technical jargon, and precise phrases
- Interpretable Results: You can see exactly why a document was retrieved
- Speed & Efficiency: BM25 is computationally cheap and doesn't require expensive embedding models
- Domain Knowledge: Technical papers often require exact terminology matches that vector search might miss
- Production Reality: Search engines such as Elasticsearch and Algolia, like most enterprise search systems, use keyword search as their foundation
[Figure: Complete Week 3 architecture showing the OpenSearch integration flow]
Search Infrastructure: Master full-text search with OpenSearch before adding vector complexity.
- Foundation First: Why keyword search is essential for RAG systems
- OpenSearch Mastery: Index management, mappings, and search optimization
- BM25 Algorithm: Understanding the math behind effective keyword search
- Query DSL: Building complex search queries with filters and boosting
- Search Analytics: Measuring search relevance and performance
- Production Patterns: How real companies structure their search systems
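To make "the math behind BM25" concrete, here is a small self-contained scorer. It implements the standard Okapi BM25 formula with k1 = 1.2 and b = 0.75 (the same defaults OpenSearch uses) and a Lucene-style smoothed IDF. It is a teaching sketch with naive whitespace tokenization, not what OpenSearch runs internally; real analyzers also strip punctuation, stem, and so on.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive lowercase whitespace tokenizer."""
    return text.lower().split()


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_lens = [len(t) for t in self.doc_tokens]
        self.n_docs = len(docs)
        self.avgdl = sum(self.doc_lens) / self.n_docs
        self.term_freqs = [Counter(t) for t in self.doc_tokens]
        # Document frequency: how many documents contain each term
        self.df = Counter(term for tokens in self.doc_tokens for term in set(tokens))

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        # Smoothed IDF (Lucene style): rare terms weigh more, and the value stays positive
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query: str, index: int) -> float:
        tf = self.term_freqs[index]
        # Longer-than-average documents are penalized in proportion to b
        length_norm = 1 - self.b + self.b * self.doc_lens[index] / self.avgdl
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                # Term frequency saturates: extra occurrences add less and less
                total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def rank(self, query: str) -> list[tuple[int, float]]:
        """Return (doc_index, score) pairs sorted by descending score."""
        scores = [(i, self.score(query, i)) for i in range(self.n_docs)]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because every factor is visible, you can explain any ranking: which query terms matched, how rare they are (IDF), and how document length affected the score. This is the interpretability advantage described above.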
- src/services/opensearch/: Professional search service implementation
- src/routers/search.py: Search API endpoints with BM25 scoring
- notebooks/week3/: Complete OpenSearch integration guide
- Search Quality Metrics: Precision, recall, and relevance scoring
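Precision@k and recall@k, the metrics reported in the Week 4 comparison table, are simple ratios over the top-k results. A minimal sketch, assuming you have the ranked list of retrieved document IDs and a hand-labeled set of relevant IDs for a query (the function names are illustrative):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots that hold a relevant document."""
    if k <= 0:
        return 0.0
    hits = sum(doc_id in relevant for doc_id in retrieved[:k])
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)
```

Averaging these over a set of labeled queries gives numbers comparable across search modes.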
Week 3: Master keyword search (BM25) - YOU ARE HERE
Week 4: Add intelligent chunking and hybrid retrieval with vector embeddings
Week 5: Complete the RAG pipeline with local LLM generation
Week 6: Evaluate and optimize the complete system
This progression mirrors how successful companies build search systems - solid foundation first, then enhance with advanced techniques.
# Launch the Week 3 notebook
uv run jupyter notebook notebooks/week3/week3_opensearch.ipynb
BM25 Search Implementation:
# Example: Search papers with BM25 scoring
from src.services.opensearch.factory import make_opensearch_client

async def search_papers():
    client = make_opensearch_client()
    results = await client.search_papers(
        query="transformer attention mechanism",
        max_results=10,
        categories=["cs.AI", "cs.LG"],
    )
    return results  # Papers ranked by BM25 relevance
Search API Usage:
# Example: Use the search endpoint
import httpx

async def query_papers():
    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/api/v1/search", json={
            "query": "neural networks optimization",
            "max_results": 5,
            "latest_papers": True,
        })
        return response.json()
Complete when you can:
- Index papers in OpenSearch: Papers searchable via OpenSearch Dashboards
- Search via API: /api/v1/search endpoint returns relevant papers with BM25 scoring
- Filter by categories: Search within specific arXiv categories (cs.AI, cs.LG, etc.)
- Sort by relevance or date: Toggle between BM25 scoring and latest papers
- View search analytics: Understand why papers matched your query
Blog Post: The Search Foundation Every RAG System Needs - Complete BM25 implementation with OpenSearch
The Intelligence Upgrade: Now we enhance our solid BM25 foundation with semantic understanding through intelligent chunking and hybrid retrieval.
Building on Week 3 foundation: Add the semantic layer that makes search truly intelligent.
The Next Level: With solid BM25 search proven, we can now intelligently add semantic capabilities:
- Smart Chunking: Break documents into coherent, searchable segments that preserve context
- Semantic Understanding: Find relevant content even when users paraphrase or use synonyms
- Hybrid Excellence: Combine keyword precision with semantic recall using RRF fusion
- Best of Both Worlds: Fast exact matching + deep semantic understanding
- Production Reality: How modern RAG systems actually work in practice
[Figure: Complete Week 4 hybrid search architecture with chunking, embeddings, and RRF fusion]
Hybrid Search Infrastructure: Production-grade chunking strategies with unified search supporting BM25, vector, and hybrid modes.
- Section-Based Chunking: Intelligent document segmentation that respects structure
- Production Embeddings: Jina AI integration with fallback strategies
- Hybrid Search Mastery: RRF fusion combining keyword + semantic retrieval
- Unified API Design: Single endpoint supporting multiple search modes
- Performance Analysis: Understanding trade-offs between search approaches
- src/services/indexing/text_chunker.py: Section-aware chunking with overlap strategies
- src/services/embeddings/: Production embedding pipeline with Jina AI
- src/routers/hybrid_search.py: Unified search API supporting all modes
- notebooks/week4/: Complete hybrid search implementation guide
# Launch the Week 4 notebook
uv run jupyter notebook notebooks/week4/week4_hybrid_search.ipynb
Section-Based Chunking:
# Example: Intelligent document chunking
from src.services.indexing.text_chunker import TextChunker

# paper_content and parsed_sections are assumed to come from an earlier Docling parsing step
chunker = TextChunker(chunk_size=600, overlap_size=100)
chunks = chunker.chunk_paper(
    title="Attention Mechanisms in Neural Networks",
    abstract="Recent advances in attention...",
    full_text=paper_content,
    sections=parsed_sections,  # From Docling PDF parsing
)
# Result: Coherent chunks respecting document structure
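The core idea behind overlapping chunks can be shown with a plain word-window chunker. This is a simplified sketch, not the project's section-aware TextChunker: it ignores document sections and slides a window of chunk_size words forward by chunk_size - overlap_size words, folding a too-short final chunk into its predecessor. The defaults mirror CHUNKING__CHUNK_SIZE, CHUNKING__OVERLAP_SIZE, and CHUNKING__MIN_CHUNK_SIZE; chunk_words is a hypothetical name.

```python
def chunk_words(
    text: str,
    chunk_size: int = 600,
    overlap_size: int = 100,
    min_chunk_size: int = 100,
) -> list[str]:
    """Split text into overlapping windows of words."""
    if overlap_size >= chunk_size:
        raise ValueError("overlap_size must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap_size
    chunks: list[list[str]] = []
    start = 0
    while True:
        end = min(start + chunk_size, len(words))
        chunks.append(words[start:end])
        if end == len(words):
            break
        start += step
    # Merge a too-short final chunk into its predecessor, skipping the words they share
    if len(chunks) > 1 and len(chunks[-1]) < min_chunk_size:
        tail = chunks.pop()
        chunks[-1].extend(tail[overlap_size:])
    return [" ".join(chunk) for chunk in chunks]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of indexing some words twice.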
Hybrid Search Implementation:
# Example: Unified search supporting multiple modes
import httpx

async def search_papers(query: str, use_hybrid: bool = True):
    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/api/v1/hybrid-search/", json={
            "query": query,
            "use_hybrid": use_hybrid,  # Auto-generates embeddings
            "size": 10,
            "categories": ["cs.AI"],
        })
        return response.json()

# Top-level await works in Jupyter; in a plain script, wrap these calls with asyncio.run()
# BM25 only: Fast keyword matching (~50ms)
bm25_results = await search_papers("transformer attention", use_hybrid=False)
# Hybrid search: Semantic + keyword understanding (~400ms)
hybrid_results = await search_papers("how to make models more efficient", use_hybrid=True)
Complete when you can:
- Chunk documents intelligently: Papers broken into coherent 600-word segments
- Generate embeddings: Jina AI integration working with automatic query embedding
- Hybrid search working: RRF fusion combining BM25 + vector similarity
- Compare search modes: Understand when to use BM25 vs hybrid search
- Production API ready: /hybrid-search endpoint handling all search types
Search Mode | Speed | Precision@10 | Recall@10 | Use Case |
---|---|---|---|---|
BM25 Only | ~50ms | 0.67 | 0.71 | Exact keywords, author names |
Hybrid (RRF) | ~400ms | 0.84 | 0.89 | Conceptual queries, synonyms |
Blog Post: The Chunking Strategy That Makes Hybrid Search Work - Production chunking and RRF fusion implementation
🎯 The RAG Completion: Transform search results into intelligent answers with local LLM integration and streaming responses.
Building on Week 4 hybrid search: Add the LLM layer that turns search into intelligent conversation.
The Production Advantage: Complete the RAG pipeline with privacy-first, optimized generation:
- Local LLM Control: Complete data privacy with Ollama, no external API calls
- 6x Performance Boost: Optimized from 120s → 15-20s through prompt engineering
- Real-time Streaming: Server-Sent Events for immediate user feedback
- User-Friendly Interface: Gradio web UI for non-technical users
- Production Ready: Clean API design with comprehensive error handling
[Figure: Complete RAG system with LLM generation layer (Ollama), hybrid retrieval pipeline, and Gradio interface]
Complete RAG Infrastructure: Local LLM generation with optimized prompting, dual API endpoints, and interactive web interface.
- Local LLM Mastery: Ollama service integration with multiple model support
- Performance Optimization: 80% prompt reduction, 6x speed improvement techniques
- Streaming Implementation: Server-Sent Events for real-time response generation
- Dual API Design: Standard and streaming endpoints for different use cases
- Interactive UI: Gradio interface with advanced parameter controls
- src/routers/ask.py: Dual RAG endpoints (/api/v1/ask + /api/v1/stream)
- src/services/ollama/: LLM client with optimized prompts and 300-word response limits
- src/services/ollama/prompts/rag_system.txt: Optimized system prompt for academic papers
- src/gradio_app.py: Interactive web interface with real-time streaming support
- gradio_launcher.py: Easy-launch script for the web UI (runs on port 7861)
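The generation step boils down to packing the retrieved chunks into a prompt that tells the model to stay grounded and brief. The function below is a hypothetical illustration of that pattern. The wording, the chunk dictionary keys (arxiv_id, title, text), and the layout are assumptions, not the contents of rag_system.txt. Only the 300-word limit comes from the configuration (OLLAMA__MAX_RESPONSE_WORDS).

```python
def build_rag_prompt(query: str, chunks: list[dict[str, str]], max_words: int = 300) -> str:
    """Assemble a grounded prompt from retrieved chunks (hypothetical format)."""
    # Number each excerpt so the model can cite it as [1], [2], ...
    context_blocks = [
        f"[{i}] {chunk['title']} (arXiv:{chunk['arxiv_id']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    ]
    instructions = (
        "Answer the question using only the numbered excerpts below. "
        f"Cite excerpts like [1]. Keep the answer under {max_words} words. "
        "If the excerpts do not contain the answer, say so."
    )
    return f"{instructions}\n\n" + "\n\n".join(context_blocks) + f"\n\nQuestion: {query}\nAnswer:"
```

A string like this can be sent as the prompt field of Ollama's /api/generate endpoint. Keeping per-chunk metadata to a title and ID is one way to keep prompts small, in the spirit of the prompt-size reduction described in this section.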
# Launch the Week 5 notebook
uv run jupyter notebook notebooks/week5/week5_complete_rag_system.ipynb
# Launch Gradio interface
uv run python gradio_launcher.py
# Open http://localhost:7861
Complete RAG Query:
# Example: Standard RAG endpoint
import httpx

async def ask_question(query: str):
    # Generation takes 15-20 s, so raise httpx's default 5 s timeout
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post("http://localhost:8000/api/v1/ask", json={
            "query": query,
            "top_k": 3,
            "use_hybrid": True,
            "model": "llama3.2:1b",
        })
        response.raise_for_status()
        result = response.json()
        return result["answer"], result["sources"]

# Ask a question (top-level await works in Jupyter; use asyncio.run() in a script)
answer, sources = await ask_question("What are transformers in machine learning?")
Streaming RAG Implementation:
# Example: Real-time streaming responses
import httpx
import json

async def stream_rag_response(query: str):
    # Allow for slow gaps between tokens (httpx defaults to a 5 s timeout)
    async with httpx.AsyncClient(timeout=60.0) as client:
        async with client.stream("POST", "http://localhost:8000/api/v1/stream", json={
            "query": query,
            "top_k": 3,
            "use_hybrid": True,
        }) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    data = json.loads(line[6:])
                    if "chunk" in data:
                        print(data["chunk"], end="", flush=True)
                    elif data.get("done"):
                        break

# Stream an answer in real time
await stream_rag_response("Explain attention mechanisms")
Standard RAG Endpoint: /api/v1/ask
- Response Type: Complete JSON response
- Use Case: Batch processing, API integrations
- Response Time: 15-20 seconds
Streaming RAG Endpoint: /api/v1/stream
- Response Type: Server-Sent Events (SSE)
- Use Case: Interactive UIs, real-time feedback
- Time to First Token: 2-3 seconds
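On the wire, each SSE event is just a data: line followed by a blank line. The sketch below frames token chunks in the shape the streaming client example expects ({"chunk": ...} events, then {"done": true}). Those payload keys are inferred from that client code, and the helper names are hypothetical. On the server, a generator like sse_events would typically be wrapped in a FastAPI StreamingResponse with media_type="text/event-stream".

```python
import json
from collections.abc import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Yield Server-Sent Events frames for each text chunk, then a final done event."""
    for chunk in chunks:
        yield f"data: {json.dumps({'chunk': chunk})}\n\n"
    yield f"data: {json.dumps({'done': True})}\n\n"


def parse_sse_lines(lines: Iterable[str]) -> Iterator[str]:
    """Client-side counterpart: extract chunks from data: lines until done."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and comments
        data = json.loads(line[len("data: "):])
        if "chunk" in data:
            yield data["chunk"]
        elif data.get("done"):
            return
```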
Request Format (both endpoints):
{
  "query": "Your question here",
  "top_k": 3,                   // Number of chunks (1-10)
  "use_hybrid": true,           // Hybrid vs BM25 search
  "model": "llama3.2:1b",       // LLM model to use
  "categories": ["cs.AI"]       // Optional category filter
}
Complete when you can:
- Standard RAG: Get complete answers with sources via
/api/v1/ask
- Streaming RAG: See real-time generation via
/api/v1/stream
- Gradio Interface: Interactive chat at http://localhost:7861
- Performance: 15-20s total response time (6x improvement from baseline)
- Local LLM: Ollama running with llama3.2:1b model
- Source Attribution: Automatic deduplication of paper sources
Metric | Before | After (Week 5) | Improvement |
---|---|---|---|
Response Time | 120+ seconds | 15-20 seconds | 6x faster |
Time to First Token | N/A | 2-3 seconds | Streaming enabled |
Prompt Efficiency | ~10KB | ~2KB | 80% reduction |
User Experience | API only | Web interface + streaming | Production ready |
Key Optimizations:
- Removed redundant metadata (80% prompt size reduction)
- 300-word response limit for focused answers
- Shared code architecture (DRY principles)
- Automatic source deduplication
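Several chunks from the same paper are often retrieved together, so the answer's source list needs deduplication. A minimal order-preserving sketch, assuming each retrieved chunk carries an arxiv_id field (a hypothetical key name) and that sources are presented as arXiv PDF links:

```python
def unique_sources(chunks: list[dict[str, str]]) -> list[str]:
    """Return one PDF link per paper, in first-seen (i.e. rank) order."""
    # dict.fromkeys keeps insertion order and drops repeated IDs
    arxiv_ids = dict.fromkeys(chunk["arxiv_id"] for chunk in chunks)
    return [f"https://arxiv.org/pdf/{arxiv_id}" for arxiv_id in arxiv_ids]
```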
Issue | Solution |
---|---|
404 on /stream endpoint | Rebuild and recreate the API container: docker compose up --build -d api |
Slow response times | Use a smaller model (llama3.2:1b) or reduce the top_k parameter |
Gradio not accessible | The interface runs on port 7861: http://localhost:7861 |
Ollama connection errors | Check the service: docker exec rag-ollama ollama list |
No streaming response | Verify the SSE format and check the browser's network tab |
Out of memory errors | Increase the Docker memory limit to 8GB+ |
Quick Health Check:
# Check all services
curl http://localhost:8000/api/v1/health | jq
# Test RAG endpoint
curl -X POST http://localhost:8000/api/v1/ask \
-H "Content-Type: application/json" \
-d '{"query": "test", "top_k": 1}'
# Test streaming endpoint
curl -X POST http://localhost:8000/api/v1/stream \
-H "Content-Type: application/json" \
-d '{"query": "test", "top_k": 1}' --no-buffer
Blog Post: The Complete RAG System - Complete RAG system with local LLM integration and optimization techniques
With your complete RAG system now operational, consider these enhancements:
Immediate Improvements:
- Experiment with different Ollama models (llama3.2:3b, qwen2.5:7b)
- Customize the Gradio interface with your branding
- Add conversation memory for multi-turn dialogues
- Implement user feedback and rating system
Production Readiness:
- Set up monitoring and alerting
- Add authentication and rate limiting
- Implement caching for frequent queries
- Configure backup and recovery processes
Advanced Features:
- Document upload functionality
- Multiple knowledge base support
- Advanced search filters and sorting
- Export conversations and analytics
The project uses a unified .env file with a nested configuration structure to manage settings across all services.
# Application Settings
DEBUG=true
ENVIRONMENT=development
# arXiv API (Week 2)
ARXIV__MAX_RESULTS=15
ARXIV__SEARCH_CATEGORY=cs.AI
ARXIV__RATE_LIMIT_DELAY=3.0
# PDF Parser (Week 2)
PDF_PARSER__MAX_PAGES=30
PDF_PARSER__DO_OCR=false
# OpenSearch (Week 3)
OPENSEARCH__HOST=http://opensearch:9200
OPENSEARCH__INDEX_NAME=arxiv-papers
# Jina AI Embeddings (Week 4)
JINA_API_KEY=your_jina_api_key_here
EMBEDDINGS__MODEL=jina-embeddings-v3
EMBEDDINGS__TASK=retrieval.passage
EMBEDDINGS__DIMENSIONS=1024
# Chunking Configuration (Week 4)
CHUNKING__CHUNK_SIZE=600
CHUNKING__OVERLAP_SIZE=100
CHUNKING__MIN_CHUNK_SIZE=100
# Ollama LLM (Week 5)
OLLAMA_HOST=http://ollama:11434
OLLAMA__DEFAULT_MODEL=llama3.2:1b
OLLAMA__TIMEOUT=120
OLLAMA__MAX_RESPONSE_WORDS=300
# Services
OLLAMA_MODEL=llama3.2:1b
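The double underscore in names like ARXIV__MAX_RESULTS acts as a nesting delimiter: it corresponds to settings.arxiv.max_results. The stdlib-only sketch below shows that mapping. It is illustrative only, does no type conversion, and is not necessarily how src.config loads settings; parse_nested_env is a hypothetical name.

```python
from typing import Any


def parse_nested_env(lines: list[str], delimiter: str = "__") -> dict[str, Any]:
    """Turn KEY=VALUE lines into a nested dict, splitting keys on the delimiter."""
    settings: dict[str, Any] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed entries
        key, value = line.split("=", 1)  # split once: values may contain '='
        *parents, leaf = key.strip().lower().split(delimiter)
        node = settings
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value.strip()
    return settings
```

In pydantic-settings, the same behavior comes from setting env_nested_delimiter="__" in the settings class's model_config, which also converts values to the declared field types.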
Variable | Default | Description |
---|---|---|
DEBUG | true | Debug mode for development |
ARXIV__MAX_RESULTS | 15 | Papers to fetch per API call |
ARXIV__SEARCH_CATEGORY | cs.AI | arXiv category to search |
PDF_PARSER__MAX_PAGES | 30 | Max pages to process per PDF |
OPENSEARCH__INDEX_NAME | arxiv-papers | OpenSearch index name |
OPENSEARCH__HOST | http://opensearch:9200 | OpenSearch cluster endpoint |
JINA_API_KEY | Required for Week 4 | Jina AI API key for embeddings |
CHUNKING__CHUNK_SIZE | 600 | Target words per document chunk |
CHUNKING__OVERLAP_SIZE | 100 | Overlapping words between chunks |
EMBEDDINGS__MODEL | jina-embeddings-v3 | Jina embeddings model |
OLLAMA_MODEL | llama3.2:1b | Local LLM model |
The configuration system automatically detects the service context:
- API Service: Uses localhost for database and service connections
- Airflow Service: Uses Docker container hostnames (postgres, opensearch)
# Configuration is automatically loaded based on context
from src.config import get_settings
settings = get_settings() # Auto-detects API vs Airflow
print(f"ArXiv max results: {settings.arxiv.max_results}")
Service | Purpose | Status |
---|---|---|
FastAPI | REST API with automatic docs | ✅ Ready |
PostgreSQL 16 | Paper metadata and content storage | ✅ Ready |
OpenSearch 2.19 | Hybrid search engine (BM25 + Vector) | ✅ Ready |
Apache Airflow 3.0 | Workflow automation | ✅ Ready |
Jina AI | Embedding generation (Week 4) | ✅ Ready |
Ollama | Local LLM serving (Week 5) | ✅ Ready |
Development Tools: UV, Ruff, MyPy, Pytest, Docker Compose
arxiv-paper-curator/
├── src/                                   # Main application code
│   ├── main.py                            # FastAPI application
│   ├── routers/                           # API endpoints
│   │   ├── ping.py                        # Health check endpoints
│   │   ├── papers.py                      # Paper retrieval endpoints
│   │   └── hybrid_search.py               # NEW: Week 4 hybrid search endpoints
│   ├── models/                            # Database models (SQLAlchemy)
│   ├── repositories/                      # Data access layer
│   ├── schemas/                           # Pydantic validation schemas
│   │   ├── api/                           # API request/response schemas
│   │   │   ├── health.py                  # Health check schemas
│   │   │   └── search.py                  # Search request/response schemas
│   │   ├── arxiv/                         # arXiv data schemas
│   │   ├── pdf_parser/                    # PDF parsing schemas
│   │   ├── database/                      # NEW: Database configuration schemas
│   │   ├── indexing/                      # NEW: Week 4 chunking schemas
│   │   └── embeddings/                    # NEW: Week 4 embedding schemas
│   ├── services/                          # Business logic
│   │   ├── arxiv/                         # arXiv API client
│   │   ├── pdf_parser/                    # Docling PDF processing
│   │   ├── opensearch/                    # OpenSearch integration
│   │   │   ├── client.py                  # Unified search client (BM25 + Vector + Hybrid)
│   │   │   ├── factory.py                 # Client factory pattern
│   │   │   ├── index_config_hybrid.py     # NEW: Week 4 hybrid index configuration
│   │   │   └── query_builder.py           # BM25 query construction
│   │   ├── indexing/                      # NEW: Week 4 document processing
│   │   │   ├── text_chunker.py            # Section-based chunking strategy
│   │   │   ├── hybrid_indexer.py          # Document indexing with embeddings
│   │   │   └── factory.py                 # Indexing service factory
│   │   ├── embeddings/                    # NEW: Week 4 embedding services
│   │   │   ├── jina_client.py             # Jina AI embedding service
│   │   │   └── factory.py                 # Embedding service factory
│   │   ├── metadata_fetcher.py            # Complete ingestion pipeline
│   │   └── ollama/                        # Ollama LLM service
│   ├── db/                                # Database configuration
│   ├── config.py                          # Environment configuration
│   └── dependencies.py                    # Dependency injection
│
├── notebooks/                             # Learning materials
│   ├── week1/                             # Week 1: Infrastructure setup
│   │   └── week1_setup.ipynb              # Complete setup guide
│   ├── week2/                             # Week 2: Data ingestion
│   │   └── week2_arxiv_integration.ipynb  # Data pipeline guide
│   ├── week3/                             # Week 3: Keyword search
│   │   └── week3_opensearch.ipynb         # OpenSearch & BM25 guide
│   └── week4/                             # Week 4: Chunking & hybrid search
│       ├── week4_hybrid_search.ipynb      # Complete hybrid search guide
│       └── README.md                      # Week 4 implementation documentation
│
├── airflow/                               # Workflow orchestration
│   ├── dags/                              # Workflow definitions
│   │   ├── arxiv_ingestion/               # arXiv ingestion modules
│   │   └── arxiv_paper_ingestion.py       # Main ingestion DAG
│   └── requirements-airflow.txt           # Airflow dependencies
│
├── tests/                                 # Comprehensive test suite
├── static/                                # Assets (images, GIFs)
└── compose.yml                            # Service orchestration
Endpoint | Method | Description | Week |
---|---|---|---|
/health | GET | Service health check | Week 1 |
/api/v1/papers | GET | List stored papers | Week 2 |
/api/v1/papers/{id} | GET | Get specific paper | Week 2 |
/api/v1/search | POST | BM25 keyword search | Week 3 |
/api/v1/hybrid-search/ | POST | Hybrid search (BM25 + Vector) | Week 4 |
/api/v1/ask | POST | RAG question answering (complete JSON response) | Week 5 |
/api/v1/stream | POST | Streaming RAG answers (Server-Sent Events) | Week 5 |
API Documentation: Visit http://localhost:8000/docs for interactive API explorer
# View all available commands
make help
# Quick workflow
make start # Start all services
make health # Check all services health
make test # Run tests
make stop # Stop services
Command | Description |
---|---|
make start | Start all services |
make stop | Stop all services |
make restart | Restart all services |
make status | Show service status |
make logs | Show service logs |
make health | Check all services health |
make setup | Install Python dependencies |
make format | Format code |
make lint | Lint and type check |
make test | Run tests |
make test-cov | Run tests with coverage |
make clean | Clean up everything |
# If you prefer using commands directly
docker compose up --build -d # Start services
docker compose ps # Check status
docker compose logs # View logs
uv run pytest # Run tests
Who | Why |
---|---|
AI/ML Engineers | Learn production RAG architecture beyond tutorials |
Software Engineers | Build end-to-end AI applications with best practices |
Data Scientists | Implement production AI systems using modern tools |
Common Issues:
- Services not starting? Wait 2-3 minutes, then check docker compose logs
- Port conflicts? Stop other services using ports 8000, 8080, 5432, 9200, 5601, 11434, or 7861
- Memory issues? Increase Docker Desktop memory allocation
Get Help:
- Check the troubleshooting section of the comprehensive Week 1 notebook
- Review service logs: docker compose logs [service-name]
- Complete reset: docker compose down --volumes && docker compose up --build -d
This course is completely free! The only possible costs come from optional services:
- Local Development: $0 (everything runs locally)
- Optional Cloud APIs: ~$2-5 for external LLM services (if chosen)
Begin with the Week 1 setup notebook and build your first production RAG system!
For learners who want to master modern AI engineering
Built with love by Jam With AI
MIT License - see LICENSE file for details.