Exploring GenAI and Analytics use-cases with Small Language Models and Retrieval Models.
Small Giants is a curated collection of practical demonstrations showing how small language models (SLMs) and retrieval models can power real-world AI applications. We focus on models in the 350M-3B parameter range, emphasizing efficiency, local inference, and structured outputs over raw scale.
- Small doesn't mean simple: modern small foundation models (such as Liquid AI's LFMs) achieve remarkable performance in specialized domains
- Local-first approach: Run inference on your hardware with Ollama
- Cost-effective: Reduce API costs and improve privacy
- Practical: Built on proven architectures (DSPy, structured extraction, agent orchestration)
- Research-friendly: Playground for exploring scaling laws, Retrieval models, and few-shot learning
📄 Invoice Parser — Document Extraction
Multimodal document processing with structured extraction
Extract utility billing information (amount, currency, type) from invoice images using a two-stage pipeline:
- Stage 1: Vision-language model (Liquid AI LFM2-VL-3B) extracts text from images
- Stage 2: Compact extraction model (LFM2-1.2B-Extract) parses structured data
| Attribute | Value |
|---|---|
| Type | Document Extraction |
| Architecture | DSPy Multi-stage |
| Models | LFM2-VL-3B, LFM2-1.2B-Extract |
| UI | Streamlit |
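The two-stage flow can be sketched with stub model calls. In the real pipeline, DSPy invokes LFM2-VL-3B for Stage 1 and LFM2-1.2B-Extract for Stage 2; the hard-coded OCR output and rule-based parser below are illustrative stand-ins only:

```python
from dataclasses import dataclass

@dataclass
class InvoiceFields:
    amount: float
    currency: str
    bill_type: str

def ocr_stage(image_bytes: bytes) -> str:
    # Stage 1 stand-in: a vision-language model (LFM2-VL-3B via Ollama)
    # would turn the invoice image into raw text. Stubbed for illustration.
    return "Electricity bill. Total due: 42.50 EUR"

def extract_stage(raw_text: str) -> InvoiceFields:
    # Stage 2 stand-in: the extraction model (LFM2-1.2B-Extract) would parse
    # structured fields from the OCR text. A naive rule-based stub:
    tokens = raw_text.split()
    amount, currency = 0.0, "?"
    for i, tok in enumerate(tokens):
        if tok.replace(".", "", 1).isdigit():
            amount = float(tok)
            if i + 1 < len(tokens):
                currency = tokens[i + 1]
    return InvoiceFields(amount=amount, currency=currency,
                         bill_type=tokens[0].lower())

def parse_invoice(image_bytes: bytes) -> InvoiceFields:
    # The agent chains the two stages: image -> text -> structured fields.
    return extract_stage(ocr_stage(image_bytes))
```

Swapping the stubs for actual DSPy modules keeps the same shape: each stage is an isolated callable, so either model can be replaced without touching the other.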
🤖 Granite Coder — Coding Agent
Token-efficient coding agent using IBM Granite 4
A lightweight coding assistant built on the "Greedy" Recursive Language Model (RLM) architecture for token-efficient code assistance. It runs locally via Ollama, with MCP server support for IDE integration.
| Attribute | Value |
|---|---|
| Type | Coding Agent |
| Architecture | RLM (Recursive Language Model) |
| Models | IBM Granite 4 |
| Interface | CLI + MCP Server |
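The recursive idea behind the RLM approach (split a context that exceeds the token budget, answer each piece, then combine the partial answers) can be sketched with a stub model. The stub and the character budget below are illustrative stand-ins for Granite 4 calls via Ollama:

```python
def stub_lm(prompt: str) -> str:
    # Stand-in for a local Granite 4 call; it just "summarizes" its input
    # by keeping everything up to the first period.
    return prompt.split(".")[0] + "."

def recursive_answer(context: str, question: str, max_chars: int = 200) -> str:
    # If the context fits the budget, answer directly in one model call.
    if len(context) <= max_chars:
        return stub_lm(f"{context} Question: {question}")
    # Otherwise split the context, recurse on each half, and combine the
    # partial answers in a final call, keeping every prompt under budget.
    mid = len(context) // 2
    left = recursive_answer(context[:mid], question, max_chars)
    right = recursive_answer(context[mid:], question, max_chars)
    return stub_lm(f"{left} {right} Question: {question}")
```

The token efficiency comes from the budget invariant: no single model call ever sees the full context, only a bounded slice or bounded partial answers.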
Usage:

```bash
cd granite-coder
uv sync

# CLI mode
granite-coder solve "Write a hello world function"

# Interactive chat
granite-coder chat

# MCP server mode
granite-coder mcp
```

🔍 LangChain RAG — Retrieval-Augmented Generation
Local RAG pipeline with Qdrant vector search and RAGAS evaluation
A complete RAG benchmark demonstrating semantic retrieval with local models. Uses Qdrant for vector storage, Ollama for embeddings and generation, and RAGAS for automated evaluation of RAG metrics.
| Attribute | Value |
|---|---|
| Type | RAG Pipeline |
| Architecture | LangChain + Qdrant + RAGAS |
| Models | nomic-embed-text, gpt-oss:20b-cloud |
| Evaluation | RAGAS (faithfulness, relevancy, recall, precision) |
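As a rough intuition for what RAGAS measures, context recall can be approximated as the fraction of ground-truth statements attributable to the retrieved context. The substring check below is a deliberate simplification; RAGAS itself uses an LLM judge for attribution:

```python
def context_recall(ground_truth_facts: list[str], retrieved_context: str) -> float:
    # Fraction of ground-truth statements found in the retrieved context.
    # A toy proxy for the RAGAS metric: real attribution is done by an
    # LLM judge, not by substring matching.
    if not ground_truth_facts:
        return 0.0
    hits = sum(1 for fact in ground_truth_facts
               if fact.lower() in retrieved_context.lower())
    return hits / len(ground_truth_facts)
```

Faithfulness, relevancy, and precision follow the same pattern: each is a ratio of judged claims, so scores stay comparable across pipelines.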
Usage:

```bash
cd langchain-qdrant-ollama-rag
make setup           # Install deps, pull models
make run-baseline    # Run RAG pipeline
make run-ragas-eval  # Run evaluation
```

Planned use-cases exploring advanced GenAI + Analytics concepts:
- Financial Analytics Pipeline: Multi-document reasoning across invoices/receipts/statements with time-series forecasting
- Retrieval Model Framework: Develop evaluators for extraction accuracy, confidence estimation, and model comparison
- Entity Linking System: Connect extracted data to knowledge bases for enriched analysis
- Active Learning Loop: Identify high-uncertainty predictions for human review and model improvement
- Benchmark Suite: Comparative evaluation of small models vs. larger alternatives on document understanding tasks
```bash
# Clone repository
git clone https://github.com/olanigan/small-giants.git
cd small-giants

# Install project dependencies
cd dspy-liquid-agent && uv pip install -e . && cd ..
cd granite-coder && uv sync && cd ..
cd langchain-qdrant-ollama-rag && poetry install && cd ..
```

Invoice Parser:
```bash
cd dspy-liquid-agent
make download-samples  # Optional: create sample invoices
make run               # Launch Streamlit app at http://localhost:8501
```

Granite Coder:
```bash
cd granite-coder
granite-coder solve "What is 2+2?"
```

LangChain RAG:
```bash
cd langchain-qdrant-ollama-rag
# Prerequisites: Qdrant running (docker run -p 6333:6333 qdrant/qdrant)
make setup           # Install deps, pull Ollama models
make run-ragas-eval  # Run RAG pipeline with evaluation
```

Agent-based orchestration using DSPy's modular framework:
- Separates concerns: model inference, data validation, UI
- Supports swappable model providers
- Structured output enforcement via Pydantic schemas
- Extensible for chain-of-thought and optimization
User Input → Agent → Stage 1 (Vision) → Stage 2 (Extraction) → Structured Output
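Structured output enforcement works by validating every model response against a schema before it reaches downstream code. The project does this with Pydantic models inside DSPy signatures; a stdlib-only sketch of the same idea, with hypothetical field names and currency whitelist:

```python
from dataclasses import dataclass

# Hypothetical whitelist for illustration only.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}

@dataclass
class UtilityBill:
    amount: float
    currency: str
    bill_type: str

    def __post_init__(self):
        # Mirrors what a Pydantic validator does: coerce and check each
        # field before the value reaches the rest of the pipeline.
        self.amount = float(self.amount)
        if self.amount < 0:
            raise ValueError("amount must be non-negative")
        self.currency = self.currency.upper()
        if self.currency not in ALLOWED_CURRENCIES:
            raise ValueError(f"unknown currency: {self.currency}")

def parse_model_output(raw: dict) -> UtilityBill:
    # With Pydantic this would be UtilityBill.model_validate(raw);
    # malformed model output fails loudly here instead of downstream.
    return UtilityBill(**raw)
```

Failing at the schema boundary is what makes the stages swappable: any model whose output validates can be dropped in.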
RAG Pipeline using LangChain components:
- Semantic search with Qdrant vector database
- Local inference with Ollama embeddings and generation
- Automated evaluation with RAGAS metrics
User Query → Embedding → Qdrant Search → Context → LLM → Answer → RAGAS Eval
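The retrieval step of this pipeline can be illustrated with a toy nearest-neighbour search. Bag-of-words counts stand in for the dense nomic-embed-text vectors, and a full sort stands in for Qdrant's approximate index:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real pipeline uses dense
    # nomic-embed-text vectors served by Ollama.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Standard cosine similarity over sparse term counts.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Qdrant performs this nearest-neighbour search at scale (HNSW index);
    # the top-k passages become the context passed to the generation model.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

Everything after retrieval is prompt assembly: the top-k passages are concatenated into the context that the LLM answers from, which is exactly what RAGAS then scores.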
| Component | Tools |
|---|---|
| Framework | DSPy, LangChain |
| Vector DB | Qdrant |
| Models | Liquid AI LFMs, IBM Granite |
| Inference | Ollama |
| UI | Streamlit |
| Validation | Pydantic |
| Evaluation | RAGAS |
| Dev Tools | Black, isort, mypy, pytest |
We welcome contributions! Whether you're adding new use-cases, improving retrieval models, or enhancing existing pipelines:
- Fork the repository
- Create a feature branch
- Follow existing code style (see Makefile linting commands)
- Submit a pull request
See each use-case's directory for specific contribution guidelines.
If you use Small Giants in your research, please cite:
```bibtex
@software{small_giants_2025,
  author = {Olanigan, Ibrahim},
  title = {Small Giants: GenAI and Analytics with Small Language Models},
  url = {https://github.com/olanigan/small-giants},
  year = {2025}
}
```

- DSPy Documentation
- LangChain Documentation
- Qdrant Documentation
- RAGAS Evaluation
- Liquid AI Models
- Ollama Getting Started
- LLM Efficiency Benchmarks
Questions or ideas? Open an issue or start a discussion in the repository.