Bilingual AI chatbot helping Indian farmers with disease detection, mandi prices, and government schemes. Supports English, Hindi, and Hinglish through a conversational interface powered by a ReAct agent.
| Disease Detection | Mandi Prices | Government Schemes |
|---|---|---|
| Upload a crop photo and get diagnosis with treatment recommendations | Ask for real-time market prices by commodity and location | Ask about eligibility, benefits, and application steps for agricultural schemes |
| Powered by GPT-4o vision | Powered by data.gov.in API | Powered by RAG over scheme PDFs with Qdrant |
```mermaid
graph LR
    UI[Streamlit UI] --> API[FastAPI Backend]
    API --> Agent[ReAct Agent Orchestrator]
    Agent --> Disease[Disease Detector<br/>GPT-4o Vision]
    Agent --> Mandi[Mandi Price Tool<br/>data.gov.in API]
    Agent --> Schemes[Scheme Retriever<br/>Qdrant RAG]
    Schemes --> Qdrant[(Qdrant)]
    Mandi --> DataGov[(data.gov.in)]
    Disease --> OpenAI[(OpenAI API)]
```
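The flow above can be sketched as a tool-dispatch loop. Everything below is illustrative only: the tool names, the keyword-based `route` function, and the stub tools are stand-ins for the real orchestrator, in which GPT-4o itself chooses tools via the ReAct pattern.

```python
from typing import Callable

# Stub tool registry; the real project wires these to GPT-4o vision,
# the data.gov.in client, and the Qdrant retriever.
TOOLS: dict[str, Callable[[str], str]] = {
    "disease_detector": lambda q: f"diagnosis for: {q}",
    "mandi_prices": lambda q: f"prices for: {q}",
    "scheme_retriever": lambda q: f"scheme info for: {q}",
}

def route(message: str, has_image: bool) -> str:
    """Pick a tool with naive keyword rules (the real agent lets the LLM decide)."""
    if has_image:
        return "disease_detector"
    if any(word in message.lower() for word in ("price", "mandi", "rate")):
        return "mandi_prices"
    return "scheme_retriever"

def handle(message: str, has_image: bool = False) -> str:
    """Dispatch one user turn to the routed tool and return its answer."""
    return TOOLS[route(message, has_image)](message)
```

A real ReAct loop iterates: the model proposes a tool call, observes the result, and decides whether to call another tool or answer directly. This sketch collapses that into a single dispatch to show the shape of the orchestration.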
- Python 3.11+
- uv package manager (`curl -LsSf https://astral.sh/uv/install.sh | sh`)
- Docker (for running Qdrant locally)
- OpenAI API key (requires an OpenAI account)
```bash
git clone https://github.com/thedatamonk/farmer-buddy.git
cd farmer-buddy
uv sync
```

```bash
cp .env.example .env
# Edit .env and set:
# OPENAI_API_KEY=sk-...
# MANDI_API_KEY=... (optional, for live mandi prices)
```

```bash
docker run -d -p 6333:6333 qdrant/qdrant
```

```bash
# Place PDF files in data/schemes/, then:
uv run python scripts/index_schemes.py
# Force re-index all PDFs (replaces existing chunks):
uv run python scripts/index_schemes.py --force
```

Note: Scheme PDFs are not included in the repository. You must supply your own PDF documents in `data/schemes/` before indexing. The script automatically skips PDFs that are already indexed; use `--force` to re-index everything.
```bash
uv run uvicorn kisan.api.main:app --reload --port 8080
```

From a new terminal window:

```bash
uv run streamlit run ui/app.py
```

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/api/v1/chat` | Send a message (text and/or image) |
| GET | `/api/v1/sessions/{session_id}` | Get conversation history |
| DELETE | `/api/v1/sessions/{session_id}` | Delete a session |
| POST | `/api/v1/search/chunks` | Search scheme chunks with metadata and similarity scores |
Interactive API docs are available at http://localhost:8080/docs when the server is running.
## Chunk Search Examples
The /api/v1/search/chunks endpoint returns raw retrieved chunks without LLM answer generation. This is useful for debugging retrieval quality and building custom UIs on top of the indexed data.
Note: Documents must be indexed first (see step 4 in Quick Start above).
Basic query:

```bash
curl -X POST http://localhost:8080/api/v1/search/chunks \
  -H "Content-Type: application/json" \
  -d '{"query": "crop insurance premium subsidy"}'
```

Limit results with `top_k`:

```bash
curl -X POST http://localhost:8080/api/v1/search/chunks \
  -H "Content-Type: application/json" \
  -d '{"query": "crop insurance premium subsidy", "top_k": 3}'
```

Filter by scheme name:

```bash
curl -X POST http://localhost:8080/api/v1/search/chunks \
  -H "Content-Type: application/json" \
  -d '{"query": "premium subsidy", "scheme_name": "PMFBY"}'
```

Filter by section header:

```bash
curl -X POST http://localhost:8080/api/v1/search/chunks \
  -H "Content-Type: application/json" \
  -d '{"query": "who can apply", "section_header": "Eligibility Criteria"}'
```

Response fields:
| Field | Description |
|---|---|
| `content` | The text content of the chunk |
| `source` | Source PDF filename |
| `score` | Similarity score (higher is more relevant) |
| `scheme_name` | Name of the government scheme |
| `section_header` | Section heading the chunk belongs to |
| `section_hierarchy` | Full hierarchy of section headings |
| `content_type` | Type of content (e.g. text, table) |
| `page_numbers` | Page numbers in the source PDF |
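When eyeballing retrieval quality, it helps to condense each returned chunk to one line. A small helper sketch, assuming the response body is a JSON array of objects with the fields above (the actual envelope, e.g. a top-level `results` key, may differ):

```python
def summarize_chunks(chunks: list[dict]) -> list[str]:
    """One line per retrieved chunk: score, source PDF, and section header."""
    return [
        f"{c['score']:.3f}  {c['source']}  [{c.get('section_header', '?')}]"
        # Sort by score descending so the most relevant chunk comes first.
        for c in sorted(chunks, key=lambda c: c["score"], reverse=True)
    ]
```

Piping the curl output of `/api/v1/search/chunks` through a script like this makes it easy to spot off-topic chunks or surprisingly low scores at a glance.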
## Chat Query Examples
The /api/v1/chat endpoint uses an LLM-powered query analysis pipeline that rewrites queries, decomposes multi-part questions, and extracts metadata filters automatically. Here are sample queries to test different capabilities:
Simple single-scheme query:

```bash
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is PM-KISAN?"}'
```

Targeted section query (triggers `section_header` filter extraction):
```bash
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What documents are needed to apply for KCC?"}'
```

Multi-part question (triggers sub-query decomposition):
```bash
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the eligibility and benefit amount of PM-KISAN?"}'
```

Cross-scheme comparison (triggers multiple sub-queries):
```bash
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Compare the premium rates of PMFBY and the interest subvention under KCC"}'
```

Colloquial / Hinglish query (tests query rewriting):
```bash
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "mujhe fasal bima ke liye kaise apply karna hai?"}'
```

(Hinglish for "How do I apply for crop insurance?")

Broad exploratory query (no specific scheme, tests rewriting without filters):
```bash
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Which government schemes provide subsidies for irrigation equipment?"}'
```

Specific + complex (triggers scheme filter + section filter + rewriting):
```bash
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the funding pattern between central and state governments for RKVY?"}'
```

## Project Structure

```
src/kisan/
├── agent/                 # ReAct agent orchestrator
│   ├── orchestrator.py    # Agent loop & tool dispatch
│   ├── prompts.py         # System prompts
│   ├── memory.py          # Conversation memory
│   └── tools.py           # Tool definitions
├── api/                   # FastAPI application
│   ├── main.py            # App entrypoint & lifespan
│   ├── dependencies.py    # Dependency injection
│   └── routes/            # Route handlers (chat, health)
├── core/                  # Shared configuration
│   ├── config.py          # Settings (pydantic-settings)
│   ├── exceptions.py      # Custom exceptions
│   └── logging.py         # Loguru setup
├── modules/               # Feature modules
│   ├── disease/           # GPT-4o vision disease detector
│   ├── mandi/             # Mandi price client, parser, scheduler
│   └── schemes/           # Scheme RAG (embeddings, indexer, retriever)
├── schemas/               # Pydantic models (chat, disease, mandi, scheme)
├── services/              # Infrastructure services (LLM, session, vectordb, database)
└── utils/                 # Utilities
    ├── pdf.py             # High-level PDF-to-chunks pipeline
    ├── pdf_parser.py      # Docling PDF → Markdown conversion
    └── chunker.py         # Token-aware Markdown chunking
```
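The `chunker.py` utility is described as token-aware Markdown chunking. The core idea can be sketched as follows; note this sketch approximates a token by a whitespace-separated word, whereas the actual module counts tokens with tiktoken and is aware of Markdown structure:

```python
def chunk_text(text: str, max_tokens: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens 'tokens'.

    A token here is a whitespace-separated word, a stand-in for tiktoken
    counts; overlap keeps context shared between adjacent chunks so a fact
    split across a boundary is still retrievable from at least one chunk.
    """
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_tokens - overlap  # advance by chunk size minus the overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

Sizing chunks by tokens rather than characters matters because the embedding model and the LLM both budget in tokens; the real chunker enforces limits that match the embedding model's context window.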
## Docker

Build and run the API container:

```bash
docker build -t kisan-api .
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e QDRANT_URL=http://host.docker.internal:6333 \
  kisan-api
```

Build and run the UI container:

```bash
docker build -f Dockerfile.ui -t kisan-ui .
docker run -p 8501:8501 \
  -e API_URL=http://host.docker.internal:8080/api/v1 \
  kisan-ui
```

## Render
Create a free cluster at Qdrant Cloud:

- Sign up / log in at cloud.qdrant.io
- Create a new Free Tier cluster (choose the region closest to your Render services)
- Note your Cluster URL (e.g. `https://abc-123.us-east4-0.gcp.cloud.qdrant.io`)
- Generate an API key from the cluster dashboard

You'll use these values for `QDRANT_URL` and `QDRANT_API_KEY` in the steps below.
The project includes a `render.yaml` Blueprint for one-click deployment.
| Service | Type | Plan |
|---|---|---|
| `kisan-api` | Web Service (Docker) | Free |
| `kisan-ui` | Web Service (Docker) | Free |
| `kisan-db` | PostgreSQL | Free |
Required environment variables on Render:
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `MANDI_API_KEY` | data.gov.in API key (optional) |
| `QDRANT_URL` | Qdrant Cloud cluster URL |
| `QDRANT_API_KEY` | Qdrant Cloud API key |
| `QDRANT_COLLECTION` | Qdrant collection name (e.g. `kisan_schemes`) |
| `DATABASE_URL` | Auto-set from `kisan-db` |
The deployed API queries Qdrant Cloud but does not run indexing itself. Run the indexing script from your local machine, pointed at your Qdrant Cloud instance:
```bash
# 1. Place scheme PDFs in data/schemes/
ls data/schemes/
# scheme1.pdf scheme2.pdf ...

# 2. Run the indexer against Qdrant Cloud
OPENAI_API_KEY="sk-..." \
QDRANT_URL="https://your-cluster.cloud.qdrant.io" \
QDRANT_API_KEY="your-qdrant-cloud-api-key" \
QDRANT_COLLECTION="kisan_schemes" \
uv run python scripts/index_schemes.py

# 3. To re-index after updating a PDF, just run again; only new/changed PDFs are processed
# Use --force to re-index all PDFs from scratch
uv run python scripts/index_schemes.py --force
```

Use the same `QDRANT_URL`, `QDRANT_API_KEY`, and `QDRANT_COLLECTION` values you configured on Render. The script generates embeddings via OpenAI and uploads them directly to Qdrant Cloud. Already-indexed PDFs are skipped unless `--force` is used. Once complete, the deployed app can serve scheme queries immediately.
## Testing

The project has a 4-phase test strategy: deterministic unit tests, retrieval quality evaluation, generator quality evaluation, and end-to-end agent evaluation.
```bash
# Run unit + integration tests (no external services needed)
uv run pytest tests/unit/ tests/integration/ -v

# Run all evaluation tests (requires Qdrant + OpenAI key)
uv run pytest tests/evaluation/ -m eval -v -s

# Run everything
uv run pytest tests/ -v
```

See TESTING.md for the full testing and evaluation guide.
## Tech Stack

| Layer | Technology |
|---|---|
| Language | Python 3.11+ |
| Package Manager | uv |
| Web Framework | FastAPI |
| UI | Streamlit |
| LLM | OpenAI GPT-4o (text + vision) |
| Embeddings | OpenAI Embeddings |
| Vector Database | Qdrant |
| PDF Processing | Docling (structure-aware), PyMuPDF (fallback) |
| Token Counting | tiktoken |
| Market Data | data.gov.in API |
| Evaluation | DeepEval |
| Database | PostgreSQL (asyncpg) |
| Scheduling | APScheduler |
| Logging | Loguru |
| Linting | Ruff |
| Deployment | Render / Docker |
MIT