A high-performance REST API for the Bhagavad Gita, built with Go and PostgreSQL. Features include full-text search, semantic search using vector embeddings (pgvector), and a React frontend.

devangb3/Gitartha-Engine

Gitartha Engine

Public REST API for serving Bhagavad Gita chapters and verses with English/Hindi translations and semantic search capabilities.

Architecture

  • Go Monolith: Handles all API logic and business operations
  • PostgreSQL + pgvector: Stores verses and vector embeddings for semantic search
  • Python ML Service: Minimal service for real-time embedding generation
  • Semantic Search: AI-powered verse search using vector similarity

1. Prerequisites

  • Go 1.22+
  • PostgreSQL 14+ with pgvector extension
  • Python 3.8+ (for ML service)
  • golang-migrate CLI (for database migrations)

2. Repository Setup

git clone git@github.com:devangb3/Gitartha-Engine.git
cd Gitartha-Engine
go mod tidy

3. Environment Configuration

Create a .env file in the project root:

cat <<'ENV' > .env
DATABASE_URL=postgres://<user>:<password>@localhost:5432/gitartha?sslmode=disable
PORT=8186
ENV=development
LOG_LEVEL=info
ML_SERVICE_URL=http://localhost:5001
ENV
  • The database name (gitartha in the example) is defined inside the DATABASE_URL.
  • Ensure the referenced database already exists in PostgreSQL (createdb gitartha).
  • ML_SERVICE_URL points to the Python ML service for embedding generation.
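As an illustration of how these keys might be consumed, here is a minimal standard-library sketch. The project itself loads configuration with Viper (see the project layout), so `Config`, `loadConfig`, and the fallback values below are assumptions for this example, not the project's code. The sketch reads the process environment, so the `.env` values must be exported first.

```go
package main

import (
	"fmt"
	"os"
)

// Config mirrors the keys in the .env example above.
// The struct and field names are hypothetical.
type Config struct {
	DatabaseURL  string
	Port         string
	Env          string
	LogLevel     string
	MLServiceURL string
}

// loadConfig reads settings through lookup (normally os.LookupEnv).
// The fallback values are illustrative assumptions, not project defaults.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	cfg := Config{
		DatabaseURL:  get("DATABASE_URL", ""),
		Port:         get("PORT", "8186"),
		Env:          get("ENV", "development"),
		LogLevel:     get("LOG_LEVEL", "info"),
		MLServiceURL: get("ML_SERVICE_URL", "http://localhost:5001"),
	}
	// The database URL has no sensible fallback, so fail early without it.
	if cfg.DatabaseURL == "" {
		return Config{}, fmt.Errorf("DATABASE_URL is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("port=%s env=%s ml=%s\n", cfg.Port, cfg.Env, cfg.MLServiceURL)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the loader easy to exercise with a plain map in tests.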

4. Database Setup

Install pgvector Extension

First, install the pgvector extension in PostgreSQL:

# Ubuntu/Debian
sudo apt install postgresql-14-pgvector

# Or compile from source: https://github.com/pgvector/pgvector

Run Migrations

Apply the database schema including vector embeddings:

make migrate-up

This creates the verse_embeddings table with pgvector support. Use make migrate-down to roll back.

5. Data Ingestion

Load Verses

Run the Go ingestion CLI to load verses:

go run ./cmd/ingest --csv bg.csv

This reads bg.csv, upserts chapters/verses, and updates verse_count totals.

Generate Vector Embeddings

Generate embeddings for semantic search:

cd scripts
python generate_embeddings_pgvector.py

This creates vector embeddings for all verses using the all-MiniLM-L6-v2 model and stores them in PostgreSQL.

6. Running the Services

Start Python ML Service

The ML service provides embedding generation for semantic search:

cd internal/ml-service
python -m venv venv   # first run only: create the virtual environment
source venv/bin/activate
pip install -r requirements.txt
python app_pgvector.py

The service runs on http://localhost:5001 and provides:

  • POST /embed - Generate embeddings for text queries
  • GET /health - Health check

Start Go API Server

make run

Output example:

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
[GIN-debug] GET    /healthz                  --> ... (*handler*).health
[GIN-debug] GET    /api/v1/chapters          --> ...
[GIN-debug] GET    /api/v1/semantic-search   --> ... (*handler*).semanticSearch
...

Visit http://localhost:8186/healthz to confirm the service is healthy.

7. API Overview

Core Endpoints

  • GET /api/v1/chapters — List all chapters.
  • GET /api/v1/chapters/{chapter} — Chapter metadata + verses.
  • GET /api/v1/chapters/{chapter}/verses/{verse} — Specific verse with translations.
  • GET /api/v1/search?query=term&lang=en|hi — Keyword search (English/Hindi).
  • GET /api/v1/random — Random verse.

Semantic Search

  • GET /api/v1/semantic-search?query=text&limit=5 — AI-powered semantic search using vector similarity.
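Both search endpoints take free-text query parameters, so a client should URL-encode them. The sketch below builds the request URLs with `net/url`. The helper names are hypothetical, and the `en`/`hi` check only mirrors the `lang=en|hi` hint above; the server's own validation may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// keywordSearchURL builds a /api/v1/search URL for lang "en" or "hi".
func keywordSearchURL(base, query, lang string) (string, error) {
	if lang != "en" && lang != "hi" {
		return "", fmt.Errorf("unsupported lang %q (want en or hi)", lang)
	}
	q := url.Values{}
	q.Set("query", query)
	q.Set("lang", lang)
	// Encode escapes the values and sorts parameters by key.
	return base + "/api/v1/search?" + q.Encode(), nil
}

// semanticSearchURL builds a /api/v1/semantic-search URL.
func semanticSearchURL(base, query string, limit int) string {
	q := url.Values{}
	q.Set("query", query)
	q.Set("limit", strconv.Itoa(limit))
	return base + "/api/v1/semantic-search?" + q.Encode()
}

func main() {
	base := "http://localhost:8186" // matches PORT in the .env example

	u, err := keywordSearchURL(base, "karma yoga", "en")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
	fmt.Println(semanticSearchURL(base, "how to find peace", 5))
}
```

The resulting strings can be passed straight to `http.Get` or pasted into curl.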

Interactive API Documentation

The API includes interactive Swagger/OpenAPI documentation:

  • Swagger UI: http://localhost:8186/swagger/index.html
  • OpenAPI Spec: http://localhost:8186/swagger/doc.json (JSON format)
  • OpenAPI YAML: http://localhost:8186/swagger/swagger.yaml (YAML format)

Use tools like curl, Postman, or httpie to exercise the endpoints:

curl http://localhost:8186/api/v1/chapters/1/verses/1

8. Testing

Run unit tests (includes database layer tests with sqlmock):

make test

Or directly:

go test ./...

9. Project Layout (high level)

cmd/api              # HTTP server entrypoint
cmd/ingest           # Data ingestion CLI
internal/config      # Configuration loading (Viper)
internal/db          # PostgreSQL connection helper
internal/data        # DB store for chapters/verses + semantic search
internal/http        # Gin router & handlers
internal/search      # ML client for embedding generation
internal/ml-service  # Python ML service (embedding generation)
migrations           # Database schema migrations (includes pgvector)
scripts              # Embedding generation scripts

10. Performance & Architecture

Semantic Search Flow

  1. User Query → Go API receives text query
  2. Embedding Generation → Python ML service converts text to vector
  3. Vector Search → Go queries PostgreSQL pgvector for similar verses
  4. Result Enrichment → Go fetches full verse data and combines with similarity scores

Performance Benefits

  • 40-50% faster than Python-based search
  • Direct SQL queries using pgvector's optimized IVFFlat indexing
  • Scalable architecture with PostgreSQL handling vector operations
  • Minimal Python footprint - only used for embedding generation

11. Next Steps

  • Containerize (Docker Compose for API + Postgres + ML service)
  • Add query caching for frequently searched terms
  • Consider pure Go implementation with ONNX runtime

Acknowledgements

Special thanks to JDhruv14 for providing the JDhruv14/Bhagavad-Gita_Dataset, which serves as the foundational dataset for this project.

Questions or issues? Open an issue in the GitHub repository or contribute to the docs.
