Gitartha Engine

Public REST API for serving Bhagavad Gita chapters and verses with English/Hindi translations and semantic search capabilities.

Architecture

Go Monolith: Handles all API logic and business operations
PostgreSQL + pgvector: Stores verses and vector embeddings for semantic search
Python ML Service: Minimal service for real-time embedding generation
Semantic Search: AI-powered verse search using vector similarity

1. Prerequisites

Go 1.22+
PostgreSQL 14+ with pgvector extension
Python 3.8+ (for ML service)
golang-migrate CLI (for database migrations)

2. Repository Setup

git clone git@github.com:devangb3/Gitartha-Engine.git
cd Gitartha-Engine
go mod tidy

3. Environment Configuration

Create a .env file in the project root:

cat <<'ENV' > .env
DATABASE_URL=postgres://<user>:<password>@localhost:5432/gitartha?sslmode=disable
PORT=8186
ENV=development
LOG_LEVEL=info
ML_SERVICE_URL=http://localhost:5001
ENV

The database name (gitartha in the example) is defined inside the DATABASE_URL.
Ensure the referenced database already exists in PostgreSQL (createdb gitartha).
ML_SERVICE_URL points to the Python ML service for embedding generation.
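
As a rough illustration of how these keys might be consumed on the Go side, here is a minimal sketch that uses only the standard library. It is not the project's loader (that lives in internal/config and uses Viper). The Config struct, the loadConfig name, and the choice to treat the example values as defaults are all assumptions. It also assumes the variables have already been exported into the process environment, since Go does not read .env files on its own.

```go
package main

import (
	"errors"
	"os"
)

// Config mirrors the keys of the example .env file (hypothetical struct).
type Config struct {
	DatabaseURL  string
	Port         string
	Env          string
	LogLevel     string
	MLServiceURL string
}

// loadConfig reads settings through lookup, which is normally os.LookupEnv.
// The fallbacks are the values from the example .env, used as assumed defaults.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	cfg := Config{
		DatabaseURL:  get("DATABASE_URL", ""),
		Port:         get("PORT", "8186"),
		Env:          get("ENV", "development"),
		LogLevel:     get("LOG_LEVEL", "info"),
		MLServiceURL: get("ML_SERVICE_URL", "http://localhost:5001"),
	}
	// There is no sensible default for the database, so fail early.
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is required")
	}
	return cfg, nil
}

// loadConfigFromEnv reads from the real process environment.
func loadConfigFromEnv() (Config, error) {
	return loadConfig(os.LookupEnv)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the sketch testable with a plain map.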

4. Database Setup

Install pgvector Extension

First, install the pgvector extension in PostgreSQL:

# Ubuntu/Debian
sudo apt install postgresql-14-pgvector

# Or compile from source: https://github.com/pgvector/pgvector

Run Migrations

Apply the database schema including vector embeddings:

make migrate-up

This creates the verse_embeddings table with pgvector support. Use make migrate-down to roll back.

5. Data Ingestion

Load Verses

Run the Go ingestion CLI to load verses:

go run ./cmd/ingest --csv bg.csv

This reads bg.csv, upserts chapters/verses, and updates verse_count totals.
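
To make the verse_count step concrete, the sketch below counts data rows per chapter in a CSV. It is only an illustration of the idea, not the code in cmd/ingest. The column name passed in (for example "chapter") is hypothetical, so check the header of bg.csv before relying on it.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// countVerses returns the number of data rows per chapter value.
// chapterColumn is the header of the chapter column (hypothetical name).
func countVerses(r io.Reader, chapterColumn string) (map[string]int, error) {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return nil, fmt.Errorf("read header: %w", err)
	}
	// Locate the chapter column by name instead of assuming a position.
	col := -1
	for i, name := range header {
		if strings.EqualFold(strings.TrimSpace(name), chapterColumn) {
			col = i
			break
		}
	}
	if col < 0 {
		return nil, fmt.Errorf("column %q not found", chapterColumn)
	}
	counts := make(map[string]int)
	for {
		row, err := cr.Read()
		if err == io.EOF {
			return counts, nil
		}
		if err != nil {
			return nil, fmt.Errorf("read row: %w", err)
		}
		// csv.Reader rejects rows whose field count differs from the header,
		// so indexing by col is safe here.
		counts[strings.TrimSpace(row[col])]++
	}
}

// countVersesFromString is a convenience wrapper for quick experiments.
func countVersesFromString(csvText, chapterColumn string) (map[string]int, error) {
	return countVerses(strings.NewReader(csvText), chapterColumn)
}
```

For a real file, open it with os.Open and pass the handle to countVerses.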

Generate Vector Embeddings

Generate embeddings for semantic search:

cd scripts
python generate_embeddings_pgvector.py

This creates vector embeddings for all verses using the all-MiniLM-L6-v2 model and stores them in PostgreSQL.
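
The all-MiniLM-L6-v2 model maps each text to a 384-dimensional vector, and "similar" verses are those whose vectors point in nearly the same direction. The sketch below shows cosine similarity, one common way to measure that. Whether the project ranks by cosine or by another metric is an assumption here, and in the running system the comparison is done inside PostgreSQL rather than in Go.

```go
package main

import (
	"errors"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|), a value in [-1, 1].
// 1 means the same direction, 0 means orthogonal vectors.
func cosineSimilarity(a, b []float32) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("vectors must have the same length")
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	// A zero vector has no direction, so the ratio is undefined.
	if normA == 0 || normB == 0 {
		return 0, errors.New("cosine similarity is undefined for a zero vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}
```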

6. Running the Services

Start Python ML Service

The ML service provides embedding generation for semantic search:

cd internal/ml-service
# Create the virtual environment on first run: python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app_pgvector.py

The service runs on http://localhost:5001 and provides:

POST /embed - Generate embeddings for text queries
GET /health - Health check
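
A Go caller for POST /embed could look roughly like the sketch below. The JSON shapes are assumptions: the request body {"text": "..."} and the response field "embedding" are not documented in this README, so confirm them against app_pgvector.py or the client in internal/search. The function and type names are hypothetical as well.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// embedRequest and embedResponse are ASSUMED payload shapes.
type embedRequest struct {
	Text string `json:"text"`
}

type embedResponse struct {
	Embedding []float32 `json:"embedding"`
}

// parseEmbedResponse extracts the vector from a response body.
func parseEmbedResponse(data []byte) ([]float32, error) {
	var out embedResponse
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, fmt.Errorf("decode embed response: %w", err)
	}
	if len(out.Embedding) == 0 {
		return nil, errors.New("embed response contained no vector")
	}
	return out.Embedding, nil
}

// embed posts text to <baseURL>/embed and returns the resulting vector.
func embed(ctx context.Context, client *http.Client, baseURL, text string) ([]float32, error) {
	body, err := json.Marshal(embedRequest{Text: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/embed", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("call ML service: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ML service returned %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("read embed response: %w", err)
	}
	return parseEmbedResponse(data)
}
```

With the service running, embed(ctx, http.DefaultClient, "http://localhost:5001", "what is duty") would return the query vector, provided the payload assumptions hold.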

Start Go API Server

make run

Output example:

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
[GIN-debug] GET    /healthz                  --> ... (*handler*).health
[GIN-debug] GET    /api/v1/chapters          --> ...
[GIN-debug] GET    /api/v1/semantic-search   --> ... (*handler*).semanticSearch
...

Visit http://localhost:8186/healthz to confirm the service is healthy.

7. API Overview

Core Endpoints

GET /api/v1/chapters — List all chapters.
GET /api/v1/chapters/{chapter} — Chapter metadata + verses.
GET /api/v1/chapters/{chapter}/verses/{verse} — Specific verse with translations.
GET /api/v1/search?query=term&lang=en|hi — Keyword search (English/Hindi).
GET /api/v1/random — Random verse.

Semantic Search

GET /api/v1/semantic-search?query=text&limit=5 — AI-powered semantic search using vector similarity.
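
Free-text queries contain spaces and, for Hindi, non-ASCII characters, so the query string should be escaped rather than concatenated by hand. The sketch below builds the request URL with net/url. The function name is hypothetical, and the base URL in the usage note is the local default from this README.

```go
package main

import (
	"errors"
	"net/url"
	"strconv"
)

// semanticSearchURL builds the semantic-search URL with an escaped query string.
func semanticSearchURL(baseURL, query string, limit int) (string, error) {
	if limit < 1 {
		return "", errors.New("limit must be at least 1")
	}
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	// Replaces any path on baseURL; adjust if the API sits behind a prefix.
	u.Path = "/api/v1/semantic-search"
	params := url.Values{}
	params.Set("query", query)
	params.Set("limit", strconv.Itoa(limit))
	// Encode sorts keys and percent-encodes values (spaces become "+").
	u.RawQuery = params.Encode()
	return u.String(), nil
}
```

For example, semanticSearchURL("http://localhost:8186", "what is duty", 5) yields http://localhost:8186/api/v1/semantic-search?limit=5&query=what+is+duty.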

Interactive API Documentation

The API includes interactive Swagger/OpenAPI documentation:

Swagger UI: http://localhost:8186/swagger/index.html
OpenAPI spec (JSON): http://localhost:8186/swagger/doc.json
OpenAPI spec (YAML): http://localhost:8186/swagger/swagger.yaml

Use tools like curl, Postman, or httpie to exercise the endpoints:

curl http://localhost:8186/api/v1/chapters/1/verses/1

8. Testing

Run unit tests (includes database layer tests with sqlmock):

make test

Or directly:

go test ./...

9. Project Layout (high level)

cmd/api              # HTTP server entrypoint
cmd/ingest           # Data ingestion CLI
internal/config      # Configuration loading (Viper)
internal/db          # PostgreSQL connection helper
internal/data        # DB store for chapters/verses + semantic search
internal/http        # Gin router & handlers
internal/search      # ML client for embedding generation
internal/ml-service  # Python ML service (embedding generation)
migrations           # Database schema migrations (includes pgvector)
scripts              # Embedding generation scripts

10. Performance & Architecture

Semantic Search Flow

User Query → Go API receives text query
Embedding Generation → Python ML service converts text to vector
Vector Search → Go queries PostgreSQL pgvector for similar verses
Result Enrichment → Go fetches full verse data and combines with similarity scores
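
The four steps above can be sketched as a single Go function that wires the stages together. All type and function names here are hypothetical, the stages are passed in as plain functions so that each can be swapped or faked, and context handling is left out for brevity. The actual handler and store live in internal/http and internal/data.

```go
package main

import "fmt"

// Match is one hit from the vector search (hypothetical shape).
type Match struct {
	Chapter    int
	Verse      int
	Similarity float64
}

// Result is a match enriched with the verse text.
type Result struct {
	Match
	Text string
}

// The three stages behind the API, expressed as function types.
type EmbedFunc func(text string) ([]float32, error)
type NearestFunc func(vec []float32, limit int) ([]Match, error)
type LoadVerseFunc func(chapter, verse int) (string, error)

// semanticSearch runs embed -> vector search -> enrichment in order.
func semanticSearch(query string, limit int, embed EmbedFunc, nearest NearestFunc, load LoadVerseFunc) ([]Result, error) {
	// Step 2: the ML service turns the query text into a vector.
	vec, err := embed(query)
	if err != nil {
		return nil, fmt.Errorf("embed query: %w", err)
	}
	// Step 3: PostgreSQL returns the closest verse embeddings.
	matches, err := nearest(vec, limit)
	if err != nil {
		return nil, fmt.Errorf("vector search: %w", err)
	}
	// Step 4: attach the full verse data to each similarity score.
	results := make([]Result, 0, len(matches))
	for _, m := range matches {
		text, err := load(m.Chapter, m.Verse)
		if err != nil {
			return nil, fmt.Errorf("load verse %d.%d: %w", m.Chapter, m.Verse, err)
		}
		results = append(results, Result{Match: m, Text: text})
	}
	return results, nil
}
```

Loading verses one at a time keeps the sketch short. A production version would more likely fetch them in one query or join them in the vector search itself.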

Performance Benefits

40-50% faster than Python-based search
Direct SQL queries using pgvector's optimized IVFFlat indexing
Scalable architecture with PostgreSQL handling vector operations
Minimal Python footprint - only used for embedding generation
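
To show what a direct pgvector query can look like from Go, here is a minimal sketch. The table name verse_embeddings comes from the migrations section, but the column names (chapter, verse, embedding) and the use of the cosine-distance operator <=> are assumptions, as is the helper name. pgvector accepts vectors in the text form [x1,x2,...], which the helper produces so that the value can be bound as a normal query parameter.

```go
package main

import (
	"strconv"
	"strings"
)

// nearestVersesSQL is a HYPOTHETICAL query: column names are assumed.
// <=> is pgvector's cosine distance, so 1 - distance gives a similarity.
// An IVFFlat index on the embedding column can serve the ORDER BY.
const nearestVersesSQL = `
SELECT chapter, verse, 1 - (embedding <=> $1::vector) AS similarity
FROM verse_embeddings
ORDER BY embedding <=> $1::vector
LIMIT $2`

// vectorLiteral formats a vector in pgvector's text form, e.g. [0.5,-1,0.25].
func vectorLiteral(vec []float32) string {
	var b strings.Builder
	b.WriteByte('[')
	for i, v := range vec {
		if i > 0 {
			b.WriteByte(',')
		}
		// Shortest decimal that round-trips as a float32.
		b.WriteString(strconv.FormatFloat(float64(v), 'f', -1, 32))
	}
	b.WriteByte(']')
	return b.String()
}
```

With database/sql this would be used as db.QueryContext(ctx, nearestVersesSQL, vectorLiteral(queryVec), limit), keeping user input out of the SQL string.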

11. Next Steps

Containerize (Docker Compose for API + Postgres + ML service)
Add query caching for frequently searched terms
Consider pure Go implementation with ONNX runtime

Acknowledgements

Special thanks to JDhruv14 for providing the JDhruv14/Bhagavad-Gita_Dataset, which serves as the foundational dataset for this project.

Questions or issues? Open an issue in the GitHub repository or add to the docs.
