| Metadata | Value |
|---|---|
| Status | Active |
| Version | 1.2.0 |
| Last Updated | 2026-02-28 |
| Author | Sangeetha Grantha Team |
Status: Draft
Version: 0.4
Owners: Sangita Grantha Data & Backend Team
This document describes the authoritative PostgreSQL schema for Sangita Grantha.
The schema is designed to:
- Accurately model Carnatic musical compositions across forms
- Preserve musicological structure (sections, ragas, tala)
- Support notation-centric forms such as Varnams and Swarajathis
- Enable fast search, editorial workflows, and data provenance
- Align with Kotlin Multiplatform shared DTOs
This document is the single source of truth for how musical concepts are persisted.
-
Musical Form Awareness
Every composition is classified by amusical_form, which determines:- Required sections
- Presence of notation
- Validation rules
-
Lyrics and Notation Are Distinct
- Lyrics are language/script specific
- Notation is performance- and tala-specific
Swara notation MUST NOT be stored in lyric tables
-
Normalized, Editorially Safe Schema
- No denormalized blobs for core musical data
- All mutations audited
- Imported data never auto-published
-
Read-Optimized for Mobile
- Ordered sections and notation rows
- Deterministic rendering
- Trigram indexes for lyric search
Defined in 01__baseline-schema-and-types.sql.
draft
in_review
published
archived
Controls editorial lifecycle.
sa – Sanskrit
ta – Tamil
te – Telugu
kn – Kannada
ml – Malayalam
hi – Hindi
en – English
devanagari
tamil
telugu
kannada
malayalam
latin
KRITHI
VARNAM
SWARAJATHI
Defines structural and notation requirements.
pending
in_review
mapped
rejected
discarded
Canonical list of composers.
Key fields:
name,name_normalizedbirth_year,death_yearplace,notes
Supports janya/melakarta hierarchy.
Key fields:
name,name_normalizedmelakarta_numberparent_raga_idarohanam,avarohanam
Key fields:
name,name_normalizedanga_structurebeat_count
Canonical deity registry.
Supports kṣetram attribution.
Includes:
- Location (city/state/country)
- Primary deity
- Geo-coordinates
- Multilingual names via
temple_names
Represents all compositions, regardless of musical form.
Key fields:
title,title_normalized,incipit,incipit_normalizedcomposer_idmusical_formprimary_languageprimary_raga_idtala_iddeity_idtemple_idis_ragamalikaworkflow_state
Rules
- Public APIs expose only
published - Musical form drives validation and UI behavior
Supports ragamalika.
- Ordered list of ragas
- Optional section-level association
Defines the structural grammar of a composition.
Supported section_type values:
PALLAVI
ANUPALLAVI
CHARANAM
MUKTAAYI_SWARAM
CHITTASWARAM
JATHI
SAMASHTI_CHARANAM
SWARA_SAHITYA
MADHYAMA_KALA
OTHER
Each section:
- Belongs to a single composition
- Has an explicit
order_index
VARNAM
- Pallavi + Anupallavi mandatory
- Exactly one Muktaayi Swaram
- At least one Chittaswaram
SWARAJATHI
- Alternating Jathi and Sahitya sections
- Tala alignment mandatory
Represents language/script-specific lyric versions.
Key fields:
languagescripttransliteration_schemesampradaya_id(optional)variant_labelsource_reference
Stores section-wise lyric text.
Key points:
- One row per (variant × section)
- Supports normalized text for search
- Indexed for substring queries
Notation is first-class and independent of lyrics.
Represents a notation interpretation of a composition.
Key fields:
notation_type(SWARA|JATHI)tala_idkalaieduppu_offset_beatsvariant_labelsource_referenceis_primary
Examples:
- Lalgudi bani
- SSP notation
- Shivkumar.org transcription
Stores line-by-line notation, aligned to sections.
Each row:
- Belongs to a notation variant
- Belongs to a section
- Has an
order_index
Fields:
swara_textsahitya_text(optional)tala_markers
This models:
- Avartanam boundaries
- Swara–sahitya alignment
- Multiple cycles per section
Notation variants are stored independently of lyric variants to keep lyrics and swara notation separate.
Controlled taxonomy:
- Bhava
- Festival
- Philosophy
- Kṣetra
- Style
Supports:
- Pathantaram
- Bani
- School
Used by:
- Lyric variants
- Notation variants
Tracks provenance of scraped data.
Source Authority Enhancement (Migration 23):
source_tier— Authority level 1–5 (1 = scholarly/published, 5 = individual blogs). See Source Authority Hierarchy.supported_formats— Array of supported formats (HTML,PDF,DOCX, etc.)composer_affinity— JSONB map of composer → weight (e.g.,{"dikshitar": 1.0}for guruguha.org)last_harvested_at— Timestamp of most recent harvest from this source
Stores raw imported records.
Rules:
- Never auto-published
- Must be reviewed and mapped
- Retains original raw text and metadata
Format Tracking:
source_format— Format of source document (HTML,PDF,DOCX,IMAGE). Default:HTML.page_range— For PDF sources, the specific pages extracted (e.g.,42-43).
Links each Krithi to all sources that contributed data, with per-source extraction metadata and confidence scores. Supports the multi-source provenance model.
Key columns:
krithi_id— FK tokrithisimport_source_id— FK toimport_sourcessource_url,source_format,extraction_methodpage_range— For PDFs (e.g.,42-43)confidence— Extraction confidence scorecontributed_fields— Array of field names this source asserted (e.g.,{title, raga, tala, sections})raw_extraction— Full extraction payload (JSONB) for audit/replay
Indexes: krithi_id, import_source_id.
Audit trail for cross-source structural voting decisions. When multiple sources provide section structures for the same Krithi, the Structural Voting Engine records the outcome.
Key columns:
krithi_id— FK tokrithisparticipating_sources— JSONB array of{sourceId, tier, sectionStructure}consensus_structure— JSONB array of{type, order, label}(the winning structure)consensus_type—UNANIMOUS,MAJORITY,AUTHORITY_OVERRIDE, orMANUALconfidence—HIGH,MEDIUM, orLOWdissenting_sources— JSONB array of sources that disagreedreviewer_id— Optional FK tousers(for manual overrides)
Database-backed work queue for Kotlin ↔ Python integration. The Kotlin backend writes extraction requests; the Python PDF extraction service polls, processes, and writes results back. Uses SELECT ... FOR UPDATE SKIP LOCKED for exactly-once processing.
Key columns:
import_batch_id,import_task_run_id— FKs for orchestration contextsource_url,source_format,source_name,source_tier— Request metadatarequest_payload— JSONB extraction parameters (written by Kotlin)status— Enum:PENDING→PROCESSING→DONE/FAILED/CANCELLEDresult_payload— JSONB array ofCanonicalExtractionDto(written by Python)result_count,extraction_method,extractor_version,confidence,duration_ms— Result metadataattempts,max_attempts— Retry trackingsource_checksum,cached_artifact_path— Artifact tracking
Indexes: Partial indexes on status = 'PENDING' and status = 'DONE' for efficient polling.
See Extraction Queue Architecture for the integration pattern.
Immutable audit trail.
Captures:
- Actor
- Action
- Entity
- Field-level diff (JSONB)
- Timestamp
All admin mutations MUST write an audit entry.
PL/pgSQL function that converts IAST diacritics to ASCII equivalents for search normalisation. Used to populate title_normalized and incipit_normalized columns, enabling ASCII search queries against diacritics-rich Carnatic music titles (e.g., searching "akhilandesvari" matches "akhilāṇḍeśvaryai").
Mappings include:
- Macron vowels: ā→a, ī→i, ū→u, ē→e, ō→o
- Retroflex/palatal consonants: ṭ→t, ḍ→d, ṇ→n, ṅ→n, ñ→n, ś→s, ṣ→s
- Anusvara/visarga: ṃ→m, ḥ→h
- Chandrabindu and other marks stripped
- Trigram indexes on lyric text
- Normalized name indexes on reference data (
title_normalized,incipit_normalized,name_normalized) - Ordered indexes on sections and notation rows
Target:
- p95 lyric search < 300ms
- Deterministic rendering for mobile clients
Shared DTOs in modules/shared/domain mirror this schema:
ComposerDto,RagaDto,TalaDto,DeityDto,TempleDto.KrithiDto,KrithiRagaDto,KrithiLyricVariantDto.KrithiSectionDto,KrithiLyricSectionDto.KrithiNotationVariantDto,KrithiNotationRowDto.TagDto,KrithiTagDto,SampradayaDto,TempleNameDto.CanonicalExtractionDto— Universal extraction format for all source adapters (PDF, HTML, DOCX). Seemodules/shared/domain/.../import/CanonicalExtractionDto.kt.
Any schema change MUST:
- Add a migration
- Update DTOs
- Update this document
| Migration | Table/Columns | Purpose |
|---|---|---|
| 23 | import_sources + 4 columns |
Source authority tiers, supported formats, composer affinity |
| 24 | krithi_source_evidence (new) |
Per-source provenance tracking for each Krithi |
| 25 | structural_vote_log (new) |
Cross-source structural voting audit trail |
| 26 | import_task_run + 2 columns |
Source format and page range tracking |
| 27 | extraction_queue (new) |
Database-backed work queue for Kotlin ↔ Python extraction |
See Krithi Data Sourcing Quality Strategy for full schema design rationale.
Planned but out of scope for v1:
- Audio / notation synchronization
- Tala animation
- Line-level gamaka annotations
- Public user annotations
source_documents,extraction_runs,field_assertionstables — Per-field provenance tracking (planned for medium-term, see Strategy §7.2)
Sangita Grantha stores Carnatic music as it is taught, learned, and performed — not merely as text.
This document is authoritative for schema design.