Thanks for writing this up. It would be a great project for automatically generating word-by-word glosses suitable for translation help, chanting, memorization, etc. 😊 I'm a little confused by your technical note about full DB access for the AI. Wouldn't it be enough to give the model just the relevant records in its context window? Why give it the ability to populate its own context window with extra information? From the research I've seen, irrelevant context can significantly degrade performance.
-
Bhante @bdhrs, could you please describe in a few sentences how you would approach this task and what you would start with?
-
Just as an experiment, I fed the link for this conversation into Gemini CLI conductor and it produced a pretty useful working version in minutes. Please feel free to refine the database calls and the LLM prompts to suit your needs exactly. At the moment it just uses a free OpenRouter model.
You can test it out by running:
uv run python exporter/mcp/ai_pali_translate.py
The output will appear in the terminal and will also be saved in Markdown format.
This is a typical example of the output:
Analysis of: tatra, bhikkhave, ye te makkaṭā abālajātikā alolajātikā, te taṃ lepaṃ disvā ārakā parivajjanti.
Of course. Here is the translation and analysis of the provided Pāḷi sentence.
English Translation
Fluent Translation:
Literal Translation:
Word-by-Word Analysis
Grammatical Commentary
-
It already looks quite good! Next steps may include:
Pali analysis and formatting-related:
GUI-related:
For the online format:
Further development:
An AI-powered tool that goes through a text and tries to attribute each word to DPD entries, pointing out entries that lack examples or are even missing meanings.
-
This is a continuation of sasanarakkha#25 and this discussion.
Create a tool that analyzes Pali text, maps each word to the Digital Pali Dictionary (DPD) database entries, and generates contextualized translations.
Input: Pali text (sentences/paragraphs)
Process:
1. Tokenize input word by word
2. Match words against the DPD database using the inflections column for declensions/conjugations
3. Handle compound words that are not yet in the database, using the lookup table's deconstructor column
4. Use AI (Gemini/DeepSeek/OpenRouter) to disambiguate word senses when multiple meanings exist (e.g., buddha_1 vs buddha_2)
5. Select the appropriate id from the dpd_headword table based on context
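Step 1 could be sketched roughly as follows. This is only a minimal illustration, assuming that a token is any unbroken run of letters and that punctuation and whitespace are separators; the function name `tokenize` is hypothetical and not part of any existing DPD code.

```python
import re
import unicodedata

# Assumption: a word is a maximal run of Unicode letters. The class [^\W\d_]
# means "word character that is neither a digit nor an underscore".
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Split Pāḷi text into lowercase word tokens, dropping punctuation."""
    # Normalize to NFC so letters with diacritics are single code points
    # and are not split away from their combining marks.
    normalized = unicodedata.normalize("NFC", text)
    return [match.group(0).lower() for match in TOKEN_RE.finditer(normalized)]


sentence = (
    "tatra, bhikkhave, ye te makkaṭā abālajātikā alolajātikā, "
    "te taṃ lepaṃ disvā ārakā parivajjanti."
)
tokens = tokenize(sentence)
print(tokens)
```

Real texts may need extra handling (sandhi, apostrophes, quotation marks inside words), which this sketch deliberately ignores.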
Database Structure:
• SQLite database
• dpd_headword table: id, lemma_1, pos, grammar, meaning_1, inflections etc.
• lookup table: lookup_key, deconstructor
Output:
Detailed Table:
• Original Pali sentence
• Table with columns: word | id | pos | grammar | meaning
• English translation of the sentence using meaning_1 with context
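The detailed table could be rendered along these lines. It is a sketch only: the pipe-separated plain-text layout, the `render_table` name, and the sample rows (including their ids and glosses) are placeholders made up for illustration, not real DPD data.

```python
COLUMNS = ("word", "id", "pos", "grammar", "meaning")


def render_table(sentence: str, rows: list[dict]) -> str:
    """Render the sentence followed by one pipe-separated line per word."""
    lines = [sentence, " | ".join(COLUMNS)]
    for row in rows:
        lines.append(" | ".join(str(row[col]) for col in COLUMNS))
    return "\n".join(lines)


# Placeholder rows: ids, grammar and meanings are invented for this example.
sample_rows = [
    {"word": "taṃ", "id": 1, "pos": "pron", "grammar": "toy", "meaning": "that"},
    {"word": "lepaṃ", "id": 3, "pos": "masc", "grammar": "toy", "meaning": "sticky paste"},
]
output = render_table("te taṃ lepaṃ disvā", sample_rows)
print(output)
```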
CSV Export:
• All unique words from analyzed text
• Columns: id, lemma_1, pos, meaning_1
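A possible shape for the CSV export, using only the standard library. Treating "unique" as "unique by id, in order of first appearance" is an assumption on my part, and the entries below are invented placeholders rather than real DPD rows.

```python
import csv
import io

FIELDS = ("id", "lemma_1", "pos", "meaning_1")


def export_unique(entries: list[dict], out) -> int:
    """Write one CSV row per distinct id and return how many rows were written."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    seen = set()
    for entry in entries:
        if entry["id"] in seen:
            continue  # the same headword already appeared earlier in the text
        seen.add(entry["id"])
        writer.writerow([entry[field] for field in FIELDS])
    return len(seen)


# Placeholder entries: the second "ta" simulates a word repeated in the text.
entries = [
    {"id": 1, "lemma_1": "ta", "pos": "pron", "meaning_1": "that"},
    {"id": 3, "lemma_1": "lepa", "pos": "masc", "meaning_1": "sticky paste"},
    {"id": 1, "lemma_1": "ta", "pos": "pron", "meaning_1": "that"},
]
buffer = io.StringIO()
written = export_unique(entries, buffer)
print(buffer.getvalue())
```

To write a file instead of a string buffer, open it with `newline=""` and pass the file object as `out`, as the `csv` module documentation recommends.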
Technical Notes
• The AI should be able to read the full database for context-aware disambiguation
• All Tipitaka words have corresponding constructions in the lookup table