The code in this repository is mostly AI-generated, though the source data is not.
This repository contains the YAML-to-JavaScript build pipeline for Bible Passage Reference Parser language data.
- `data/_defaults.yaml`: shared defaults for variables/options.
- `data/*.yaml`: per-language data (ISO 639-3 codes like `eng`, `zho`).
- `translation_systems/*.yaml`: versification systems and translation aliases.
- `src/`: TypeScript source for building language output.
- `bin/compile.sh`: bundles TypeScript CLI tools in `src/` to `bin/*.js`.
- `bin/build_spec.js`: builds localized-book Jasmine specs from `book_names/all/`.
- `bin/build_all_langs.js`: builds all languages in parallel and runs specs for each language.
- `lang/`: generated output files (optional; regenerate as needed).
- `book_names/all/`: generated book-name lists used to build tests.
- `book_names/preferred/`: preferred display names (default + optional translation overrides).
- `test/`: generated localized-book specs.
npm install
bin/compile.sh
node bin/build_lang.js eng
Build a cross-language parser:
node bin/build_lang.js --cross eng spa --out eng_spa
node bin/build_spec.js eng_spa
Load a language module programmatically:
import load_language_code from "bible-passage-reference-parser-languages";
const lang = await load_language_code("eng");
// Reserved Windows code example:
const con = await load_language_code("con");

Generate specs:
node bin/build_spec.js # all languages
node bin/build_spec.js eng # single language
Cross-language options:
- `--cross`: enable cross-language build mode.
- `--out <code>`: output language code (must not be 3 characters long).
- `--merge-mode append|smart`: book merge mode (default `append`).

Build all languages:
node bin/build_all_langs.js # parallel build + specs + tests
node bin/build_all_langs.js -j 4 # set worker count
node bin/build_all_langs.js --test-only # skip lang rebuild, build specs + run tests
The build outputs three JS classes:
- `bcv_regexps`
- `bcv_translations`
- `bcv_grammar_options_default`
The output matches the expected format in Bible-Passage-Reference-Parser.
- Pick an existing language as a starting point (for example `data/eng.yaml`) and copy it to a new ISO 639-3 code, like `data/isl.yaml`.
- Update the language file contents:
  - `variables`: text tokens and patterns used by the parser (titles, next, ff, etc.).
  - `options`: language-specific parsing options; any missing options fall back to `data/_defaults.yaml`.
  - `books`: list of Bible book names/abbreviations and regex-related data for the language.
  - `ordinals` and `translations` are optional; include them if needed for the language.
- Build and verify:
bin/compile.sh
node bin/build_lang.js isl
Notes for new languages:
- The first language file passed to `build_lang` sets the primary `variables` and `options`; additional languages are only used to merge `books`.
- `data/_defaults.yaml` provides required defaults for `variables` and `options`, so you only need to override what differs.
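The precedence described in these notes can be sketched as follows. This is a minimal illustration, not the actual `build_lang` code: `mergeLanguageData` and the data shapes are hypothetical, and it assumes the default `append` merge mode simply concatenates `books` in the order the languages are passed.

```javascript
// Hypothetical helper (not part of this repo): combine defaults with one or
// more parsed language files, following the precedence described above.
function mergeLanguageData(defaults, languages) {
  if (languages.length === 0) {
    throw new Error("at least one language is required");
  }
  const [primary, ...rest] = languages;
  return {
    // Only the first language overrides the shared defaults.
    variables: { ...defaults.variables, ...primary.variables },
    options: { ...defaults.options, ...primary.options },
    // Assumption: "append" mode concatenates books from every language.
    books: [primary, ...rest].flatMap((lang) => lang.books ?? []),
  };
}

const mergedExample = mergeLanguageData(
  { variables: { and: ["and"] }, options: { normalize: "combining_characters" } },
  [
    { variables: { and: ["y"] }, options: {}, books: [{ osis: "Gen", texts: ["Génesis"] }] },
    { variables: { and: ["and"] }, options: { normalize: "none" }, books: [{ osis: "Gen", texts: ["Genesis"] }] },
  ]
);
```

In this sketch the second language's `variables` and `options` are ignored entirely; only its `books` reach the merged result.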
The `variables` section is used to build the core grammar and separator patterns. Values can be:
- simple strings: `- "cap."`
- objects for fine control:
  - `text`: string value
  - `regexp`: raw regex (no escaping)
  - `regexp_after`: appended raw regex after the text/regexp
  - `normalize: none` to skip combining-character normalization for that item
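As a rough sketch of how these value forms could turn into a regex, the snippet below escapes plain strings and `text`, passes `regexp` through untouched, and appends `regexp_after`. The function names are hypothetical, and the sketch does not model normalization or the `trailing_dots_in_variables` option.

```javascript
// Escape regex metacharacters so plain text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical: turn one `variables` value into a regex source string.
function variableToPattern(item) {
  if (typeof item === "string") return escapeRegExp(item);
  // `regexp` is raw (no escaping); `text` is escaped.
  const base = item.regexp !== undefined ? item.regexp : escapeRegExp(item.text ?? "");
  return base + (item.regexp_after ?? "");
}

// Hypothetical: join all values for one variable into a single alternation.
function variablesToAlternation(items) {
  return "(?:" + items.map(variableToPattern).join("|") + ")";
}

const andSource = variablesToAlternation([
  { text: "a", regexp_after: "(?!\\p{L})" },
  "vedi",
]);
// The "u" flag is needed for \p{L} to be understood.
const andRegExp = new RegExp(andSource, "u");
```

With these inputs, `andRegExp` matches a standalone "a" or "vedi" but not an "a" that is immediately followed by another letter.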
Example:
variables:
  and:
    - text: a
      regexp_after: (?!\p{L})
    - vedi
  to:
    - "-"
    - "a"

Common options (see `data/_defaults.yaml` for full list):
- `normalize`: `combining_characters` (default) or `none`
- `trailing_dots_in_variables`: `optional` or `as_is`
- `expand_characters`: array of `{ character, expand: [ ... ] }` to allow alternates anywhere that character appears in book names or variables. Example:
    expand_characters:
      - character: "'"
        expand: ["'", "’"]
- `replace_characters_with`: `{ regexp, replacement }` (default converts spaces to `\s*`)
- `before_book_allowed_characters`, `after_book_allowed_characters`: regex character classes used to enforce valid boundaries before/after a matched book name.
- `before_every_book`, `after_every_book`: regex patterns inserted immediately before/after every book match. Use these to add language-specific required prefixes/suffixes around all books (rare). These are applied in addition to the before/after allowed character boundaries.
- `join_before`, `join_after`: default join strings used when expanding `before`/`after` book patterns (for example the default space between an ordinal and the book name). Override to control whether the joiner is a space, empty string, punctuation, etc.
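To make `expand_characters` and the default space handling concrete, here is a small sketch of how a book text might become a regex source. `bookTextToPattern` is a hypothetical name, and the sketch hard-codes only the default behavior described above (spaces become `\s*`); it does not read a configurable `replace_characters_with` value.

```javascript
// Escape regex metacharacters so plain text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical: build a regex source for one book text.
function bookTextToPattern(text, expandCharacters = []) {
  const expansions = new Map(expandCharacters.map((e) => [e.character, e.expand]));
  let pattern = "";
  for (const ch of text) {
    if (expansions.has(ch)) {
      // Allow any configured alternate wherever this character appears.
      pattern += "(?:" + expansions.get(ch).map(escapeRegExp).join("|") + ")";
    } else if (ch === " ") {
      // Assumed default: a space in the source matches any amount of whitespace.
      pattern += "\\s*";
    } else {
      pattern += escapeRegExp(ch);
    }
  }
  return pattern;
}

const joelSource = bookTextToPattern("Jo'el 1", [
  { character: "'", expand: ["'", "’"] },
]);
const joelRegExp = new RegExp("^" + joelSource + "$", "u");
```

Here `joelRegExp` accepts both the straight and the curly apostrophe, with or without whitespace before the number.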
Each `books` entry declares OSIS code(s) and the localized texts. Forms:
- `osis: "Gen"` or `osis: ["Jonah","Job"]`
- `osis` objects with `before`, `after`, `join` for numbered books:
    - osis:
        - osis: 1Sam
          before: *first
        - osis: 2Sam
          before: *second
      texts:
        - Samuel
- `texts` can be strings or objects with `text` and optional `normalize: none`, which prevents diacritics and spacing from changing.
The `ordinals` section defines ordinal suffixes and optional Psalm handling:
ordinals:
- after: ["st"]
numbers: [1, 21, 31]
- after: ["nd"]
numbers: [2, 22, 32]
- between:
regexp: \s*
texts: ["Psalm"]

`node bin/build_lang.js <lang>` writes `book_names/all/<lang>.yaml`, which is a normalized list of book texts used by `bin/build_spec.js`.
The names output collapses whitespace to single spaces and normalizes to NFC (combining-character variants are unified), but does not add extra variants.
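The normalization described here can be approximated in a few lines. This is a sketch rather than the build script's actual code: `normalizeBookName` and `normalizeBookNames` are hypothetical names, and deduplicating after normalization is an assumption about what "unified" means.

```javascript
// Hypothetical: normalize a single book text to NFC and collapse runs of
// whitespace to one space.
function normalizeBookName(text) {
  return text.normalize("NFC").replace(/\s+/g, " ");
}

// Hypothetical: normalize a list and drop entries that become identical,
// so combining-character variants are unified into one name.
function normalizeBookNames(texts) {
  return [...new Set(texts.map(normalizeBookName))];
}

const normalizedNames = normalizeBookNames([
  "G\u00e9nesis",   // precomposed é
  "Ge\u0301nesis",  // e + combining acute accent
  "1  Samuel",      // doubled space
]);
```

Both spellings of "Génesis" collapse to the precomposed form, so the resulting list has two entries.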
`book_names/preferred/<lang>.yaml` documents preferred book names for display and UI. Each file has:
- `default`: preferred names by OSIS code.
- `translations` (optional): translation-specific overrides (currently only in `eng.yaml`).

Example (`book_names/preferred/eng.yaml`):
default:
  Gen:
    long: Genesis
    short: Gen
    shorter: Ge
  Ps:
    long: Psalms
    long_single: Psalm
    short: Ps
    shorter: Ps
translations:
  niv:
    Ps:
      short_plural: Pss
    1Sam:
      shorter: 1Sa

Keys used in preferred names can include:
- `long`: full name
- `long_single`: singular form (e.g., Psalm vs Psalms)
- `short`: common short form
- `shorter`: shortest form
- `short_plural`: translation-specific plural short form
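One way a consumer might read this file is to check the translation override first and fall back to `default`. The helper below is a hypothetical sketch: `preferredName` is not part of this repo, and the fallback order is an assumption based on the word "overrides".

```javascript
// Hypothetical: look up a preferred name, letting a translation-specific
// value override the default when one exists.
function preferredName(preferred, osis, key, translation) {
  const override = translation
    ? preferred.translations?.[translation]?.[osis]?.[key]
    : undefined;
  return override ?? preferred.default?.[osis]?.[key];
}

// Parsed form of the example file above.
const preferredEng = {
  default: {
    Gen: { long: "Genesis", short: "Gen", shorter: "Ge" },
    Ps: { long: "Psalms", long_single: "Psalm", short: "Ps", shorter: "Ps" },
  },
  translations: {
    niv: {
      Ps: { short_plural: "Pss" },
      "1Sam": { shorter: "1Sa" },
    },
  },
};

const nivPlural = preferredName(preferredEng, "Ps", "short_plural", "niv");
const nivShort = preferredName(preferredEng, "Ps", "short", "niv");
const defaultLong = preferredName(preferredEng, "Gen", "long");
```

In this sketch a key with no override and no default, such as `short_plural` for `Ps` without a translation, comes back as `undefined`.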
Optional per-language Jasmine tests can be added to `data/<lang>.yaml` (see `pol.yaml` for an example):
tests:
  - text: "Rdz 1:1"
    osis: "Gen.1.1"
  - it: "should handle odd spacing"
    text: "Rdz 1:1"
    osis: "Gen.1.1"

`bin/build_spec.js` will emit these into `test/<lang>.spec.js`. Entries with `it` get their own `it(<label>)` block; entries without `it` are grouped under `it("should handle custom tests")`.
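The grouping rule can be sketched as below. `groupCustomTests` is a hypothetical name, and the order of the resulting groups (the shared block first, labeled blocks after) is an assumption; only the grouping itself comes from the description above.

```javascript
// Hypothetical: split `tests` entries into the it() blocks they would land in.
function groupCustomTests(tests) {
  const groups = [];
  const unlabeled = tests.filter((t) => t.it === undefined);
  if (unlabeled.length > 0) {
    // Entries without `it` share one block.
    groups.push({ label: "should handle custom tests", cases: unlabeled });
  }
  for (const t of tests) {
    if (t.it !== undefined) {
      // Entries with `it` each get their own block.
      groups.push({ label: t.it, cases: [t] });
    }
  }
  return groups;
}

const specGroups = groupCustomTests([
  { text: "Rdz 1:1", osis: "Gen.1.1" },
  { it: "should handle odd spacing", text: "Rdz 1:1", osis: "Gen.1.1" },
]);
```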
- `lang/` can be regenerated at any time; it is not a source of truth.
- Language codes are ISO 639-3 (e.g., `eng`, `zho`).
- Windows-reserved 3-letter basenames (`con`, `prn`, `aux`, `nul`) are stored on disk with a trailing underscore (for example, logical code `con` maps to `con_.yaml` / `con_.js` / `con_.spec.js`).
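The reserved-name mapping is simple enough to sketch directly. The function names below are hypothetical and are not the repository's loader; the reserved list is the one given above.

```javascript
// Reserved basenames listed in the notes above.
const WINDOWS_RESERVED = new Set(["con", "prn", "aux", "nul"]);

// Hypothetical: logical language code -> basename used on disk.
function toDiskBasename(code) {
  return WINDOWS_RESERVED.has(code) ? code + "_" : code;
}

// Hypothetical: on-disk basename -> logical language code.
function toLogicalCode(basename) {
  if (basename.endsWith("_")) {
    const stripped = basename.slice(0, -1);
    if (WINDOWS_RESERVED.has(stripped)) return stripped;
  }
  return basename;
}

const conFile = toDiskBasename("con") + ".yaml";
const engFile = toDiskBasename("eng") + ".yaml";
```

Only the four reserved codes gain the underscore; every other code maps to itself in both directions.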
Mappings used by tooling and for compatibility with the older abbreviations used in Bible Passage Reference Parser:
| ISO 639-1 | ISO 639-3 | English name |
|---|---|---|
| ar | ara | Arabic |
| bg | bul | Bulgarian |
| cs | ces | Czech |
| cy | cym | Welsh |
| da | dan | Danish |
| de | deu | German |
| el | grc | Greek |
| en | eng | English |
| es | spa | Spanish |
| fa | fas | Persian |
| fi | fin | Finnish |
| fr | fra | French |
| he | heb | Hebrew |
| hi | hin | Hindi |
| hr | hrv | Croatian |
| ht | hat | Haitian Creole |
| hu | hun | Hungarian |
| id | ind | Indonesian |
| is | isl | Icelandic |
| it | ita | Italian |
| ja | jpn | Japanese |
| jv | jav | Javanese |
| kn | kan | Kannada |
| ko | kor | Korean |
| la | lat | Latin |
| lg | lug | Ganda |
| mk | mkd | Macedonian |
| mr | mar | Marathi |
| ne | nep | Nepali |
| nl | nld | Dutch |
| no | nor | Norwegian |
| ny | nya | Nyanja |
| or | ori | Odia |
| pa | pan | Punjabi |
| pl | pol | Polish |
| pt | por | Portuguese |
| ro | ron | Romanian |
| ru | rus | Russian |
| sk | slk | Slovak |
| sl | slv | Slovenian |
| so | som | Somali |
| sq | sqi | Albanian |
| sr | srp | Serbian |
| sv | swe | Swedish |
| sw | swa | Swahili |
| ta | tam | Tamil |
| te | tel | Telugu |
| th | tha | Thai |
| tl | tgl | Tagalog |
| tr | tur | Turkish |
| uk | ukr | Ukrainian |
| ur | urd | Urdu |
| vi | vie | Vietnamese |
| yo | yor | Yoruba |
| zh | zho | Chinese |
- Move spec building process into the main build_lang script so that it's a one-step process.
- Add more translation-specific versification.
- Improve English translation representation.
February 7, 2026. Add a loader function to handle file renaming for the "con" language so that this repo works on Windows.
January 31, 2026. Rework folder naming and include preferred book names from source data. Add an additional 2,100 languages from YouVersion data.
January 29, 2026. First release.