Version: 1.1.0
Last updated: 2025-10-22
MASTER REFERENCE: Authoritative URLs for all standards. Referenced by arabic-roman-source.md for romanization data tables.
System Status: ✅ Complete - 1529/1529 mappings verified (100%) across 45+ integrated standards
Implementation: Python mapping files in standards_mappings/ (auto-discovered)
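The header says the mapping files in standards_mappings/ are auto-discovered and that the mapping totals are verified. Below is a minimal sketch of how discovery and counting could work. It assumes, hypothetically, that each standard lives in its own module and exposes a module-level dict named `MAPPINGS`, and that files starting with an underscore are helpers to skip. The project's actual convention may differ.

```python
import importlib.util
from pathlib import Path


def load_mapping_modules(directory):
    """Import each public *.py file in `directory` and collect its MAPPINGS dict.

    Hypothetical convention: one module per standard, each defining a
    module-level dict named MAPPINGS. Files whose names start with "_"
    (e.g. __init__.py) are skipped; modules without MAPPINGS are ignored.
    """
    found = {}
    for path in sorted(Path(directory).glob("*.py")):
        if path.name.startswith("_"):
            continue
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module's top-level code
        mappings = getattr(module, "MAPPINGS", None)
        if isinstance(mappings, dict):
            found[path.stem] = mappings
    return found


def total_mappings(found):
    """Sum the number of entries across all discovered tables."""
    return sum(len(table) for table in found.values())
```

Comparing `total_mappings(load_mapping_modules("standards_mappings"))` against an expected figure is one way a status line like the one above could be checked automatically.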
- Reference: https://assets.publishing.service.gov.uk/media/637df1aae90e076b73e074c7/ROMANIZATION_OF_ARABIC_-_Nov_22.pdf
- Usage: Geographic names, official mapping
- Status: ✅ Integrated (48 mappings)
- Reference: https://unstats.un.org/unsd/geoinfo/ungegn/docs/pubs/UNGEGN%20tech%20ref%20manual_m87_combined.pdf
- Usage: International geographic standardization
- Status: ✅ Integrated (32 mappings)
- Reference: https://unstats.un.org/unsd/ungegn/working_groups/wg5/documents/wgrr5arabic.pdf
- Usage: Updated international standard
- Status: ℹ️ Combined with UNGEGN 1972
- Reference: http://www.loc.gov/catdir/cpso/romanization/arabic.pdf
- Usage: Library cataloging, academic references
- Status: ✅ Integrated (58 mappings)
- Reference: https://transliteration.eki.ee/pdf/Arabic_2.2.pdf
- Usage: German academic and library systems
- Status: ✅ Integrated (51 mappings)
- Reference: https://brill.com/fileasset/downloads_static/static_fonts_simple_arabic_transliteration.pdf
- Usage: Encyclopaedia of Islam, Brill academic publications
- Status: ✅ Integrated (34 mappings)
- Note: Simple 1:1 system (one Arabic letter → one roman letter). Taa marbuta (ة) is context-dependent: a, ah, āh, at, āt
- Reference: https://dmg-web.de/en/publications/transliteration-of-the-arabic-script/
- Usage: German Oriental studies standard; adopted by the Hans Wehr Dictionary (German edition, 1952)
- Status: ✅ Integrated (46 mappings)
- Note: Academic precision system for reversible transliteration. Identical to Hans Wehr German edition (1952). Uses diacritics (ḥ, ḫ, ṭ, ẓ, ġ, š, ǧ, ṯ, ḏ) and special symbols (ʿ, ʾ). Aims for language-neutral, one-to-one correspondence.
- Reference: https://archive.org/details/dictionaryofmode0000wehr_t0g4
- Published: Wehr, Hans. A Dictionary of Modern Written Arabic (Arabic-English), 4th edition, ed. J. Milton Cowan (Ithaca, N.Y.: Spoken Language Services, 1994)
- Usage: English edition of the Hans Wehr Dictionary for anglophone readers
- Based on: DMG system, adapted by Cowan for English readability
- Key Differences from DMG: Anglicized digraphs (š→sh, ǧ→j, ḫ→kh, ġ→gh, ṯ→th, ḏ→dh)
- Status: ✅ Integrated (42 mappings)
- Note: Prioritizes pronunciation accessibility over reversibility.
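The six anglicized digraphs listed under Key Differences form a fixed substitution, so they can be sketched as a single-pass character translation. This illustration is based only on the substitutions listed above, not on the project's mapping code; the uppercase variants are an added assumption. It also shows why reversibility is lost: š becomes "sh", which is indistinguishable from an original s followed by h.

```python
import unicodedata

# DMG letter -> anglicized digraph, per the Key Differences listed above.
# Uppercase forms are an assumption added for completeness.
DMG_TO_WEHR_EN = {
    "š": "sh", "ǧ": "j", "ḫ": "kh", "ġ": "gh", "ṯ": "th", "ḏ": "dh",
    "Š": "Sh", "Ǧ": "J", "Ḫ": "Kh", "Ġ": "Gh", "Ṯ": "Th", "Ḏ": "Dh",
}
_TABLE = str.maketrans(DMG_TO_WEHR_EN)


def dmg_to_wehr_english(text: str) -> str:
    """Replace the six DMG letters with digraphs; everything else passes through."""
    # Normalize to NFC first so decomposed input (e.g. "s" + combining caron)
    # matches the precomposed keys. translate() is single-pass, so the "s"
    # and "h" it emits are never re-substituted.
    return unicodedata.normalize("NFC", text).translate(_TABLE)
```

For example, "ǧabal" becomes "jabal", while a DMG string "ašal" and a DMG string "ashal" both come out as "ashal", so the original cannot be recovered from the output.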
- Reference: https://cdn.standards.iteh.ai/samples/4118/2f03c828842c4055a5619c1bded39381/ISO-233-2-1993.pdf
- Usage: Simplified international standard for bibliographic use
- Status: ✅ Integrated (42 mappings)
- Note: Simplifies ISO 233:1984 for ease of typing and bibliographic processing.
- Reference: https://brill.com/fileasset/downloads_products/27684_EI3-Instructions-for-Authors.pdf
- Usage: Islamic studies reference works
- Status: ✅ Integrated (31 mappings)
- Note: Standard transliteration covering Arabic, Persian, Urdu, Hindi, and Punjabi
- Reference: https://ielp.ehe.osu.edu/files/2025/09/IPA_Eg_Arabic.pdf
- Usage: Phonetic representation for Egyptian Arabic
- Status: ✅ Integrated (32 mappings)
- Note: Includes pharyngealized consonants and allophonic variations
- Reference: https://sites.lsa.umich.edu/jalt/ipa-symbols/
- Usage: General Arabic IPA correspondences
- Status: ✅ Integrated (32 mappings)
- Note: Based on Journal of Arabic Linguistics Tradition
- Reference: https://en.wikipedia.org/wiki/Help%3AIPA/Arabic
- Usage: IPA for Classical Arabic and Modern Standard Arabic
- Status: ✅ Integrated (38 mappings)
- Note: Based on Ladefoged & Maddieson, Thelwall & Sa'adeddin, Mitchell, Kaye, Al-Ani
- Reference: https://cdn.standards.iteh.ai/samples/78514/749948ae77474f7fa057a6b278281dbb/ISO-233-3-2023.pdf
- Usage: International technical documentation for Persian
- Status: ✅ Integrated (36 mappings)
- Note: Strict transliteration with Persian-specific characters (پ چ ژ گ)
- Reference: https://unstats.un.org/unsd/ungegn/working_groups/wg5/documents/wgrr4persian.pdf
- Usage: UN approved system for geographical names
- Status: ✅ Integrated (38 mappings)
- Note: Broad transcription based on pronunciation
- Reference: https://assets.publishing.service.gov.uk/media/5e1eeaafe5274a4f0f57553a/ROMANIZATION_OF_PERSIAN.pdf
- Usage: US/UK geographic names standardization
- Adopted: BGN 1946, PCGN 1958, updated 2019
- Status: ✅ Integrated (38 mappings)
- Reference: https://www.loc.gov/catdir/cpso/romanization/persian.pdf
- Usage: Library cataloging, academic references
- Status: ✅ Integrated (32 mappings)
- Reference: https://brill.com/fileasset/downloads_products/27684_EI3-Instructions-for-Authors.pdf
- Usage: Islamic studies reference works
- Status: ✅ Integrated (36 mappings)
- Note: Same as EI3 Arabic with Persian-specific characters (پ چ ژ گ)
- Reference: https://ielp.ehe.osu.edu/files/2025/09/IPA_Persian.pdf
- Usage: Phonetic representation for Persian
- Status: ✅ Integrated (30 mappings)
- Reference: https://loc.gov/catdir/cpso/roman_urdu_pushto_sindhi.html
- Usage: Library cataloging, academic references
- Status: ✅ Integrated (53 mappings)
- Reference: https://unstats.un.org/unsd/ungegn/working_groups/wg5/documents/wgrr4urdu.pdf
- Usage: UN approved system for geographical names
- Status: ✅ Integrated (41 mappings)
- Note: Based on the Hunterian transliteration system (the Government of India's official standard, developed by William Wilson Hunter in the 19th century from the earlier system of William Jones, 1746-1794). Features aspirated consonants (bh, th), retroflex consonants marked with dots (ṭ, ḍ), schwa deletion, and long vowels (ā, ī, ū)
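Among the Hunterian features noted above, aspirated consonants are written as digraphs (bh, th). In Urdu script, aspiration is marked by do-chashmi he (ھ, U+06BE) written after the consonant, so a romanizer must treat the two code points as one unit. Below is a minimal sketch of greedy longest-match lookup. The table is a small hypothetical subset with illustrative values, not the UNGEGN table or the project's mapping file.

```python
# Hypothetical subset for illustration only; values follow the features named
# above (aspirates as consonant + h, dotted retroflexes), not the full table.
BEH, TEH, TTEH, DDAL, ALEF = "\u0628", "\u062A", "\u0679", "\u0688", "\u0627"
DO_CHASHMI_HE = "\u06BE"  # written after a consonant to mark aspiration

URDU_SUBSET = {
    BEH: "b", TEH: "t", TTEH: "ṭ", DDAL: "ḍ", ALEF: "ā",
    BEH + DO_CHASHMI_HE: "bh",
    TEH + DO_CHASHMI_HE: "th",
    TTEH + DO_CHASHMI_HE: "ṭh",
    DDAL + DO_CHASHMI_HE: "ḍh",
}


def romanize_longest_match(text, table):
    """Greedy left-to-right transliteration that prefers the longest key.

    Two-code-point keys such as BEH + DO_CHASHMI_HE must win over their
    one-letter prefixes; otherwise the pair would be split into "b"
    followed by an unmapped do-chashmi he. Characters missing from the
    table are copied through unchanged.
    """
    longest = max(len(key) for key in table)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # unmapped: pass through
            i += 1
    return "".join(out)
```

With this table, `romanize_longest_match(BEH + DO_CHASHMI_HE + ALEF, URDU_SUBSET)` yields "bhā". Trying longer keys first is what keeps the aspirate from being split.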
- Reference: https://assets.publishing.service.gov.uk/media/5ab4e64d40f0b67d64e21540/ROMANIZATION_OF_URDU.pdf
- Usage: US/UK geographic names standardization
- Status: ✅ Integrated (62 mappings)
- Reference: https://brill.com/fileasset/downloads_products/27684_EI3-Instructions-for-Authors.pdf
- Usage: Islamic studies reference works
- Status: ✅ Integrated (51 mappings)
- Note: Same as EI3 Arabic with Urdu-specific characters (ٹ ڈ ڑ ے)
- Reference: https://www.cle.org.pk/Downloads/langproc/Urdu_IPA_to_Sampa.pdf
- Usage: Phonetic representation for Urdu
- Status: ✅ Integrated (70 mappings)
- Reference: https://en.wikipedia.org/wiki/Help:IPA/Hindi_and_Urdu
- Usage: Phonetic representation for Urdu
- Status: ✅ Integrated (59 mappings)
- Reference: https://unstats.un.org/unsd/ungegn/working_groups/wg5/documents/wgrr4punjabi.pdf
- Usage: UN approved system for geographical names
- Status: ✅ Integrated (38 mappings)
- Reference: https://brill.com/fileasset/downloads_products/27684_EI3-Instructions-for-Authors.pdf
- Usage: Islamic studies reference works
- Status: ✅ Integrated (51 mappings, shared with Urdu)
- Note: Uses same character set as EI3 Urdu/Hindi
- Reference: https://en.wikipedia.org/wiki/Help:IPA/Punjabi
- Usage: Phonetic representation for Punjabi
- Status: ✅ Integrated (56 mappings)
- Reference: https://assets.publishing.service.gov.uk/media/621caa32e90e0710b73fd4ff/ROMANIZATION_KAZAKH_Feb_22_19_.pdf
- Usage: US/UK geographic names standardization
- Status: 📋 Not integrated (Cyrillic only)
- Note: Covers Cyrillic script, not Arabic script
- Reference: https://assets.publishing.service.gov.uk/media/621caa11e90e0710c30a4706/ROMANIZATION_KYRGYZ_Feb22_9_.pdf
- Usage: US/UK geographic names standardization
- Status: 📋 Not integrated (Cyrillic only)
- Note: Covers Cyrillic script, not Arabic script
- Reference: https://astanatimes.com/2021/02/kazakhstan-presents-new-latin-alphabet-plans-gradual-transition-through-2031/
- Usage: Official government Latin alphabet
- Adoption: April 2021, transition through 2031
- Status: 📋 Not integrated (Latin alphabet, not romanization)
- Note: 31 letters, including letters with diacritics (ä, ö, ü, ğ, ū, ş, ñ); based on the Common Turkic Alphabet. Kazakhstan is transitioning from Cyrillic to Latin script; the historical Arabic script plays no part in this transition.
- Reference: https://assets.publishing.service.gov.uk/media/65f317e99d99de001d03df0c/Uyghur_romanization.pdf
- Usage: US/UK geographic names standardization for Uyghur Arabic script
- Status: ✅ Integrated (40 mappings)
- Reference: https://en.wikipedia.org/wiki/Kazakh_alphabets
- Usage: IPA for Kazakh Arabic script
- Status: ✅ Integrated (33 mappings)
- Note: Historical correspondence between Arabic script and Latin alphabet with 35 Arabic script characters
- Reference: https://en.wikipedia.org/wiki/Kyrgyz_alphabets
- Usage: IPA for Kyrgyz Arabic script
- Status: ✅ Integrated (32 mappings)
- Note: Historical correspondence; 35 Arabic script characters with IPA mappings. Kyrgyz: 36 phonemes (14 vowels, 22 consonants)
- Reference: https://en.wikipedia.org/wiki/Uyghur_language
- Usage: IPA for Uyghur Arabic script
- Status: ✅ Integrated (33 mappings)
- Reference: https://www.loc.gov/catdir/cpso/romanization/ottoman.pdf
- Usage: Library cataloging for Ottoman Turkish texts
- Status: ✅ Integrated (49 mappings)
- Note: Ottoman Turkish used Arabic script 1299-1928. Vowel diacritics deferred to Arabic/Persian tables per ALA-LC guidelines.
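Deferring vowel diacritics to the Arabic/Persian tables amounts to a layered lookup: consult the Ottoman table first, then fall back to Persian, then to Arabic. Python's `collections.ChainMap` expresses exactly that. The tables below are hypothetical placeholders with a handful of illustrative entries, not ALA-LC data. The same pattern would also fit the EI3 Persian and Urdu entries, which are described as EI3 Arabic plus language-specific characters.

```python
from collections import ChainMap

# Placeholder tables for illustration only (values not copied from ALA-LC).
ARABIC = {
    "\u0628": "b",   # beh
    "\u064E": "a",   # fatha
    "\u064F": "u",   # damma
    "\u0650": "i",   # kasra
}
PERSIAN = {
    "\u067E": "p",   # peh
    "\u06AF": "g",   # gaf
}
OTTOMAN = {
    "\u06AD": "ñ",   # U+06AD ARABIC LETTER NG; placeholder value
}

# Lookups search OTTOMAN first, then PERSIAN, then ARABIC, so the Ottoman
# table only has to list what it adds or overrides.
OTTOMAN_EFFECTIVE = ChainMap(OTTOMAN, PERSIAN, ARABIC)
```

Here `OTTOMAN_EFFECTIVE["\u064E"]` resolves to "a" from the Arabic layer. Writes to a `ChainMap` go only to its first mapping, so an Ottoman-specific override never mutates the shared Arabic or Persian tables.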
- Reference: DMG transliteration guidelines
- Usage: German academic and library systems
- Status: ✅ Integrated (49 mappings)
- Reference: https://brill.com/fileasset/downloads_products/27684_EI3-Instructions-for-Authors.pdf
- Usage: Islamic studies reference works
- Status: ✅ Integrated (49 mappings)
- Reference: https://loc.gov/catdir/cpso/romanization/pushto.pdf
- Usage: Library cataloging, academic references
- Status: ✅ Integrated (51 mappings)
- Reference: https://assets.publishing.service.gov.uk/media/5ab4e57ee5274a1aa5933455/ROMANIZATION_OF_PASHTO.pdf
- Usage: US/UK geographic names standardization
- Status: 📋 Not integrated (ALA-LC + DIN Pashto provide coverage)
- Reference: https://en.wikipedia.org/wiki/Pashto_phonology
- Usage: IPA for Pashto
- Status: ✅ Integrated
- Reference: https://loc.gov/catdir/cpso/romanization/kashmiri.pdf
- Usage: Library cataloging, academic references
- Status: ✅ Integrated (70 mappings)
- Reference: https://en.wikipedia.org/wiki/Kashmiri_language#Perso-Arabic_script
- Usage: IPA for Kashmiri
- Status: ✅ Integrated (44 mappings)
- Reference: https://loc.gov/catdir/cpso/romanization/kurdish.pdf
- Usage: Library cataloging, academic references
- Status: ✅ Integrated (47 mappings)
- Reference: https://en.wikipedia.org/wiki/Help:IPA/Kurdish
- Usage: IPA for Kurdish
- Status: ✅ Integrated
- Reference: https://loc.gov/catdir/cpso/romanization/sindhi.pdf
- Usage: Library cataloging, academic references
- Status: ✅ Integrated (49 mappings)
- Reference: https://learn.sindhila.edu.pk/alphabets/ipl
- Usage: IPA for Sindhi
- Status: ✅ Integrated
- Reference: https://assets.publishing.service.gov.uk/media/5ab4de6ce5274a1aa593343b/ROMANIZATION_OF_BALUCHI.pdf
- Usage: US/UK geographic names standardization for Balochi (Eastern, Western, Southern dialects)
- Status: ✅ Integrated (68 mappings)
- Note: Based on Hunterian transliteration system (Government of India official standard); harmonized with BGN/PCGN Urdu and Persian systems
- Reference: https://en.wikipedia.org/wiki/Balochi_alphabets
- Usage: IPA for Balochi
- Status: ✅ Integrated
- Reference: http://www.balochiacademy.ir/en/2022/07/01/balochi-standard-alphabets/
- Authority: Balochi Academy Sarbaz
- Published: July 2017 (accepted by BAS activists)
- Usage: Official Balochi standardization with diacritic-based romanization system
- Status: ✅ Integrated (32 mappings)
- Note: IPA notation set by Salim Balòc Kòhgardi. Uses diacritics: à, è, ò, š, ť, ž, ď
- Reference: https://en.wikipedia.org/wiki/Saraiki_alphabet
- Usage: IPA for Saraiki
- Status: ✅ Integrated
- Reference: https://en.wikipedia.org/wiki/Balti_language#Perso-Arabic_alphabet
- Usage: IPA for Balti Perso-Arabic alphabet
- Status: ✅ Integrated (45 mappings)
- Note: Tibetic language with unique characters including U+0768 (ݨ), U+0769 (ݩ), U+0697 (ڗ), U+075C (ݜ)
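These letters closely resemble other Arabic-script forms, so checking the listed code points against the Unicode Character Database can catch copy errors in mapping files. Below is a small sketch using the standard-library `unicodedata` module. It only reports each code point's character and official name; the list of code points comes from the note above.

```python
import unicodedata

# Code points listed in the Balti note above.
BALTI_EXTRA_CODE_POINTS = [0x0768, 0x0769, 0x0697, 0x075C]


def describe(code_points):
    """Return (U+XXXX label, character, Unicode name) for each code point."""
    return [
        (f"U+{cp:04X}", chr(cp), unicodedata.name(chr(cp), "<unnamed>"))
        for cp in code_points
    ]


for label, char, name in describe(BALTI_EXTRA_CODE_POINTS):
    print(label, char, name)
```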
| System | Reference | Notes |
|---|---|---|
| Bedirxan Kurdish | Wikipedia | Historical romanization (1930s); not actively used |
| Balti | N/A | No widely recognized romanization standard |
The following standards were researched but are not recommended for implementation. Source: Library and Archives Canada Inventory of Romanization Tools (2019).
| Standard | Year | Status | Reason Not Pursued | Research Date |
|---|---|---|---|---|
| ISO 233:1984 | 1984 | 📋 Not Pursued | ISO 233-2 (1993) simplified version provides equivalent coverage with easier typing | N/A |
| BS 4280:1968 | 1968 | 🔴 Rejected | No free documentation; requires purchase (£50-200); 56 years old, likely superseded by ISO/BGN/PCGN; UK now uses BGN/PCGN or ALA-LC | 2024-10-10 |
| I.G.N. System 1973 | 1973 | 🔴 Rejected | Documentation unavailable online; would require contacting IGN France; very niche (French geographic names); UNGEGN (Variant A) provides adequate coverage | 2024-10-10 |
| Lebanon National | 1963 | 📋 Not Prioritized | Country-specific; likely superseded by international standards; documentation difficult to obtain | N/A |
| Morocco National | 1932 | 📋 Not Prioritized | Very old (92 years); country-specific; historical interest only; documentation unavailable | N/A |
| RJGC (Jordan) | Unknown | 📋 Not Prioritized | Country-specific; specialized (geographic names only); limited practical use outside Jordan | N/A |
| Survey of Egypt | Unknown | 📋 Not Prioritized | Country-specific; specialized (cartographic); limited practical application | N/A |
Note on I.G.N. System 1973: This is "Variant B of the Amended Beirut System" (UNGEGN 2018 is "Variant A"). Main difference: Variant B conforms to French orthography and is preferred in Francophone countries (Morocco, Algeria, Tunisia, Lebanon).
The following text files contain extracted mappings from standards documents and are preserved for reference:
| File | Description |
|---|---|
| documentation/iso2333_mappings.txt | ISO 233-3 (2023) Persian mappings |
| documentation/ei3_mappings.txt | Encyclopaedia of Islam mappings |
| documentation/arabic-roman-mappings.txt | General Arabic romanization mappings |
Note: All active standards have been migrated to standards_mappings/*.py for auto-discovery and validation.
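The validation step is not described in detail here. As a hypothetical illustration only, a per-table check might confirm two things: every key contains at least one Arabic-script character, and every romanization is a non-empty, NFC-normalized string (so that, for example, š is stored as one precomposed code point rather than s plus a combining caron). These criteria are assumptions, not the project's actual rules. The ranges cover the main Arabic-script Unicode blocks.

```python
import unicodedata

# Arabic (0600-06FF), Arabic Supplement (0750-077F), Arabic Extended-A
# (08A0-08FF), Arabic Presentation Forms-A (FB50-FDFF) and -B (FE70-FEFF).
ARABIC_RANGES = [
    (0x0600, 0x06FF), (0x0750, 0x077F), (0x08A0, 0x08FF),
    (0xFB50, 0xFDFF), (0xFE70, 0xFEFF),
]


def is_arabic_script(ch):
    """True if the character falls in one of the Arabic-script blocks above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def validate_mapping(mapping):
    """Return a list of problem descriptions (empty if the table passes).

    Hypothetical checks: keys must contain an Arabic-script character;
    values must be non-empty strings already in NFC form.
    """
    problems = []
    for key, value in mapping.items():
        if not key or not any(is_arabic_script(c) for c in key):
            problems.append(f"{key!r}: key contains no Arabic-script character")
        if not isinstance(value, str) or not value:
            problems.append(f"{key!r}: empty or non-string romanization")
        elif unicodedata.normalize("NFC", value) != value:
            problems.append(f"{key!r}: romanization {value!r} is not NFC-normalized")
    return problems
```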