
Arabic Transliteration Standards Reference

Version: 1.1.0
Last updated: 2025-10-22

MASTER REFERENCE: Authoritative URLs for all standards. Referenced by arabic-roman-source.md for romanization data tables.

System Status: ✅ Complete - 1529/1529 mappings verified (100%) across 45+ integrated standards

Implementation: Python mapping files in standards_mappings/ (auto-discovered)
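The auto-discovery step could look roughly like the sketch below. It assumes, hypothetically, that each module in standards_mappings/ exposes a dict named MAPPINGS (Arabic-script characters to romanized strings) and that files starting with an underscore are helpers. The project's actual module layout, attribute names, and function names may differ.

```python
"""Sketch: discover mapping modules in a standards_mappings/ directory.

Assumption (hypothetical): every mapping module defines a dict named MAPPINGS.
"""
import importlib.util
from pathlib import Path
from types import ModuleType


def load_module(path: Path) -> ModuleType:
    # Build an import spec straight from the file path, so the directory
    # does not need to be on sys.path.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def discover_standards(directory: Path) -> dict[str, dict[str, str]]:
    """Return {module_name: MAPPINGS} for every *.py file that defines MAPPINGS."""
    standards: dict[str, dict[str, str]] = {}
    for path in sorted(directory.glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        mappings = getattr(load_module(path), "MAPPINGS", None)
        if isinstance(mappings, dict):
            standards[path.stem] = mappings
    return standards
```

Under these assumptions, sum(len(m) for m in discover_standards(Path("standards_mappings")).values()) would give the total mapping count across all discovered standards.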

Arabic International Standards

BGN/PCGN, 1956 (Board on Geographic Names/Permanent Committee on Geographical Names)

UNGEGN, 1972 (United Nations Group of Experts on Geographical Names)

UNGEGN 5.0, 2018

ALA-LC, 2012 (American Library Association-Library of Congress)

DIN 31635 (Deutsches Institut für Normung)

Arabic Academic Systems

Brill Simple Arabic Transliteration System 1.0, 2010

DMG (Deutsche Morgenländische Gesellschaft)

  • Reference: https://dmg-web.de/en/publications/transliteration-of-the-arabic-script/
  • Usage: German Oriental studies standard; adopted by the Hans Wehr Dictionary (German edition, 1952)
  • Status: ✅ Integrated (46 mappings)
  • Note: High-precision academic system designed for reversible transliteration. Identical to the system of the Hans Wehr German edition (1952). Uses diacritics (ḥ, ḫ, ṭ, ẓ, ġ, š, ǧ, ṯ, ḏ) and special symbols (ʿ, ʾ). Aims for a language-neutral, one-to-one correspondence.
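The one-to-one property can be checked mechanically: a table round-trips exactly when no two letters share a romanization and, for simple character-by-character decoding, when every romanized value is a single code point. The sketch below uses a small illustrative subset of DMG consonants, not the project's 46-entry mapping file, and the helper names are hypothetical.

```python
# Illustrative subset of DMG consonant values; the integrated table has 46 entries.
DMG_SUBSET = {
    "ب": "b", "ت": "t", "ث": "ṯ", "ج": "ǧ", "ح": "ḥ", "خ": "ḫ",
    "د": "d", "ذ": "ḏ", "ر": "r", "س": "s", "ش": "š", "ط": "ṭ",
    "ظ": "ẓ", "ع": "ʿ", "غ": "ġ", "ه": "h", "ء": "ʾ",
}


def is_one_to_one(mapping: dict[str, str]) -> bool:
    """True if no two source letters share a romanization (the map is injective)."""
    return len(set(mapping.values())) == len(mapping)


def transliterate(text: str, mapping: dict[str, str]) -> str:
    # Characters without an entry (spaces, digits) pass through unchanged.
    return "".join(mapping.get(ch, ch) for ch in text)


def reverse(text: str, mapping: dict[str, str]) -> str:
    # Character-by-character inversion works only because every value is a
    # single precomposed code point (e.g. š is U+0161), so output splits uniquely.
    if not is_one_to_one(mapping) or any(len(v) != 1 for v in mapping.values()):
        raise ValueError("mapping is not reversible character by character")
    inverse = {v: k for k, v in mapping.items()}
    return "".join(inverse.get(ch, ch) for ch in text)
```

For example, transliterate("شرب", DMG_SUBSET) gives "šrb" (short vowels are unwritten in the Arabic, so this character-level sketch does not supply them), and reverse("šrb", DMG_SUBSET) returns "شرب".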

Wehr/Cowan (English Hans Wehr), 1961

  • Reference: https://archive.org/details/dictionaryofmode0000wehr_t0g4
  • Published: Wehr, Hans. A Dictionary of Modern Written Arabic (Arabic-English), 4th edition, ed. J. Milton Cowan (Ithaca, N.Y.: Spoken Language Services, 1994)
  • Usage: English edition of the Hans Wehr Dictionary for anglophone readers
  • Based on: DMG system, adapted by Cowan for English readability
  • Key Differences from DMG: Anglicized digraphs (š→sh, ǧ→j, ḫ→kh, ġ→gh, ṯ→th, ḏ→dh)
  • Status: ✅ Integrated (42 mappings)
  • Note: Prioritizes pronunciation accessibility over reversibility.
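Digraphs are where reversibility is lost: once ش is written sh, the same output can also come from س followed by ه. The sketch below makes this concrete with a hypothetical table built from the digraph values listed above plus a few single-letter consonants. It is not the project's 42-entry Wehr/Cowan file, and the function names are illustrative.

```python
# Hypothetical Anglicized table: the digraphs listed above plus a few single letters.
ANGLICIZED = {
    "ش": "sh", "ج": "j", "خ": "kh", "غ": "gh", "ث": "th", "ذ": "dh",
    "س": "s", "ك": "k", "ت": "t", "د": "d", "ه": "h",
}


def transliterate(text: str, mapping: dict[str, str]) -> str:
    return "".join(mapping.get(ch, ch) for ch in text)


def collisions(mapping: dict[str, str], words: list[str]) -> dict[str, list[str]]:
    """Group words by romanized form; keep only forms shared by two or more words."""
    groups: dict[str, list[str]] = {}
    for word in words:
        groups.setdefault(transliterate(word, mapping), []).append(word)
    return {form: ws for form, ws in groups.items() if len(ws) > 1}
```

Here collisions(ANGLICIZED, ["ش", "سه", "خ", "كه", "ت"]) returns {"sh": ["ش", "سه"], "kh": ["خ", "كه"]}. Systems that must stay unambiguous sometimes insert a separator between letters that would otherwise read as a digraph; ALA-LC, for example, uses a prime (ʹ) for this.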

ISO 233-2 Arabic Simplified

Encyclopedia of Islam Arabic (EI3/EWIC/EQ)

Phonetic Representation Systems

Arabic IPA (OSU Egyptian Arabic)

Arabic IPA (University of Michigan)

Arabic IPA (Wikipedia)

Persian/Farsi Standards

ISO 233-3 Persian

UNGEGN Persian (2012)

BGN/PCGN Persian (1958, updated 2019)

ALA-LC Persian

Encyclopedia of Islam Persian (EI3/EWIC/EQ)

Persian IPA (OSU)

Urdu Standards

ALA-LC Urdu

UNGEGN Urdu (1972, amended 1977)

  • Reference: https://unstats.un.org/unsd/ungegn/working_groups/wg5/documents/wgrr4urdu.pdf
  • Usage: UN-approved system for geographical names
  • Status: ✅ Integrated (41 mappings)
  • Note: Based on the Hunterian transliteration system, the Government of India's official standard, developed by William Wilson Hunter in the 19th century from the earlier scheme of William Jones (1746-1794). Features aspirated consonants (bh, th), retroflex consonants marked with an underdot (ṭ, ḍ), schwa deletion, and long vowels marked with a macron (ā, ī, ū).
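The aspirated digraphs follow from Urdu spelling: aspiration is written with do-chashmi he (ھ, U+06BE) after the consonant, so a romanizer can append h to whatever the previous letter produced. The sketch below covers consonants only and uses an illustrative subset of Hunterian-style values; vowels, schwa deletion, and the rest of the 41 mappings are out of scope, and the function name is hypothetical.

```python
# Illustrative consonant subset of Hunterian-style values; not the full 41-entry table.
URDU_BASE = {
    "ب": "b", "پ": "p", "ت": "t", "ٹ": "ṭ", "د": "d", "ڈ": "ḍ",
    "ک": "k", "گ": "g", "ج": "j", "چ": "ch", "ر": "r",
}
DO_CHASHMI_HE = "\u06BE"  # ھ: marks aspiration of the preceding consonant


def romanize_urdu(text: str) -> str:
    out: list[str] = []
    for ch in text:
        if ch == DO_CHASHMI_HE and out:
            # Fold aspiration into the previous letter: ب + ھ -> "bh", ٹ + ھ -> "ṭh".
            out[-1] += "h"
        else:
            out.append(URDU_BASE.get(ch, ch))
    return "".join(out)
```

Because ھ is a distinct code point from gol he (ہ, U+06C1), which writes an ordinary h, the two can be handled separately without any context rules.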

BGN/PCGN Urdu (2018)

Encyclopedia of Islam Urdu/Punjabi (EI3/EWIC/EQ)

Urdu IPA (CLE)

Urdu IPA (Wikipedia)

Punjabi Standards

UNGEGN Punjabi

Encyclopedia of Islam Punjabi (EI3/EWIC/EQ)

Punjabi IPA (Wikipedia)

Uyghur/Kazakh/Kyrgyz Standards

BGN/PCGN Kazakh, 2019

BGN/PCGN Kyrgyz, 2019

Kazakh Official Latin Alphabet, 2021

BGN/PCGN Uyghur, 2024

Kazakh IPA (Wikipedia)

  • Reference: https://en.wikipedia.org/wiki/Kazakh_alphabets
  • Usage: IPA for Kazakh Arabic script
  • Status: ✅ Integrated (33 mappings)
  • Note: Historical correspondence between the Arabic-script and Latin Kazakh alphabets, covering 35 Arabic-script characters

Kyrgyz IPA (Wikipedia)

  • Reference: https://en.wikipedia.org/wiki/Kyrgyz_alphabets
  • Usage: IPA for Kyrgyz Arabic script
  • Status: ✅ Integrated (32 mappings)
  • Note: Historical correspondence; 35 Arabic script characters with IPA mappings. Kyrgyz: 36 phonemes (14 vowels, 22 consonants)

Uyghur IPA (Wikipedia)

Ottoman Turkish Standards

ALA-LC Ottoman Turkish

  • Reference: https://www.loc.gov/catdir/cpso/romanization/ottoman.pdf
  • Usage: Library cataloging for Ottoman Turkish texts
  • Status: ✅ Integrated (49 mappings)
  • Note: Ottoman Turkish was written in Arabic script from 1299 to 1928. Vowel diacritics are deferred to the Arabic/Persian tables, per ALA-LC guidelines.
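Deferring vowel diacritics to the Arabic and Persian tables amounts to a lookup chain: try the Ottoman table first and fall back to the others for anything it does not define. The sketch below models this with collections.ChainMap. The table fragments are illustrative only (the Ottoman values follow modern Turkish letters such as c, ç, ş), not copies of the ALA-LC tables, and the names are hypothetical.

```python
from collections import ChainMap

# Illustrative fragments only; real values come from the ALA-LC tables.
OTTOMAN = {"ج": "c", "چ": "ç", "ش": "ş", "ژ": "j"}
PERSIAN = {"پ": "p", "گ": "g", "ژ": "zh"}
ARABIC = {
    "ب": "b", "ج": "j", "ش": "sh",
    "\u064E": "a",  # fatḥah
    "\u0650": "i",  # kasrah
    "\u064F": "u",  # ḍammah
}

# Lookups search OTTOMAN first, then PERSIAN, then ARABIC.
OTTOMAN_WITH_FALLBACK = ChainMap(OTTOMAN, PERSIAN, ARABIC)


def romanize(text: str, table=OTTOMAN_WITH_FALLBACK) -> str:
    return "".join(table.get(ch, ch) for ch in text)
```

A ChainMap keeps the three tables separate, so each can still be validated and counted on its own while the Ottoman table overrides the shared letters (ج gives c rather than j, ژ gives j rather than zh).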

DMG Turkish (Deutsche Morgenländische Gesellschaft)

  • Reference: DMG transliteration guidelines
  • Usage: German academic and library systems
  • Status: ✅ Integrated (49 mappings)

Encyclopedia of Islam Ottoman Turkish (EI3)

Extended Arabic Script Standards

Pashto Standards

ALA-LC Pashto

BGN/PCGN Pashto, 1968

Pashto IPA (Wikipedia)

Kashmiri Standards

ALA-LC Kashmiri

Kashmiri IPA (Wikipedia)

Kurdish (Sorani) Standards

ALA-LC Kurdish (Sorani)

Kurdish IPA (Wikipedia)

Sindhi Standards

ALA-LC Sindhi

Sindhi IPA (SLA)

Balochi Standards

BGN/PCGN Baluchi, 2008

Balochi IPA (Wikipedia)

Balochi Academy Sarbaz (BAS)

  • Reference: http://www.balochiacademy.ir/en/2022/07/01/balochi-standard-alphabets/
  • Authority: Balochi Academy Sarbaz
  • Published: July 2017 (accepted by BAS activists)
  • Usage: Official Balochi standardization with diacritic-based romanization system
  • Status: ✅ Integrated (32 mappings)
  • Note: IPA notation set by Salim Balòc Kòhgardi. Uses diacritics: à, è, ò, š, ť, ž, ď

Saraiki Standards

Saraiki IPA (Wikipedia)

Balti Standards

Balti IPA (Wikipedia)


Research & Undocumented Standards

Historical & Undocumented Systems

| System | Reference | Notes |
|---|---|---|
| Bedirxan Kurdish | Wikipedia | Historical romanization (1930s); not actively used |
| Balti | N/A | No widely recognized romanization standard |

Arabic Standards Not Pursued

The following standards were researched but are not recommended for implementation. Source: Library and Archives Canada Inventory of Romanization Tools (2019).

| Standard | Year | Status | Reason Not Pursued | Research Date |
|---|---|---|---|---|
| ISO 233:1984 | 1984 | 📋 Not Pursued | ISO 233-2 (1993) simplified version provides equivalent coverage with easier typing | N/A |
| BS 4280:1968 | 1968 | 🔴 Rejected | No free documentation; requires purchase (£50-200); 56 years old, likely superseded by ISO/BGN/PCGN; UK now uses BGN/PCGN or ALA-LC | 2024-10-10 |
| I.G.N. System 1973 | 1973 | 🔴 Rejected | Documentation unavailable online; would require contacting IGN France; very niche (French geographic names); UNGEGN (Variant A) provides adequate coverage | 2024-10-10 |
| Lebanon National | 1963 | 📋 Not Prioritized | Country-specific; likely superseded by international standards; documentation difficult to obtain | N/A |
| Morocco National | 1932 | 📋 Not Prioritized | Very old (92 years); country-specific; historical interest only; documentation unavailable | N/A |
| RJGC (Jordan) | Unknown | 📋 Not Prioritized | Country-specific; specialized (geographic names only); limited practical use outside Jordan | N/A |
| Survey of Egypt | Unknown | 📋 Not Prioritized | Country-specific; specialized (cartographic); limited practical application | N/A |

Note on I.G.N. System 1973: This is "Variant B of the Amended Beirut System" (UNGEGN 2018 is "Variant A"). Main difference: Variant B conforms to French orthography and is preferred in Francophone countries (Morocco, Algeria, Tunisia, Lebanon).


Legacy Text Mapping Files

The following text files contain extracted mappings from standards documents and are preserved for reference:

| File | Description |
|---|---|
| documentation/iso2333_mappings.txt | ISO 233-3 (2023) Persian mappings |
| documentation/ei3_mappings.txt | Encyclopedia of Islam mappings |
| documentation/arabic-roman-mappings.txt | General Arabic romanization mappings |

Note: All active standards have been migrated to standards_mappings/*.py for auto-discovery and validation.
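Validation after auto-discovery could include per-table checks like the ones sketched below. Which checks the project actually runs is not stated here, so the rules (Arabic-script keys, non-empty romanizations, NFC-normalized output) and the function names are assumptions. The NFC check is included because a decomposed š (s + U+030C) and the precomposed U+0161 look identical but compare unequal.

```python
import unicodedata

# Unicode blocks for Arabic script (Arabic, Supplement, Extended-B, Extended-A,
# Presentation Forms-A, Presentation Forms-B).
ARABIC_BLOCKS = [
    (0x0600, 0x06FF), (0x0750, 0x077F), (0x0870, 0x089F),
    (0x08A0, 0x08FF), (0xFB50, 0xFDFF), (0xFE70, 0xFEFF),
]


def in_arabic_script(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_BLOCKS)


def validate(mappings: dict) -> list[str]:
    """Return human-readable problems; an empty list means the table passed."""
    problems: list[str] = []
    for src, dst in mappings.items():
        # Keys may be multi-character (e.g. an aspirated pair such as بھ).
        if not src or not all(in_arabic_script(c) for c in src):
            problems.append(f"{src!r}: source is not Arabic-script text")
        if not isinstance(dst, str) or not dst.strip():
            problems.append(f"{src!r}: empty or non-string romanization")
        elif dst != unicodedata.normalize("NFC", dst):
            problems.append(f"{src!r}: romanization {dst!r} is not NFC-normalized")
    return problems
```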