Adobe Arabic Character Sets - User Guide

Last Updated: 2025-10-22

Quick Start

I want to...


Quick Reference

Character Set Files (Arabic Script)

| File | Use For | Details |
|---|---|---|
| adobe-arabic-1.txt | Basic Arabic | See README |
| adobe-arabic-2.txt | Urdu, Persian, Punjabi | See README |
| adobe-arabic-3.txt | Uyghur, Kazakh, Kyrgyz | See README |
| adobe-arabic-4.txt | Kashmiri, Saraiki, Balti | See README |
| adobe-arabic-5.txt | Pashto, Sindhi, Kurdish, Balochi | See README |

Romanization Files (Latin Script)

Important: All romanization modules require Adobe Latin 3.

| File | Use For | Details |
|---|---|---|
| adobe-arabic-1-roman.txt | Romanizing Arabic | See README |
| adobe-arabic-2-roman.txt | Romanizing Urdu, Persian, Punjabi | See README |
| adobe-arabic-3-roman.txt | Romanizing Uyghur, Kazakh, Kyrgyz | See README |
| adobe-arabic-4-roman.txt | Romanizing Kashmiri, Saraiki, Balti | See README |
| adobe-arabic-5-roman.txt | Romanizing extended languages | See README |

How to Look Up Romanization

"How is this Arabic character romanized?"

Go to: documentation/arabic-roman-source.md

This file contains tables showing how each character is romanized under different standards (BGN/PCGN, UNGEGN, ALA-LC, ISO, etc.).

Example:

  • Looking for how ع (Ayn) is romanized in BGN/PCGN?
  • Find the Arabic or Urdu table
  • Find the row for Unicode 0639
  • Check the BGN/PCGN column
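The same lookup can be scripted. The sketch below is hypothetical: it assumes the tables in arabic-roman-source.md are Markdown pipe tables with a `Unicode` column and one column per standard, and the helper name `lookup` is made up for illustration. The sample rows reuse the BGN/PCGN Urdu values from the mapping example in Programmatic Access.

```python
from typing import Optional


def lookup(table_md: str, code: str, standard: str) -> Optional[str]:
    """Return the `standard` column for the row whose Unicode column equals `code`.

    Hypothetical: assumes a Markdown pipe table whose header row contains a
    'Unicode' column and one column per standard (e.g. 'BGN/PCGN').
    """
    rows = [
        [cell.strip() for cell in line.strip().strip('|').split('|')]
        for line in table_md.splitlines()
        if line.strip().startswith('|')
    ]
    if not rows:
        return None
    header = rows[0]
    code_col = header.index('Unicode')  # raises ValueError if the column is missing
    std_col = header.index(standard)
    for row in rows[1:]:
        if set(''.join(row)) <= set('-: '):
            continue  # skip the |---|---| separator row
        if len(row) > max(code_col, std_col) and row[code_col].upper() == code.upper():
            return row[std_col]
    return None


# Illustrative table built from the BGN/PCGN Urdu example values in this guide
SAMPLE = """
| Unicode | Character | BGN/PCGN |
|---------|-----------|----------|
| 0628    | ب         | b        |
| 067E    | پ         | p        |
"""

print(lookup(SAMPLE, '0628', 'BGN/PCGN'))  # b
```

To use it on the real file, pass the text of the Arabic or Urdu table section; the actual column headers may be named differently.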

Notation

  • b = single romanization
  • k/g = multiple options (context-dependent)
  • - = not used in this standard
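A cell written in this notation can be expanded into a list of options. This is a minimal sketch; `parse_cell` is a hypothetical helper, and treating an empty cell the same as `-` is an assumption.

```python
from typing import List


def parse_cell(cell: str) -> List[str]:
    """Expand one romanization-table cell into its options.

    'b'   -> ['b']       single romanization
    'k/g' -> ['k', 'g']  multiple options (context-dependent)
    '-'   -> []          not used in this standard
    """
    cell = cell.strip()
    if cell in ('', '-'):  # assumption: an empty cell also means "not used"
        return []
    return [option.strip() for option in cell.split('/') if option.strip()]


print(parse_cell('k/g'))  # ['k', 'g']
```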

Programmatic Access

Parse Character Sets (Python)

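# Read a character set file (tab-delimited; the first line is a header)
with open('adobe-arabic-1.txt', 'r', encoding='utf-8') as f:
    next(f, None)  # skip header row
    for line in f:
        line = line.rstrip('\n')
        # skip blank lines and comments
        if not line.strip() or line.startswith('#'):
            continue
        # pad so rows with a missing trailing column (e.g. empty notes) still unpack
        fields = line.split('\t')
        unicode_code, char, glyph, name, notes = (fields + [''] * 5)[:5]
        print(f"{unicode_code}: {char} ({glyph})")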
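Because each row carries both a code and a character, a file can be sanity-checked by comparing the two. A minimal sketch, assuming the first column is a bare hex code such as 0639 (no U+ prefix) and that combining marks may be displayed with a dotted circle (U+25CC); `find_mismatches` is a hypothetical helper, not part of this repository.

```python
from typing import Iterable, List

DOTTED_CIRCLE = '\u25cc'  # assumption: sometimes used to display combining marks


def find_mismatches(lines: Iterable[str]) -> List[str]:
    """Return the codes whose character column does not match the code point."""
    it = iter(lines)
    next(it, None)  # skip header row
    mismatches = []
    for line in it:
        line = line.rstrip('\n')
        if not line.strip() or line.startswith('#'):
            continue  # blank line or comment
        fields = line.split('\t')
        code = fields[0].strip()
        char = fields[1].replace(DOTTED_CIRCLE, '') if len(fields) > 1 else ''
        try:
            expected = chr(int(code, 16))
        except ValueError:  # code column is not valid hex
            mismatches.append(code)
            continue
        if char != expected:
            mismatches.append(code)
    return mismatches
```

For example, passing an open file object, as in `find_mismatches(open('adobe-arabic-1.txt', encoding='utf-8'))`, should return an empty list when every row is consistent.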

Access Romanization Mappings (Python)

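# Load romanization mappings from standards_mappings/
# (run from the directory that contains standards_mappings/)
import sys
sys.path.insert(0, '.')  # '.' is the current working directory
from standards_mappings import get_standard_mappings, get_metadata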

# Get mappings for a specific standard
mappings = get_standard_mappings('BGN/PCGN Urdu')
# Returns: {'0628': 'b', '067E': 'p', '062A': 't', ...}

# Get metadata
metadata = get_metadata('BGN/PCGN Urdu')
# Returns: {'standard_name': 'BGN/PCGN Urdu', 'table_type': 'urdu', ...}
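The mapping keys are four-digit uppercase hex codes, so a string can be romanized letter by letter. This is a deliberately naive sketch, not a conforming implementation of any standard: `romanize` is hypothetical, it assumes mapping values may use the `k/g` notation described earlier and simply takes the first option, and it cannot restore short vowels, which Arabic script normally leaves unwritten.

```python
from typing import Dict


def romanize(text: str, mappings: Dict[str, str]) -> str:
    """Naive letter-by-letter romanization using a {hex code: value} mapping."""
    out = []
    for ch in text:
        code = f"{ord(ch):04X}"  # e.g. 'ب' -> '0628', matching the mapping keys
        value = mappings.get(code, '').strip()
        if value in ('', '-'):
            out.append(ch)  # unmapped or "not used": keep the original character
        else:
            out.append(value.split('/')[0])  # 'k/g': naively take the first option
    return ''.join(out)


sample = {'0628': 'b', '067E': 'p', '062A': 't'}  # values from the example above
print(romanize('بپت', sample))  # bpt
```

With the real data, pass the result of `get_standard_mappings('BGN/PCGN Urdu')` as `mappings`.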

Common Standards

  • BGN/PCGN - Geographic names (US/UK)
  • UNGEGN - Geographic names (UN)
  • ALA-LC - Library cataloging
  • ISO 233 - International standard for Arabic-to-Latin transliteration
  • IPA - Phonetic transcription

See documentation/arabic-roman-standards.md for the complete list, with official source URLs.
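To see how one character differs across standards, the mapping loader can be called once per standard. A sketch, assuming `get_standard_mappings` behaves as in the Programmatic Access example; only 'BGN/PCGN Urdu' is confirmed as a standard name in this guide, so check arabic-roman-standards.md for the exact names of the others. `compare_standards` is a hypothetical helper, and the loader is passed in as a parameter so the function can be tried without the repository.

```python
from typing import Callable, Dict, Iterable


def compare_standards(
    code: str,
    standards: Iterable[str],
    get_mappings: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Return {standard name: romanization of `code`}, using '-' when absent."""
    return {name: get_mappings(name).get(code.upper(), '-') for name in standards}
```

For example, `compare_standards('0639', ['BGN/PCGN Urdu'], get_standard_mappings)` returns a one-entry dictionary for ع (Ayn).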


Resources