A Python tool to convert Persian/Arabic PDF tables to Excel format with proper Right-to-Left (RTL) text support.
- ✅ Handles Persian/Arabic RTL text correctly
- ✅ Preserves text direction while keeping numbers readable
- ✅ Auto-detects and extracts tables from PDF
- ✅ Creates Excel files with RTL layout
- ✅ Properly formats headers and data
- ✅ Auto-adjusts column widths
- ✅ Freezes header row for easy scrolling
pip install pdfplumber openpyxl

- Clone this repository or download the script
- Install dependencies:

pip install -r requirements.txt

Convert a PDF file to Excel (auto-generates output filename):

python pdf_to_excel_improved.py input.pdf

Specify output filename:

python pdf_to_excel_improved.py input.pdf output.xlsx

python pdf_to_excel_improved.py input.pdf -o output.xlsx -s "My Data" -f "B Nazanin"

Arguments:

- input.pdf: Input PDF file (required)
- output.xlsx: Output Excel file (optional)
- -o, --output: Output file path
- -s, --sheet: Sheet name (default: "Data")
- -f, --font: Font name (default: "Arial", can use "B Nazanin" for Persian)

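
For readers who want to see how these arguments fit together, here is a minimal sketch of a matching command-line parser built with `argparse`. It is not taken from the script itself: the function names `build_parser` and `resolve_output`, the precedence between `-o` and the positional output, and the rule of swapping the `.pdf` suffix for `.xlsx` when no output is given are all assumptions made for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Convert Persian/Arabic PDF tables to Excel with RTL support."
    )
    parser.add_argument("input", help="Input PDF file (required)")
    # Optional positional output, e.g. `input.pdf output.xlsx`
    parser.add_argument("output_positional", nargs="?", default=None,
                        help="Output Excel file (optional)")
    parser.add_argument("-o", "--output", default=None, help="Output file path")
    parser.add_argument("-s", "--sheet", default="Data", help="Sheet name")
    parser.add_argument("-f", "--font", default="Arial", help="Font name")
    return parser


def resolve_output(args: argparse.Namespace) -> str:
    """Pick the output path.

    Assumed precedence: -o/--output, then the positional output,
    then the input name with its suffix replaced by .xlsx.
    """
    if args.output:
        return args.output
    if args.output_positional:
        return args.output_positional
    return str(Path(args.input).with_suffix(".xlsx"))


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(resolve_output(parsed), parsed.sheet, parsed.font)
```

Under these assumptions, running the sketch with only `bank_statement.pdf` resolves the output to `bank_statement.xlsx`, with the sheet name `Data` and the font `Arial`.
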
# Simple conversion
python pdf_to_excel_improved.py bank_statement.pdf
# Custom output name
python pdf_to_excel_improved.py report.pdf converted_report.xlsx
# With custom sheet name and font
python pdf_to_excel_improved.py data.pdf -o result.xlsx -s "تراکنشها" -f "B Nazanin"

- Extracts tables from PDF using pdfplumber
- Fixes text direction - Reverses Persian/Arabic text while keeping numbers intact
- Creates RTL Excel - Sets up proper right-to-left layout
- Formats cells - Applies borders, alignment, and styling
- Auto-adjusts - Sets appropriate column widths and row heights
The script intelligently handles mixed content:
- Persian/Arabic text: Reversed for proper RTL display
- Numbers and dates: Kept in original order
- Mixed text: Persian/Arabic runs are reversed while embedded numbers and dates keep their order
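
The rules above can be sketched roughly as follows. This is an illustrative approximation, not the script's actual implementation: the function name `fix_rtl_text`, the character ranges, and the set of separators treated as part of a number or date are assumptions. The idea is to reverse the whole cell when it contains Persian/Arabic characters, then reverse each digit run back so numbers and dates read in their original order.

```python
import re

# Arabic script blocks (these cover Persian letters) plus presentation forms
RTL_CHARS = re.compile(r"[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF]")

# Latin, Persian and Arabic-Indic digits, optionally joined by common separators
DIGITS = r"[0-9\u06F0-\u06F9\u0660-\u0669]+"
NUMBER_RUN = re.compile(DIGITS + r"(?:[/\-.,:]" + DIGITS + r")*")


def fix_rtl_text(text):
    """Reverse cell text that contains RTL characters, keeping numbers intact."""
    if not text or not RTL_CHARS.search(text):
        # Empty cells and purely Latin/numeric cells are left untouched
        return text
    reversed_text = text[::-1]
    # Each number or date was flipped by the step above; flip it back
    return NUMBER_RUN.sub(lambda m: m.group(0)[::-1], reversed_text)


if __name__ == "__main__":
    word = "سلام"
    extracted = word[::-1] + " 1402/05/12"  # simulate reversed text from a PDF
    print(fix_rtl_text(extracted))
```

A character-level reversal like this is a simplification: it reorders punctuation along with the letters and does not implement the full Unicode bidirectional algorithm.
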
The generated Excel file includes:
- RTL (Right-to-Left) sheet layout
- First row from PDF as headers (bold, centered)
- Data rows with right-aligned text
- Numbers center-aligned for readability
- Border around all cells
- Frozen header row
- Auto-adjusted column widths
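
As a rough illustration of how such a sheet can be produced with openpyxl, the sketch below applies each of the listed features. The function names `build_rtl_workbook` and `is_numeric`, the numeric test, and the width bounds (10 to 50 characters) are assumptions for this example and may differ from what the script actually does.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Border, Font, Side
from openpyxl.utils import get_column_letter


def is_numeric(value):
    """Heuristic (assumed): ints, floats, or digit strings with , and . separators."""
    if isinstance(value, (int, float)):
        return True
    if not isinstance(value, str):
        return False
    return value.strip().replace(",", "").replace(".", "", 1).isdigit()


def build_rtl_workbook(rows, sheet_name="Data", font_name="Arial"):
    """Build a workbook whose first row is the header, laid out right-to-left."""
    wb = Workbook()
    ws = wb.active
    ws.title = sheet_name
    ws.sheet_view.rightToLeft = True   # RTL sheet layout
    ws.freeze_panes = "A2"             # keep the header row visible

    thin = Side(style="thin")
    border = Border(left=thin, right=thin, top=thin, bottom=thin)
    widths = {}

    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            cell = ws.cell(row=r, column=c, value=value)
            cell.border = border
            if r == 1:
                cell.font = Font(name=font_name, bold=True)
                cell.alignment = Alignment(horizontal="center", vertical="center")
            else:
                cell.font = Font(name=font_name)
                horizontal = "center" if is_numeric(value) else "right"
                cell.alignment = Alignment(horizontal=horizontal, vertical="center")
            length = len(str(value)) if value is not None else 0
            widths[c] = max(widths.get(c, 0), length)

    # Column width from the longest cell, clamped to assumed bounds
    for c, longest in widths.items():
        ws.column_dimensions[get_column_letter(c)].width = min(max(longest + 2, 10), 50)
    return wb


if __name__ == "__main__":
    wb = build_rtl_workbook([["نام", "مبلغ"], ["علی", "1500"]])
    wb.save("output.xlsx")
```
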
Issue: Persian text appears broken or reversed
- Solution: The script automatically fixes this. Make sure you're using the latest version.
Issue: Numbers appear reversed
- Solution: The script preserves number order. If numbers still appear reversed, check whether they are already reversed in the original PDF.
Issue: Missing data
- Solution: Adjust the min_table_rows and min_columns parameters in the extract_tables_from_pdf() function.
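
To show how thresholds like these can cause tables to be skipped, here is a hedged sketch of an extraction function with the same name and parameter names. The default values, the helper `is_usable_table`, and the exact filtering rule are assumptions; check the script for its real defaults and behavior. The pdfplumber calls used (`pdfplumber.open`, `pdf.pages`, `page.extract_tables`) are part of that library's public API.

```python
def is_usable_table(table, min_table_rows=2, min_columns=2):
    """Return True if a table has enough non-empty rows and enough columns."""
    # Ignore rows that are missing or contain only None / empty strings
    rows = [row for row in table
            if row and any(cell not in (None, "") for cell in row)]
    if len(rows) < min_table_rows:
        return False
    widest = max((len(row) for row in rows), default=0)
    return widest >= min_columns


def extract_tables_from_pdf(pdf_path, min_table_rows=2, min_columns=2):
    """Collect every table in the PDF that passes the size thresholds."""
    # Imported here so the filter above can be used without pdfplumber installed
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables, each a list of rows
            for table in page.extract_tables():
                if is_usable_table(table, min_table_rows, min_columns):
                    tables.append(table)
    return tables
```

With a filter of this kind, a one-row or one-column table would be dropped silently, so lowering both thresholds to 1 is a quick way to check whether filtering is the cause of the missing data.
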
MIT License - Feel free to use and modify!
Contributions are welcome! Please feel free to submit a Pull Request.
Created for handling Persian/Arabic PDF documents with proper text direction support.