LunarBit-dev/WXR-to-CSV

WordPress WXR to CSV Converter

A Python script that converts WordPress eXtended RSS (WXR) export files to CSV format for easy data analysis and migration.

Features

  • Converts WordPress export files (WXR format) to CSV
  • Extracts posts, pages, and custom post types
  • Preserves metadata, including categories, tags, and custom fields
  • Handles HTML content and special characters properly
  • Command-line interface for easy automation
  • No external dependencies (uses Python standard library only)

Installation

  1. Clone or download this repository
  2. Ensure you have Python 3.6+ installed
  3. No additional packages needed - uses only Python standard library

Usage

Command Line

Basic usage:

python wxr_to_csv.py your_wordpress_export.xml

Specify output file:

python wxr_to_csv.py your_wordpress_export.xml -o output.csv

Include specific post types:

python wxr_to_csv.py your_wordpress_export.xml -t post page custom_post_type

Autorun script

python autorun.py

Python Script

from wxr_to_csv import WXRToCSVConverter

converter = WXRToCSVConverter()
converter.convert_to_csv('export.xml', 'output.csv', ['post', 'page'])

CSV Output Columns

The generated CSV file includes the following columns:

  • post_id: WordPress post ID
  • title: Post/page title
  • post_type: Type of content (post, page, etc.)
  • status: Publication status (publish, draft, etc.)
  • post_date: Publication date
  • post_modified: Last modification date
  • creator: Author username
  • link: Post URL
  • post_name: URL slug
  • description: Item description (distinct from the excerpt column below)
  • content: Full post content (HTML)
  • excerpt: Post excerpt
  • categories: Categories (semicolon-separated)
  • tags: Tags (semicolon-separated)
  • comment_status: Comment settings
  • ping_status: Pingback/trackback settings
  • post_parent: Parent post ID (for hierarchical content)
  • menu_order: Menu order
  • is_sticky: Sticky post flag
  • post_password: Password protection
  • custom_fields: Custom field data
  • pub_date: RSS publication date
  • post_date_gmt: Publication date (GMT)
  • post_modified_gmt: Modification date (GMT)
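
Since categories and tags are stored as semicolon-separated strings, downstream code usually needs to split them back into lists. The following is a minimal sketch of one way to do that with the standard library. The sample data and the helper names `split_multi` and `load_rows` are made up for illustration and are not part of this repository.

```python
import csv
import io

# Hypothetical two-row sample using a few of the columns listed above.
SAMPLE_CSV = (
    "post_id,title,post_type,categories,tags\n"
    "1,Hello World,post,News;Updates,intro;welcome\n"
    "2,About,page,,\n"
)


def split_multi(value, sep=";"):
    """Split a semicolon-separated cell into a list, dropping empty parts."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def load_rows(handle):
    """Read converter-style CSV rows and expand categories and tags into lists."""
    rows = []
    for row in csv.DictReader(handle):
        row["categories"] = split_multi(row.get("categories") or "")
        row["tags"] = split_multi(row.get("tags") or "")
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["categories"])
```

For a real file, replace the `io.StringIO` object with `open('output.csv', newline='', encoding='utf-8')`. If the content column holds very long HTML, the `csv` module's default field size limit may need to be raised with `csv.field_size_limit`.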

Getting WordPress Export Files

  1. Log into your WordPress admin dashboard
  2. Go to Tools → Export
  3. Select All content or choose specific content types
  4. Click Download Export File
  5. Use the downloaded .xml file with this script

Command Line Options

usage: wxr_to_csv.py [-h] [-o OUTPUT] [-t TYPES [TYPES ...]] input_file

Convert WordPress eXtended RSS (WXR) files to CSV format

positional arguments:
  input_file            Path to the WXR file to convert

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output CSV file path (default: same name as input with .csv extension)
  -t TYPES [TYPES ...], --types TYPES [TYPES ...]
                        Post types to include (default: post page)
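
For readers who want to wrap or extend the command line, here is a sketch of an `argparse` parser that accepts the same options as the help text above. It is an illustration only and may not match how `wxr_to_csv.py` is actually written. The function names `build_parser` and `resolve_output` are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that accepts the options shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="wxr_to_csv.py",
        description="Convert WordPress eXtended RSS (WXR) files to CSV format",
    )
    parser.add_argument("input_file", help="Path to the WXR file to convert")
    parser.add_argument(
        "-o", "--output",
        help="Output CSV file path (default: same name as input with .csv extension)",
    )
    parser.add_argument(
        "-t", "--types", nargs="+", default=["post", "page"],
        help="Post types to include (default: post page)",
    )
    return parser


def resolve_output(args):
    """Return the explicit output path, or derive one from the input name."""
    if args.output:
        return args.output
    return str(Path(args.input_file).with_suffix(".csv"))


args = build_parser().parse_args(["wordpress_export.xml", "-t", "product", "testimonial"])
print(args.types, resolve_output(args))
```

Because `-t` takes one or more values, the input file should come before it on the command line, as in the examples in this README.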

Examples

Convert all posts and pages:

python wxr_to_csv.py wordpress_export.xml

Convert only blog posts:

python wxr_to_csv.py wordpress_export.xml -t post

Convert custom post types:

python wxr_to_csv.py wordpress_export.xml -t product testimonial

Troubleshooting

Common Issues

  1. "Error parsing WXR file": The XML file may be corrupted or not a valid WXR file
  2. "No posts found": Check that the post types you specified exist in the export
  3. Encoding issues: The script handles UTF-8 encoding by default, so an export saved in another encoding may need to be converted to UTF-8 first
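
To narrow down the first two issues, it can help to check independently that the export is well-formed XML and to see which post types it contains. The sketch below does this with the standard library. It is separate from the converter, the sample document is invented, and the helper names `local_name` and `count_post_types` are hypothetical. It matches the `post_type` element by local name so that it does not depend on a particular WXR namespace version.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample; real exports contain many more elements per item.
SAMPLE_WXR = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item><title>Hello</title><wp:post_type>post</wp:post_type></item>
    <item><title>About</title><wp:post_type>page</wp:post_type></item>
    <item><title>Widget</title><wp:post_type>product</wp:post_type></item>
    <item><title>Again</title><wp:post_type>post</wp:post_type></item>
  </channel>
</rss>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_post_types(xml_bytes):
    """Return a Counter of post types, or raise ValueError on malformed XML."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    counts = Counter()
    for item in root.iter("item"):
        for child in item:
            if local_name(child.tag) == "post_type":
                counts[(child.text or "").strip()] += 1
    return counts


counts = count_post_types(SAMPLE_WXR)
print(dict(counts))
```

Any post type that appears in the result can be passed to `-t`. A type that does not appear would explain a "No posts found" message.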

Large Files

For very large WordPress exports:

  • The script loads the entire XML file into memory
  • Consider splitting large exports if you encounter memory issues
  • You can filter by post type to reduce the dataset size
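
As an alternative for exports that do not fit in memory, the file can be scanned incrementally instead of being loaded whole. The sketch below uses `xml.etree.ElementTree.iterparse` to walk the items one at a time and filter by post type. It is not how this script works, and the function name `iter_items` and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample; a real export would be opened from disk in binary mode.
SAMPLE_WXR = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item><title>Hello</title><wp:post_type>post</wp:post_type></item>
    <item><title>About</title><wp:post_type>page</wp:post_type></item>
    <item><title>Again</title><wp:post_type>post</wp:post_type></item>
  </channel>
</rss>
"""


def iter_items(source, wanted_types):
    """Yield (post_type, title) for matching items, clearing each item after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        post_type = ""
        for child in elem:
            # Match by local name so the WXR namespace version does not matter.
            if child.tag.rsplit("}", 1)[-1] == "post_type":
                post_type = (child.text or "").strip()
        if post_type in wanted_types:
            yield post_type, elem.findtext("title", default="")
        # Drop the item's children and text so they can be garbage collected.
        elem.clear()


matches = list(iter_items(io.BytesIO(SAMPLE_WXR), {"post"}))
print(matches)
```

Clearing each item keeps memory use roughly proportional to the largest single item rather than to the whole file, although the emptied `item` elements themselves remain attached to the tree.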

License

This project is open source and available under the MIT License.

Contributing

Feel free to submit issues, fork the repository, and create pull requests for any improvements.
