A lightweight utility to fix formatting issues in Markdown files exported from Google Docs.
Specific Use Case: This tool is designed to fix the "aggressive escaping" issues that occur when Gemini-generated content (especially mathematical formulas) is saved to Google Docs and then exported as Markdown.
It specifically handles:
- LaTeX Math Repair: Restores formulas broken by escaping (e.g., turning `\\tau` back into `\tau`).
- Special Character Fixes: Restores `_`, `=`, `.`, `+`, and `[`, which are often incorrectly escaped.
- Fix Aggressive Escaping: Automatically restores `_`, `=`, `.`, `+`, `[`, and other characters that Google Docs incorrectly escapes.
- LaTeX Math Repair: Ensures LaTeX formulas (e.g., `\tau`, `\boldsymbol`) render correctly.
- Batch Processing: Scans and fixes all `.md` files in the `google_md` directory.
- Word Conversion: Optionally converts fixed Markdown files to `.docx` (requires Pandoc).
- Python 3.x
- Pandoc (optional, needed only for `.docx` conversion)
- Clone this repository:
git clone https://github.com/your-username/google2md_mathfix.git
- (Optional) Install development dependencies if you plan to build the executable:
pip install -r requirements.txt
- Prepare Files:
- Option A (Markdown): Place your exported Google Docs Markdown files (the broken ones) into the `google_md` folder.
- Option B (Word): Place your Google Docs `.docx` files into the `google_docx` folder (or the root directory). The script first converts them to Markdown, fixes the equations, and then processes them.
- Run the Script:
python fix_md.py
Or use the main processor, which also handles Word conversion:
python main.py
- Check Output:
- Fixed Markdown files will overwrite the originals in `google_md` (if changes are needed).
- Converted Word documents will appear in `word_output`.
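The "overwrite only if changes are needed" behavior can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `fix_directory` and its `fixer` argument are hypothetical names, and `fixer` stands in for any function that takes the file text and returns the repaired text.

```python
from pathlib import Path
from typing import Callable, List


def fix_directory(folder: Path, fixer: Callable[[str], str]) -> List[Path]:
    """Apply `fixer` to every .md file in `folder`; rewrite only files that change."""
    changed = []
    for path in sorted(folder.glob("*.md")):
        original = path.read_text(encoding="utf-8")
        fixed = fixer(original)
        # Leave the file untouched when the fixer found nothing to repair.
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            changed.append(path)
    return changed
```

Called as `fix_directory(Path("google_md"), my_fixer)`, it returns the list of files that were rewritten, which is convenient for printing a short summary. Because the originals are overwritten in place, keeping a backup copy of the folder before the first run is a reasonable precaution.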
- `main.py`: The main entry point. Orchestrates the scanning, fixing, and conversion process. It supports processing both the current directory and the `google_md` folder.
- `fix_md.py`: Contains the core logic and regex patterns for repairing the Markdown content. Can be run standalone to fix files without conversion.
- `md_to_docx.py`: A utility script that calls Pandoc to convert Markdown to Word documents.
- `reset_test.py`: A testing utility that intentionally "corrupts" a clean Markdown file (re-applies escaping) to verify that `fix_md.py` can repair it correctly.
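For orientation, the Pandoc step could look something like the sketch below. It is an assumption about how such a wrapper might be written, not a copy of `md_to_docx.py`: the function names `build_pandoc_command` and `convert_to_docx` are hypothetical, and only the basic `pandoc INPUT -o OUTPUT` invocation is taken from Pandoc itself.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_pandoc_command(md_path: Path, out_dir: Path) -> List[str]:
    """Build the argument list for converting one Markdown file to .docx."""
    docx_path = out_dir / (md_path.stem + ".docx")
    return ["pandoc", str(md_path), "-o", str(docx_path)]


def convert_to_docx(md_path: Path, out_dir: Path = Path("word_output")) -> Path:
    """Run Pandoc on `md_path` and return the path of the generated .docx."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("Pandoc was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_pandoc_command(md_path, out_dir)
    # check=True raises CalledProcessError if Pandoc exits with an error.
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Keeping the command construction separate from the call makes the wrapper easy to test on a machine where Pandoc is not installed.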
If you want to create a standalone .exe file to run without Python installed:
- Install PyInstaller:
pip install -r requirements.txt
- Build:
pyinstaller google2md_mathfix.spec
- Locate Executable:
The generated file will be in the `dist` folder: `dist/google2md_mathfix.exe`. You can copy this `.exe` file to any folder containing your Markdown files (or a `google_md` folder) and run it directly.
The script applies a series of regular expressions, in a strict order, to reverse the escaping done by Google Docs.
- Restores `\\` to `\` (critical for LaTeX).
- Restores escaped `_`, `=`, `-`, etc. to their original characters.
- Restores horizontal rules (`---`), often broken as `## ---`.
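The three steps above can be sketched as a small rule table applied in order. The patterns here are illustrative assumptions, not the ones shipped in `fix_md.py`: the names `RULES` and `fix_text` are hypothetical, and the exact set of unescaped characters (including `]`) is a guess based on the list above.

```python
import re

# Hypothetical rule table; the real patterns in fix_md.py may differ.
RULES = [
    # 1. Collapse a doubled backslash before a LaTeX command name: \\tau -> \tau
    (re.compile(r"\\\\(?=[A-Za-z])"), r"\\"),
    # 2. Drop the backslash in front of punctuation that was escaped: \_ -> _
    #    (the character set, including "]", is an assumption)
    (re.compile(r"\\([_=.+\-\[\]])"), r"\1"),
    # 3. Turn a heading line that holds only a rule back into a horizontal rule
    (re.compile(r"^#{1,6}[ \t]*---[ \t]*$", re.MULTILINE), "---"),
]


def fix_text(text: str) -> str:
    """Apply every rule in sequence and return the repaired text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

For example, `fix_text(r"$\\tau\_i \= 1$")` yields `$\tau_i = 1$`. Note that a blanket rule like step 2 also removes escapes the author intended (a literal `\_` outside math, or the `\[` display-math delimiter), so a production version would likely need to be more selective.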
MIT License