A Python script to parse MOPAC output files (.out) and extract key computational chemistry data including energy, atomic charges, dipole moment, molecular formula, and geometry.
- Energy Extraction: Parses final heat of formation or total energy
- Atomic Charges: Extracts NET ATOMIC CHARGES with atom numbers, symbols, and charge values
- Dipole Moment: Extracts dipole components and total (if available)
- Molecular Formula: Derives or extracts the empirical formula
- Geometry: Extracts the last set of Cartesian coordinates and saves it as a Gaussian input file (.gjf)
- Batch Processing: Processes all .out files in the current directory
- Output Formats:
  - JSON file with extracted data (energy, charges, dipole, formula)
  - Individual .gjf files for each geometry
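
As one possible reading of "derives" in the Molecular Formula feature, the sketch below builds a formula string in the same shape as the example value used in this README ("C31 H64 = 95 atoms") from a list of element symbols. The function name `formula_from_symbols` and the ordering (carbon, then hydrogen, then the remaining elements alphabetically) are assumptions made for illustration, not details taken from the script.

```python
from collections import Counter


def formula_from_symbols(symbols):
    """Build a string such as 'C31 H64 = 95 atoms' from element symbols.

    Hypothetical helper: the element ordering used here is an assumption.
    """
    counts = Counter(symbols)
    if not counts:
        return None  # no atoms parsed, so no formula can be derived

    # Carbon first, hydrogen second, remaining elements alphabetically.
    ordered = [el for el in ("C", "H") if el in counts]
    ordered += sorted(el for el in counts if el not in ("C", "H"))

    parts = [f"{el}{counts[el]}" for el in ordered]
    total = sum(counts.values())
    return f"{' '.join(parts)} = {total} atoms"


print(formula_from_symbols(["C"] * 31 + ["H"] * 64))
```

Returning None for an empty atom list keeps the helper in line with the README's convention of reporting unavailable data as None.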
- Python 3.x
- Standard library modules: re, os, json
- Place your MOPAC .out files in the same directory as the script
- Run the script: python mopac_parser.py
- The script will:
  - Process all .out files in the directory
  - Print summary information for each file
  - Save extracted data to parsed_results.json
  - Save geometries to individual .gjf files
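
The steps above amount to a loop over the .out files in a directory. The following is a minimal sketch of that loop, not the script's actual code: `process_directory` and its `parse_func` argument are hypothetical names, and `parse_func` stands in for whatever the real parser returns for one file.

```python
import json
import os


def process_directory(directory, parse_func, results_name="parsed_results.json"):
    """Parse every .out file in `directory` and save the results as JSON.

    `parse_func` is a hypothetical callable taking a file path and
    returning a JSON-serializable dict for that file.
    """
    results = {}
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".out"):
            continue  # ignore anything that is not a MOPAC output file
        path = os.path.join(directory, name)
        try:
            results[name] = parse_func(path)
        except Exception as exc:
            # One bad file should not stop the rest of the batch.
            print(f"Failed to parse {name}: {exc}")

    with open(os.path.join(directory, results_name), "w") as handle:
        json.dump(results, handle, indent=2)
    return results
```

Passing the parser in as an argument keeps the loop independent of how each file is parsed, which also makes it easy to try out with a dummy function.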
parsed_results.json contains the extracted data for all processed files:
{
  "filename.out": {
    "energy": 2093.54508,
    "charges": [
      {
        "atom": 1,
        "symbol": "C",
        "charge": -0.479686
      }
    ],
    "dipole": null,
    "formula": "C31 H64 = 95 atoms"
  }
}

Each .gjf file is a Gaussian input file with the extracted geometry:
%chk=filename.chk
# pm7

Geometry from filename.out

0 1
C 7.193082 1.842861 0.030668
H 7.451486 2.497367 0.855113
...
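
A file of this shape could be written with a small helper like the one below. This is a sketch under stated assumptions: `write_gjf` and its parameters are hypothetical names, and the atoms are assumed to arrive as (symbol, x, y, z) tuples. Gaussian's input format separates the route section, the title, and the molecule specification with blank lines, so the sketch writes them explicitly.

```python
import os


def write_gjf(path, atoms, title, method="pm7", charge=0, multiplicity=1):
    """Write a Gaussian input file from (symbol, x, y, z) tuples.

    Hypothetical helper; the defaults mirror the PM7, charge 0,
    multiplicity 1 settings described in this README.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    lines = [
        f"%chk={stem}.chk",
        f"# {method}",
        "",  # blank line ends the route section
        title,
        "",  # blank line ends the title section
        f"{charge} {multiplicity}",
    ]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    lines.append("")  # blank line ends the molecule specification

    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
```

For a charged or open-shell system, the `charge` and `multiplicity` arguments would need to be set by hand, since the defaults assume a neutral singlet.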
The script uses a MOPACParser class with the following methods:
- read_file(): Reads the .out file content
- extract_energy(): Finds final energy values
- extract_charges(): Parses atomic charges section
- extract_dipole_moment(): Extracts dipole data
- extract_geometry(): Parses Cartesian coordinates
- extract_molecular_formula(): Gets or derives formula
- parse(): Orchestrates all extractions
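
As an illustration of what a method such as extract_energy() might do, here is a standalone sketch using the re module. The regular expressions assume that the output contains lines of the form "FINAL HEAT OF FORMATION = ..." and "TOTAL ENERGY = ...", and the preference for the heat of formation over the total energy is an assumption. The sample text is illustrative, not real MOPAC output.

```python
import re

# Assumed line formats; adjust the patterns if your output differs.
HEAT_RE = re.compile(r"FINAL HEAT OF FORMATION\s*=\s*(-?\d+\.\d+)")
TOTAL_RE = re.compile(r"TOTAL ENERGY\s*=\s*(-?\d+\.\d+)")


def extract_energy(text):
    """Return the last heat of formation, else the last total energy, else None."""
    for pattern in (HEAT_RE, TOTAL_RE):
        matches = pattern.findall(text)
        if matches:
            return float(matches[-1])  # last occurrence is the final value
    return None


# Illustrative sample text (values are placeholders, not real results).
sample = """
          FINAL HEAT OF FORMATION =       2093.54508 KCAL/MOL =    8759.39261 KJ/MOL
          TOTAL ENERGY            =      -4810.12345 EV
"""
print(extract_energy(sample))
```

Taking the last match rather than the first matters for outputs that report intermediate values before the final one.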
- Checks for file existence
- Handles missing data gracefully (returns None for unavailable data)
- Continues processing other files if one fails
- Validates coordinate parsing
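
The "returns None for unavailable data" behaviour can be sketched with the dipole moment, which is not always present. The code below is an assumption-laden illustration: it assumes a dipole table with a "DIPOLE" header row followed by a "SUM" row holding X, Y, Z and TOTAL, and the dictionary keys and the `dipole_label` helper are hypothetical.

```python
def extract_dipole_moment(text):
    """Return {'x', 'y', 'z', 'total'} from the last SUM dipole row, or None."""
    dipole = None
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.split()[:1] != ["DIPOLE"]:
            continue
        # Look a few lines below the header for the SUM row.
        for row in lines[i + 1:i + 5]:
            parts = row.split()
            if len(parts) == 5 and parts[0] == "SUM":
                try:
                    x, y, z, total = map(float, parts[1:])
                except ValueError:
                    continue  # malformed numbers: treat as unavailable
                dipole = {"x": x, "y": y, "z": z, "total": total}
    return dipole


def dipole_label(dipole):
    """Format the total for a summary line, falling back to 'N/A'."""
    return f"{dipole['total']:.3f}" if dipole else "N/A"


# Illustrative sample text (values are placeholders, not real results).
sample = """
 DIPOLE           X         Y         Z       TOTAL
 POINT-CHG.    -0.521     0.058     0.000     0.524
 HYBRID        -0.027     0.069     0.000     0.074
 SUM           -0.548     0.127     0.000     0.563
"""
print(dipole_label(extract_dipole_moment(sample)))
print(dipole_label(extract_dipole_moment("no dipole section")))
```

Keeping the None check in one formatting helper means callers never have to index into a missing result.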
Parsed vvc.out: Formula = C31 H64 = 95 atoms, Energy = 2093.54508, Dipole total = N/A
Geometry saved to vvc.gjf
All results saved to parsed_results.json
- The script extracts the last geometry section that appears in the output
- Charge lines and geometry lines are told apart by the number of fields per line (6 for charges, 5 for geometry)
- Dipole moment is extracted if present in the output
- The .gjf files use PM7 method with charge 0 and multiplicity 1
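
The first two notes can be sketched together: find the last geometry header, then keep only lines with exactly 5 fields (index, symbol, x, y, z), which also rejects 6-field charge lines. The header string "CARTESIAN COORDINATES" and both function names are assumptions for illustration, and the first block in the sample text is a placeholder.

```python
def parse_geometry_line(line):
    """Return (symbol, x, y, z) for a 5-field coordinate line, else None."""
    parts = line.split()
    if len(parts) != 5:
        return None  # e.g. a 6-field charge line or a blank line
    try:
        int(parts[0])  # atom index must be an integer
        x, y, z = (float(p) for p in parts[2:])
    except ValueError:
        return None  # e.g. a column-header row
    return (parts[1], x, y, z)


def extract_last_geometry(text, header="CARTESIAN COORDINATES"):
    """Return the atoms of the last block under `header`, or None."""
    start = text.rfind(header)  # rfind gives the last occurrence
    if start == -1:
        return None
    atoms = []
    for line in text[start + len(header):].splitlines():
        atom = parse_geometry_line(line)
        if atom is not None:
            atoms.append(atom)
        elif atoms:
            break  # first non-matching line after data ends the block
    return atoms or None


# Illustrative sample text with two geometry blocks.
sample = """
 CARTESIAN COORDINATES

   1    C        0.000000     0.000000     0.000000
   2    H        1.089000     0.000000     0.000000

 CARTESIAN COORDINATES

   1    C        7.193082     1.842861     0.030668
   2    H        7.451486     2.497367     0.855113
"""
for atom in extract_last_geometry(sample):
    print(atom)
```

Field counting alone is a fragile discriminator, so the sketch also checks that the first field is an integer and the last three are floats before accepting a line.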