All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Analysis workflow and CLI. Added the `polyzymd analyze` command family with YAML-driven setup and execution for RMSF, contacts, distances, catalytic triad, and secondary-structure analyses, plus shared loading, alignment, PBC handling, result models, and aggregation helpers. (`src/polyzymd/analysis/`)
- Comparison engine for multi-condition studies. Added registry-based comparators, typed comparison result models, shared statistical utilities, and a generic `polyzymd compare run` workflow for RMSF, contacts, distances, catalytic triad, exposure dynamics, binding free energy, polymer affinity, and secondary structure. (`src/polyzymd/compare/`)
- Config-driven plotting stack. Added `polyzymd compare plot-all`, registry-based plot discovery, shared plot themes, and publication-oriented plotters for RMSF, contacts, distances, catalytic triad, secondary structure, binding free energy, exposure, and polymer affinity. (`src/polyzymd/compare/plotter.py`, `src/polyzymd/compare/plotters/`)
- Secondary-structure comparison support. Added DSSP-backed secondary structure analysis, comparison results, and plotting so secondary structure is part of the stable release analysis stack. (`src/polyzymd/analysis/secondary_structure/`, `src/polyzymd/compare/comparators/secondary_structure.py`, `src/polyzymd/compare/plotters/secondary_structure.py`)
- Comprehensive analysis documentation. Added end-to-end tutorials, cookbook-style guides, API pages, and extension guides covering analysis, comparison, and plotting workflows. (`docs/source/tutorials/`, `docs/source/api/compare.md`)
- Release presentation labeling for debated metrics. Binding preference, exposure dynamics, binding free energy, and polymer affinity remain available from the CLI and plotting pipeline, but PolyzyMD now marks them explicitly as experimental in command output, plot listings, generated text reports, figure annotations, config templates, and user-facing docs. (`src/polyzymd/core/experimental.py`, `src/polyzymd/compare/cli.py`, `src/polyzymd/compare/plotter.py`, `README.md`)
- Stable release scope for analysis demos. The presentation-ready stable comparison stack is now RMSF, contacts, distances, catalytic triad, and secondary structure, while the debated science-facing metrics remain visible but clearly labeled as experimental. (`README.md`, `docs/source/tutorials/analysis_compare_conditions.md`)
- Comparison result and plotting reliability. Fixed multiple comparison and plotting issues uncovered while building the release branch, including cached result discovery, condition-specific result paths, partition-aware BFE plots, shared-path bugs in the plot orchestrator, contacts/distances comparison edge cases, and corrupted-trajectory handling in contacts/exposure workflows. (`src/polyzymd/compare/`, `src/polyzymd/analysis/contacts/`, `src/polyzymd/analysis/distances/`)
- Analysis supports OpenMM trajectories only. The `polyzymd analyze` commands expect DCD trajectories in PolyzyMD's standard directory layout. GROMACS XTC trajectory support is planned for v1.2.1 (#47). Users running GROMACS simulations should use native GROMACS analysis tools or MDAnalysis directly until then.
- Applied `ruff format` to 5 source files that failed the CI formatting check (`cli/main.py`, `config/loader.py`, `config/schema.py`, `simulation/runner.py`, `workflow/slurm.py`).
- Pixi replaces conda/mamba for environment management. SLURM job scripts now use `pixi shell-hook` instead of `module load` + `conda activate`. Existing conda environments still work for local use, but HPC job submission requires a pixi installation. See the updated Installation Guide.
- Removed `polyzymd-submit` and `polyzymd-continue` entry points. These console-script aliases were broken and unused. Use `polyzymd submit` and `polyzymd run-segment` instead.
- Removed deprecated GROMACS exporter API. `PositionRestraintGenerator.generate()`, `generate_all_from_config()`, and the `TopologyModifier` class have been removed. These were dead code never used externally. Use `PositionRestraintGenerator.add_posres_to_itp_files()` instead.
- `polyzymd status` CLI command. Displays a compact progress overview for all replicates of a simulation with colored Unicode progress bars, completion percentages, nanosecond progress, and per-replicate status. Auto-detects replicate directories via the naming template. Read-only (uses `load_progress()` only). (`cli/main.py`, `cli/colors.py`, `config/schema.py`)
- Wall-time restart checkpoints for SLURM preemption resilience. The simulation loop now saves portable `restart_state.xml` + `restart_system.xml` at a configurable wall-time interval (`checkpoint_interval` in config, default 60s). On SLURM preemption, the loop detects SIGTERM within ~15s (via adaptive sub-chunking) and saves an interrupted state before the grace period expires. Previously, a single `simulation.step(200000)` call could block for ~2 minutes, leaving no time for graceful shutdown within a 120s grace period. (`simulation/runner.py`, `simulation/continuation.py`, `simulation/signals.py`, `config/schema.py`)
- Adaptive sub-chunk sizing. After the first checkpoint interval, the loop measures actual steps/second and adjusts the sub-chunk size to target ~15s between interrupt checks. This ensures responsive signal handling regardless of system size or hardware speed. (`simulation/runner.py`, `simulation/continuation.py`)
- Portable recovery path priority. Continuation recovery now prefers portable state XML files (`interrupted_state.xml`, `restart_state.xml`) over binary `.chk` checkpoints, which are not portable across heterogeneous GPU clusters. Binary `.chk` is only used as a last-resort fallback for legacy interrupted segments or hard-killed segments. (`simulation/continuation.py`)
- Per-module colored logging. Each module group (builders, simulation, workflow, exporters, etc.) gets a distinct near-white tinted color for INFO/DEBUG log messages, making it easy to visually distinguish which subsystem produced each log line. WARNING stays amber yellow, ERROR stays red. Colors auto-detect terminal capability (truecolor > 256-color > basic > none) and respect the `NO_COLOR` environment variable. (`cli/colors.py`, new module)
- `--no-color` CLI flag. Disables all ANSI color output for logging and `colored_echo` messages. Added to the top-level `polyzymd` command group. (`cli/main.py`)
- Hard-kill recovery. If a SLURM job is killed without a graceful signal
(e.g., node failure, `scancel`, OOM), the next job detects the incomplete segment via checkpoint + CSV analysis and resumes from the last checkpoint rather than re-running the entire segment. (`simulation/runner.py`, `simulation/progress.py`)
- Interruptible equilibration. Equilibration stages now run in chunked steps and respond to SIGUSR1/SIGTERM, allowing graceful shutdown during long equilibration phases. Previously, interruption during equilibration meant the entire stage had to be re-run. (`simulation/runner.py`)
- Checkpoint-based continuation. `run-segment` automatically determines whether to build, continue from checkpoint, or skip based on filesystem state. No manual segment tracking required.
- FAILED segment cleanup. `run-segment` detects and removes incomplete `FAILED`-state segments before retrying, preventing permanent stuck states. (`cli/main.py`)
- `CheckpointReporter` for production segments. Segment 0 now saves `system.xml` early and uses a dedicated checkpoint reporter, enabling recovery even if the simulation crashes before the first trajectory frame. (`simulation/runner.py`)
- `--pixi-env` option for `polyzymd submit` and `polyzymd recover`. Overrides the default pixi environment name in generated SLURM scripts (default is auto-selected based on the SLURM preset). (`cli/main.py`)
- `--memory` option for `polyzymd recover`. Overrides the SLURM memory allocation in recovery job scripts, matching the existing `--memory` flag on `polyzymd submit`. Useful when a job was OOM-killed and needs to be resumed with more RAM. (`cli/main.py`)
- squeue-based duplicate detection at submission time. Both `polyzymd submit` and `polyzymd recover --submit` now query `squeue` for RUNNING/PENDING jobs with the same job name before submitting. If a duplicate is found, submission is blocked with a clear error message. The check is best-effort: if `squeue` is unavailable (non-SLURM environment, CI), a warning is logged and submission proceeds normally. A new `--force` flag on both commands allows explicit override. (`workflow/daisy_chain.py`, `cli/main.py`)
- `_estimate_steps_from_csv` helper. Estimates completed steps from `state_data.csv` when `progress.json` is missing or stale, enabling accurate progress reporting after hard kills. (`simulation/progress.py`)
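
The adaptive sub-chunk sizing entry in this list can be illustrated with a short sketch. This is not PolyzyMD's implementation: `adaptive_chunk_steps` and `run_in_chunks` are hypothetical names, and only the ~15 s target between interrupt checks comes from the entry itself.

```python
from typing import Callable


def adaptive_chunk_steps(steps_done: int, elapsed_s: float,
                         target_s: float = 15.0, min_steps: int = 1) -> int:
    """Pick a sub-chunk size that should take roughly target_s seconds.

    steps_done / elapsed_s is the measured throughput (steps per second).
    All names here are illustrative assumptions.
    """
    if steps_done <= 0 or elapsed_s <= 0:
        raise ValueError("need a positive throughput measurement")
    steps_per_second = steps_done / elapsed_s
    return max(min_steps, round(steps_per_second * target_s))


def run_in_chunks(step: Callable[[int], None], total_steps: int,
                  chunk_steps: int, should_stop: Callable[[], bool]) -> int:
    """Advance in sub-chunks, checking for an interrupt between chunks.

    Returns the number of steps actually completed.
    """
    done = 0
    while done < total_steps and not should_stop():
        n = min(chunk_steps, total_steps - done)
        step(n)  # stand-in for something like simulation.step(n)
        done += n
    return done
```

With the numbers quoted above (200000 steps taking about 120 s), `adaptive_chunk_steps(200000, 120.0)` gives 25000-step sub-chunks, so an interrupt flag would be checked about every 15 s instead of once per 2 minutes.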
- All `click.echo()` calls in the CLI migrated to `colored_echo()` with phase-aware coloring (e.g., build commands use sage green, workflow commands use lavender). Success messages (`click.style(fg="green")`) and error messages (`click.style(fg="red")`) are preserved as-is.
- All `print()` calls in production code (`workflow/daisy_chain.py`, `exporters/gromacs.py`, `data/solvents/_generator.py`) migrated to `LOGGER.info()` so they flow through the colored logging formatter.
- SLURM job scripts now activate the environment via `pixi shell-hook -e <env> --manifest-path <path>` instead of `module load` + `conda activate`. The manifest path is auto-detected from the `polyzymd` binary location at submission time.
- `pixi.toml` trimmed to actual runtime dependencies with three environments: `build` (no CUDA), `cuda-12-4` (CU Boulder Blanca), `cuda-12-6` (PSC Bridges2).
- Added `openbabel` to `pixi.toml` conda dependencies. Required at import time by `polymerist.polymers.building.mbconvert`, which `polyzymd build` triggers unconditionally.
- Concurrency guard prevents duplicate segment execution. When SLURM requeues a preempted job while a recovery script also resubmits, two jobs can race to start the next segment. `run-segment` now checks for any segment with a recently-modified checkpoint file (< 600s) classified as RUNNING and exits with code 2 (`EXIT_CODE_CONCURRENT`) instead of launching a concurrent segment. The SLURM bash wrapper intercepts exit code 2 and terminates cleanly without resubmitting, breaking infinite submit-cancel-resubmit loops that occurred when a job was accidentally double-submitted. (`cli/main.py`, `simulation/signals.py`, `workflow/slurm.py`)
- Overall status now reflects the most recent segment, not `any()`. When a simulation had mixed segment statuses (e.g., segment 0 INTERRUPTED, segment 1 FAILED, as seen with the CALB replicate 2 infinite-loop bug), the `any(INTERRUPTED)` check in the status cascade fired before `any(FAILED)`, setting `progress.status` to `"interrupted"`. This misled the user into thinking auto-resume would handle recovery when the simulation actually needed manual resubmission. The status derivation logic now uses the highest-index segment's status to determine the overall state: if the latest segment is FAILED, overall = FAILED. A new `_derive_overall_status()` helper centralises this logic (previously duplicated in `scan_filesystem` and `validate_progress`). Additionally, cleanup blocks in `run-segment` (FAILED segment removal, hard-kill cleanup) now recompute `progress.status` before saving, preventing stale status values from persisting in `progress.json`. (`simulation/progress.py`, `cli/main.py`)
- Hard-killed segments now retry in-place instead of advancing. When SLURM kills a job without a grace period (SIGKILL, node failure, OOM), no `INTERRUPTED` marker file is written. Previously, the next job would classify the segment as INTERRUPTED via the stale-checkpoint heuristic and advance to a new segment index, loading from `restart_state.xml` and potentially losing all work done after that periodic checkpoint. Now, `run-segment` detects this case (highest-index segment classified INTERRUPTED but missing the `INTERRUPTED` marker file), cleans up the incomplete directory, and removes it from progress. This causes `get_next_segment_info()` to reassign the same index, retrying from the previous completed segment's state with no data loss. (`cli/main.py`)
- `_estimate_steps_from_csv` now returns per-segment step counts. Previously, the function returned the raw cumulative step number from the last CSV row. OpenMM's `StateDataReporter` writes cumulative integrator steps (from time=0 including equilibration and all prior segments), so for continuation segments this massively overcounted progress, causing `validate_progress()` to mark in-progress simulations as completed. The function now computes `last_step - first_step` for the correct per-segment delta. For single-row CSVs it returns 0 (safe undercount). (`simulation/progress.py`)
- Equilibration
`finished_at` timestamps are now populated. `EquilibrationStageRecord.finished_at` existed in the Pydantic model but was always null. `scan_equilibration_stages()` now sets it from the checkpoint file's mtime, and `_run_initial_segment()` sets it to the current time during live runs. (`simulation/progress.py`, `cli/main.py`)
- `polyzymd status` ns calculation now includes interrupted segments. Previously, the status command used `time_completed_ns()` which only counts COMPLETED segments, showing 0.000 ns even when millions of steps had been simulated across interrupted replicates. Now uses `total_steps_completed * timestep_fs / 1e6` for accurate progress display. (`cli/main.py`)
- `polyzymd status` summary no longer falsely reports "All completed". The summary line only counted `interrupted`, `failed`, `not_started`, and `not_found` statuses as needing attention, so replicates with `running` status (including stale-running jobs that were killed without graceful shutdown) fell through to the "All N replicates completed!" message. The summary now tracks `completed_count`, `running_count`, and `need_attention` separately, and only shows the green completion message when every replicate has `status == completed`. (`cli/main.py`)
- `polyzymd status` now detects stale "running" replicates. Switched from `load_progress()` (raw JSON read) to `load_or_scan_progress()` which validates progress against the filesystem. If a checkpoint file is older than 10 minutes, the segment is reclassified from `running` to `interrupted`, matching what `polyzymd recover` already sees. The corrected status is saved back to `progress.json`. (`cli/main.py`)
- Position restraints now applied to all polymer ITP files. Previously, only the first polymer ITP (`_MOL1.itp`) received `#ifdef POSRES_POLYMER` blocks. With random copolymers, OpenFF Interchange generates a separate molecule type (and ITP file) per unique polymer sequence, leaving most polymer chains unrestrained. The rewritten `PositionRestraintGenerator` discovers all polymer ITPs, parses each one's `[ atoms ]` section to identify heavy atoms by atom name (HMR-safe), and appends position restraint blocks to every polymer ITP. (`exporters/gromacs.py`)
- Residue numbering in .gro files is now globally sequential. OpenFF Interchange's GRO writer computes `(residue_index + copy_index) % 100_000`, which creates a sliding +1 offset for multi-residue molecules (polymers). A new post-processing step (`_fix_gro_residue_numbering`) assigns globally sequential residue numbers across all multi-residue molecule copies, enabling unique residue-based selection in MDAnalysis (e.g., `resid 11:15` for the third polymer chain). Single-residue molecules (water, ions) are left unchanged. (`exporters/gromacs.py`)
- Segment 0 progress loss: previously, a hard kill during segment 0 could lose all progress because no checkpoint existed. Now `system.xml` is saved at the start of production, enabling checkpoint-based recovery.
- Stale `.pyc` files from feature branches no longer cause import errors after branch switching (resolved by merging both feature branches).
- Removed `espaloma-charge` from `pixi.toml` to prevent a broken import chain. `polymerist` eagerly imports `espaloma_charge` at module level (in `_toolkits.py`), which pulls in `dgl`, which fails to load `libgraphbolt` when dgl and PyTorch versions are mismatched. Since polyzymd uses NAGL (not espaloma) for charge assignment, and NAGL >=0.2 has a pure-PyTorch fallback that works without dgl, removing `espaloma-charge` eliminates the crash with no loss of functionality.
- Fixed indentation bug in generated GROMACS run script. The post-processing section had a misindented `echo` line that would cause the script to fail under `set -e`. (`exporters/gromacs.py`)
- `recover --submit` no longer rebuilds system when equilibration is complete. When a replicate had completed equilibration but no production segments, `polyzymd recover --submit` generated a SLURM script that re-ran the full build routine. Since polymer packing and solvation are non-deterministic, this produced a different atom count, causing `loadCheckpoint` to crash with `"wrong number of particles"`. The `recover` command now detects pre-built system files (`solvated_system.pdb`, `system.xml`) and passes `--skip-build` to the generated script. Additionally, `_run_initial_segment` now skips minimization and equilibration when `--skip-build` is active and equilibration is already recorded as complete in `progress.json`, jumping directly to production segment 0. (`cli/main.py`)
- Co-solvent volume fraction validator no longer crashes with concentration-based
co-solvents. `validate_volume_fractions` called `sum()` over `volume_fraction` fields without filtering `None` values, raising `TypeError` when any co-solvent used `concentration` instead of `volume_fraction`. (`config/schema.py`)
- Equilibration stages now honour `thermostat_timescale` for integrator friction. The `thermostat_timescale` field was read from the stage config but never used; the integrator always received the default friction of 1.0/ps. Friction is now computed as `1.0 / thermostat_timescale`. (`simulation/runner.py`)
- Barostat temperature now tracks the integrator during NPT temperature ramps. When an equilibration stage used NPT ensemble with temperature ramping, the `MonteCarloBarostat` was initialized at the starting temperature but never updated as the ramp progressed. This caused the barostat to evaluate volume-move acceptance at the wrong temperature throughout the entire ramp, leading to incorrect pressure coupling. The ramp loop and final-temperature section now call `context.setParameter(MonteCarloBarostat.Temperature(), ...)` to keep the barostat in sync with the integrator. (`simulation/runner.py`)
- EQ_INTERRUPTED marker now records the correct temperature during ramps. The temperature ramp loop incremented `current_temp` before the interrupt check, so the `EQ_INTERRUPTED` marker saved a temperature one increment higher than what was actually simulated. The increment is now moved to after the interrupt check, and log messages report the correct temperature. (`simulation/runner.py`)
- Temperature ramp resume no longer double-counts fast-forwarded chunks. On resume, `current_temp` was initialized from `resume_temperature` (the value saved in the marker) and then the fast-forward skip loop also incremented `current_temp` for each skipped chunk, causing the simulation to jump ahead in temperature. The ramp loop now always starts from `stage.temperature_start` and lets the fast-forward loop reconstruct the correct temperature by skipping completed chunks. (`simulation/runner.py`)
- Unrecoverable hard-kill state (checkpoint without system.xml) now raises immediately. When a segment was hard-killed and only the periodic checkpoint existed (no `system.xml`), the continuation manager logged an error but fell through silently, returning paths to non-existent files. This caused confusing downstream `FileNotFoundError` messages. Case 5b now raises `FileNotFoundError` immediately with a clear message. (`simulation/continuation.py`)
- `check-progress` errors no longer trigger infinite SLURM resubmission. Errors in `check-progress` (config load failure, missing progress file) exited with code 1, the same code used for "work remains." The SLURM bash wrapper interpreted any non-zero exit as "resubmit," causing an infinite loop on persistent errors. Error conditions now exit with code 3 (`EXIT_CODE_CHECK_ERROR`), and the SLURM template only resubmits on exit code 1. (`cli/main.py`, `simulation/signals.py`, `workflow/slurm.py`)
- Progress file writes are now crash-safe with
`fsync` before rename. `save_progress()` already used atomic write-to-temp-then-rename, but did not call `fsync` on the temporary file before `os.replace()`. On power loss or kernel panic the rename could be durable while the file contents were not, leaving a zero-length or corrupt `progress.json`. The function now calls `f.flush()` and `os.fsync(f.fileno())` before the rename. (`simulation/progress.py`)
- `SlurmConfig.from_preset()` now raises `ValueError` for unknown preset names. Previously, an unrecognised preset name silently fell back to the `aa100` preset, masking typos in config files or CLI arguments. The error message lists all valid presets. (`workflow/slurm.py`)
- `save_config` no longer mutates the global `yaml.Dumper` representer registry. The custom multiline-string representer was registered via `yaml.add_representer()`, which permanently alters `yaml.Dumper` for the entire process. Now uses a local `Dumper` subclass so other YAML consumers are unaffected. (`config/loader.py`)
- `build --dry-run --gromacs` now shows the actual output path. The GROMACS dry-run summary printed the literal string `{projects_dir}/{replicate}/gromacs/` instead of interpolating the real directory. (`cli/main.py`)
- Reaction template paths (`initiation`, `polymerization`, `termination`) are now included in path resolution. `_expand_paths` and `_convert_paths_to_relative` only knew about `pdb_path`, `sdf_path`, `sdf_directory`, `cache_directory`, and `base_directory`. Relative `.rxn` paths in the `reactions:` config block were passed through as-is, causing `FileNotFoundError` when the config file lived in a different directory from the CWD. (`config/loader.py`)
- `to_signac_statepoint()` no longer crashes with concentration-based co-solvents. The statepoint export unconditionally accessed `cosolvent.volume_fraction`, which is `None` for concentration-based co-solvents. Now exports `_fraction` or `_molarity` depending on which is set. (`config/schema.py`)
- `load_checkpoint` now restores velocities, not just positions. `getState()` was called with `getPositions=True` only, so `_current_velocities` remained stale (or `None`) after loading a checkpoint. If equilibration stages subsequently checked `_current_velocities`, they could incorrectly re-randomize velocities instead of continuing from the checkpoint's kinetic state. (`simulation/runner.py`)
- Sorted import block in `run-segment` handler. Resolves a ruff I001 (import sort) violation in the equilibration progress save block. (`cli/main.py`)
- Signal handler no longer calls `LOGGER` (async-signal-unsafe). The `_handler()` function used `LOGGER.warning()`, which acquires Python's logging lock internally. If the signal arrives while application code already holds that lock, the handler deadlocks. Replaced with `os.write(2, ...)` which is async-signal-safe. (`simulation/signals.py`)
- Cross-check INTERRUPTED markers against CSV data to detect stale markers. If a segment was gracefully interrupted, then restarted in-place and ran much further before being hard-killed, the old INTERRUPTED marker would persist with the original (too-low) step count while the CSV reflected all the work actually done. `_scan_segment_dir` now compares the marker's `steps_completed` against the CSV delta; if the CSV shows more than 2× the marker value and exceeds 1 million steps, the stale marker is overridden with the CSV estimate and a warning is logged. This prevents undercounting completed steps, which would inflate the "remaining" calculation and cause the simulation to overshoot its target duration. (`simulation/progress.py`)
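
The per-segment step delta and the stale-marker cross-check described in this list can be sketched together. This is an illustration under stated assumptions, not the code in `simulation/progress.py`: the function names are hypothetical, the CSV is assumed to have one header row with the cumulative step count in its first column, and returning 0 for an empty CSV is an added assumption (the entry above only specifies the single-row case).

```python
import csv
import io


def steps_in_segment(csv_text: str) -> int:
    """Per-segment steps = last cumulative step - first cumulative step.

    Assumes one header row and the cumulative step count in column 0.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # drop the header row
    if len(rows) < 2:
        return 0  # single-row (or empty) CSV: safe undercount
    return int(rows[-1][0]) - int(rows[0][0])


def reconcile_steps(marker_steps: int, csv_steps: int,
                    min_csv_steps: int = 1_000_000) -> int:
    """Prefer the CSV estimate when the marker looks stale.

    The marker is treated as stale only if the CSV delta is more than
    twice the marker value AND exceeds min_csv_steps.
    """
    if csv_steps > 2 * marker_steps and csv_steps > min_csv_steps:
        return csv_steps
    return marker_steps
```

For a continuation segment whose CSV runs from step 1000000 to step 3500000, `steps_in_segment` yields 2500000 rather than the cumulative 3500000, and a marker claiming 500000 steps would be overridden by that estimate.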
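
The crash-safe progress write described in this list follows a common write-to-temp, flush, `fsync`, then rename pattern. A minimal sketch of that pattern is shown here; `atomic_write_json` is a hypothetical helper and not the actual `save_progress()` function.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so os.replace() stays
    # on one filesystem and remains atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to persist the contents
        os.replace(tmp_path, path)  # atomic rename over the destination
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Without the `fsync`, the rename can reach disk before the file contents do, which is the zero-length `progress.json` failure mode the entry describes.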
- Rewrote installation guide for pixi (replaces conda/mamba instructions).
- Removed phantom `polyzymd run` and `polyzymd continue` CLI references (these commands never existed in the code).
- Added `polyzymd run-gromacs` section to CLI reference.
- Updated HPC guide with pixi shell-hook activation examples.
- Fixed stale `polymerist-env` environment name references.
- Updated troubleshooting guide for pixi workflow.
- Initial public release on PyPI.