Releases: project-gemmi/gemmi
0.7.4
Library
- Add calculation of hydrogen bonds according to the venerable DSSP method
  (implementation of the whole DSSP method was considered, attempted and postponed)
- Add reading of MRC map in mode 12 (float16_t)
  (using the half library that's now bundled with gemmi)
- bundled third-party libs: updated fast_float
  and added https://sourceforge.net/projects/half
- Calculating anomalous maps (f") from a model
- gemmi <-> mmdb conversion: handle SEQRES
- to_mmcif: populate more items in _struct_ref_seq.
- to_pdb: preserve original letter casing of modres.mod_id
- mmcif: read _em_3d_reconstruction.resolution into Structure::resolution
- Add a function populate_structure_from_block() (#383)
- cif.hpp: keywords loop_, global_ and stop_ can be part of unquoted strings
  (stop_ is a keyword, but stop_it is not)
- Added Intensities::merged()
- Added 6 PYR residues to the internal ResInfo table
- In add_hydrogens_without_positions(): tweaked how occupancies of neighbouring
  atoms are considered
- Tweaked determine_cutoff_radius() in dencalc.hpp
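The cif.hpp item above distinguishes a reserved word from an unquoted string that merely contains one. The sketch below illustrates that rule in plain Python. It is not gemmi's parser; the function name is_reserved_word is hypothetical, and the set is limited to the three keywords named in the note.

```python
# Reserved words named in the release note; CIF keywords are case-insensitive.
RESERVED_WORDS = {"loop_", "global_", "stop_"}


def is_reserved_word(token: str) -> bool:
    """Return True only if the whole token is a reserved word.

    A token that merely starts with or contains a keyword (stop_it, loop_1)
    is treated as an ordinary unquoted string.
    """
    return token.lower() in RESERVED_WORDS


if __name__ == "__main__":
    for token in ["stop_", "STOP_", "stop_it", "loop_", "loop_1", "my_global_"]:
        kind = "keyword" if is_reserved_word(token) else "unquoted string"
        print(f"{token!r}: {kind}")
```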
Programs
- blobs: can be called with a map instead of map coefficients
- h: add option --d-fract=FRACT (#385) that adds atoms as H/D mix,
  with the given deuterium fraction.
- sf2map: add option --pow which outputs, for instance, a Patterson map
- contact: add option --asus that lists asymmetric units that are in contact with 1_555,
  not individual contacts.
- sfcalc: Add support for writing anomalous structure factors (FCanom and PHICanom columns)
  to MTZ files alongside regular structure factors.
- fix: gemmi convert --add-tls didn't work for mmCIF input
Python
- Added a small utility fetch.py to fetch files from the PDB:
  can be used as python -m gemmi.fetch 3abc
- Added read_structure_string()
- Added bindings to misc C++ functions
- Added FlatStructure that exposes coordinate data in a single table
0.7.3
Library:
- A breaking change in Op:
  Until now, parse_triplet("h,k,l") was equivalent to parse_triplet("x,y,z").
  This is not how it's handled in cctbx and in, for example, Pointless.
  In the Pointless documentation of the REINDEX keyword there is this note:
  Note that the real and reciprocal space operators correspond to mutually transposed matrices, eg "x-y,-y,-z" corresponds to "h,-h-k,-l".
  Gemmi Op was changed to store the notation kind that was used at creation,
  so now parse_triplet("h,k,l").triplet() gives "h,k,l", not "x,y,z".
  New Op methods have been added: is_hkl(), as_hkl() and as_xyz().
  Apart from this, in case of hkl-Ops parse_triplet and triplet silently transpose the rotation matrix. See #359 for details.
  This change was extensively tested with Mtz::reindex() – against Pointless.
  It could happen that it causes problems in other scenarios.
- DDL2: added alternate (deposition) checks
- Mtz: added C++ Mtz::write_to_buffer() and Python Mtz.write_to_bytes() (used for streaming MTZ files from web servers)
  and C++ size_to_write()
  These new C++ functions are currently undocumented, not sure if they are useful.
  Changed Mtz::write_to_string() to clear the string instead of appending to it. Probably nobody expected the latter.
  Reorganized MTZ reading to avoid fseek() – so we can read a stream or gzipped file, without storing/unpacking it into memory buffer as we did before (i.e. reading huge gzipped MTZ files uses less memory)
- added UnitCell::find_nearest_pbc_images()
- CIF reading: added optional argument int check_level=1 (see docs for details)
- Form factors: support for custom form factors (in the form of sums of 5 Gaussians) – for a Cambridge project on chemistry-dependent form factors.
- to_mmcif: write _atom_site_anisotrop.pdbx_PDB_model_num if there are 2+ models
- to_pdb: split option use_linkr into use_linkr and use_link_id
- the built-in residue list is now partly editable (example in docs)
- mmcif: workaround for a problem with reading 7pvv.cif (#369)
  occupancy and B_iso_or_equiv are now optional (because of #375)
- pdb: added option ignore_ter to PdbReadOptions
- started working on secondary structure determination – that's unfinished and not yet usable
- renamed interpolate_grid_of_aligned_model2() to interpolate_grid_around_model()
- added functions interpolate_points() and interpolate_grid_flexible() (#363), needed in Pandda2
Python:
- added python bindings to Ccp4Base::ccp4_header and to some functions related to gemmi::ChemComp
Program
- gemmi-validate: new check for monomer files, and new option --depo
  to check against PDB's deposition criteria (from DDL2)
0.7.1
Library
- reading mmcif: added reading of TLS information
- writing mmcif: added a few new items
- using Logger also in classes Ddl and CifToMtz (Logger is now used in all library functions that output warnings or messages)
- improved and documented mmCIF validation (with DDL2)
- added read_ccp4_header() for reading only map header when a map is a huge file
- added a few functions related to TLS (will be documented later)
- documented: working with XDS files, normalization of amplitudes (F->E)
- calculating merging statistics (R-merge, R-meas, R-pim, CC1/2) in various ways: Gemmi can calculate R-merge (and other R-*) in 3 different ways that are present in the literature and other programs; for CC1/2, the sigma-tau method is used
- internal refactoring of file reading
- misc bug fixes
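As a reminder of what the merging statistics measure, here is a small sketch of one common textbook formulation of R-merge, R-meas and R-pim, using an unweighted mean per unique reflection. The release note says gemmi supports three variants; this sketch does not claim to reproduce any particular one of them, and the function name merging_r_factors is hypothetical.

```python
from math import sqrt


def merging_r_factors(groups):
    """Return (R-merge, R-meas, R-pim) for groups of equivalent observations.

    groups: list of lists; each inner list holds the intensities of
    symmetry-equivalent observations of one unique reflection.
    """
    num_merge = num_meas = num_pim = denominator = 0.0
    for observations in groups:
        n = len(observations)
        if n < 2:
            continue  # a single observation says nothing about agreement
        mean = sum(observations) / n
        deviation = sum(abs(i - mean) for i in observations)
        num_merge += deviation
        num_meas += sqrt(n / (n - 1)) * deviation  # multiplicity-corrected
        num_pim += sqrt(1 / (n - 1)) * deviation   # precision of the mean
        denominator += sum(observations)
    return (num_merge / denominator,
            num_meas / denominator,
            num_pim / denominator)


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    example = [[100.0, 110.0], [50.0, 40.0, 45.0], [7.0]]
    r_merge, r_meas, r_pim = merging_r_factors(example)
    print(f"R-merge={r_merge:.4f} R-meas={r_meas:.4f} R-pim={r_pim:.4f}")
```

With this formulation R-pim is never larger than R-merge, and R-merge is never larger than R-meas, because the per-reflection factors are sqrt(1/(n-1)), 1 and sqrt(n/(n-1)).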
Python
- functions Mtz.filtered() and XdsAscii.filtered()
- a number of other additions and a few small changes/fixes
- all cif-reading functions consistently read gzipped files (previously,
  cif.read_file() and gemmi.read_small_structure() didn't)
Program
- gemmi fprime supports ranges of energies
- gemmi merge – added new options, most importantly --stats to print quality metrics
- gemmi convert – option --add-tls to convert "residual" B-factors (from Refmac) to full B-factors
- gemmi mask – solvent masking that takes into account alternative conformers and atom occupancy (experimental)
0.7.0
C++14 (or later) is required to build the library, C++17 (or later) to build Python bindings.
Expect breaking changes, especially in Python bindings.
The lists below are not complete, but should cover most of the changes.
Library
- Added unified logging of warnings/errors from various gemmi functions (class Logger)
- replaced string Model::name with int Model::num
- mmcif: better handling of null auth_comp_id
- fixes for mmJSON
- Removed deprecated functions:
  - UnitCell.fractionalization_matrix and orthogonalization_matrix – use frac.mat and orth.mat
  - count_hydrogen_sites() – use has_hydrogen() or count_atom_sites(gemmi.Selection('[H,D]'))
  - Grid::resample_to() – use interpolate_grid()
- unified API of Grid interpolation functions. They now have parameter order
  that can be 0 (nearest value), 1 (linear interpolation), or 3 (cubic). In C++ there are also functions such as trilinear_interpolation() to ensure no overhead.
- to_pdb: write HET records
- Extended selection syntax with: [metals] and [nonmetals].
- Added function set_is_metal() intended for debatable metalloids
- improved interoperability with MMDB (a CCP4 library)
- MonLib: removed read_cif args
- mtz: fixed writing BATCH records
- hydrogen placement: fixes needed for new files with metals in CCP4 Monomer Library
- pdb: fixed reading TLS S tensor
- Structure metadata: expanded RefinementInfo
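To illustrate what the order parameter of the interpolation functions selects, here is a one-dimensional sketch on a periodic grid covering order 0 (nearest value) and order 1 (linear). It is an illustration only: gemmi works on 3D grids, the cubic case (order 3) is left out, and the function name interpolate_periodic is hypothetical.

```python
import math


def interpolate_periodic(values, x, order):
    """Interpolate a periodic 1D grid at fractional index x.

    order=0 returns the nearest grid value, order=1 interpolates linearly
    between the two neighbouring grid points. Indices wrap around.
    """
    n = len(values)
    if order == 0:
        return values[math.floor(x + 0.5) % n]
    if order == 1:
        i = math.floor(x)
        t = x - i  # fractional distance from grid point i
        return (1 - t) * values[i % n] + t * values[(i + 1) % n]
    raise ValueError("this sketch supports only order 0 and 1")


if __name__ == "__main__":
    grid = [0.0, 10.0, 20.0, 30.0]
    for position in (1.25, 3.5, -0.5):
        nearest = interpolate_periodic(grid, position, order=0)
        linear = interpolate_periodic(grid, position, order=1)
        print(f"x={position}: nearest={nearest}, linear={linear}")
```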
Python
- Python bindings migrated from pybind11 to nanobind.
- Much lower runtime overhead, faster build times, better error diagnostics.
- Built-in typing stubs.
- Only Python 3.8+.
- Sadly, no support for Buffer Protocol. It was replaced with NumPy __array__ methods.
  For NumPy, you can also use .array properties that were available also in the previous releases.
- No implicit conversions from list to ndarray, and from bytes to string (let me know where it causes problems)
- gemmi.ValueSigmaAsuData.value_array has now shape (N,2)
- Added pickling support for Structure, Model, Chain, Residue, Atom, cif.Document, cif.Block.
- Added function interpolate_position_array (#323).
- Python extension module is now installed into site-packages/gemmi/
  (this change should be invisible to the user)
Program
- gemmi convert --sifts-num is now more customizable
- gemmi sf2map: added option --check (see docs)
- gemmi cif2mtz: add a rule to spec to convert pdbx_F_calc_with_solvent to F-model (+phase)
- gemmi xds2mtz: handles merged files from XSCALE
- gemmi mtz2cif and merge: recognize extension .ahkl as XDS file
0.6.7
This is primarily a bug-fix release. New Python bindings are not included yet.
Enhancements:
- New subcommand gemmi set for changing coordinates, B-factors and occupancies in coordinate files (mmCIF and PDB). Unlike other tools, it replaces numbers while leaving the rest of the file intact. An alternative to CCP4 PDBSET keywords: BFACTOR, OCCUPANCY, SHIFT, NOISE. Note that gemmi convert offers overlapping capabilities. For instance, gemmi convert --apply-symop=x+0.123,y,z shifts the coordinates similarly to gemmi set --shift='9.3 0 0' (the latter takes the shift in Angstroms).
- Improved anisotropic scaling of structure factors. More work is planned in this area.
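The two commands compared above use different units: the symop shift is fractional, the gemmi set shift is in Angstroms. A tiny sketch of the conversion follows, assuming the usual PDB convention in which the a axis lies along Cartesian x, so a fractional shift along x translates to frac * a. The cell length 75.6 Å is a made-up value chosen so that the numbers match the example; the note does not state the cell.

```python
def fractional_x_shift_to_angstroms(frac, a):
    """Convert a fractional shift along x to Angstroms.

    Assumes the a axis is aligned with the Cartesian x axis, so the
    Cartesian shift is (frac * a, 0, 0) regardless of the other cell
    parameters.
    """
    return frac * a


if __name__ == "__main__":
    a = 75.6  # hypothetical cell length in Angstroms
    shift = fractional_x_shift_to_angstroms(0.123, a)
    print(f"x+0.123 in a cell with a={a} A is a shift of {shift:.2f} A")
```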
Fixes:
- fixed reading of mmCIF files without _atom_site.auth_seq_id
- in Topology preparation: fixed a couple of bugs, peptide links are now assumed to be CIS for ω=0±60° (previously, ω=0±30°)
- fixed re-assignment of ATOM/HETATM record types (gemmi convert --assign-records)
- fixed gemmi convert --sifts-num for UniProt sequence numbers >5000
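The change of the CIS criterion from ω=0±30° to ω=0±60° can be pictured with a short sketch. This is only an illustration of the stated ranges, not gemmi's topology code; the function name is_cis is hypothetical and treating the boundary as inclusive is an assumption.

```python
def is_cis(omega_deg, tolerance=60.0):
    """Return True if the peptide torsion angle omega is within 0 +/- tolerance.

    The angle is first wrapped into the range [-180, 180).
    """
    wrapped = (omega_deg + 180.0) % 360.0 - 180.0
    return abs(wrapped) <= tolerance


if __name__ == "__main__":
    for omega in (5.0, 45.0, -50.0, 350.0, 180.0):
        new_rule = is_cis(omega)                  # 0 +/- 60 degrees
        old_rule = is_cis(omega, tolerance=30.0)  # 0 +/- 30 degrees
        print(f"omega={omega}: cis now={new_rule}, cis before={old_rule}")
```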
And various minor changes that are hard to describe concisely.
0.6.6
Library:
- SmallStructure: changed how the space group is read and accessed.
  Relying on H-M space group names alone was not always sufficient. The new mechanism uses the list of operations and Hall symbol in preference to the H-M symbol – the order is configurable.
- symmetry triplets: parse decimal fractions (small molecule files may use notation such as x+0.25 instead of x+1/4)
- tabulated space groups: a few more settings: B 1 2 1, B 1 21 1, F 1 m 1, F 1 d 1, F 1 2 1
- X-ray scattering coefficients: changed the default value of IT92::ignore_charge to true (i.e. charges are now ignored by default; before version 0.6.3 they were always ignored)
- cif::Table: added method ensure_loop() that converts tag-value pairs into a loop; might be needed before calling append_row()
- place_hydrogens(): fix for NH3-like configurations
- improved gemmi->mmdb conversion
- Grid: tweaked good_grid_size() to ensure that when creating a grid up to a certain d_min, all reflections up to d_min are in the grid (it matters when no oversampling is applied)
- DensityCalculator: deprecated function set_grid_cell_and_spacegroup(), use grid.setup_from()
- fixed TNT-compatible reciprocal space ASU calculation for non-standard settings
- infer_polymer_end(): complicate the heuristic even more, to detect files that have HETATM incorrectly used for standard residues in a polymer (such files were reported, they are either a result of mutating from non-standard residues, or a buggy program)
- added function assign_het_flags() to re-set ATOM/HETATM flags
- Model: added functions calculate_b_iso_range() and calculate_b_aniso_range(); the first one can be used to detect if pLDDT is in the range 0-100 (like from AlphaFold) or 0-1 (like from ESMFold)
- writing mmCIF: write _entity_poly_seq.hetero
- added flag Entity::reflects_microhetero that shows if sequences were read from SEQRES (and don't account for point mutations) or from _entity_poly_seq; new function add_microhetero_to_sequences() changes the former to the latter
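The pLDDT use case mentioned for calculate_b_iso_range() boils down to looking at the minimum and maximum of the B-factor column. The sketch below shows that idea on a plain list of numbers, without calling gemmi. The function name guess_plddt_scale and the exact thresholds are assumptions made for illustration.

```python
def guess_plddt_scale(b_values):
    """Guess whether B-factor values hold pLDDT on a 0-1 or 0-100 scale.

    Returns "0-1", "0-100", or None if the values cannot be pLDDT
    (negative or above 100).
    """
    lowest, highest = min(b_values), max(b_values)
    if lowest < 0.0 or highest > 100.0:
        return None
    return "0-1" if highest <= 1.0 else "0-100"


if __name__ == "__main__":
    print(guess_plddt_scale([0.52, 0.91, 0.87]))  # values typical of a 0-1 scale
    print(guess_plddt_scale([45.2, 91.0, 87.3]))  # values typical of a 0-100 scale
    print(guess_plddt_scale([12.0, 130.5]))       # ordinary B-factors, not pLDDT
```

A model in which every residue has pLDDT below 1 on the 0-100 scale would be misclassified by this heuristic; in practice that case is unlikely.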
Program:
- gemmi sfcalc: added a few more options
- gemmi convert: added options --assign-records[=A|H], improved --sifts-num, adding microheterogeneities to _entity_poly_seq when converting from PDB
- gemmi cifdiff: added option -t for basic comparison of values for a single tag
Other:
- minimal WebAssembly port (C++ code compiled with emscripten) of Structure,
  as a proof-of-concept and for reading mmCIF files in UglyMol
- examples/to_rdkit.py: example of conversion of gemmi ChemComp to RDKit Mol
and a number of less important changes
0.6.5
Library:
- gemmi can now be built with zlib-ng, a faster fork of zlib (good for working with large, compressed files)
- experimental: binary serialization of Structure (contained objects, such as Model, Chain or UnitCell, can also be serialized separately)
- finalized handling of 5-character monomer names; uses the tilde-hetnam extension (ABCDE ↔ ~DE) for PDB files
- when atom names in the coordinate file match previous names (_chem_comp_atom.alt_atom_id) from the monomer library (the names in the CCD and therefore also in the ML change occasionally), print better diagnostic; added function MonLib::update_old_atom_names() to update the names in a Structure
- topology: fixed handling of two bonds between the same two residues
- options for handling mmCIF files with incorrect entities (modified add_entity_ids() when called with overwrite=true)
- added function Intensities::prepare_merged_mtz()
- a few bug fixes (for instance, in handling of negative residue numbers in the selection syntax)
Python bindings:
- generating type stubs - see #293
- python: cif.Loop.val() has been replaced with __getitem__/__setitem__
- fixed Mtz.Batch.ints and Mtz.Batch.floats
Program
- subcommand diff has been renamed to cifdiff
- subcommand prep has been renamed to crd
- validate: more options for checking monomer files
- gemmi-grep: added option --extended-regexp
- mtz2cif: added column names Iplus/Iminus (used by ccp4i2) to the default conversion spec
Note: this list is meant to show important changes only.
0.6.4
Library
- completely changed build system for Python module, from setuptools to scikit-build-core
- optimized electron density calculation: single-precision version is now about 2x faster and slightly less exact; some other grid-based calculations also got optimized in the process
- as part of the above optimizations, some of the grid computations require that the model is in the standard orientation (conventional axis directions); in other cases (which are very rare after the remediation of non-standard coordinate frames in the PDB) call standardize_crystal_frame()
- CIF output: more flexible formatting
- mmCIF writing: category _entity_poly is included by default, with pdbx_strand_id and pdbx_seq_one_letter_code
- minor changes in reading mmCIF coordinate files
- cif: added functions Loop::add_columns(), Loop::remove_column(), Column::erase()
- MRC map format: ORIGIN record is ignored (previously, if ORIGIN was non-zero, Ccp4::full_cell() returned false and some map properties were not set)
- new function Grid::symmetrize_avg()
- fixed bug in ReciprocalGrid::prepare_asu_data()
- added function read_pir_or_fasta() for reading sequences (previously it was undocumented and more limited)
- added function pdbx_one_letter_code() which returns a string like AA(MSE)H…, for _entity_poly.pdbx_seq_one_letter_code
- new functions expand_one_letter() and expand_one_letter_sequence() that take ResidueKind.AA/RNA/DNA as argument replaced expand_protein_one_letter*()
- adjusted weights in align_sequence_to_polymer()
- added function assign_best_sequences()
- PDB reading: added Structure::ter_status flag to indicate if TER records were: absent, present, clearly in wrong places
- experimental (not documented yet) new functions: Model::get_cra(), Model::get_parent_of()
- Topo::Bond stores a flag for bonds between different symmetry images
- ChemComp::Atom: store _chem_comp_atom.alt_atom_id as old_id, use it in new function update_old_atom_names()
- riding hydrogens: added H had wrong occupancy in special, rare cases
- added Vec3f – Vec3 with single-precision numbers
- minor API changes: Binner::setup() doesn't return anything, changed argument types of Scaling::scale_data(), align_sequences()
Program
- new tool gemmi-diff that compares categories and tags in two (mm)CIF files
- gemmi-align prints vertical list with option --verbose
- gemmi-residues has new options: -e, -sss, --chains
- gemmi-rmsz: added option --missing to print missing atoms
- gemmi-validate: more options for validating monomer files
- gemmi-h: more options
- gemmi-mtz: prints info about SYMM records
0.6.3
- new: normalization of amplitudes using so-called "Karle" approach, similar as in the CCP4 program ECALC
- added X-ray scattering coefficients for ions (previously, the charge of atom was ignored)
- pdb: reading CONECT records, and an option to also write them
- when reading pdb, if any chain has 2+ TER records, all TER records are ignored
- more configuration options for writing pdb files
- added functions Mtz::expand_to_p1() and Mtz::read_file_gz()
- cif::Block::find_value(tag) now returns also value from the corresponding loop if that loop has only one row
- changes in gemmi-validate related to validation with DDL2
- gemmi-sfcalc: added option --sigma-cutoff
- gemmi sf2map --mapmask: if the unit cells in coordinate file is different than in SF file, use only the latter
- improved transform_to_assembly(), expand_ncs() and rename_chain()
- cif2mtz: Mtz column for pdbx_DELPHWT has now label PHDELWT (#272)
- fixed ensure_asu(): phase-shift (for phases and H-L coefficients) was wrong
- fixed UnitCell::find_nearest_image() for non-crystals with NCS
- fixed DensityCalculator::requested_grid_spacing()
- changes and enhancements in add_chemcomp_to_block(), in solvent masking, in mtz2cif,
  and in several other places
- added python bindings to MtzToCif, cif::Ddl, PdbWriteOptions, changed how options for PDB writing are passed, more bindings for Mtz::Batch
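For readers unfamiliar with tabulated X-ray scattering coefficients: they parametrize the atomic form factor as a sum of Gaussians in sin(θ)/λ, in the IT92 tables four Gaussians plus a constant. At zero scattering angle the sum equals the number of electrons, which is why an ion and a neutral atom need different coefficients. The sketch below evaluates such a sum; the coefficients are invented for illustration and are not taken from any table, and the function name form_factor is hypothetical.

```python
from math import exp


def form_factor(stol2, a, b, c=0.0):
    """Evaluate f = c + sum_i a_i * exp(-b_i * stol2).

    stol2 is (sin(theta)/lambda)^2 in 1/Angstrom^2; a, b and c are the
    Gaussian coefficients of the parametrization.
    """
    return c + sum(a_i * exp(-b_i * stol2) for a_i, b_i in zip(a, b))


if __name__ == "__main__":
    # Made-up coefficients for a hypothetical 6-electron atom.
    a = [2.0, 1.5, 1.5, 0.8]
    b = [20.0, 10.0, 0.6, 50.0]
    c = 0.2
    for stol2 in (0.0, 0.05, 0.25):
        print(f"stol2={stol2}: f={form_factor(stol2, a, b, c):.3f}")
```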
0.6.2
- a number of fixes, mostly in topology preparation
- support for extended (longer) CCD and PDB codes that are about to be introduced by the PDB
- gemmi-convert: added option to rename a monomer
- a few changes and additions in cif2mtz, including:
- anomalous data written as separate rows for F+ and F- is now converted as expected
- _refln.F_squared_meas is now a synonym for F_squared_meas
- gemmi-grep: new option --only-tags
- gemmi-validate: a couple of new checks and options
- pdb and mmCIF: convert MODRES <-> _pdbx_struct_mod_residue
- cif.Block: blocks with no name (just data_) used to have the name set to "#", now it's " "