Commit d33ed8b

Croydon-Brixton, kierandidi, nscorley, Nathaniel Corley, and Buddha7771 authored
docs and chore: add examples and consolidate constants (#5)
* clean: remove legacy dependencies, relax various dependency versions, update CI (#4)
* clean: remove `fire` and `fastparquet` legacy dependencies, relax `biotite`, `hydrid`, `torch` and `einops` versions
* ci: update github actions trigger `digs` workflow manually only.
* clean: clean-up CI (#6)
* ci: restrict workflow permissions (#8)
* clean: clean-up CI
* ci: restrict workflow permissions
* refactor: disentangle CLI form package code and document PDB training example
* refactor: disentangle CLI from package code
* docs: update readme with PDB training example
* docs: further documentation improvements
* Build/numpy version (#10)
* refactor: disentangle CLI from package code
* docs: update readme with PDB training example
* docs: further documentation improvements
* build: enable numpy 2.x, pandas 2.3
* chore: ruff
* docs: fix readme (#11)
* refactor: consolidate `constants` and `common` (#7)
  Co-authored-by: Kieran Didi <58345129+kierandidi@users.noreply.github.com>
* docs: initial examples (#12)
* docs: load and visualize structures
* chore: format
* docs: update README
* docs(readme): improve code block formatting
* docs(readme): fix typos
* Change TOC depth
  Just changed the TOC depth from 1 to 2 in the conf.py file to test the GitHub actions workflow.
* Update README.md
  Updated the link to the external docs to remove 404 error.
* Update README.md
  Changed the documentation link in the contribution section.
* docs(readme): add notice
* clean: remove legacy dependencies, relax various dependency versions, update CI (#4)
* clean: remove `fire` and `fastparquet` legacy dependencies, relax `biotite`, `hydrid`, `torch` and `einops` versions
* ci: update github actions trigger `digs` workflow manually only.
* clean: clean-up CI (#6)
* ci: restrict workflow permissions (#8)
* clean: clean-up CI
* ci: restrict workflow permissions
* refactor: disentangle CLI form package code and document PDB training example
* refactor: disentangle CLI from package code
* docs: update readme with PDB training example
* docs: further documentation improvements
* Build/numpy version (#10)
* refactor: disentangle CLI from package code
* docs: update readme with PDB training example
* docs: further documentation improvements
* build: enable numpy 2.x, pandas 2.3
* chore: ruff
* docs: fix readme (#11)
* docs: two additional examples for the gallery

---------

Co-authored-by: Nathaniel Corley <nscorley@Nathaniels-MacBook-Pro.local>
Co-authored-by: Buddha7771 <hwlee7771@gmail.com>
Co-authored-by: Rachel Clune <rclune4b@gmail.com>
Co-authored-by: Rachel Clune <rachel.clune@omsf.io>
Co-authored-by: Simon Mathis <simon.mathis@gmail.com>
Co-authored-by: Kieran Didi <58345129+kierandidi@users.noreply.github.com>

---------

Co-authored-by: Kieran Didi <58345129+kierandidi@users.noreply.github.com>
Co-authored-by: Nathaniel Corley <nscorley@gmail.com>
Co-authored-by: Nathaniel Corley <nscorley@Nathaniels-MacBook-Pro.local>
Co-authored-by: Buddha7771 <hwlee7771@gmail.com>
Co-authored-by: Rachel Clune <rclune4b@gmail.com>
Co-authored-by: Rachel Clune <rachel.clune@omsf.io>
1 parent dc7c3d7 commit d33ed8b

88 files changed, +1291 -376 lines changed
.github/workflows/lint_and_test.yaml

Lines changed: 10 additions & 14 deletions
@@ -4,7 +4,6 @@ on:
   push:
     branches:
       - main
-      - dev
       - production
     paths:
       - 'src/**'
@@ -14,6 +13,7 @@ on:
     branches:
       - main
       - dev
+      - staging
       - production
     paths:
       - 'src/**'
@@ -29,12 +29,13 @@ on:
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true
+
+permissions:
+  contents: read
 
 jobs:
   lint:
-    name: ruff (check code style)
-    # NOTE: We use an ubuntu runner to not be dependent on possibly limited digs infra
-    # for this tiny linting job. This means we can have many lint jobs accross repos in parallel.
+    name: ruff
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
@@ -49,23 +50,18 @@ jobs:
         run: pip install ruff==${{ env.RUFF_VERSION }}
       - name: ruff format (check code formatting)
         run: ruff format --diff
-      # - name: ruff check (lint code base)
-      # run: ruff check
+      - name: ruff check (lint code base)
+        run: ruff check src tests
 
   test_digs:
-    name: pytest (run tests)
+    name: pytest (jojo)
     runs-on: [jojo]
     timeout-minutes: 30
     needs: lint
-    # ... only run on non-draft PRs to `main` to avoid unnecessary CI runs
-    # ... and only run on changed files in the `atomworks`, `tests`, or `scripts` directories
-    if: |
-      (github.event_name == 'pull_request' && !github.event.pull_request.draft) ||
-      (github.event_name == 'pull_request_target' && github.event.action == 'ready_for_review')
+    if: github.event_name == 'workflow_dispatch'
     steps:
       - uses: actions/checkout@v4
       - name: Run tests
-        timeout-minutes: 30
         run: |
           export N_CPU=8
           srun --chdir=$PWD -p cpu -c $N_CPU -t 00:30:00 --mem=32G bash ./.github/ci/run_tests.sh
@@ -110,7 +106,7 @@ jobs:
         run: |
           atomworks setup tests
 
-      - name: Run pytest with multiple cores
+      - name: Run pytest
         run: |
           export OPENBLAS_NUM_THREADS=1
           export OMP_NUM_THREADS=1

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -161,4 +161,5 @@ tests/test_outputs
 dev.py
 dev.ipynb
 _version.py
-tinker/
+tinker/
+.DS_Store

README.md

Lines changed: 215 additions & 18 deletions
Large diffs are not rendered by default.
[Five binary files (479 KB, 319 KB, 204 KB, 356 KB, 201 KB); previews not rendered]
Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
+"""
+Annotating and Saving Protein Structures
+=========================================
+
+This example walks through how to add custom annotations to AtomArrays, visualize them, and save them for later use.
+
+**Prerequisites**: Familiarity with :doc:`load_and_visualize_structures` for basic structure loading and exploration.
+
+.. figure:: /_static/examples/annotate_and_save_structures_01.png
+   :alt: Heme pocket visualization
+   :width: 400px
+
+   Visualization of heme-binding pocket atoms (within 6Å of heme ligand) in myoglobin.
+"""
+
+########################################################################
+# Setup and Structure Loading
+# ----------------------------
+#
+# Let's start by loading a protein structure that we'll annotate. We'll use the same myoglobin structure from the loading example:
+
+import os
+import tempfile
+
+import biotite.structure as struc
+import numpy as np
+
+from atomworks.io import parse
+from atomworks.io.utils.io_utils import to_cif_file
+from atomworks.io.utils.testing import get_pdb_path_or_buffer
+from atomworks.io.utils.visualize import view
+
+# sphinx_gallery_thumbnail_path = '_static/examples/annotate_and_save_structures_01.png'
+
+# Load myoglobin structure with heme
+example_pdb_id = "101m"  # Myoglobin with heme
+pdb_path = get_pdb_path_or_buffer(example_pdb_id)
+
+# Parse the structure (no need to add missing atoms, since we would just remove them in the following step)
+atom_array = parse(pdb_path, add_missing_atoms=False, fix_formal_charges=False)["assemblies"]["1"][0]
+
+print(f"Loaded structure with {len(atom_array)} atoms")
+print(f"Chains: {np.unique(atom_array.chain_id)}")
+
+# Clean up coordinates (remove any NaN values, if present)
+# (NaN coordinates will break our later step when we create a CellList with Biotite)
+valid_coords_mask = ~np.isnan(atom_array.coord).any(axis=1)
+atom_array = atom_array[valid_coords_mask]
+print(f"After removing NaN coordinates: {len(atom_array)} atoms")
+
+########################################################################
+# Adding Custom Annotations
+# --------------------------
+#
+# Now let's add custom annotations to mark different types of atoms. We'll use pocket identification as an example to demonstrate how to create meaningful structural annotations for many ML and general bioinformatics applications.
+#
+# Step 1: Identify Structural Features (Pocket Identification)
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# Let's efficiently identify the heme-binding pocket using spatial distance cutoffs with Biotite's ``CellList`` class:
+
+# Find atoms within 6 Angstroms of the heme using a spatial cell list
+cell_list = struc.CellList(atom_array.coord, cell_size=6.0)
+heme_coords = atom_array.coord[atom_array.res_name == "HEM"]
+
+print(f"Found {len(heme_coords)} heme atoms")
+
+# Get all atoms within 6Å of any heme atom
+pocket_mask = cell_list.get_atoms(heme_coords, 6.0, as_mask=True)
+pocket_mask = np.any(pocket_mask, axis=0)  # Combine results for all heme atoms
+
+print(f"Found {np.sum(pocket_mask)} atoms within 6Å of heme")
+
+# %%
+
+# Visualize the pocket region (always a helpful sanity-check, and trivial with AtomWorks)
+print("\nVisualizing pocket region (all atoms within 6Å of heme):")
+view(atom_array[pocket_mask])
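The cell-list query above can be cross-checked on small inputs against a brute-force distance computation. The sketch below uses only NumPy; `within_cutoff_bruteforce` and the toy coordinates are illustrative names and values, not part of AtomWorks or Biotite, and the inclusive boundary (`<=`) is an assumption that may differ from `CellList.get_atoms` for atoms lying exactly at the cutoff.

```python
import numpy as np


def within_cutoff_bruteforce(coords, query_coords, cutoff):
    """Return a boolean mask over `coords`: True if an atom lies within
    `cutoff` of at least one query point (O(n_query * n_atoms) memory)."""
    # Pairwise difference vectors, shape (n_query, n_atoms, 3)
    diff = coords[None, :, :] - query_coords[:, None, :]
    # Pairwise distances, shape (n_query, n_atoms)
    dist = np.linalg.norm(diff, axis=-1)
    # Combine over query points, as np.any(..., axis=0) does for the cell-list mask
    return (dist <= cutoff).any(axis=0)


# Toy data: five "atoms" and two "ligand" query points
coords = np.array(
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 5.9, 0.0], [30.0, 0.0, 0.0]]
)
query_coords = np.array([[0.0, 0.0, 0.0], [12.0, 0.0, 0.0]])

toy_mask = within_cutoff_bruteforce(coords, query_coords, cutoff=6.0)
print(toy_mask)
```

For whole structures the cell list remains the better choice, since the brute-force version materializes every query-atom pair.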
+
+########################################################################
+# .. figure:: /_static/examples/annotate_and_save_structures_01.png
+#    :alt: Heme pocket visualization
+
+########################################################################
+# Step 2: Create Annotations from Identified Features
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# Now we'll convert our pocket identification into an explicit ``AtomArray`` annotation and visualize it:
+
+# Boolean annotation for pocket atoms (excluding heme itself)
+is_pocket = pocket_mask & (atom_array.res_name != "HEM")
+atom_array.set_annotation("is_hem_pocket", is_pocket.astype(bool))
+
+# Boolean annotation for heme atoms
+is_heme = atom_array.res_name == "HEM"
+atom_array.set_annotation("is_heme", is_heme.astype(bool))
+
+print(f" - Pocket atoms: {np.sum(atom_array.is_hem_pocket)}")
+print(f" - Heme atoms: {np.sum(atom_array.is_heme)}")
+
+# %%
+
+# Visualize just the pocket atoms
+print("\nVisualizing annotated pocket residues:")
+view(atom_array[atom_array.is_hem_pocket])
+
+########################################################################
+# .. figure:: /_static/examples/annotate_and_save_structures_02.png
+#    :alt: Annotated pocket residues visualization
+
+########################################################################
+# Saving Annotated Structures
+# ----------------------------
+#
+# Now let's save our annotated structure. In many use cases we may want to save our modified ``AtomArray`` to disk and load it again later with our annotations preserved.
+#
+# AtomWorks provides two methods to do so:
+#
+# .. list-table::
+#    :header-rows: 0
+#
+#    * - Saving to CIF, adding extra annotations directly into the file
+#    * - Standard Python object pickling (which may be sensitive to versions, libraries, etc.)
+#
+# Saving to CIF Files
+# ~~~~~~~~~~~~~~~~~~~
+#
+# CIF files are the standard for structural data and allow us to store arbitrary annotations and categories.
+
+# Create temporary directory for our files
+temp_dir = tempfile.mkdtemp()
+print(f"Working in temporary directory: {temp_dir}")
+
+# Save to CIF file with custom annotations specified
+cif_path = os.path.join(temp_dir, "annotated_structure.cif")
+custom_fields = ["is_hem_pocket", "is_heme"]
+
+saved_cif_path = to_cif_file(
+    atom_array,
+    cif_path,
+    extra_fields=custom_fields,
+)
+
+print(f"Saved CIF file to: {saved_cif_path}")
+print(f"File size: {os.path.getsize(saved_cif_path) / 1024:.1f} KB")
+
+########################################################################
+# Note on Biological Assemblies and CIF Saving
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# In some cases, you may find that ``to_cif_file`` reports an error when the structure represents a biological assembly containing multiple copies of the asymmetric unit. This happens because ``AtomWorks`` builds the biological assembly and explicitly represents every atom; that process cannot then be reversed, since we may be left with ambiguous bond annotations (e.g., no way to distinguish between multiple copies of "Chain A"). The best solution is to either (a) set the ``chain_id`` to the ``chain_iid`` (which resolves the ambiguity) or (b) simply save the object using a pickle.
+#
+# More rigorous solutions exist; this is a helpful place for contributions!
+#
+# Alternative Storage Options
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# For Python-specific workflows, you can also save structures as pickle files to preserve exact data types, though CIF files are recommended for interoperability and long-term storage.
+
+########################################################################
+# Loading Annotated Structures
+# -----------------------------
+#
+# When we load pickled ``AtomArray`` objects, we should restore our original object out-of-the-box with all annotations preserved.
+#
+# When loading from CIF, however, we may need to grapple with data type issues, since within CIF files all fields are considered strings.
+#
+# In the future, we would like to automatically detect annotation data types during loading (and/or allow specification of data types); contributions and PRs are welcome!
+#
+# Loading from CIF Files
+# ~~~~~~~~~~~~~~~~~~~~~~
+
+from atomworks.io.utils.io_utils import load_any
+
+# Load from CIF file
+loaded_from_cif = load_any(saved_cif_path, extra_fields="all")[0]
+
+print("Loaded from CIF file:")
+print(f" Atoms: {len(loaded_from_cif)}")
+print(" Custom annotations:")
+for annotation in loaded_from_cif.get_annotation_categories():
+    if annotation in custom_fields:
+        dtype = getattr(loaded_from_cif, annotation).dtype
+        print(f" ✓ {annotation} ({dtype})")
+
+########################################################################
+# Handling Data Type Conversions
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# As we can see above, when boolean annotations are saved to CIF files, they become string representations ("True"/"False"). Here's how to convert them back (we welcome contributions to automate this process and/or allow explicit specification):
+
+
+# Convert string booleans back to actual boolean type
+def fix_boolean_annotation(atom_array: struc.AtomArray, annotation_name: str) -> struc.AtomArray:
+    """Convert string boolean annotations back to bool type."""
+    string_values = getattr(atom_array, annotation_name)
+    boolean_values = string_values == "True"
+    atom_array.del_annotation(annotation_name)
+    atom_array.set_annotation(annotation_name, boolean_values)
+    return atom_array
+
+
+# Fix boolean annotations
+loaded_from_cif = fix_boolean_annotation(loaded_from_cif, "is_hem_pocket")
+loaded_from_cif = fix_boolean_annotation(loaded_from_cif, "is_heme")
+
+print("\nAfter conversion:")
+print(f" is_hem_pocket: {loaded_from_cif.is_hem_pocket.dtype}, {np.sum(loaded_from_cif.is_hem_pocket)} True values")
+print(f" is_heme: {loaded_from_cif.is_heme.dtype}, {np.sum(loaded_from_cif.is_heme)} True values")
+print(f" Sample values: {loaded_from_cif.is_hem_pocket[:3]}")
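The comparison `string_values == "True"` used above maps every other string (for example `"true"`, `"1"`, or a missing-value marker) to `False` without warning. A stricter variant, sketched here with NumPy only, raises on anything unexpected; `parse_bool_strings` is a hypothetical helper name, not an AtomWorks or Biotite function.

```python
import numpy as np


def parse_bool_strings(values):
    """Convert an array of "True"/"False" strings to a boolean array,
    raising ValueError if any other value is present."""
    values = np.asarray(values).astype(str)
    is_true = values == "True"
    is_false = values == "False"
    unexpected = ~(is_true | is_false)
    if unexpected.any():
        raise ValueError(f"Unexpected boolean strings: {np.unique(values[unexpected])}")
    return is_true


flags = parse_bool_strings(np.array(["True", "False", "True"]))
print(flags.dtype, flags)
```

The result can be passed to `set_annotation` in the same way as `boolean_values` in `fix_boolean_annotation`.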
+
+# %%
+
+# Clean up temporary files
+import shutil
+
+shutil.rmtree(temp_dir)
+print(f"✓ Cleaned up temporary directory: {temp_dir}")
+print("✓ Successfully demonstrated structure annotation, saving, and loading!")
+
+########################################################################
+# Related Examples
+# ----------------
+#
+# - :doc:`pocket_conditioning_transform` - Create custom transforms for ligand pocket identification and ML feature generation
