Sync draco-postproc updates from AFCDB toolkit #26

Open
NAEV95 wants to merge 1 commit into main from draco-postproc-sync

Conversation

NAEV95 commented May 14, 2026

Summary

Syncs the AFDB-Integration-Kit/ folder from the internal AFCDB draco-postproc branch into this repository. The sync touches 86 files and brings in the tooling developed downstream:

  • GPU interface analysis pipeline under afdb_integration_kit/gpu/ (clash detection, interface analysis, batch driver, schema)
  • ipSAE C++ implementation under afdb_integration_kit/ipsae/ (Makefile now auto-fetches Eigen; no vendored dependency tree)
  • Manifest tooling under afdb_integration_kit/manifest/
  • ModelPDB generator wired into Nextflow workflow
  • UniProt batching scripts (batch_convert_colabfold.py, batch_export_metadata.py, batch_export_modelcif_input.py, batch_ipsae.py, batch_validate_assets.py) and new ModelCIF templates
  • Validation parallelization (validators/_parallel.py)
  • Interface annotations module
  • New multi-batch end-to-end Nextflow workflow with validation
  • README, schema, validator, and uv.lock updates

Intentionally excluded

  • slurm-scaling/ — internal HPC orchestration code, not relevant upstream
  • mock_data/ — bulky local fixtures
  • assets/ — internal pipeline diagrams
  • afdb_integration_kit/ipsae/deps/eigen-3.4.0/ — Eigen is now fetched on demand by the Makefile rather than vendored

Test plan

  • make builds ipsae_cpp from a clean checkout (verifying the Eigen auto-fetch step)
  • make check reports missing Eigen prior to make deps
  • Existing Python tests under tests/ still pass
  • Nextflow workflow/end_to_end_with_validation_multibatch.nf runs on the example inputs

Sync the AFDB-Integration-Kit folder from the internal AFCDB
draco-postproc branch. Includes the new GPU interface analysis
pipeline, ipSAE C++ implementation, manifest tooling, ModelPDB
generator, UniProt batching scripts, validation parallelization,
and updated examples / Nextflow workflows.

The internal slurm-scaling/, mock_data/, and assets/ directories
are intentionally excluded from this sync. The vendored Eigen
3.4.0 tree under afdb_integration_kit/ipsae/deps/ is also
excluded; the ipSAE Makefile now fetches Eigen on demand.
NAEV95 requested a review from mitsenkov on May 14, 2026 11:49

# Get raw arrays
res_name_raw = arr.res_name
atom_name_all = arr.atom_name
chain_all = arr.chain_id
res_id_all = arr.res_id.astype(np.int32, copy=False)
coord_all = arr.coord
return # already materialized or built directly

p = result._protein
n_res = p.n_residues

# Move to device
coords_t = torch.tensor(coords_valid, dtype=torch.float32, device=device)
vdw_t = torch.tensor(vdw_valid, dtype=torch.float32, device=device)
valid = ca_mask_flat
ca_valid = ca_flat[valid]
chain_valid = chain_flat[valid]
res_id_valid = res_id_flat[valid]
chain_flat = chain_ids.reshape(B * N)

protein_idx = torch.arange(B, device=device).repeat_interleave(N)
res_idx = torch.arange(N, device=device).repeat(B)
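
The two index tensors above map every position of the flattened B * N layout back to its protein and residue. A minimal pure-Python sketch of that mapping, assuming B is the batch size, N is the padded residue count per protein, and the flattening is row-major; `flat_indices` is a hypothetical helper and plain lists stand in for tensors:

```python
def flat_indices(B, N):
    """Return (protein_idx, res_idx) for a row-major flattened (B, N) layout."""
    # Counterpart of torch.arange(B).repeat_interleave(N):
    # each protein index is repeated N times in a row.
    protein_idx = [b for b in range(B) for _ in range(N)]
    # Counterpart of torch.arange(N).repeat(B):
    # the residue range 0..N-1 is tiled B times.
    res_idx = [r for _ in range(B) for r in range(N)]
    return protein_idx, res_idx


protein_idx, res_idx = flat_indices(B=2, N=3)
# Flat position k belongs to protein k // N and residue k % N.
pairs = list(zip(protein_idx, res_idx))
```

With these indices, any per-residue value that survives a mask on the flat layout can be traced back to its (protein, residue) origin without reshaping.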
logger.debug(f"Directory scan found {len(file_map)} matched pairs", indent=1)

# Split into found / missing
model_id_set = set(model_ids)
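
One way the found / missing split could look, as a sketch only. It assumes `file_map` maps a model ID to its matched (PDB, meta JSON) pair and that `model_ids` is the requested list; `split_found_missing` is a hypothetical name, not a function from this PR:

```python
def split_found_missing(model_ids, file_map):
    """Partition requested model IDs by whether the directory scan matched them."""
    model_id_set = set(model_ids)
    # Keep only scanned pairs that were actually requested.
    found = {mid: pair for mid, pair in file_map.items() if mid in model_id_set}
    # Requested IDs with no matched pair, sorted for stable reporting.
    missing = sorted(model_id_set - found.keys())
    return found, missing


found, missing = split_found_missing(
    ["A", "B", "C"],
    {"A": ("a.pdb", "a.json"), "C": ("c.pdb", "c.json"), "Z": ("z.pdb", "z.json")},
)
```

Building the set once keeps each membership test constant-time, which matters when the scan covers many files.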

# Create symlinks for PDB files and meta JSON files in the same directory
# ipsae_cpp expects *-model_v1.pdb and *-meta_v1.json pairs
start = time.time()
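
The symlink step described in the comments above could be sketched as follows. The exact link names are an assumption: the sketch uses `<model_id>-model_v1.pdb` and `<model_id>-meta_v1.json` to satisfy the two patterns the comment says ipsae_cpp expects, and `link_pairs` is a hypothetical helper:

```python
import tempfile
from pathlib import Path


def link_pairs(pairs, dest_dir):
    """Symlink each (pdb, meta) pair into dest_dir under the assumed names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    linked = []
    for model_id, (pdb_path, meta_path) in pairs.items():
        targets = {
            dest / f"{model_id}-model_v1.pdb": Path(pdb_path),
            dest / f"{model_id}-meta_v1.json": Path(meta_path),
        }
        for link, src in targets.items():
            # Replace a stale link or file so reruns stay idempotent.
            if link.is_symlink() or link.exists():
                link.unlink()
            link.symlink_to(src.resolve())
        linked.append(model_id)
    return linked


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "a.pdb").write_text("END\n")
    (src / "a.json").write_text("{}")
    linked = link_pairs({"demo": (src / "a.pdb", src / "a.json")}, Path(tmp) / "links")
    names = sorted(p.name for p in (Path(tmp) / "links").iterdir())
```

Linking instead of copying keeps both files of a pair in one directory without duplicating large structure files.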