README.md: 21 additions & 10 deletions (+21 -10)
@@ -28,6 +28,8 @@ A fork with full Lightning + Hydra support is being maintained by <a href="https
 - <a href="https://github.com/amorehead">Alex</a> for the PDB dataset preparation script!

+- <a href="https://github.com/milot-mirdita">Milot</a> for optimizing the PDB dataset clustering script!
+
 - <a href="https://github.com/patrick-kidger">Patrick</a> for <a href="https://docs.kidger.site/jaxtyping/">jaxtyping</a>, <a href="https://github.com/fferflo">Florian</a> for <a href="https://github.com/fferflo/einx">einx</a>, and of course, <a href="https://github.com/arogozhnikov">Alex</a> for <a href="https://einops.rocks/">einops</a>
-To acquire the AlphaFold 3 PDB dataset, first download all complexes in the Protein Data Bank (PDB), and then preprocess them with the script referenced below. The PDB can be downloaded from the RCSB: https://www.wwpdb.org/ftp/pdb-ftp-sites#rcsbpdb. The Python script below (i.e., `filter_pdb_mmcifs.py`) assumes you have downloaded the PDB in the **mmCIF file format**, placing it at `data/pdb_data/unfiltered_mmcifs/`. On the RCSB website, navigate down to "Download Protocols", and follow the download instructions depending on your location.
+To acquire the AlphaFold 3 PDB dataset, first download all first-assembly (and asymmetric unit) complexes in the Protein Data Bank (PDB), and then preprocess them with the script referenced below. The PDB can be downloaded from the RCSB: https://www.wwpdb.org/ftp/pdb-ftp-sites#rcsbpdb. The Python script below (i.e., `filter_pdb_mmcifs.py`) assumes you have downloaded the PDB in the **mmCIF file format**, placing it at `data/pdb_data/unfiltered_assembly_mmcifs/` (and `data/pdb_data/unfiltered_asym_mmcifs/`, respectively). On the RCSB website, navigate down to "Download Protocols", and follow the download instructions depending on your location.

-For example, one can use the following command to download the PDB as a collection of mmCIF files:
+For example, one can use the following commands to download the PDB as a collection of mmCIF files:
-Then run the following with `pdb_dir`, `ccd_dir`, and `mmcif_output_dir` replaced with the locations of your local copies of the PDB, CCD, and your desired dataset output directory (i.e., `./data/pdb_data/unfiltered_mmcifs/`, `./data/ccd_data/`, and `./data/pdb_data/mmcifs/`).
+Then run the following with `pdb_assembly_dir`, `pdb_asym_dir`, `ccd_dir`, and `mmcif_output_dir` replaced with the locations of your local copies of the first-assembly PDB, asymmetric unit PDB, CCD, and your desired dataset output directory (i.e., `./data/pdb_data/unfiltered_assembly_mmcifs/`, `./data/pdb_data/unfiltered_asym_mmcifs/`, `./data/ccd_data/`, and `./data/pdb_data/mmcifs/`).
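
Since this change splits the single input directory into two, it may help to confirm the new layout before running `filter_pdb_mmcifs.py`. The following is a minimal sketch, not part of the repository: `check_dataset_layout` and the two constants are hypothetical names introduced here for illustration, and the paths are simply the defaults quoted in the README text above.

```python
from pathlib import Path

# Default locations quoted in the README text; adjust them to your setup.
# The dictionary keys mirror the placeholder names used in the README.
EXPECTED_INPUT_DIRS = {
    "pdb_assembly_dir": "data/pdb_data/unfiltered_assembly_mmcifs",
    "pdb_asym_dir": "data/pdb_data/unfiltered_asym_mmcifs",
    "ccd_dir": "data/ccd_data",
}
MMCIF_OUTPUT_DIR = "data/pdb_data/mmcifs"


def check_dataset_layout(root="."):
    """Hypothetical helper: return the names of input directories missing under `root`.

    This only inspects the filesystem; it does not download or filter anything.
    """
    root = Path(root)
    return [
        name
        for name, relative_path in EXPECTED_INPUT_DIRS.items()
        if not (root / relative_path).is_dir()
    ]


if __name__ == "__main__":
    missing = check_dataset_layout(".")
    if missing:
        print("Missing input directories for:", ", ".join(missing))
    else:
        print("All input directories found; filtered output would go to", MMCIF_OUTPUT_DIR)
```

An empty list means both the first-assembly and asymmetric unit downloads, plus the CCD, are where the README expects them.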