Merged
4 changes: 3 additions & 1 deletion .github/workflows/documentation.yml
@@ -21,7 +21,9 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.9
python-version: 3.11
- name: Install proteka requirements
run: pip install -r ${{ github.workspace }}/requirements.txt
- name: Install Python dependencies
run: pip install sphinx sphinx_rtd_theme sphinx-autodoc-typehints
- name: Build Documentation
2 changes: 1 addition & 1 deletion .github/workflows/lint.yaml
@@ -17,7 +17,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.8
python-version: 3.11
- name: Install Python dependencies
run: pip install black
- name: Run linters
4 changes: 3 additions & 1 deletion README.rst
@@ -3,7 +3,9 @@ proteka


.. start-intro
Library for comparing and benchmarking protein models
Library for comparing and benchmarking protein models. In particular,
it contains an implementation for computing the fraction of native contacts
using a coarse-grained (CG) PDB file as a reference.


.. end-intro
67 changes: 63 additions & 4 deletions docs/source/ensemble.rst
@@ -1,7 +1,66 @@
##########
Ensemble
=========
##########

Ensemble class
----------
*************
Introduction
*************

.. autoclass:: proteka.dataset.Ensemble

About ``Quantity``
===================

A ``Quantity`` wraps a ``numpy.ndarray`` and a ``unit`` (defined in ``proteka.dataset.unit_quantity``). A ``Quantity`` can be assigned to an ``Ensemble`` either during initialization or as an attribute via dot (.) notation:

- If the input is a plain ``numpy.ndarray``, the unit is assumed to be "dimensionless"
- If the input is a ``Quantity``, its unit is stored with it

Retrieving a saved ``Quantity``:

- As an attribute (via dot (.)): returns a ``numpy.ndarray`` holding the value of the ``Quantity`` in its stored unit
- Via index brackets ([]): returns the stored ``Quantity`` object, allowing flexible automated unit conversion

To list the stored quantities, call ``.list_quantities()``.

* Special cases are the "builtin quantities", whose stored units are dictated by the ``unit_system`` (which is also used in place of the default "dimensionless" during assignment):

  * "coords" (ATOMIC_VECTOR): [L]
  * "time" (*per-frame* SCALAR): [T]
  * "forces" (ATOMIC_VECTOR): [E]/[L]
  * "velocities" (ATOMIC_VECTOR): [L]/[T]
  * "cell_lengths" (BOX_QUANTITIES): [L]
  * "cell_angles" (BOX_QUANTITIES): degree

  In addition, the above quantities are tied to the system molecule via their shape:
  each *per-frame* quantity has the same number of frames as ``self.coords``,
  and each ATOMIC_VECTOR corresponds to the same number of atoms as indicated by
  ``self.top``.
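
The assignment and retrieval rules described in this section can be illustrated with a small stand-alone sketch. It does not use proteka itself: ``ToyQuantity``, ``ToyEnsemble``, the ``in_unit_of`` method and the conversion table are hypothetical names invented for this illustration, plain Python lists stand in for ``numpy.ndarray``, and the special handling of builtin quantities through the ``unit_system`` is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical conversion factors to nanometers, for this sketch only
# (this is NOT proteka's unit system).
_TO_NM: Dict[str, float] = {"nanometers": 1.0, "angstroms": 0.1}


@dataclass
class ToyQuantity:
    """Toy value-plus-unit wrapper; a list stands in for a numpy array."""

    value: List[float]
    unit: str = "dimensionless"

    def in_unit_of(self, unit: str) -> List[float]:
        """Return the values converted to `unit` (hypothetical helper)."""
        if unit == self.unit:
            return list(self.value)
        factor = _TO_NM[self.unit] / _TO_NM[unit]
        return [v * factor for v in self.value]


class ToyEnsemble:
    """Toy container: dot access gives raw values, brackets give the wrapper."""

    def __init__(self) -> None:
        # Bypass our own __setattr__ so the storage dict is a real attribute.
        object.__setattr__(self, "_quantities", {})

    def __setattr__(self, name: str, value) -> None:
        # Plain values are wrapped with the default "dimensionless" unit;
        # a ToyQuantity keeps the unit it came with.
        if not isinstance(value, ToyQuantity):
            value = ToyQuantity(list(value))
        self._quantities[name] = value

    def __getattr__(self, name: str) -> List[float]:
        # Only called when normal attribute lookup fails.
        try:
            return list(self._quantities[name].value)
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name: str) -> ToyQuantity:
        return self._quantities[name]

    def list_quantities(self) -> List[str]:
        return list(self._quantities)


ens = ToyEnsemble()
ens.weights = [0.25, 0.75]                        # plain values: "dimensionless"
ens.bond = ToyQuantity([0.5, 1.0], "nanometers")  # the given unit is kept
raw_values = ens.bond                             # dot access: bare values
wrapped = ens["bond"]                             # bracket access: the wrapper
```

Keeping the wrapper reachable through brackets is what makes unit conversion possible after the fact, while dot access stays convenient for numerical work in the stored unit.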

Trajectories:
=============

An ``Ensemble`` can store which of its samples come from which trajectory.
Trajectories are sequential. Therefore, samples from different trajectories are
expected to occupy non-overlapping slices.
Trajectory information should be stored either during ``Ensemble`` initialization
or afterwards with the ``.register_trjs`` method.

Properties
===========

- ``.n_trjs`` (int): number of trajectories
- ``.n_frames_per_trj`` (Dict[str, int]): number of frames in each trajectory
- ``.trajectory_slices`` or ``.trjs`` or ``.trajectories`` (Dict[str, slice]): Python ``slice`` objects for slicing Ensemble quantities according to the ``.trjs`` records
- ``.trj_indices`` (Dict[str, np.ndarray]): indices for the different trajectories according to the ``.trjs`` records
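
The relationship between per-trajectory frame counts, slices and indices can be sketched in plain Python. This is an illustration rather than proteka's implementation: the helper names ``build_trajectory_slices`` and ``slices_to_indices`` are hypothetical, the trajectories are assumed to be stored back to back in the order given, and lists of integers stand in for the ``np.ndarray`` index arrays.

```python
from typing import Dict, List


def build_trajectory_slices(n_frames_per_trj: Dict[str, int]) -> Dict[str, slice]:
    """Lay trajectories out back to back and return one slice per trajectory."""
    slices: Dict[str, slice] = {}
    start = 0
    for name, n_frames in n_frames_per_trj.items():
        slices[name] = slice(start, start + n_frames)
        start += n_frames
    return slices


def slices_to_indices(slices: Dict[str, slice], n_total: int) -> Dict[str, List[int]]:
    """Expand each slice into the explicit frame indices it covers."""
    return {
        name: list(range(*s.indices(n_total)))
        for name, s in slices.items()
    }


# Two hypothetical trajectories with 3 and 2 frames, 5 frames in total.
n_frames_per_trj = {"traj_0": 3, "traj_1": 2}
trjs = build_trajectory_slices(n_frames_per_trj)
trj_indices = slices_to_indices(trjs, n_total=5)

# Any per-frame sequence can then be split by trajectory with the slices.
frames = ["f0", "f1", "f2", "f3", "f4"]
frames_of_traj_0 = frames[trjs["traj_0"]]
```

Because the slices do not overlap, every frame belongs to exactly one trajectory, which is what analyses that depend on frame order within a trajectory rely on.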

``mdtraj`` interface:
======================

- ``.get_mdtraj_trjs()`` (-> Dict[str, mdtraj.Trajectory]): pack an ``Ensemble``'s ``top`` and ``coords`` (and unit cell + simulation times, if available) into a dictionary of ``mdtraj.Trajectory`` objects for analyses according to ``self.trjs``
- ``.get_all_in_one_mdtraj_trj()``: pack all ``coords`` into one ``Trajectory`` object (may not be suitable for kinetic analyses such as TICA and MSM!)


Implementations
===============

.. autoclass:: proteka.dataset.Ensemble
:members:
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -7,7 +7,7 @@ Welcome to proteka's documentation!

.. include:: ../../README.rst
:start-after: start-intro
:end-before: end-intro
:end-before: end-install



26 changes: 12 additions & 14 deletions examples/example_dataset.ipynb
@@ -2,19 +2,17 @@
"cells": [
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "6719dbdd",
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append(\"..\")\n",
"import proteka"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"id": "54605c68",
"metadata": {},
"outputs": [],
@@ -25,7 +23,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "66af79e0",
"metadata": {},
"outputs": [],
@@ -40,7 +38,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"id": "187cb237",
"metadata": {},
"outputs": [],
@@ -58,7 +56,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "eb744742",
"metadata": {},
"outputs": [],
@@ -72,7 +70,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"id": "8874492e",
"metadata": {},
"outputs": [],
@@ -278,7 +276,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 24,
"id": "a6d7fc0c",
"metadata": {},
"outputs": [],
@@ -318,7 +316,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 26,
"id": "df33a77a",
"metadata": {},
"outputs": [],
@@ -331,7 +329,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 27,
"id": "da96a338",
"metadata": {},
"outputs": [],
@@ -369,7 +367,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 30,
"id": "19365d05",
"metadata": {},
"outputs": [],
@@ -390,7 +388,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "mlcg-torch21",
"language": "python",
"name": "python3"
},
@@ -404,7 +402,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.11.6"
}
},
"nbformat": 4,
Binary file not shown.
167 changes: 167 additions & 0 deletions examples/example_dataset_files/cln_amber_300K_mini/traj_0.pdb
@@ -0,0 +1,167 @@
MODEL 0
ATOM 1 N TYR A 2 0.697 0.874 8.347 1.00 0.00 N
ATOM 2 H TYR A 2 1.688 0.951 8.524 1.00 0.00 H
ATOM 3 CA TYR A 2 0.203 0.027 7.253 1.00 0.00 C
ATOM 4 HA TYR A 2 -0.253 0.665 6.496 1.00 0.00 H
ATOM 5 CB TYR A 2 1.483 -0.694 6.647 1.00 0.00 C
ATOM 6 HB3 TYR A 2 1.849 -1.420 7.373 1.00 0.00 H
ATOM 7 HB2 TYR A 2 2.195 0.112 6.471 1.00 0.00 H
ATOM 8 CG TYR A 2 1.023 -1.478 5.409 1.00 0.00 C
ATOM 9 CD1 TYR A 2 1.152 -0.963 4.116 1.00 0.00 C
ATOM 10 HD1 TYR A 2 1.399 0.068 3.913 1.00 0.00 H
ATOM 11 CE1 TYR A 2 0.693 -1.657 2.985 1.00 0.00 C
ATOM 12 HE1 TYR A 2 0.596 -1.233 1.996 1.00 0.00 H
ATOM 13 CZ TYR A 2 0.174 -2.952 3.147 1.00 0.00 C
ATOM 14 OH TYR A 2 -0.374 -3.567 2.069 1.00 0.00 O
ATOM 15 HH TYR A 2 -0.628 -4.451 2.344 1.00 0.00 H
ATOM 16 CE2 TYR A 2 0.082 -3.533 4.430 1.00 0.00 C
ATOM 17 HE2 TYR A 2 -0.150 -4.583 4.532 1.00 0.00 H
ATOM 18 CD2 TYR A 2 0.529 -2.837 5.570 1.00 0.00 C
ATOM 19 HD2 TYR A 2 0.590 -3.351 6.518 1.00 0.00 H
ATOM 20 C TYR A 2 -0.858 -1.042 7.657 1.00 0.00 C
ATOM 21 O TYR A 2 -0.664 -1.913 8.550 1.00 0.00 O
ATOM 22 N TYR A 3 -1.920 -1.034 6.869 1.00 0.00 N
ATOM 23 H TYR A 3 -1.985 -0.361 6.119 1.00 0.00 H
ATOM 24 CA TYR A 3 -2.971 -2.122 6.948 1.00 0.00 C
ATOM 25 HA TYR A 3 -2.503 -3.022 7.346 1.00 0.00 H
ATOM 26 CB TYR A 3 -4.019 -1.738 8.011 1.00 0.00 C
ATOM 27 HB3 TYR A 3 -3.562 -1.165 8.818 1.00 0.00 H
ATOM 28 HB2 TYR A 3 -4.460 -2.639 8.437 1.00 0.00 H
ATOM 29 CG TYR A 3 -5.120 -0.872 7.518 1.00 0.00 C
ATOM 30 CD1 TYR A 3 -4.902 0.511 7.533 1.00 0.00 C
ATOM 31 HD1 TYR A 3 -3.991 0.905 7.958 1.00 0.00 H
ATOM 32 CE1 TYR A 3 -5.897 1.355 7.035 1.00 0.00 C
ATOM 33 HE1 TYR A 3 -5.692 2.405 6.891 1.00 0.00 H
ATOM 34 CZ TYR A 3 -7.067 0.752 6.575 1.00 0.00 C
ATOM 35 OH TYR A 3 -8.094 1.515 6.199 1.00 0.00 O
ATOM 36 HH TYR A 3 -7.943 2.414 6.497 1.00 0.00 H
ATOM 37 CE2 TYR A 3 -7.290 -0.672 6.540 1.00 0.00 C
ATOM 38 HE2 TYR A 3 -8.176 -1.015 6.028 1.00 0.00 H
ATOM 39 CD2 TYR A 3 -6.294 -1.521 7.012 1.00 0.00 C
ATOM 40 HD2 TYR A 3 -6.393 -2.597 7.032 1.00 0.00 H
ATOM 41 C TYR A 3 -3.631 -2.482 5.621 1.00 0.00 C
ATOM 42 O TYR A 3 -4.082 -3.649 5.485 1.00 0.00 O
ATOM 43 N ASP A 4 -3.671 -1.591 4.653 1.00 0.00 N
ATOM 44 H ASP A 4 -3.398 -0.629 4.791 1.00 0.00 H
ATOM 45 CA ASP A 4 -4.345 -1.792 3.323 1.00 0.00 C
ATOM 46 HA ASP A 4 -4.745 -2.806 3.297 1.00 0.00 H
ATOM 47 CB ASP A 4 -5.638 -0.903 3.320 1.00 0.00 C
ATOM 48 HB3 ASP A 4 -5.427 0.154 3.483 1.00 0.00 H
ATOM 49 HB2 ASP A 4 -6.245 -1.354 4.105 1.00 0.00 H
ATOM 50 CG ASP A 4 -6.354 -1.138 1.963 1.00 0.00 C
ATOM 51 OD1 ASP A 4 -6.107 -0.391 1.019 1.00 0.00 O
ATOM 52 OD2 ASP A 4 -7.212 -2.090 1.882 1.00 0.00 O
ATOM 53 C ASP A 4 -3.429 -1.517 2.043 1.00 0.00 C
ATOM 54 O ASP A 4 -2.702 -0.552 2.158 1.00 0.00 O
ATOM 55 N PRO A 5 -3.532 -2.242 1.001 1.00 0.00 N
ATOM 56 CD PRO A 5 -4.234 -3.537 0.902 1.00 0.00 C
ATOM 57 HD3 PRO A 5 -5.282 -3.380 0.649 1.00 0.00 H
ATOM 58 HD2 PRO A 5 -4.192 -4.085 1.843 1.00 0.00 H
ATOM 59 CG PRO A 5 -3.538 -4.185 -0.295 1.00 0.00 C
ATOM 60 HG3 PRO A 5 -4.245 -4.858 -0.778 1.00 0.00 H
ATOM 61 HG2 PRO A 5 -2.612 -4.715 -0.070 1.00 0.00 H
ATOM 62 CB PRO A 5 -3.221 -3.010 -1.245 1.00 0.00 C
ATOM 63 HB3 PRO A 5 -4.115 -2.702 -1.787 1.00 0.00 H
ATOM 64 HB2 PRO A 5 -2.397 -3.348 -1.873 1.00 0.00 H
ATOM 65 CA PRO A 5 -2.882 -1.907 -0.254 1.00 0.00 C
ATOM 66 HA PRO A 5 -1.839 -2.196 -0.123 1.00 0.00 H
ATOM 67 C PRO A 5 -2.846 -0.444 -0.870 1.00 0.00 C
ATOM 68 O PRO A 5 -1.827 -0.053 -1.445 1.00 0.00 O
ATOM 69 N GLU A 6 -3.823 0.460 -0.597 1.00 0.00 N
ATOM 70 H GLU A 6 -4.535 0.185 0.065 1.00 0.00 H
ATOM 71 CA GLU A 6 -3.968 1.895 -0.953 1.00 0.00 C
ATOM 72 HA GLU A 6 -3.220 2.128 -1.712 1.00 0.00 H
ATOM 73 CB GLU A 6 -5.301 2.249 -1.559 1.00 0.00 C
ATOM 74 HB3 GLU A 6 -5.428 3.324 -1.688 1.00 0.00 H
ATOM 75 HB2 GLU A 6 -6.051 1.926 -0.837 1.00 0.00 H
ATOM 76 CG GLU A 6 -5.720 1.507 -2.813 1.00 0.00 C
ATOM 77 HG3 GLU A 6 -5.631 0.463 -2.512 1.00 0.00 H
ATOM 78 HG2 GLU A 6 -5.076 1.782 -3.649 1.00 0.00 H
ATOM 79 CD GLU A 6 -7.172 1.762 -3.122 1.00 0.00 C
ATOM 80 OE1 GLU A 6 -8.078 1.281 -2.337 1.00 0.00 O
ATOM 81 OE2 GLU A 6 -7.481 2.412 -4.128 1.00 0.00 O
ATOM 82 C GLU A 6 -3.599 2.788 0.334 1.00 0.00 C
ATOM 83 O GLU A 6 -3.918 3.980 0.379 1.00 0.00 O
ATOM 84 N THR A 7 -2.957 2.162 1.373 1.00 0.00 N
ATOM 85 H THR A 7 -2.927 1.157 1.282 1.00 0.00 H
ATOM 86 CA THR A 7 -2.149 2.713 2.423 1.00 0.00 C
ATOM 87 HA THR A 7 -2.224 3.800 2.460 1.00 0.00 H
ATOM 88 CB THR A 7 -2.419 2.408 3.905 1.00 0.00 C
ATOM 89 HB THR A 7 -1.730 2.947 4.555 1.00 0.00 H
ATOM 90 CG2 THR A 7 -3.765 3.007 4.399 1.00 0.00 C
ATOM 91 HG21 THR A 7 -4.510 2.736 3.650 1.00 0.00 H
ATOM 92 HG22 THR A 7 -4.005 2.672 5.407 1.00 0.00 H
ATOM 93 HG23 THR A 7 -3.705 4.093 4.329 1.00 0.00 H
ATOM 94 OG1 THR A 7 -2.440 1.074 4.204 1.00 0.00 O
ATOM 95 HG1 THR A 7 -2.446 0.631 3.352 1.00 0.00 H
ATOM 96 C THR A 7 -0.616 2.469 2.284 1.00 0.00 C
ATOM 97 O THR A 7 0.168 3.240 2.846 1.00 0.00 O
ATOM 98 N GLY A 8 -0.258 1.564 1.431 1.00 0.00 N
ATOM 99 H GLY A 8 -1.014 1.008 1.060 1.00 0.00 H
ATOM 100 CA GLY A 8 1.124 1.660 0.790 1.00 0.00 C
ATOM 101 HA3 GLY A 8 1.569 0.670 0.687 1.00 0.00 H
ATOM 102 HA2 GLY A 8 1.866 2.151 1.420 1.00 0.00 H
ATOM 103 C GLY A 8 1.192 2.369 -0.518 1.00 0.00 C
ATOM 104 O GLY A 8 0.232 3.133 -0.927 1.00 0.00 O
ATOM 105 N THR A 9 2.353 2.284 -1.185 1.00 0.00 N
ATOM 106 H THR A 9 3.127 1.752 -0.814 1.00 0.00 H
ATOM 107 CA THR A 9 2.595 3.201 -2.332 1.00 0.00 C
ATOM 108 HA THR A 9 1.679 3.698 -2.653 1.00 0.00 H
ATOM 109 CB THR A 9 3.757 4.171 -2.103 1.00 0.00 C
ATOM 110 HB THR A 9 4.745 3.713 -2.066 1.00 0.00 H
ATOM 111 CG2 THR A 9 3.891 5.316 -3.098 1.00 0.00 C
ATOM 112 HG21 THR A 9 4.482 4.955 -3.940 1.00 0.00 H
ATOM 113 HG22 THR A 9 2.939 5.690 -3.475 1.00 0.00 H
ATOM 114 HG23 THR A 9 4.520 6.111 -2.698 1.00 0.00 H
ATOM 115 OG1 THR A 9 3.604 4.769 -0.881 1.00 0.00 O
ATOM 116 HG1 THR A 9 2.742 5.189 -0.930 1.00 0.00 H
ATOM 117 C THR A 9 2.997 2.380 -3.563 1.00 0.00 C
ATOM 118 O THR A 9 2.346 2.504 -4.593 1.00 0.00 O
ATOM 119 N TRP A 10 4.065 1.566 -3.441 1.00 0.00 N
ATOM 120 H TRP A 10 4.564 1.426 -2.574 1.00 0.00 H
ATOM 121 CA TRP A 10 4.632 0.820 -4.506 1.00 0.00 C
ATOM 122 HA TRP A 10 4.463 1.270 -5.484 1.00 0.00 H
ATOM 123 CB TRP A 10 6.165 0.770 -4.339 1.00 0.00 C
ATOM 124 HB3 TRP A 10 6.402 -0.034 -3.643 1.00 0.00 H
ATOM 125 HB2 TRP A 10 6.528 1.772 -4.113 1.00 0.00 H
ATOM 126 CG TRP A 10 6.960 0.376 -5.609 1.00 0.00 C
ATOM 127 CD1 TRP A 10 7.119 -0.869 -6.130 1.00 0.00 C
ATOM 128 HD1 TRP A 10 6.778 -1.774 -5.651 1.00 0.00 H
ATOM 129 NE1 TRP A 10 7.779 -0.804 -7.359 1.00 0.00 N
ATOM 130 HE1 TRP A 10 7.992 -1.617 -7.920 1.00 0.00 H
ATOM 131 CE2 TRP A 10 7.973 0.518 -7.676 1.00 0.00 C
ATOM 132 CZ2 TRP A 10 8.568 1.174 -8.794 1.00 0.00 C
ATOM 133 HZ2 TRP A 10 8.892 0.655 -9.683 1.00 0.00 H
ATOM 134 CH2 TRP A 10 8.763 2.567 -8.748 1.00 0.00 C
ATOM 135 HH2 TRP A 10 9.058 3.114 -9.632 1.00 0.00 H
ATOM 136 CZ3 TRP A 10 8.441 3.335 -7.602 1.00 0.00 C
ATOM 137 HZ3 TRP A 10 8.655 4.393 -7.573 1.00 0.00 H
ATOM 138 CE3 TRP A 10 7.817 2.678 -6.581 1.00 0.00 C
ATOM 139 HE3 TRP A 10 7.533 3.182 -5.669 1.00 0.00 H
ATOM 140 CD2 TRP A 10 7.558 1.276 -6.571 1.00 0.00 C
ATOM 141 C TRP A 10 3.994 -0.525 -4.482 1.00 0.00 C
ATOM 142 O TRP A 10 3.234 -0.972 -3.595 1.00 0.00 O
ATOM 143 N TYR A 11 4.174 -1.214 -5.574 1.00 0.00 N
ATOM 144 H TYR A 11 4.834 -0.815 -6.227 1.00 0.00 H
ATOM 145 CA TYR A 11 3.729 -2.550 -5.854 1.00 0.00 C
ATOM 146 HA TYR A 11 2.652 -2.586 -5.691 1.00 0.00 H
ATOM 147 CB TYR A 11 4.039 -2.927 -7.325 1.00 0.00 C
ATOM 148 HB3 TYR A 11 3.674 -3.896 -7.665 1.00 0.00 H
ATOM 149 HB2 TYR A 11 5.119 -3.062 -7.387 1.00 0.00 H
ATOM 150 CG TYR A 11 3.570 -1.911 -8.392 1.00 0.00 C
ATOM 151 CD1 TYR A 11 2.204 -1.686 -8.603 1.00 0.00 C
ATOM 152 HD1 TYR A 11 1.534 -2.338 -8.062 1.00 0.00 H
ATOM 153 CE1 TYR A 11 1.795 -0.696 -9.546 1.00 0.00 C
ATOM 154 HE1 TYR A 11 0.763 -0.470 -9.773 1.00 0.00 H
ATOM 155 CZ TYR A 11 2.814 0.043 -10.174 1.00 0.00 C
ATOM 156 OH TYR A 11 2.461 0.981 -11.047 1.00 0.00 O
ATOM 157 HH TYR A 11 3.275 1.483 -11.140 1.00 0.00 H
ATOM 158 CE2 TYR A 11 4.111 -0.270 -10.017 1.00 0.00 C
ATOM 159 HE2 TYR A 11 4.851 0.311 -10.547 1.00 0.00 H
ATOM 160 CD2 TYR A 11 4.602 -1.205 -9.078 1.00 0.00 C
ATOM 161 HD2 TYR A 11 5.667 -1.338 -8.957 1.00 0.00 H
ATOM 162 C TYR A 11 4.390 -3.661 -4.913 1.00 0.00 C
ATOM 163 O TYR A 11 5.436 -3.398 -4.338 1.00 0.00 O
TER 164 TYR A 11
ENDMDL
END
Binary file not shown.