Commit 3f312cb

committed
better index.rst for python client
1 parent 492da41 commit 3f312cb

1 file changed

+139
-20
lines changed

source/python-api/index.rst

Lines changed: 139 additions & 20 deletions
@@ -13,43 +13,162 @@
1313
</a>
1414
</div>
1515

16-
Python API documentation
16+
Python API Documentation
1717
========================
1818

19-
This section of our site provides documentation supporting our Python client API.
19+
The OpenProtein Python SDK provides a Pythonic interface to the OpenProtein.AI platform for protein engineering. With this client library you can generate embeddings from protein foundation models, train custom property predictors, design novel sequences, and predict protein structures.
2020

21-
After `installing <./installation.rst>`_ the Python client and `setting up your session <./overview.rst>`_, get started with our docs to use OpenProtein.AI's key platform capabilities with Python.
21+
Getting Started
22+
---------------
2223

23-
**Property Regression Models** enable you to train custom models, predict sequence function, and make improved designs in the context of your data.
24+
1. **Install the package** via pip or conda (`installation guide <./installation.rst>`_)
25+
2. **Create a session** to authenticate with the platform (`session setup <./overview.rst>`_)
26+
3. **Choose your workflow** based on your protein engineering goals
2427

25-
- `API Reference <./api-reference/predictor.rst#openprotein.predictor.PredictorModel>`_
28+
Quick Start
29+
^^^^^^^^^^^
2630

27-
- `Tutorials <./property-regression-models/index.rst>`_
31+
.. code-block:: python
2832
33+
import openprotein
34+
35+
# Connect to the platform
36+
session = openprotein.connect(username="your_username", password="your_password")
37+
38+
# Example: Generate embeddings
39+
future = session.embedding.esm2.embed(sequences=["ACDEFGHIKLMNPQRSTVWY"])
40+
embeddings = future.wait()
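Hard-coding credentials, as in the snippet above, is best avoided in shared scripts. A minimal sketch, assuming two hypothetical environment variable names (``OPENPROTEIN_USERNAME`` and ``OPENPROTEIN_PASSWORD``, which are not defined by the SDK), reads them at runtime. The ``openprotein.connect`` call appears only in a comment because it needs the SDK and network access.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from a mapping of environment variables.

    The variable names are hypothetical, chosen for this sketch only.
    """
    try:
        return env["OPENPROTEIN_USERNAME"], env["OPENPROTEIN_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing}") from None


# Usage (requires the SDK and network access):
# import openprotein
# username, password = load_credentials()
# session = openprotein.connect(username=username, password=password)
```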
41+
42+
Core Concepts
43+
-------------
44+
45+
Understanding these primitives will help you work effectively with the SDK:
46+
47+
**Session Management**
48+
The ``session`` object (``OpenProtein``) is your gateway to all platform capabilities. It manages authentication and provides access to all API modules (``session.embedding``, ``session.fold``, ``session.predictor``, etc.).
49+
50+
**Asynchronous Jobs**
51+
Most operations return ``Future`` objects that track asynchronous jobs. Use ``wait()`` to block until completion, or ``refresh()`` and ``done()`` to poll status. Learn more in the `Jobs System guide <./jobs-system.ipynb>`_.
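The polling alternative to ``wait()`` can be sketched without the platform. The ``FakeFuture`` class below is a stand-in written for this example, not the SDK's ``Future``; only the method names ``refresh()`` and ``done()`` come from the text above, and ``get()`` is a hypothetical accessor for the finished result.

```python
import time


class FakeFuture:
    """Stand-in for an SDK future; finishes after a fixed number of refreshes."""

    def __init__(self, result, refreshes_needed=3):
        self._result = result
        self._remaining = refreshes_needed

    def refresh(self):
        # A real future would re-query the job status from the server here.
        if self._remaining > 0:
            self._remaining -= 1

    def done(self):
        return self._remaining == 0

    def get(self):
        # Hypothetical accessor, used only by this sketch.
        if not self.done():
            raise RuntimeError("job has not finished")
        return self._result


def poll(future, interval=0.01, timeout=1.0):
    """Refresh `future` until it is done or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while not future.done():
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        future.refresh()
        time.sleep(interval)
    return future.get()


result = poll(FakeFuture(result=[0.1, 0.2]))
```

Polling with an explicit timeout is useful when a script should do other work, or give up, instead of blocking indefinitely.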
52+
53+
**Protein Primitives**
54+
- ``Protein``: Represents a single protein chain with sequence and optional MSA
55+
- ``Chain``: Represents ligands, DNA, or RNA molecules
56+
- ``Model``: A collection of proteins and chains forming a complex
57+
- ``AssayDataset``: Your experimental data (sequences + measured properties)
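To make "sequences + measured properties" concrete, here is a sketch of one plausible tabular layout, parsed with the standard library only. The column names (``sequence``, ``activity``, ``stability``), the values, and the handling of a missing measurement are assumptions for illustration; the format the platform actually expects is described in the Data API reference.

```python
import csv
import io

# Illustrative data only: one row per variant, one column per measured property.
ASSAY_CSV = """sequence,activity,stability
ACDEFGHIKL,1.20,0.85
ACDEFGHIKV,0.95,
ACDQFGHIKL,1.48,0.91
"""


def read_assay(text):
    """Parse assay rows into (sequence, {property: float or None}) pairs."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        sequence = record.pop("sequence")
        # An empty cell is treated as a missing measurement.
        measurements = {
            name: float(value) if value else None
            for name, value in record.items()
        }
        rows.append((sequence, measurements))
    return rows


assay = read_assay(ASSAY_CSV)
```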
58+
59+
**Embeddings & Reductions**
60+
Foundation models produce embeddings that can be reduced (``MEAN``, ``SUM``), kept per-residue, or transformed with a custom-fitted SVD. These embeddings power downstream prediction and design tasks.
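As an illustration of what the ``MEAN`` and ``SUM`` reductions compute, the sketch below pools a per-residue embedding (one row per residue) into a single fixed-length vector. It uses plain Python lists and made-up numbers; the function name ``reduce_embedding`` is hypothetical and is not part of the SDK.

```python
def reduce_embedding(per_residue, reduction):
    """Pool a per-residue embedding (L rows of D floats) into one D-vector."""
    if not per_residue:
        raise ValueError("embedding has no residues")
    # Sum each embedding dimension over all residues.
    sums = [sum(column) for column in zip(*per_residue)]
    if reduction == "SUM":
        return sums
    if reduction == "MEAN":
        return [s / len(per_residue) for s in sums]
    raise ValueError(f"unknown reduction: {reduction!r}")


# Three residues, two embedding dimensions (illustrative values).
per_residue = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
mean_vec = reduce_embedding(per_residue, "MEAN")  # [3.0, 4.0]
sum_vec = reduce_embedding(per_residue, "SUM")  # [9.0, 12.0]
```

Both reductions yield a vector whose length does not depend on sequence length, which is what allows sequences of different lengths to share one downstream model.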
61+
62+
Platform Capabilities
63+
---------------------
64+
65+
The SDK is organized around key protein engineering workflows:
66+
67+
Data & Embeddings
68+
^^^^^^^^^^^^^^^^^
69+
70+
**Foundation Models** - Generate high-quality protein embeddings from state-of-the-art models
71+
72+
- Access PoET and other proprietary OpenProtein models, as well as community models such as ESM
73+
- Per-residue or reduced embeddings (mean/sum pooling)
74+
- Logits and attention maps for interpretability
75+
- `Tutorials <./foundation-models/index.rst>`_ | `API Reference <./api-reference/embedding.rst>`_
76+
77+
**PoET** - Conditional protein language model for zero-shot prediction and generation
78+
79+
- Create prompts from MSAs to condition on protein families
80+
- Score sequences without experimental data
81+
- Generate novel sequences with desired properties
82+
- Single-site analysis for variant effect prediction
83+
- `Tutorials <./poet/index.rst>`_ | `API Reference <./api-reference/embedding.rst#openprotein.embeddings.PoETModel>`_
84+
85+
**Data Management** - Upload and manage your experimental datasets
86+
87+
- Store assay data (sequences + measurements) on the platform
88+
- Use datasets for training predictors and design workflows
89+
- `API Reference <./api-reference/data.rst>`_
2990

30-
**PoET** provides tools using our state-of-the-art generative model for *de novo* variant effect prediction and controllable protein sequence design.
91+
Prediction & Design
92+
^^^^^^^^^^^^^^^^^^^
3193

32-
- `API Reference <./api-reference/embedding.rst#openprotein.embeddings.PoETModel>`_
94+
**Property Regression Models** - Train custom models on your data
3395

34-
- `Tutorials <./poet/index.rst>`_
96+
- Fit Gaussian Process models using foundation model embeddings
97+
- Cross-validation for uncertainty estimation
98+
- Predict properties for novel sequences
99+
- Single-site saturation mutagenesis analysis
100+
- `Tutorials <./property-regression-models/index.rst>`_ | `API Reference <./api-reference/predictor.rst>`_
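Single-site saturation mutagenesis means scoring every possible single amino-acid substitution of a sequence. The sketch below only enumerates those variants locally, so the size of the library is easy to see; it does not call the platform, and the helper name ``single_site_variants`` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def single_site_variants(sequence):
    """Yield (label, variant) for every single amino-acid substitution."""
    for position, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant == wild_type:
                continue
            # Labels use 1-based positions, e.g. "M1A".
            label = f"{wild_type}{position + 1}{mutant}"
            variant = sequence[:position] + mutant + sequence[position + 1:]
            yield label, variant


variants = dict(single_site_variants("MKT"))
```

A sequence of length L has 19 × L single-site variants, so the three-residue example produces 57.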
35101

36-
**Foundation Models** provide access to high-quality protein sequence embeddings from open source models, and our proprietary models.
102+
**Sequence Design** - Optimize sequences for your objectives
37103

38-
- `API Reference <./api-reference/index.rst#foundation-models>`_
104+
- Genetic algorithm-based design using trained predictors
105+
- Multi-objective optimization support
106+
- Design novel variants optimized for your measured properties
107+
- `Tutorials <./property-regression-models/index.rst>`_ | `API Reference <./api-reference/design.rst>`_
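For readers unfamiliar with genetic algorithms, the toy sketch below shows the general select, recombine, and mutate loop. It is not the platform's implementation: ``toy_score`` is a hypothetical stand-in for a trained predictor, and every parameter name and default here is an assumption made for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(sequence, target="ACDEFGHIKL"):
    """Hypothetical fitness: number of positions matching a target sequence."""
    return sum(a == b for a, b in zip(sequence, target))


def mutate(sequence, rng, rate=0.1):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else residue
        for residue in sequence
    )


def crossover(parent_a, parent_b, rng):
    """Single-point crossover between two equal-length parents."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def genetic_algorithm(seed_sequence, score, generations=30,
                      population_size=40, rng_seed=0):
    rng = random.Random(rng_seed)
    population = [mutate(seed_sequence, rng, rate=0.5)
                  for _ in range(population_size)]
    population.append(seed_sequence)
    for _ in range(generations):
        # Keep the top quarter unchanged (elitism), then refill with offspring.
        population.sort(key=score, reverse=True)
        parents = population[: population_size // 4]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


seed = "MMMMMMMMMM"
best = genetic_algorithm(seed, toy_score)
```

Because the best candidates are carried over unchanged each generation, the best score in the population never decreases. In a real design run the scoring function would be a trained predictor's output for your measured properties.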
39108

40-
- `Tutorials <./foundation-models/index.rst>`_
109+
Structure
110+
^^^^^^^^^
41111

42-
**Structure Prediction** enables you to generate high-quality structure predictions via ESMFold and AlphaFold2.
112+
**Structure Prediction** - Predict 3D structures from sequences
43113

44-
- `API Reference <./api-reference/fold.rst>`_
114+
- ESMFold for fast single-chain folding
115+
- AlphaFold2 for high-accuracy multi-chain complexes
116+
- Boltz (1, 1x, 2) for advanced complex prediction with constraints
117+
- RoseTTAFold3 for alternative multi-chain folding
118+
- `Tutorials <./structure-prediction/index.rst>`_ | `API Reference <./api-reference/fold.rst>`_
45119

46-
- `Tutorials <./structure-prediction/index.rst>`_
120+
**Structure Generation** - Design novel protein structures de novo
47121

48-
**Structure Generation** using models like RFdiffusion and BoltzGen allows you to generate de novo protein structures based on your design goals.
122+
- RFdiffusion for diffusion-based structure generation
123+
- BoltzGen for generative structure design
124+
- Useful for binder design and scaffold generation
125+
- `Tutorials <./structure-generation/index.rst>`_ | `API Reference <./api-reference/models.rst>`_
49126

50-
- `API Reference <./api-reference/models.rst>`_
127+
Supporting Tools
128+
^^^^^^^^^^^^^^^^
51129

52-
- `Tutorials <./structure-generation/index.rst>`_
130+
**Alignment** - Multiple sequence alignment and antibody numbering
131+
132+
- Create MSAs via homology search (MMseqs2)
133+
- MAFFT and ClustalOmega alignment
134+
- AbNumber for antibody numbering schemes
135+
- `API Reference <./api-reference/align.rst>`_
136+
137+
**Dimensionality Reduction** - Visualize and analyze embeddings
138+
139+
- SVD for linear dimensionality reduction
140+
- UMAP for non-linear manifold learning
141+
- Fit on training data, transform new sequences
142+
- `SVD API Reference <./api-reference/svd.rst>`_ | `UMAP API Reference <./api-reference/umap.rst>`_
143+
144+
Common Workflows
145+
----------------
146+
147+
**Workflow 1: Zero-shot prediction with PoET**
148+
149+
1. Create MSA from your seed sequence → ``session.align.create_msa()``
150+
2. Create a prompt from the MSA → ``session.prompt.create()``
151+
3. Score your variants → ``session.embedding.poet.score()``
152+
153+
**Workflow 2: Train a custom predictor**
154+
155+
1. Upload your assay data → ``session.data.create()``
156+
2. Train a GP model → ``session.embedding.esm2.fit_gp()``
157+
3. Predict on new sequences → ``predictor.predict()``
158+
4. Design optimized variants → ``session.design.genetic_algorithm()``
159+
160+
**Workflow 3: Structure prediction**
161+
162+
1. For single chains: ``session.fold.esmfold.fold()``
163+
2. For complexes: Create MSA → Build ``Protein`` objects → ``session.fold.alphafold2.fold()``
164+
165+
Next Steps
166+
----------
167+
168+
- **New users**: Start with `Installation <./installation.rst>`_ and `Session Setup <./overview.rst>`_
169+
- **Learn the basics**: Review the `Jobs System <./jobs-system.ipynb>`_ to understand async operations
170+
- **Explore tutorials**: Browse capability-specific tutorials below
171+
- **API reference**: Detailed documentation for all classes and methods
53172

54173
.. toctree::
55174
:maxdepth: 2
@@ -58,9 +177,9 @@ After `installing <./installation.rst>`_ the Python client and `setting up your
58177
installation
59178
overview
60179
Jobs System <jobs-system.ipynb>
61-
Property Regression Models <property-regression-models/index>
62-
PoET <poet/index>
63180
Foundation Models <foundation-models/index>
181+
PoET <poet/index>
182+
Property Regression Models <property-regression-models/index>
64183
Structure Prediction <structure-prediction/index>
65184
Structure Generation <structure-generation/index>
66185
API Reference <api-reference/index>
