copy/paste docs index to readme

Jhsmit · Jhsmit · commit b6747e1cd7a1 · 2025-07-31T15:29:47.000+02:00
diff --git a/README.md b/README.md
@@ -1,53 +1,115 @@
 # HDXMS Datasets
 
+Welcome to the HDXMS datasets repository. 
 
-* Free software: MIT license
+The `hdxms-datasets` package provides tools for handling HDX-MS datasets.
 
-## Installation
+The package offers the following features:
 
-```bash
-$ pip install hdxms-datasets
-```
+ - Defining datasets and their experimental metadata
+ - Verification of datasets and metadata
+ - Loading datasets from a local or remote (WIP) database
+ - Conversion of datasets from various formats (e.g., DynamX, HDExaminer) to a standardized format
+ - Propagation of standard deviations from replicates to fractional relative uptake values
 
-## HDX-MS database
 
-Currently a beta test database is set up at:
-https://github.com/Jhsmit/HDX-MS-datasets
+## Example Usage
 
-## Using HDX-MS datasets
+```python {title="Loading a dataset"}
 
-### Example code
+from hdxms_datasets import DataBase
 
+db = DataBase('path/to/local_db')
+dataset = db.get_dataset('HDX_D9096080')
 
-```python
-from pathlib import Path
-from hdxms_datasets import DataVault
+# Protein identifier information
+print(dataset.protein_identifiers.uniprot_entry_name)
+#> 'SECB_ECOLI'
 
-# local path the download datasets to
-cache_dir = Path('.cache')
+# Access HDX states 
+print([state.name for state in dataset.states])
+#> ['Tetramer', 'Dimer']
 
-# create a vault with local cache dir, set `remote_url` to connect to a different database
-vault = DataVault(cache_dir=cache_dir)
+# Get the sequence of the first state
+state = dataset.states[0]
+print(state.protein_state.sequence)
+#> 'MSEQNNTEMTFQIQRIYT...'
 
-# Download a specific HDX dataset
-vault.fetch_dataset("20221007_1530_SecA_Krishnamurthy")
+# Load peptides
+peptides = state.peptides[0]
 
-# Load the dataset
-ds = vault.load_dataset("20221007_1530_SecA_Krishnamurthy")
+# Access peptide information
+print(peptides.deuteration_type, peptides.pH, peptides.temperature)
+#> DeuterationType.partially_deuterated 8.0 303.15
 
-# Load the FD control of the first 'state' in the dataset.
-fd_control = ds.load_peptides(0, "FD_control")
+# Load the peptide table as a standardized narwhals DataFrame
+df = peptides.load(
+    convert=True,  # convert column header names to the open HDX standard
+    aggregate=True,  # aggregate centroids / uptake values across replicates
+)
 
-# Load the corresponding experimental peptides.
-peptides = ds.load_peptides(0, "experiment")
+print(df.columns)
+#> ['start', 'end', 'sequence', 'state', 'exposure', 'centroid_mz', 'rt', 'rt_sd', 'uptake', ... 
 
 ```
 
-## Web infterface
+```python {title="Define a set of peptides for a state"}
+from hdxms_datasets import ProteinState, Peptides, verify_sequence, merge_peptides, compute_uptake_metrics
+
+# Define the protein state
+protein_state = ProteinState(
+    sequence="MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDWQPEVKLDLDTASSQLADDVYEVVLRVTVTASLGEETAFLCEVQQGGIFSIAGIEGTQMAHCLGAYCPNILFPYARECITSMVSRGTFPQLNLAPVNFDALFMNYLQQQAGEGTEEHQDA",
+    n_term=1,
+    c_term=155,
+    oligomeric_state=4,
+)
+
+# Define the partially deuterated peptides for the SecB state (assumes `data_dir`, `PeptideFormat` and `DeuterationType` are defined or imported beforehand)
+pd_peptides = Peptides(
+    data_file=data_dir / "ecSecB_apo.csv",
+    data_format=PeptideFormat.DynamX_v3_state,
+    deuteration_type=DeuterationType.partially_deuterated,
+    filters={
+        "State": "SecB WT apo",
+        "Exposure": [0.167, 0.5, 1.0, 10.0, 100.000008],
+    },
+    pH=8.0,
+    temperature=303.15,
+    d_percentage=90.0,
+)
+
+# Check for differences between the protein state sequence and the peptide sequences
+mismatches = verify_sequence(pd_peptides.load(), protein_state.sequence, n_term=protein_state.n_term)
+print(mismatches)
+#> [] # sequences match
+
+# Define the fully deuterated peptides for the SecB state
+fd_peptides = Peptides(
+    data_file=data_dir / "ecSecB_apo.csv",
+    data_format=PeptideFormat.DynamX_v3_state,
+    deuteration_type=DeuterationType.fully_deuterated,
+    filters={
+        "State": "Full deuteration control",
+        "Exposure": 0.167,
+    },
+)
+
+# Merge both sets of peptides into a single dataframe
+merged = merge_peptides([pd_peptides, fd_peptides])
+print(merged.columns)
+#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd']
+
+# Compute uptake metrics for the merged peptides.
+# This function computes uptake from centroid mass if it is not already present,
+# and also computes fractional uptake.
+processed = compute_uptake_metrics(merged)
+print(processed.columns)
+#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd', 'fractional_uptake', 'fractional_uptake_sd']
 
-To run the web interface:
-(requires a local clone of the code)
+```
+
+## Installation
 
 ```bash
-solara run hdxms_datasets/web/upload_form.py --production
-```
+$ pip install hdxms-datasets
+```
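
Note on the "propagation of standard deviations" bullet in the new README: the diff does not show how the package does this. As a rough illustration only, the sketch below applies first-order (Gaussian) error propagation to the ratio `uptake / fd_uptake`, assuming the two measurements are uncorrelated. The function name `fractional_uptake_with_sd` and its arguments are hypothetical and not part of `hdxms-datasets`; `compute_uptake_metrics` may well compute these values differently.

```python
import math


def fractional_uptake_with_sd(uptake, uptake_sd, fd_uptake, fd_uptake_sd):
    """Return (fractional_uptake, fractional_uptake_sd).

    Hypothetical helper, not the hdxms-datasets API. Uses first-order error
    propagation for f = uptake / fd_uptake with uncorrelated inputs:
        sd_f^2 = (uptake_sd / fd_uptake)^2
               + (uptake * fd_uptake_sd / fd_uptake^2)^2
    """
    if fd_uptake == 0:
        raise ValueError("fd_uptake must be non-zero")

    fractional = uptake / fd_uptake
    # Contribution of the partially deuterated measurement
    term_pd = uptake_sd / fd_uptake
    # Contribution of the fully deuterated control
    term_fd = uptake * fd_uptake_sd / fd_uptake**2
    return fractional, math.sqrt(term_pd**2 + term_fd**2)


f, sd = fractional_uptake_with_sd(2.0, 0.2, 4.0, 0.4)
print(f"{f:.3f} +/- {sd:.3f}")
#> 0.500 +/- 0.071
```

The absolute form of the formula is used instead of the relative-error form so that a peptide with zero uptake does not cause a division by zero.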