
Commit 910377e

2 parents: 34082e2 + c2696ae


50 files changed: +72238, -2765 lines

.github/workflows/pin_requirements.yml

Lines changed: 4 additions & 2 deletions
```diff
@@ -10,7 +10,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest, macOS-latest]
-        python-version: ["3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.11", "3.12", "3.13", "3.14"]
     steps:
       - name: Checkout code
         uses: actions/checkout@v3
@@ -29,11 +29,13 @@ jobs:
         with:
           name: req-artifact-${{ matrix.os }}-${{ matrix.python-version }}
           path: requirements-${{ matrix.os }}-${{ matrix.python-version }}.txt
+
+
   merge:
     runs-on: ubuntu-latest
     needs: generate-requirements
     steps:
-      - name: Merge Artifacts
+      - name: Merge Requirements Artifacts
        uses: actions/upload-artifact/merge@v4
        with:
          name: all-requirements
```

.github/workflows/pytest.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.11", "3.12", "3.13", "3.14"]
     runs-on: ubuntu-latest
     steps:
       - name: Check out repository
```

.gitignore

Lines changed: 13 additions & 1 deletion
```diff
@@ -126,4 +126,16 @@ __datasets/
 dev/
 
 # unpublished datasets
-datasets_private/
+datasets_private/
+
+# Node
+node_modules/
+npm-debug.log*
+yarn-debug.log*
+-error.log*
+dist/
+dist-ssr/
+*.local
+
+
+.claude
```

docs/cli.md

Lines changed: 154 additions & 0 deletions
# Command Line Interface (CLI)

The `hdxms-datasets` package provides a command-line interface to help you create and manage HDX-MS datasets.

## Installation

First, install the package with the CLI dependencies:

```bash
pip install -e .
```

After installation, the `hdxms-datasets` command will be available in your terminal.

## Commands

### `hdxms-datasets create`

Create a new HDX-MS dataset with a unique ID and template script.

**Basic usage:**

```bash
hdxms-datasets create
```

This will:

1. Generate a unique HDX dataset ID (e.g., `HDX_A1B2C3D4`)
2. Create a new directory in the current directory: `<HDX_ID>/`
3. Generate a template `create_dataset.py` script with configuration
4. Create a `data/` subdirectory for your raw data files
5. Generate a `README.md` with quick start instructions

**Options:**

- `--num-states, -n INTEGER`: Number of protein states (default: 1)
- `--format, -f CHOICE`: Data format - OpenHDX, DynamX_v3_state, DynamX_v3_cluster, HDExaminer (default: OpenHDX)
- `--ph FLOAT`: Experimental pH (default: 7.5)
- `--temperature, -t FLOAT`: Temperature in Kelvin (default: 293.15)
- `--database-dir, -d PATH`: Path to existing database directory to check for ID conflicts
- `--help`: Show help message

**Examples:**

```bash
# Create with defaults (OpenHDX, 1 state, pH 7.5, 20°C)
hdxms-datasets create

# Create with custom parameters
hdxms-datasets create --num-states 2 --format DynamX_v3_state --ph 8.0 --temperature 298.15

# Using short flags
hdxms-datasets create -n 3 -f HDExaminer --ph 7.0 -t 293.15

# Check for ID conflicts with existing database
hdxms-datasets create --database-dir ~/hdx-database/datasets
```

## Configuration via Arguments

All dataset configuration is specified via command-line arguments:

- **Number of states** (`--num-states`): How many different protein states you measured (default: 1)
- **Data format** (`--format`): Which software generated your data (default: OpenHDX)
  - `OpenHDX` - OpenHDX format
  - `DynamX_v3_state` - DynamX state files
  - `DynamX_v3_cluster` - DynamX cluster files
  - `HDExaminer` - HDExaminer files
- **pH** (`--ph`): Experimental pH value (default: 7.5)
- **Temperature** (`--temperature`): Temperature in Kelvin (default: 293.15 K = 20°C)

## Workflow Example

```bash
# Step 1: Create a new dataset with custom parameters
$ hdxms-datasets create --num-states 2 --format DynamX_v3_state --ph 8.0

✓ Generated new dataset ID: HDX_A1B2C3D4
============================================================
✓ Dataset template created successfully!
============================================================

Dataset ID: HDX_A1B2C3D4
Location: C:\Users\username\HDX_A1B2C3D4
Format: DynamX_v3_state
States: 2
pH: 8.0
Temperature: 293.15 K (20.0°C)

Next steps:
1. cd HDX_A1B2C3D4
2. Place your data files in the data/ directory
3. Edit create_dataset.py with your specific information
4. python create_dataset.py

# Step 2: Navigate to the new directory
$ cd HDX_A1B2C3D4

# Step 3: Copy your data files
$ copy C:\path\to\my\data.csv data\

# Step 4: Edit the template script
$ notepad create_dataset.py
# Edit the file with your specific information:
# - Replace protein sequences
# - Update data file names
# - Add author information
# - Add publication details

# Step 5: Run the script to create your dataset
$ python create_dataset.py
✓ Dataset submitted successfully with ID: HDX_A1B2C3D4
Dataset location: C:\Users\username\HDX_A1B2C3D4\dataset\HDX_A1B2C3D4
```

## Generated Template Structure

After running `hdxms-datasets create`, you'll have:

```
HDX_A1B2C3D4/
├── create_dataset.py   # Template script to edit
├── README.md           # Quick start guide
└── data/               # Directory for your raw data files
```

The `create_dataset.py` template includes:

- Clearly marked sections to edit
- Inline comments explaining each field
- List-based structure for protein states and peptides (flexible and easy to extend)
- Pre-configured pH and temperature values from your command-line arguments
- Example values to guide you
- Automatic sequence verification
- Dataset submission code

Please note that this template is not exhaustive and other metadata fields may be used depending on your dataset's requirements.

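To make concrete what the template asks you to fill in, the sketch below mirrors the metadata objects used in the `examples/from_custom_xlsx_file.py` change later in this commit. It is illustrative only: the `hdxms_datasets` import path and the exact constructor fields are assumptions based on that example, not the literal contents of the generated template.

```python
# Illustrative sketch, not the generated template itself.
# Import path and field names are assumed from examples/from_custom_xlsx_file.py.
from hdxms_datasets import Author, DatasetMetadata, Publication

pub = Publication(
    title="Title of the paper describing the data",
    doi="10.xxxx/placeholder",  # placeholder DOI
    url="https://doi.org/10.xxxx/placeholder",
)

# The preferred / default license is CC0; if you are the author you can
# choose any license you like.
metadata = DatasetMetadata(  # type: ignore[call-arg]
    authors=[Author(name="Your Name", affiliation="Your Institute")],
    publication=pub,
    license="CC0",
    conversion_notes="Describe how the raw data was converted",
)
```

The filled-in `metadata` is then passed to the `HDXDataSet` constructor together with your states, as shown in the example scripts further down this commit.
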
## Future Commands (Planned)

The CLI is designed to be extensible. Future commands may include:

- `hdxms-datasets validate`: Validate a dataset before submission
- `hdxms-datasets upload`: Upload a dataset to a remote database
- `hdxms-datasets export`: Export a dataset to different formats

## Getting Help

For more information about any command:

```bash
hdxms-datasets --help
hdxms-datasets create --help
```

docs/fields.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -89,7 +89,13 @@ Standard deviation of the uptake value
 ## Calculated fields:
 These fields are derived from other fields defined in the above sections.
 
+### n_replicates
+added after data aggregation
+Total number of replicates that were aggregated together
 
+### n_clusters
+added after data aggregation
+Total number of isotopic clusters that were aggregated together. When replicates include multiple isotopic clusters (different charged states), this value will be larger than n_replicates.
 
 ### frac_fd_control (float)
 Fractional deuterium uptake with respect to fully deuterated control sample
```
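To make the relationship between these two counts concrete, the following toy example (illustrative only, not the package's aggregation code) uses polars, which the example scripts in this commit already rely on via `.to_polars()`. It assumes one input row per isotopic cluster, i.e. one row per replicate and charge state:

```python
import polars as pl

# One row per isotopic cluster: two replicates, each observed at two charge states.
clusters = pl.DataFrame(
    {
        "peptide": ["AEGDF"] * 4,
        "replicate": [1, 1, 2, 2],
        "charge": [1, 2, 1, 2],
        "uptake": [1.21, 1.25, 1.18, 1.22],
    }
)

aggregated = clusters.group_by("peptide").agg(
    pl.col("replicate").n_unique().alias("n_replicates"),  # -> 2
    pl.len().alias("n_clusters"),                          # -> 4
    pl.col("uptake").mean().alias("uptake"),
)
print(aggregated)
```

Here `n_clusters` (4) exceeds `n_replicates` (2) exactly because each replicate contributes one cluster per charge state.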

examples/from_custom_xlsx_file.py

Lines changed: 16 additions & 18 deletions
```diff
@@ -138,23 +138,6 @@
 
 # %%
 
-pub = Publication(
-    title="Simple and Fast Maximally Deuterated Control (maxD) Preparation for Hydrogen-Deuterium Exchange Mass Spectrometry Experiments",
-    doi="10.1021/acs.analchem.2c01446",
-    url="https://pubs.acs.org/doi/10.1021/acs.analchem.2c01446",
-)
-
-# %%
-# Make sure to add the correct licsense for your dataset
-# If you are the author, you can choose any license you like
-# The preferred / default license is CC0
-metadata = DatasetMetadata(  # type: ignore[call-arg]
-    authors=[Author(name="Daniele Peterle", affiliation="Northeastern University")],
-    publication=pub,
-    license="CC BY-NC 4.0",
-    conversion_notes="Converted published Supplementary data",
-)
-
 protein_info = ProteinIdentifiers(
     uniprot_accession_number="P68082",
     uniprot_entry_name="MYG_HORSE",
@@ -238,10 +221,25 @@
 
 # %%
 
+pub = Publication(
+    title="Simple and Fast Maximally Deuterated Control (maxD) Preparation for Hydrogen-Deuterium Exchange Mass Spectrometry Experiments",
+    doi="10.1021/acs.analchem.2c01446",
+    url="https://pubs.acs.org/doi/10.1021/acs.analchem.2c01446",
+)
+
+# Make sure to add the correct licsense for your dataset
+# If you are the author, you can choose any license you like
+# The preferred / default license is CC0
+
 dataset = HDXDataSet(  # type: ignore[call-arg]
     states=[state],
     description="1 Mb dataset from Peterle et al. 2022",
-    metadata=metadata,
+    metadata=DatasetMetadata(  # type: ignore[call-arg]
+        authors=[Author(name="Daniele Peterle", affiliation="Northeastern University")],
+        publication=pub,
+        license="CC BY-NC 4.0",
+        conversion_notes="Converted published Supplementary data",
+    ),
     protein_identifiers=protein_info,
     structure=structure,
 )
```

examples/from_hxms_file.py

Lines changed: 1 addition & 10 deletions
```diff
@@ -57,9 +57,6 @@
 # define the states, one state per file
 hxms_files = list(data_dir.glob("*.hxms"))
 hxms_files
-# %%
-hxms_file = hxms_files[2]
-hxms_file.stem.split("_")[-1]
 
 
 # %%
@@ -80,15 +77,11 @@ def get_ligand(fpath: Path) -> Optional[str]:
     return tag
 
 
-get_ligand(hxms_file)
-
 # %%
 
 # structure mapping: chain A, residue offset -15 to match sequence numbering
 mapping = StructureMapping(chain=["A"], residue_offset=-15)
 
-# %%
-
 
 # create a helper function to create a open-hdxms state object from the hdxms file
 def make_state(hxms_file: Path) -> State:
@@ -214,10 +207,8 @@ def make_state(hxms_file: Path) -> State:
 view
 
 # %%
-
-merged = merge_peptides(dataset.states[1].peptides)
-
 # compute uptake metrics (uptake, fractional deuterium), view result in peptide plot
+merged = merge_peptides(dataset.states[1].peptides)
 processed = compute_uptake_metrics(merged).to_polars()
 
 df_exposure = slice_exposure(processed)[5]
```