Commit ab0eed3

committed
initial commit
initial base package - largely LLM generated
0 parents  commit ab0eed3

5 files changed

+1490
-0
lines changed

README.md

Lines changed: 358 additions & 0 deletions
@@ -0,0 +1,358 @@
1+
# openms-python
2+
3+
**CAUTION: this package is under heavy development and is largely LLM generated. Breaking changes are expected and documentation might be out of date.**
4+
5+
**A Pythonic wrapper around pyOpenMS for mass spectrometry data analysis**
6+
7+
`openms-python` provides an intuitive, Python-friendly interface to OpenMS, making mass spectrometry data analysis feel natural for Python developers and data scientists.
8+
9+
[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org/downloads/)
10+
[![License](https://img.shields.io/badge/license-BSD--3-green)](LICENSE)
11+
12+
## Why openms-python?
13+
14+
[pyOpenMS](https://pyopenms.readthedocs.io/) is a Python binding for the powerful OpenMS C++ library. However, being a direct C++ binding, it doesn't always feel "Pythonic". This package wraps pyOpenMS to provide:
15+
16+
- **Pythonic properties** instead of verbose getters/setters
- **Intuitive iteration** with smart filtering
- **pandas DataFrame integration** for data analysis
- **Method chaining** for processing pipelines
- **Type hints** for better IDE support
- **Clean, documented API** with examples
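
To illustrate the first point, here is a minimal sketch of how getter/setter calls can be exposed as properties. `RawSpectrum` and `SpectrumWrapper` are hypothetical stand-ins written for this example; they are not the pyOpenMS classes or this package's actual implementation.

```python
class RawSpectrum:
    """Stand-in for a getter/setter style object (hypothetical, not pyOpenMS)."""

    def __init__(self, rt=0.0, ms_level=1):
        self._rt = rt
        self._ms_level = ms_level

    def getRT(self):
        return self._rt

    def setRT(self, rt):
        self._rt = rt

    def getMSLevel(self):
        return self._ms_level


class SpectrumWrapper:
    """Hypothetical wrapper that exposes the getters/setters as properties."""

    def __init__(self, native):
        self._native = native

    @property
    def retention_time(self):
        return self._native.getRT()

    @retention_time.setter
    def retention_time(self, value):
        # Delegate to the wrapped object's setter.
        self._native.setRT(float(value))

    @property
    def ms_level(self):
        return self._native.getMSLevel()

    @property
    def is_ms1(self):
        return self.ms_level == 1


spec = SpectrumWrapper(RawSpectrum(rt=12.5, ms_level=1))
spec.retention_time = 30
print(spec.retention_time, spec.is_ms1)
```

The wrapper holds a reference to the wrapped object rather than copying it, so reads and writes always go through the original getters and setters.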
22+
23+
### Before (pyOpenMS)
24+
```python
import pyopenms as oms

exp = oms.MSExperiment()
oms.MzMLFile().load("data.mzML", exp)

n_spectra = exp.getNrSpectra()
for i in range(n_spectra):
    spec = exp.getSpectrum(i)
    if spec.getMSLevel() == 1:
        rt = spec.getRT()
        peaks = spec.get_peaks()
        mz = peaks[0]
        intensity = peaks[1]
        print(f"RT: {rt:.2f}s, mz: {mz}, intens: {intensity}")
```
40+
41+
### After (openms-python)
42+
```python
from openms_python import Py_MSExperiment

exp = Py_MSExperiment.from_file("data.mzML")

print(f"Loaded {len(exp)} spectra")
for spec in exp.ms1_spectra():
    print(f"RT: {spec.retention_time:.2f}s, mz: {spec.mz}, intens: {spec.intensity}")


# Or convert a spectrum's peaks to a pandas DataFrame
df = spec.to_dataframe()  # Get peaks as DataFrame
```
55+
56+
### Reading mzML Files
57+
58+
```python
59+
from openms_python import Py_MSExperiment
60+
61+
# Load experiment
62+
exp = Py_MSExperiment.from_file('data.mzML')
63+
64+
# Get basic info
65+
print(f"Total spectra: {len(exp)}")
66+
print(f"RT range: {exp.rt_range}")
67+
print(f"MS levels: {exp.ms_levels}")
68+
69+
# Print summary
70+
exp.print_summary()
71+
```
72+
73+
### Working with Spectra
74+
75+
```python
# Access individual spectra
spec = exp[0]

# Access multiple spectra with slicing
first_10 = exp[0:10]     # First 10 spectra
last_5 = exp[-5:]        # Last 5 spectra
every_other = exp[::2]   # Every other spectrum
subset = exp[1:4]        # Indices 1-3, i.e. the 2nd through 4th spectra

print(f"First spectrum: {spec}")
print(f"First 10 spectra: {len(first_10)} spectra")
print(f"Last 5 spectra: {len(last_5)} spectra")

# Pythonic properties
print(f"Retention time: {spec.retention_time} seconds")
print(f"MS level: {spec.ms_level}")
print(f"Number of peaks: {len(spec)}")
print(f"Total ion current: {spec.total_ion_current}")

# Boolean helpers
if spec.is_ms1:
    print("This is an MS1 spectrum")

# Get peaks as NumPy arrays
mz, intensity = spec.peaks

# Or as a DataFrame
peaks_df = spec.to_dataframe()
print(peaks_df.head())
```
106+
107+
### Smart Iteration
108+
109+
```python
# Iterate over MS1 spectra only
for spec in exp.ms1_spectra():
    print(f"MS1 at RT={spec.retention_time:.2f}s")

# Iterate over MS2 spectra only
for spec in exp.ms2_spectra():
    print(f"MS2: precursor m/z = {spec.precursor_mz:.4f}")

# Filter by retention time
for spec in exp.rt_filter[100:200]:
    print(f"Spectrum at RT={spec.retention_time:.2f}s")
```
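
The slice syntax used by `rt_filter[100:200]` can be supported with `__getitem__`. The sketch below is one possible way to do it, not the package's actual code: `RangeFilter` is a hypothetical helper, plain retention times stand in for spectra, and treating both bounds as inclusive is an assumption.

```python
class RangeFilter:
    """Hypothetical helper: filter[lo:hi] keeps items whose key lies in [lo, hi]."""

    def __init__(self, items, key):
        self._items = items
        self._key = key

    def __getitem__(self, bounds):
        if not isinstance(bounds, slice) or bounds.step is not None:
            raise TypeError("use filter[min:max] without a step")
        # Open-ended slices such as [100:] or [:200] leave one side unbounded.
        lo = float("-inf") if bounds.start is None else bounds.start
        hi = float("inf") if bounds.stop is None else bounds.stop
        return [item for item in self._items if lo <= self._key(item) <= hi]


# Retention times in seconds stand in for full spectra here.
retention_times = [50.0, 120.0, 180.0, 250.0]
rt_filter = RangeFilter(retention_times, key=lambda rt: rt)

selected = rt_filter[100:200]
print(selected)
```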
122+
123+
### DataFrame Integration
124+
125+
```python
126+
# Convert entire experiment to DataFrame
127+
df = exp.to_dataframe(include_peaks=True)
128+
print(df.head())
129+
130+
# Spectrum-level DataFrame
131+
df_spectra = exp.to_dataframe(include_peaks=False)
132+
133+
# MS2 peaks only
134+
df_ms2 = exp.to_dataframe(include_peaks=True, ms_level=2)
135+
```
136+
137+
### Method Chaining
138+
139+
```python
# Filter and process in a pipeline
filtered_exp = (exp
    .filter_by_ms_level(1)
    .filter_by_rt(100, 500)
    .filter_by_mz(400, 500)
    .filter_top_n_peaks(100))

# Or, using the slice-style filters
filtered_exp = (exp
    .filter_by_ms_level(1)
    .rt_filter[100:500]
    .mz_filter[400:500]
    .filter_top_n_peaks(100))

print(f"After filtering: {len(filtered_exp)} spectra")
```
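
Chaining works when every filter returns a new object instead of modifying the one it was called on. The following is a minimal sketch of that pattern. `PeakList` is a hypothetical container invented for illustration, and whether the real filters copy or mutate is not stated here.

```python
class PeakList:
    """Hypothetical immutable container of (mz, intensity) pairs."""

    def __init__(self, peaks):
        self._peaks = tuple(peaks)

    def filter_by_mz(self, min_mz, max_mz):
        # Return a new PeakList so calls can be chained.
        return PeakList(p for p in self._peaks if min_mz <= p[0] <= max_mz)

    def top_n_peaks(self, n):
        strongest = sorted(self._peaks, key=lambda p: p[1], reverse=True)[:n]
        return PeakList(sorted(strongest))  # restore m/z order

    def __len__(self):
        return len(self._peaks)

    def to_list(self):
        return list(self._peaks)


peaks = PeakList([(100.0, 5.0), (410.0, 80.0), (450.0, 20.0), (480.0, 60.0), (600.0, 90.0)])
result = peaks.filter_by_mz(400, 500).top_n_peaks(2)
print(result.to_list())
```

Because each step returns a fresh object, `peaks` still holds all five entries after the chain has run.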
156+
157+
### Data Manipulation
158+
159+
```python
160+
# Filter peaks by m/z
161+
filtered_spec = spec.filter_by_mz(100, 500)
162+
# OR
163+
filtered_spec = spec.mz_filter[100:500]
164+
165+
# Get top N peaks
166+
top_10 = spec.top_n_peaks(10)
167+
```
168+
169+
### Writing Files
170+
171+
```python
172+
# Save to mzML
173+
exp.to_file('output.mzML')
174+
175+
# Or use convenience function
176+
from openms_python import write_mzml
177+
write_mzml(exp, 'output.mzML')
178+
```
179+
180+
### Context Managers
181+
182+
```python
from openms_python.io import MzMLReader, MzMLWriter

# Reading with context manager
with MzMLReader('data.mzML') as exp:
    print(f"Loaded {len(exp)} spectra")
    for spec in exp.ms1_spectra():
        print(spec)

# Writing with context manager
with MzMLWriter('output.mzML') as writer:
    writer.write(exp)
```
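
For readers unfamiliar with the protocol behind `with`, here is a small, generic sketch of a writer that implements `__enter__` and `__exit__`. `LineWriter` is a hypothetical class that writes plain text lines. It does not produce mzML and is not how `MzMLWriter` is implemented.

```python
import os
import tempfile


class LineWriter:
    """Hypothetical writer: buffers lines and writes them when the block exits cleanly."""

    def __init__(self, path):
        self.path = path
        self._lines = []

    def __enter__(self):
        return self

    def write(self, line):
        self._lines.append(line)

    def __exit__(self, exc_type, exc, tb):
        # Only touch the file if the with-block finished without an exception.
        if exc_type is None:
            with open(self.path, "w", encoding="utf-8") as handle:
                handle.write("\n".join(self._lines))
        return False  # never swallow exceptions


path = os.path.join(tempfile.mkdtemp(), "output.txt")
with LineWriter(path) as writer:
    writer.write("spectrum=1")
    writer.write("spectrum=2")

with open(path, encoding="utf-8") as handle:
    written = handle.read().splitlines()
print(written)
```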
195+
196+
## Advanced Examples
197+
198+
### Creating Spectra from Scratch
199+
200+
```python
import pandas as pd
from openms_python import Py_MSSpectrum

# From DataFrame
df = pd.DataFrame({
    'mz': [100.0, 200.0, 300.0],
    'intensity': [50.0, 100.0, 75.0]
})

spec = Py_MSSpectrum.from_dataframe(
    df,
    retention_time=120.5,
    ms_level=1,
    native_id='spectrum=1'
)
```
217+
218+
### Creating Experiments from DataFrames
219+
220+
```python
import pandas as pd
from openms_python import Py_MSExperiment

# Create experiment from grouped data
df = pd.DataFrame({
    'spectrum_id': [0, 0, 1, 1, 2, 2],
    'mz': [100, 200, 150, 250, 120, 220],
    'intensity': [50, 100, 60, 110, 55, 105],
    'retention_time': [10.0, 10.0, 20.0, 20.0, 30.0, 30.0],
    'ms_level': [1, 1, 1, 1, 1, 1]
})

exp = Py_MSExperiment.from_dataframe(df)
```
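
Conceptually, building an experiment from a long-format table means grouping the peak rows by a spectrum identifier. The sketch below shows that grouping step with plain dictionaries in place of a DataFrame. The `group_rows` helper and the output layout are assumptions made for illustration, not the package's internals.

```python
from collections import defaultdict

# One row per peak, as in the long-format DataFrame above.
rows = [
    {"spectrum_id": 0, "mz": 100.0, "intensity": 50.0, "retention_time": 10.0},
    {"spectrum_id": 0, "mz": 200.0, "intensity": 100.0, "retention_time": 10.0},
    {"spectrum_id": 1, "mz": 150.0, "intensity": 60.0, "retention_time": 20.0},
    {"spectrum_id": 1, "mz": 250.0, "intensity": 110.0, "retention_time": 20.0},
]


def group_rows(rows, group_by="spectrum_id"):
    """Collect peak rows into one record per spectrum (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_by]].append(row)

    spectra = []
    for key in sorted(grouped):
        members = grouped[key]
        spectra.append({
            "id": key,
            # Spectrum-level metadata is taken from the first row of the group.
            "retention_time": members[0]["retention_time"],
            "mz": [r["mz"] for r in members],
            "intensity": [r["intensity"] for r in members],
        })
    return spectra


spectra = group_rows(rows)
print(len(spectra), spectra[1]["mz"])
```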
232+
233+
### Analysis Workflow
234+
235+
```python
from openms_python import Py_MSExperiment
import pandas as pd
import matplotlib.pyplot as plt

# Load data
exp = Py_MSExperiment.from_file('data.mzML')

# Get MS2 spectra as DataFrame
df_ms2 = exp.to_dataframe(include_peaks=True, ms_level=2)

# Analyze precursor distribution
precursor_stats = df_ms2.groupby('precursor_mz').agg({
    'intensity': 'sum',
    'spectrum_index': 'count'
}).rename(columns={'spectrum_index': 'n_spectra'})

print(precursor_stats.head())

# Plot TIC over time
df_spectra = exp.to_dataframe(include_peaks=False)
plt.figure(figsize=(10, 4))
plt.plot(df_spectra['retention_time'], df_spectra['total_ion_current'])
plt.xlabel('Retention Time (s)')
plt.ylabel('Total Ion Current')
plt.title('TIC over time')
plt.show()
```
263+
264+
## API Reference
265+
266+
### Py_MSExperiment
267+
268+
**Properties:**
269+
- `n_spectra`: Number of spectra
270+
- `rt_range`: Tuple of (min_rt, max_rt)
271+
- `ms_levels`: Set of MS levels present
272+
273+
**Methods:**
274+
- `from_file(filepath)`: Load from mzML file (class method)
275+
- `from_dataframe(df, group_by)`: Create from DataFrame (class method)
276+
- `to_file(filepath)`: Save to mzML file
277+
- `to_dataframe(include_peaks, ms_level)`: Convert to DataFrame
278+
- `ms1_spectra()`: Iterator over MS1 spectra
279+
- `ms2_spectra()`: Iterator over MS2 spectra
280+
- `spectra_by_level(level)`: Iterator over specific MS level
281+
- `spectra_in_rt_range(min_rt, max_rt)`: Iterator over RT range
282+
- `filter_by_ms_level(level)`: Filter by MS level
283+
- `filter_by_rt(min_rt, max_rt)`: Filter by RT range
284+
- `filter_top_n_peaks(n)`: Keep top N peaks per spectrum
285+
- `summary()`: Get summary statistics
286+
- `print_summary()`: Print formatted summary
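
As a rough illustration of what the experiment-level properties describe, the sketch below computes a spectrum count, an RT range, and the set of MS levels from plain dictionaries. The `summarize` function and its return keys are hypothetical. The real `summary()` may report different fields.

```python
# Dictionaries stand in for spectra; only the fields needed here are included.
spectra = [
    {"retention_time": 10.0, "ms_level": 1},
    {"retention_time": 12.5, "ms_level": 2},
    {"retention_time": 20.0, "ms_level": 1},
]


def summarize(spectra):
    """Return count, (min_rt, max_rt), and the set of MS levels (hypothetical helper)."""
    rts = [s["retention_time"] for s in spectra]
    return {
        "n_spectra": len(spectra),
        "rt_range": (min(rts), max(rts)) if rts else None,
        "ms_levels": {s["ms_level"] for s in spectra},
    }


summary = summarize(spectra)
print(summary)
```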
287+
288+
### Py_MSSpectrum
289+
290+
**Properties:**
291+
- `retention_time`: RT in seconds
292+
- `ms_level`: MS level (1, 2, etc.)
293+
- `is_ms1`, `is_ms2`: Boolean helpers
294+
- `precursor_mz`: Precursor m/z (MS2+)
295+
- `precursor_charge`: Precursor charge (MS2+)
296+
- `native_id`: Native spectrum ID
297+
- `total_ion_current`: Sum of intensities
298+
- `base_peak_mz`: m/z of most intense peak
299+
- `base_peak_intensity`: Intensity of base peak
300+
- `peaks`: Tuple of (mz_array, intensity_array)
301+
302+
**Methods:**
303+
- `from_dataframe(df, **metadata)`: Create from DataFrame (class method)
304+
- `to_dataframe()`: Convert to DataFrame
305+
- `filter_by_mz(min_mz, max_mz)`: Filter peaks by m/z
306+
- `filter_by_intensity(min_intensity)`: Filter peaks by intensity
307+
- `top_n_peaks(n)`: Keep top N peaks
308+
- `normalize_intensity(max_value)`: Normalize intensities
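
The derived spectrum values listed above are simple to state on plain lists. This sketch shows the arithmetic behind a total ion current, a base peak, and intensity normalization. The standalone `normalize_intensity` function and its default `max_value` are assumptions for illustration and may differ from the method's actual behavior.

```python
mz = [100.0, 200.0, 300.0]
intensity = [50.0, 100.0, 75.0]

# Total ion current: the sum of all peak intensities.
total_ion_current = sum(intensity)

# Base peak: the most intense peak and its m/z.
base_index = max(range(len(intensity)), key=intensity.__getitem__)
base_peak_mz = mz[base_index]
base_peak_intensity = intensity[base_index]


def normalize_intensity(values, max_value=1.0):
    """Scale intensities so the largest equals max_value (hypothetical helper)."""
    peak = max(values, default=0.0)
    if peak == 0:
        return list(values)  # nothing to scale
    return [v * max_value / peak for v in values]


normalized = normalize_intensity(intensity, max_value=100.0)
print(total_ion_current, base_peak_mz, normalized)
```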
309+
310+
## Development
311+
312+
### Setup Development Environment
313+
314+
```bash
315+
git clone https://github.com/openms/openms-python.git
316+
cd openms-python
317+
pip install -e ".[dev]"
318+
```
319+
320+
## Comparison with pyOpenMS
321+
322+
| Feature | pyOpenMS | openms-python |
323+
|---------|----------|---------------|
324+
| Get spectrum count | `exp.getNrSpectra()` | `len(exp)` |
325+
| Get retention time | `spec.getRT()` | `spec.retention_time` |
326+
| Check MS1 | `spec.getMSLevel() == 1` | `spec.is_ms1` |
327+
| Load file | `MzMLFile().load(path, exp)` | `exp = Py_MSExperiment.from_file(path)` |
328+
| Iterate MS1 | Manual loop + level check | `for spec in exp.ms1_spectra():` |
329+
| Peak data | `peaks = spec.get_peaks(); mz = peaks[0]` | `mz, intensity = spec.peaks` |
330+
| DataFrame | Not available | `df = exp.to_dataframe()` |
331+
332+
## Contributing
333+
334+
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
335+
336+
## License
337+
338+
This project is licensed under the BSD-3-Clause License - see the [LICENSE](LICENSE) file for details.
339+
340+
## Acknowledgments
341+
342+
- Built on top of the excellent [pyOpenMS](https://pyopenms.readthedocs.io/) library
343+
- Part of the [OpenMS](https://www.openms.de/) ecosystem
344+
345+
## Citation
346+
347+
If you use openms-python in your research, please cite:
348+
349+
```
350+
Röst HL, Sachsenberg T, Aiche S, et al. OpenMS: a flexible open-source software platform
351+
for mass spectrometry data analysis. Nat Methods. 2016;13(9):741-748.
352+
```
353+
354+
## Support
355+
356+
- **Documentation**: [https://openms-python.readthedocs.io](https://openms-python.readthedocs.io)
357+
- **Issues**: [GitHub Issues](https://github.com/openms/openms-python/issues)
358+
- **Discussions**: [GitHub Discussions](https://github.com/openms/openms-python/discussions)

openms_python/__init__.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
"""
openms_python: A Pythonic wrapper around pyOpenMS

This package provides a more intuitive, Python-friendly interface to OpenMS
for mass spectrometry data analysis.

Example:
    >>> from openms_python import Py_MSExperiment
    >>> exp = Py_MSExperiment.from_file('data.mzML')
    >>> print(f"Loaded {len(exp)} spectra")
    >>> for spec in exp.ms1_spectra():
    ...     print(f"RT: {spec.retention_time:.2f}, Peaks: {len(spec)}")
"""

__version__ = "0.1.2"
__author__ = "MiniMax Agent"

from .py_msexperiment import Py_MSExperiment
from .py_msspectrum import Py_MSSpectrum
from .io import read_mzml, write_mzml

__all__ = [
    "Py_MSExperiment",
    "Py_MSSpectrum",
    "read_mzml",
    "write_mzml",
]
