| 1 | +# openms-python |
| 2 | + |
| 3 | +**CAUTION: this package is under heavy development and is largely LLM generated. Breaking changes are expected and documentation might be out of date.** |
| 4 | + |
| 5 | +**A Pythonic wrapper around pyOpenMS for mass spectrometry data analysis** |
| 6 | + |
| 7 | +`openms-python` provides an intuitive, Python-friendly interface to OpenMS, making mass spectrometry data analysis feel natural for Python developers and data scientists. |
| 8 | + |
| 9 | +[Python](https://www.python.org/downloads/) |
| 10 | +[License: BSD-3-Clause](LICENSE) |
| 11 | + |
| 12 | +## Why openms-python? |
| 13 | + |
| 14 | +[pyOpenMS](https://pyopenms.readthedocs.io/) is a Python binding for the powerful OpenMS C++ library. However, being a direct C++ binding, it doesn't always feel "Pythonic". This package wraps pyOpenMS to provide: |
| 15 | + |
| 16 | +- ✅ **Pythonic properties** instead of verbose getters/setters |
| 17 | +- ✅ **Intuitive iteration** with smart filtering |
| 18 | +- ✅ **pandas DataFrame integration** for data analysis |
| 19 | +- ✅ **Method chaining** for processing pipelines |
| 20 | +- ✅ **Type hints** for better IDE support |
| 21 | +- ✅ **Clean, documented API** with examples |
| 22 | + |
| 23 | +### Before (pyOpenMS) |
| 24 | +```python |
| 25 | +import pyopenms as oms |
| 26 | + |
| 27 | +exp = oms.MSExperiment() |
| 28 | +oms.MzMLFile().load("data.mzML", exp) |
| 29 | + |
| 30 | +n_spectra = exp.getNrSpectra() |
| 31 | +for i in range(n_spectra): |
| 32 | +    spec = exp.getSpectrum(i) |
| 33 | +    if spec.getMSLevel() == 1: |
| 34 | +        rt = spec.getRT() |
| 35 | +        peaks = spec.get_peaks() |
| 36 | +        mz = peaks[0] |
| 37 | +        intensity = peaks[1] |
| 38 | +        print(f"RT: {rt:.2f}s, mz: {mz}, intens: {intensity}") |
| 39 | +``` |
| 40 | + |
| 41 | +### After (openms-python) |
| 42 | +```python |
| 43 | +from openms_python import Py_MSExperiment |
| 44 | + |
| 45 | +exp = Py_MSExperiment.from_file("data.mzML") |
| 46 | + |
| 47 | +print(f"Loaded {len(exp)} spectra") |
| 48 | +for spec in exp.ms1_spectra(): |
| 49 | +    print(f"RT: {spec.retention_time:.2f}s, mz: {spec.mz}, intens: {spec.intensity}") |
| 50 | + |
| 51 | + |
| 52 | +# Or convert the peaks of a spectrum to a pandas DataFrame |
| 53 | +df = spec.to_dataframe()  # peaks of the last MS1 spectrum from the loop above |
| 54 | +``` |
| 55 | + |
| 56 | +### Reading mzML Files |
| 57 | + |
| 58 | +```python |
| 59 | +from openms_python import Py_MSExperiment |
| 60 | + |
| 61 | +# Load experiment |
| 62 | +exp = Py_MSExperiment.from_file('data.mzML') |
| 63 | + |
| 64 | +# Get basic info |
| 65 | +print(f"Total spectra: {len(exp)}") |
| 66 | +print(f"RT range: {exp.rt_range}") |
| 67 | +print(f"MS levels: {exp.ms_levels}") |
| 68 | + |
| 69 | +# Print summary |
| 70 | +exp.print_summary() |
| 71 | +``` |
| 72 | + |
| 73 | +### Working with Spectra |
| 74 | + |
| 75 | +```python |
| 76 | +# Access individual spectra |
| 77 | +spec = exp[0] |
| 78 | + |
| 79 | +# Access multiple spectra with slicing |
| 80 | +first_10 = exp[0:10] # First 10 spectra |
| 81 | +last_5 = exp[-5:] # Last 5 spectra |
| 82 | +every_other = exp[::2] # Every other spectrum |
| 83 | +subset = exp[1:4] # Spectra at indices 1-3 (the 2nd through 4th) |
| 84 | + |
| 85 | +print(f"First spectrum: {spec}") |
| 86 | +print(f"First 10 spectra: {len(first_10)} spectra") |
| 87 | +print(f"Last 5 spectra: {len(last_5)} spectra") |
| 88 | + |
| 89 | +# Pythonic properties |
| 90 | +print(f"Retention time: {spec.retention_time} seconds") |
| 91 | +print(f"MS level: {spec.ms_level}") |
| 92 | +print(f"Number of peaks: {len(spec)}") |
| 93 | +print(f"Total ion current: {spec.total_ion_current}") |
| 94 | + |
| 95 | +# Boolean helpers |
| 96 | +if spec.is_ms1: |
| 97 | +    print("This is an MS1 spectrum") |
| 98 | + |
| 99 | +# Get peaks as NumPy arrays |
| 100 | +mz, intensity = spec.peaks |
| 101 | + |
| 102 | +# Or as a DataFrame |
| 103 | +peaks_df = spec.to_dataframe() |
| 104 | +print(peaks_df.head()) |
| 105 | +``` |
| 106 | + |
| 107 | +### Smart Iteration |
| 108 | + |
| 109 | +```python |
| 110 | +# Iterate over MS1 spectra only |
| 111 | +for spec in exp.ms1_spectra(): |
| 112 | +    print(f"MS1 at RT={spec.retention_time:.2f}s") |
| 113 | + |
| 114 | +# Iterate over MS2 spectra only |
| 115 | +for spec in exp.ms2_spectra(): |
| 116 | +    print(f"MS2: precursor m/z = {spec.precursor_mz:.4f}") |
| 117 | + |
| 118 | +# Filter by retention time |
| 119 | +for spec in exp.rt_filter[100:200]: |
| 120 | +    print(f"Spectrum at RT={spec.retention_time:.2f}s") |
| 121 | +``` |
| 122 | + |
| 123 | +### DataFrame Integration |
| 124 | + |
| 125 | +```python |
| 126 | +# Convert entire experiment to DataFrame |
| 127 | +df = exp.to_dataframe(include_peaks=True) |
| 128 | +print(df.head()) |
| 129 | + |
| 130 | +# Spectrum-level DataFrame |
| 131 | +df_spectra = exp.to_dataframe(include_peaks=False) |
| 132 | + |
| 133 | +# MS2 peaks only |
| 134 | +df_ms2 = exp.to_dataframe(include_peaks=True, ms_level=2) |
| 135 | +``` |
| 136 | + |
| 137 | +### Method Chaining |
| 138 | + |
| 139 | +```python |
| 140 | +# Filter and process in a pipeline |
| 141 | +filtered_exp = (exp |
| 142 | +    .filter_by_ms_level(1) |
| 143 | +    .filter_by_rt(100, 500) |
| 144 | +    .filter_by_mz(400, 500) |
| 145 | +    .filter_top_n_peaks(100)) |
| 146 | + |
| 147 | +# OR |
| 148 | +filtered_exp = (exp |
| 149 | +    .filter_by_ms_level(1) |
| 150 | +    .rt_filter[100:500] |
| 151 | +    .mz_filter[400:500] |
| 152 | +    .filter_top_n_peaks(100)) |
| 153 | + |
| 154 | +print(f"After filtering: {len(filtered_exp)} spectra") |
| 155 | +``` |
| 156 | + |
| 157 | +### Data Manipulation |
| 158 | + |
| 159 | +```python |
| 160 | +# Filter peaks by m/z |
| 161 | +filtered_spec = spec.filter_by_mz(100, 500) |
| 162 | +# OR |
| 163 | +filtered_spec = spec.mz_filter[100:500] |
| 164 | + |
| 165 | +# Get top N peaks |
| 166 | +top_10 = spec.top_n_peaks(10) |
| 167 | +``` |
| 168 | + |
| 169 | +### Writing Files |
| 170 | + |
| 171 | +```python |
| 172 | +# Save to mzML |
| 173 | +exp.to_mzml('output.mzML') |
| 174 | + |
| 175 | +# Or use convenience function |
| 176 | +from openms_python import write_mzml |
| 177 | +write_mzml(exp, 'output.mzML') |
| 178 | +``` |
| 179 | + |
| 180 | +### Context Managers |
| 181 | + |
| 182 | +```python |
| 183 | +from openms_python.io import MzMLReader, MzMLWriter |
| 184 | + |
| 185 | +# Reading with context manager |
| 186 | +with MzMLReader('data.mzML') as exp: |
| 187 | +    print(f"Loaded {len(exp)} spectra") |
| 188 | +    for spec in exp.ms1_spectra(): |
| 189 | +        print(spec) |
| 190 | + |
| 191 | +# Writing with context manager |
| 192 | +with MzMLWriter('output.mzML') as writer: |
| 193 | +    writer.write(exp) |
| 194 | +``` |
| 195 | + |
| 196 | +## Advanced Examples |
| 197 | + |
| 198 | +### Creating Spectra from Scratch |
| 199 | + |
| 200 | +```python |
| 201 | +import pandas as pd |
| 202 | +from openms_python import Py_MSSpectrum |
| 203 | + |
| 204 | +# From DataFrame |
| 205 | +df = pd.DataFrame({ |
| 206 | +    'mz': [100.0, 200.0, 300.0], |
| 207 | +    'intensity': [50.0, 100.0, 75.0] |
| 208 | +}) |
| 209 | + |
| 210 | +spec = Py_MSSpectrum.from_dataframe( |
| 211 | +    df, |
| 212 | +    retention_time=120.5, |
| 213 | +    ms_level=1, |
| 214 | +    native_id='spectrum=1' |
| 215 | +) |
| 216 | +``` |
| 217 | + |
| 218 | +### Creating Experiments from DataFrames |
| 219 | + |
| 220 | +```python |
| 221 | +# Create experiment from grouped data |
| 222 | +df = pd.DataFrame({ |
| 223 | +    'spectrum_id': [0, 0, 1, 1, 2, 2], |
| 224 | +    'mz': [100, 200, 150, 250, 120, 220], |
| 225 | +    'intensity': [50, 100, 60, 110, 55, 105], |
| 226 | +    'retention_time': [10.0, 10.0, 20.0, 20.0, 30.0, 30.0], |
| 227 | +    'ms_level': [1, 1, 1, 1, 1, 1] |
| 228 | +}) |
| 229 | + |
| 230 | +exp = Py_MSExperiment.from_dataframe(df) |
| 231 | +``` |
| 232 | + |
| 233 | +### Analysis Workflow |
| 234 | + |
| 235 | +```python |
| 236 | +from openms_python import Py_MSExperiment |
| 237 | +import pandas as pd |
| 238 | +import matplotlib.pyplot as plt |
| 239 | + |
| 240 | +# Load data |
| 241 | +exp = Py_MSExperiment.from_file('data.mzML') |
| 242 | + |
| 243 | +# Get MS2 spectra as DataFrame |
| 244 | +df_ms2 = exp.to_dataframe(include_peaks=True, ms_level=2) |
| 245 | + |
| 246 | +# Analyze precursor distribution |
| 247 | +precursor_stats = df_ms2.groupby('precursor_mz').agg({ |
| 248 | +    'intensity': 'sum', |
| 249 | +    'spectrum_index': 'nunique'  # count distinct spectra, not peak rows |
| 250 | +}).rename(columns={'spectrum_index': 'n_spectra'}) |
| 251 | + |
| 252 | +print(precursor_stats.head()) |
| 253 | + |
| 254 | +# Plot TIC over time |
| 255 | +df_spectra = exp.to_dataframe(include_peaks=False) |
| 256 | +plt.figure(figsize=(10, 4)) |
| 257 | +plt.plot(df_spectra['retention_time'], df_spectra['total_ion_current']) |
| 258 | +plt.xlabel('Retention Time (s)') |
| 259 | +plt.ylabel('Total Ion Current') |
| 260 | +plt.title('TIC over time') |
| 261 | +plt.show() |
| 262 | +``` |
| 263 | + |
| 264 | +## API Reference |
| 265 | + |
| 266 | +### Py_MSExperiment |
| 267 | + |
| 268 | +**Properties:** |
| 269 | +- `n_spectra`: Number of spectra |
| 270 | +- `rt_range`: Tuple of (min_rt, max_rt) |
| 271 | +- `ms_levels`: Set of MS levels present |
| 272 | + |
| 273 | +**Methods:** |
| 274 | +- `from_file(filepath)`: Load from mzML file (class method) |
| 275 | +- `from_dataframe(df, group_by)`: Create from DataFrame (class method) |
| 276 | +- `to_file(filepath)`: Save to mzML file |
| 277 | +- `to_dataframe(include_peaks, ms_level)`: Convert to DataFrame |
| 278 | +- `ms1_spectra()`: Iterator over MS1 spectra |
| 279 | +- `ms2_spectra()`: Iterator over MS2 spectra |
| 280 | +- `spectra_by_level(level)`: Iterator over specific MS level |
| 281 | +- `spectra_in_rt_range(min_rt, max_rt)`: Iterator over RT range |
| 282 | +- `filter_by_ms_level(level)`: Filter by MS level |
| 283 | +- `filter_by_rt(min_rt, max_rt)`: Filter by RT range |
| 284 | +- `filter_top_n_peaks(n)`: Keep top N peaks per spectrum |
| 285 | +- `summary()`: Get summary statistics |
| 286 | +- `print_summary()`: Print formatted summary |
| 287 | + |
| 288 | +### Py_MSSpectrum |
| 289 | + |
| 290 | +**Properties:** |
| 291 | +- `retention_time`: RT in seconds |
| 292 | +- `ms_level`: MS level (1, 2, etc.) |
| 293 | +- `is_ms1`, `is_ms2`: Boolean helpers |
| 294 | +- `precursor_mz`: Precursor m/z (MS2+) |
| 295 | +- `precursor_charge`: Precursor charge (MS2+) |
| 296 | +- `native_id`: Native spectrum ID |
| 297 | +- `total_ion_current`: Sum of intensities |
| 298 | +- `base_peak_mz`: m/z of most intense peak |
| 299 | +- `base_peak_intensity`: Intensity of base peak |
| 300 | +- `peaks`: Tuple of (mz_array, intensity_array) |
| 301 | + |
| 302 | +**Methods:** |
| 303 | +- `from_dataframe(df, **metadata)`: Create from DataFrame (class method) |
| 304 | +- `to_dataframe()`: Convert to DataFrame |
| 305 | +- `filter_by_mz(min_mz, max_mz)`: Filter peaks by m/z |
| 306 | +- `filter_by_intensity(min_intensity)`: Filter peaks by intensity |
| 307 | +- `top_n_peaks(n)`: Keep top N peaks |
| 308 | +- `normalize_intensity(max_value)`: Normalize intensities |
| 309 | + |
| 310 | +## Development |
| 311 | + |
| 312 | +### Setup Development Environment |
| 313 | + |
| 314 | +```bash |
| 315 | +git clone https://github.com/openms/openms-python.git |
| 316 | +cd openms-python |
| 317 | +pip install -e ".[dev]" |
| 318 | +``` |
| 319 | + |
| 320 | +## Comparison with pyOpenMS |
| 321 | + |
| 322 | +| Feature | pyOpenMS | openms-python | |
| 323 | +|---------|----------|---------------| |
| 324 | +| Get spectrum count | `exp.getNrSpectra()` | `len(exp)` | |
| 325 | +| Get retention time | `spec.getRT()` | `spec.retention_time` | |
| 326 | +| Check MS1 | `spec.getMSLevel() == 1` | `spec.is_ms1` | |
| 327 | +| Load file | `MzMLFile().load(path, exp)` | `exp = Py_MSExperiment.from_file(path)` | |
| 328 | +| Iterate MS1 | Manual loop + level check | `for spec in exp.ms1_spectra():` | |
| 329 | +| Peak data | `peaks = spec.get_peaks(); mz = peaks[0]` | `mz, intensity = spec.peaks` | |
| 330 | +| DataFrame | `exp.get_df()` (recent releases only) | `df = exp.to_dataframe()` | |
| 331 | + |
| 332 | +## Contributing |
| 333 | + |
| 334 | +Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change. |
| 335 | + |
| 336 | +## License |
| 337 | + |
| 338 | +This project is licensed under the BSD-3-Clause License - see the [LICENSE](LICENSE) file for details. |
| 339 | + |
| 340 | +## Acknowledgments |
| 341 | + |
| 342 | +- Built on top of the excellent [pyOpenMS](https://pyopenms.readthedocs.io/) library |
| 343 | +- Part of the [OpenMS](https://www.openms.de/) ecosystem |
| 344 | + |
| 345 | +## Citation |
| 346 | + |
| 347 | +If you use openms-python in your research, please cite: |
| 348 | + |
| 349 | +``` |
| 350 | +Röst HL, Sachsenberg T, Aiche S, et al. OpenMS: a flexible open-source software platform |
| 351 | +for mass spectrometry data analysis. Nat Methods. 2016;13(9):741-748. |
| 352 | +``` |
| 353 | + |
| 354 | +## Support |
| 355 | + |
| 356 | +- **Documentation**: [https://openms-python.readthedocs.io](https://openms-python.readthedocs.io) |
| 357 | +- **Issues**: [GitHub Issues](https://github.com/openms/openms-python/issues) |
| 358 | +- **Discussions**: [GitHub Discussions](https://github.com/openms/openms-python/discussions) |
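
The README uses slice-style filters (`exp.rt_filter[100:200]`, `spec.mz_filter[100:500]`) without saying how they work. A minimal sketch of one way such an accessor could be built is shown here. It relies only on Python's `__getitem__` receiving a `slice` object. `RangeFilter`, `ToySpectrum`, and `ToyExperiment` are hypothetical stand-ins written for illustration, not classes from openms-python, and the inclusive bounds are an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class RangeFilter(Generic[T]):
    """Hypothetical helper: turns obj.accessor[lo:hi] into a range-filter call."""

    def __init__(self, apply: Callable[[float, float], T]) -> None:
        self._apply = apply

    def __getitem__(self, key: slice) -> T:
        if not isinstance(key, slice):
            raise TypeError("use slice syntax, e.g. rt_filter[100:200]")
        if key.step is not None:
            raise ValueError("a step is not meaningful for a range filter")
        # Open-ended slices such as [100:] or [:200] map to unbounded limits.
        lo = float("-inf") if key.start is None else float(key.start)
        hi = float("inf") if key.stop is None else float(key.stop)
        return self._apply(lo, hi)


@dataclass
class ToySpectrum:
    """Stand-in for a spectrum; only the retention time matters here."""
    retention_time: float


class ToyExperiment:
    """Stand-in for an experiment: an ordered collection of spectra."""

    def __init__(self, spectra) -> None:
        self.spectra = list(spectra)

    def __len__(self) -> int:
        return len(self.spectra)

    def __iter__(self):
        return iter(self.spectra)

    def filter_by_rt(self, min_rt: float, max_rt: float) -> "ToyExperiment":
        # Assumption: both bounds are inclusive.
        kept = [s for s in self.spectra if min_rt <= s.retention_time <= max_rt]
        return ToyExperiment(kept)

    @property
    def rt_filter(self) -> RangeFilter:
        # A fresh accessor bound to this instance, so calls can be chained.
        return RangeFilter(self.filter_by_rt)


exp = ToyExperiment(ToySpectrum(rt) for rt in (50.0, 120.0, 180.0, 400.0))
window = exp.rt_filter[100:200]
print([s.retention_time for s in window])  # [120.0, 180.0]
```

Because the accessor returns a new experiment-like object, the slice form and the method form can be mixed in a single chain, which is the behavior the Method Chaining section relies on.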