Commit 56e28d8

Merge branch 'master' into copilot/add-chromatogram-support
2 parents eee1eb8 + 77f199b commit 56e28d8

File tree

7 files changed (+1164, -87 lines)


.github/copilot-instructions.md

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
# GitHub Copilot Instructions for openms-python

## Repository Overview

`openms-python` is a Pythonic wrapper around pyOpenMS for mass spectrometry data analysis. The goal is to provide an intuitive, Python-friendly interface that makes working with mass spectrometry data feel natural for Python developers and data scientists.

**Key Principle**: Make pyOpenMS more Pythonic by wrapping verbose C++ bindings with intuitive Python APIs.

## Code Style and Conventions

### Python Style
- Follow PEP 8 conventions
- Use Black formatter with 100 character line length (configured in `pyproject.toml`)
- Target Python 3.8+ compatibility
- Use type hints for better IDE support and code clarity
- Prefer clear, descriptive names over abbreviations

### Wrapper Design Patterns

1. **Properties over getters/setters**: Use `@property` decorators instead of verbose get/set methods
```python
# Good
spec.retention_time
# Avoid
spec.getRT()
```

2. **Pythonic iteration**: Support Python's iteration protocols (`__iter__`, `__len__`, `__getitem__`)
```python
for spec in experiment.ms1_spectra():
    print(spec.retention_time)
```

3. **Method chaining**: Return `self` from mutation methods to enable fluent interfaces
```python
exp.filter_by_ms_level(1).filter_by_rt(100, 500)
```

4. **DataFrame integration**: Provide `to_dataframe()` and `from_dataframe()` methods for pandas interoperability
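A minimal sketch of the round trip (column names follow the README; the two-peak values here are illustrative):
```python
import pandas as pd

from openms_python import Py_MSSpectrum

# Build a spectrum from a DataFrame, then convert it back
df = pd.DataFrame({"mz": [100.0, 200.0], "intensity": [10.0, 20.0]})
spec = Py_MSSpectrum.from_dataframe(df, retention_time=60.5)
df_again = spec.to_dataframe()
```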

5. **Context managers**: Support `with` statements for file I/O operations
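One way such support might be implemented (a hypothetical sketch; `Py_MzMLReader` is an illustrative name, not an existing class):
```python
from openms_python import Py_MSExperiment


class Py_MzMLReader:  # hypothetical illustration of the pattern
    def __init__(self, filepath):
        self.filepath = filepath
        self._exp = None

    def __enter__(self):
        self._exp = Py_MSExperiment.from_file(self.filepath)
        return self._exp

    def __exit__(self, exc_type, exc, tb):
        self._exp = None  # drop the reference; do not suppress exceptions
        return False
```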

6. **Mapping interface for metadata**: Classes wrapping `MetaInfoInterface` should support dict-like access
```python
feature["label"] = "sample_a"
```

### Class Naming Convention
- Wrapper classes use the `Py_` prefix (e.g., `Py_MSExperiment`, `Py_FeatureMap`)
- This distinguishes them from pyOpenMS classes while maintaining recognizability

### File Organization
- Core wrapper classes: `py_*.py` files (e.g., `py_msexperiment.py`, `py_featuremap.py`)
- I/O utilities: `io.py` and `_io_utils.py`
- Helper utilities: `_meta_mapping.py` for metadata handling
- Workflow helpers: `workflows.py` for high-level pipelines
- Example data: `examples/` directory contains sample files like `small.mzML`

## Testing Requirements

### Test Structure
- All tests in `tests/` directory
- Test files follow `test_*.py` naming convention
- Use pytest as the testing framework
- Aim for good coverage of wrapper functionality

### Running Tests
```bash
# Install development dependencies
pip install -e ".[dev]"

# Run all tests
pytest -v

# Run with coverage
pytest -v --cov=openms_python --cov-report=term-missing
```

### Test Patterns
- Test basic wrapper functionality (properties, methods)
- Test DataFrame conversions (to/from)
- Test file I/O (load/store operations)
- Test iteration and filtering
- Test method chaining
- Use `conftest.py` for shared fixtures
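A representative test might look like this (a sketch; the fixture name and peak values are illustrative):
```python
import pandas as pd
import pytest

from openms_python import Py_MSSpectrum


@pytest.fixture
def simple_spectrum():
    df = pd.DataFrame({"mz": [100.0, 200.0], "intensity": [10.0, 20.0]})
    return Py_MSSpectrum.from_dataframe(df, retention_time=60.5, ms_level=1)


def test_dataframe_roundtrip(simple_spectrum):
    df = simple_spectrum.to_dataframe()
    assert list(df["mz"]) == [100.0, 200.0]
    assert simple_spectrum.retention_time == pytest.approx(60.5)
```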

## Development Setup

### Installation
```bash
git clone https://github.com/openms/openms-python.git
cd openms-python
pip install -e ".[dev]"
```

### Dependencies
- **Core**: pyopenms (>=3.0.0), pandas (>=1.3.0), numpy (>=1.20.0)
- **Dev**: pytest, pytest-cov, black, flake8, mypy

### Code Formatting
```bash
# Format code with Black
black openms_python tests

# Check style with flake8
flake8 openms_python tests
```

## Key Architecture Patterns

### 1. Wrapper Pattern
Most classes wrap a corresponding pyOpenMS class and delegate to it while providing Pythonic interfaces:
```python
import pyopenms as oms

class Py_MSSpectrum:
    def __init__(self, spec=None):
        self._spec = spec if spec is not None else oms.MSSpectrum()

    @property
    def retention_time(self):
        return self._spec.getRT()
```

### 2. Factory Methods
Use class methods for alternative constructors:
```python
@classmethod
def from_file(cls, filepath):
    # Load from file and return new instance
    ...

@classmethod
def from_dataframe(cls, df):
    # Create from pandas DataFrame
    ...
```

### 3. Smart Filtering
Provide multiple ways to filter data:
- Method-based: `filter_by_rt(min_rt, max_rt)`
- Property-based: `rt_filter[min:max]`
- Iterator-based: `ms1_spectra()`, `ms2_spectra()`
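For example, all three styles can express the same MS1/RT selection (a sketch; `exp` is an assumed `Py_MSExperiment`):
```python
# Method-based, chainable
subset = exp.filter_by_ms_level(1).filter_by_rt(100, 500)

# Property/slice-based
subset = exp.rt_filter[100:500]

# Iterator-based
ms1_in_window = [s for s in exp.ms1_spectra() if 100 <= s.retention_time <= 500]
```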

### 4. Metadata Handling
Classes that wrap `MetaInfoInterface` should implement mapping protocol:
- `__getitem__`, `__setitem__`, `__delitem__`
- `__contains__`, `__iter__`, `__len__`
- `get()`, `pop()`, `update()` methods
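A minimal sketch of such a mixin (the real helper lives in `_meta_mapping.py` and may differ; `self._obj` is an assumed attribute holding the wrapped pyOpenMS object):
```python
class MetaMappingMixin:
    """Dict-like access to a wrapped pyOpenMS MetaInfoInterface object."""

    def __getitem__(self, key):
        if not self._obj.metaValueExists(key):
            raise KeyError(key)
        return self._obj.getMetaValue(key)

    def __setitem__(self, key, value):
        self._obj.setMetaValue(key, value)

    def __delitem__(self, key):
        if not self._obj.metaValueExists(key):
            raise KeyError(key)
        self._obj.removeMetaValue(key)

    def __contains__(self, key):
        return self._obj.metaValueExists(key)

    def __iter__(self):
        keys = []
        self._obj.getKeys(keys)  # pyOpenMS fills the list with byte-string keys
        return (k.decode() if isinstance(k, bytes) else k for k in keys)

    def __len__(self):
        return sum(1 for _ in self)

    def get(self, key, default=None):
        return self[key] if key in self else default
```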

## Common Tasks

### Adding a New Wrapper Class
1. Create a new `py_<classname>.py` file
2. Wrap the corresponding pyOpenMS class
3. Add Pythonic properties for common getters/setters
4. Implement `__len__`, `__iter__`, `__getitem__` if applicable
5. Add `to_dataframe()` and `from_dataframe()` if appropriate
6. Add `load()` and `store()` methods for file I/O
7. Write comprehensive tests in `tests/test_py_<classname>.py`
8. Update `__init__.py` to export the new class
9. Add examples to README.md
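A skeleton for steps 1-3 might look like this (illustrative only; `Py_Precursor` is a hypothetical example class, not part of the package):
```python
import pyopenms as oms


class Py_Precursor:
    """Pythonic wrapper around pyopenms.Precursor (hypothetical example)."""

    def __init__(self, precursor=None):
        self._precursor = precursor if precursor is not None else oms.Precursor()

    @property
    def mz(self):
        return self._precursor.getMZ()

    @mz.setter
    def mz(self, value):
        self._precursor.setMZ(float(value))

    @property
    def charge(self):
        return self._precursor.getCharge()

    @charge.setter
    def charge(self, value):
        self._precursor.setCharge(int(value))
```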

### Adding Helper Functions
- High-level workflow functions go in `workflows.py`
- I/O utilities go in `io.py` or `_io_utils.py`
- Metadata utilities go in `_meta_mapping.py`

### Documentation
- Add docstrings to all public classes and methods
- Include usage examples in docstrings
- Update README.md with new features
- Keep API reference section in README current
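For example, a docstring with a usage example might look like this (the signature is illustrative):
```python
def filter_by_rt(self, min_rt, max_rt):
    """Return a copy containing only spectra with min_rt <= RT <= max_rt.

    Example
    -------
    >>> exp = Py_MSExperiment.from_file("examples/small.mzML")
    >>> subset = exp.filter_by_rt(100, 500)
    """
```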

## Special Considerations

### Memory Management
- Be mindful of memory when working with large datasets
- Provide streaming alternatives for large files (see `stream_mzml`)
- Consider using generators for iteration over large collections
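A generator-based streaming helper might be sketched like this (illustrative; the package's actual `stream_mzml` may differ):
```python
import pyopenms as oms


def iter_spectra(path):
    """Yield spectra one at a time without loading the whole file into memory."""
    ondisc = oms.OnDiscMSExperiment()
    if not ondisc.openFile(path):
        raise IOError(f"Could not open mzML file: {path}")
    for i in range(ondisc.getNrSpectra()):
        yield ondisc.getSpectrum(i)
```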

### pyOpenMS Compatibility
- The package depends on pyOpenMS >= 3.0.0
- When wrapping pyOpenMS classes, preserve all functionality
- Add convenience methods but don't remove or break existing capabilities

### Error Handling
- Provide clear, helpful error messages
- Validate inputs before passing to pyOpenMS
- Handle common edge cases (empty containers, missing files, etc.)
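A small validation sketch (the method follows the `from_dataframe` convention above; the delegation body is elided):
```python
@classmethod
def from_dataframe(cls, df, **metadata):
    missing = {"mz", "intensity"} - set(df.columns)
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {sorted(missing)}")
    ...
```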

### Performance
- Wrapper overhead should be minimal
- Avoid unnecessary data copies
- Use NumPy arrays for peak data when possible
- Consider performance implications of DataFrame conversions

## Examples and Documentation

The README.md contains extensive examples. When adding new features:
1. Add code examples showing the improvement over pyOpenMS
2. Use "Before (pyOpenMS)" vs "After (openms-python)" format
3. Include practical use cases
4. Show integration with pandas/numpy when relevant
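For example, a Before/After pair for loading a file and collecting MS1 retention times (a sketch; the openms-python calls follow the API described above):
```python
# Before (pyOpenMS)
import pyopenms as oms

exp = oms.MSExperiment()
oms.MzMLFile().load("examples/small.mzML", exp)
rts = [s.getRT() for s in exp.getSpectra() if s.getMSLevel() == 1]

# After (openms-python)
from openms_python import Py_MSExperiment

exp = Py_MSExperiment.from_file("examples/small.mzML")
rts = [s.retention_time for s in exp.ms1_spectra()]
```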

## CI/CD

The repository uses GitHub Actions for continuous integration:
- Workflow: `.github/workflows/integration-tests.yml`
- Runs on: Python 3.10 (configurable via matrix)
- Tests run automatically on push to main and on pull requests

## Contributing Guidelines

When contributing:
1. Make minimal, focused changes
2. Maintain backward compatibility unless explicitly breaking
3. Add tests for new functionality
4. Format code with Black
5. Ensure all tests pass
6. Update documentation as needed

## Questions or Issues?

- Check existing documentation in README.md
- Review existing wrapper implementations for patterns
- Look at test files for usage examples
- Open a discussion on GitHub for design questions

README.md

Lines changed: 93 additions & 2 deletions
@@ -366,6 +366,79 @@ normalized_tic = chrom.normalize_to_tic()
chrom["sample_id"] = "Sample_A"
chrom["replicate"] = 1
print(chrom.get("sample_id"))
```

### Ion Mobility Support

`openms-python` provides comprehensive support for ion mobility data through float data arrays and mobilograms.

#### Float Data Arrays

Spectra can have additional data arrays (e.g., ion mobility values) associated with each peak:

```python
from openms_python import Py_MSSpectrum
import pandas as pd
import numpy as np

# Create a spectrum with ion mobility data
df = pd.DataFrame({
    'mz': [100.0, 200.0, 300.0],
    'intensity': [50.0, 100.0, 75.0],
    'ion_mobility': [1.5, 2.3, 3.1]
})

spec = Py_MSSpectrum.from_dataframe(df, retention_time=60.5, ms_level=1)

# Access ion mobility values
print(spec.ion_mobility)  # array([1.5, 2.3, 3.1])

# Set ion mobility values
spec.ion_mobility = np.array([1.6, 2.4, 3.2])

# Convert to DataFrame with float arrays
df = spec.to_dataframe(include_float_arrays=True)
print(df)
#       mz  intensity  ion_mobility
# 0  100.0       50.0           1.6
# 1  200.0      100.0           2.4
# 2  300.0       75.0           3.2
```

#### Mobilograms

Mobilograms represent the ion mobility dimension, showing intensity vs. drift time for a specific m/z.

**Note:** OpenMS C++ has a native `Mobilogram` class that may not yet be wrapped in pyopenms. This wrapper uses `MSChromatogram` as the underlying representation for mobilogram data.

```python
from openms_python import Py_Mobilogram
import numpy as np
import pandas as pd

# Create a mobilogram from arrays
drift_times = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
intensities = np.array([100.0, 150.0, 200.0, 180.0, 120.0])

mob = Py_Mobilogram.from_arrays(drift_times, intensities, mz=500.0)

print(f"m/z: {mob.mz}")
print(f"Points: {len(mob)}")
print(f"Base peak drift time: {mob.base_peak_drift_time}")

# Convert to DataFrame
df = mob.to_dataframe()
print(df.head())
#    drift_time  intensity     mz
# 0         1.0      100.0  500.0
# 1         1.5      150.0  500.0
# 2         2.0      200.0  500.0

# Create from DataFrame
df = pd.DataFrame({
    'drift_time': [1.0, 2.0, 3.0],
    'intensity': [50.0, 100.0, 75.0]
})
mob = Py_Mobilogram.from_dataframe(df, mz=600.0)
```

## Workflow helpers
@@ -797,16 +870,34 @@ plt.show()
- `base_peak_mz`: m/z of most intense peak
- `base_peak_intensity`: Intensity of base peak
- `peaks`: Tuple of (mz_array, intensity_array)
- `float_data_arrays`: List of FloatDataArray objects
- `ion_mobility`: Ion mobility values as NumPy array
- `drift_time`: Spectrum-level drift time value

**Methods:**
- `from_dataframe(df, **metadata)`: Create from DataFrame (class method)
- `to_dataframe(include_float_arrays=True)`: Convert to DataFrame
- `filter_by_mz(min_mz, max_mz)`: Filter peaks by m/z
- `filter_by_intensity(min_intensity)`: Filter peaks by intensity
- `top_n_peaks(n)`: Keep top N peaks
- `normalize_intensity(max_value)`: Normalize intensities

### Py_Mobilogram

**Properties:**
- `name`: Name of the mobilogram
- `mz`: m/z value this mobilogram represents
- `drift_time`: Drift time values as NumPy array
- `intensity`: Intensity values as NumPy array
- `peaks`: Tuple of (drift_time_array, intensity_array)
- `total_ion_current`: Sum of intensities
- `base_peak_drift_time`: Drift time of most intense point
- `base_peak_intensity`: Intensity of base peak

**Methods:**
- `from_arrays(drift_time, intensity, mz=None, name=None)`: Create from arrays (class method)
- `from_dataframe(df, **metadata)`: Create from DataFrame (class method)
- `to_dataframe()`: Convert to DataFrame

### Identifications, ProteinIdentifications & PeptideIdentifications

openms_python/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,7 @@
from .py_msexperiment import Py_MSExperiment
from .py_msspectrum import Py_MSSpectrum
from .py_chromatogram import Py_MSChromatogram
from .py_mobilogram import Py_Mobilogram
from .py_feature import Py_Feature
from .py_featuremap import Py_FeatureMap
from .py_consensusmap import Py_ConsensusMap
@@ -101,6 +102,7 @@ def get_example(name: str, *, load: bool = False, target_dir: Union[str, Path, N
    "Py_MSExperiment",
    "Py_MSSpectrum",
    "Py_MSChromatogram",
    "Py_Mobilogram",
    "Py_Feature",
    "Py_FeatureMap",
    "Py_ConsensusMap",
