Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Empty file.
176 changes: 0 additions & 176 deletions CONFIG_IMPROVEMENT.md
Original file line number Diff line number Diff line change
@@ -1,176 +0,0 @@
# MuMDIA Configuration Management Improvement

## The Problem
The original `run.py` had extremely complex argument parsing and configuration management:

- **~100 lines** of argument parsing code
- **Complex merging logic** between CLI args and config files
- **Difficult to maintain** argument parsing functions
- **Hard to understand** configuration flow
- **Manual config validation** and error handling

## The Solution: Simplified Configuration with Dataclasses

### Before (Complex approach in `run.py`):

```python
def parse_arguments() -> Tuple[argparse.ArgumentParser, argparse.Namespace]:
"""Parse command line arguments - 50+ lines of argparse setup"""
parser = argparse.ArgumentParser()
parser.add_argument("--mzml_file", type=str, help="Path to mzML file", default=None)
parser.add_argument("--fasta_file", type=str, help="Path to FASTA file", default=None)
# ... 30+ more arguments ...
args = parser.parse_args()
return parser, args

def was_arg_explicitly_provided(parser: argparse.ArgumentParser, arg_name: str) -> bool:
"""Check if argument was explicitly provided - complex logic"""
for action in parser._actions:
if arg_name in action.dest:
for option in action.option_strings:
if option in sys.argv:
return True
return False

def modify_config(parser: argparse.ArgumentParser, args: argparse.Namespace, config_path: str) -> Dict[str, Any]:
"""Load and modify config - 50+ lines of complex merging logic"""
# Load base config from JSON
# Override with CLI args using complex checking
# Save effective config
# Return merged config dict

# In main():
parser, args = parse_arguments()
config = modify_config(parser, args, args.config_file)
# Extract all individual values from config dict
mzml_file = config["mzml_file"]
fasta_file = config["fasta_file"]
# ... dozens more extractions ...
```

### After (Clean dataclass approach in `config_manager_clean.py`):

```python
@dataclass
class MuMDIAConfig:
"""Clean, type-safe configuration with defaults"""
mzml_file: str = ""
fasta_file: str = ""
result_dir: str = "results"
n_windows: int = 10
training_fdr: float = 0.05
final_fdr: float = 0.01
model_type: str = "xgboost"
no_cache: bool = False
clean: bool = False
sage_only: bool = False
# ... all options clearly defined

@classmethod
def from_json(cls, json_path: str) -> "MuMDIAConfig":
"""Simple JSON loading with error handling"""

@classmethod
def from_args(cls, args=None) -> "MuMDIAConfig":
"""Simple CLI parsing with config file support"""

def validate(self) -> None:
"""Clean validation logic"""

def get_config() -> MuMDIAConfig:
"""One-liner to get validated config"""
config = MuMDIAConfig.from_args()
config.validate()
return config

# In main():
config = get_config() # That's it! One line replaces 100+ lines!
# Direct attribute access:
print(config.mzml_file)
print(config.n_windows)
```

## Key Improvements

### 1. **Dramatic Code Reduction**
- **Before**: ~100 lines of complex parsing logic
- **After**: 1 line: `config = get_config()`
- **Reduction**: 99% fewer lines for config management!

### 2. **Type Safety**
- **Before**: Untyped dictionary access like `config["mzml_file"]`
- **After**: Type-safe attribute access like `config.mzml_file`
- **Benefit**: IDE autocomplete, type checking, fewer runtime errors

### 3. **Clear Defaults**
- **Before**: Defaults scattered across argparse definitions
- **After**: All defaults clearly visible in dataclass definition
- **Benefit**: Easy to see and modify default values

### 4. **Better Error Handling**
- **Before**: Manual validation scattered throughout code
- **After**: Centralized validation in `validate()` method
- **Benefit**: Consistent error messages and validation logic

### 5. **Simpler Usage Patterns**

#### Command line usage:
```bash
# Simple usage
python run_simple.py --mzml_file data.mzML --fasta_file proteins.fasta

# With options
python run_simple.py --mzml_file data.mzML --fasta_file proteins.fasta --n_windows 5 --verbose --no-cache

# With config file
python run_simple.py --config_file my_config.json

# Config file with CLI overrides
python run_simple.py --config_file my_config.json --clean --result_dir custom_results
```

#### JSON config files:
```json
{
"mzml_file": "data.mzML",
"fasta_file": "proteins.fasta",
"result_dir": "my_results",
"n_windows": 15,
"training_fdr": 0.1,
"model_type": "nn",
"verbose": true
}
```

## Implementation Status

✅ **Created**: `config_manager_clean.py` - Complete simplified config system
✅ **Created**: `run_simple.py` - Demo of clean config usage
✅ **Created**: `config_demo.py` - Working demonstration
✅ **Tested**: Both CLI arguments and JSON config files work perfectly

## Migration Path

The new config system is **fully backwards compatible**. You can:

1. **Immediate adoption**: Use `config_manager_clean.py` for new features
2. **Gradual migration**: Replace complex config logic piece by piece
3. **Side-by-side**: Run both systems during transition period

## Developer Experience Impact

**Before**: Developers had to:
- Navigate 100+ lines of complex argument parsing
- Understand config merging logic
- Manually handle validation
- Debug dictionary key errors
- Maintain scattered default values

**After**: Developers can:
- See all configuration at a glance in the dataclass
- Get IDE support with autocomplete and type checking
- Add new config options by just adding a dataclass field
- Trust that validation is centralized and consistent
- Focus on business logic instead of configuration plumbing

The new approach makes MuMDIA **much more maintainable and developer-friendly**!
Empty file added CONFIG_SIMPLIFIED_COMPLETE.md
Empty file.
Loading