CoreCut Package - Installation and Usage Summary

What Was Created

CoreCut is a pip-installable Python package that extracts structural cores from protein families using Foldseek alignments.

Package Structure

corecut/
├── corecut/                     # Main package
│   ├── __init__.py             # Package initialization
│   ├── cli.py                  # Command-line interface
│   ├── core_extractor.py       # Core extraction logic
│   └── foldseek_utils.py       # Foldseek interaction utilities
├── tests/                      # Test suite
│   ├── __init__.py
│   ├── test_core_extractor.py
│   └── test_foldseek_utils.py
├── examples/                   # Usage examples
│   └── usage_example.py
├── demo/                       # Demonstration script
│   └── demo.py
├── dist/                       # Built packages
│   ├── corecut-0.1.0.tar.gz
│   └── corecut-0.1.0-py3-none-any.whl
├── README.md                   # Comprehensive documentation
├── CHANGELOG.md               # Version history
├── LICENSE                    # MIT license
├── pyproject.toml            # Modern Python packaging config
├── setup.py                  # Fallback setup configuration
└── MANIFEST.in               # Package manifest

Installation

Method 1: From Source (Current)

cd /home/cactuskid/projects/corecut
pip install -e .

Method 2: From Built Package

pip install dist/corecut-0.1.0-py3-none-any.whl

Method 3: When Published (Future)

pip install corecut

Dependencies

External Requirements

  • Foldseek (external program, installed separately from the Python dependencies)

Python Dependencies (Auto-installed)

  • pandas >= 1.3.0
  • numpy >= 1.20.0
  • biopython >= 1.79
  • tqdm >= 4.60.0
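
Since Foldseek is an external program and not a Python package, it can help to confirm that everything is present before a run. The following is a minimal sketch, not part of CoreCut itself: `check_environment` is a hypothetical helper, and it assumes the Foldseek executable is named `foldseek` and is expected on the `PATH`.

```python
import shutil
from importlib import metadata

# Distribution names taken from the dependency list above.
PYTHON_DEPS = ["pandas", "numpy", "biopython", "tqdm"]


def check_environment(binary="foldseek"):
    """Hypothetical helper: map each requirement to a path/version, or None if missing."""
    report = {}
    # shutil.which returns the executable's full path, or None if it is not on PATH.
    # Assumption: the Foldseek binary is called "foldseek".
    report[binary] = shutil.which(binary)
    for dist in PYTHON_DEPS:
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {found if found else 'MISSING'}")
```

This only reports what is installed; it does not compare versions against the minimums listed above.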

Usage

Command Line Interface

# Basic usage
corecut /path/to/pdb_files/

# With custom parameters
corecut /path/to/pdb_files/ \
    --output-dir results/ \
    --hit-thresh 0.9 \
    --min-thresh 0.7

# Using existing Foldseek results
corecut /path/to/pdb_files/ \
    --foldseek-results existing_results.m8

Programmatic Usage

from corecut import extract_core, run_foldseek_search

# Run Foldseek comparison
run_foldseek_search(
    input_folder="/path/to/pdb/files",
    output_path="foldseek_results.m8"
)

# Extract cores
extract_core(
    resdf_path="foldseek_results.m8",
    outfile="core_results.csv",
    hitthresh=0.8,
    minthresh=0.6
)

Output

Files Created

  • core_extraction_results.csv - Core boundary data
  • core_structs/ - Core region PDB files
  • nter_structs/ - N-terminal region PDB files
  • cter_structs/ - C-terminal region PDB files
  • foldseek_results.m8 - Raw Foldseek alignments
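
The raw `.m8` file can be inspected without CoreCut. The sketch below assumes the file uses Foldseek's default tab-separated column order (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits); if CoreCut requests a different output format, the column list would need to change. `read_m8` is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed column order: Foldseek's default tabular output.
M8_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]

# Made-up rows for illustration only; real files come from a Foldseek run.
SAMPLE = (
    "protein1\tprotein2\t0.412\t60\t30\t2\t20\t79\t15\t74\t1.2e-08\t210\n"
    "protein1\tprotein3\t0.388\t55\t31\t1\t22\t76\t20\t74\t3.5e-07\t190\n"
)


def read_m8(handle):
    """Hypothetical helper: return query/target names and query alignment bounds."""
    reader = csv.DictReader(handle, fieldnames=M8_COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        hits.append({
            "query": row["query"],
            "target": row["target"],
            "qstart": int(row["qstart"]),
            "qend": int(row["qend"]),
        })
    return hits


hits = read_m8(io.StringIO(SAMPLE))
for hit in hits:
    print(hit)
```

To read a real file, pass `open("foldseek_results.m8", newline="")` in place of the in-memory sample.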

CSV Format

,min,max,len
protein1,20,79,100
protein2,15,74,90
protein3,20,74,95
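
As a quick way to read this table, the sketch below computes what fraction of each protein falls inside its core. It is illustrative only: `core_fractions` is a hypothetical helper, and it assumes `min` and `max` are inclusive residue positions, which the output format itself does not state.

```python
import csv
import io

# The example table from above, held in memory so the sketch is self-contained.
CSV_TEXT = """,min,max,len
protein1,20,79,100
protein2,15,74,90
protein3,20,74,95
"""


def core_fractions(handle):
    """Hypothetical helper: map protein name to core span divided by total length."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row: ['', 'min', 'max', 'len']
    fractions = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        name, lo, hi, length = row
        # Assumption: min and max are inclusive, so the span is max - min + 1.
        span = int(hi) - int(lo) + 1
        fractions[name] = span / int(length)
    return fractions


fractions = core_fractions(io.StringIO(CSV_TEXT))
for name, frac in fractions.items():
    print(f"{name}: {frac:.2f}")
```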

Algorithm Overview

  1. Structure Comparison: Uses Foldseek for all-vs-all structural alignments
  2. Alignment Analysis: Maps alignment regions to identify conserved positions
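  3. Core Definition: Finds positions aligned in at least a hit_thresh fraction of structures (--hit-thresh on the command line, hitthresh in Python)
  4. Fallback Logic: If no position meets hit_thresh, the lower min_thresh cutoff is used instead (--min-thresh on the command line, minthresh in Python)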
  5. Structure Extraction: Uses BioPython to extract and save core/terminal regions
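
Steps 2 to 4 can be sketched in a few lines of plain Python. This is a simplified illustration of the thresholding idea, not CoreCut's actual implementation: `find_core` and its arguments are hypothetical, positions are assumed to be 1-based and inclusive, and the core is taken as the span from the first to the last position that meets the threshold.

```python
def find_core(hit_ranges, length, n_others, hit_thresh=0.8, min_thresh=0.6):
    """Hypothetical sketch of threshold-based core detection for one query structure.

    hit_ranges: (start, end) aligned intervals on the query, one per aligned structure
                (assumed 1-based, inclusive).
    length:     number of residues in the query.
    n_others:   number of structures the query was compared against.
    Returns (core_start, core_end), or None if no position meets either threshold.
    """
    if n_others <= 0:
        return None

    # Count, for every query position, how many alignments cover it.
    counts = [0] * (length + 1)  # index 0 is unused
    for start, end in hit_ranges:
        for pos in range(max(start, 1), min(end, length) + 1):
            counts[pos] += 1

    # Try the strict threshold first, then fall back to the lower one.
    for thresh in (hit_thresh, min_thresh):
        kept = [pos for pos in range(1, length + 1)
                if counts[pos] / n_others >= thresh]
        if kept:
            return kept[0], kept[-1]
    return None


# Positions 20..50 are covered by all three alignments.
print(find_core([(10, 50), (20, 60), (15, 55)], length=70, n_others=3,
                hit_thresh=0.9, min_thresh=0.7))

# Only 2 of 3 structures align anywhere, so the fallback threshold decides.
print(find_core([(10, 30), (25, 50)], length=70, n_others=3,
                hit_thresh=0.8, min_thresh=0.6))
```

The default thresholds mirror the values used in the programmatic example above; the per-position loop favors readability over speed.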

Features

  • Complete pip package with proper structure
  • Command-line tool with comprehensive options
  • Python library for programmatic use
  • Automatic dependency management
  • Comprehensive documentation
  • Test suite with pytest
  • Example scripts and demonstrations
  • Error handling and progress reporting
  • Flexible configuration options
  • Modern packaging (pyproject.toml + setup.py)

Quality Assurance

  • ✅ Package installs correctly
  • ✅ Command-line interface works
  • ✅ All tests pass
  • ✅ Package builds for distribution
  • ✅ Demo script shows functionality
  • ✅ Comprehensive documentation

Next Steps for Distribution

  1. Create GitHub repository
  2. Set up CI/CD (GitHub Actions)
  3. Publish to PyPI: twine upload dist/*
  4. Add more tests for edge cases
  5. Create Docker image with Foldseek included
  6. Add conda package recipe

The CoreCut package is now ready for use and distribution!