I've successfully created a complete, pip-installable Python package called CoreCut that extracts structural cores from protein families using Foldseek alignments.
corecut/
├── corecut/ # Main package
│ ├── __init__.py # Package initialization
│ ├── cli.py # Command-line interface
│ ├── core_extractor.py # Core extraction logic
│ └── foldseek_utils.py # Foldseek interaction utilities
├── tests/ # Test suite
│ ├── __init__.py
│ ├── test_core_extractor.py
│ └── test_foldseek_utils.py
├── examples/ # Usage examples
│ └── usage_example.py
├── demo/ # Demonstration script
│ └── demo.py
├── dist/ # Built packages
│ ├── corecut-0.1.0.tar.gz
│ └── corecut-0.1.0-py3-none-any.whl
├── README.md # Comprehensive documentation
├── CHANGELOG.md # Version history
├── LICENSE # MIT license
├── pyproject.toml # Modern Python packaging config
├── setup.py # Fallback setup configuration
└── MANIFEST.in # Package manifest
cd /home/cactuskid/projects/corecut
# Development (editable) install
pip install -e .
# Install from the built wheel
pip install dist/corecut-0.1.0-py3-none-any.whl
# Install from PyPI (once published)
pip install corecut

- Foldseek: Must be installed separately and accessible in PATH
  - Install via conda: conda install -c conda-forge -c bioconda foldseek
  - Or from: https://github.com/steineggerlab/foldseek
- pandas >= 1.3.0
- numpy >= 1.20.0
- biopython >= 1.79
- tqdm >= 4.60.0
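
Because Foldseek is an external binary rather than a pip dependency, it can help to check for it before starting a run. The snippet below is a minimal sketch using only the standard library; the helper name `foldseek_available` is hypothetical and is not part of the CoreCut API.

```python
import shutil


def foldseek_available(executable="foldseek"):
    """Return True if the given executable can be found on PATH.

    Hypothetical helper for illustration; not part of CoreCut.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if foldseek_available():
        print("Foldseek found on PATH")
    else:
        print("Foldseek not found; install it before running corecut")
```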
# Basic usage
corecut /path/to/pdb_files/
# With custom parameters
corecut /path/to/pdb_files/ \
--output-dir results/ \
--hit-thresh 0.9 \
--min-thresh 0.7
# Using existing Foldseek results
corecut /path/to/pdb_files/ \
--foldseek-results existing_results.m8

from corecut import extract_core, run_foldseek_search
# Run Foldseek comparison
run_foldseek_search(
input_folder="/path/to/pdb/files",
output_path="foldseek_results.m8"
)
# Extract cores
extract_core(
resdf_path="foldseek_results.m8",
outfile="core_results.csv",
hitthresh=0.8,
minthresh=0.6
)

- core_extraction_results.csv: Core boundary data
- core_structs/: Core region PDB files
- nter_structs/: N-terminal region PDB files
- cter_structs/: C-terminal region PDB files
- foldseek_results.m8: Raw Foldseek alignments
,min,max,len
protein1,20,79,100
protein2,15,74,90
protein3,20,74,95
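
A table in this layout can be loaded with pandas for downstream analysis. The sketch below reuses the three sample rows and assumes that the unnamed first column holds the structure identifier, that `min` and `max` are inclusive residue positions, and that `len` is the full chain length; none of these is confirmed above, and the derived columns `core_len` and `core_frac` are added for illustration only.

```python
import io

import pandas as pd

# Sample rows copied from the example output; a real run would read the
# CSV file written by CoreCut instead of this in-memory string.
sample = """,min,max,len
protein1,20,79,100
protein2,15,74,90
protein3,20,74,95
"""

# The unnamed first column is treated as the index (assumed: structure ID).
df = pd.read_csv(io.StringIO(sample), index_col=0)

# Assumption: min and max are inclusive boundaries, hence the +1.
df["core_len"] = df["max"] - df["min"] + 1
df["core_frac"] = df["core_len"] / df["len"]

print(df)
```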
- Structure Comparison: Uses Foldseek for all-vs-all structural alignments
- Alignment Analysis: Maps alignment regions to identify conserved positions
- Core Definition: Finds positions aligned in ≥ hit_thresh proportion of structures
- Fallback Logic: If no core found, uses min_thresh as cutoff
- Structure Extraction: Uses BioPython to extract and save core/terminal regions
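
The core definition and fallback steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in core_extractor.py: the function name `find_core`, its arguments, and the use of 1-based inclusive alignment intervals are all hypothetical.

```python
import numpy as np


def find_core(intervals, length, n_structures, hit_thresh=0.8, min_thresh=0.6):
    """Return (start, end) of the core as 1-based inclusive positions, or None.

    intervals    -- (start, end) aligned regions on one structure, one per hit
    length       -- number of residues in that structure
    n_structures -- number of structures the coverage is normalised by
    """
    # Count how many alignments cover each residue position.
    coverage = np.zeros(length, dtype=int)
    for start, end in intervals:
        coverage[start - 1:end] += 1

    frac = coverage / n_structures

    # Try the strict threshold first, then fall back to the looser one.
    for thresh in (hit_thresh, min_thresh):
        positions = np.flatnonzero(frac >= thresh)
        if positions.size:
            return int(positions[0]) + 1, int(positions[-1]) + 1
    return None


if __name__ == "__main__":
    hits = [(20, 79), (15, 80), (20, 85), (25, 79)]
    print(find_core(hits, length=100, n_structures=4, hit_thresh=0.9))
```

In this sketch the core is taken as the span from the first to the last position that passes the threshold, so interior positions with lower coverage are still included; a stricter variant could require a contiguous run instead.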
✅ Complete pip package with proper structure
✅ Command-line tool with comprehensive options
✅ Python library for programmatic use
✅ Automatic dependency management
✅ Comprehensive documentation
✅ Test suite with pytest
✅ Example scripts and demonstrations
✅ Error handling and progress reporting
✅ Flexible configuration options
✅ Modern packaging (pyproject.toml + setup.py)
- ✅ Package installs correctly
- ✅ Command-line interface works
- ✅ All tests pass
- ✅ Package builds for distribution
- ✅ Demo script shows functionality
- ✅ Comprehensive documentation
- Create GitHub repository
- Set up CI/CD (GitHub Actions)
- Publish to PyPI: twine upload dist/*
- Add more tests for edge cases
- Create Docker image with Foldseek included
- Add conda package recipe
The CoreCut package is now ready for use and distribution!