This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
MGST (Mikunn Galactic Search Tool) is an Elite Dangerous galaxy analysis toolkit built as a modular Python package. It provides flexible JSON pattern-based filtering with multiple search modes (corridor, sectors, galaxy) and database management tools. The project is designed around a simple, powerful pattern matching system that allows complex searches without modifying core code.
# Create and activate base environment (automatically installs MGST)
micromamba create -f environment.yml
micromamba activate mgst
# Or create development environment (includes testing and linting tools)
micromamba create -f environment-dev.yml
micromamba activate mgst-dev

# Run all tests with coverage
pytest
# Run specific test file
pytest tests/test_core/test_filtering.py
# Run tests with verbose output
pytest -v
# Run tests matching pattern
pytest -k "test_pattern"

# Format code with black
black src/ tests/
# Type checking with mypy
mypy src/
# Lint with flake8
flake8 src/
# Run pre-commit hooks
pre-commit run --all-files

# Compress existing sector database (saves 83.6% space)
python scripts/compress_sector_database.py
# Build sector index for efficient corridor searches
python scripts/build_sector_index.py \
--source Databases/galaxy_sectors_compressed \
--target Databases/galaxy_sectors_compressed \
--workers 12

Database Compression Benefits:
- Space Savings: 83.6% reduction (609.8GB → 99.9GB for full galaxy database)
- Performance: Maintains streaming speed with 64MB decompression buffers
- Compatibility: All existing commands work transparently with compressed databases
- Dual Support: System automatically detects and uses compressed files when available
Sector Index Benefits:
- Efficient Corridor Searches: Maps sector center coordinates for fast spatial prefiltering
- Low Memory Usage: <1GB RAM regardless of database size
- Parallel Search: Enables multi-threaded sector-based corridor searches
- Incremental Updates: Rebuild index quickly after database updates
The package follows a layered architecture with clear separation of concerns:
src/mgst/core/ - Core processing engines
- filtering.py - High-performance parallel galaxy data filtering with JSONL streaming
- spatial.py - Spatial prefiltering and corridor search optimization
- search_modes.py - Search mode implementations (corridor, sectors, galaxy)
src/mgst/configs/ - Pattern matching system
- json_pattern.py - JSON pattern matching engine
- pattern_validator.py - Pattern validation and error handling
- Pattern files in the patterns/ directory define search criteria
src/mgst/data/ - Data processing utilities
- loaders.py - Data loading utilities with validation and batch processing
- validators.py - Comprehensive data validation for system and body data
- compressed_reader.py - Transparent gzip compression support with streaming decompression
- indexed_reader.py - Indexed database reader for efficient sector-based searches
src/mgst/cli/ - Command-line interfaces
- main.py - Main entry point with subcommands (filter, db)
- filter.py - Filter command with all search modes
- database.py - Database construction and management commands
JSON Pattern Matching: The JSONPatternConfig class enables adding new search criteria by creating JSON files that define:
- System name patterns (with wildcards)
- Body criteria (type, atmosphere, temperature, gravity, etc.)
- Parent-child relationships (moons, rings)
- Logical combinations (AND/OR)
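The matching rules listed above can be illustrated with a minimal sketch. This is not the actual `JSONPatternConfig` implementation: the function names below are hypothetical, and the sketch assumes that every entry in a pattern's `bodies` list must be satisfied by at least one body in the system (AND semantics), that a list value means "any of these" (OR semantics), and that `parents` is a list of single-key dicts such as `{"Planet": "*"}`. It covers string and wildcard criteria only, not numeric ranges such as temperature or gravity.

```python
from fnmatch import fnmatchcase


def value_matches(expected, actual):
    """Return True if `actual` matches `expected` (a wildcard string or a list of them)."""
    if actual is None:
        return False
    options = expected if isinstance(expected, list) else [expected]
    return any(fnmatchcase(str(actual), str(option)) for option in options)


def body_matches(criteria, body):
    """Check one body against one criteria dict, ignoring the free-text "comment" key."""
    for key, expected in criteria.items():
        if key == "comment":
            continue
        if key == "parents":
            # Assumed shape: a list of single-key dicts such as {"Planet": "*"}.
            parents = body.get("parents") or []
            for wanted in expected:
                if not any(
                    value_matches(pattern, parent.get(kind))
                    for parent in parents
                    for kind, pattern in wanted.items()
                ):
                    return False
        elif not value_matches(expected, body.get(key)):
            return False
    return True


def system_matches(pattern, system):
    """The name must match and every criteria entry needs at least one matching body."""
    if not value_matches(pattern.get("name", "*"), system.get("name")):
        return False
    bodies = system.get("bodies", [])
    return all(
        any(body_matches(criteria, body) for body in bodies)
        for criteria in pattern.get("bodies", [])
    )
```

Note that this sketch checks only that a body has a parent of the requested kind. It does not verify that a moon orbits the specific planet matched by another criteria entry.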
Multiple Search Modes: The filtering system supports:
- Galaxy: Search entire galaxy
- Sectors: Search specific named sectors
- Corridor: Search cylindrical corridor between two coordinates
- Pattern: Generic pattern-based search
Parallel Processing: The filtering system uses ProcessPoolExecutor with memory-efficient JSONL streaming to handle millions of systems across multiple worker processes.
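As a rough illustration of the per-sector fan-out described above, the sketch below submits one task per sector file to a `ProcessPoolExecutor` and collects results as they complete. The helper name `run_per_sector` and its parameters are hypothetical and are not taken from filtering.py; the `executor_cls` argument is an assumption added so the same code can be exercised with a thread pool.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def run_per_sector(sector_files, worker, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Apply `worker` to each sector file in parallel and collect results keyed by file."""
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        # One task per sector file; each worker streams its own file independently.
        futures = {pool.submit(worker, path): path for path in sector_files}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

With a process pool, `worker` must be a picklable module-level function, and the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.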
Memory Management: Large datasets are processed using:
- JSONL streaming with configurable chunk sizes
- Garbage collection between processing chunks
- Sector-level file organization
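The chunked streaming approach above might look roughly like the following. This is a sketch under stated assumptions, not the code in filtering.py: `iter_chunks`, `count_matches`, and the default chunk size are hypothetical names and values chosen for illustration.

```python
import gc
import json
from itertools import islice


def iter_chunks(lines, chunk_size=1000):
    """Yield lists of parsed JSON objects, reading at most `chunk_size` lines per list."""
    iterator = iter(lines)
    while True:
        raw = list(islice(iterator, chunk_size))
        if not raw:
            return
        # Skip blank lines so trailing newlines do not break parsing.
        chunk = [json.loads(line) for line in raw if line.strip()]
        if chunk:
            yield chunk


def count_matches(lines, predicate, chunk_size=1000):
    """Count systems satisfying `predicate`, releasing each chunk before the next one."""
    total = 0
    for chunk in iter_chunks(lines, chunk_size):
        total += sum(1 for system in chunk if predicate(system))
        del chunk
        gc.collect()  # explicit collection between chunks, as described above
    return total
```

Because `lines` can be any iterable of strings, an open file handle can be passed directly, so only one chunk of parsed systems is held in memory at a time.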
Compressed Database Support: The system supports transparent gzip compression for massive space savings:
- Automatic detection of compressed (.jsonl.gz) and uncompressed (.jsonl) files
- Streaming decompression maintaining memory efficiency with 50GB+ files
- 83.6% space reduction (609.8GB → 99.9GB) with production galaxy databases
- Compatible with all existing processing: filtering, spatial prefiltering
- Use compressed databases with:
mgst filter --database Databases/galaxy_sectors_compressed ...
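The automatic detection and streaming decompression described above can be sketched with the standard library alone. The function names and the `<sector>.jsonl.gz` / `<sector>.jsonl` file naming are assumptions made for illustration; compressed_reader.py may be organized differently, and this sketch does not reproduce its 64MB buffering.

```python
import gzip
import json
from pathlib import Path


def resolve_sector_file(directory, sector_name):
    """Prefer <sector>.jsonl.gz, fall back to <sector>.jsonl, else return None."""
    base = Path(directory)
    for suffix in (".jsonl.gz", ".jsonl"):
        candidate = base / f"{sector_name}{suffix}"
        if candidate.exists():
            return candidate
    return None


def iter_systems(path):
    """Stream one parsed system per line without loading the whole file into memory."""
    path = Path(path)
    # gzip.open in text mode decompresses incrementally as lines are read.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```

Since both branches yield the same parsed objects, downstream filtering code does not need to know whether a sector file is compressed.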
Indexed Database Architecture: For efficient sector-based searching:
- Database Structure: Sector-level JSONL.gz files (one per sector, ~12,000 files total)
- Lightweight Index: JSON file with sector center coordinates for spatial prefiltering
- Memory Efficient: Index uses <1GB RAM regardless of database size
- Parallel Search: SectorResolver enables multi-threaded sector-based searches
- Search Strategy: For corridor searches, use sector index to identify nearby sectors, then scan only those sector files
- Performance: With 12 workers processing different sectors in parallel, searches remain very fast
- Easy Updates: Update individual sector files, then rebuild index quickly
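The corridor search strategy above comes down to a point-to-segment distance test against each sector center. The sketch below assumes the index can be loaded as a plain mapping of sector name to an (x, y, z) center; that layout, the function names, and the `margin` parameter (a hypothetical allowance for sector extent, so sectors that only partly overlap the corridor are not missed) are illustrative and are not taken from spatial.py or `SectorResolver`.

```python
import math


def distance_to_segment(point, start, end):
    """Shortest distance from `point` to the line segment start-end (3D coordinates)."""
    seg = [e - s for s, e in zip(start, end)]
    rel = [p - s for s, p in zip(start, point)]
    seg_len_sq = sum(c * c for c in seg)
    if seg_len_sq == 0:
        t = 0.0  # degenerate corridor: start and end coincide
    else:
        # Project onto the axis and clamp to the segment's endpoints.
        t = sum(a * b for a, b in zip(rel, seg)) / seg_len_sq
        t = max(0.0, min(1.0, t))
    closest = tuple(s + t * c for s, c in zip(start, seg))
    return math.dist(tuple(point), closest)


def sectors_near_corridor(index, start, end, radius, margin=0.0):
    """Return sorted names of sectors whose center is within radius + margin of the axis."""
    return sorted(
        name
        for name, center in index.items()
        if distance_to_segment(center, start, end) <= radius + margin
    )
```

Clamping to the segment gives rounded ends (a capsule) rather than a strict cylinder, which errs on the side of including a few extra sectors. Only the sector files returned here would then need to be scanned.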
Patterns are JSON files that define search criteria:
{
  "description": "Supply Hub Candidates - ELW or Water World with rocky moon",
  "name": "*",
  "bodies": [
    {
      "comment": "Match ELW or Water World (the parent planet)",
      "subType": ["Earth-like world", "Water world"]
    },
    {
      "comment": "Match rocky moon orbiting the ELW/Water World",
      "subType": "Rocky body",
      "parents": [{"Planet": "*"}]
    }
  ]
}

Pattern files are used with the filter command:
mgst filter --mode corridor \
--start "-468,-92,4474" \
--end "-575,-37,5142" \
--radius 500 \
--pattern-file patterns/supply_hub_1.json \
--database Databases/galaxy_sectors_compressed \
--output output/supply_hubs_001/results.jsonl

IMPORTANT: All analysis runs should be organized into unique subdirectories within the output/ directory to prevent file conflicts and maintain clean organization. Each run should use a descriptive subdirectory name that includes:
- Run identifier: Sequential number, timestamp, or descriptive name
- Search type: Corridor, sector, or galaxy search
- Purpose: Brief description of the search criteria
Recommended naming patterns:
# Sequential runs with descriptive names
output/corridor_lagoon_trifid_001/
output/sector_search_test_002/
output/supply_hub_candidates_20251023/
# Timestamp-based runs
output/corridor_search_$(date +%Y%m%d_%H%M%S)/
# Purpose-based
output/supply_hub_validation/
output/exploration_route_planning/

Example commands with proper output organization:
# Corridor search with organized output
mgst filter --mode corridor \
--start "0,0,0" --end "1000,0,0" --radius 500 \
--pattern-file patterns/interesting_systems.json \
--database Databases/galaxy_sectors_compressed \
--output output/corridor_test_$(date +%Y%m%d)/results.jsonl \
--workers 12
# Sector search
mgst filter --mode sectors \
--sectors "Lagoon_Sector,Trifid_Sector" \
--pattern-file patterns/supply_hub_1.json \
--database Databases/galaxy_sectors_compressed \
--output output/lagoon_trifid_supply_$(date +%Y%m%d)/results.tsv

IMPORTANT: The filtering system automatically creates comprehensive log files in each output subdirectory:
- stdin.txt: Complete command information including:
  - Timestamp and full command line
  - All arguments and parameters used
  - Working directory and output paths
  - Relevant environment variables
  - Python path and virtual environment info
- stdout.txt: Complete output log capturing:
  - All console output during processing
  - Configuration details and validation
  - Processing statistics and progress
  - Error messages and warnings
  - Final results summary
- stderr.txt: Progress tracking:
  - Progress bar updates
  - Worker status information
- *_search_metadata.json: Search parameters:
  - Search mode and coordinates
  - Pattern file used
  - Database files searched
  - Timestamp and configuration
These log files enable full reproducibility and debugging of any analysis run. Each output subdirectory becomes a complete record of the analysis performed.
- Input Processing: JSONL.gz files are read with streaming decompression across multiple worker processes
- Pattern Matching: Each system passes through the JSON pattern matcher
- Output Generation: Qualifying systems are written to TSV/JSONL with automatic logging
- Spatial Optimization: Corridor searches use sector-level spatial indexing for efficiency
The test suite uses pytest with fixtures for sample data. Key testing patterns:
- Pattern testing with sample system data
- Integration testing for the complete filtering pipeline
- Search mode validation ensuring correct sector/corridor filtering
- Data validation testing for all supported file formats
The CLI follows a simple two-command pattern:
- Main entry point (mgst) with subcommands: filter and db
- Consistent error handling with user-friendly messages
- Progress bars and verbose logging options
- Dry-run modes for testing searches
The modular design allows adding new search modes by extending search_modes.py and new pattern features by updating json_pattern.py.
mgst filter --mode corridor \
--start "-468,-92,4474" \
--end "-575,-37,5142" \
--radius 500 \
--pattern-file patterns/interesting_systems.json \
--database Databases/galaxy_sectors_compressed \
--output output/corridor_lagoon_trifid_$(date +%Y%m%d)/results.jsonl \
--workers 12

mgst filter --mode sectors \
--sectors "Lagoon_Sector,Trifid_Sector,Omega_Sector" \
--pattern-file patterns/supply_hub_1.json \
--database Databases/galaxy_sectors_compressed \
--output output/multi_sector_$(date +%Y%m%d)/results.tsv \
--workers 8

mgst filter --mode corridor \
--start "0,0,0" --end "1000,0,0" --radius 100 \
--pattern-file patterns/all_systems.json \
--database Databases/galaxy_sectors_compressed \
--dry-run

This shows which sector files would be searched without actually running the search.