|
| 1 | +# ParetoSmooth.jl Development Guide |
| 2 | + |
| 3 | +ParetoSmooth.jl is a Julia package for efficient approximate leave-one-out cross-validation for fitted Bayesian models using Pareto smoothed importance sampling (PSIS). This package integrates with Turing.jl and MCMCChains.jl for Bayesian modeling workflows. |
| 4 | + |
| 5 | +**Always reference these instructions first, and fall back to search or bash commands only when you encounter unexpected information that does not match what is described here.** |
| 6 | + |
| 7 | +## Working Effectively |
| 8 | + |
| 9 | +### Environment Setup and Package Installation |
| 10 | +- Ensure Julia 1.6+ is available: `julia --version` |
| 11 | +- Navigate to repository root: `cd /path/to/ParetoSmooth.jl` |
| 12 | +- Install and instantiate package dependencies: |
| 13 | + ```bash |
| 14 | + julia --project=. -e "using Pkg; Pkg.instantiate()" |
| 15 | + ``` |
| 16 | + - **NEVER CANCEL**: Takes ~35 seconds with precompilation. ALWAYS wait for completion. |
| 17 | + - Precompiles 50+ dependencies including PrettyTables, Distributions, AxisKeys, etc. |
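The Julia 1.6+ requirement above can be checked in a script before instantiating. The following is a minimal sketch, not part of the repository: `version_at_least` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
# Hypothetical helper (not part of ParetoSmooth.jl): succeeds when the
# version in "$1" is greater than or equal to the version in "$2".
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
    local have="$1" want="$2"
    # Version-sort the two strings; if the required version sorts first
    # (or both are equal), the installed version is new enough.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

Assuming `julia --version` prints a line of the form `julia version X.Y.Z`, one possible use is `version_at_least "$(julia --version | awk '{print $3}')" 1.6 || echo "Julia 1.6+ required" >&2`.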
| 18 | + |
| 19 | +### Building and Testing |
| 20 | +- **NEVER CANCEL**: Test suite takes **2 minutes** to complete. Set timeout to 5+ minutes. |
| 21 | +- Run full test suite: |
| 22 | + ```bash |
| 23 | + julia --project=. -e "using Pkg; Pkg.test()" |
| 24 | + ``` |
| 25 | + - Includes comprehensive validation against R reference implementations |
| 26 | + - Tests PSIS algorithms, LOO-CV calculations, and Turing.jl integration |
| 27 | + - Contains 48 test cases across BasicTests, TuringTests, and ComparisonTests |
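One way to apply the "set timeout to 5+ minutes" advice is to wrap long-running commands in an explicit time limit instead of cancelling them by hand. This is a sketch under assumptions: `run_with_limit` is a hypothetical helper, and it relies on the GNU coreutils `timeout` command, which exits with status 124 when the limit is reached.

```bash
# Hypothetical helper (not part of ParetoSmooth.jl): run a command under a
# time limit and report clearly if the limit was hit.
# Assumes GNU coreutils `timeout`, which exits with status 124 on timeout.
run_with_limit() {
    local limit="$1"
    shift
    local status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "command exceeded the ${limit}s limit: $*" >&2
    fi
    # Propagate the command's own exit status (or 124 on timeout).
    return "$status"
}
```

For the test suite this could be invoked as `run_with_limit 300 julia --project=. -e "using Pkg; Pkg.test()"`, which gives the roughly 2-minute run a 5-minute ceiling.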
| 28 | + |
| 29 | +### Documentation Building |
| 30 | +- Setup documentation environment: |
| 31 | + ```bash |
| 32 | + julia --project=docs -e "using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()" |
| 33 | + ``` |
| 34 | +- Build documentation: |
| 35 | + ```bash |
| 36 | + julia --project=docs --color=yes docs/make.jl |
| 37 | + ``` |
| 38 | + - **NEVER CANCEL**: Takes ~15 seconds including dependency installation |
| 39 | + - Generates HTML documentation in `docs/build/` |
| 40 | + |
| 41 | +## Validation |
| 42 | + |
| 43 | +### Manual Testing Scenarios |
| 44 | +- **ALWAYS** test core PSIS functionality after making changes: |
| 45 | + ```julia |
| 46 | + using ParetoSmooth, Random |
| 47 | + Random.seed!(123) |
| 48 | + log_lik = rand(100, 200, 2) # synthetic stand-in values: 100 data points, 200 samples, 2 chains |
| 49 | + psis_result = psis(log_lik) |
| 50 | + loo_result = psis_loo(log_lik) |
| 51 | + println("PSIS-LOO completed successfully") |
| 52 | + ``` |
| 53 | + |
| 54 | +- **ALWAYS** validate Turing.jl integration when modifying extensions: |
| 55 | + ```julia |
| 56 | + # Test requires DynamicPPL and MCMCChains to be available |
| 57 | + # Refer to docs/src/turing.md for complete examples |
| 58 | + ``` |
| 59 | + |
| 60 | +### Testing Requirements |
| 61 | +- Run tests before committing: `julia --project=. -e "using Pkg; Pkg.test()"` |
| 62 | +- **CRITICAL**: NEVER CANCEL tests even if they appear to hang - they take 2 minutes |
| 63 | +- All 48 tests must pass for a valid build |
| 64 | +- Tests validate against R reference data in `test/data/` |
| 65 | + |
| 66 | +## Common Tasks |
| 67 | + |
| 68 | +### Package Structure |
| 69 | +``` |
| 70 | +ParetoSmooth.jl/ |
| 71 | +├── src/ # Core implementation |
| 72 | +│ ├── ParetoSmooth.jl # Main module file |
| 73 | +│ ├── ImportanceSampling.jl |
| 74 | +│ ├── LeaveOneOut.jl # PSIS-LOO implementation |
| 75 | +│ ├── ModelComparison.jl # Model comparison utilities |
| 76 | +│ └── ... |
| 77 | +├── ext/ # Package extensions |
| 78 | +│ ├── ParetoSmoothDynamicPPLExt.jl |
| 79 | +│ └── ParetoSmoothMCMCChainsExt.jl |
| 80 | +├── test/ |
| 81 | +│ ├── runtests.jl # Main test runner |
| 82 | +│ ├── tests/ # Test suites |
| 83 | +│ └── data/ # R reference data (.RData files) |
| 84 | +├── docs/ # Documentation |
| 85 | +├── Project.toml # Dependencies and metadata |
| 86 | +└── .github/workflows/ # CI configuration |
| 87 | +``` |
| 88 | + |
| 89 | +### Key Functions and APIs |
| 90 | +- `psis(log_likelihood)` - Pareto smoothed importance sampling |
| 91 | +- `psis_loo(log_likelihood)` - PSIS leave-one-out cross-validation |
| 92 | +- `loo_compare(models...)` - Compare multiple models |
| 93 | +- `pointwise_log_likelihoods(model, chain)` - Extract log-likelihoods from Turing models |
| 94 | + |
| 95 | +### Development Workflow |
| 96 | +1. Make changes to source files in `src/` |
| 97 | +2. Test changes: `julia --project=. -e "using Pkg; Pkg.test()"` (2 minutes) |
| 98 | +3. Update documentation if needed |
| 99 | +4. Validate with manual test scenarios |
| 100 | +5. Ensure all tests pass before committing |
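The workflow steps above can be chained so that a failure stops the sequence before anything is committed. The following is a minimal sketch rather than a script shipped with the repository; `run_steps` is a hypothetical helper that takes each step as a quoted command line.

```bash
# Hypothetical helper (not part of ParetoSmooth.jl): run each argument as a
# shell command, in order, and stop at the first one that fails.
run_steps() {
    local step
    for step in "$@"; do
        echo "==> ${step}"
        if ! bash -c "$step"; then
            echo "FAILED: ${step}" >&2
            return 1
        fi
    done
    echo "all steps passed"
}
```

Using the commands from this guide, a pre-commit check might look like `run_steps 'julia --project=. -e "using Pkg; Pkg.test()"' 'julia --project=docs --color=yes docs/make.jl'`. The documentation step is only needed when docs were changed.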
| 101 | + |
| 102 | +### Integration Points |
| 103 | +- **Turing.jl Integration**: Extensions handle model introspection and log-likelihood extraction |
| 104 | +- **MCMCChains.jl**: Support for MCMC chain analysis and diagnostics |
| 105 | +- **R Reference Data**: Tests validate against R's `loo` package implementations |
| 106 | + |
| 107 | +### Performance Expectations |
| 108 | +- Package loading: ~2 seconds |
| 109 | +- Basic PSIS calculation: <1 second for moderate datasets |
| 110 | +- Full LOO-CV: Scales with dataset size and MCMC samples |
| 111 | +- Large datasets (1000+ points, 4000+ samples): May take several minutes |
| 112 | + |
| 113 | +### Common Issues and Solutions |
| 114 | +- **"Package not found"** during doc build: Run `julia --project=docs -e "using Pkg; Pkg.develop(PackageSpec(path=pwd()))"` |
| 115 | +- **Test timeouts**: NEVER cancel - the test suite takes about 2 minutes; set the timeout to 5+ minutes |
| 116 | +- **High Pareto k warnings**: Expected behavior for certain datasets, not an error |
| 117 | +- **Memory issues**: PSIS can be memory-intensive for very large datasets |
| 118 | + |
| 119 | +### File Locations for Common Tasks |
| 120 | +- Core PSIS algorithm: `src/ImportanceSampling.jl` |
| 121 | +- LOO-CV implementation: `src/LeaveOneOut.jl` |
| 122 | +- Model comparison: `src/ModelComparison.jl` |
| 123 | +- Test reference data: `test/data/*.RData` |
| 124 | +- Turing examples: `docs/src/turing.md` |
| 125 | +- CI configuration: `.github/workflows/CI.yml` |
| 126 | + |
| 127 | +**Remember: This is a specialized statistical package. Always validate changes against reference implementations and ensure mathematical correctness.** |