# A-Lab Pipeline Configuration

The pipeline resolves each configuration value from three layers, in priority order:

```
1. Environment Variables (highest) → 2. YAML Files → 3. Code Defaults (fallback)
```
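
The lookup order can be sketched as a small resolver. This is a minimal illustration, not the code in `config_loader.py`. The function name `resolve_setting`, its parameters and the `(value, source)` return shape are hypothetical. They were chosen to mirror the "from ENV" / "from YAML" labels that the loader prints in the priority examples.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any, Optional, Tuple


def resolve_setting(
    env_var: str,
    yaml_value: Optional[Any],
    default: Any,
    environ: Mapping[str, str] = os.environ,
) -> Tuple[Any, str]:
    """Hypothetical resolver: environment variable > YAML value > code default.

    Returns the chosen value and a label for the layer it came from.
    """
    # 1. A non-empty environment variable wins (treating "" as unset is an assumption).
    env_value = environ.get(env_var)
    if env_value:
        return env_value, "ENV"
    # 2. Otherwise use the value from the YAML file, if it has one.
    if yaml_value is not None:
        return yaml_value, "YAML"
    # 3. Otherwise fall back to the hard-coded default.
    return default, "DEFAULT"


if __name__ == "__main__":
    fake_env = {"ALAB_MONGO_URI": "mongodb://production:27017/"}
    local = "mongodb://localhost:27017/"
    print(resolve_setting("ALAB_MONGO_URI", local, local, fake_env))
    # ('mongodb://production:27017/', 'ENV')
    print(resolve_setting("ALAB_MONGO_DB", "temporary", "temporary", fake_env))
    # ('temporary', 'YAML')
```

The function takes `environ` as a parameter (defaulting to `os.environ`). That design choice makes the priority order easy to exercise with a plain dict.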

## Quick Start

### Using Environment Variables (Recommended for Production)

**Option 1: Copy the example file**

```bash
# Copy the example .env file (it lists the current defaults)
cp data/config/env.example .env

# Edit .env and uncomment or modify values, then load it before running.
# `set -a` auto-exports every variable the file defines, so that
# ./update_data.sh (a child process) actually sees them.
set -a
source .env
set +a
./update_data.sh
```

**Option 2: Export directly**

```bash
# Set MongoDB connection
export ALAB_MONGO_URI="mongodb://production-host:27017/"
export ALAB_MONGO_DB="production"

# Set S3 bucket
export ALAB_S3_BUCKET="my-custom-bucket"

# Run the pipeline (it picks up the exported variables automatically)
./update_data.sh
```

### Using YAML Files (Recommended for Development)

Edit `data/config/defaults.yaml`:

```yaml
mongodb:
  uri: 'mongodb://localhost:27017/'
  database: 'my_database'
  collection: 'my_collection'
```
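
Settings in `defaults.yaml` are nested, so a loader has to walk the mapping to reach a value such as `mongodb.uri`. The sketch below assumes the file has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`). The helper name `get_nested` and the dotted-key convention are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def get_nested(data: Mapping[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Hypothetical helper: read 'mongodb.uri' style keys from parsed YAML."""
    node: Any = data
    for part in dotted_key.split("."):
        # Stop at the first missing level and fall back to the default.
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


# The dict that yaml.safe_load would return for the example file above.
parsed = {
    "mongodb": {
        "uri": "mongodb://localhost:27017/",
        "database": "my_database",
        "collection": "my_collection",
    }
}

print(get_nested(parsed, "mongodb.database"))       # my_database
print(get_nested(parsed, "s3.bucket", "fallback"))  # fallback
```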

### View Current Configuration

```bash
python data/config/config_loader.py
```

Running the loader directly prints every resolved value together with its source (environment variable, YAML file, or code default).

## Configuration Files

| File                 | Purpose                           |
| -------------------- | --------------------------------- |
| **defaults.yaml**    | Global pipeline defaults          |
| **filters.yaml**     | Experiment filter presets         |
| **analyses.yaml**    | Analysis plugin documentation     |
| **config_loader.py** | Configuration loading system      |
| **env.example**      | Example env file (copy to `.env`) |

## Environment Variables

### MongoDB

```bash
ALAB_MONGO_URI=mongodb://localhost:27017/   # MongoDB connection URI
ALAB_MONGO_DB=temporary                     # Database name
ALAB_MONGO_COLLECTION=release               # Collection name
```

### S3 Upload

```bash
ALAB_S3_BUCKET=materialsproject-contribs   # S3 bucket name
ALAB_S3_PREFIX=alab_synthesis              # Key prefix inside the bucket
ALAB_S3_EXCLUDE_LARGE=true                 # Skip files above the size threshold
ALAB_S3_LARGE_THRESHOLD_MB=50              # Size threshold for "large" files, in MB
```
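
Environment variables are always strings. Boolean and numeric settings such as `ALAB_S3_EXCLUDE_LARGE` and `ALAB_S3_LARGE_THRESHOLD_MB` therefore need converting before use. The sketch below shows one way to do that and to apply the threshold. Several details are assumptions, not the loader's documented behaviour: the accepted truthy spellings, treating 1 MB as 1024 * 1024 bytes, and the function names `env_bool` and `should_upload`.

```python
from __future__ import annotations

from collections.abc import Mapping


def env_bool(environ: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret common truthy strings; an unset variable falls back to default."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def should_upload(size_bytes: int, environ: Mapping[str, str]) -> bool:
    """Hypothetical filter: drop files above the threshold when exclusion is on."""
    if not env_bool(environ, "ALAB_S3_EXCLUDE_LARGE", True):
        return True
    threshold_mb = float(environ.get("ALAB_S3_LARGE_THRESHOLD_MB", "50"))
    # Assumption: MB means mebibytes (1024 * 1024 bytes).
    return size_bytes <= threshold_mb * 1024 * 1024


env = {"ALAB_S3_EXCLUDE_LARGE": "true", "ALAB_S3_LARGE_THRESHOLD_MB": "50"}
print(should_upload(10 * 1024 * 1024, env))  # True  (10 MB is under the limit)
print(should_upload(80 * 1024 * 1024, env))  # False (80 MB is over the limit)
```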

### Parquet Options

```bash
ALAB_SKIP_TEMP_LOGS=false         # Skip temperature logs
ALAB_SKIP_XRD_POINTS=false        # Skip XRD data points
ALAB_SKIP_WORKFLOW_TASKS=false    # Skip workflow tasks
ALAB_PARQUET_COMPRESSION=snappy   # Compression: snappy, gzip, brotli
ALAB_PARQUET_ENGINE=pyarrow       # Engine: pyarrow, fastparquet
```
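
The compression codec and engine accept only the values listed above. It can therefore help to reject typos when the configuration is read, rather than partway through a Parquet write. This is a minimal sketch that assumes validation happens at load time. The function name `parquet_options` and the choice of `ValueError` are illustrative only.

```python
from __future__ import annotations

from collections.abc import Mapping

# Allowed values as listed in the Parquet Options block (first entry = default).
ALLOWED = {
    "ALAB_PARQUET_COMPRESSION": ("snappy", "gzip", "brotli"),
    "ALAB_PARQUET_ENGINE": ("pyarrow", "fastparquet"),
}


def parquet_options(environ: Mapping[str, str]) -> dict[str, str]:
    """Hypothetical validator: return the chosen options or raise on a typo."""
    chosen: dict[str, str] = {}
    for name, allowed in ALLOWED.items():
        value = environ.get(name, allowed[0]).strip().lower()
        if value not in allowed:
            raise ValueError(f"{name}={value!r}; expected one of {list(allowed)}")
        chosen[name] = value
    return chosen


if __name__ == "__main__":
    print(parquet_options({"ALAB_PARQUET_COMPRESSION": "gzip"}))
```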

### Materials Project API

```bash
ALAB_MP_API_KEY=your_api_key   # MP API key (used for XRD analysis)
# OR
MP_API_KEY=your_api_key        # Alternative variable name
```
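
A lookup that accepts either name might look like the sketch below. This document does not state which name wins when both are set. Checking `ALAB_MP_API_KEY` first is an assumption, as is raising an error when neither is present. The function name `get_mp_api_key` is hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def get_mp_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical lookup: prefer ALAB_MP_API_KEY, fall back to MP_API_KEY."""
    for name in ("ALAB_MP_API_KEY", "MP_API_KEY"):
        value = environ.get(name)
        if value:
            return value
    raise RuntimeError("Set ALAB_MP_API_KEY or MP_API_KEY to run XRD analysis")


if __name__ == "__main__":
    print(get_mp_api_key({"MP_API_KEY": "abc123"}))  # abc123
```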

## Usage in Scripts

### Python

```python
# Assumes data/config is importable (e.g. run from that directory
# or add it to PYTHONPATH).
from config_loader import get_config

# Get configuration
config = get_config()

# Access values (comments show the default values)
print(config.mongo_uri)  # mongodb://localhost:27017/
print(config.mongo_db)   # temporary
print(config.s3_bucket)  # materialsproject-contribs

# Or use convenience functions
from config_loader import get_mongo_uri, get_s3_bucket

uri = get_mongo_uri()  # Resolved as env > YAML > default
bucket = get_s3_bucket()
```

### Shell Scripts

```bash
# Fall back to a default when the variable is unset or empty
: "${ALAB_MONGO_URI:=mongodb://localhost:27017/}"
export ALAB_MONGO_URI

# Or load a .env file, exporting every variable it defines
if [ -f data/.env ]; then
  set -a
  . data/.env
  set +a
fi
```
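
Python entry points launched without the shell wrapper cannot rely on `set -a`. Such a script can instead read the file itself. The sketch below is a hypothetical, deliberately simple parser: it handles `KEY=value` lines, full-line and inline comments, an optional `export` prefix, and quoted values. Existing environment variables take priority over the file, so the documented env-first order still holds. Libraries such as python-dotenv cover more edge cases. The names `parse_env_lines` and `apply_env` are illustrative.

```python
from __future__ import annotations

import os
import re
from collections.abc import Iterable, MutableMapping


def parse_env_lines(lines: Iterable[str]) -> dict[str, str]:
    """Hypothetical minimal parser for KEY=value lines in a .env file."""
    result: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment, or not an assignment
        key, raw = line.split("=", 1)
        raw = raw.strip()
        close = raw.find(raw[0], 1) if raw[:1] in ("'", '"') else -1
        if close != -1:
            value = raw[1:close]  # quoted: keep the contents verbatim
        elif raw.startswith("#"):
            value = ""  # "KEY= # note" leaves KEY empty, as in bash
        else:
            # Unquoted: as in bash, whitespace followed by '#' starts a comment.
            value = re.split(r"\s+#", raw, maxsplit=1)[0]
        result[key.strip()] = value
    return result


def apply_env(values: dict[str, str], environ: MutableMapping[str, str] = os.environ) -> None:
    """Set only variables that are not already defined, so real env vars still win."""
    for key, value in values.items():
        environ.setdefault(key, value)


if __name__ == "__main__":
    sample = [
        "# MongoDB settings",
        "ALAB_MONGO_URI=mongodb://localhost:27017/   # MongoDB connection URI",
        'export ALAB_S3_BUCKET="my-custom-bucket"',
    ]
    print(parse_env_lines(sample))
```

To load a real file, pass the open file object, e.g. `apply_env(parse_env_lines(handle))` inside a `with open("data/.env") as handle:` block.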

## Configuration Priority Examples

### Example 1: All from YAML

```bash
# No ALAB_* env vars set
$ python data/config/config_loader.py
MongoDB URI: mongodb://localhost:27017/ (from YAML)
```

### Example 2: Override with Env

```bash
# Set env var
$ export ALAB_MONGO_URI="mongodb://production:27017/"
$ python data/config/config_loader.py
MongoDB URI: mongodb://production:27017/ (from ENV) ✓
```

### Example 3: Mixed Sources

```bash
# URI from env, database name from YAML
$ export ALAB_MONGO_URI="mongodb://prod:27017/"   # Custom URI
$ unset ALAB_MONGO_DB                             # Fall back to the YAML value
$ python data/config/config_loader.py
MongoDB URI: mongodb://prod:27017/ (from ENV) ✓
MongoDB DB: temporary (from YAML)
```

## Best Practices

1. **Development**: Use `defaults.yaml` for local development
2. **Production**: Use environment variables, especially for sensitive values such as API keys and database credentials
3. **Testing**: Use env vars to point to test databases
4. **CI/CD**: Set env vars in your deployment pipeline
5. **Never commit** `.env` files (already in `.gitignore`)
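
For testing, an override can be scoped to a single child process instead of the whole shell. The sketch below copies the current environment, points `ALAB_MONGO_DB` at a test database and runs a child process. The child here is a stand-in one-line Python command that makes the effect visible; in practice it would be the pipeline script. The helper name `run_with_overrides` and the database name `alab_test` are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
import sys


def run_with_overrides(cmd: list[str], overrides: dict[str, str]) -> str:
    """Run cmd with extra env vars; the parent process environment is untouched."""
    child_env = {**os.environ, **overrides}
    result = subprocess.run(cmd, env=child_env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Stand-in child that just reports the database it would use.
    probe = [sys.executable, "-c", "import os; print(os.environ.get('ALAB_MONGO_DB'))"]
    print(run_with_overrides(probe, {"ALAB_MONGO_DB": "alab_test"}))  # alab_test
```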

## Troubleshooting

### Config not loading?

```bash
# Check current config
python data/config/config_loader.py

# Verify the variables are exported (env lists only exported variables)
env | grep ALAB_
```
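
A variable that was set but never exported is invisible to child processes. Checking from inside Python shows exactly what the pipeline's Python steps will see. This is a small sketch. The function name `visible_alab_vars` is hypothetical, and masking values whose names contain `KEY` is an assumption meant to keep API keys out of logs.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def visible_alab_vars(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return ALAB_* (and MP_API_KEY) variables this process sees, masking keys."""
    found: dict[str, str] = {}
    for name in sorted(environ):
        if name.startswith("ALAB_") or name == "MP_API_KEY":
            found[name] = "***" if "KEY" in name else environ[name]
    return found


if __name__ == "__main__":
    for name, value in visible_alab_vars().items():
        print(f"{name}={value}")
```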

### Want to use a .env file?

```bash
# Create from example
cp data/config/env.example .env

# Edit .env with your values (uncomment lines to override defaults)
nano .env

# Export everything it defines, then run the pipeline
set -a
source .env
set +a
./update_data.sh
```

### Reset to defaults

```bash
# Unset all ALAB_* env vars (MP_API_KEY has no prefix; unset it separately if needed)
unset $(env | grep '^ALAB_' | cut -d= -f1)

# Now uses YAML/defaults only
./update_data.sh
```
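
To run a single invocation against YAML and code defaults without unsetting anything in the current shell, a wrapper can pass the child process a filtered copy of the environment. This is a minimal sketch; the helper name `without_prefix` is hypothetical. Like the `grep '^ALAB_'` command, it leaves `MP_API_KEY` untouched.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def without_prefix(environ: Mapping[str, str], prefix: str = "ALAB_") -> dict[str, str]:
    """Copy of environ with every variable whose name starts with prefix removed."""
    return {name: value for name, value in environ.items() if not name.startswith(prefix)}


if __name__ == "__main__":
    clean = without_prefix(os.environ)
    # e.g. subprocess.run(["./update_data.sh"], env=clean, check=True)
    print(sorted(name for name in clean if name.startswith("ALAB_")))  # []
```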