Observatory Performance Configuration

This document describes the environment variables that configure the performance optimizations for Observatory data generation.

Phase 1 & Phase 2 Optimizations

Core Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `OBSERVATORY_CHUNK_SIZE` | `20` | Number of directories to process in each chunk. Smaller values use less memory but may take longer. |
| `ENABLE_INCREMENTAL_OBSERVATORY` | `false` | Enable incremental processing: only rebuild directories with changes since the last run. |
| `ENABLE_OBSERVATORY_GC` | `false` | Force garbage collection between chunks to manage memory usage. |
| `OBSERVATORY_MAX_MEMORY_MB` | `1024` | Maximum memory usage target in MB (for future monitoring features). |
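
As a minimal sketch of how a service might read these variables, the TypeScript below parses them with the defaults from the table. The `ObservatoryConfig` shape and the `loadObservatoryConfig`, `readPositiveInt`, and `readFlag` names are assumptions made for illustration, as is the choice to fall back to the default on an invalid value; the actual service may parse them differently.

```typescript
// Hypothetical config shape; the field names are illustrative only.
interface ObservatoryConfig {
  chunkSize: number;
  incremental: boolean;
  forceGc: boolean;
  maxMemoryMb: number;
}

type Env = Record<string, string | undefined>;

// Returns the value as a positive integer, or the fallback when the variable
// is unset, empty, non-numeric, fractional, or not greater than zero.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Treats only the string "true" (case-insensitive) as enabled.
function readFlag(raw: string | undefined): boolean {
  return (raw ?? "").trim().toLowerCase() === "true";
}

function loadObservatoryConfig(env: Env): ObservatoryConfig {
  return {
    chunkSize: readPositiveInt(env["OBSERVATORY_CHUNK_SIZE"], 20),
    incremental: readFlag(env["ENABLE_INCREMENTAL_OBSERVATORY"]),
    forceGc: readFlag(env["ENABLE_OBSERVATORY_GC"]),
    maxMemoryMb: readPositiveInt(env["OBSERVATORY_MAX_MEMORY_MB"], 1024),
  };
}
```

If the service runs on Node.js, this could be called as `loadObservatoryConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.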

Usage Examples

Production Environment (Large Dataset)

# Use smaller chunks and enable GC for memory-constrained environments
export OBSERVATORY_CHUNK_SIZE=10
export ENABLE_OBSERVATORY_GC=true
export OBSERVATORY_MAX_MEMORY_MB=512

Development Environment (Faster Processing)

# Use larger chunks for faster processing when memory is not a concern
export OBSERVATORY_CHUNK_SIZE=50
export ENABLE_INCREMENTAL_OBSERVATORY=true

Pre-production Environment (Incremental)

# Enable incremental processing to speed up daily runs
export ENABLE_INCREMENTAL_OBSERVATORY=true
export OBSERVATORY_CHUNK_SIZE=20

Performance Tuning Guidelines

Chunk Size Selection

Small chunks (5-10): Best for memory-constrained environments (under 2 GB RAM)
Medium chunks (15-25): Good balance for most production environments
Large chunks (30-50): Best for development or high-memory environments (over 8 GB RAM)
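
The ranges above can be expressed as a small lookup, which also makes the trade-off concrete: a smaller chunk size means more chunks for the same number of directories. This is an illustrative sketch. `suggestChunkSizeRange` and `chunkCount` are hypothetical helpers, and mapping everything between 2 GB and 8 GB to the medium range is an assumption, since the guidelines give RAM thresholds only for the small and large ranges.

```typescript
interface ChunkSizeRange {
  min: number;
  max: number;
}

// Maps available RAM (in GB) to the chunk size ranges from the guidelines.
// Assumption: anything from 2 GB to 8 GB inclusive uses the medium range.
function suggestChunkSizeRange(availableRamGb: number): ChunkSizeRange {
  if (availableRamGb < 2) {
    return { min: 5, max: 10 }; // memory-constrained
  }
  if (availableRamGb > 8) {
    return { min: 30, max: 50 }; // development or high-memory
  }
  return { min: 15, max: 25 }; // balanced
}

// Number of chunks needed to cover all directories at a given chunk size.
function chunkCount(totalDirectories: number, chunkSize: number): number {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  return Math.ceil(totalDirectories / chunkSize);
}
```

For 120 directories, a chunk size of 10 gives 12 chunks, 20 gives 6, and 50 gives 3.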

Incremental Processing

Enable when: You have daily or otherwise regular automated runs
Disable when: You run manually or need a guaranteed full data refresh
Note: The first run after enabling incremental processing is still a full run
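
The first-run behaviour in the note can be sketched as follows. This is not the service's change detection (that lives in `getChangedDirectoryIds()`, whose queries are not shown here); the `DirectoryRecord` shape, the `updatedAt` timestamp comparison, and the `selectDirectoriesToRebuild` name are assumptions made for illustration.

```typescript
// Hypothetical record shape; the real service may track changes differently.
interface DirectoryRecord {
  id: number;
  updatedAt: Date;
}

// Returns the directories to rebuild. When incremental mode is off, or no
// previous run has been recorded (null), every directory is returned, so the
// first incremental run is still a full run.
function selectDirectoriesToRebuild(
  directories: DirectoryRecord[],
  lastRunAt: Date | null,
  incrementalEnabled: boolean,
): DirectoryRecord[] {
  if (!incrementalEnabled || lastRunAt === null) {
    return directories.slice();
  }
  const cutoff = lastRunAt.getTime();
  // Keep only directories modified strictly after the previous run.
  return directories.filter((dir) => dir.updatedAt.getTime() > cutoff);
}
```

Returning a copy in the full-run case keeps the caller's array untouched if the result is later sorted or sliced into chunks.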

Garbage Collection

Enable when: Running in memory-constrained environments
Disable when: Processing speed matters more than memory usage
Note: Adds slight processing overhead but reduces the risk of memory issues
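
A sketch of a guarded collection step between chunks, assuming the service runs on Node.js. There, a `gc` function is present on the global object only when the process is started with `node --expose-gc`, so the helper checks for it before calling. The `maybeCollectGarbage` name is hypothetical, and how the service itself triggers collection is not documented here.

```typescript
// Runs the garbage collector if requested and available.
// Returns true only when a collection was actually triggered.
// Assumption: Node.js, where `gc` exists globally only under --expose-gc.
function maybeCollectGarbage(enabled: boolean): boolean {
  const gc = (globalThis as unknown as { gc?: () => void }).gc;
  if (!enabled || typeof gc !== "function") {
    return false;
  }
  gc();
  return true;
}
```

Calling `maybeCollectGarbage(true)` after each chunk is a no-op when the flag is missing, so the same code can run with or without `--expose-gc`.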

Database Indexes

Before enabling these optimizations, ensure the performance indexes are applied:

-- Apply indexes from performance-indexes.sql
source performance-indexes.sql;

Monitoring

The optimized Observatory service logs its progress to the console for monitoring:

Starting chunked data generation with chunk size: 20
Processing 120 directories in chunks of 20
Processing chunk 1/6
Processing chunk 2/6
...
Forced garbage collection after chunk 3
Chunked data generation completed
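
The log lines above follow from simple chunk arithmetic: 120 directories at a chunk size of 20 yields 6 chunks. A minimal sketch that reproduces the start, per-chunk, and completion lines is shown below. `processInChunks` and its callback parameters are hypothetical, the callback is synchronous only to keep the example short, and the garbage collection line is omitted.

```typescript
// Hypothetical driver: walks the ids chunk by chunk and emits log lines in
// the same format as the sample output.
function processInChunks(
  directoryIds: number[],
  chunkSize: number,
  processChunk: (ids: number[]) => void,
  log: (line: string) => void,
): void {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const totalChunks = Math.ceil(directoryIds.length / chunkSize);
  log(`Starting chunked data generation with chunk size: ${chunkSize}`);
  log(`Processing ${directoryIds.length} directories in chunks of ${chunkSize}`);
  for (let index = 0; index < totalChunks; index++) {
    log(`Processing chunk ${index + 1}/${totalChunks}`);
    const start = index * chunkSize;
    // The last chunk may be shorter than chunkSize.
    processChunk(directoryIds.slice(start, start + chunkSize));
  }
  log("Chunked data generation completed");
}
```

Passing the logger as a parameter (for example `console.log`) makes it straightforward to capture the lines in a test or forward them to another sink.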

Troubleshooting

Memory Issues

If you still encounter memory issues:

Reduce OBSERVATORY_CHUNK_SIZE to 5-10
Set ENABLE_OBSERVATORY_GC=true
Consider running during off-peak hours

Performance Issues

If processing is too slow:

Increase OBSERVATORY_CHUNK_SIZE to 30-50
Set ENABLE_OBSERVATORY_GC=false
Ensure database indexes are applied

Data Inconsistency

If incremental processing misses changes:

Run one manual generation: generateData(true)
Check the change detection queries in getChangedDirectoryIds()
Temporarily disable incremental processing by setting ENABLE_INCREMENTAL_OBSERVATORY=false

Migration Path

From Legacy to Optimized

Phase 1: Apply database indexes
Phase 2: Enable chunked processing with default settings
Phase 3: Fine-tune chunk sizes based on your environment
Phase 4: Enable incremental processing for regular automated runs

The legacy getDataLegacy() method remains available in case a rollback is needed.
