A multi-source weather data engineering platform built on Lambda Architecture principles. It collects, processes, stores, and analyzes aviation weather data from NOAA, AWS, and potentially other providers.
This project consists of two major components:
noakweather-platform: Source-agnostic weather data platform with Lambda Architecture implementation:
- weather-common: Shared models and interfaces (source-agnostic)
- weather-ingestion: Universal data collection and S3 upload (Speed Layer)
- weather-processing: Stream and batch processing (Batch Layer)
- weather-storage: Multi-backend storage (Snowflake, DynamoDB, S3) with Phase 4 GSI implementation
- weather-analytics: Universal analytics and reporting (Serving Layer)
- weather-infrastructure: AWS CDK infrastructure as code
noakweather-legacy: Original NOAA-specific METAR/TAF decoder (maintained for reference and gradual migration)
The platform implements Lambda Architecture to handle both real-time and batch processing:
- Speed Layer: Real-time ingestion of weather data from multiple sources → S3 → DynamoDB with time-bucket GSI
- Batch Layer: Historical data processing and reprocessing
- Serving Layer: Unified query interface combining real-time and batch views
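As a rough illustration of how a serving layer can combine the two views, the sketch below merges a batch view with a speed-layer view and keeps the most recent record per station. The `ServingLayerSketch` class, the `Observation` record, and the `merge` method are hypothetical names chosen for this example; they are not the platform's actual API.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a serving-layer merge; not the platform's actual API. */
final class ServingLayerSketch {

    /** Minimal stand-in for a weather record (assumed shape). */
    record Observation(String stationId, Instant observedAt, String rawText) {}

    /**
     * Returns the latest observation per station, taking whichever view
     * (batch or speed) holds the more recent record.
     */
    static Map<String, Observation> merge(List<Observation> batchView,
                                          List<Observation> speedView) {
        Map<String, Observation> latest = new LinkedHashMap<>();
        for (List<Observation> view : List.of(batchView, speedView)) {
            for (Observation obs : view) {
                // Keep the newer of the stored and incoming record for this station.
                latest.merge(obs.stationId(), obs,
                        (stored, incoming) ->
                                incoming.observedAt().isAfter(stored.observedAt())
                                        ? incoming : stored);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Observation> batch = List.of(
                new Observation("KCLT", Instant.parse("2021-12-28T00:52:00Z"), "batch record"));
        List<Observation> speed = List.of(
                new Observation("KCLT", Instant.parse("2021-12-28T01:52:00Z"), "speed record"));
        System.out.println(merge(batch, speed).get("KCLT").rawText());
    }
}
```

Resolving conflicts by observation time is only one possible reconciliation rule; a real serving layer might instead prefer the batch view once it has caught up.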
- Java 17+: Modern Java features and performance
- Maven: Multi-module build management
- AWS Services: S3, Lambda, DynamoDB, CloudWatch
- Snowflake: Data warehouse for analytics
- JUnit 5: Comprehensive testing framework
- JaCoCo: Code coverage analysis
- SonarQube: Code quality and security scanning
- Log4j2/Logback: Enterprise logging with centralized configuration
- GitHub Actions: CI/CD pipeline
- LocalStack: Local DynamoDB testing with Testcontainers
METAR (Meteorological Aerodrome Report) is a format for reporting current weather conditions in aviation. A typical METAR report contains the location, report issue time, wind, visibility, clouds, weather phenomena, temperature, dewpoint, and atmospheric pressure.
Example METAR:
2021/12/28 01:52 KCLT 280152Z 22006KT 10SM BKN240 17/13 A2989 RMK AO2 SLP116 T01720133
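To make the structure of the example concrete, here is a minimal sketch that pulls the wind, temperature/dewpoint, and altimeter groups out of a METAR line. `MetarSketch` and its `Summary` record are hypothetical names for illustration and are not the project's parser; the sketch only handles wind in knots and ignores visibility, clouds, and remarks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: decodes a few METAR groups; not the project's parser. */
final class MetarSketch {
    // Wind: 3-digit direction (or VRB), 2-3 digit speed, optional gust, in knots.
    private static final Pattern WIND =
            Pattern.compile("(\\d{3}|VRB)(\\d{2,3})(?:G\\d{2,3})?KT");
    // Temperature/dewpoint in whole degrees Celsius; an M prefix means negative.
    private static final Pattern TEMPERATURE = Pattern.compile("(M?\\d{2})/(M?\\d{2})");
    // Altimeter setting in hundredths of inches of mercury.
    private static final Pattern ALTIMETER = Pattern.compile("A(\\d{4})");

    /** Fields are null when the corresponding group is absent. */
    record Summary(String windDirection, Integer windSpeedKt,
                   Integer temperatureC, Integer dewpointC, Double altimeterInHg) {}

    static Summary parse(String report) {
        String windDirection = null;
        Integer windSpeedKt = null;
        Integer temperatureC = null;
        Integer dewpointC = null;
        Double altimeterInHg = null;
        // Match whole whitespace-separated tokens so a date prefix is not misread.
        for (String token : report.trim().split("\\s+")) {
            if (token.equals("RMK")) {
                break; // remarks follow; they are not decoded in this sketch
            }
            Matcher wind = WIND.matcher(token);
            Matcher temperature = TEMPERATURE.matcher(token);
            Matcher altimeter = ALTIMETER.matcher(token);
            if (wind.matches()) {
                windDirection = wind.group(1);
                windSpeedKt = Integer.parseInt(wind.group(2));
            } else if (temperature.matches()) {
                temperatureC = signed(temperature.group(1));
                dewpointC = signed(temperature.group(2));
            } else if (altimeter.matches()) {
                altimeterInHg = Integer.parseInt(altimeter.group(1)) / 100.0;
            }
        }
        return new Summary(windDirection, windSpeedKt, temperatureC, dewpointC, altimeterInHg);
    }

    private static int signed(String value) {
        return value.startsWith("M")
                ? -Integer.parseInt(value.substring(1))
                : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "2021/12/28 01:52 KCLT 280152Z 22006KT 10SM BKN240 17/13 A2989 RMK AO2 SLP116 T01720133"));
    }
}
```

For the example above, the sketch reads wind from 220 degrees at 6 knots, temperature 17 °C, dewpoint 13 °C, and an altimeter setting of 29.89 inHg.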
TAF (Terminal Aerodrome Forecast) is a weather forecast report format used in aviation. TAF reports provide trends and changes in visibility, wind, clouds, and weather over periods of time.
Example TAF:
2021/12/28 02:52 TAF AMD KCLT 280150Z 2802/2906 21006KT P6SM SCT040 BKN150 FM281100 22005KT P6SM SCT008
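The "changes over periods of time" in a TAF can be sketched by splitting the report into forecast periods: the base forecast starts at the validity group (`2802/2906`, day 28 02Z to day 29 06Z) and each `FM` group (`FM281100`, from day 28 at 1100Z) opens a new period. `TafSketch` and `Period` below are hypothetical names for illustration, not the project's parser, and other change groups such as TEMPO, BECMG, and PROB are simply left inside the current period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: splits a TAF into forecast periods; not the project's parser. */
final class TafSketch {
    // Validity period: start day/hour and end day/hour, e.g. 2802/2906.
    private static final Pattern VALIDITY =
            Pattern.compile("(\\d{2})(\\d{2})/(\\d{2})(\\d{2})");
    // FM change group: day, hour, minute, e.g. FM281100.
    private static final Pattern FROM_GROUP =
            Pattern.compile("FM(\\d{2})(\\d{2})(\\d{2})");

    /** One forecast period: when it starts and the raw elements forecast for it. */
    record Period(String startsAt, List<String> elements) {}

    static List<Period> periods(String taf) {
        List<Period> result = new ArrayList<>();
        List<String> current = null; // stays null until the validity group is seen
        for (String token : taf.trim().split("\\s+")) {
            Matcher validity = VALIDITY.matcher(token);
            Matcher from = FROM_GROUP.matcher(token);
            if (current == null && validity.matches()) {
                // The base forecast begins at the start of the validity period.
                current = new ArrayList<>();
                result.add(new Period(
                        "day " + validity.group(1) + " " + validity.group(2) + "00Z", current));
            } else if (current != null && from.matches()) {
                // An FM group closes the previous period and opens a new one.
                current = new ArrayList<>();
                result.add(new Period(
                        "day " + from.group(1) + " " + from.group(2) + from.group(3) + "Z", current));
            } else if (current != null) {
                current.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        periods("2021/12/28 02:52 TAF AMD KCLT 280150Z 2802/2906 21006KT P6SM SCT040 BKN150 "
                + "FM281100 22005KT P6SM SCT008").forEach(System.out::println);
    }
}
```

For the example TAF this yields two periods: the base forecast from day 28 at 0200Z (`21006KT P6SM SCT040 BKN150`) and the change from day 28 at 1100Z (`22005KT P6SM SCT008`).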
- Java 17 or higher
- Maven 3.8+
- AWS CLI configured (for deployment)
- Snowflake account (for data warehouse features)
# Clone the repository
git clone https://github.com/bclasky1539/noakweather-engineering-pipeline.git
cd noakweather-engineering-pipeline
# Build and test legacy module
cd noakweather-legacy
./wethb.sh # Build
./wetht.sh # Test with coverage
mvn clean install
# Build and test platform modules
cd ../noakweather-platform
./wethb.sh # Build
./wetht.sh # Test with coverage
mvn clean install
# Build entire project from root
cd ..
mvn clean install
# Run tests with coverage
mvn test jacoco:report
# View coverage report
open target/site/jacoco/index.html
# Run SonarQube analysis
mvn clean verify sonar:sonar \
-Dsonar.organization=bclasky1539 \
-Dsonar.host.url=https://sonarcloud.io \
-Dsonar.login=$SONAR_TOKEN
- AWS IAM User Setup for DynamoDB - Complete guide for creating AWS IAM users with DynamoDB permissions
  - IAM user creation and permission setup
  - Access key generation and secure storage
  - AWS credentials file configuration
  - Security best practices and troubleshooting
- S3 Bucket Setup - Comprehensive guide for configuring S3 buckets for dual-storage weather data
  - AWS CLI and Console bucket creation
  - Lifecycle policies for cost optimization (30-day retention, Glacier archival)
  - Bucket structure and date partitioning examples
  - Security best practices (encryption, public access blocking)
  - Environment variable configuration and troubleshooting
- Single Station Integration Test - Step-by-step guide for testing dual-storage NOAA data ingestion
  - Pre-flight checklist (AWS credentials, S3 access, Maven build)
  - Test execution for KCLT (Charlotte Douglas International)
  - Validation commands for raw text and JSON files
  - Success criteria and verification steps
  - Troubleshooting common issues
- Logging Configuration Setup - Centralized logging configuration for multi-module projects
  - Log4j2 master configuration
  - Maven resources plugin setup
  - Environment variable configuration
  - Log rotation and retention policies
- Phase 4 GSI Deployment Guide - Zero-downtime DynamoDB GSI deployment
  - Pre-deployment checklist
  - Step-by-step deployment instructions
  - Rollback procedures
  - Performance benchmarks (50x improvement)
- Code Standards - Comprehensive coding standards and best practices
  - Package organization and architecture principles
  - Naming conventions and code structure
  - Error handling patterns and testing standards
  - Git workflow and quality metrics
  - Continuous integration requirements
- Weather Format References - METAR/TAF format specifications
  - Official ICAO and FAA standards
  - Complete format structure diagrams
  - Weather element reference guide
  - Live data feeds and validation tools
  - Parsing considerations and implementation notes
- Architecture Decisions - Lambda Architecture design patterns
  - Speed Layer: Real-time data ingestion
  - Batch Layer: Historical data processing
  - Serving Layer: Query interface design
- DynamoDB Repository API (weather-storage module)
  - CRUD operations for weather data
  - Time-bucket GSI query methods
  - Batch operations and statistics
  - Integration with AWS SDK v2
- Parser API (weather-processing module)
  - Universal parser interface
  - NOAA METAR/TAF parsers
  - Parse result handling
  - Error handling patterns
Performance Improvements:
- 50x faster time-range queries using the time-bucket-index GSI
- Hourly time buckets for optimal query performance
- Backward-compatible table scan fallback
- Zero-downtime deployment support
Technical Details:
GSI Schema:
- Index Name: time-bucket-index
- Partition Key: time_bucket (String, "YYYY-MM-DD-HH")
- Sort Key: observation_time (Number, epoch seconds)
- Projection: ALL
- Billing: On-demand
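The key layout above can be illustrated with a small sketch that derives the `time_bucket` partition key from an observation time and lists the hourly buckets a time-range query would need to visit. `TimeBucketSketch` and its method names are hypothetical, and the use of UTC for the bucket boundaries is an assumption; the schema above only specifies the "YYYY-MM-DD-HH" format.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of hourly time-bucket keys; UTC is an assumption. */
final class TimeBucketSketch {
    private static final DateTimeFormatter BUCKET_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH").withZone(ZoneOffset.UTC);

    /** Partition key value ("YYYY-MM-DD-HH") for one observation. */
    static String bucketFor(Instant observationTime) {
        return BUCKET_FORMAT.format(observationTime);
    }

    /** Sort key value: the observation time as epoch seconds. */
    static long sortKeyFor(Instant observationTime) {
        return observationTime.getEpochSecond();
    }

    /** All hourly buckets that overlap the inclusive range [start, end]. */
    static List<String> bucketsBetween(Instant start, Instant end) {
        List<String> buckets = new ArrayList<>();
        Instant cursor = start.truncatedTo(ChronoUnit.HOURS);
        while (!cursor.isAfter(end)) {
            buckets.add(bucketFor(cursor));
            cursor = cursor.plus(1, ChronoUnit.HOURS);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2021-12-28T01:52:00Z");
        Instant end = Instant.parse("2021-12-28T03:10:00Z");
        System.out.println(bucketFor(start));
        System.out.println(bucketsBetween(start, end));
    }
}
```

A time-range query would then issue one index query per bucket, using the epoch-second sort key to trim the first and last bucket to the requested range.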
Query Performance:
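- Table Scan: O(n), where n is the total number of items in the table - ~200ms for 10,000 items
- GSI Query: O(m), where m is the number of items in the queried time buckets - ~4ms for the same result (50x faster)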
Deployment Strategy:
- Deploy code with GSI support + fallback → Works immediately using table scan
- Add GSI to production table → ~5 minutes to create
- Queries automatically switch to GSI → 50x performance improvement
- Zero downtime throughout entire process
See Phase 4 Deployment Guide for details.
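The fallback behavior in the deployment strategy can be pictured with a minimal sketch: the same call uses the table scan until the index is reported active, then switches to the GSI query without any change for the caller. `GsiFallbackSketch`, `findByTimeRange`, and the supplier-based parameters are hypothetical and only illustrate the idea; how the repository actually detects the index is not described here.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of the GSI-with-fallback pattern; not the repository's code. */
final class GsiFallbackSketch {

    /**
     * Runs the index query when the GSI is active and falls back to the
     * (slower) table scan otherwise. All three arguments are assumptions
     * standing in for real DynamoDB calls.
     */
    static <T> List<T> findByTimeRange(BooleanSupplier indexIsActive,
                                       Supplier<List<T>> gsiQuery,
                                       Supplier<List<T>> tableScan) {
        return indexIsActive.getAsBoolean() ? gsiQuery.get() : tableScan.get();
    }

    public static void main(String[] args) {
        AtomicBoolean gsiActive = new AtomicBoolean(false);
        Supplier<List<String>> gsiQuery = () -> List.of("result via GSI query");
        Supplier<List<String>> tableScan = () -> List.of("result via table scan");

        // Step 1: code deployed, index not created yet, so the scan path is used.
        System.out.println(findByTimeRange(gsiActive::get, gsiQuery, tableScan));

        // Steps 2-3: once the index becomes ACTIVE, the same call uses it.
        gsiActive.set(true);
        System.out.println(findByTimeRange(gsiActive::get, gsiQuery, tableScan));
    }
}
```

Because both paths return the same result type, callers are unaffected by the switch, which is what allows the index to be added without downtime.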
This project follows a phased migration approach:
- Phase 1 (Complete): Multi-module structure with platform foundation
- Phase 2 (Complete): NOAA models and parsers
- Phase 3 (Complete): Universal ingestion layer with S3 upload
- Phase 4 (Complete): DynamoDB storage with time-bucket GSI and comprehensive testing
- Phase 5 (Next): Analytics and serving layer
- Phase 6 (Planned): Additional data sources
- Phase 7 (Planned): Legacy deprecation
The legacy decoder retrieves METAR and TAF data from NOAA or local files.
Parameters:
- Type: m (METAR) or t (TAF)
- Source: 4-letter ICAO code (e.g., KCLT) or file:filename.txt
- Print: Y or N
- Logging: I (Info), W (Warnings), D (Debug)
Example:
cd noakweather-legacy
./weth.sh m KCLT Y I
cd noakweather-platform/weather-storage
# Add time-bucket-index GSI to production table
mvn exec:java -Dexec.mainClass="weather.storage.tools.AddGSIsToAwsTable"
# Expected output:
# ✓ Table status verified: ACTIVE
# ✓ No existing GSI found (safe to add)
# ✓ Creating time-bucket-index GSI...
# ✓ Waiting for GSI to become ACTIVE...
# ✓ GSI deployment successful!
# ✓ Query performance improved 50x
See AWS IAM User Setup Guide for AWS credentials setup.
Current Status (v1.13.0-SNAPSHOT):
- Total Tests: 221 (weather-storage) + additional tests in other modules
- Code Coverage: 90%+ overall (DynamoDB repository ~90%, parsers 85%+)
- Build Time: ~21 seconds for weather-storage module
- Lines of Code: ~15,000+ lines across platform modules
- Zero Failures: All tests passing
Test Infrastructure:
- LocalStack for DynamoDB testing
- Testcontainers for container management
- JUnit 5 with AssertJ assertions
- Comprehensive integration and unit tests
- Create a feature branch from main
- Make your changes following the code standards
- Ensure all tests pass and coverage meets requirements
- Submit a pull request
Apache License 2.0 - See LICENSE for details
Active Development - Phase 4 Complete, Phase 5 In Progress
Phase 4 Complete (January 2026)
- DynamoDB time-bucket GSI implementation
- 50x performance improvement on time-range queries
- Zero-downtime deployment support
- Comprehensive integration test suite (221 tests)
- Production-ready with complete documentation
Phase 5 - Analytics & Serving Layer
- Query interface combining real-time + batch views
- Analytics dashboard
- API endpoints for weather data access
- Real-time + batch view reconciliation
Maintainer: Brian Clasky (quark95cos@noayok.com)
Resources: