Commit e51ce45

add release notes file
1 parent 0ad2dc7 commit e51ce45

1 file changed

+108
-0
lines changed

RELEASE_NOTES.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# COSMOS Release Notes
## v3.0.0 from v2.0.1

COSMOS v3.0.0 introduces several major architectural changes. The primary feature is a new website reindexing system that lets COSMOS stay up to date with changes to source websites, addressing a key limitation of previous versions, in which a website could only be scraped once. This release includes updates to the data models, frontend interface, rule creation system, and backend processing, along with some bug fixes from v2.0.1.

The Environmental Justice (EJ) system has been significantly expanded, growing from fewer than 100 manually curated datasets to approximately 1,000 datasets through machine learning classification of NASA CMR records. This expansion is supported by a new modular processing suite that generates and extracts metadata using Subject Matter Expert (SME) criteria.

To support future machine learning integration, COSMOS now implements a two-column system that allows fields to hold both ML-generated classifications and manual curator overrides. This system is integrated into the data models, serializers, and APIs, so that automated and human-curated data can coexist under clear precedence rules.

To ensure the reliability and maintainability of these changes, this release adds 213 new tests spanning URL processing, pattern management, Environmental Justice functionality, workflow triggers, and data migrations. Additionally, we've added documentation across 15 new README files, covering everything from fundamental pattern system concepts to detailed API specifications and ML integration guidelines.


### Major Features

#### Reindexing System
- **New Data Models**: Introduced DumpUrl, DeltaUrl, and CuratedUrl to support the reindexing workflow
- **Automated Workflows**:
  - New process to calculate deltas, deletions, and additions during migration
  - Automatic promotion of DeltaUrls to CuratedUrls
  - Status-based triggers for data ingestion and processing
- **Duplicate Prevention**: System now prevents duplicate patterns and URLs
- **Enhanced Frontend**:
  - Added reindexing status column to collection and URL list pages
  - New deletion tracking column on URL list page
  - Updated collection list to display delta URL counts
  - Improved URL list page accessibility via delta URL count
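
The delta calculation and promotion steps listed under Automated Workflows can be illustrated with a minimal sketch. This is not the COSMOS implementation: `DeltaResult`, `compute_deltas`, and `promote` are hypothetical names, and URLs are reduced to a plain `url -> title` mapping, which is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaResult:
    """Hypothetical container for the outcome of one reindexing comparison."""
    additions: dict[str, str] = field(default_factory=dict)      # new url -> title
    deletions: set[str] = field(default_factory=set)             # urls missing from the dump
    modifications: dict[str, str] = field(default_factory=dict)  # url -> changed title


def compute_deltas(curated: dict[str, str], dump: dict[str, str]) -> DeltaResult:
    """Compare a fresh dump (url -> title) against the curated set (url -> title)."""
    result = DeltaResult()
    for url, title in dump.items():
        if url not in curated:
            result.additions[url] = title
        elif curated[url] != title:
            result.modifications[url] = title
    # Anything curated but absent from the new dump counts as a deletion.
    result.deletions = set(curated) - set(dump)
    return result


def promote(curated: dict[str, str], deltas: DeltaResult) -> dict[str, str]:
    """Apply the deltas to the curated set, returning the new curated set."""
    promoted = {url: title for url, title in curated.items() if url not in deltas.deletions}
    promoted.update(deltas.additions)
    promoted.update(deltas.modifications)
    return promoted


curated = {"https://example.com/a": "A", "https://example.com/b": "B", "https://example.com/c": "C"}
dump = {"https://example.com/a": "A", "https://example.com/b": "B (updated)", "https://example.com/d": "D"}
deltas = compute_deltas(curated, dump)
new_curated = promote(curated, deltas)
```

In this sketch, promoting the deltas always reproduces the contents of the latest dump, which is the property a reindexing cycle is meant to guarantee.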

#### Pattern System Improvements
- Complete modularization of the pattern system
- Enhanced handling of edge cases including overlapping patterns
- Improved unapply logic
- Functional inclusion rules
- Pattern precedence system: most specific pattern takes priority, with pattern length as tiebreaker
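
The precedence rule can be sketched as follows. The notes do not define "most specific", so this example assumes it means the pattern that matches the fewest URLs in the collection, with the longer pattern string winning ties. The function names are hypothetical, and `fnmatchcase` wildcards stand in for whatever matching COSMOS actually uses.

```python
from fnmatch import fnmatchcase
from typing import Optional


def matched_urls(pattern: str, urls: list[str]) -> list[str]:
    """Return the URLs in the collection that a wildcard pattern matches."""
    return [url for url in urls if fnmatchcase(url, pattern)]


def winning_pattern(url: str, patterns: list[str], urls: list[str]) -> Optional[str]:
    """Pick the pattern that takes priority for one URL.

    Assumed rule: fewest matched URLs wins (most specific);
    on a tie, the longer pattern string wins.
    """
    candidates = [p for p in patterns if fnmatchcase(url, p)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (len(matched_urls(p, urls)), -len(p)))


urls = [
    "https://example.com/docs/a.html",
    "https://example.com/docs/b.html",
    "https://example.com/blog/c.html",
]
patterns = ["https://example.com/*", "https://example.com/docs/*", "*/docs/*", "*a.html"]

winner_a = winning_pattern("https://example.com/docs/a.html", patterns, urls)
winner_b = winning_pattern("https://example.com/docs/b.html", patterns, urls)
```

Here `*a.html` wins for the first URL because it matches only one URL. For the second URL, `*/docs/*` and `https://example.com/docs/*` both match two URLs, so the longer pattern wins the tiebreak.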

#### Environmental Justice (EJ) Enhancement
- Expanded from 89 manual datasets to 1063 ML-classified NASA CMR records
- New modular processing suite for metadata generation
- Enhanced API with multiple data sources:
  - Spreadsheet (original manual classifications)
  - ML Production
  - ML Testing
  - Combined (ML production with spreadsheet overrides)
- Custom processing suite for CMR metadata extraction
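
As a rough illustration of the Combined data source described above, the sketch below merges ML production records with spreadsheet records so that the spreadsheet entry wins whenever both exist. The record shape (a dataset identifier mapped to a single label), the function name, and the choice to include spreadsheet-only records in the result are all assumptions, not details taken from the COSMOS code.

```python
def combine_sources(ml_production: dict[str, str], spreadsheet: dict[str, str]) -> dict[str, str]:
    """Merge two sources keyed by a hypothetical dataset identifier.

    Assumed behaviour: start from the ML production records, then let
    manually curated spreadsheet records override or extend them.
    """
    combined = dict(ml_production)  # copy so the inputs are left untouched
    combined.update(spreadsheet)    # spreadsheet entries take precedence
    return combined


ml_production = {"C1": "climate", "C2": "water"}
spreadsheet = {"C2": "disasters", "C3": "health"}
combined = combine_sources(ml_production, spreadsheet)
```

In this example `C2` takes its value from the spreadsheet, `C1` keeps its ML classification, and `C3` appears only because the sketch assumes spreadsheet-only records are included.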

#### Infrastructure Updates
- Streamlined database backup and restore
- Optimized Docker builds
- Fixed LetsEncrypt staging issues
- Modified Traefik timeouts for long-running jobs
- Updated Sinequa worker configuration:
  - Reduced worker count to 3 for neural workload optimization
  - Added neural indexing to all webcrawlers
  - Removed deprecated version mappings

#### API Enhancements
- New endpoints for curated and delta URLs:
  - GET /curated-urls-api/<str:config_folder>/
  - GET /delta-urls-api/<str:config_folder>/
- Backwards compatibility through remapped CandidateUrl endpoint
- Updated Environmental Justice API with new data source parameter
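
A client for the two new endpoints might look like the sketch below. Only the two path prefixes come from these notes. The base URL, the helper names, and the assumption that responses are JSON are illustrative and should be checked against the actual API documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Path prefixes taken from the release notes; everything else is assumed.
ENDPOINT_PREFIXES = {
    "curated": "curated-urls-api",
    "delta": "delta-urls-api",
}


def endpoint_url(base_url: str, kind: str, config_folder: str) -> str:
    """Build the GET URL for a collection's curated or delta URLs."""
    if kind not in ENDPOINT_PREFIXES:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    folder = quote(config_folder, safe="")  # escape characters that are unsafe in a path segment
    return f"{base_url.rstrip('/')}/{ENDPOINT_PREFIXES[kind]}/{folder}/"


def fetch_urls(base_url: str, kind: str, config_folder: str, timeout: float = 30.0):
    """Fetch and decode the response, assuming the endpoint returns JSON."""
    with urlopen(endpoint_url(base_url, kind, config_folder), timeout=timeout) as response:
        return json.load(response)


# "https://cosmos.example.org" and "my_collection" are placeholders.
curated_endpoint = endpoint_url("https://cosmos.example.org/", "curated", "my_collection")
delta_endpoint = endpoint_url("https://cosmos.example.org", "delta", "my_collection")
```

`fetch_urls` is defined but not called here, since the example host is a placeholder.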

### Technical Improvements

#### Two-Column System
- New architecture to support dual ML/manual classifications
- Seamless integration with models, serializers, and APIs
- Prioritization system for manual overrides
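
The idea behind the two-column system can be shown with a minimal sketch: each field keeps one ML-generated value and one manual value, and the manual value takes precedence when present. The class and attribute names are hypothetical, and a plain dataclass stands in for the real data models and serializers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoColumnField:
    """Hypothetical field that stores an ML value alongside a manual override."""
    ml_value: Optional[str] = None
    manual_value: Optional[str] = None

    @property
    def effective(self) -> Optional[str]:
        """Return the manual override if a curator set one, else the ML value."""
        if self.manual_value is not None:
            return self.manual_value
        return self.ml_value


ml_only = TwoColumnField(ml_value="Documentation")
overridden = TwoColumnField(ml_value="Documentation", manual_value="Software")
```

Keeping both values means a curator's override never destroys the ML output, so the classifier can be re-run or evaluated later without losing manual decisions.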

#### Testing
Added 213 new tests across multiple areas:
- URL APIs and processing (19 tests)
- Delta and pattern management (31 tests)
- Environmental Justice API (7 tests)
- Environmental Justice mappings and thresholding (58 tests)
- Workflow and status triggers (10 tests)
- Migration and promotion processes (31 tests)
- Field modifications and TDAMM tags (25 tests)
- Additional system functionality (30 tests)


#### Documentation
Added comprehensive documentation across 15 READMEs covering:
- Pattern system fundamentals and examples
- Reindexing statuses and triggers
- Model lifecycles and testing procedures
- URL inclusion/exclusion logic
- Environmental Justice classifier and API
- ML column functionality
- SQL dump restoration

### Bug Fixes
- Fixed non-functional includes
- Resolved pagination issues for patterns (previously limited to 50)
- Removed the ability to create duplicate URLs and patterns
- Corrected faulty unapply logic for modification patterns
- Fixed unrepeatable logic for overlapping patterns
- Long-running jobs now complete without timing out

### UI Updates
- Renamed application from "SDE Indexing Helper" to "COSMOS"
- Refactored collection list code for easier column management
- Enhanced URL list page with new status and deletion tracking
- Improved navigation through delta URL count integration

### Administrative Changes
- Added new admin panels for enhanced system management
- Updated installation requirements
- Enhanced database backup and restore functionality
