Commit bb7dde7

Merge branch 'dev' of https://github.com/NASA-IMPACT/COSMOS into 3000-add-curated-urls-column-to-homepage
2 parents 21a921a + 7348a76 commit bb7dde7

24 files changed (+513, -178 lines)

RELEASE_NOTES.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
# COSMOS Release Notes
2+
## v3.0.0 from v2.0.1
3+
4+
COSMOS v3.0.0 introduces several major architectural changes that fundamentally enhance the system's capabilities. The primary feature is a new website reindexing system that allows COSMOS to stay up to date with changes on source websites, addressing a key limitation of previous versions, in which websites could only be scraped once. This release also includes comprehensive updates to the data models, frontend interface, rule creation system, and backend processing, along with fixes for bugs present in v2.0.1.
5+
6+
The Environmental Justice (EJ) system has been significantly expanded, growing from fewer than 100 manually curated datasets to approximately 1,000 datasets through the integration of machine learning classification of NASA CMR records. This expansion is supported by a new modular processing suite that generates and extracts metadata using Subject Matter Expert (SME) criteria.
7+
8+
To support future machine learning integration, COSMOS now implements a two-column system that allows a field to hold both an ML-generated classification and a manual curator override. The system is integrated into the data models, serializers, and APIs, so automated and human-curated values coexist under clear precedence rules, with manual overrides taking priority.
9+
10+
To ensure reliability and maintainability of these major changes, this release includes extensive testing coverage with 213 new tests spanning URL processing, pattern management, Environmental Justice functionality, workflow triggers, and data migrations. Additionally, we've added comprehensive documentation across 15 new README files that cover everything from fundamental pattern system concepts to detailed API specifications and ML integration guidelines.
11+
12+
13+
### Major Features
14+
15+
#### Reindexing System
16+
- **New Data Models**: Introduced DumpUrl, DeltaUrl, and CuratedUrl to support the reindexing workflow
17+
- **Automated Workflows**:
18+
- New process to calculate deltas, deletions, and additions during migration
19+
- Automatic promotion of DeltaUrls to CuratedUrls
20+
- Status-based triggers for data ingestion and processing
21+
- **Duplicate Prevention**: System now prevents duplicate patterns and URLs
22+
- **Enhanced Frontend**:
23+
- Added reindexing status column to collection and URL list pages
24+
- New deletion tracking column on URL list page
25+
- Updated collection list to display delta URL counts
26+
- Improved URL list page accessibility via delta URL count
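The reindexing workflow compares a fresh scrape against the curated set and promotes the differences. The sketch below illustrates that idea in plain Python, assuming URLs are keyed by address and compared by value; `UrlRecord`, `compute_deltas`, and `promote` are hypothetical names standing in for the DumpUrl, DeltaUrl, and CuratedUrl models, not COSMOS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlRecord:
    """Hypothetical stand-in for one URL row (e.g. a DumpUrl or CuratedUrl)."""

    url: str
    title: str


def compute_deltas(
    dump: dict[str, UrlRecord], curated: dict[str, UrlRecord]
) -> tuple[dict[str, UrlRecord], dict[str, UrlRecord], dict[str, UrlRecord]]:
    """Compare a fresh scrape (dump) with the curated set, keyed by URL.

    Returns (additions, modifications, deletions).
    """
    additions = {u: r for u, r in dump.items() if u not in curated}
    # Present in both, but some field changed since the last curation.
    modifications = {u: r for u, r in dump.items() if u in curated and curated[u] != r}
    deletions = {u: r for u, r in curated.items() if u not in dump}
    return additions, modifications, deletions


def promote(
    curated: dict[str, UrlRecord],
    additions: dict[str, UrlRecord],
    modifications: dict[str, UrlRecord],
    deletions: dict[str, UrlRecord],
) -> dict[str, UrlRecord]:
    """Apply the deltas to produce the next curated set (inputs are not mutated)."""
    promoted = dict(curated)
    promoted.update(additions)
    promoted.update(modifications)
    for url in deletions:
        promoted.pop(url, None)
    return promoted
```

Keeping the deltas as separate collections, rather than overwriting the curated set directly, is what lets them be reviewed (and counted on the collection list) before promotion.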
27+
28+
#### Pattern System Improvements
29+
- Complete modularization of the pattern system
30+
- Enhanced handling of edge cases including overlapping patterns
31+
- Improved unapply logic
32+
- Inclusion rules now work (previously non-functional)
33+
- Pattern precedence system: most specific pattern takes priority, with pattern length as tiebreaker
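The precedence rule can be illustrated with a short sketch. It assumes glob-style patterns matched with `fnmatch` and treats "most specific" as "matches the fewest URLs in the collection"; both are assumptions for illustration, and `winning_pattern` is a hypothetical helper rather than the COSMOS implementation.

```python
from fnmatch import fnmatchcase
from typing import Optional


def matches(pattern: str, url: str) -> bool:
    # Assumption: patterns are exact URLs or shell-style globs using "*".
    return fnmatchcase(url, pattern)


def winning_pattern(url: str, patterns: list[str], all_urls: list[str]) -> Optional[str]:
    """Return the pattern that should govern `url`, or None if none match.

    Assumed precedence: the pattern matching the fewest URLs in the
    collection is the most specific; ties go to the longer pattern.
    """
    candidates = [p for p in patterns if matches(p, url)]
    if not candidates:
        return None

    def precedence_key(pattern: str) -> tuple[int, int]:
        match_count = sum(1 for u in all_urls if matches(pattern, u))
        return (match_count, -len(pattern))  # fewer matches first, then longer

    return min(candidates, key=precedence_key)
```

Note that `fnmatch` also treats `?` and `[...]` as wildcards, so URLs containing query strings would need escaping (for example with `glob.escape`) before being used as exact-match patterns.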
34+
35+
#### Environmental Justice (EJ) Enhancement
36+
- Expanded from 92 manual datasets to 1,063 ML-classified NASA CMR records
37+
- New modular processing suite for metadata generation
38+
- Enhanced API with multiple data sources:
39+
- Spreadsheet (original manual classifications)
40+
- ML Production
41+
- ML Testing
42+
- Combined (ML production with spreadsheet overrides)
43+
- Custom processing suite for CMR metadata extraction
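One way to read the "Combined" data source is as the ML production records with each manually classified dataset replacing its ML counterpart. The sketch below assumes records are dictionaries keyed by dataset ID and that overrides replace whole records; `combine_sources` and the `source` tag are hypothetical.

```python
def combine_sources(
    ml_production: dict[str, dict], spreadsheet: dict[str, dict]
) -> dict[str, dict]:
    """Merge EJ classifications keyed by dataset ID (hypothetical record shape)."""
    # Start from the ML production records, tagging where each one came from.
    combined = {
        dataset_id: {**record, "source": "ml_production"}
        for dataset_id, record in ml_production.items()
    }
    for dataset_id, record in spreadsheet.items():
        # Assumption: a manually classified dataset replaces its ML record outright,
        # and spreadsheet-only datasets are included as well.
        combined[dataset_id] = {**record, "source": "spreadsheet"}
    return combined
```

A field-by-field merge would be an equally plausible reading; the notes above do not specify which granularity the API uses.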
44+
45+
#### Infrastructure Updates
46+
- Streamlined database backup and restore
47+
- Optimized Docker builds
48+
- Fixed LetsEncrypt staging issues
49+
- Modified Traefik timeouts for long-running jobs
50+
- Updated Sinequa worker configuration:
51+
- Reduced worker count to 3 for neural workload optimization
52+
- Added neural indexing to all webcrawlers
53+
- Removed deprecated version mappings
54+
55+
#### API Enhancements
56+
- New endpoints for curated and delta URLs:
57+
- GET /curated-urls-api/<str:config_folder>/
58+
- GET /delta-urls-api/<str:config_folder>/
59+
- Backwards compatibility through remapped CandidateUrl endpoint
60+
- Updated Environmental Justice API with new data source parameter
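A client for the new URL endpoints only needs the collection's config folder. The sketch below assumes a hypothetical deployment at `https://cosmos.example.gov` and a JSON response body; `url_list_endpoint` and `fetch_json` are illustrative helpers, and any authentication or pagination the endpoints use is not handled.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://cosmos.example.gov"  # hypothetical deployment URL


def url_list_endpoint(base_url: str, config_folder: str, delta: bool = False) -> str:
    """Build the curated- or delta-URL endpoint for one collection."""
    prefix = "delta-urls-api" if delta else "curated-urls-api"
    # Escape the folder name so it stays a single path segment.
    return f"{base_url.rstrip('/')}/{prefix}/{quote(config_folder, safe='')}/"


def fetch_json(url: str, timeout: float = 30.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(url_list_endpoint(BASE_URL, "example_collection"))` would request the curated URLs for a hypothetical `example_collection` folder.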
61+
62+
### Technical Improvements
63+
64+
#### Two-Column System
65+
- New architecture to support dual ML/manual classifications
66+
- Seamless integration with models, serializers, and APIs
67+
- Prioritization system for manual overrides
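The two-column idea can be sketched as a value object that stores the ML output and the curator override side by side and resolves them on read. `TwoColumnField` is a hypothetical illustration, not the COSMOS model or serializer code; it assumes `None` means "no override".

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class TwoColumnField(Generic[T]):
    """Hypothetical pairing of an ML-generated value with a curator override."""

    ml_value: Optional[T] = None
    manual_value: Optional[T] = None

    @property
    def value(self) -> Optional[T]:
        # A manual override wins whenever one is set, even if it is falsy (e.g. "").
        return self.manual_value if self.manual_value is not None else self.ml_value

    def clear_override(self) -> None:
        # Fall back to the ML value again.
        self.manual_value = None

    def to_dict(self) -> dict:
        # Expose both columns plus the resolved value, as an API response might.
        return {"ml": self.ml_value, "manual": self.manual_value, "value": self.value}
```

Keeping both columns, instead of overwriting the ML value, means a later ML run can refresh `ml_value` without discarding a curator's decision.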
68+
69+
#### Testing
70+
Added 213 new tests across multiple areas:
71+
- URL APIs and processing (19 tests)
72+
- Delta and pattern management (31 tests)
73+
- Environmental Justice API (7 tests)
74+
- Environmental Justice mappings and thresholding (58 tests)
75+
- Workflow and status triggers (10 tests)
76+
- Migration and promotion processes (31 tests)
77+
- Field modifications and TDAMM tags (25 tests)
78+
- Additional system functionality (30 tests)
79+
80+
81+
#### Documentation
82+
Added comprehensive documentation across 15 READMEs covering:
83+
- Pattern system fundamentals and examples
84+
- Reindexing statuses and triggers
85+
- Model lifecycles and testing procedures
86+
- URL inclusion/exclusion logic
87+
- Environmental Justice classifier and API
88+
- ML column functionality
89+
- SQL dump restoration
90+
91+
### Bug Fixes
92+
- Fixed non-functional includes
93+
- Resolved pagination issues for patterns (previously limited to 50)
94+
- Eliminated ability to create duplicate URLs and patterns
95+
- Corrected faulty unapply logic for modification patterns
96+
- Fixed overlapping-pattern logic that did not produce repeatable results
97+
- Long-running jobs can now complete without timing out
98+
99+
### UI Updates
100+
- Renamed application from "SDE Indexing Helper" to "COSMOS"
101+
- Refactored collection list code for easier column management
102+
- Enhanced URL list page with new status and deletion tracking
103+
- Improved navigation through delta URL count integration
104+
105+
### Administrative Changes
106+
- Added new admin panels for enhanced system management
107+
- Updated installation requirements
108+
- Enhanced database backup and restore functionality

SQLDumpRestoration.md

Lines changed: 108 additions & 1 deletion
@@ -82,4 +82,111 @@ docker-compose -f local.yml up
8282
docker-compose -f local.yml run --rm django python manage.py createsuperuser
8383
```
8484

85-
8. Log in to the SDE Indexing Helper frontend to ensure that all data has been correctly populated in the UI.
85+
8. Log in to the COSMOS frontend to ensure that all data has been correctly populated in the UI.
86+
87+
88+
89+
# Making the backup
90+
91+
```bash
92+
ssh sde
93+
cat .envs/.production/.postgres
94+
```
95+
96+
Find the values of the following variables:
97+
POSTGRES_HOST=sde-indexing-helper-db.c3cr2yyh5zt0.us-east-1.rds.amazonaws.com
98+
POSTGRES_PORT=5432
99+
POSTGRES_DB=postgres
100+
POSTGRES_USER=postgres
101+
POSTGRES_PASSWORD=<POSTGRES_PASSWORD>
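These variables live in a simple `KEY=VALUE` file, so they can also be read programmatically, for example to script the steps below. `read_env_file` and `load_env_file` are hypothetical helpers that assume one assignment per line with no quoting or `export` prefix.

```python
from pathlib import Path


def read_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path: str) -> dict[str, str]:
    """Read and parse an env file such as .envs/.production/.postgres."""
    return read_env_file(Path(path).read_text())
```

For example, `load_env_file(".envs/.production/.postgres")["POSTGRES_HOST"]` returns the database host.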
102+
103+
```bash
104+
docker ps
105+
```
106+
107+
In this example, the container ID is `b3fefa2c19fb`.
108+
109+
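Replace the placeholders in the generic command with your own values. Because the production database is a remote RDS instance, the second form also passes its host with `-h` (and `-W` to prompt for the password):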
110+
```bash
111+
docker exec -t your_postgres_container_id pg_dump -U your_postgres_user -d your_database_name > backup.sql
112+
```
113+
```bash
114+
docker exec -t container_id pg_dump -h host -U user -d database -W > prod_backup.sql
115+
```
116+
117+
docker exec -t b3fefa2c19fb env PGPASSWORD="<POSTGRES_PASSWORD>" pg_dump -h sde-indexing-helper-db.c3cr2yyh5zt0.us-east-1.rds.amazonaws.com -U postgres -d postgres > prod_backup.sql
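The `docker exec` command above embeds the database password in a shell command line, where it ends up in shell history. Below is a sketch of the same dump driven from Python: the command is built as an argument list (no shell) and the password is passed with `docker exec -e`. The password still appears in the host's process list while the command runs. `pg_dump_argv` and `run_dump` are hypothetical helpers, and `-t` is dropped because no terminal is needed when the output goes to a file.

```python
import subprocess
from pathlib import Path


def pg_dump_argv(container_id: str, host: str, user: str, database: str, password: str) -> list[str]:
    """Build the docker exec + pg_dump command as an argument list (no shell involved)."""
    return [
        "docker", "exec",
        "-e", f"PGPASSWORD={password}",  # libpq inside the container reads PGPASSWORD
        container_id,
        "pg_dump", "-h", host, "-U", user, "-d", database,
    ]


def run_dump(argv: list[str], output: Path) -> None:
    """Stream pg_dump's stdout into a local file; raise if the command fails."""
    with output.open("wb") as fh:
        subprocess.run(argv, stdout=fh, check=True)
```

With values taken from `.envs/.production/.postgres`, a call might look like `run_dump(pg_dump_argv("b3fefa2c19fb", host, "postgres", "postgres", password), Path("prod_backup.sql"))`.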
118+
119+
# Moving the backup to your local machine
120+
Go back to your local computer and copy the file with `scp`:
121+
122+
```bash
123+
scp sde:/home/ec2-user/sde_indexing_helper/prod_backup.sql .
124+
```
125+
To copy the backup on to the staging server, run `scp prod_backup.sql sde_staging:/home/ec2-user/sde-indexing-helper`.
126+
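If you have trouble transferring the file, use `rsync` instead; `-P` keeps partially transferred files and shows progress, so an interrupted transfer can be resumed: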
127+
rsync -avzP prod_backup.sql sde_staging:/home/ec2-user/sde-indexing-helper/
128+
129+
# Restoring the backup
130+
Bring down the local containers, start only the Postgres service, and list the running containers:
131+
```bash
132+
docker-compose -f local.yml down
133+
docker-compose -f local.yml up -d postgres
134+
docker ps
135+
```
136+
137+
Find the Postgres container ID in the `docker ps` output; in this example it is:
138+
139+
c11d7bae2e56
140+
141+
Find the connection variables for the target database in:
142+
cat .envs/.production/.postgres
143+
POSTGRES_HOST=sde-indexing-helper-staging-db.c3cr2yyh5zt0.us-east-1.rds.amazonaws.com
144+
POSTGRES_PORT=5432
145+
POSTGRES_DB=sde_staging
146+
POSTGRES_USER=postgres
147+
POSTGRES_PASSWORD=<POSTGRES_PASSWORD>
148+
149+
150+
```bash
151+
docker exec -it <container id> bash
152+
```
153+
docker exec -it c11d7bae2e56 bash
154+
155+
## Drop and recreate the target database
156+
157+
158+
psql -U <POSTGRES_USER> -d <POSTGRES_DB>
159+
psql -U postgres -d sde_staging
160+
Or, if you are connecting from one of the servers:
161+
psql -h sde-indexing-helper-staging-db.c3cr2yyh5zt0.us-east-1.rds.amazonaws.com -U postgres -d postgres
162+
163+
\c postgres
164+
DROP DATABASE sde_staging;
165+
CREATE DATABASE sde_staging;
166+
167+
# Load the backup
168+
169+
```bash
170+
docker cp prod_backup.sql c11d7bae2e56:/
171+
docker exec -it c11d7bae2e56 bash
172+
```
173+
174+
```bash
175+
psql -U <POSTGRES_USER> -d <POSTGRES_DB> -f prod_backup.sql
176+
```
177+
psql -U <POSTGRES_USER> -d sde_indexing_helper -f prod_backup.sql
178+
179+
psql -h sde-indexing-helper-staging-db.c3cr2yyh5zt0.us-east-1.rds.amazonaws.com -U postgres -d postgres -f prod_backup.sql
180+
Note: `pg_restore` cannot read this plain-text dump; it only accepts archive-format dumps (for example, those created with `pg_dump -Fc`). Load `prod_backup.sql` with `psql -f` as shown above.
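`pg_dump` writes a plain SQL script by default, which must be loaded with `psql`; `pg_restore` only accepts archive formats such as the custom format produced by `pg_dump -Fc`. The sketch below picks the tool by checking for the `PGDMP` header that custom-format archives start with. `restore_command` is a hypothetical helper; tar and directory archives are not detected and fall through to `psql`.

```python
from pathlib import Path


def restore_command(dump: Path, host: str, user: str, database: str) -> list[str]:
    """Choose psql for plain-text dumps and pg_restore for custom-format archives."""
    with dump.open("rb") as fh:
        magic = fh.read(5)
    if magic == b"PGDMP":  # custom-format archives (pg_dump -Fc) begin with this header
        return ["pg_restore", "-h", host, "-U", user, "-d", database, str(dump)]
    # Anything else is treated as a plain SQL script.
    return ["psql", "-h", host, "-U", user, "-d", database, "-f", str(dump)]
```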
181+
182+
183+
184+
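docker-compose -f local.yml down
185+
186+
docker-compose -f local.yml up -d --build
187+
188+
docker-compose -f local.yml run --rm django python manage.py migrate
189+
190+
docker-compose -f local.yml down
191+
192+
docker-compose -f local.yml up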

compose/production/django/start

Lines changed: 5 additions & 1 deletion
@@ -7,4 +7,8 @@ set -o nounset
77

88
python /app/manage.py collectstatic --noinput
99

10-
exec /usr/local/bin/gunicorn config.wsgi --bind 0.0.0.0:5000 --chdir=/app
10+
exec /usr/local/bin/gunicorn config.wsgi \
11+
--bind 0.0.0.0:5000 \
12+
--chdir=/app \
13+
--timeout 600 \
14+
--graceful-timeout 600
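The same Gunicorn settings can also be kept in a Python config file loaded with `gunicorn -c`. This is a hypothetical alternative to the flags in `start`, not a file in this commit; the 600-second values mirror the Traefik `respondingTimeouts` so that neither layer cuts off a long-running request before the other.

```python
# gunicorn.conf.py: hypothetical equivalent of the command-line flags above.
# Usage (sketch): gunicorn -c /app/gunicorn.conf.py config.wsgi
bind = "0.0.0.0:5000"
chdir = "/app"

# Workers silent for longer than this (seconds) are killed and restarted.
timeout = 600
# After a restart signal, workers get this long (seconds) to finish in-flight requests.
graceful_timeout = 600
```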

compose/production/traefik/traefik.yml

Lines changed: 15 additions & 0 deletions
@@ -10,13 +10,28 @@ entryPoints:
1010
redirections:
1111
entryPoint:
1212
to: web-secure
13+
transport:
14+
respondingTimeouts:
15+
readTimeout: "600s"
16+
writeTimeout: "600s"
17+
idleTimeout: "600s"
1318

1419
web-secure:
1520
# https
1621
address: ":443"
22+
transport:
23+
respondingTimeouts:
24+
readTimeout: "600s"
25+
writeTimeout: "600s"
26+
idleTimeout: "600s"
1727

1828
flower:
1929
address: ":5555"
30+
transport:
31+
respondingTimeouts:
32+
readTimeout: "600s"
33+
writeTimeout: "600s"
34+
idleTimeout: "600s"
2035

2136
certificatesResolvers:
2237
letsencrypt:

config/urls.py

Lines changed: 3 additions & 3 deletions
@@ -4,9 +4,9 @@
44
from django.urls import include, path
55
from django.views import defaults as default_views
66

7-
admin.site.site_header = "SDE Indexing Helper Administration" # default: "Django Administration"
8-
admin.site.index_title = "SDE Indexing Helper" # default: "Site administration"
9-
admin.site.site_title = "SDE Indexing Helper" # default: "Django site admin"
7+
admin.site.site_header = "COSMOS Administration" # default: "Django Administration"
8+
admin.site.index_title = "COSMOS" # default: "Site administration"
9+
admin.site.site_title = "COSMOS" # default: "Django site admin"
1010

1111
urlpatterns = [
1212
path("", include("sde_collections.urls", namespace="sde_collections")),

config/wsgi.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
11
"""
2-
WSGI config for SDE Indexing Helper project.
2+
WSGI config for COSMOS.
33
44
This module contains the WSGI application used by Django's development server
55
and any production WSGI deployments. It should expose a module-level variable

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
2828

2929
# -- Project information -----------------------------------------------------
3030

31-
project = "SDE Indexing Helper"
31+
project = "COSMOS"
3232
copyright = """2023, NASA IMPACT"""
3333
author = "NASA IMPACT"
3434

File renamed without changes.
