Commit cbadf23

Merge branch 'dev' of https://github.com/NASA-IMPACT/COSMOS into 1052-update-cosmos-to-create-jobs-for-scrapers-and-indexers

2 parents: 92c118d + e54f94b
File tree: 75 files changed (+10266 −709 lines)


.envs/.local/.django

10 additions, 2 deletions:

```diff
@@ -33,9 +33,17 @@ SINEQUA_CONFIGS_REPO_WEBAPP_PR_BRANCH='dummy_branch'
 # Slack Webhook
 # ------------------------------------------------------------------------------
 SLACK_WEBHOOK_URL=''
-LRM_USER=''
-LRM_PASSWORD=''
+
+#Server Credentials
+#--------------------------------------------------------------------------------
+LRM_DEV_USER=''
+LRM_DEV_PASSWORD=''
 XLI_USER=''
 XLI_PASSWORD=''
 LRM_QA_USER=''
 LRM_QA_PASSWORD=''
+
+#Server Tokens
+#--------------------------------------------------------------------------------
+LRM_DEV_TOKEN=''
+XLI_TOKEN=''
```

README.md

22 additions, 1 deletion:

````diff
@@ -82,7 +82,7 @@ $ docker-compose -f local.yml run --rm django python manage.py loaddata sde_coll
 Navigate to the server running prod, then to the project folder. Run the following command to create a backup:
 
 ```bash
-docker-compose -f production.yml run --rm --user root django python manage.py dumpdata --natural-foreign --natural-primary --exclude=contenttypes --exclude=auth.Permission --indent 2 --output /app/backups/prod_backup-20240812.json
+docker-compose -f production.yml run --rm --user root django python manage.py dumpdata --natural-foreign --natural-primary --exclude=contenttypes --exclude=auth.Permission --indent 2 --output /app/backups/prod_backup-20241114.json
 ```
 This will have saved the backup in a folder outside of the docker container. Now you can copy it to your local machine.
@@ -208,3 +208,24 @@ Eventually, job creation will be done seamlessly by the webapp. Until then, edit
 - JavaScript: `/sde_indexing_helper/static/js`
 - CSS: `/sde_indexing_helper/static/css`
 - Images: `/sde_indexing_helper/static/images`
+
+## Running Long Scripts on the Server
+```shell
+tmux new -s docker_django
+```
+Once you are inside, you can run a Django shell or, for example, a management command:
+
+```shell
+docker-compose -f production.yml run --rm django python manage.py deduplicate_urls
+```
+
+Later, you can reattach to the session:
+```shell
+tmux attach -t docker_django
+```
+
+To delete the session:
+```shell
+tmux kill-session -t docker_django
+```
````

compose/production/traefik/traefik.yml

2 additions, 2 deletions:

```diff
@@ -31,7 +31,7 @@ certificatesResolvers:
 http:
   routers:
     web-secure-router:
-      rule: "Host(`sde-indexing-helper.nasa-impact.net`)"
+      rule: 'Host(`{{ env "TRAEFIK_DOMAIN" }}`)'
       entryPoints:
         - web-secure
       middlewares:
@@ -42,7 +42,7 @@ http:
       certResolver: letsencrypt
 
     flower-secure-router:
-      rule: "Host(`sde-indexing-helper.nasa-impact.net`)"
+      rule: 'Host(`{{ env "TRAEFIK_DOMAIN" }}`)'
       entryPoints:
         - flower
       service: flower
```
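With the host rule now resolved from a Go template, the Traefik container's environment must provide the domain at startup. A minimal sketch of that deployment step, assuming the variable name from the template above; the domain value is the previously hard-coded host, shown as a placeholder:

```shell
# TRAEFIK_DOMAIN feeds the `{{ env "TRAEFIK_DOMAIN" }}` template in traefik.yml.
# Set it in the environment docker-compose passes to the traefik service.
export TRAEFIK_DOMAIN=sde-indexing-helper.nasa-impact.net
echo "Traefik will route host: ${TRAEFIK_DOMAIN}"
```

If the variable is unset, the template expands to an empty host and the routers match nothing, so this export belongs in the production env file or the compose service definition.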

config/settings/base.py

4 additions, 2 deletions:

```diff
@@ -343,7 +343,9 @@
 SLACK_WEBHOOK_URL = env("SLACK_WEBHOOK_URL")
 XLI_USER = env("XLI_USER")
 XLI_PASSWORD = env("XLI_PASSWORD")
-LRM_USER = env("LRM_USER")
-LRM_PASSWORD = env("LRM_PASSWORD")
+LRM_DEV_USER = env("LRM_DEV_USER")
+LRM_DEV_PASSWORD = env("LRM_DEV_PASSWORD")
 LRM_QA_USER = env("LRM_QA_USER")
 LRM_QA_PASSWORD = env("LRM_QA_PASSWORD")
+LRM_DEV_TOKEN = env("LRM_DEV_TOKEN")
+XLI_TOKEN = env("XLI_TOKEN")
```
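Each `env("…")` call here fails at startup if the variable is missing, which is why the new `LRM_DEV_*` and token entries must also land in the env files. A minimal stand-in for that lookup behavior; the `env` helper and the token value below are illustrative, not project code:

```python
import os


def env(name, default=None):
    """Minimal stand-in for django-environ's env(): error out when unset."""
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"Set the {name} environment variable")
    return value


# Hypothetical value, for illustration only
os.environ["LRM_DEV_TOKEN"] = "example-token"
LRM_DEV_TOKEN = env("LRM_DEV_TOKEN")
print(LRM_DEV_TOKEN)  # → example-token
```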

environmental_justice/README.md

New file, 86 additions:

# Environmental Justice API

## Overview

This API provides access to Environmental Justice data from multiple sources. It supports retrieving data from individual sources or as a combined dataset with defined precedence rules.

## Endpoints

### GET /api/environmental-justice/

Retrieves environmental justice data based on the specified data source.

#### Query Parameters

| Parameter | Description | Default | Options |
|-------------|--------------------|------------|----------------------------------------------------------|
| data_source | Data source filter | "combined" | "spreadsheet", "ml_production", "ml_testing", "combined" |

#### Data Source Behavior

1. **Single Source**
   - `?data_source=spreadsheet`: Returns only spreadsheet data
   - `?data_source=ml_production`: Returns only ML production data
   - `?data_source=ml_testing`: Returns only ML testing data

2. **Combined Data** (Default)
   - Access via `?data_source=combined` or no parameter
   - Merges data from 'spreadsheet' and 'ml_production' sources
   - Precedence rules:
     - If the same dataset exists in both sources, the spreadsheet version is used
     - Unique datasets from ml_production are included
     - ML testing data is not included in the combined view

#### Example Requests

```bash
# Get combined data (default)
GET /api/environmental-justice/

# Get combined data (explicit)
GET /api/environmental-justice/?data_source=combined

# Get only spreadsheet data
GET /api/environmental-justice/?data_source=spreadsheet

# Get only ML production data
GET /api/environmental-justice/?data_source=ml_production

# Get only ML testing data
GET /api/environmental-justice/?data_source=ml_testing
```

#### Response Fields

Each record includes the following fields:

- dataset
- description
- description_simplified
- indicators
- intended_use
- latency
- limitations
- project
- source_link
- strengths
- format
- geographic_coverage
- data_visualization
- spatial_resolution
- temporal_extent
- temporal_resolution
- sde_link
- data_source

## Data Source Definitions

- **spreadsheet**: Primary source data from environmental justice spreadsheets
- **ml_production**: Production machine learning processed data
- **ml_testing**: Testing/staging machine learning processed data

## Precedence Rules

When retrieving combined data:

1. If a dataset exists in both spreadsheet and ml_production:
   - The spreadsheet version takes precedence
   - The ml_production version is excluded
2. Datasets unique to ml_production are included in the response
3. ML testing data is never included in combined results
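The precedence rules above amount to a dictionary merge keyed by `dataset`, where spreadsheet rows overwrite ml_production rows. A minimal sketch of the documented behavior — an illustration, not the viewset's actual implementation:

```python
def combine(spreadsheet_rows, ml_production_rows):
    """Merge rows keyed by 'dataset'; the spreadsheet version wins on conflicts."""
    combined = {row["dataset"]: row for row in ml_production_rows}
    # Spreadsheet rows overwrite any ml_production row with the same dataset
    combined.update({row["dataset"]: row for row in spreadsheet_rows})
    return list(combined.values())


rows = combine(
    [{"dataset": "a", "data_source": "spreadsheet"}],
    [
        {"dataset": "a", "data_source": "ml_production"},
        {"dataset": "b", "data_source": "ml_production"},
    ],
)
# Dataset "a" keeps its spreadsheet version; "b" is unique to ml_production
print({row["dataset"]: row["data_source"] for row in rows})
```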
New migration file, 52 additions:

```python
# Generated by Django 4.2.9 on 2024-11-23 03:18

from django.db import migrations, models


def migrate_destination_server_to_data_source(apps, schema_editor):
    EnvironmentalJusticeRow = apps.get_model("environmental_justice", "EnvironmentalJusticeRow")

    # Migrate prod to spreadsheet
    EnvironmentalJusticeRow.objects.filter(destination_server="prod").update(
        data_source="spreadsheet", destination_server=""
    )

    # Migrate dev to ml_production
    EnvironmentalJusticeRow.objects.filter(destination_server="dev").update(
        data_source="ml_production", destination_server=""
    )

    # Migrate test to ml_testing
    EnvironmentalJusticeRow.objects.filter(destination_server="test").update(
        data_source="ml_testing", destination_server=""
    )


class Migration(migrations.Migration):

    dependencies = [
        ("environmental_justice", "0005_environmentaljusticerow_destination_server"),
    ]

    operations = [
        migrations.AddField(
            model_name="environmentaljusticerow",
            name="data_source",
            field=models.CharField(
                blank=True,
                choices=[
                    ("spreadsheet", "Spreadsheet"),
                    ("ml_production", "ML Production"),
                    ("ml_testing", "ML Testing"),
                ],
                default="",
                max_length=20,
                verbose_name="Data Source",
            ),
        ),
        migrations.RunPython(migrate_destination_server_to_data_source, reverse_code=migrations.RunPython.noop),
        migrations.RemoveField(
            model_name="environmentaljusticerow",
            name="destination_server",
        ),
    ]
```
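The three `filter(...).update(...)` calls apply one value rename per old `destination_server` value. That mapping, pulled out as a plain dict for illustration (not part of the migration module):

```python
# destination_server -> data_source renames performed by the data migration
DESTINATION_TO_SOURCE = {
    "prod": "spreadsheet",
    "dev": "ml_production",
    "test": "ml_testing",
}


def migrate_value(destination_server):
    """Return the new data_source for an old destination_server value."""
    # Rows with an empty or unknown destination_server keep an empty data_source
    return DESTINATION_TO_SOURCE.get(destination_server, "")


print(migrate_value("dev"))  # → ml_production
```

Note the migration's reverse is `RunPython.noop`: rolling back restores the `destination_server` column but leaves it empty, since the old values are not reconstructed.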

environmental_justice/models.py

6 additions, 6 deletions:

```diff
@@ -6,13 +6,13 @@ class EnvironmentalJusticeRow(models.Model):
     Environmental Justice data from the spreadsheet
     """
 
-    class DestinationServerChoices(models.TextChoices):
-        DEV = "dev", "Development"
-        TEST = "test", "Testing"
-        PROD = "prod", "Production"
+    class DataSourceChoices(models.TextChoices):
+        SPREADSHEET = "spreadsheet", "Spreadsheet"
+        ML_PRODUCTION = "ml_production", "ML Production"
+        ML_TESTING = "ml_testing", "ML Testing"
 
-    destination_server = models.CharField(
-        "Destination Server", max_length=10, choices=DestinationServerChoices.choices, default="", blank=True
+    data_source = models.CharField(
+        "Data Source", max_length=20, choices=DataSourceChoices.choices, default="", blank=True
     )
 
     dataset = models.CharField("Dataset", blank=True, default="")
```
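Alongside the rename, `max_length` grows from 10 to 20 because the new choice values are longer than the old `dev`/`test`/`prod` ones. A quick check of the longest value:

```python
# The stored values behind the new DataSourceChoices
values = ["spreadsheet", "ml_production", "ml_testing"]
longest = max(values, key=len)
print(longest, len(longest))  # → ml_production 13
```

So `max_length=10` would have truncated `ml_production`; 20 leaves headroom for future choices.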

environmental_justice/tests.py

This file was deleted (3 deletions).

New test-configuration file, 30 additions:

```python
import pytest
from django.urls import include, path
from rest_framework.routers import DefaultRouter
from rest_framework.test import APIClient

from environmental_justice.views import EnvironmentalJusticeRowViewSet

# Create router and register our viewset
router = DefaultRouter()
router.register(r"environmental-justice", EnvironmentalJusticeRowViewSet)

# Create temporary urlpatterns for testing
urlpatterns = [
    path("api/", include(router.urls)),
]


# Override default URL conf for testing
@pytest.fixture
def client():
    """Return a Django REST framework API client"""
    return APIClient()


@pytest.fixture(autouse=True)
def setup_urls():
    """Setup URLs for testing"""
    from django.conf import settings

    settings.ROOT_URLCONF = __name__
```
New factory file, 28 additions:

```python
import factory
from factory.django import DjangoModelFactory

from environmental_justice.models import EnvironmentalJusticeRow


class EnvironmentalJusticeRowFactory(DjangoModelFactory):
    class Meta:
        model = EnvironmentalJusticeRow

    dataset = factory.Sequence(lambda n: f"dataset_{n}")
    description = factory.Faker("sentence")
    description_simplified = factory.Faker("sentence")
    indicators = factory.Faker("sentence")
    intended_use = factory.Faker("sentence")
    latency = factory.Faker("word")
    limitations = factory.Faker("sentence")
    project = factory.Faker("word")
    source_link = factory.Faker("url")
    strengths = factory.Faker("sentence")
    format = factory.Faker("file_extension")
    geographic_coverage = factory.Faker("country")
    data_visualization = factory.Faker("sentence")
    spatial_resolution = factory.Faker("word")
    temporal_extent = factory.Faker("date")
    temporal_resolution = factory.Faker("word")
    sde_link = factory.Faker("url")
    data_source = EnvironmentalJusticeRow.DataSourceChoices.SPREADSHEET
```
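In tests, `EnvironmentalJusticeRowFactory()` builds one saved row and `EnvironmentalJusticeRowFactory.create_batch(3)` builds several; `dataset` stays unique across them because `factory.Sequence` feeds the lambda an incrementing counter. What that sequence produces, shown standalone without factory_boy:

```python
# Stand-in for factory.Sequence(lambda n: f"dataset_{n}"): factory_boy
# calls the callable with 0, 1, 2, ... for successive instances.
def make_dataset(n):
    return f"dataset_{n}"


print([make_dataset(n) for n in range(3)])  # → ['dataset_0', 'dataset_1', 'dataset_2']
```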

0 commit comments