Commit 2e6cc06

Merge pull request #1102 from NASA-IMPACT/update_cmr_mappings
Update cmr mappings
2 parents 5306d32 + 08e6070 commit 2e6cc06

9 files changed, +2217 -163 lines changed

scripts/ej/README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# EJ Data Processing Pipeline

This pipeline processes NASA Common Metadata Repository (CMR) data and environmental justice (EJ) classifications to create standardized data dumps for the Science Discovery Engine (SDE).

## Overview

The pipeline consists of four components:

- CMR data processing
- Environmental justice classification processing
- Threshold-based filtering
- Data dump creation

## Prerequisites

- Access to CMR collection data
- Access to the classification model predictions (contact Bishwas for access)

## Setup

1. Clone the repository
2. Install dependencies
3. Configure settings in `scripts/ej/config.py`

## Input Files

You need two main input files:

1. **CMR Collections Data**: Generated using the `download_cmr.py` script:
   ```text
   https://github.com/NASA-IMPACT/llm-app-EJ-classifier/blob/develop/scripts/data_processing/download_cmr.py
   ```

2. **Classification Predictions**: Provided by the classification model; contact Bishwas for access.

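Before running the pipeline, it can help to confirm that both input files exist and parse as JSON. The following is a minimal sketch, not part of the pipeline itself: the helper names `load_json` and `check_inputs` are hypothetical, and it assumes only that each file holds a top-level JSON list or object.

```python
import json
from pathlib import Path


def load_json(path):
    """Load a JSON file, raising a clear error if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def check_inputs(cmr_path, predictions_path):
    """Return the number of top-level entries in each input file.

    Hypothetical helper: assumes both files contain a JSON list or object.
    """
    cmr = load_json(cmr_path)
    predictions = load_json(predictions_path)
    return len(cmr), len(predictions)
```

Calling `check_inputs` with the paths to your two files returns a pair of counts, which gives a quick way to spot an empty or truncated download.
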
## Configuration

Edit `config.py` to customize:

- Classification thresholds
- Authorized classifications
- Input/output filenames
- Timestamp formats

Example configuration:
```python
# Adjust thresholds for different indicators
INDICATOR_THRESHOLDS = {
    "Climate Change": 1.0,
    "Disasters": 0.80,
    # ... other thresholds
}

# Change filenames
CMR_FILENAME = "your_cmr_file.json"
INFERENCE_FILENAME = "your_predictions.json"
```

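To illustrate how per-indicator thresholds could drive the filtering step, here is a minimal sketch. It is an assumption about the logic, not the code in `create_ej_dump.py`: the function `filter_by_threshold`, the score dictionary shape, the `>=` comparison, and the choice to drop indicators that have no configured threshold are all hypothetical.

```python
# Thresholds copied from the example configuration.
INDICATOR_THRESHOLDS = {
    "Climate Change": 1.0,
    "Disasters": 0.80,
}


def filter_by_threshold(scores, thresholds):
    """Keep indicators whose score meets or exceeds its threshold.

    Hypothetical helper: indicators without a configured threshold
    are dropped, which is an assumption rather than documented behavior.
    """
    return [
        indicator
        for indicator, score in scores.items()
        if indicator in thresholds and score >= thresholds[indicator]
    ]


# Hypothetical model scores for a single CMR collection.
example_scores = {"Climate Change": 0.97, "Disasters": 0.85}
print(filter_by_threshold(example_scores, INDICATOR_THRESHOLDS))
# → ['Disasters']
```

Under this reading, a threshold of `1.0` admits only predictions with a score of exactly 1.0, so raising a threshold to that value nearly excludes the indicator.
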
## Usage

### Basic Usage

Run the pipeline on a local machine with the input files:
```bash
python create_ej_dump.py
```

## Output

The pipeline generates a JSON file named `ej_dump_YYYYMMDD_HHMMSS.json` containing:
- Processed CMR metadata
- Environmental justice classifications

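The `YYYYMMDD_HHMMSS` part of the filename corresponds to the `strftime` pattern `%Y%m%d_%H%M%S`. A minimal sketch of building such a name follows; the constant `TIMESTAMP_FORMAT` and the function `dump_filename` are hypothetical names for illustration and may differ from what `config.py` actually defines.

```python
from datetime import datetime

# Assumed format string; it matches the YYYYMMDD_HHMMSS pattern.
TIMESTAMP_FORMAT = "%Y%m%d_%H%M%S"


def dump_filename(now=None):
    """Build an output filename of the form ej_dump_YYYYMMDD_HHMMSS.json."""
    now = now or datetime.now()
    return f"ej_dump_{now.strftime(TIMESTAMP_FORMAT)}.json"


print(dump_filename(datetime(2024, 11, 5, 14, 30, 0)))
# → ej_dump_20241105_143000.json
```
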
## Server Deployment

To deploy the output to the server:
```bash
# Copy the dump to the server
scp ej_dump_YYYYMMDD_HHMMSS.json sde:/home/ec2-user/sde_indexing_helper/backups/

# Process on the server using the dm shell
dmshell

# Add your file name to cmr_to_models.py,
# then paste and run its contents within the shell
```
