Water Pollution Risk Assessment API

A demonstration REST API, built in Go, that estimates water pollution risk for industrial facilities from their characteristics and geographic location.

🎯 Project Overview

This is an MVP/mock system designed to demonstrate how a decision-support pipeline works for environmental risk assessment. It combines:

Industry-specific pollution profiles
Real-time location-based environmental data (mocked)
Multi-factor risk calculation engine
AI/LLM integration layer for mitigation suggestions

Important: This is NOT a production regulatory system. It's a demonstrative backend showing system architecture and logic flow.

🏗️ Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                         Client                               │
│              (Sends POST /api/v1/assess)                     │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    HTTP Handler Layer                        │
│              (Validates requests, routes)                    │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                  Assessment Service                          │
│            (Orchestrates complete workflow)                  │
└─────┬───────────────────────────────────────────────────────┘
      │
      ├─────────────────────────────────────────────┐
      │                                             │
      ▼                                             ▼
┌─────────────────────┐                  ┌──────────────────────┐
│  Location Service   │                  │   Risk Estimator     │
│  (Aggregates data)  │                  │  (Calculates score)  │
└─────┬───────────────┘                  └──────────────────────┘
      │                                             │
      ├──────┬──────┬──────┐                       │
      ▼      ▼      ▼      ▼                       │
   ┌────┐ ┌────┐ ┌────┐                           │
   │ WB │ │ PD │ │ LU │  Adapters                 │
   └────┘ └────┘ └────┘  (External Data)          │
                                                   │
                         ┌─────────────────────────┘
                         │
                         ▼
                  ┌──────────────┐
                  │ LLM Service  │
                  │ (Mitigations)│
                  └──────────────┘
                         │
                         ▼
                  Final Response

Legend: WB = Water Body adapter, PD = Population Density adapter, LU = Land Use adapter.

Component Responsibilities

| Component | Purpose | Production Equivalent |
| --- | --- | --- |
| Adapters | Fetch external environmental data | Real API integrations (OSM, WorldPop, USGS) |
| Location Service | Aggregate location data from multiple sources | Microservice with caching layer |
| Risk Estimator | Calculate pollution risk using domain rules | ML model or expert system |
| LLM Service | Generate contextual mitigation strategies | GPT-4/Claude API integration |
| Assessment Service | Orchestrate complete assessment pipeline | Core business logic service |

📁 Project Structure

water-risk-api/
├── cmd/
│   └── server/
│       └── main.go                      # Entry point, server setup
│
├── internal/
│   ├── handlers/
│   │   └── assessment_handler.go       # HTTP request handling
│   │
│   ├── services/
│   │   ├── assessment_service.go       # Main orchestrator
│   │   ├── location_service.go         # Location data aggregation
│   │   └── llm_service.go              # AI integration layer
│   │
│   ├── estimators/
│   │   └── risk_estimator.go           # Risk calculation engine
│   │
│   ├── adapters/
│   │   ├── waterbody_adapter.go        # Water body data (OSM)
│   │   ├── population_adapter.go       # Population density (WorldPop)
│   │   └── landuse_adapter.go          # Land use classification (USGS)
│   │
│   ├── models/
│   │   └── models.go                   # Data structures
│   │
│   └── config/
│       └── config.go                   # Configuration management
│
├── go.mod                               # Go module definition
├── go.sum                               # Dependency checksums
├── README.md                            # This file
├── ARCHITECTURE.md                      # Detailed architecture docs
└── API_DOCS.md                          # API documentation

🚀 Quick Start

Prerequisites

Go 1.21 or higher
curl or Postman for testing

Installation

Clone and navigate

git clone <repository-url>
cd water-risk-api

Download dependencies (go.mod already ships with the repository, so go mod init is not needed)

go mod tidy

Run the server

cd cmd/server
go run main.go

Verify it's running

curl http://localhost:8080/health
# Expected: {"status":"healthy","service":"water-risk-api"}

Testing the API

Example 1: Assess a textile factory

curl -X POST http://localhost:8080/api/v1/assess \
  -H "Content-Type: application/json" \
  -d '{
    "industry_type": "textile",
    "latitude": 28.6139,
    "longitude": 77.2090,
    "water_usage_m3": 450,
    "treatment_type": "primary"
  }'

Example 2: Low-risk food processing

curl -X POST http://localhost:8080/api/v1/assess \
  -H "Content-Type: application/json" \
  -d '{
    "industry_type": "food-processing",
    "latitude": 34.0522,
    "longitude": -118.2437,
    "water_usage_m3": 85,
    "treatment_type": "tertiary"
  }'

📊 How It Works

1. Request Processing

When you send a POST request to /api/v1/assess:

Client → Handler → Validation → Assessment Service

The handler:

Validates JSON structure
Checks required fields (industry_type, latitude, longitude, water_usage_m3)
Validates coordinate ranges (-90 to 90 for lat, -180 to 180 for lon)
Validates treatment type (none/primary/secondary/tertiary)
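
The checks listed above could be expressed roughly as follows. This is a minimal sketch, not the handler's actual code: the JSON keys come from the curl examples, while the Latitude and Longitude field names and the validate function are assumptions made for illustration.

```go
package main

import (
    "errors"
    "fmt"
)

// AssessmentRequest mirrors the JSON body used in the curl examples.
// Latitude and Longitude are assumed field names.
type AssessmentRequest struct {
    IndustryType  string  `json:"industry_type"`
    Latitude      float64 `json:"latitude"`
    Longitude     float64 `json:"longitude"`
    WaterUsageM3  float64 `json:"water_usage_m3"`
    TreatmentType string  `json:"treatment_type"`
}

// allowedTreatments holds the four treatment types the handler accepts.
var allowedTreatments = map[string]bool{
    "none": true, "primary": true, "secondary": true, "tertiary": true,
}

// validate (hypothetical helper) runs the checks in order and returns the first failure.
func validate(r AssessmentRequest) error {
    if r.IndustryType == "" {
        return errors.New("industry_type is required")
    }
    if r.Latitude < -90 || r.Latitude > 90 {
        return fmt.Errorf("latitude %v out of range [-90, 90]", r.Latitude)
    }
    if r.Longitude < -180 || r.Longitude > 180 {
        return fmt.Errorf("longitude %v out of range [-180, 180]", r.Longitude)
    }
    if r.WaterUsageM3 < 0 {
        return errors.New("water_usage_m3 must be non-negative")
    }
    if !allowedTreatments[r.TreatmentType] {
        return fmt.Errorf("treatment_type %q must be none, primary, secondary, or tertiary", r.TreatmentType)
    }
    return nil
}

func main() {
    req := AssessmentRequest{
        IndustryType: "textile", Latitude: 28.6139, Longitude: 77.2090,
        WaterUsageM3: 450, TreatmentType: "primary",
    }
    fmt.Println(validate(req)) // <nil>

    req.Latitude = 120
    fmt.Println(validate(req)) // reports the latitude range error
}
```

With plain float64 fields, a missing latitude is indistinguishable from 0, so a stricter handler might decode into pointer fields to detect absent keys.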

2. Location Data Fetching

The Location Service calls three adapters in sequence (production would parallelize):

A. Water Body Adapter

Mock Behavior: Simulates based on coordinate patterns

Production Would Do:

1. Call OpenStreetMap Overpass API
   Query: Find all waterways within 10km radius
   
2. Filter results by type (river, lake, reservoir)

3. Calculate distances using Haversine formula

4. Lookup sensitivity ratings from environmental databases
   - Rivers near drinking water sources: HIGH
   - Industrial reservoirs: MEDIUM
   - Isolated ponds: LOW
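
Step 3 relies on the Haversine formula for great-circle distance. A minimal sketch is shown here, assuming a spherical Earth with a mean radius of 6371 km; the function name haversineKm is illustrative and not taken from the project.

```go
package main

import (
    "fmt"
    "math"
)

// earthRadiusKm is the mean Earth radius used by the spherical approximation.
const earthRadiusKm = 6371.0

// haversineKm returns the great-circle distance in kilometres between two
// points given in decimal degrees.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
    toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
    dLat := toRad(lat2 - lat1)
    dLon := toRad(lon2 - lon1)
    a := math.Sin(dLat/2)*math.Sin(dLat/2) +
        math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
    // Clamp guards against rounding pushing a slightly above 1 for antipodal points.
    return 2 * earthRadiusKm * math.Asin(math.Sqrt(math.Min(1, a)))
}

func main() {
    // One degree of longitude along the equator is a little over 111 km.
    fmt.Printf("%.1f km\n", haversineKm(0, 0, 0, 1))
}
```

For the 10 km search radius used here, the spherical approximation is typically accurate enough; an ellipsoidal method would only matter for survey-grade distances.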

Mock Algorithm:

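// Simulates "near river" when the latitude falls in the first 0.3° of each 0.5° band.
// Note: math.Mod keeps the sign of lat, so the check is always true for negative latitudes.
if math.Mod(lat, 0.5) < 0.3 {
    // return a river with HIGH sensitivity
}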

B. Population Adapter

Mock Behavior: Uses trigonometric functions of the coordinates to simulate urban, suburban, and rural densities

Production Would Do:

1. Call WorldPop API or Census Bureau
   Endpoint: /v1/population?lat=X&lon=Y&radius=5km

2. Get population count in area

3. Calculate density: population / area_km²

4. Return people per km²
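
Step 3 is a single division once the search area is known. As a small sketch, assuming the 5 km radius from the example endpoint describes a circular area (the function name is hypothetical):

```go
package main

import (
    "fmt"
    "math"
)

// densityPerKm2 converts a head count inside a circular search area into
// people per km². radiusKm is the query radius (5 km in the steps above).
func densityPerKm2(population, radiusKm float64) float64 {
    if radiusKm <= 0 {
        return 0 // avoid dividing by zero for a degenerate area
    }
    return population / (math.Pi * radiusKm * radiusKm)
}

func main() {
    // 300,000 people within 5 km: the area is about 78.5 km².
    fmt.Printf("%.0f people/km²\n", densityPerKm2(300000, 5)) // 3820 people/km²
}
```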

Mock Algorithm:

urbanness := math.Abs(math.Sin(lat*10)) * math.Abs(math.Cos(lon*10))

switch {
case urbanness > 0.7:
    // Urban: return 3500-5500 people/km²
case urbanness > 0.4:
    // Suburban: return 800-2300 people/km²
default:
    // Rural: return 50-450 people/km²
}

C. Land Use Adapter

Mock Behavior: Hash-based selection from land use types

Production Would Do:

1. Query USGS National Land Cover Database
   or ESA WorldCover API

2. Get land classification code at coordinates
   NLCD codes: 21=Developed, 41=Forest, 82=Cropland, etc.

3. Map to simplified categories:
   - Industrial (appropriate for facilities)
   - Agricultural (risk to food supply)
   - Residential (risk to people)
   - Protected area (highest concern)

Mock Algorithm:

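// Pseudo-random selection based on coordinate hash.
// math.Abs keeps the seed non-negative; a negative seed would give a negative
// index and panic (e.g. for lat 34.0522, lon -118.2437).
seed := int(math.Abs(lat*1000 + lon*1000))
return landUseTypes[seed%len(landUseTypes)]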
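
Taken together, the Location Service calls the three adapters one after another and merges their results. The sketch below illustrates that sequential aggregation under stated assumptions: the interface names, method names other than FetchNearbyWaterBodies, and the stub adapters are all hypothetical, and the types are trimmed-down stand-ins for those in internal/models.

```go
package main

import (
    "context"
    "fmt"
)

// WaterBody and LocationContext are simplified stand-ins for the model types.
type WaterBody struct {
    DistanceKm  float64
    Sensitivity string
}

type LocationContext struct {
    NearbyWaterBodies []WaterBody
    PopulationDensity float64
    LandUseType       string
}

// Hypothetical interfaces describing what the service needs from each adapter.
type WaterBodyFetcher interface {
    FetchNearbyWaterBodies(ctx context.Context, lat, lon float64) ([]WaterBody, error)
}
type PopulationFetcher interface {
    FetchDensity(ctx context.Context, lat, lon float64) (float64, error)
}
type LandUseFetcher interface {
    FetchLandUse(ctx context.Context, lat, lon float64) (string, error)
}

type LocationService struct {
    water WaterBodyFetcher
    pop   PopulationFetcher
    land  LandUseFetcher
}

// Aggregate calls the adapters in sequence and stops at the first error.
func (s *LocationService) Aggregate(ctx context.Context, lat, lon float64) (LocationContext, error) {
    wb, err := s.water.FetchNearbyWaterBodies(ctx, lat, lon)
    if err != nil {
        return LocationContext{}, fmt.Errorf("water bodies: %w", err)
    }
    density, err := s.pop.FetchDensity(ctx, lat, lon)
    if err != nil {
        return LocationContext{}, fmt.Errorf("population: %w", err)
    }
    landUse, err := s.land.FetchLandUse(ctx, lat, lon)
    if err != nil {
        return LocationContext{}, fmt.Errorf("land use: %w", err)
    }
    return LocationContext{NearbyWaterBodies: wb, PopulationDensity: density, LandUseType: landUse}, nil
}

// Fixed-value stubs standing in for the mock adapters.
type stubWater struct{}

func (stubWater) FetchNearbyWaterBodies(context.Context, float64, float64) ([]WaterBody, error) {
    return []WaterBody{{DistanceKm: 2.3, Sensitivity: "high"}}, nil
}

type stubPop struct{}

func (stubPop) FetchDensity(context.Context, float64, float64) (float64, error) { return 3850, nil }

type stubLand struct{}

func (stubLand) FetchLandUse(context.Context, float64, float64) (string, error) {
    return "mixed-use", nil
}

// aggregateWithStubs wires the stubs into the service for a quick demonstration.
func aggregateWithStubs(lat, lon float64) (LocationContext, error) {
    svc := &LocationService{water: stubWater{}, pop: stubPop{}, land: stubLand{}}
    return svc.Aggregate(context.Background(), lat, lon)
}

func main() {
    loc, err := aggregateWithStubs(28.6139, 77.2090)
    fmt.Println(loc, err)
}
```

Depending on small interfaces rather than concrete adapters is one way to let the mock and real implementations be swapped without touching the service.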

3. Risk Calculation

The Risk Estimator combines multiple weighted factors:

Final Risk = Base Industry Risk 
           + Water Usage Factor
           - Treatment Reduction
           + Proximity Factor
           + Population Factor
           + Land Use Factor

Factor Breakdown

1. Base Industry Risk (40-80 points)

Chemical:        80  (hazardous compounds)
Pharmaceutical:  75  (active ingredients)
Electronics:     70  (heavy metals)
Textile:         65  (dyes, heavy metals)
Paper/Pulp:      55  (lignin, chlorine)
Food Processing: 45  (organic matter)

2. Water Usage (0-20 points)

< 100 m³/day:     0 points
100-500 m³/day:   Scaled 0-15 points
> 500 m³/day:     15-20 points

Formula (for usage above 100 m³/day): min((usage - 100) / 100 * intensity * 10, 20), where intensity is the industry profile's WaterIntensity

3. Treatment Reduction (-45 to 0 points)

None:       0 reduction
Primary:   -15 points (physical settling)
Secondary: -30 points (biological treatment)
Tertiary:  -45 points (advanced purification)

4. Proximity to Water Bodies (0-30 points)

< 1 km:   25 points base
1-3 km:   15 points base
3-5 km:   10 points base
> 5 km:    5 points base

Multiplied by sensitivity (the 0-30 range implies a cap, since 25 × 1.5 = 37.5):
- HIGH sensitivity: ×1.5
- MEDIUM: ×1.0
- LOW: ×0.7

5. Population Density (0-15 points)

> 3000/km²:   15 points (dense urban)
1000-3000:    10 points (urban)
500-1000:      5 points (suburban)
< 500:         0 points (rural)

6. Land Use Type (-5 to +20 points)

Industrial zone:    -5 (appropriate location)
Commercial:         +5
Mixed-use:          +8
Agricultural:      +10 (food supply risk)
Residential:       +15 (direct human risk)
Protected area:    +20 (ecosystem risk)

Risk Categorization

Below 40:        LOW risk
40 to below 70:  MEDIUM risk
70-100:          HIGH risk
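
The factor tables and thresholds above can be combined into one scoring function. The following is a sketch under several assumptions, not the project's estimator: it takes the highest-scoring water body, caps proximity at 30, clamps the total to 0-100, treats band boundaries as noted in the comments, and uses hypothetical type and function names throughout.

```go
package main

import (
    "fmt"
    "math"
)

type WaterBody struct {
    DistanceKm  float64
    Sensitivity string // "high", "medium", or "low"
}

// Input gathers everything the six factors need (hypothetical shape).
type Input struct {
    BaseRisk       float64 // from the industry table, e.g. 65 for textile
    WaterIntensity float64 // assumed per-industry multiplier in the usage formula
    UsageM3        float64 // m³/day
    Treatment      string
    WaterBodies    []WaterBody
    Density        float64 // people/km²
    LandUse        string
}

var treatmentReduction = map[string]float64{"none": 0, "primary": 15, "secondary": 30, "tertiary": 45}
var sensitivityFactor = map[string]float64{"high": 1.5, "medium": 1.0, "low": 0.7}
var landUsePoints = map[string]float64{
    "industrial": -5, "commercial": 5, "mixed-use": 8,
    "agricultural": 10, "residential": 15, "protected": 20,
}

// usagePoints applies the documented formula, with 0 points up to 100 m³/day.
func usagePoints(usage, intensity float64) float64 {
    if usage <= 100 {
        return 0
    }
    return math.Min((usage-100)/100*intensity*10, 20)
}

// proximityPoints scores each water body and keeps the largest value (assumption).
// An unknown sensitivity string contributes 0 because the map lookup yields 0.
func proximityPoints(bodies []WaterBody) float64 {
    best := 0.0
    for _, wb := range bodies {
        base := 5.0
        switch {
        case wb.DistanceKm < 1:
            base = 25
        case wb.DistanceKm < 3:
            base = 15
        case wb.DistanceKm <= 5:
            base = 10
        }
        best = math.Max(best, base*sensitivityFactor[wb.Sensitivity])
    }
    return math.Min(best, 30) // assumed cap, matching the 0-30 range
}

func populationPoints(density float64) float64 {
    switch {
    case density > 3000:
        return 15
    case density >= 1000:
        return 10
    case density >= 500:
        return 5
    }
    return 0
}

// score sums the factors, clamps to 0-100, and assigns a category.
func score(in Input) (float64, string) {
    total := in.BaseRisk +
        usagePoints(in.UsageM3, in.WaterIntensity) -
        treatmentReduction[in.Treatment] +
        proximityPoints(in.WaterBodies) +
        populationPoints(in.Density) +
        landUsePoints[in.LandUse]
    total = math.Max(0, math.Min(total, 100))
    switch {
    case total >= 70:
        return total, "HIGH"
    case total >= 40:
        return total, "MEDIUM"
    }
    return total, "LOW"
}

func main() {
    // Paper/pulp plant: 55 + 0 - 30 + 7 + 5 + 5 = 42.
    v, cat := score(Input{
        BaseRisk: 55, WaterIntensity: 1, UsageM3: 100, Treatment: "secondary",
        WaterBodies: []WaterBody{{DistanceKm: 4, Sensitivity: "low"}},
        Density:     600, LandUse: "commercial",
    })
    fmt.Printf("%.1f %s\n", v, cat) // 42.0 MEDIUM
}
```

Because the additive factors can sum well past 100 for high-risk industries, the clamp matters in practice: a textile plant with no treatment near a sensitive river reaches the ceiling quickly.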

4. Mitigation Generation

The LLM Service has two modes:

Current MVP: Rule-Based

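var suggestions []string

if treatment == "none" {
    suggestions = append(suggestions, "Install primary treatment")
}
if waterUsage > 100 {
    suggestions = append(suggestions, "Implement water recycling")
}
if industryType == "textile" {
    suggestions = append(suggestions, "Install dye recovery systems")
}
if nearHighSensitivityWaterBody {
    suggestions = append(suggestions, "Install real-time monitoring")
}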

Production: LLM-Powered

The service structures data into a prompt:

# Context
Industry: Textile
Water Usage: 450 m³/day
Current Treatment: Primary
Risk Score: 78.5/100 (HIGH)

# Key Factors
- High water usage (+18.9 points)
- Nearby river at 2.3 km (+28.7 points)
- Dense urban area (+15 points)
- Primary treatment only (-15 points)

# Request
Generate 4-6 specific, actionable mitigation strategies
considering technical feasibility and cost-effectiveness.

Then calls:

response, err := anthropicClient.Complete(ctx, &CompletionRequest{
    Model: "claude-3-sonnet-20240229",
    Prompt: structuredPrompt,
    MaxTokens: 500,
})
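
The structured prompt shown earlier could be assembled from assessment data along these lines. This is only a sketch: PromptData, Factor, and buildPrompt are hypothetical names, and the layout simply reproduces the example prompt.

```go
package main

import (
    "fmt"
    "strings"
)

// Factor is a hypothetical view of one contributing factor.
type Factor struct {
    Label  string
    Points float64 // signed: positive raises the score, negative lowers it
}

// PromptData carries the values that appear in the prompt (assumed shape).
type PromptData struct {
    Industry  string
    UsageM3   float64
    Treatment string
    Score     float64
    Category  string
    Factors   []Factor
}

// buildPrompt renders the Context, Key Factors, and Request sections.
func buildPrompt(d PromptData) string {
    var b strings.Builder
    b.WriteString("# Context\n")
    fmt.Fprintf(&b, "Industry: %s\n", d.Industry)
    fmt.Fprintf(&b, "Water Usage: %g m³/day\n", d.UsageM3)
    fmt.Fprintf(&b, "Current Treatment: %s\n", d.Treatment)
    fmt.Fprintf(&b, "Risk Score: %.1f/100 (%s)\n\n", d.Score, d.Category)

    b.WriteString("# Key Factors\n")
    for _, f := range d.Factors {
        // %+g prints an explicit sign, e.g. +18.9 or -15.
        fmt.Fprintf(&b, "- %s (%+g points)\n", f.Label, f.Points)
    }

    b.WriteString("\n# Request\n")
    b.WriteString("Generate 4-6 specific, actionable mitigation strategies\n")
    b.WriteString("considering technical feasibility and cost-effectiveness.\n")
    return b.String()
}

func main() {
    fmt.Print(buildPrompt(PromptData{
        Industry: "Textile", UsageM3: 450, Treatment: "Primary",
        Score: 78.5, Category: "HIGH",
        Factors: []Factor{
            {"High water usage", 18.9},
            {"Primary treatment only", -15},
        },
    }))
}
```

Keeping prompt construction in a pure function makes it straightforward to unit test the text without calling any LLM API.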

5. Response Assembly

Final JSON structure:

{
  "risk_score": "high",           // Category
  "risk_value": 78.5,             // Numeric 0-100
  "contributing_factors": [       // What drove the score
    {
      "factor": "Water Usage Volume",
      "impact": "increases",
      "weight": 18.9,
      "description": "High water usage (> 500 m³/day)"
    }
  ],
  "mitigation_actions": [         // What to do about it
    "Upgrade to secondary treatment...",
    "Implement water recycling..."
  ],
  "location_context": {           // Environmental data
    "nearby_water_bodies": [...],
    "population_density": 3850,
    "land_use_type": "mixed-use"
  }
}
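
For reference, the response shape above maps onto Go structs roughly as follows. The JSON keys are taken from the example; the Go type and field names are assumptions, and nearby_water_bodies is kept as raw JSON because its element fields are elided in the example.

```go
package main

import (
    "encoding/json"
    "fmt"
)

type ContributingFactor struct {
    Factor      string  `json:"factor"`
    Impact      string  `json:"impact"`
    Weight      float64 `json:"weight"`
    Description string  `json:"description"`
}

type LocationContext struct {
    // Left undecoded: the example does not show the element fields.
    NearbyWaterBodies json.RawMessage `json:"nearby_water_bodies"`
    PopulationDensity float64         `json:"population_density"`
    LandUseType       string          `json:"land_use_type"`
}

type AssessmentResponse struct {
    RiskScore           string               `json:"risk_score"` // category
    RiskValue           float64              `json:"risk_value"` // numeric 0-100
    ContributingFactors []ContributingFactor `json:"contributing_factors"`
    MitigationActions   []string             `json:"mitigation_actions"`
    LocationContext     LocationContext      `json:"location_context"`
}

// decodeResponse (hypothetical helper) parses a response body into the struct.
func decodeResponse(body string) (AssessmentResponse, error) {
    var r AssessmentResponse
    err := json.Unmarshal([]byte(body), &r)
    return r, err
}

func main() {
    body := `{
      "risk_score": "high",
      "risk_value": 78.5,
      "contributing_factors": [
        {"factor": "Water Usage Volume", "impact": "increases", "weight": 18.9, "description": "High water usage"}
      ],
      "mitigation_actions": ["Implement water recycling..."],
      "location_context": {"nearby_water_bodies": [], "population_density": 3850, "land_use_type": "mixed-use"}
    }`
    r, err := decodeResponse(body)
    if err != nil {
        fmt.Println("decode failed:", err)
        return
    }
    fmt.Println(r.RiskScore, r.RiskValue, r.LocationContext.LandUseType) // high 78.5 mixed-use
}
```

Note that the inline // comments in the example response are annotations for the reader; actual JSON does not permit comments.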

🔧 Configuration & Customization

Adding New Industry Types

Edit internal/estimators/risk_estimator.go:

func initializeIndustryProfiles() map[string]models.IndustryProfile {
    return map[string]models.IndustryProfile{
        // ... existing industries
        "mining": {
            BaseRisk:       85.0,
            WaterIntensity: 2.5,
            PollutantTypes: []string{"heavy metals", "sediment", "acid drainage"},
        },
    }
}

Adjusting Risk Weights

Modify factor calculation functions in risk_estimator.go:

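func calculateProximityRisk(waterBodies []models.WaterBody) float64 {
    risk := 0.0
    for _, wb := range waterBodies {
        // Adjust these thresholds
        if wb.DistanceKm < 1.0 {
            risk = math.Max(risk, 30.0) // Increase from 25.0 for stricter assessment
        }
        // ... remaining distance bands and sensitivity multiplier
    }
    return risk
}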

Plugging in Real APIs

Replace mock logic in adapters:

Example: Real Water Body Adapter

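// Requires imports: context, fmt, net/http, net/url, strings
func (a *WaterBodyAdapter) FetchNearbyWaterBodies(ctx context.Context, lat, lon float64) ([]models.WaterBody, error) {
    // Build Overpass API query
    query := fmt.Sprintf(`
        [out:json];
        (
          way["waterway"](around:10000,%f,%f);
          way["natural"="water"](around:10000,%f,%f);
        );
        out geom;
    `, lat, lon, lat, lon)

    // Send the query form-encoded as the "data" parameter, honoring ctx for cancellation
    form := url.Values{"data": {query}}
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        "https://overpass-api.de/api/interpreter",
        strings.NewReader(form.Encode()))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    // Parse response and calculate distances
    // ... implementation
}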

🧪 Testing

Unit Tests Structure

water-risk-api/
├── internal/
│   ├── estimators/
│   │   ├── risk_estimator.go
│   │   └── risk_estimator_test.go
│   ├── adapters/
│   │   ├── waterbody_adapter_test.go
│   │   └── population_adapter_test.go

Example Test

func TestRiskEstimator_HighRiskTextile(t *testing.T) {
    estimator := NewRiskEstimator()
    
    req := models.AssessmentRequest{
        IndustryType:   "textile",
        WaterUsageM3:   500,
        TreatmentType:  "none",
    }
    
    locationCtx := models.LocationContext{
        NearbyWaterBodies: []models.WaterBody{{
            DistanceKm: 1.5,
            Sensitivity: "high",
        }},
        PopulationDensity: 3500,
    }
    
    risk, err := estimator.EstimateRisk(req, locationCtx)
    if err != nil {
        t.Fatalf("EstimateRisk returned error: %v", err)
    }

    if risk < 70 {
        t.Errorf("Expected high risk (>= 70), got %f", risk)
    }
}

📈 Performance Considerations

Current Implementation

Sequential adapter calls: ~500ms per request (with mock data)
No caching
No concurrency within a request: adapters run one after another (Go's net/http still serves each request on its own goroutine)

Production Optimizations

1. Parallel Adapter Calls

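var (
    wg    sync.WaitGroup
    wb    []models.WaterBody
    wbErr error
)

wg.Add(3)
go func() {
    defer wg.Done()
    // Each goroutine writes only its own variables, so no lock is needed
    wb, wbErr = s.waterBodyAdapter.Fetch(ctx, lat, lon)
}()
// ... similar for other adapters

wg.Wait()
// Check wbErr (and the other adapters' errors) before using the results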

2. Redis Caching

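// Cache location data by coordinate grid (0.01° cells, roughly 1.1 km of latitude).
// math.Floor keeps cell sizes uniform on both sides of the equator and prime meridian.
cacheKey := fmt.Sprintf("location:%d:%d",
    int(math.Floor(lat*100)), int(math.Floor(lon*100)))

if cached, err := redis.Get(cacheKey); err == nil {
    return parseLocationContext(cached)
}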

3. Connection Pooling

var httpClient = &http.Client{
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 10,
    },
    Timeout: 10 * time.Second,
}

🔒 Security Considerations

Input Validation

Coordinate bounds checked
Water usage must be non-negative
Treatment type from allowed list
Request size limits (implement in production)
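
The request size limit could be added with the standard library's http.MaxBytesReader. The sketch below is one possible middleware, assuming a 1 MiB cap; limitBody, readAll, and statusForBody are hypothetical names, and the demo handler only reads the body to show the limit taking effect.

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "strings"
)

// maxBodyBytes is an assumed limit; pick a value that fits the real payloads.
const maxBodyBytes = 1 << 20 // 1 MiB

// limitBody wraps a handler so that reads beyond maxBodyBytes fail.
func limitBody(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
        next.ServeHTTP(w, r)
    })
}

// readAll is a demo handler: it consumes the body and answers 413 when the
// limit was exceeded, 200 otherwise.
func readAll(w http.ResponseWriter, r *http.Request) {
    if _, err := io.ReadAll(r.Body); err != nil {
        http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
        return
    }
    w.WriteHeader(http.StatusOK)
}

// statusForBody sends a body of n bytes through the middleware and returns
// the resulting status code, using an in-memory recorder instead of a server.
func statusForBody(n int) int {
    req := httptest.NewRequest(http.MethodPost, "/api/v1/assess",
        strings.NewReader(strings.Repeat("x", n)))
    rec := httptest.NewRecorder()
    limitBody(http.HandlerFunc(readAll)).ServeHTTP(rec, req)
    return rec.Code
}

func main() {
    fmt.Println(statusForBody(256))              // 200
    fmt.Println(statusForBody(maxBodyBytes + 1)) // 413
}
```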

Production Additions Needed

// Rate limiting
import "golang.org/x/time/rate"
limiter := rate.NewLimiter(10, 100) // 10 req/sec, burst 100

// API key authentication
func authMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        apiKey := r.Header.Get("X-API-Key")
        if !validateAPIKey(apiKey) {
            http.Error(w, "Unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}

// CORS configuration
cors.New(cors.Options{
    AllowedOrigins:   []string{"https://your-frontend.com"},
    AllowedMethods:   []string{"POST", "GET"},
    AllowCredentials: true,
})

🌍 Real-World Data Sources

When implementing production adapters, use:

Water Bodies

OpenStreetMap Overpass API: Free, global coverage
- https://overpass-api.de/api/interpreter
USGS National Hydrography: US-specific, authoritative
- https://hydro.nationalmap.gov/arcgis/rest/services

Population Density

WorldPop: Global population distribution
- https://www.worldpop.org/rest/data
US Census Bureau: US demographic data
- https://api.census.gov/data
Eurostat: European statistics
- https://ec.europa.eu/eurostat/api

Land Use

USGS NLCD: US land cover (30m resolution)
- https://www.mrlc.gov/data-services-page
ESA WorldCover: Global 10m resolution
- https://services.terrascope.be/wms/v2
Copernicus: European land monitoring
- https://land.copernicus.eu/api

🚧 Known Limitations

Mock data doesn't reflect actual geography - Real adapters needed
No database persistence - All computation on-demand
No authentication/authorization - Public API in current form
Sequential adapter calls - No concurrency within a single request
No caching - Repeated requests re-fetch data
Basic risk model - Production needs expert validation
Rule-based mitigations - LLM integration not implemented
No monitoring/logging - No observability built in

🎓 Learning Resources

Understanding the Risk Model

EPA's Risk Assessment Guidelines
ISO 14001 Environmental Management
Water Quality Standards by region

Go Web Development

Geospatial APIs

📞 Support & Contribution

This is a demonstration project. For questions:

Review code comments in each file
Check ARCHITECTURE.md for design decisions
See API_DOCS.md for endpoint specifications

📄 License

MIT License - Use freely for educational and commercial purposes.

Built with Go 1.21 | No external dependencies for MVP | Production-ready architecture pattern
