Adverse Media Screening Tool

An AI-powered tool for compliance analysts to screen individuals against news articles for adverse media mentions. The system uses large language models to assess an article's credibility, extract person entities and person-person relationships, match the analyst's person query against the article's entities, and assess whether the coverage is adverse.

Project Context: This was developed as a technical assessment for a compliance software role. The repository started from a Next.js + tRPC template provided by the hiring team (commit d970003). All subsequent development is original work.

✨ Features

  • Credibility Assessment: Evaluates article reliability before performing expensive analysis
  • Entity Extraction: Extracts all person entities from the article and the relationships between them (for network understanding), resolving coreferences where possible.
  • Person Matching: LLM-based analysis matches individuals against article mentions with explainable confidence scores and detailed signal breakdowns.
  • Adverse Media Sentiment Analysis: Identifies negative mentions with categorized risk levels (fraud, corruption, sanctions, etc.)
  • Results Persistence: Saves screening results to disk (file storage, sufficient for an MVP)
  • Explainable Outputs: Every decision includes structured reasoning, evidence spans, and confidence scores
  • Multi-Provider Support: Works with OpenAI and (in principle) Anthropic LLMs, with the model configurable via environment settings. Note: only OpenAI was used during development; Anthropic support is untested.

πŸ—οΈ How It Works

The system runs a straightforward pipeline: scrape article β†’ assess credibility β†’ extract entities β†’ match person β†’ analyse sentiment β†’ save result.

┌──────────────────────────────────────────────────────────────────┐
│                        Browser (React UI)                        │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                   Next.js + tRPC (Web Service)                   │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                  FastAPI (AI Service Pipeline)                   │
│                                                                  │
│  1. Scrape Article (newspaper3k)                                 │
│  2. Check Credibility (LLM) ─────────┐                           │
│  3. Extract Entities (LLM) ──────────┤                           │
│  4. Match Person (LLM) ──────────────┼─► OpenAI/Anthropic        │
│  5. Analyse Sentiment (LLM) ─────────┤                           │
│  6. Save Result (file storage) ──────┘                           │
└──────────────────────────────────────────────────────────────────┘

Key points: Every LLM call uses temperature=0.0 for consistency. All outputs are structured (Pydantic models) with reasoning attached for explainability. The system is conservative: when uncertain, it flags for human review rather than making assumptions. Results are saved automatically, so there's an audit trail.
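The stage sequencing above can be sketched as follows (a minimal illustration with hypothetical names, not the repository's actual API; the real stages call an LLM and return Pydantic models with reasoning attached):

```python
from dataclasses import dataclass, field

@dataclass
class StageOutput:
    passed: bool
    reasoning: str  # attached reasoning, for explainability

@dataclass
class ScreeningResult:
    url: str
    outputs: dict = field(default_factory=dict)
    needs_human_review: bool = False

def run_pipeline(url: str, stages: list) -> ScreeningResult:
    """Run stages in order; a failure short-circuits the expensive later stages."""
    result = ScreeningResult(url=url)
    for name, stage in stages:
        out = stage()
        result.outputs[name] = out
        if not out.passed:
            result.needs_human_review = True  # conservative: defer to a human
            break
    return result

# Toy run: credibility fails, so entity extraction never executes.
res = run_pipeline(
    "https://example.com/article",
    [
        ("credibility", lambda: StageOutput(False, "unverifiable source")),
        ("entities", lambda: StageOutput(True, "never reached")),
    ],
)
print(res.needs_human_review, list(res.outputs))
```

The early exit is what makes the cheap credibility check worthwhile: a low-credibility article never incurs the cost of the later LLM stages.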

📋 Prerequisites

Docker is required: the Next.js and FastAPI services communicate over the Docker network (if you run into any issues, let me know and I can address them quickly).

Verify Docker Compose v2 is installed:

docker compose version  # Should show v2.x.x

🚀 Setup

  1. Run the setup script:

    make setup

    Or without Make:

    bash scripts/setup.sh

    This will:

    • Check that Docker and Docker Compose v2 are installed and running
    • Create .env.secrets template files for you to add real API keys
  2. Add your real OpenAI API key to services/ai/.env.secrets:

    The .env.defaults file contains placeholder API keys. You need to override them with your real key:

    # Open services/ai/.env.secrets and add your real key:
    OPENAI__API_KEY=sk-your-actual-openai-api-key-here

    Note: The placeholder keys in .env.defaults won't work - you must add your real API key to .env.secrets.

  3. Build and start:

    make start

    Or without Make:

    cd docker && docker compose build && docker compose up -d

That's it! Access the app at http://localhost:3000 (interactive API docs at http://localhost:5001/docs).

📖 Usage

Performing a Screening

  1. Navigate to http://localhost:3000
  2. Enter the article URL
  3. Provide person details:
    • First name, last name (required)
    • Middle name(s) (optional)
    • Date of birth (optional, improves matching accuracy)
  4. Click "Screen Article"
  5. Review the detailed results with:
    • Article credibility assessment
    • Person matching analysis with confidence scores
    • Adverse media sentiment (if matched)

Viewing Past Screenings

  1. Click "View Results" in the navbar
  2. Browse all past screenings
  3. Click any result card to see full details

🐳 Docker Commands

Using Make

The Makefile supports flexible targeting with SERVICE and ENV variables:

# Build services
make build                    # Build all services
make build SERVICE=ai         # Build AI service only
make build SERVICE=web        # Build web service only

# Start services
make start                    # Start all services in production mode
make start ENV=dev            # Start all services in development mode
make start SERVICE=ai         # Start AI service only
make start ENV=dev SERVICE=web  # Start web in development mode

# Other commands
make stop [SERVICE=all]       # Stop services
make restart [SERVICE=all]    # Restart services
make logs [SERVICE=all]       # View logs (follow mode)
make ps                       # Show service status
make shell SERVICE=ai         # Open shell in AI service
make shell SERVICE=web        # Open shell in web service

Variables:

  • SERVICE=ai|web|all - Target specific service (default: all)
  • ENV=prod|dev - Target environment (default: prod)

Using Docker Compose Directly

cd docker

docker compose build              # Build services
docker compose up -d              # Start (detached)
docker compose down               # Stop and remove
docker compose logs -f            # View logs
docker compose ps                 # Service status
docker compose exec ai sh         # Shell in AI service
docker compose exec web sh        # Shell in web service

βš™οΈ Configuration

The application uses a two-file environment configuration system:

  • .env.defaults (committed to git): Contains all configuration with sensible defaults and placeholder API keys
  • .env.secrets (gitignored): Contains your real API keys and any overrides

This separation keeps secrets out of version control while making configuration transparent.

Configuration Files

AI Service (services/ai/):

  • .env.defaults - Default LLM models, temperature, log level, and placeholder API keys
  • .env.secrets - Your real OpenAI and Anthropic API keys (override placeholders here)

Web Service (services/web/):

  • .env.defaults - AI service URL and Node environment
  • .env.secrets - Any environment-specific overrides (optional)

How It Works

Both config.py (Python) and docker-compose.yml load .env.defaults first, then .env.secrets. Any values in .env.secrets override those in .env.defaults. This means:

  1. The placeholder keys in .env.defaults are not functional - they're just examples
  2. You must add your real API keys to .env.secrets to use the application
  3. You can override any other setting (model, log level, etc.) in .env.secrets without modifying the committed defaults
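The override behaviour can be illustrated with a minimal layered loader (a sketch of the precedence rule only; the service's actual config.py is not reproduced here and may parse these files differently):

```python
import tempfile
from pathlib import Path

def parse_env_file(path: Path) -> dict:
    """Minimal KEY=VALUE parser; skips blank lines and # comments."""
    values = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def load_settings(defaults: Path, secrets: Path) -> dict:
    """Load defaults first, then secrets -- any key in secrets wins."""
    settings = parse_env_file(defaults)
    if secrets.exists():
        settings.update(parse_env_file(secrets))
    return settings

# Demo: the placeholder key in .env.defaults is overridden by the real one.
with tempfile.TemporaryDirectory() as tmp:
    defaults = Path(tmp) / ".env.defaults"
    secrets = Path(tmp) / ".env.secrets"
    defaults.write_text("OPENAI__API_KEY=sk-placeholder\nLOG_LEVEL=INFO\n")
    secrets.write_text("OPENAI__API_KEY=sk-real-key\n")
    settings = load_settings(defaults, secrets)
print(settings)
```

Keys absent from .env.secrets (like LOG_LEVEL here) keep their default values, which is why the secrets file only needs to contain your API key.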

Results Storage

Screening results are automatically saved to services/ai/results/. This directory is gitignored and created automatically by Docker on first run. You'll start with an empty results list and build your screening history as you use the tool.
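A store of this shape (a data/ directory of per-result JSON files plus an index.json, as referenced in Troubleshooting below) can be sketched as follows; this is a hypothetical helper for illustration, not the service's actual storage code:

```python
import json
import tempfile
import uuid
from pathlib import Path

def save_result(results_dir: Path, result: dict) -> str:
    """Write the full result to data/<id>.json and register it in index.json."""
    result_id = uuid.uuid4().hex
    data_dir = results_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir / f"{result_id}.json").write_text(json.dumps(result, indent=2))

    # index.json holds a lightweight listing for the "View Results" page.
    index_path = results_dir / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else []
    index.append({"id": result_id, "url": result.get("url")})
    index_path.write_text(json.dumps(index, indent=2))
    return result_id

# Demo against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    rid = save_result(Path(tmp), {"url": "https://example.com", "adverse": False})
    saved = json.loads((Path(tmp) / "data" / f"{rid}.json").read_text())
    index = json.loads((Path(tmp) / "index.json").read_text())
print(saved["url"], len(index))
```

Splitting the listing from the full payloads keeps the results list fast to load without reading every screening file.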

Development vs Production

Production Mode (default - faster UX, optimized builds):

make start
# or: docker compose up ai web

Development Mode (hot reload for code changes):

make start ENV=dev
# or: docker compose up ai web-dev

πŸ“ Project Structure

.
├── docker/                     # Docker configuration
│   ├── docker-compose.yml     # Service orchestration
│   ├── ai.Dockerfile          # AI service image
│   └── web.Dockerfile         # Web service image
├── services/
│   ├── ai/                    # FastAPI backend
│   │   ├── app/               # Application code
│   │   │   ├── config.py      # Settings management
│   │   │   ├── dependencies.py # Dependency injection
│   │   │   ├── routes/        # API endpoints
│   │   │   ├── services/      # Core pipeline stages
│   │   │   └── utils/         # Utilities
│   │   ├── results/           # Saved screening results (gitignored)
│   │   ├── .env.defaults      # Default configuration
│   │   └── pyproject.toml     # Python dependencies
│   └── web/                   # Next.js frontend
│       ├── src/
│       │   ├── app/           # Next.js App Router
│       │   ├── lib/           # Utilities
│       │   ├── server/        # tRPC server
│       │   └── types/         # TypeScript definitions
│       ├── .env.defaults      # Default configuration
│       └── package.json       # Node dependencies
├── Makefile                   # Convenience commands
└── README.md                  # This file

πŸ” API Documentation

Once running, visit http://localhost:5001/docs for interactive API documentation (Swagger UI).

Key Endpoints

  • POST /screening/screen - Perform a new screening
  • GET /screening/results - List all saved results
  • GET /screening/results/{id} - Get specific result
  • GET /health - Health check
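For example, a screening request could be submitted like this (the request-body field names below are illustrative assumptions, not the real schema; check the Swagger UI at /docs for the actual contract):

```python
import json
import urllib.request

# Hypothetical request body -- field names are assumptions for illustration.
payload = {
    "article_url": "https://example.com/news/fraud-case",
    "first_name": "Jane",
    "last_name": "Doe",
    "date_of_birth": "1980-01-01",  # optional, improves matching accuracy
}
req = urllib.request.Request(
    "http://localhost:5001/screening/screen",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit it once the stack is running.
body = json.loads(req.data)
print(req.get_method(), body["last_name"])
```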

πŸ› οΈ Development

Code Quality (AI Service)

make format     # Format Python code (black + isort)
make lint       # Lint Python code (flake8)
make test       # Run tests (pytest)

Viewing Logs

# All services
make logs

# Specific service only
make logs SERVICE=ai
make logs SERVICE=web

# Development mode logs
make logs ENV=dev

Rebuilding After Changes

# Rebuild specific service
make rebuild SERVICE=ai
make rebuild SERVICE=web

# Rebuild all services
make rebuild

πŸ› Troubleshooting

Services Won't Start

Check Docker is running:

docker ps

Check ports are available:

# On Mac/Linux
lsof -i :3000
lsof -i :5001

# On Windows
netstat -ano | findstr :3000
netstat -ano | findstr :5001

View service logs:

make logs

LLM Errors

"Invalid API key":

  • Verify your API key in services/ai/.env.secrets
  • Check you're using the correct provider (OpenAI vs Anthropic)

"Rate limit exceeded":

  • Your API account has hit rate limits
  • Wait and retry, or upgrade your API plan

"Model not found":

  • Verify the model name in .env.defaults (or override in .env.secrets) matches your provider
  • Check your API account has access to the model

Results Not Persisting

Check volume mount:

ls -la services/ai/results/
# Should show: data/ and index.json

Check permissions:

# Results directory should be writable
chmod -R 755 services/ai/results/

View storage logs:

cd docker && docker compose logs -f ai | grep -i result

Frontend Build Errors

Clear Next.js cache:

cd services/web
rm -rf .next
npm run build

Reinstall dependencies:

cd services/web
rm -rf node_modules package-lock.json
npm install

Still Having Issues?

  1. Check the logs: make logs or make logs SERVICE=ai
  2. Verify environment variables: cat services/ai/.env.defaults services/ai/.env.secrets
  3. Restart services: make restart
  4. Rebuild from scratch: make clean && make build && make start
  5. Check specific service: make logs SERVICE=web or make shell SERVICE=ai
