ehebert7/salesforce-phi-scanner

PHI Scanner

A modular tool for scanning Salesforce orgs to identify fields that may contain Protected Health Information (PHI) and may need encryption to support HIPAA compliance.

Features

  • SF CLI Integration: Uses existing Salesforce CLI authentication (no credentials to manage)
  • Multiple Scan Modes: Full API scan, local metadata only, or hybrid
  • PHI Risk Classification: Three-tier risk assessment (High/Medium/Low)
  • Data Sampling: Samples actual data with automatic masking
  • Encryption Gap Analysis: Identifies PHI fields not protected by Shield Platform Encryption
  • Encryption Recommendations: Deterministic vs. probabilistic encryption guidance
  • Multiple Output Formats: Excel (with formatted, audit-style sheets), CSV, or JSON reports
  • Interactive Mode: Select objects from a menu before scanning
  • Web UI Dashboard: Browser-based interface with real-time progress tracking
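The Data Sampling feature masks values before they reach a report. As a rough illustration only, the hypothetical `mask_value` helper below keeps the last few characters and replaces the rest; the scanner's own masking rule (in sampler.py) may differ.

```python
def mask_value(value: object, visible: int = 4, mask_char: str = "*") -> str:
    """Mask a sampled value, keeping only the last `visible` characters.

    Hypothetical rule for illustration; the scanner's real masking may differ.
    """
    if value is None:
        return ""
    text = str(value)
    if len(text) <= visible:
        # Too short to reveal anything safely: mask every character.
        return mask_char * len(text)
    # Replace everything except the trailing `visible` characters.
    return mask_char * (len(text) - visible) + text[-visible:]
```

For example, `mask_value("123-45-6789")` returns `*******6789`, and values of four characters or fewer are masked completely so that short codes are not revealed.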

Installation

From Source

# Clone the repository
git clone https://github.com/ehebert7/salesforce-phi-scanner.git
cd salesforce-phi-scanner

# Install core dependencies
pip install -r requirements.txt

# Install web UI dependencies (optional)
pip install -r requirements-web.txt

Using pip (Development Mode)

pip install -e .

# With web UI support
pip install -e ".[web]"

# With all optional dependencies
pip install -e ".[all]"

Prerequisites

  1. Python 3.9+
  2. Salesforce CLI installed and authenticated
    npm install -g @salesforce/cli
    sf org login web --alias myorg
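Because authentication lives in the Salesforce CLI, a tool can reuse it by asking the CLI for the session of an aliased org. The sketch below shells out to `sf org display --target-org <alias> --json` and reads `result.instanceUrl` and `result.accessToken`. Those JSON field names are an assumption to verify against your CLI version, and `OrgSession`, `parse_org_display`, and `get_session` are hypothetical names, not the API of the scanner's connection.py.

```python
import json
import subprocess
from dataclasses import dataclass


@dataclass
class OrgSession:
    username: str
    instance_url: str
    access_token: str


def parse_org_display(raw_json: str) -> OrgSession:
    """Extract session details from `sf org display --json` output.

    Assumes the payload nests fields under "result" (username, instanceUrl,
    accessToken) and reports failures with a non-zero "status" and a "message".
    """
    payload = json.loads(raw_json)
    if payload.get("status", 0) != 0:
        raise RuntimeError(payload.get("message", "sf org display failed"))
    result = payload["result"]
    return OrgSession(
        username=result.get("username", ""),
        instance_url=result["instanceUrl"],
        access_token=result["accessToken"],
    )


def get_session(alias: str) -> OrgSession:
    """Ask the SF CLI for an already-authenticated org (requires `sf` on PATH)."""
    completed = subprocess.run(
        ["sf", "org", "display", "--target-org", alias, "--json"],
        capture_output=True,
        text=True,
        check=False,  # errors are reported inside the JSON payload
    )
    return parse_org_display(completed.stdout)
```

`get_session("myorg")` needs `sf` on your PATH; the parsing step is split out so it can be tested without a live org. Treat the returned token like a password and avoid logging it.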

Quick Start

1. List Your Authenticated Orgs

python main.py --list-orgs

2. Run an Interactive Scan

python main.py --org myorg --interactive

3. Run with Encryption Gap Analysis

python main.py --org myorg --objects Account,Contact --check-encryption

4. Launch Web UI

python main.py --web
# Open http://localhost:8000 in your browser

5. View the Report

Open the generated PHI_Scan_Report_*.xlsx file in Excel.

Usage Examples

Full Scan (API + Data Sampling)

# Scan default org
python main.py

# Scan specific org
python main.py --org myorg

# Scan specific objects only
python main.py --org myorg --objects Account,Contact,Lead,My_Custom__c

Interactive Object Selection

# Select objects from a menu
python main.py --org myorg --interactive

Available commands in interactive mode:

Command          Action
1-5,8,10         Toggle objects by menu number (single numbers, ranges, or comma-separated lists)
all / a          Select all objects
none / n         Clear all selections
custom / cu      Select only custom objects
std / s          Select only standard objects
done / d         Proceed with scan
quit / q         Cancel and exit
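Number input such as `1-5,8,10` can be expanded into a set of menu positions and then toggled against the current selection. A minimal sketch with hypothetical helpers `parse_selection` and `toggle`; the interactive.py implementation may accept other forms or behave differently.

```python
def parse_selection(text: str, max_index: int) -> set[int]:
    """Expand input such as "1-5,8,10" into a set of 1-based menu numbers.

    Hypothetical helper; numbers outside 1..max_index raise ValueError.
    """
    selected: set[int] = set()
    for part in text.replace(" ", "").split(","):
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
            if start > end:
                start, end = end, start  # accept reversed ranges like "5-3"
            numbers = range(start, end + 1)
        else:
            numbers = range(int(part), int(part) + 1)
        for n in numbers:
            if not 1 <= n <= max_index:
                raise ValueError(f"{n} is not between 1 and {max_index}")
            selected.add(n)
    return selected


def toggle(current: set[int], picks: set[int]) -> set[int]:
    """Toggle semantics: each picked item flips between selected and unselected."""
    return current ^ picks
```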

Encryption Gap Analysis

# Check which PHI fields are not encrypted
python main.py --org myorg --check-encryption

# Full scan with encryption analysis
python main.py --org myorg --objects Account --check-encryption
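Conceptually, the gap analysis is a filter over classified fields: keep the PHI fields whose encryption status is false. The sketch below uses a hypothetical `FieldResult` record and assumes that Tier 1 and Tier 2 fields count as gaps by default; how the scanner actually reads Shield status from the org is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldResult:
    object_name: str
    field_name: str
    tier: Optional[int]  # 1 = high, 2 = medium, 3 = low, None = not PHI
    encrypted: bool      # True if Shield Platform Encryption covers the field


def encryption_gaps(fields: List[FieldResult], max_tier: int = 2) -> List[FieldResult]:
    """Return unencrypted PHI fields at or above the given risk level.

    Assumption: tiers 1 and 2 are gaps by default; pass max_tier=3 to include
    low-risk fields as well.
    """
    return [
        f for f in fields
        if f.tier is not None and f.tier <= max_tier and not f.encrypted
    ]
```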

Web UI Dashboard

# Launch on default port 8000
python main.py --web

# Launch on custom port
python main.py --web --port 3000

Metadata-Only Scan (Local Files)

# Scan from local SFDX project (no API calls)
python main.py --mode metadata-only --source ./force-app
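In SFDX source format, each field is stored as a file at objects/<Object>/fields/<Field>.field-meta.xml. The sketch below lists those fields and their labels without any API calls. It illustrates the approach under that directory-layout assumption; `list_local_fields` is a hypothetical helper, not the scanner's metadata.py.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Tuple

# XML namespace used by Salesforce Metadata API source files.
MD_NS = "{http://soap.sforce.com/2006/04/metadata}"
SUFFIX = ".field-meta.xml"


def list_local_fields(source_dir: str) -> List[Tuple[str, str, str]]:
    """Return (object, field, label) for every *.field-meta.xml under source_dir."""
    results = []
    for path in sorted(Path(source_dir).rglob("*" + SUFFIX)):
        # Expected layout: .../objects/<Object>/fields/<Field>.field-meta.xml
        if path.parent.name != "fields":
            continue
        object_name = path.parent.parent.name
        field_name = path.name[: -len(SUFFIX)]
        root = ET.parse(path).getroot()
        label_el = root.find(f"{MD_NS}label")
        # Fall back to the API name when the file has no <label> element.
        label = label_el.text if label_el is not None and label_el.text else field_name
        results.append((object_name, field_name, label))
    return results
```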

Output Options

# Excel output (default)
python main.py --org myorg --format excel

# CSV output (single flat file)
python main.py --org myorg --format csv --output ./reports/phi-audit.csv

# JSON output
python main.py --org myorg --format json --output ./reports/phi-audit.json

Command Line Options

Option               Short  Description
--org                -o     SF CLI org alias
--mode               -m     Scan mode: full, metadata-only, hybrid
--source             -s     Path to force-app for local scanning
--objects                   Comma-separated list of objects to scan
--format             -f     Output format: excel, csv, json
--output                    Output file path
--interactive        -i     Interactive object selection mode
--check-encryption          Check Shield Platform Encryption status
--web                       Launch web UI dashboard
--port                      Port for web UI (default: 8000)
--org-name                  Organization name for report header
--config             -c     Path to YAML config file
--patterns                  Path to custom PHI patterns JSON
--list-orgs                 List authenticated orgs and exit
--quiet              -q     Suppress progress output
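These options map naturally onto Python's argparse. The sketch below declares the same flags for illustration; it is not the project's main.py. The `excel` format default and port 8000 come from this README, while `full` as the default `--mode` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative declaration of the documented CLI options."""
    p = argparse.ArgumentParser(prog="main.py", description="Scan a Salesforce org for PHI fields.")
    p.add_argument("--org", "-o", help="SF CLI org alias")
    p.add_argument("--mode", "-m", choices=["full", "metadata-only", "hybrid"],
                   default="full",  # assumed default, not confirmed by the README
                   help="Scan mode")
    p.add_argument("--source", "-s", help="Path to force-app for local scanning")
    p.add_argument("--objects",
                   type=lambda s: [o.strip() for o in s.split(",") if o.strip()],
                   help="Comma-separated list of objects to scan")
    p.add_argument("--format", "-f", choices=["excel", "csv", "json"], default="excel")
    p.add_argument("--output", help="Output file path")
    p.add_argument("--interactive", "-i", action="store_true")
    p.add_argument("--check-encryption", action="store_true")
    p.add_argument("--web", action="store_true")
    p.add_argument("--port", type=int, default=8000)
    p.add_argument("--org-name", help="Organization name for report header")
    p.add_argument("--config", "-c", help="Path to YAML config file")
    p.add_argument("--patterns", help="Path to custom PHI patterns JSON")
    p.add_argument("--list-orgs", action="store_true")
    p.add_argument("--quiet", "-q", action="store_true")
    return p
```

argparse derives attribute names from the long option, so `--check-encryption` becomes `args.check_encryption` and `--org-name` becomes `args.org_name`.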

PHI Risk Classification

Tier 1 - High Risk (Confirmed PHI)

Fields matching patterns like:

  • SSN, Social Security
  • Birth Date, DOB
  • Medical, Health, Clinical
  • Diagnosis, Treatment, Medication
  • Insurance, Policy, Claim
  • Surgery, Procedure

Tier 2 - Medium Risk (Likely PHI)

Fields matching patterns like:

  • Phone, Mobile, Email
  • Address, Street, City, Zip
  • Name, First, Last
  • Emergency Contact
  • Account Number, Member ID

Tier 3 - Low Risk (Review Needed)

Fields matching patterns like:

  • Description, Notes, Comments
  • History, Record
  • Payment, Amount, Balance
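A name-based classifier can check the tiers from highest to lowest risk and return the first tier with a matching pattern. The regexes below are hypothetical translations of the lists above, not the contents of config/default_patterns.json. Broad terms such as Name or Record also match fields like Username or RecordTypeId, which is why a match is a starting point for review rather than a verdict.

```python
import re
from typing import Dict, List, Optional

# Illustrative regexes derived from the tier lists above (hypothetical).
TIER_PATTERNS: Dict[int, List[str]] = {
    1: [r"SSN|Social.*Security", r"Birth.?Date|DOB", r"Medical|Health|Clinical",
        r"Diagnosis|Treatment|Medication", r"Insurance|Policy|Claim", r"Surgery|Procedure"],
    2: [r"Phone|Mobile|Email", r"Address|Street|City|Zip", r"Name|First|Last",
        r"Emergency.?Contact", r"Account.?Number|Member.?ID"],
    3: [r"Description|Notes|Comments", r"History|Record", r"Payment|Amount|Balance"],
}

COMPILED = {tier: [re.compile(p, re.IGNORECASE) for p in pats]
            for tier, pats in TIER_PATTERNS.items()}


def classify(field_name: str, label: str = "") -> Optional[int]:
    """Return the highest-risk tier (1 = high) whose pattern matches, else None."""
    text = f"{field_name} {label}"
    for tier in sorted(COMPILED):  # 1, then 2, then 3
        if any(rx.search(text) for rx in COMPILED[tier]):
            return tier
    return None
```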

Encryption Recommendations

Recommendation     Use Case
Deterministic      Fields that must be filterable or matched on exact values (SSN, IDs, Phone)
Probabilistic      Sensitive free-text fields (medical notes, descriptions)
Review & Encrypt   Medium-risk fields that require a business decision
No Encryption      Non-PHI fields
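Deterministic encryption produces the same ciphertext for the same input, which is what allows exact-match filtering; probabilistic encryption randomizes the output and gives stronger protection, but such fields generally cannot be used in filters. The hypothetical rule below shows one way to map a risk tier onto the four recommendations; the scanner's actual logic may weigh field type and usage differently.

```python
from typing import Optional


def recommend_encryption(tier: Optional[int], needs_exact_filtering: bool) -> str:
    """Map a risk tier to one of the four recommendation labels.

    Hypothetical decision rule:
      not PHI                          -> "No Encryption"
      high risk, used in filters       -> "Deterministic"
      high risk, free text / no filter -> "Probabilistic"
      medium or low risk               -> "Review & Encrypt"
    """
    if tier is None:
        return "No Encryption"
    if tier == 1:
        return "Deterministic" if needs_exact_filtering else "Probabilistic"
    return "Review & Encrypt"
```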

Output Report Structure

Excel Report

  • Summary Sheet: Professional audit format with statistics, methodology, and risk definitions
  • Encryption Gaps Sheet: PHI fields not protected by Shield encryption (when using --check-encryption)
  • Per-Object Tabs: All fields with risk tier, assessment, encryption status, and sample data

CSV Report

  • Single flat file with all fields across all objects
  • Includes encryption status columns when using --check-encryption
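A flat CSV is one row per field across all objects. A minimal sketch with Python's csv module; `write_flat_csv` and the column names are hypothetical and may not match the scanner's headers.

```python
import csv
from typing import Iterable, Mapping, TextIO

# Hypothetical column set; the real report's headers may differ.
COLUMNS = ["Object", "Field", "Label", "Risk Tier", "Recommendation", "Encrypted"]


def write_flat_csv(rows: Iterable[Mapping[str, object]], out: TextIO) -> None:
    """Write one row per field, across all objects, into a single CSV stream."""
    # Unknown keys are ignored; missing keys are written as empty cells.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to disk, open the file with `newline=""` (for example `open(path, "w", newline="", encoding="utf-8")`), as the csv module documentation recommends.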

JSON Report

  • Structured data for programmatic processing
  • Includes metadata, summary, encryption gap analysis, and per-object field details

Project Structure

phi-scanner/
├── main.py                 # CLI entry point
├── pyproject.toml          # Package configuration
├── requirements.txt        # Core dependencies
├── requirements-web.txt    # Web UI dependencies
├── README.md               # This file
├── LICENSE                 # MIT License
├── config/
│   ├── default_patterns.json   # PHI detection patterns
│   └── sample_config.yaml      # Example configuration
├── scanner/
│   ├── __init__.py         # Package init (v1.1.1)
│   ├── config.py           # Configuration management
│   ├── connection.py       # SF CLI integration
│   ├── metadata.py         # Metadata retrieval
│   ├── categorizer.py      # PHI risk classification
│   ├── sampler.py          # Data sampling with masking
│   ├── reporter.py         # Report generation
│   ├── interactive.py      # Interactive object selection
│   └── encryption.py       # Encryption status checker
└── web/
    ├── __init__.py         # Web package init
    ├── app.py              # FastAPI application
    ├── models.py           # Pydantic models + SQLite DB
    ├── templates/          # Jinja2 HTML templates
    └── static/             # CSS and JavaScript

Customization

Custom PHI Patterns

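Create a custom patterns file by copying config/default_patterns.json and adding your own regex patterns to the appropriate tier ([...] stands for the existing entries):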

{
  "tier1_high": [
    "SSN|Social.*Security",
    "My_Custom_PHI_Pattern"
  ],
  "tier2_medium": [...],
  "tier3_low": [...]
}

Then run with:

python main.py --org myorg --patterns ./my-patterns.json
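A custom patterns file can be checked before a scan by compiling every entry as a regular expression, so a typo fails immediately rather than mid-scan. A sketch, assuming the three tier keys shown in the example and case-insensitive matching; `load_patterns` is a hypothetical helper.

```python
import json
import re
from typing import Dict, List

TIER_KEYS = ("tier1_high", "tier2_medium", "tier3_low")


def load_patterns(raw_json: str) -> Dict[str, List[re.Pattern]]:
    """Parse a patterns file and compile each entry as a case-insensitive regex."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("patterns file must be a JSON object")
    missing = [k for k in TIER_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing tier keys: {', '.join(missing)}")
    compiled: Dict[str, List[re.Pattern]] = {}
    for key in TIER_KEYS:
        try:
            compiled[key] = [re.compile(p, re.IGNORECASE) for p in data[key]]
        except re.error as exc:
            # Report which tier holds the broken pattern.
            raise ValueError(f"invalid regex in {key}: {exc}") from exc
    return compiled
```

For a file on disk, pass its text, for example `load_patterns(pathlib.Path("my-patterns.json").read_text())`.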

Custom Config File

Copy config/sample_config.yaml, customize it, and pass it with --config (-c). Settings you can change include:

  • Object selection
  • Sampling settings
  • Output preferences

Troubleshooting

"SF CLI not found"

Install Salesforce CLI:

npm install -g @salesforce/cli

"No authenticated orgs found"

Authenticate to your org:

sf org login web --alias myorg

"Session expired"

Re-authenticate:

sf org login web --alias myorg

Query Timeouts

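Increase the query timeout in your config file (passed with --config), or, for large orgs, use --mode metadata-only with --source to scan local files without API queries.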

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - see LICENSE for details.

Author

Cloud Beacon Consulting

About

PHI (Protected Health Information) scanner for Salesforce orgs - HIPAA compliance tool
