This document provides a comprehensive reference for all built-in patterns used by the SCAN plugin and guidance on creating custom patterns.
- Pattern Overview
- Built-in Patterns
- Pattern Categories
- Custom Pattern Creation
- Pattern Testing
- Best Practices
SCAN uses regular expressions (regex) to identify potential secrets in your codebase. The plugin includes over 50 built-in patterns covering common secret types, and you can add custom patterns for organization-specific secrets.
- Pattern Matching: SCAN scans each file line by line, applying regex patterns
- Context Analysis: The surrounding code context is analyzed to reduce false positives
- Entropy Analysis: High-entropy strings are flagged even if they don't match specific patterns
- Confidence Scoring: Each finding is assigned a confidence level based on multiple factors
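The line-by-line matching step described above can be sketched as follows. This is a minimal illustration, not SCAN's actual implementation: the `Finding` class, the `samplePatterns` map, and the `scanLines` function are hypothetical names, and only two of the built-in patterns from this reference are included.

```kotlin
// Hypothetical type for illustration; SCAN's internal classes are not shown in this document.
data class Finding(val lineNumber: Int, val patternName: String, val match: String)

// Two patterns copied from the built-in list in this reference.
val samplePatterns: Map<String, Regex> = mapOf(
    "AWS Access Key ID" to Regex("AKIA[0-9A-Z]{16}"),
    "GitHub Personal Access Token" to Regex("ghp_[A-Za-z0-9]{36}")
)

// Apply every pattern to every line and record each match with its 1-based line number.
fun scanLines(lines: List<String>, patterns: Map<String, Regex> = samplePatterns): List<Finding> =
    lines.flatMapIndexed { index, line ->
        patterns.flatMap { (name, regex) ->
            regex.findAll(line).map { Finding(index + 1, name, it.value) }.toList()
        }
    }
```

Calling `scanLines(File("Config.kt").readLines())` would return one `Finding` per match. Context analysis, entropy analysis, and confidence scoring are left out of this sketch.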
AWS Access Key ID
- Pattern: AKIA[0-9A-Z]{16}
- Description: AWS Access Key identifiers
- Example: AKIAIOSFODNN7EXAMPLE
- Confidence: High

AWS Secret Access Key
- Pattern: aws(.{0,20})?['\"][0-9a-zA-Z\/+]{40}['\"]
- Description: AWS Secret Access Keys
- Example: aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
- Confidence: High

AWS Session Token
- Pattern: aws(.{0,20})?session(.{0,20})?token
- Description: AWS session tokens
- Confidence: Medium

GCP API Key
- Pattern: AIza[0-9A-Za-z\\-_]{35}
- Description: Google Cloud Platform API keys
- Example: AIzaSyDaGmWKa4JsXZ-HjGw7ISLn_3namBGewQe
- Confidence: High

GCP Service Account Key
- Pattern: "type": "service_account"
- Description: GCP service account JSON key files
- Confidence: High (when in JSON context)

Azure Storage Account Key
- Pattern: DefaultEndpointsProtocol=https;AccountName=.*;AccountKey=.*
- Description: Azure storage connection strings
- Confidence: High

Azure Client Secret
- Pattern: azure(.{0,20})?client(.{0,20})?secret
- Description: Azure application client secrets
- Confidence: Medium
GitHub Personal Access Token
- Pattern: ghp_[A-Za-z0-9]{36}
- Description: GitHub personal access tokens (new format)
- Example: ghp_1234567890abcdef1234567890abcdef1234
- Confidence: High

GitHub OAuth Token
- Pattern: gho_[A-Za-z0-9]{36}
- Description: GitHub OAuth access tokens
- Confidence: High

GitHub App Token
- Pattern: ghs_[A-Za-z0-9]{36}
- Description: GitHub App installation access tokens
- Confidence: High

GitHub Refresh Token
- Pattern: ghr_[A-Za-z0-9]{76}
- Description: GitHub refresh tokens
- Confidence: High

GitLab Personal Access Token
- Pattern: glpat-[A-Za-z0-9\\-_]{20}
- Description: GitLab personal access tokens
- Confidence: High

Bitbucket App Password
- Pattern: bitbucket(.{0,20})?app(.{0,20})?password
- Description: Bitbucket app passwords
- Confidence: Medium
Database Connection String
- Pattern: (mysql|postgresql|mongodb|redis)://[^\\s]+:[^\\s]+@[^\\s]+
- Description: Database connection strings with credentials
- Example: mysql://user:password@localhost:3306/database
- Confidence: High

JDBC URL with Credentials
- Pattern: jdbc:[^\\s]+user=[^\\s]+.*password=[^\\s]+
- Description: JDBC connection strings
- Confidence: High

MySQL
- Pattern: mysql(.{0,20})?password['"\\s]*[:=]['"\\s]*[^\\s'"]+
- Description: MySQL password configurations
- Confidence: Medium

PostgreSQL
- Pattern: postgres(.{0,20})?password['"\\s]*[:=]['"\\s]*[^\\s'"]+
- Description: PostgreSQL password configurations
- Confidence: Medium

MongoDB
- Pattern: mongodb(.{0,20})?password['"\\s]*[:=]['"\\s]*[^\\s'"]+
- Description: MongoDB password configurations
- Confidence: Medium
Generic API Key
- Pattern: api(.{0,20})?key['"\\s]*[:=]['"\\s]*[A-Za-z0-9]{20,}
- Description: Generic API key patterns
- Confidence: Medium

API Secret
- Pattern: api(.{0,20})?secret['"\\s]*[:=]['"\\s]*[A-Za-z0-9]{20,}
- Description: Generic API secret patterns
- Confidence: Medium

Bearer Token
- Pattern: bearer\\s+[A-Za-z0-9\\-_\\.]+
- Description: Bearer authentication tokens
- Confidence: Medium

Slack Token
- Pattern: xox[baprs]-[A-Za-z0-9]{10,48}
- Description: Slack API tokens
- Example: xoxb-1234567890-1234567890-abcdefghijklmnopqrstuvwx
- Confidence: High

Discord Bot Token
- Pattern: [MN][A-Za-z\\d]{23}\\.[\\w-]{6}\\.[\\w-]{27}
- Description: Discord bot tokens
- Confidence: High

Stripe API Key
- Pattern: sk_live_[A-Za-z0-9]{24}
- Description: Stripe live API keys
- Confidence: High

Twilio API Key
- Pattern: SK[a-z0-9]{32}
- Description: Twilio API keys
- Confidence: High
RSA Private Key
- Pattern: -----BEGIN RSA PRIVATE KEY-----
- Description: RSA private key headers
- Confidence: High

EC Private Key
- Pattern: -----BEGIN EC PRIVATE KEY-----
- Description: Elliptic Curve private key headers
- Confidence: High

OpenSSH Private Key
- Pattern: -----BEGIN OPENSSH PRIVATE KEY-----
- Description: OpenSSH private key headers
- Confidence: High

Certificate
- Pattern: -----BEGIN CERTIFICATE-----
- Description: X.509 certificate headers
- Confidence: Medium (certificates are often public)
Password Assignment
- Pattern: password['"\\s]*[:=]['"\\s]*[^\\s'"]{6,}
- Description: Generic password assignments
- Confidence: Low (many false positives)

Secret Assignment
- Pattern: secret['"\\s]*[:=]['"\\s]*[^\\s'"]{8,}
- Description: Generic secret assignments
- Confidence: Low

Basic Auth
- Pattern: Basic\\s+[A-Za-z0-9+/]+=*
- Description: HTTP Basic Authentication headers
- Confidence: Medium

JWT Token
- Pattern: eyJ[A-Za-z0-9\\-_=]+\\.[A-Za-z0-9\\-_=]+\\.[A-Za-z0-9\\-_.+/=]*
- Description: JSON Web Tokens
- Confidence: Medium
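A quick way to sanity-check the entries above is to run each documented example through its own pattern. The sketch below does this for three of the pairs. It assumes the patterns are meant to be read as Kotlin string literals (so `\\s` denotes the regex `\s`) and that a pattern may match anywhere in the text; `documentedExamples` and `mismatchedExamples` are illustrative names, not part of SCAN.

```kotlin
// Pattern and example pairs copied from the built-in entries above,
// with each pattern written as a Kotlin string literal.
val dbConnectionString = Regex("(mysql|postgresql|mongodb|redis)://[^\\s]+:[^\\s]+@[^\\s]+")

val documentedExamples: List<Pair<Regex, String>> = listOf(
    Regex("AKIA[0-9A-Z]{16}") to "AKIAIOSFODNN7EXAMPLE",
    Regex("AIza[0-9A-Za-z\\-_]{35}") to "AIzaSyDaGmWKa4JsXZ-HjGw7ISLn_3namBGewQe",
    dbConnectionString to "mysql://user:password@localhost:3306/database"
)

// Returns the examples that their own pattern fails to find; an empty list means every pair agrees.
fun mismatchedExamples(pairs: List<Pair<Regex, String>> = documentedExamples): List<String> =
    pairs.filterNot { (regex, example) -> regex.containsMatchIn(example) }
        .map { it.second }
```

The same helper can be pointed at any other pattern and example pair from this reference before relying on it.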
These patterns have very low false positive rates:
- Cloud provider keys (AWS, GCP, Azure)
- Version control tokens (GitHub, GitLab)
- Service-specific API keys (Slack, Stripe, etc.)
- Private key headers
- Database connection strings with credentials
These patterns may have some false positives but are generally reliable:
- Generic API keys/secrets
- Bearer tokens
- JWT tokens
- Basic auth headers
These patterns are more prone to false positives but catch common mistakes:
- Generic password assignments
- Generic secret assignments
- Configuration file patterns
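One way to act on these tiers is to filter findings by a minimum confidence level. The sketch below is purely illustrative: the `Confidence` enum, the `ScoredFinding` class, and the `atLeast` function are hypothetical and do not describe SCAN's actual API or configuration options.

```kotlin
// Hypothetical model: these types are illustrative, not SCAN's actual API.
enum class Confidence { LOW, MEDIUM, HIGH }

data class ScoredFinding(val patternName: String, val confidence: Confidence)

// Keep only findings at or above the requested tier.
// Enum declaration order (LOW < MEDIUM < HIGH) drives the comparison.
fun atLeast(findings: List<ScoredFinding>, minimum: Confidence): List<ScoredFinding> =
    findings.filter { it.confidence >= minimum }
```

For example, `atLeast(findings, Confidence.MEDIUM)` would drop generic password and secret assignments while keeping provider-specific keys and bearer tokens.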
Add custom patterns to catch organization-specific secrets:
scan {
    customPatterns = listOf(
        "MYCOMPANY_API_[A-Z0-9]{32}",
        "INTERNAL_SECRET_[a-f0-9]{64}",
        "PROD_KEY_[A-Za-z0-9\\-_]{40}"
    )
}

scan {
    customPatterns = listOf(
        // Match quoted password assignments (case-insensitive)
        "(?i)password\\s*[=:]\\s*[\"'][^\"']{8,}[\"']",
        // Match API key assignments (api_key, api-key, apikey)
        "(?:api[_-]?key|apikey)\\s*[=:]\\s*[\"']?([A-Za-z0-9]{20,})[\"']?",
        // Match secrets in environment variable format
        "^[A-Z_]+_SECRET=[A-Za-z0-9+/=]{20,}$"
    )
}

scan {
    customPatterns = listOf(
        // Company-specific API key format
        "ACME_[A-Z]{2}_[0-9]{8}_[A-Za-z0-9]{16}",
        // Internal service tokens
        "svc_[a-z]{3,10}_[A-Za-z0-9]{32}",
        // Database identifiers
        "db_prod_[a-f0-9]{40}",
        // Certificate thumbprints
        "cert_[A-F0-9]{40}",
        // License keys
        "lic_[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}"
    )
}

scan {
    customPatterns = listOf(
        // ✅ Good: Specific, efficient pattern
        "API_KEY_[A-Z0-9]{32}",
        // ❌ Avoid: Too broad, slow
        ".*secret.*",
        // ✅ Good: Anchored pattern
        "^SECRET=[A-Za-z0-9]{20,}$",
        // ❌ Avoid: Unanchored, may cause backtracking
        "(secret|password|key).*[A-Za-z0-9]+.*"
    )
}

Create test files to validate your custom patterns:
// test-secrets.kt (for testing only)
val testCases = listOf(
    "MYCOMPANY_API_12345678901234567890123456789012", // Should match
    "MYCOMPANY_API_short", // Should not match
    "OTHER_API_12345678901234567890123456789012", // Should not match
    "MYCOMPANY_API_12345678901234567890123456789" // Should not match (wrong length)
)

Use online regex testers to validate patterns.
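As a local alternative to an online tester, the test cases from test-secrets.kt can be checked directly against the custom pattern. This is a minimal sketch: `companyApiKey`, `expectations`, and `failingCases` are illustrative names, and it assumes SCAN searches for the pattern anywhere within a line rather than requiring a whole-line match.

```kotlin
// The custom pattern from the earlier configuration example.
val companyApiKey = Regex("MYCOMPANY_API_[A-Z0-9]{32}")

// Each input is paired with the outcome the test file expects.
val expectations: List<Pair<String, Boolean>> = listOf(
    "MYCOMPANY_API_12345678901234567890123456789012" to true,
    "MYCOMPANY_API_short" to false,
    "OTHER_API_12345678901234567890123456789012" to false,
    "MYCOMPANY_API_12345678901234567890123456789" to false
)

// Returns the inputs whose actual result differs from the expected one.
fun failingCases(pattern: Regex, cases: List<Pair<String, Boolean>>): List<String> =
    cases.filter { (input, shouldMatch) -> pattern.containsMatchIn(input) != shouldMatch }
        .map { it.first }
```

An empty result from `failingCases(companyApiKey, expectations)` means the pattern behaves as intended on all four cases. A looser pattern such as `MYCOMPANY_API_.*` would report the two too-short inputs as failures.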
Test patterns against known good and bad examples:
# Test with verbose output
./gradlew scanForSecrets --info
# Check specific files
./gradlew scanForSecrets -Dscan.include="test-patterns.kt"

- Be Specific: Avoid overly broad patterns that cause false positives
- Use Anchors: Use ^ and $ when matching entire lines
- Consider Context: Think about where the pattern might appear
- Test Thoroughly: Validate against real codebases
- Regular Reviews: Periodically review and update patterns
- False Positive Tracking: Keep track of common false positives
- Team Input: Get feedback from developers on pattern effectiveness
- Documentation: Document the purpose and examples for each custom pattern
// ❌ Too broad - matches any line that contains "password"
".*password.*"
// ✅ Better - more specific
"password\\s*=\\s*[\"'][^\"']{8,}[\"']"

// ❌ Inefficient - catastrophic backtracking
"(a+)+b"
// ✅ Efficient - atomic grouping
"(?>a+)+b"

// ❌ Case sensitive - might miss variations
"Password"
// ✅ Case insensitive
"(?i)password"

Custom patterns work alongside SCAN's entropy detection:
scan {
    // Lower entropy threshold to catch more random strings
    entropyThreshold = 4.0
    // Custom patterns for structured secrets
    customPatterns = listOf(
        "STRUCTURED_KEY_[A-Z0-9]{20}"
    )
}

- Unit Test Patterns: Test each pattern in isolation
- Integration Test: Test with real codebase samples
- Performance Test: Ensure patterns don't slow down scanning
- False Positive Test: Verify patterns don't trigger on safe code
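To make the `entropyThreshold` setting more concrete, the sketch below computes Shannon entropy in bits per character. This document does not specify how SCAN measures entropy, so this is an assumption: the formula, the `shannonEntropy` and `exceedsThreshold` names, and the reading of `4.0` as bits per character are illustrative only.

```kotlin
import kotlin.math.log2

// Shannon entropy in bits per character: -sum(p * log2(p)) over character frequencies.
fun shannonEntropy(text: String): Double {
    if (text.isEmpty()) return 0.0
    val length = text.length.toDouble()
    return text.groupingBy { it }.eachCount().values.sumOf { count ->
        val p = count / length
        -p * log2(p)
    }
}

// Assumption: a string is flagged when its entropy meets or exceeds the threshold.
fun exceedsThreshold(text: String, threshold: Double = 4.0): Boolean =
    shannonEntropy(text) >= threshold
```

Under this measure a dictionary word such as `password` scores well below 4.0, while a string of 32 distinct characters scores 5.0. Lowering the threshold therefore flags more strings, including less random ones.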
For more information on configuration and usage, see the Configuration Reference and User Guide.