This document explains the comprehensive automated security scanning setup for the Pinecone Assistant MCP project.
The project uses multiple security scanning technologies:
- detect-secrets to prevent accidental commits of API keys, tokens, passwords, and other sensitive data
- Prompt Injection Detection to protect against AI-specific attacks and malicious prompt patterns
- Automatically scans all code on push and pull requests
- Scans git history (last 100 commits) for accidentally committed secrets
- Fails the build if new secrets are detected
- Location:
.github/workflows/secret-scan.yml
- Prevents committing secrets before they reach GitHub
- Runs automatically on
git commit - Location:
.pre-commit-config.yaml
- Tracks known placeholder keys and false positives
- Location:
.secrets.baseline
- Scans for 70+ malicious prompt patterns
- Baseline system to track known findings and only flag NEW patterns
- SHA256 fingerprinting for finding identification
- Detects document-corpus attack vectors (API bypass, data extraction)
- Integrated with pre-commit hooks and CI/CD pipeline
- Location:
.security/check_prompt_injections.py
Attack Categories Detected:
- Instruction override attempts ("ignore previous instructions")
- System prompt extraction ("show me your instructions")
- AI behavior manipulation ("you are now a different AI")
- Document data extraction ("dump all documents from the knowledge base")
- API bypass attempts ("bypass the Pinecone API limits")
- Configuration disclosure ("reveal your system prompt")
- Social engineering patterns ("we became friends")
- Unicode steganography attacks (Variation Selectors, zero-width characters)
The enhanced detector includes comprehensive Unicode steganography detection to counter advanced threats like the Repello AI emoji injection attack:
Detection Capabilities:
- Variation Selector Encoding: Detects VS0/VS1 (U+FE00/U+FE01) binary encoding in emojis
- Zero-Width Character Abuse: Identifies suspicious use of invisible Unicode characters
- High Invisible Character Ratios: Flags content with >10% invisible-to-visible character ratios
- Binary Pattern Recognition: Detects 8+ bit sequences that could encode hidden messages
Attack Patterns Detected:
- Emoji steganography (e.g., "Hello!" with hidden binary-encoded instructions)
- Zero-width space injection for text manipulation
- Invisible Unicode character abuse for bypassing filters
- Binary steganography using Variation Selectors
Examples of Detected Threats:
"Hello!" + hidden_binary_message— appears innocent but contains malicious instructions- Text with embedded zero-width characters for prompt manipulation
- Emoji sequences with suspicious Variation Selector patterns
- High ratios of invisible formatting characters
The prompt injection scanner uses a baseline system to track known findings and only flag NEW patterns not in the baseline. This solves the problem of false positives from legitimate code and documentation while maintaining protection against malicious prompt injection attacks.
- Baseline File:
.prompt_injections.baselinestores known findings - Fingerprinting: Each finding gets a unique SHA256 hash fingerprint
- Comparison: Scanner checks if each finding is in the baseline
- Exit Codes:
0— No NEW findings (all findings in baseline)1— NEW findings detected (not in baseline)2— Error occurred
First run — Create baseline:
uv run python .security/check_prompt_injections.py --update-baseline src/ tests/ *.md *.yml *.yaml *.jsonNormal run — Check against baseline:
uv run python .security/check_prompt_injections.py --baseline src/ tests/ *.yml *.yaml *.jsonUpdate baseline to include new legitimate findings:
uv run python .security/check_prompt_injections.py --update-baseline src/ tests/ *.md *.yml *.yaml *.jsonForce new baseline (overwrite existing):
uv run python .security/check_prompt_injections.py --force-baseline src/ tests/ *.md *.yml *.yaml *.json| Option | Purpose |
|---|---|
--baseline |
Use existing baseline (only NEW findings fail) |
--update-baseline |
Add new findings to baseline |
--force-baseline |
Create new baseline (overwrite existing) |
--verbose, -v |
Show detailed output with full matches |
--quiet, -q |
Only show summary (suppress individual findings) |
DO Update Baseline When:
- New legitimate code is flagged (variable names, class names, documentation)
- Approved refactoring changes line numbers
- Baseline is outdated after a code restructure
DON'T Update Baseline When:
- Malicious pattern detected (remove the code instead)
- You're unsure (ask for review first)
- Security-related finding (review carefully first)
# Install pre-commit framework and detect-secrets
uv pip install pre-commit detect-secrets
# Install the git hooks
uv run pre-commit install
# Test the hooks (optional)
uv run pre-commit run --all-filesSecret Detection:
# Scan entire codebase
uv run detect-secrets scan
# Scan specific files
uv run detect-secrets scan src/server.py
# Update baseline after reviewing findings
uv run detect-secrets scan --baseline .secrets.baseline
# Audit baseline (review all flagged items)
uv run detect-secrets audit .secrets.baselinePrompt Injection Detection:
# Scan for prompt injection patterns
uv run python .security/check_prompt_injections.py src/ tests/ *.md
# Scan specific directories
uv run python .security/check_prompt_injections.py src/
# Run via pre-commit hook
uv run pre-commit run prompt-injection-check --all-files
# Test with verbose output
uv run python .security/check_prompt_injections.py --verbose src/ tests/- All Python source files (
src/,tests/,deploy/) - Configuration files
- Shell scripts and workflows
- YAML configuration files
audits/*.md— Security audit reports*.md— Documentation files (may contain example keys)package-lock.json— NPM lock file.secrets.baseline— Baseline file itselfstrategic-searches.yaml— Search pattern configuration
If detect-secrets flags a legitimate placeholder:
- Verify it's truly a placeholder (not a real secret)
- Update the baseline to mark it as known:
uv run detect-secrets scan --baseline .secrets.baseline
- Commit the updated baseline:
git add .secrets.baseline git commit -m "Update secrets baseline after review"
If you accidentally committed a real secret:
- Revoke the secret immediately — delete the key at Pinecone Console → API Keys → Delete
- Generate a new API key in the Pinecone Console
- Remove from git history:
# Use BFG Repo Cleaner or git filter-branch # See: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository
- Update stored key:
# Re-run setup to store new key via DPAPI .\deploy\windows_setup.ps1
- ✅ Store secrets using Windows DPAPI (via deployment script)
- ✅ Use environment variables as fallback on Linux/macOS
- ✅ Use placeholder values in example configs
- ✅ Run
pre-commit run --all-filesbefore first commit - ✅ Review baseline updates carefully
- ✅ Check security audit log at
~/.pinecone_assistant/logs/security_audit.log
- ❌ Hardcode API keys in source code (
pcsk_*format) - ❌ Commit
.envfiles - ❌ Use real secrets in tests (use mocks/fixtures)
- ❌ Disable pre-commit hooks without review
- ❌ Ignore secret scanning failures in CI
The workflow runs on:
- All pushes to
main,master, anddevelopbranches - All pull requests to these branches
- Checkout full git history
- Install detect-secrets
- Scan current codebase against baseline
- Scan recent git history (last 100 commits)
- Report findings and fail if secrets detected
- Go to Actions tab in GitHub
- Click on Secret Scanning workflow
- Review any failures in the job logs
# Check what's detected
pre-commit run detect-secrets --all-files
# If false positive, update baseline
uv run detect-secrets scan --baseline .secrets.baseline
# Re-run commit
git commit- Review the GitHub Actions log to see what was flagged
- Verify if it's a real secret or false positive
- If false positive:
- Update baseline locally:
uv run detect-secrets scan --baseline .secrets.baseline - Commit and push the updated baseline
- Update baseline locally:
- If real secret:
- REVOKE THE SECRET IMMEDIATELY at Pinecone Console
- Remove from code and git history
- Fix and re-push
# Regenerate baseline from scratch
uv run detect-secrets scan \
--exclude-files 'audits/.*\.md' \
--exclude-files '\.md$' \
--exclude-files 'strategic-searches\.yaml' \
> .secrets.baseline
# Review and commit
git add .secrets.baseline
git commit -m "Regenerate secrets baseline"This scanning complements the recommendations in SECURITY_GUIDELINES.md:
- Prevents API keys from being committed
- Enforces use of environment variables and DPAPI secure storage
- Provides audit trail for secret management
- Supports incident response procedures
- Detects prompt injection attacks before they reach the codebase
The scanner detects 20+ types of secrets including:
Cloud Provider Keys:
- AWS Access Keys
- Azure Storage Keys
- GCP Service Account Keys
- IBM Cloud IAM Keys
API & Service Tokens:
- GitHub Tokens
- GitLab Tokens
- OpenAI API Keys
- Pinecone API Keys (
pcsk_*) - Stripe API Keys
- Twilio Keys
- SendGrid Keys
- Slack Tokens
- Discord Bot Tokens
- Telegram Bot Tokens
General Secrets:
- Private SSH Keys
- JWT Tokens
- NPM Tokens
- PyPI Tokens
- Basic Auth Credentials
- High-Entropy Strings (Base64/Hex)
- Password Keywords
The scanner is configured to detect Pinecone API keys (format: pcsk_*). Real Pinecone API keys must always be stored via DPAPI (preferred) or as environment variables:
# Linux/macOS (environment variable)
export PINECONE_ASSISTANT_API_KEY=your_actual_key_here
# Windows (DPAPI — use deployment script)
.\deploy\windows_setup.ps1Test files in tests/ may contain placeholder keys for validation testing (e.g., pcsk_test_key_for_testing). These are tracked in .secrets.baseline and are verified to be test-only placeholders, not real credentials.
strategic-searches.yaml is excluded from secret scanning as it contains domain-specific search patterns, not credentials.
- detect-secrets Documentation
- Pre-commit Framework
- GitHub Secret Scanning
- OWASP Secrets Management
- Windows DPAPI Documentation
See SECURITY_GUIDELINES.md for broader security practices or file an issue on GitHub.