Skip to content

Latest commit

 

History

History
103 lines (70 loc) · 2.74 KB

File metadata and controls

103 lines (70 loc) · 2.74 KB

Semgrep Integration

Scanipy can automatically clone and scan the top 10 repositories with Semgrep for security vulnerabilities.

!!! note "Command Usage" If installed via pip install scanipy-cli, use scanipy command. If running from source, use python scanipy.py instead.

Prerequisites

Install Semgrep before using this feature:

pip install semgrep

Basic Usage

# Run with default Semgrep rules
scanipy --query "extractall" --run-semgrep

Custom Rules

# Use custom Semgrep rules
scanipy --query "extractall" --run-semgrep --rules ./my_rules.yaml

# Use built-in tarslip rules
scanipy --query "extractall" --run-semgrep --rules ./tools/semgrep/rules/tarslip.yaml

Semgrep Pro

If you have a Semgrep Pro license:

scanipy --query "extractall" --run-semgrep --pro

Additional Semgrep Arguments

Pass additional arguments directly to Semgrep:

scanipy --query "extractall" --run-semgrep --semgrep-args "--severity ERROR --json"

Managing Cloned Repositories

# Keep cloned repositories after analysis
scanipy --query "extractall" --run-semgrep --keep-cloned

# Specify a custom clone directory
scanipy --query "extractall" --run-semgrep --clone-dir ./repos

# Combine both
scanipy --query "extractall" --run-semgrep --keep-cloned --clone-dir ./repos

Resuming Interrupted Analysis

When running Semgrep analysis on many repositories, you can use --results-db to save progress to a SQLite database. If the analysis is interrupted (Ctrl+C, network error, etc.), simply re-run the same command to resume from where you left off:

# Start analysis with database persistence
scanipy --query "extractall" --run-semgrep --results-db ./results.db

# If interrupted, just run the same command again - already analyzed repos will be skipped
scanipy --query "extractall" --run-semgrep --results-db ./results.db
# Output: "📂 Resuming session 1 - 5 repos already analyzed"

The database stores:

  • Analysis sessions (query, timestamp, rules used)
  • Results for each repository (success/failure, Semgrep output)

Built-in Rules

Scanipy includes built-in security rules:

Tarslip Rules

Detect path traversal vulnerabilities in archive extraction:

scanipy --query "extractall" --run-semgrep --rules ./tools/semgrep/rules/tarslip.yaml

Semgrep Options Reference

Option Description
--run-semgrep Enable Semgrep analysis
--rules Path to custom Semgrep rules
--pro Use Semgrep Pro
--semgrep-args Additional Semgrep CLI arguments
--clone-dir Directory for cloned repos
--keep-cloned Keep repos after analysis
--results-db SQLite database for saving/resuming