|
4 | 4 | <img src="cipherscope.png" alt="CipherScope Logo" width="350" height="350"> |
5 | 5 | </div> |
6 | 6 |
|
7 | | -Fast cryptographic inventory generator that creates Minimal Viable Cryptographic Bill of Materials (MV-CBOM) documents. Scans codebases to identify cryptographic algorithms, certificates, and assess post-quantum cryptography readiness. |
| 7 | +Fast AST-based cryptographic library and algorithm detection tool. Uses Abstract Syntax Tree parsing to precisely identify cryptographic usage in source code and outputs findings in JSONL format. |
8 | 8 |
|
9 | 9 | ## Quick Start |
10 | 10 |
|
11 | 11 | ```bash |
12 | 12 | cargo build --release |
13 | | -./target/release/cipherscope --patterns patterns.toml --progress /path/to/scan [... paths] |
| 13 | +./target/release/cipherscope --progress /path/to/scan [... paths] |
14 | 14 | ``` |
15 | 15 |
|
16 | 16 | ## What It Does |
17 | 17 |
|
18 | | -- **Detects** cryptographic usage across 11 languages |
19 | | -- **Identifies** many cryptographic algorithms (AES, SHA, RSA, ECDSA, ChaCha20, etc.) |
20 | | -- **Outputs** JSON inventory with NIST quantum security levels |
21 | | -- **Runs fast** - GiB/s throughput with parallel scanning |
| 18 | +- **AST-based detection** - Uses tree-sitter parsers for precise source code analysis |
| 19 | +- **Library detection** - Identifies crypto libraries via import/include/using statements |
| 20 | +- **Algorithm detection** - Finds algorithm usage via method names, function calls, and type definitions |
| 21 | +- **Multi-language support** - C, C++, Rust, Python, Java, Go |
| 22 | +- **JSONL output** - Simple one-JSON-object-per-line format for easy processing |
| 23 | +- **Fast parallel scanning** - Efficient processing of large codebases |
22 | 24 |
|
23 | 25 | ## Example Output |
24 | 26 |
|
25 | | -```json |
26 | | -{ |
27 | | - "bomFormat": "MV-CBOM", |
28 | | - "specVersion": "1.0", |
29 | | - "cryptoAssets": [{ |
30 | | - "name": "RSA", |
31 | | - "assetProperties": { |
32 | | - "primitive": "signature", |
33 | | - "parameterSet": {"keySize": 2048}, |
34 | | - "nistQuantumSecurityLevel": 0 |
35 | | - } |
36 | | - }] |
37 | | -} |
| 27 | +```jsonl |
| 28 | +{"language":"C","library":"OpenSSL","symbol":"<openssl/evp.h>","file":"src/main.c","line":1,"column":10,"snippet":"<openssl/evp.h>","detector":"ast-detector-c"} |
| 29 | +{"language":"Python","library":"cryptography","symbol":"cryptography.hazmat.primitives.ciphers","file":"app.py","line":1,"column":6,"snippet":"cryptography.hazmat.primitives.ciphers","detector":"ast-detector-python"} |
| 30 | +{"language":"Rust","library":"ring","symbol":"ring::aead","file":"main.rs","line":1,"column":5,"snippet":"ring::aead","detector":"ast-detector-rust"} |
38 | 31 | ``` |
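
Because each line is a standalone JSON object, the output is easy to post-process with a short script. The sketch below is a hypothetical consumer, not part of CipherScope. It assumes the field names shown in the example output (`library`, `file`, `line`), and the helper name `group_by_library` is made up for illustration.

```python
import json
from collections import defaultdict


def group_by_library(jsonl_text):
    """Group findings by their "library" field.

    Assumes each non-empty line is one JSON object carrying the fields
    shown in the example output ("library", "file", "line").
    """
    grouped = defaultdict(list)
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        finding = json.loads(raw)
        grouped[finding["library"]].append((finding["file"], finding["line"]))
    return dict(grouped)


# Two records based on the example output, trimmed to the fields used here.
sample = "\n".join([
    '{"language":"C","library":"OpenSSL","file":"src/main.c","line":1}',
    '{"language":"Rust","library":"ring","file":"main.rs","line":1}',
])

for library, locations in sorted(group_by_library(sample).items()):
    print(library, locations)
```

In practice the text would come from the file passed to `--output` or from stdout; the in-memory `sample` only keeps the sketch self-contained.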
39 | 32 |
|
40 | 33 | ## Options |
41 | 34 |
|
42 | 35 | ### Core Options |
43 | | -- `--patterns PATH` - Custom patterns file (default: `patterns.toml`) |
44 | 36 | - `--progress` - Show progress bar during scanning |
45 | | -- `--deterministic` - Reproducible output for testing/ground-truth generation |
46 | | -- `--output FILE` - Output file for single-project CBOM (default: stdout) |
47 | | -- `--recursive` - Generate MV-CBOMs for all discovered projects |
48 | | -- `--output-dir DIR` - Output directory for recursive CBOMs |
| 37 | +- `--deterministic` - Reproducible output for testing |
| 38 | +- `--output FILE` - Output file for JSONL results (default: stdout) |
49 | 39 |
|
50 | 40 | ### Filtering & Performance |
51 | 41 | - `--threads N` - Number of processing threads |
52 | 42 | - `--max-file-size MB` - Maximum file size to scan (default: 2MB) |
53 | 43 | - `--include-glob GLOB` - Include files matching glob pattern(s) |
54 | 44 | - `--exclude-glob GLOB` - Exclude files matching glob pattern(s) |
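
One plausible way these filters could combine is sketched below. This is an illustration under stated assumptions, not CipherScope's implementation: it assumes exclusion wins over inclusion, that 1 MB means 1024 * 1024 bytes, and that globs behave like Python's `fnmatch` (where `*` also matches `/`). The function name `should_scan` is hypothetical.

```python
from fnmatch import fnmatch

MAX_FILE_SIZE_MB = 2  # mirrors the documented default for --max-file-size


def should_scan(path, size_bytes, include_globs=(), exclude_globs=(),
                max_mb=MAX_FILE_SIZE_MB):
    """Return True if a file passes the size and glob filters (assumed semantics)."""
    if size_bytes > max_mb * 1024 * 1024:
        return False  # too large to scan
    if any(fnmatch(path, g) for g in exclude_globs):
        return False  # assumption: exclusion takes precedence over inclusion
    if include_globs and not any(fnmatch(path, g) for g in include_globs):
        return False  # include globs were given and none matched
    return True


print(should_scan("src/main.c", 1024, include_globs=["*.c"]))
print(should_scan("vendor/lib.c", 1024, include_globs=["*.c"], exclude_globs=["vendor/*"]))
```

The actual tool may resolve overlapping include and exclude patterns differently, so check its behavior before relying on this ordering.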
55 | 45 |
|
56 | | -### Certificate Scanning |
57 | | -- `--skip-certificates` - Skip certificate scanning during CBOM generation |
58 | | - |
59 | | -### Configuration |
60 | | -- `--print-config` - Print merged patterns/config and exit |
61 | | - |
62 | 46 | ## Languages Supported |
63 | 47 |
|
64 | | -C, C++, Go, Java, Kotlin, Python, Rust, Swift, Objective-C, PHP, Erlang |
65 | | - |
66 | | -## Configuration |
67 | | - |
68 | | -Edit `patterns.toml` to add new libraries or algorithms. No code changes needed. |
| 48 | +C, C++, Go, Java, Python, Rust (AST-based detection) |
69 | 49 |
|
70 | 50 | ## How It Works (High-Level) |
71 | 51 |
|
72 | | -1. Workspace discovery and prefilter |
73 | | - - Walks files respecting .gitignore |
74 | | - - Cheap Aho-Corasick prefilter using language-specific substrings derived from patterns |
75 | | -2. Language detection and comment stripping |
76 | | - - Detects language by extension; strips comments once for fast regex matching |
77 | | -3. Library identification (anchors) |
78 | | - - Per-language detector loads compiled patterns for that language (from `patterns.toml`) |
79 | | - - Looks for include/import/namespace/API anchors to confirm a library is present in a file |
80 | | -4. Algorithm matching |
81 | | - - For each identified library, matches algorithm `symbol_patterns` (regex) against the file |
82 | | - - Extracts parameters via `parameter_patterns` (e.g., key size, curve) with defaults when absent |
83 | | - - Emits findings with file, line/column, library, algorithm, primitive, and NIST quantum level |
84 | | -5. Deep static analysis (fallback/enrichment) |
85 | | - - For small scans, analyzes files directly with the registry to find additional algorithms even if no library finding was produced |
86 | | -6. CBOM generation |
87 | | - - Findings are deduplicated and merged |
88 | | - - Final MV-CBOM JSON is printed or written per CLI options |
89 | | - |
90 | | -All behavior is driven by `patterns.toml` — adding new libraries/algorithms is a data-only change. |
| 52 | +1. **File Discovery** - Walks files, respecting .gitignore, and detects each file's language |
| 53 | +2. **AST Parsing** - Uses tree-sitter parsers to build Abstract Syntax Trees for each supported language |
| 54 | +3. **Pattern Matching** - Executes tree-sitter queries to find: |
| 55 | + - **Library imports** - `#include`, `import`, `use` statements for crypto libraries |
| 56 | + - **Algorithm usage** - Function calls, method invocations, type references |
| 57 | +4. **Result Emission** - Outputs findings as JSONL with precise location information |
| 58 | + |
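| 59 | +Because it matches syntax-tree nodes rather than raw text, the AST-based approach detects usage more accurately than regex patterns: it follows the actual structure of the code, so text that merely resembles an import or call is not reported. |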
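
The difference between structural and textual matching can be seen in a small sketch. It uses Python's standard-library `ast` module as a stand-in for tree-sitter, so it is an analogy rather than CipherScope's detector. The watch list `CRYPTO_MODULES` and both helper functions are hypothetical, and the regex is deliberately naive.

```python
import ast
import re

# Hypothetical watch list for illustration; not CipherScope's actual library list.
CRYPTO_MODULES = ("cryptography", "hashlib")

SOURCE = '''
# import hashlib  (commented out, should not count)
note = "import cryptography"
from cryptography.hazmat.primitives.ciphers import Cipher
'''


def is_crypto(module):
    """True if the module is a watched package or one of its submodules."""
    return any(module == m or module.startswith(m + ".") for m in CRYPTO_MODULES)


def ast_imports(source):
    """Return (module, line) for each real import statement of a watched module."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        found.extend((name, node.lineno) for name in names if is_crypto(name))
    return found


def regex_imports(source):
    """Naive text search: also matches inside comments and string literals."""
    pattern = re.compile(r"import\s+(hashlib|cryptography)")
    return [m.group(1) for m in pattern.finditer(source)]


print("AST:  ", ast_imports(SOURCE))
print("regex:", regex_imports(SOURCE))
```

The AST walk reports only the genuine `from ... import` statement, while the naive regex reports the comment and the string literal and misses the real import. A more careful regex would do better, but it would still be working on text rather than on parsed structure.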
91 | 60 |
|
92 | 61 | ## Testing |
93 | 62 |
|
|