Commit 2dedae1

Merge pull request #11 from script3r/feature/const
2 parents b38fe99 + 6f9fab4 commit 2dedae1

32 files changed: +939 -697 lines changed

DESIGN.md

Lines changed: 60 additions & 0 deletions
# Cipherscope Architecture

## Overview
Cipherscope is a static analysis scanner designed to build a cryptographic inventory. It parses source files using Tree-sitter, matches library anchors and algorithm symbols, and emits JSONL findings that can be aggregated into an inventory.

## Pipeline
```mermaid
flowchart TD
    A[Discovery] --> B[Parsing]
    B --> C[Library Anchoring]
    C --> D[Algorithm Detection]
    D --> E[JSONL Output]

    A --> A1[File walk + filters]
    B --> B1[Tree-sitter AST]
    C --> C1[Import/include anchors]
    D --> D1[Symbol match + params]
    D1 --> D2[Local constant resolution]
    E --> E1[Library + algorithm assets]
```

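
The anchoring and detection stages above can be illustrated with a minimal sketch. It assumes plain regexes over raw text as a stand-in for the Tree-sitter AST, and the names `ANCHORS`, `ALGORITHMS`, and `scan_source` are hypothetical rather than taken from the Cipherscope code base.

```python
import json
import re

# Hypothetical patterns for illustration only. The real patterns live in
# patterns.toml and are matched against a Tree-sitter AST, not raw text.
ANCHORS = {"OpenSSL": re.compile(r"#include\s*<openssl/[\w./]+>")}
ALGORITHMS = {
    "SHA-256": ("hash", re.compile(r"\bEVP_sha256\b")),
    "MD5": ("hash", re.compile(r"\bEVP_md5\b")),
}


def scan_source(path, text):
    """Return findings for one file: library anchors first, then algorithms."""
    findings = []
    lines = text.splitlines()
    # Library anchoring: look for import/include evidence.
    for name, pattern in ANCHORS.items():
        for lineno, line in enumerate(lines, start=1):
            match = pattern.search(line)
            if match:
                findings.append({
                    "assetType": "library",
                    "identifier": name,
                    "path": path,
                    "evidence": {"line": lineno, "column": match.start() + 1},
                })
                break  # one anchor hit per library is enough for this sketch
    if not findings:
        return []  # no anchor found, so the algorithm search is skipped
    # Algorithm detection: only runs on files that have an anchor.
    for name, (primitive, pattern) in ALGORITHMS.items():
        for lineno, line in enumerate(lines, start=1):
            for match in pattern.finditer(line):
                findings.append({
                    "assetType": "algorithm",
                    "identifier": name,
                    "path": path,
                    "evidence": {"line": lineno, "column": match.start() + 1},
                    "metadata": {"primitive": primitive},
                })
    return findings


SAMPLE = "#include <openssl/evp.h>\nconst EVP_MD *md = EVP_sha256();\n"
for finding in scan_source("demo.c", SAMPLE):
    print(json.dumps(finding, sort_keys=True))  # one JSON object per line
```

Gating the algorithm search on an anchor hit keeps unrelated files out of the inventory, at the cost of missing usage in files that reach the library indirectly.
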
## Data Model
- Library hit: name, file path, evidence location.
- Algorithm hit: name, file path, evidence location, metadata (e.g., key size, primitive).
- Output format is designed for tooling pipelines and inventory aggregation.

### JSONL Schema (Informal)
```json
{
  "assetType": "library|algorithm",
  "identifier": "string",
  "path": "string",
  "evidence": {
    "line": 1,
    "column": 1
  },
  "metadata": {
    "primitive": "string",
    "keySize": 256
  }
}
```

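
As a rough illustration of how records in this shape might be aggregated downstream, the sketch below counts findings per asset type, identifier, and key size. The sample records, their paths, and the `build_inventory` helper are made up for the example. `metadata` is treated as optional because the library records in the fixtures carry none.

```python
import json
from collections import Counter

# Made-up sample records that follow the informal schema above.
SAMPLE_JSONL = """\
{"assetType": "library", "identifier": "OpenSSL", "path": "src/a.c", "evidence": {"line": 19, "column": 1}}
{"assetType": "algorithm", "identifier": "AES-GCM", "path": "src/a.c", "evidence": {"line": 35, "column": 5}, "metadata": {"primitive": "symmetric", "keySize": 128}}
{"assetType": "algorithm", "identifier": "AES-GCM", "path": "src/b.c", "evidence": {"line": 40, "column": 5}, "metadata": {"primitive": "symmetric", "keySize": 256}}
"""


def build_inventory(jsonl_text):
    """Count findings per (assetType, identifier, keySize)."""
    inventory = Counter()
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        metadata = record.get("metadata", {})  # absent on library records
        key = (record["assetType"], record["identifier"], metadata.get("keySize"))
        inventory[key] += 1
    return inventory


inventory = build_inventory(SAMPLE_JSONL)
# Sort on the string form because keySize may be None for some records.
for (asset_type, identifier, key_size), count in sorted(inventory.items(), key=str):
    print(asset_type, identifier, key_size, count)
```
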
## Dedupe Policy
To reduce overcounting on a single callsite, Cipherscope applies a simple same-line dedupe rule after matching:
- If two algorithms share the same `primitive` and line, drop the generic identifier when a more specific variant is present.
- A more specific identifier is one that either:
  - starts with the generic identifier plus a `-` (e.g., `AES-GCM` over `AES`), or
  - shares the same non-numeric tokens but adds numeric detail (e.g., `ECDSA-P256` over `ECDSA`).
- Different primitives on the same line are kept.

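
One possible reading of the rule above, written as a small sketch. The flat record shape (`identifier`, `primitive`, `line`) is a simplification of the JSONL schema, and the tokenization is an assumption: identifiers are split on non-alphanumeric characters, and any token containing a digit counts as numeric detail. The helper names are hypothetical.

```python
import re


def _split_tokens(identifier):
    """Split on non-alphanumerics; tokens with a digit count as numeric detail."""
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", identifier) if t]
    plain = [t for t in tokens if not any(c.isdigit() for c in t)]
    numeric = [t for t in tokens if any(c.isdigit() for c in t)]
    return plain, numeric


def is_more_specific(candidate, generic):
    """True if `candidate` refines `generic` under either rule."""
    if candidate == generic:
        return False
    # Rule 1: generic identifier plus "-" as a prefix.
    if candidate.startswith(generic + "-"):
        return True
    # Rule 2 (assumed interpretation): same non-numeric tokens, more numeric ones.
    cand_plain, cand_numeric = _split_tokens(candidate)
    gen_plain, gen_numeric = _split_tokens(generic)
    return cand_plain == gen_plain and len(cand_numeric) > len(gen_numeric)


def dedupe(findings):
    """Drop a finding when a more specific one shares its line and primitive."""
    kept = []
    for f in findings:
        shadowed = any(
            g is not f
            and g["line"] == f["line"]
            and g["primitive"] == f["primitive"]
            and is_more_specific(g["identifier"], f["identifier"])
            for g in findings
        )
        if not shadowed:
            kept.append(f)
    return kept


findings = [
    {"identifier": "AES", "primitive": "symmetric", "line": 29},
    {"identifier": "AES-CBC", "primitive": "symmetric", "line": 29},
    {"identifier": "3DES", "primitive": "symmetric", "line": 45},
    {"identifier": "DES", "primitive": "symmetric", "line": 45},
    {"identifier": "HKDF", "primitive": "kdf", "line": 240},
    {"identifier": "SHA-256", "primitive": "hash", "line": 240},
]
print([f["identifier"] for f in dedupe(findings)])
```

Under this reading, `AES` on line 29 is dropped in favor of `AES-CBC`, while `3DES` and `DES` on line 45 both survive because neither refines the other. That matches the updated OpenSSL fixture further down in this commit.
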
## Patterns and Extensibility
Patterns live in `patterns.toml`:
- Libraries define anchors and API regexes.
- Algorithms define symbol patterns and parameter extraction rules.
Adding a new library or algorithm usually only requires editing `patterns.toml`.

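
To make "symbol patterns and parameter extraction rules" concrete, here is a hedged sketch of what one algorithm entry might look like once loaded, and how a parameter such as the key size could be pulled from a match. The dictionary layout, the field names, and `match_algorithm` are assumptions for illustration and are not the actual `patterns.toml` schema. `EVP_aes_256_gcm` is a real OpenSSL symbol used only as sample input.

```python
import re

# Hypothetical entry, shaped like what one algorithm definition might parse to.
# The field names are illustrative and not taken from patterns.toml.
AES_GCM = {
    "identifier": "AES-GCM",
    "primitive": "symmetric",
    "symbol": r"\bEVP_aes_(?P<keySize>128|192|256)_gcm\b",
}


def match_algorithm(entry, line):
    """Return metadata for the first symbol match on `line`, or None."""
    match = re.search(entry["symbol"], line)
    if match is None:
        return None
    metadata = {"primitive": entry["primitive"]}
    # Named groups play the role of parameter extraction rules in this sketch.
    for name, value in match.groupdict().items():
        if value is not None:
            metadata[name] = int(value) if value.isdigit() else value
    return metadata


print(match_algorithm(AES_GCM, "cipher = EVP_aes_256_gcm();"))
```

Keeping the extraction rule inside the pattern itself means a new algorithm can be described purely as data, which is the property the section above relies on.
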
## Scope and Limits
- Inventory-first: it focuses on discovering crypto usage and relevant metadata.
- Local constant resolution only; cross-file or full data-flow analysis is out of scope for now.
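
A minimal sketch of what local constant resolution could mean in practice, assuming C-like source and regexes in place of the Tree-sitter AST. The `generate_rsa_key` call, the regexes, and the function names are hypothetical. A name that is not defined in the same file resolves to `None`, which mirrors the cross-file limit stated above.

```python
import re

# Simple same-file integer constants, e.g. `static const int KEY_BITS = 2048;`.
CONST_DECL = re.compile(r"\b(?:const|constexpr)\s+\w+\s+(\w+)\s*=\s*(\d+)\s*;")
# Hypothetical key-generation call taking the key size as its only argument.
CALL = re.compile(r"\bgenerate_rsa_key\(\s*(\w+)\s*\)")


def collect_constants(source):
    """Map constant names declared in this file to their integer values."""
    return {name: int(value) for name, value in CONST_DECL.findall(source)}


def resolve_key_size(source):
    """Resolve the call argument to an integer using only this file."""
    match = CALL.search(source)
    if match is None:
        return None
    arg = match.group(1)
    if arg.isdigit():
        return int(arg)  # literal argument, nothing to resolve
    # Names defined in another file stay unresolved (None).
    return collect_constants(source).get(arg)


SAMPLE = "static const int KEY_BITS = 2048;\nkey = generate_rsa_key(KEY_BITS);\n"
print(resolve_key_size(SAMPLE))
```
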

README.md

Lines changed: 3 additions & 1 deletion
@@ -6,12 +6,13 @@

 [![CI](https://github.com/script3r/cipherscope/actions/workflows/ci.yml/badge.svg)](https://github.com/script3r/cipherscope/actions/workflows/ci.yml)

-`cipherscope` is a high-performance, command-line tool for scanning source code to detect the usage of cryptographic libraries and algorithms. It uses language-aware static analysis powered by [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) for high precision.
+`cipherscope` is a high-performance, command-line tool for scanning source code to detect the usage of cryptographic libraries and algorithms. The goal is to enable building an efficient, comprehensive cryptographic inventory. It uses language-aware static analysis powered by [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) for high precision.

 ## Key Features

 - **High Performance**: Parallelized scanning of large codebases.
 - **Language-Aware**: Uses Tree-sitter parsers to reduce false positives by understanding code structure.
+- **Inventory-First**: Focused on assembling a reliable crypto usage inventory across large repos.
 - **Extensible Patterns**: Easily add new libraries and algorithms via a simple TOML configuration.
 - **Broad Language Support**: Currently supports C, C++, Java, Python, Go, Swift, PHP, Objective-C, and Rust.
 - **Developer Friendly**: JSONL output for easy integration with CI/CD pipelines and security tools.
@@ -29,6 +30,7 @@
 c. **Algorithm Detection**: If an anchor is found, the scanner performs a deeper search within that file for specific algorithm usage patterns, such as function calls and constants.

 All results are streamed as JSONL to the output, allowing for real-time monitoring and processing.
+For a deeper architecture overview, see `DESIGN.md`.

 ## Installation

fixtures/cpp/libsodium_comprehensive/expected.jsonl

Lines changed: 49 additions & 49 deletions
Large diffs are not rendered by default.

fixtures/cpp/mbedtls_comprehensive/expected.jsonl

Lines changed: 97 additions & 104 deletions
Large diffs are not rendered by default.
Lines changed: 30 additions & 32 deletions
@@ -1,42 +1,40 @@
-{"assetType": "library", "evidence": {"column": 1, "line": 19}, "identifier": "OpenSSL", "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 12, "line": 137}, "identifier": "DSA", "metadata": {"primitive": "signature"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 12, "line": 72}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 12, "line": 80}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 12, "line": 88}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 26, "line": 238}, "identifier": "HKDF", "metadata": {"primitive": "kdf"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 30, "line": 227}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 106}, "identifier": "ECDSA-P384", "metadata": {"primitive": "signature"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 114}, "identifier": "ECDSA-P521", "metadata": {"primitive": "signature"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 122}, "identifier": "DH", "metadata": {"keySize": 2048, "primitive": "keyexchange"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 158}, "identifier": "SHA-1", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 163}, "identifier": "SHA-224", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 168}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 173}, "identifier": "SHA-384", "metadata": {"primitive": "hash"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 178}, "identifier": "SHA-512", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 51}, "identifier": "ChaCha20", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 59}, "identifier": "Blowfish", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 35}, "identifier": "AES-GCM", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 40}, "identifier": "AES-GCM", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 29}, "identifier": "AES-CBC", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 32}, "identifier": "AES-CBC", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 183}, "identifier": "SHA3-224", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 188}, "identifier": "SHA3-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 193}, "identifier": "SHA3-384", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 198}, "identifier": "SHA3-512", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 203}, "identifier": "BLAKE2b", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 208}, "identifier": "BLAKE2s", "metadata": {"primitive": "hash"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 213}, "identifier": "MD5", "metadata": {"primitive": "hash"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 226}, "identifier": "PBKDF2", "metadata": {"iterations": 10000, "primitive": "kdf"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 158}, "identifier": "SHA-1", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 163}, "identifier": "SHA-224", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 12, "line": 137}, "identifier": "DSA", "metadata": {"primitive": "signature"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 29}, "identifier": "AES", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 32}, "identifier": "AES", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 226}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 234}, "identifier": "Scrypt", "metadata": {"N": 16384, "primitive": "kdf"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 26, "line": 238}, "identifier": "HKDF", "metadata": {"primitive": "kdf"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 239}, "identifier": "HKDF", "metadata": {"primitive": "kdf"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 240}, "identifier": "HKDF", "metadata": {"primitive": "kdf"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 173}, "identifier": "SHA-384", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 106}, "identifier": "ECDSA-P384", "metadata": {"primitive": "signature"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 62}, "identifier": "RC4", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 198}, "identifier": "SHA3-512", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 183}, "identifier": "SHA3-224", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 168}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 226}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 30, "line": 227}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 240}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 98}, "identifier": "ECDSA-P256", "metadata": {"primitive": "signature"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 188}, "identifier": "SHA3-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 203}, "identifier": "BLAKE2b", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 12, "line": 72}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 12, "line": 80}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 12, "line": 88}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 29}, "identifier": "AES-CBC", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 32}, "identifier": "AES-CBC", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 35}, "identifier": "AES-GCM", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 40}, "identifier": "AES-GCM", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 45}, "identifier": "3DES", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 122}, "identifier": "DH", "metadata": {"keySize": 2048, "primitive": "keyexchange"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 213}, "identifier": "MD5", "metadata": {"primitive": "hash"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 114}, "identifier": "ECDSA-P521", "metadata": {"primitive": "signature"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 54}, "identifier": "ChaCha20-Poly1305", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
-{"assetType": "algorithm", "evidence": {"column": 5, "line": 208}, "identifier": "BLAKE2s", "metadata": {"primitive": "hash"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 45}, "identifier": "DES", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 5, "line": 48}, "identifier": "DES", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 51}, "identifier": "ChaCha20", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 54}, "identifier": "ChaCha20-Poly1305", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 59}, "identifier": "Blowfish", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 62}, "identifier": "RC4", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "algorithm", "evidence": {"column": 5, "line": 98}, "identifier": "ECDSA-P256", "metadata": {"primitive": "signature"}, "path": "FIXME"}
+{"assetType": "library", "evidence": {"column": 1, "line": 19}, "identifier": "OpenSSL", "path": "FIXME"}
Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
-{"assetType": "library", "evidence": {"column": 1, "line": 5}, "identifier": "Google Tink (C++)", "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 33, "line": 25}, "identifier": "AES-GCM", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
 {"assetType": "algorithm", "evidence": {"column": 9, "line": 26}, "identifier": "AES-GCM", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
+{"assetType": "library", "evidence": {"column": 1, "line": 5}, "identifier": "Google Tink (C++)", "path": "FIXME"}
