Commit 35e23ee

Make a first pass at some structural introduction docs (#4076)

As I was wrapping my head around the project, it wasn't obvious to me what its high-level structures were (for example, what are the most important words?). These two docs try to provide what I probably would have appreciated when I started digging in:

* docs/process_flow.md - Gives a high-level overview of how trufflehog ingests and processes sources
* docs/concurrency.md - Tries to give a big-picture overview of the concurrency structure that trufflehog uses, including the primary channels

Also added a couple of quick links from likely jumping-off points in existing docs/code.
1 parent 1637f5d commit 35e23ee

4 files changed: +190 -0 lines changed

CONTRIBUTING.md

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@ Contributors need to [sign our CLA](https://cla-assistant.io/trufflesecurity/tru

# Resources

## How things work

It can be a bit daunting to dive into the code and wrap your head around the project from a high level. The following two docs give that high-level overview:
* [Process Flow](docs/process_flow.md)
* [Concurrency Overview](docs/concurrency.md)

## Adding new secret detectors

We have published some [documentation and tooling to get started on adding new secret detectors](hack/docs/Adding_Detectors_external.md). Let's improve detection together!

docs/concurrency.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
## Concurrency

```mermaid
sequenceDiagram
    %% Set up the workers
    participant Main
    Note over Main: e.startWorkers()<br />kicks off some number<br />of threads per worker type
    create participant ScannerWorkers
    Main->>ScannerWorkers: e.startScannerWorkers()
    Note over ScannerWorkers: ScannerWorkers are primarily<br />responsible for enumerating<br />and chunking a source
    create participant VerificationOverlapWorkers
    Main->>VerificationOverlapWorkers: e.startVerificationOverlapWorkers()
    Note over VerificationOverlapWorkers: VerificationOverlapWorkers<br />handle chunks<br />matched to multiple<br />detectors
    create participant DetectorWorkers
    Main->>DetectorWorkers: e.startDetectorWorkers()
    Note over DetectorWorkers: DetectorWorkers are primarily<br />responsible for running<br />detectors on chunks
    create participant NotifierWorkers
    Main->>NotifierWorkers: e.startNotifierWorkers()
    Note over NotifierWorkers: Primarily responsible for reporting<br />results (typically to the cmd line)

    %% Set up the parallelism
    par
        Note over Main,ScannerWorkers: Depending on the type of<br />scan requested, calls one of<br />engine.(ScanGit|ScanGitHub|ScanFileSystem|etc)
        Main->>ScannerWorkers: e.ChunksChan()<br /><- chunk
    and
        Note over ScannerWorkers: Decode chunks and find matching detectors
        ScannerWorkers->>DetectorWorkers: e.detectableChunksChan<br /><- detectableChunk
        Note over ScannerWorkers: When multiple detectors match on the<br />same chunk we have to decide _which_<br />detector will verify found secrets
        ScannerWorkers->>VerificationOverlapWorkers: e.verificationOverlapChunksChan<br /><- verificationOverlapChunk
    and
        Note over VerificationOverlapWorkers: Decide which detectors to run on that chunk
        VerificationOverlapWorkers->>DetectorWorkers: e.detectableChunksChan<br /><- detectableChunk
    and
        Note over DetectorWorkers: Run detection (finding secrets),<br />optionally verify them,<br />and do filtering and enrichment
        DetectorWorkers->>NotifierWorkers: e.ResultsChan()|e.results<br /><- detectors.ResultWithMetadata
    and
        Note over NotifierWorkers: Write results to output
    end
```
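
The worker-and-channel shape in the diagram can be sketched in a few lines of Go. This is only an illustration of the pattern, not the engine's code: `chunk`, `result`, `startWorkers`, and `runPipeline` are hypothetical names, the verification-overlap stage is left out, and "detection" is reduced to a substring check.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// chunk and result are hypothetical stand-ins for the engine's real types.
type chunk struct{ data string }
type result struct{ detector, secret string }

// startWorkers launches n goroutines running work and returns a function
// that blocks until all of them have returned.
func startWorkers(n int, work func()) (wait func()) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			work()
		}()
	}
	return wg.Wait
}

// runPipeline pushes inputs through two worker stages connected by channels
// and returns everything the final stage collected.
func runPipeline(inputs []string, workers int) []result {
	chunks := make(chan chunk)
	results := make(chan result)

	// Detector stage: read chunks, emit a result for each apparent secret.
	waitDetectors := startWorkers(workers, func() {
		for c := range chunks {
			if strings.Contains(c.data, "secret=") {
				results <- result{detector: "demo", secret: c.data}
			}
		}
	})

	// Notifier stage: a single goroutine collects the results.
	var found []result
	waitNotifier := startWorkers(1, func() {
		for r := range results {
			found = append(found, r)
		}
	})

	// Scanner stage (here, the caller's goroutine) produces the chunks.
	for _, in := range inputs {
		chunks <- chunk{data: in}
	}

	// Shut down stage by stage: close a channel only after its writers are done.
	close(chunks)
	waitDetectors()
	close(results)
	waitNotifier()
	return found
}

func main() {
	found := runPipeline([]string{"hello", "secret=abc", "secret=xyz"}, 4)
	fmt.Println(len(found), "results")
}
```

The point to notice is the shutdown order: each channel is closed only once every goroutine writing to it has finished, so downstream workers can simply `range` over their input until it drains.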

docs/process_flow.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
# TruffleHog Process Flows

## Scans

## Data Flow

```mermaid
flowchart LR
    SourceDecomposition["`**Source Decomposition**

    Breaking up the locations we are searching _for_ secrets into small chunks`"]

    DetectorMatching{Chunk<br/>to<br/>Detector<br/>Matching}

    SecretDetection["`**Secret Detection**

    Finding secrets in these chunks and (optionally) verifying whether they are live`"]

    ResultNotification["`**Result Notification**

    Enriching results with metadata and (usually) printing to console`"]

    SourceDecomposition -- chunks --> DetectorMatching
    DetectorMatching -- matched chunks --> SecretDetection
    SecretDetection -- results --> ResultNotification
```

#### Source Decomposition

```mermaid
flowchart TD
    subgraph Source
        direction TB
        SourceDescription("`**(1)** Sources are top-level places where we find data/files/text to _scan_`")
        GitSource["git Source"]
        GitHubSource["GitHub Source"]
        FilesystemSource["File System Source"]
        PostmanSource["Postman Source"]
    end

    subgraph Unit
        direction TB
        UnitDescription("`**(2)** Units are natural subdivisions of Sources, but still quite large`")
        FilesystemUnit[Directory]
        GitUnit[Git Repository]
    end

    subgraph Chunk
        direction TB
        ChunkDescription("`**(3)** Chunks are the smallest units that we decompose our sources into, and are subsequently passed on to detection`")
        FilesystemChunk[file contents]
        GitRepositoryChunk["`git log diff hunks`"]
        PostmanChunk[data chunk]
    end

    SourceDescription -- decomposed into --> UnitDescription
    UnitDescription -- further decomposed into --> ChunkDescription

    GitSource -- cloned locally<br />if not already local --> GitUnit
    GitHubSource -- cloned locally --> GitUnit
    PostmanSource -- Most sources<br />don't use units --> PostmanChunk
    FilesystemSource --> FilesystemUnit

    GitUnit -- git log -p --> GitRepositoryChunk
    FilesystemUnit --> FilesystemChunk

    style SourceDescription fill:#89553e
    style UnitDescription fill:#89553e
    style ChunkDescription fill:#89553e
```
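
To make the Source, Unit, Chunk hierarchy a little more concrete, here is a rough Go sketch. The `Unit` and `Chunk` types and both functions are hypothetical, and the fixed-size split is an assumption made for illustration; the diagram only says that chunks are things like file contents or diff hunks.

```go
package main

import "fmt"

// Chunk is a small piece of data handed on to detection (hypothetical type).
type Chunk struct {
	Unit string // name of the unit the data came from
	Data []byte
}

// Unit is a natural subdivision of a source, such as a directory or a
// repository (hypothetical type).
type Unit struct {
	Name     string
	Contents [][]byte // for example, one entry per file
}

// splitIntoChunks cuts data into pieces of at most size bytes.
func splitIntoChunks(unit string, data []byte, size int) []Chunk {
	if size <= 0 {
		size = len(data) // treat a non-positive size as "do not split"
	}
	var chunks []Chunk
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, Chunk{Unit: unit, Data: data[start:end]})
	}
	return chunks
}

// decompose walks every unit of a source and returns all of its chunks.
func decompose(source []Unit, size int) []Chunk {
	var chunks []Chunk
	for _, u := range source {
		for _, content := range u.Contents {
			chunks = append(chunks, splitIntoChunks(u.Name, content, size)...)
		}
	}
	return chunks
}

func main() {
	source := []Unit{{Name: "repo-a", Contents: [][]byte{[]byte("0123456789")}}}
	for _, c := range decompose(source, 4) {
		fmt.Printf("%s: %q\n", c.Unit, c.Data)
	}
}
```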

#### Chunk to Detector Matching

```mermaid
flowchart LR
    KeywordMatching["`**Keyword Matching**
    _(Aho-Corasick)_

    Match chunks to detectors based on the presence of specific keywords in the chunk`"]

    chunks --> KeywordMatching --> detectors
```
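
The idea behind the keyword pre-filter can be sketched as follows. Note the swap: this uses a plain `strings.Contains` scan per keyword instead of Aho-Corasick, which finds all keywords in a single pass over the chunk. The `keywordIndex` type, its functions, and the detector names and keywords are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// keywordIndex maps a lowercase keyword to the names of the detectors that
// registered it (hypothetical type).
type keywordIndex map[string][]string

// buildIndex inverts a detector -> keywords map into a keyword -> detectors map.
func buildIndex(detectors map[string][]string) keywordIndex {
	idx := keywordIndex{}
	for name, keywords := range detectors {
		for _, kw := range keywords {
			kw = strings.ToLower(kw)
			idx[kw] = append(idx[kw], name)
		}
	}
	return idx
}

// match returns the sorted, de-duplicated names of detectors that have at
// least one keyword present in chunk. Matching is case-insensitive.
func (idx keywordIndex) match(chunk string) []string {
	lower := strings.ToLower(chunk)
	seen := map[string]bool{}
	var names []string
	for kw, detectors := range idx {
		if !strings.Contains(lower, kw) {
			continue
		}
		for _, d := range detectors {
			if !seen[d] {
				seen[d] = true
				names = append(names, d)
			}
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	idx := buildIndex(map[string][]string{
		"aws":    {"AKIA"},
		"twilio": {"twilio", "sid"},
	})
	fmt.Println(idx.match("export TWILIO_SID=..."))
	fmt.Println(idx.match("nothing to see"))
}
```

Only the detectors returned by `match` would go on to run their more expensive regexes against the chunk, which is the reason for having this stage at all.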

#### Secret Detection

```mermaid
flowchart LR

    subgraph Detector
        direction RL
        subgraph DetectorDescription[" "]
            DetectorDescriptionText["`Detectors are the bits that actually check for the existence of a secret in a chunk, and (optionally) verify it`"]
            ExampleDetectors["`Example Detectors:
            * AWS
            * Azure
            * Twilio`"]
        end

        subgraph DetectorResponsibility[" "]
            direction LR

            De-Dupe-Detectors["`**De-Dupe-Detectors**

            If multiple detectors keyword-match on the same chunk, we have some logic that chooses which detector will verify the found secret (so we don't duplicate verification requests to external APIs)`"]

            CollectMatches["`**Collect Matches**

            Detector-specific regexes are run against the matched chunks, resulting in unverified secrets`"]
            VerifyMatches["`**Verify Matches**

            Optionally, observed unverified secrets are verified by attempting to use them against live services`"]

            De-Dupe-Detectors -- deduped detectors --> CollectMatches
            CollectMatches -- regex matched chunks --> VerifyMatches
        end

        style DetectorDescription fill:#89553e
        style DetectorDescriptionText fill:#89553e
    end
```

#### Result Notification

```mermaid
flowchart LR

    Dispatcher["`**Dispatcher**

    Results, verified or otherwise, are handed to a dispatcher, which forwards them to wherever
    results are being reported (usually the command line).`"]

    results --> Dispatcher --> output
```
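
One way to picture the dispatcher is as a small interface with swappable implementations. This is a sketch under assumptions: `Dispatcher`, `DispatchFunc`, `printer`, and `notify` are illustrative names, not the project's API.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Result is a finding ready to be reported (hypothetical type).
type Result struct {
	Detector string
	Raw      string
	Verified bool
}

// Dispatcher receives results and sends them wherever they are reported.
type Dispatcher interface {
	Dispatch(r Result) error
}

// DispatchFunc adapts a plain function to the Dispatcher interface.
type DispatchFunc func(Result) error

func (f DispatchFunc) Dispatch(r Result) error { return f(r) }

// printer writes one line per result to out.
type printer struct{ out io.Writer }

func (p printer) Dispatch(r Result) error {
	status := "unverified"
	if r.Verified {
		status = "verified"
	}
	_, err := fmt.Fprintf(p.out, "[%s] %s: %s\n", status, r.Detector, r.Raw)
	return err
}

// notify drains results and hands each one to d, stopping at the first error.
func notify(results <-chan Result, d Dispatcher) error {
	for r := range results {
		if err := d.Dispatch(r); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	results := make(chan Result, 2)
	results <- Result{Detector: "demo", Raw: "demo_live0001", Verified: true}
	results <- Result{Detector: "demo", Raw: "demo_dead0002"}
	close(results)
	if err := notify(results, printer{out: os.Stdout}); err != nil {
		fmt.Fprintln(os.Stderr, "dispatch failed:", err)
	}
}
```

Because the notifier only depends on the interface, swapping the command-line printer for another destination would not touch the rest of the pipeline.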

pkg/engine/engine.go

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
// Check the [process flow](docs/process_flow.md) and [concurrency](docs/concurrency.md) docs for
// something of a structural overview

package engine

import (
