Commit 06cafe5

Merge pull request #19 from google/approach-glossary
Update dfiq.org and add "Approach Glossary" page
2 parents ffaf167 + dc03b4d commit 06cafe5

20 files changed: 438 additions and 55 deletions.


data/approaches/Q1036.10.yaml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ description:
   references:
     - "[Prefetch on the ForensicsWiki](https://forensics.wiki/prefetch/)"
     - "[PsExec on MITRE ATT&CK](https://attack.mitre.org/software/S0029/)"
+    - "[Detecting PsExec Usage by 13Cubed](https://www.youtube.com/watch?v=oVM1nQhDZQc)"
 view:
   data:
     - type: ForensicArtifact

data/approaches/Q1036.11.yaml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ description:
   references:
     - "[4688(S): A new process has been created](https://learn.microsoft.com/en-us/windows/security/threat-protection/auditing/event-4688)"
     - "[PsExec on MITRE ATT&CK](https://attack.mitre.org/software/S0029/)"
+    - "[Detecting PsExec Usage by 13Cubed](https://www.youtube.com/watch?v=oVM1nQhDZQc)"
 view:
   data:
     - type: ForensicArtifact

data/approaches/Q1037.10.yaml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ description:
   references:
     - "[Prefetch on the ForensicsWiki](https://forensics.wiki/prefetch/)"
     - "[PsExec on MITRE ATT&CK](https://attack.mitre.org/software/S0029/)"
+    - "[Detecting PsExec Usage by 13Cubed](https://www.youtube.com/watch?v=oVM1nQhDZQc)"
 view:
   data:
     - type: ForensicArtifact

data/approaches/Q1037.11.yaml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ description:
   references:
     - "[4688(S): A new process has been created](https://learn.microsoft.com/en-us/windows/security/threat-protection/auditing/event-4688)"
     - "[PsExec on MITRE ATT&CK](https://attack.mitre.org/software/S0029/)"
+    - "[Detecting PsExec Usage by 13Cubed](https://www.youtube.com/watch?v=oVM1nQhDZQc)"
 view:
   data:
     - type: ForensicArtifact

data/approaches/Q1037.12.yaml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ description:
   references:
     - "[4688(S): A new process has been created](https://learn.microsoft.com/en-us/windows/security/threat-protection/auditing/event-4688)"
     - "[PsExec on MITRE ATT&CK](https://attack.mitre.org/software/S0029/)"
+    - "[Detecting PsExec Usage by 13Cubed](https://www.youtube.com/watch?v=oVM1nQhDZQc)"
 view:
   data:
     - type: ForensicArtifact

dfiq.py

Lines changed: 68 additions & 0 deletions
@@ -16,6 +16,7 @@
 import logging
 import networkx as nx
 import os
+import re
 import yamale
 import yaml

@@ -556,3 +557,70 @@ def generate_question_index_md(self, allow_internal: bool = False) -> None:
             os.path.join(self.markdown_output_path, "questions", "index.md"), mode="w"
         ) as file:
             file.write(content)
+
+    def generate_approach_glossary_md(self, allow_internal: bool = False) -> None:
+        """Generates Markdown for the Approach Glossary page, listing common items in Approaches.
+
+        Args:
+            allow_internal (bool): Check if generating from internal items is allowed.
+        """
+        data_type_and_value = {}
+        processor_and_analysis_names = {}
+        analysis_step_types = set()
+        step_variables = set()
+        for dfiq_id, component in self.components.items():
+            if not isinstance(component, Approach):
+                continue
+
+            if not allow_internal and component.is_internal:
+                continue
+
+            for d in component.view.get("data"):
+                if not data_type_and_value.get(d["type"]):
+                    data_type_and_value[d["type"]] = set()
+                data_type_and_value[d["type"]].add(d["value"])
+
+            for p in component.view.get("processors"):
+                if not processor_and_analysis_names.get(p["name"]):
+                    processor_and_analysis_names[p["name"]] = set()
+
+                for analysis in p["analysis"]:
+                    processor_and_analysis_names[p["name"]].add(analysis["name"])
+
+                    for step in analysis["steps"]:
+                        analysis_step_types.add(step["type"])
+                        m = re.findall(r"\{.*?\}", step["value"])
+                        if m:
+                            step_variables.update(m)
+
+        if not self.markdown_output_path:
+            raise ValueError("Markdown output path not specified")
+
+        descriptions = {
+            "ForensicArtifact": "This corresponds to the name of a ForensicArtifact, an existing repository of "
+            "machine-readable digital forensic artifacts ("
+            "https://github.com/ForensicArtifacts/artifacts). Using this type is preferred when "
+            "the data is a host-based file/artifact, but other methods are available as well (if "
+            "there isn't an existing relevant ForensicArtifact).",
+            "description": "Text description of the data type. `description` is often used in conjunction with "
+            "another data type to provide more context. It can also be used alone, either as a "
+            "placeholder or when more robust, programmatic data types do not fit.",
+        }
+
+        template = self.jinja_env.get_template("approach_glossary.jinja2")
+        context = {
+            "data_type_and_value": data_type_and_value,
+            "processor_and_analysis_names": processor_and_analysis_names,
+            "analysis_step_types": analysis_step_types,
+            "step_variables": step_variables,
+            "descriptions": descriptions,
+            "components": self.components,
+        }
+        content = template.render(context)
+        with open(
+            os.path.join(
+                self.markdown_output_path, "contributing", "approach_glossary.md"
+            ),
+            mode="w",
+        ) as file:
+            file.write(content)
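
The aggregation in `generate_approach_glossary_md` can be tried outside the class. The sketch below is not part of this commit. It assumes each Approach's `view` is a plain dict shaped like the Approach YAML, and the function name `collect_glossary_terms` and the sample `views` data (including the `hostname:{hostname}` query) are hypothetical, chosen only for illustration. It swaps the explicit `if not ...get(...)` checks for `collections.defaultdict(set)` and treats a missing or null `data`, `processors`, `analysis`, or `steps` key as empty, which the committed code does not do.

```python
import re
from collections import defaultdict


def collect_glossary_terms(views):
    """Aggregate glossary terms from an iterable of Approach `view` dicts.

    Hypothetical helper: mirrors the loops in generate_approach_glossary_md,
    but works on plain dicts instead of DFIQ component objects.
    """
    data_type_and_value = defaultdict(set)
    processor_and_analysis_names = defaultdict(set)
    analysis_step_types = set()
    step_variables = set()

    for view in views:
        # `or []` tolerates a key that is absent or explicitly null in the YAML.
        for d in view.get("data") or []:
            data_type_and_value[d["type"]].add(d["value"])

        for p in view.get("processors") or []:
            # Touching the key registers the processor even if it has no analyses.
            names = processor_and_analysis_names[p["name"]]
            for analysis in p.get("analysis") or []:
                names.add(analysis["name"])
                for step in analysis.get("steps") or []:
                    analysis_step_types.add(step["type"])
                    # Non-greedy match keeps each {variable} separate.
                    step_variables.update(re.findall(r"\{.*?\}", step["value"]))

    return (
        data_type_and_value,
        processor_and_analysis_names,
        analysis_step_types,
        step_variables,
    )


# Made-up sample data for illustration only.
views = [
    {
        "data": [{"type": "ForensicArtifact", "value": "BrowserHistory"}],
        "processors": [
            {
                "name": "Plaso",
                "analysis": [
                    {
                        "name": "OpenSearch",
                        "steps": [
                            {"type": "opensearch-query", "value": "hostname:{hostname}"}
                        ],
                    }
                ],
            }
        ],
    },
    {"data": None},  # a view with no data entries is skipped rather than raising
]

data_types, processors, step_types, variables = collect_glossary_terms(views)
```

Returning the four collections instead of rendering a template keeps the sketch testable without Jinja or a configured output path.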

scripts/generate_site_markdown.py

Lines changed: 1 addition & 0 deletions
@@ -23,3 +23,4 @@
     dfiq_instance.generate_question_md(question.id)

 dfiq_instance.generate_question_index_md()
+dfiq_instance.generate_approach_glossary_md()
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+# Approach Glossary
+
+DFIQ's Approaches contain the important "how" to answer the Questions. Approaches are also the most complicated
+part of DFIQ, due to the amount of structured information they contain. DFIQ has a detailed
+[specification](https://dfiq.org/contributing/specification) that is a useful reference for
+creating new Approaches. However, some parts of Approaches need user-defined values that are beyond the specification.
+This page is a glossary of currently-used values, generated from the
+[DFIQ YAML files](https://github.com/google/dfiq/tree/main/data).
+
+When writing new Approaches, check this glossary first to see if there's already an existing term that fits with what
+you're trying to do. If not, you are free to create a new one, but trying to reuse existing terms first will increase
+consistency throughout DFIQ. These concepts (data type, processors, analysis steps) also may not be straightforward at
+first; the hope is that seeing some common values (and the linked usages) will help make them clearer.
+
+## Data
+
+This section (`view.data`) can have multiple ways of describing the data needed for this approach. They should be thought
+of as complementary or as alternates to each other (they can be "OR"d together, they do not need to be "AND"d).
+Each is specified by a pair of `type` and `value`.
+
+Example (from [Q1001.10](https://github.com/google/dfiq/blob/main/data/approaches/Q1001.10.yaml#L39)):
+
+```
+view:
+  data:
+    - type: ForensicArtifact
+      value: BrowserHistory
+```
+
+Below are the current values of `type`, along with the `value`s set for each.
+
+
+#### CrowdStrike
+
+For `type: CrowdStrike`, current entries for `value`:
+
+- DnsRequest
+- PlatformEvents
+- ProcessRollup
+
+#### ForensicArtifact
+**Description**: This corresponds to the name of a ForensicArtifact, an existing repository of machine-readable digital forensic artifacts (https://github.com/ForensicArtifacts/artifacts). Using this type is preferred when the data is a host-based file/artifact, but other methods are available as well (if there isn't an existing relevant ForensicArtifact).
+
+For `type: ForensicArtifact`, current entries for `value`:
+
+- BrowserHistory
+- NTFSUSNJournal
+- SantaLogs
+- WindowsEventLogs
+- WindowsPrefetchFiles
+- WindowsXMLEventLogSysmon
+
+#### description
+**Description**: Text description of the data type. `description` is often used in conjunction with another data type to provide more context. It can also be used alone, either as a placeholder or when more robust, programmatic data types do not fit.
+
+For `type: description`, current entries for `value`:
+
+- Collect local browser history artifacts. These are often in the form of SQLite databases and JSON files in multiple directories.
+- Files used by the Windows Prefetch service.
+- Santa logs stored on the local disk; they may also be centralized off-system, but this artifact does not include those.
+- The NTFS $UsnJrnl file system metadata file. This ForensicArtifact definition does not include the $J alternate data stream, but many tools collect it anyway.
+- Windows Event Log files
+
+
+## Processors
+
+A processor is what takes the data collected and processes it in some way to produce structured data an investigator
+reviews. Multiple processors can be defined, as there are often multiple programs capable of doing similar processing
+(example: log2timeline, Magnet Axiom, and Hindsight can all process browser history artifacts and deliver similar
+results).
+
+Example (from [Q1001.10](https://github.com/google/dfiq/blob/main/data/approaches/Q1001.10.yaml#L58)):
+
+```
+processors:
+  - name: Plaso
+```
+
+Below are the currently-defined processors:
+
+- Crowdstrike Investigate (UI) [🔎](https://github.com/google/dfiq/search?q="name:%20Crowdstrike%20Investigate%20%28UI%29"+language%3AYAML)
+- Hindsight [🔎](https://github.com/google/dfiq/search?q="name:%20Hindsight"+language%3AYAML)
+- Plaso [🔎](https://github.com/google/dfiq/search?q="name:%20Plaso"+language%3AYAML)
+- Splunk [🔎](https://github.com/google/dfiq/search?q="name:%20Splunk"+language%3AYAML)
+
+## Analysis Steps
+
+Under each analysis method will be a sequence of one or more maps with keys `description`, `type`, and `value`.
+If there is more than one map, they should be processed in sequence in the analysis method (if applicable). In this
+way, we can describe multiple chained steps of analysis (with the `description` being a way to communicate to the user
+what exactly each "step" is doing, enabling a "show-your-work"-type capability).
+
+Example (from [Q1001.10](https://github.com/google/dfiq/blob/main/data/approaches/Q1001.10.yaml#L63)):
+
+```
+analysis:
+  - name: OpenSearch
+    steps:
+      - description: &filter-desc Filter the results to just file downloads
+        type: opensearch-query
+        value: data_type:("chrome:history:file_downloaded" OR "safari:downloads:entry")
+  - name: Python Notebook
+    steps:
+      - description: *filter-desc
+        type: pandas
+        value: query('data_type in ("chrome:history:file_downloaded", "safari:downloads:entry")')
+```
+
+#### `type`
+
+The contents of the `description` and `value` fields will vary wildly with little repetition, depending on what the
+analysis step is doing, but the step `type` should be one of a few common values.
+
+Below are the currently-defined values of `type`:
+
+- GUI [🔎](https://github.com/google/dfiq/search?q="type:%20GUI"+language%3AYAML)
+- manual [🔎](https://github.com/google/dfiq/search?q="type:%20manual"+language%3AYAML)
+- opensearch-query [🔎](https://github.com/google/dfiq/search?q="type:%20opensearch-query"+language%3AYAML)
+- pandas [🔎](https://github.com/google/dfiq/search?q="type:%20pandas"+language%3AYAML)
+- splunk-query [🔎](https://github.com/google/dfiq/search?q="type:%20splunk-query"+language%3AYAML)
+
+#### Variable Substitution in step `value`
+
+The step's `value` may benefit from using a specific term to make the step more precise. Common examples of this
+include adding time bounds and filtering down to a specific identifier (user name, host, FQDN, or PID, for example).
+
+DFIQ's convention for denoting a variable to be substituted when used is to wrap the term in **{ }**.
+
+==More standardization is needed here to define common variables (such as timestamps in a particular format).==
+
+Below are the currently-used variables in analysis steps:
+
+- {file_reference value} [🔎](https://github.com/google/dfiq/search?q="%7Bfile_reference%20value%7D"+language%3AYAML)
+- {hostname} [🔎](https://github.com/google/dfiq/search?q="%7Bhostname%7D"+language%3AYAML)
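
The `{ }` convention described in the glossary can be applied with the same non-greedy pattern that `dfiq.py` uses to find variables. The following is a minimal sketch and not part of DFIQ: `substitute_variables` is a hypothetical helper, and the sample query string and the host name `ws-042` are made up for illustration. Variables without a supplied value are left in place so an unfilled placeholder stays visible to the analyst.

```python
import re

# Same shape as the pattern in dfiq.py, with a group capturing the name inside the braces.
VARIABLE_PATTERN = re.compile(r"\{(.*?)\}")


def substitute_variables(step_value, variables):
    """Replace each {name} in step_value with variables[name], if provided.

    Hypothetical helper: unknown variables are left untouched.
    """

    def replace(match):
        name = match.group(1)
        # Fall back to the original "{name}" text when no value is supplied.
        return str(variables.get(name, match.group(0)))

    return VARIABLE_PATTERN.sub(replace, step_value)


# Made-up step value using the two variables listed in the glossary.
query = 'hostname:"{hostname}" AND file_reference:{file_reference value}'
filled = substitute_variables(query, {"hostname": "ws-042"})
print(filled)
```

A regex substitution is used here instead of `str.format` so that names containing spaces, such as `{file_reference value}`, and unmatched placeholders do not raise errors.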

site/docs/contributing/specification.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ A DFIQ document that conforms to the DFIQ Specification is represented in YAML.
 Both the 4- and 2-digit numbers start with a 1 (or higher) for components appropriate for external use. Numbers
 starting with a 0 are reserved for internal use (think of it like private IP address space). Users of DFIQ can use
 these IDs for their internal components without worrying about collisions with public components.
-[DFIQ on GitHub](https://github.com/google/dfiq) will serve as the "definitive" central repository to manage
+[DFIQ on GitHub](https://github.com/google/dfiq) will serve as the definitive central repository to manage
 public DFIQ components (and their IDs).

 ## Schema
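
The internal/external ID rule in the hunk above (numbers starting with 0 are reserved for internal use) can be expressed as a small check. This is a sketch under stated assumptions, not DFIQ's implementation: it assumes IDs consist of one uppercase letter, a 4-digit number, and an optional `.NN` 2-digit suffix, as in `Q1036.10`, and the function name `is_internal_id` is hypothetical.

```python
import re

# Assumed ID shape: one uppercase letter, 4 digits, optional ".NN" suffix (e.g. "Q1036.10").
ID_PATTERN = re.compile(r"^[A-Z](\d{4})(?:\.(\d{2}))?$")


def is_internal_id(dfiq_id):
    """Return True if either numeric part of the ID starts with 0 (internal use)."""
    match = ID_PATTERN.match(dfiq_id)
    if match is None:
        raise ValueError(f"Unrecognized DFIQ ID format: {dfiq_id!r}")
    # groups() yields the 4-digit part and, if present, the 2-digit part.
    return any(part is not None and part.startswith("0") for part in match.groups())


print(is_internal_id("Q1036.10"))
print(is_internal_id("Q0123.10"))
```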

site/docs/questions/Q1001.md

Lines changed: 22 additions & 7 deletions
@@ -77,7 +77,12 @@ The following data source(s) are needed for this approach to the question.
 **Description**
 : Collect local browser history artifacts. These are often in the form of SQLite databases and JSON files in multiple directories.

-- ForensicArtifact: BrowserHistory
+
+**Type**
+: [ForensicArtifact](https://github.com/ForensicArtifacts/artifacts#digital-forensics-artifacts-repository)
+
+**Value**
+: BrowserHistory ([view on GitHub](https://github.com/ForensicArtifacts/artifacts/search?q=BrowserHistory))

 ### ⚙️ Processors

@@ -94,7 +99,7 @@ relevant configuration options are.


 === "Plaso"
-    More information on [Plaso](https://forensics.wiki/Plaso).
+    More information on [Plaso](https://forensics.wiki/plaso).

     Recommended options:

@@ -128,7 +133,7 @@ relevant configuration options are.
         df.query('data_type in ("chrome:history:file_downloaded", "safari:downloads:entry")')
         ```
 === "Hindsight"
-    More information on [Hindsight](https://forensics.wiki/Hindsight).
+    More information on [Hindsight](https://forensics.wiki/hindsight).

     Recommended options:

@@ -208,7 +213,12 @@ The following data source(s) are needed for this approach to the question.
 **Description**
 : Santa logs stored on the local disk; they may also be centralized off-system, but this artifact does not include those.

-- ForensicArtifact: SantaLogs
+
+**Type**
+: [ForensicArtifact](https://github.com/ForensicArtifacts/artifacts#digital-forensics-artifacts-repository)
+
+**Value**
+: SantaLogs ([view on GitHub](https://github.com/ForensicArtifacts/artifacts/search?q=SantaLogs))

 ### ⚙️ Processors

@@ -225,7 +235,7 @@ relevant configuration options are.


 === "Plaso"
-    More information on [Plaso](https://forensics.wiki/Plaso).
+    More information on [Plaso](https://forensics.wiki/plaso).

     Recommended options:

@@ -298,7 +308,12 @@ The following data source(s) are needed for this approach to the question.
 **Description**
 : The NTFS $UsnJrnl file system metadata file. This ForensicArtifact definition does not include the $J alternate data stream, but many tools collect it anyway.

-- ForensicArtifact: NTFSUSNJournal
+
+**Type**
+: [ForensicArtifact](https://github.com/ForensicArtifacts/artifacts#digital-forensics-artifacts-repository)
+
+**Value**
+: NTFSUSNJournal ([view on GitHub](https://github.com/ForensicArtifacts/artifacts/search?q=NTFSUSNJournal))

 ### ⚙️ Processors

@@ -315,7 +330,7 @@ relevant configuration options are.


 === "Plaso"
-    More information on [Plaso](https://forensics.wiki/Plaso).
+    More information on [Plaso](https://forensics.wiki/plaso).

     Recommended options:
