Commit 02e9e94

Doc notebooks (#4)
* Implementing pylint recommendation
* Implementing pylint recommendations
* Implementing more pylint recommendations
* First set of doc notebooks. Fixing a few bugs found during the creation of doc notebooks. Updated several modules with numpy-style docstrings - to allow for auto-documentation
* Fixing failing test by specifying include_paths
* Updates to README.md and updates for unit tests to add additional entities and fix for changes to iocextract defaults. Also doc fix for iocextract.py
* Adding unit test for auditdextract.py. Use pandas df search in base64unpack.py instead of iterrows then search. Fixing auditdextract.py to only create process events for EXECVE syscalls. Added code to rename fields if name clash from different sub-records.
1 parent 900c3cb commit 02e9e94

27 files changed, with 7,037 additions and 494 deletions.

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -102,3 +102,4 @@ venv.bak/
102102

103103
# mypy
104104
.mypy_cache/
105+
/msticpy.code-workspace

README.md

Lines changed: 153 additions & 34 deletions
@@ -1,18 +1,63 @@
11
# MSTIC Jupyter and Python Security Tools
22

3-
Microsoft Threat Intelligence Python Security Package:
3+
Microsoft Threat Intelligence Python Security Tools.
4+
5+
The **msticpy** package was initially developed to support Jupyter Notebook
6+
authoring for [Azure Sentinel](https://azure.microsoft.com/en-us/services/azure-sentinel/).
7+
However, many of the components can be used in other security scenarios for threat hunting
8+
and threat investigation. There are three main sub-packages:
9+
10+
- sectools - python security tools to help with data analysis or investigation
11+
- nbtools - Jupyter-specific UI tools such as widgets and data display
12+
- data - data interfaces specific to Sentinel/Log Analytics
13+
14+
The package is in an early preview mode so there are likely to be bugs and there are several
15+
areas that are not yet optimized for performance. We welcome feedback, bug reports and suggestions
16+
for new features.
17+
18+
## Installing
19+
20+
`pip install msticpy`
21+
22+
or for the latest dev build
23+
24+
`pip install git+https://github.com/microsoft/msticpy`
25+
26+
## Documentation
27+
28+
The public functions, classes and public class methods have docstrings that describe the
29+
parameters and, for more complex functions, give a more detailed description of functionality
30+
and outputs. We are in the process of producing more formal documentation on read-the-docs.
31+
32+
Until then, the functionality is described in the following sections and accompanying notebooks.
33+
You can also browse through the sample notebooks (especially the *Windows Alert Investigation* notebook)
34+
to see some of the functionality used in context.
35+
36+
---
37+
38+
## Security Tools Sub-package - `sectools`
439

5-
## sectools
640
This subpackage contains several modules helpful for working on security
741
investigations and hunting:
42+
843
### base64unpack
9-
Base64 and archive (gz, zip, tar) extractor. Input can either be a single string or a specified column of a pandas dataframe. It will try to identify any base64 encoded strings and decode them. If the result looks like one of the supported archive types it will unpack the contents. The results of each decode/unpack are rechecked for further base64 content and will recurse down up to 20 levels (default can be overridden).
44+
45+
Base64 and archive (gz, zip, tar) extractor. Input can either be a single string
46+
or a specified column of a pandas dataframe. It will try to identify any base64 encoded
47+
strings and decode them. If the result looks like one of the supported archive types it
48+
will unpack the contents. The results of each decode/unpack are rechecked for further
49+
base64 content and will recurse down up to 20 levels (default can be overridden).
1050
Output is a decoded string (for single string input) or a DataFrame (for dataframe input).
51+
1152
[Base64Unpack Notebook](./doc/Base64Unpack.ipynb)
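
The decode-and-recurse behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the msticpy implementation: the function name `decode_nested_base64`, its regular expression and the 20-character minimum are assumptions made for the example, and archive unpacking is left out.

```python
import base64
import binascii
import re

# Assumption: a "candidate" is a run of 20+ base64-alphabet characters
# with optional padding. The real module may use a different heuristic.
_B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")


def decode_nested_base64(text, max_depth=20):
    """Replace base64 runs in `text` with their decoded text.

    Decoded results are re-scanned for further base64 content,
    recursing at most `max_depth` levels.
    """
    if max_depth <= 0:
        return text

    def _decode(match):
        candidate = match.group(0)
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            # Not valid base64, or not text: leave the original run in place.
            return candidate
        return decode_nested_base64(decoded, max_depth - 1)

    return _B64_PATTERN.sub(_decode, text)
```

Passing a smaller `max_depth` stops the recursion early, which mirrors the overridable depth limit mentioned above.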
1253

1354
### iocextract
14-
Uses a set of builtin regular expressions to look for Indicator of Compromise (IoC) patterns. Input can be a single string or a pandas dataframe with one or more columns specified as input.
55+
56+
Uses a set of builtin regular expressions to look for Indicator of Compromise (IoC) patterns.
57+
Input can be a single string or a pandas dataframe with one or more columns specified as input.
58+
1559
The following types are built-in:
60+
1661
- IPv4 and IPv6
1762
- URL
1863
- DNS domain
@@ -23,68 +68,142 @@ You can modify or add to the regular expressions used at runtime.
2368

2469
Output is a dictionary of matches (for single string input) or a DataFrame (for dataframe input).
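
As a rough illustration of the regex-based approach, the sketch below scans a single string and returns a dictionary of matches keyed by IoC type. The pattern set, the type names and the `extract_iocs` function are hypothetical and deliberately simplified; they are not the expressions shipped with the module.

```python
import re
from collections import defaultdict

# Hypothetical, simplified patterns for illustration only.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(text, patterns=None):
    """Return {ioc_type: set_of_matches} for every pattern that matched."""
    patterns = IOC_PATTERNS if patterns is None else patterns
    found = defaultdict(set)
    for ioc_type, pattern in patterns.items():
        for match in pattern.finditer(text):
            found[ioc_type].add(match.group(0))
    return dict(found)
```

Because the patterns live in a plain dictionary, adding or replacing one at runtime is a matter of passing a modified copy, for example `extract_iocs(text, dict(IOC_PATTERNS, sha1_hash=re.compile(r"\b[a-fA-F0-9]{40}\b")))`.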
2570

71+
[IoCExtract Notebook](./doc/IoCExtract.ipynb)
72+
2673
### vtlookup
27-
Wrapper class around Virus Total API (https://www.virustotal.com/en/documentation/public-api/).
74+
75+
Wrapper class around [Virus Total API](https://www.virustotal.com/en/documentation/public-api/).
2876
Input can be a single IoC observable or a pandas DataFrame containing multiple observables.
29-
Processing requires a Virus Total account and API key and processing performance is limited to
77+
Processing requires a Virus Total account and API key and processing performance is limited to
3078
the number of requests per minute for the account type that you have.
3179
Supported IoC Types:
80+
3281
- Filehash
3382
- URL
3483
- DNS Domain
3584
- IPv4 Address
3685

37-
[VTLookup Notebook](./doc/VTLookup.ipynb)
86+
[VTLookup Notebook](./doc/VirusTotalLookup.ipynb)
3887

3988
### geoip
89+
4090
Geographic location lookup for IP addresses.
4191
This module has two classes for different services:
42-
- GeoLiteLookup - Maxmind Geolite (see https://www.maxmind.com)
43-
- IPStackLookup - IPStack (see https://ipstack.com)
44-
Both services offer a free tier for non-commercial use. However,
45-
a paid tier will normally get you more accuracy, more detail and
46-
a higher throughput rate. Maxmind geolite uses a downloadable database,
92+
93+
- GeoLiteLookup - Maxmind Geolite (see <https://www.maxmind.com>)
94+
- IPStackLookup - IPStack (see <https://ipstack.com>)
95+
Both services offer a free tier for non-commercial use. However,
96+
a paid tier will normally get you more accuracy, more detail and
97+
a higher throughput rate. Maxmind geolite uses a downloadable database,
4798
while IPStack is an online lookup (API key required).
4899

100+
[GeoIP Lookup Notebook](./doc/GeoIPLookups.ipynb)
101+
49102
### eventcluster
50-
This module is intended to be used to summarize large numbers of
51-
events into clusters of different patterns. High volume repeating
103+
104+
This module is intended to be used to summarize large numbers of
105+
events into clusters of different patterns. High volume repeating
52106
events can often make it difficult to see unique and interesting
53-
items. The module uses a pattern-based approach rather than
54-
matching on exact strings - so an admin command that
55-
does some maintenance on thousands of servers with a commandline such as:
56-
```install-update -hostname {host.fqdn} -tmp:/tmp/{GUID}/rollback```
57-
Will be collapsed using the pattern of the command and ignoring
58-
individal host names and guids.
107+
items.
108+
109+
The module contains functions to generate clusterable features from
110+
string data. For example, an administration command that
111+
does some maintenance on thousands of servers with a commandline such as:<br>
112+
```install-update -hostname {host.fqdn} -tmp:/tmp/{GUID}/rollback```<br>
113+
can be collapsed into a single cluster pattern by ignoring the character values
114+
in the string and using delimiters or tokens to group the values.
115+
59116
This is an unsupervised learning module implemented using scikit-learn DBSCAN.
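
To show how ignoring character values lets structurally identical commands fall together, the sketch below derives two simple features (token count and delimiter count) and groups commands with identical features. It is an assumption-laden stand-in: the delimiter set and the function names are invented for the example, and exact-match grouping replaces the DBSCAN clustering that the module itself uses.

```python
import re
from collections import defaultdict

# Assumption: these characters act as delimiters between tokens.
_DELIMITERS = re.compile(r"[\s\-\\/.,|&:;%$()=]")


def command_features(command_line):
    """Return (token_count, delimiter_count), ignoring the actual characters."""
    tokens = [tok for tok in _DELIMITERS.split(command_line) if tok]
    delimiter_count = len(_DELIMITERS.findall(command_line))
    return len(tokens), delimiter_count


def group_by_features(command_lines):
    """Group command lines whose feature tuples are identical."""
    groups = defaultdict(list)
    for cmd in command_lines:
        groups[command_features(cmd)].append(cmd)
    return dict(groups)
```

With these features, `install-update` commands that differ only in host name and GUID produce the same tuple and land in one group, while an unrelated command such as `whoami` ends up in its own.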
60-
[EventClustering Notebook](./doc/EventClustering.ipynb)
61117

62-
## nbtools
63-
This is a collection of data access, display and utility modules
64-
designed to make working with Log Analytics data in Jupyter notebooks
118+
### outliers
119+
120+
Similar to the eventcluster module but a little bit more experimental (read 'less tested').
121+
It uses SkLearn Isolation Forest to identify outlier events in a single data set or using
122+
one data set as training data and another on which to predict outliers.
123+
124+
### auditdextract
125+
126+
Module to load and decode Linux audit logs. It collapses messages sharing the same
127+
message ID into single events, decodes hex-encoded data fields and performs some
128+
event-specific formatting and normalization (e.g. for process start events it will
129+
re-assemble the process command line arguments into a single string). This is still
130+
a work-in-progress.
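
The three steps described above (collapse by message ID, decode hex-encoded fields, re-assemble the command line) can be sketched as follows. This is a simplified illustration rather than the module's code: the function names are hypothetical, only `EXECVE` arguments are hex-decoded, and each sub-record is stored under its record type to sidestep field-name clashes.

```python
import re
from collections import defaultdict

# Record header, e.g. "type=EXECVE msg=audit(1560000000.123:456): ..."
_HEADER = re.compile(
    r"type=(?P<type>\w+) msg=audit\((?P<time>[\d.]+):(?P<id>\d+)\): ?(?P<body>.*)"
)
# key=value pairs where the value is double-quoted or a bare token.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def _decode_arg(raw):
    """Strip quotes, or decode a bare hex-encoded argument if possible."""
    if raw.startswith('"'):
        return raw.strip('"')
    try:
        return bytes.fromhex(raw).decode("utf-8")
    except ValueError:  # not hex, odd length, or not valid UTF-8
        return raw


def parse_audit_lines(lines):
    """Collapse audit records sharing a message ID into one dict per event."""
    events = defaultdict(dict)
    for line in lines:
        header = _HEADER.match(line.strip())
        if header is None:
            continue  # skip lines that are not audit records
        rec_type = header["type"]
        fields = {}
        for key, raw in _FIELD.findall(header["body"]):
            if rec_type == "EXECVE" and re.fullmatch(r"a\d+", key):
                fields[key] = _decode_arg(raw)
            else:
                fields[key] = raw.strip('"')
        # Simplification: a later record of the same type overwrites an earlier one.
        events[header["id"]][rec_type] = fields
    for event in events.values():
        execve = event.get("EXECVE")
        if execve:
            argc = int(execve.get("argc", 0))
            event["cmdline"] = " ".join(execve.get(f"a{i}", "") for i in range(argc))
    return dict(events)
```

Each returned event is keyed by its message ID and holds one dictionary per record type, plus a `cmdline` string when an `EXECVE` record is present.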
131+
132+
## Notebook tools sub-package - `nbtools`
133+
134+
This is a collection of display and utility modules
135+
designed to make working with security data in Jupyter notebooks
65136
quicker and easier.
66-
- nbwidgets - groups common functionality such as list pickers,
137+
138+
- nbwidgets - groups common functionality such as list pickers,
67139
time boundary settings, saving and retrieving
68140
environment variables into a single line callable command.
69-
- nbdisplay - functions that implement common display of things like
141+
- nbdisplay - functions that implement common display of things like
70142
alerts, events in a slightly more consumable way than print()
71-
- entityschema - implements entity classes (e.g. Host, Account, IPAddress)
72-
used in Log Analytics alerts and in many of these modules.
143+
- entityschema - implements entity classes (e.g. Host, Account, IPAddress)
144+
used in Log Analytics alerts and in many of these modules.
73145
Each entity encapsulates one or more properties related to the entity.
74-
- query manager - collection of modules that implement common
146+
147+
[Notebooks Tools](./doc/NotebookWidgets.ipynb)
148+
149+
## Data sub-package - `data`
150+
151+
These components are currently still part of the nbtools sub-package but will be
152+
refactored to separate them into their own package.
153+
154+
- query manager - collection of modules that implement common
75155
kql/Log Analytics queries using KqlMagic
76-
- security_alert and security_event - encapsulation classes for alerts
77-
and events. Each has a standard 'entities' property reflecting the
78-
entities found in the alert or event. These can also be used as
156+
- security_alert and security_event - encapsulation classes for alerts
157+
and events. Each has a standard 'entities' property reflecting the
158+
entities found in the alert or event. These can also be used as
79159
meta-parameters for many of the queries. For example the query:
80160
```qry.list_host_logons(provs=[query_times, alert])``` will extract the
81161
value for the ```hostname``` query parameter from the alert.
82162

83-
# Contributing
163+
---
164+
165+
## Clone the notebooks in this repo to Azure Notebooks
166+
167+
Requires sign-in to Azure Notebooks
168+
<a href="https://notebooks.azure.com/import/gh/Microsoft/msticpy"><img src="https://notebooks.azure.com/launch.png" /></a>
169+
170+
## More Notebook Examples
171+
172+
See the following notebooks for more examples of the use of this package in practice:
173+
174+
- Windows Alert Investigation in [github](https://github.com/Azure/Azure-Sentinel/blob/master/Notebooks/Sample-Notebooks/Example%20-%20Guided%20Investigation%20-%20Process-Alerts.ipynb)
175+
or [NbViewer](https://nbviewer.jupyter.org/github/Azure/Azure-Sentinel/blob/master/Notebooks/Sample-Notebooks/Example%20-%20Guided%20Investigation%20-%20Process-Alerts.ipynb)
176+
- Windows Host Explorer in [github](https://github.com/Azure/Azure-Sentinel/blob/master/Notebooks/Sample-Notebooks/Example%20-%20Guided%20Hunting%20-%20Windows-Host-Explorer.ipynb)
177+
or [NbViewer](https://nbviewer.jupyter.org/github/Azure/Azure-Sentinel/blob/master/Notebooks/Sample-Notebooks/Example%20-%20Guided%20Hunting%20-%20Windows-Host-Explorer.ipynb)
178+
- Office 365 Exploration in [github](https://github.com/Azure/Azure-Sentinel/blob/master/Notebooks/Sample-Notebooks/Example%20-%20Guided%20Hunting%20-%20Office365-Exploring.ipynb)
179+
or [NbViewer](https://nbviewer.jupyter.org/github/Azure/Azure-Sentinel/blob/master/Notebooks/Sample-Notebooks/Example%20-%20Guided%20Hunting%20-%20Office365-Exploring.ipynb)
180+
- Cross-Network Hunting in [github](https://github.com/Azure/Azure-Sentinel/blob/master/Notebooks/Sample-Notebooks/Example%20-%20Guided%20Hunting%20-%20Linux-Windows-Office.ipynb)
181+
or [NbViewer](https://nbviewer.jupyter.org/github/Azure/Azure-Sentinel/blob/master/Notebooks/Sample-Notebooks/Example%20-%20Guided%20Hunting%20-%20Linux-Windows-Office.ipynb)
182+
183+
## To-Do Items
184+
185+
- Refactor data modules into separate package.
186+
- Replace custom data schema with [Intake](https://intake.readthedocs.io/en/latest/).
187+
- Add additional notebooks to document use of the tools.
188+
189+
## Supported Platforms and Packages
190+
191+
- msticpy is OS-independent
192+
- Requires Python 3.6 or later
193+
- Requires the following python packages: pandas, bokeh, matplotlib, seaborn, setuptools, urllib3,
194+
ipywidgets, numpy, attrs, requests, networkx, ipython, scikit_learn, typing
195+
- The following packages are recommended and needed for some specific functionality: Kqlmagic, maxminddb_geolite2,
196+
folium, dnspython, ipwhois
197+
198+
See [requirements.txt](requirements.txt) for more details and version requirements.
199+
200+
---
201+
202+
## Contributing
84203

85204
This project welcomes contributions and suggestions. Most contributions require you to agree to a
86205
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
87-
the rights to use your contribution. For details, visit https://cla.microsoft.com.
206+
the rights to use your contribution. For details, visit <https://cla.microsoft.com>.
88207

89208
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
90209
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
