Analysis functions for MSTICPy #612

ianhelle · 2023-01-26T20:42:36Z

ianhelle
Jan 26, 2023
Maintainer

From a recent discussion (2023-01-26) we arrived at a consensus that it makes sense to have some more scenario-specific analysis functions in MSTICPy (or at least in the MSTICPy family).

Julien has implemented some ideas in notebooks
Janantha has several of these in his memory forensics notebook
Florian has a PoC for a framework for these

There is a bit of discussion in a parallel thread here #611 (comment)

Some discussion points

Would be good to create a new repo and perhaps use Python namespaces to glue this into MSTICPy
- from my understanding we would need a folder in MSTICPy with no __init__.py and the analysis package(s) would mirror the same structure. Users could also create their own namespace packages with the same structure, allowing them to be imported into the MSTICPy namespace
Some of the analysis techniques have data sets that might grow over time or that a user might want to add to. An example is system process names subject to spoofing
- so we should implement these to be config/data-based (e.g. in yaml files), allowing a user to add to or override the default config/data files
For many analysis functions we might need to map source data fields to generic, known fields (e.g. process name). There are a few ways that we could implement this:
- We could use an open/published generic schema like OSSEM or ASIM and set up global mappings to these data definitions (we could identify which OSSEM/ASIM schema a data set corresponds to in the query definitions). OSSEM (I think) has a mechanism to infer source data schemas - this would be helpful for ad hoc queries or random data files. We might also be able to use the parser code from Sentinel to map data that Sentinel knows about.
- We could instrument query definitions with our own generic schema map (for a limited set of core fields) - we have examples of this kind of thing in the process tree code
- We could ask the user to supply a mapping at runtime for required columns of the analytic
We want to avoid duplicating Sigma since we have a long-term goal of integrating with Sigma and using the rules directly
If the analytic had a clear entity mapping we could expose the analytic as a pivot function
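
On the config/data-based point above, here is a minimal sketch of layering user-supplied data over packaged defaults. Plain dicts stand in for parsed yaml files, and the key names (spoofable_process_names, suspicious_parents) and the merge_analytic_data helper are hypothetical, not part of MSTICPy.

```python
from typing import Any, Dict, List


def merge_analytic_data(
    defaults: Dict[str, List[Any]],
    user: Dict[str, List[Any]],
    override_keys: frozenset = frozenset(),
) -> Dict[str, List[Any]]:
    """Combine default and user data: extend lists unless the key is overridden."""
    # Copy the default lists so that the packaged data is never mutated
    merged = {key: list(values) for key, values in defaults.items()}
    for key, values in user.items():
        if key in override_keys or key not in merged:
            # User data replaces the defaults (or adds a new key)
            merged[key] = list(values)
            continue
        for value in values:
            # Append only values not already present, keeping order
            if value not in merged[key]:
                merged[key].append(value)
    return merged


DEFAULTS = {
    "spoofable_process_names": ["svchost.exe", "lsass.exe"],
    "suspicious_parents": ["winword.exe", "excel.exe"],
}
USER = {
    "spoofable_process_names": ["lsass.exe", "csrss.exe"],
    "suspicious_parents": ["outlook.exe"],
}

merged = merge_analytic_data(
    DEFAULTS, USER, override_keys=frozenset({"suspicious_parents"})
)
print(merged)
```

Whether a user file should extend or replace a default list is a design choice; the override_keys parameter here is just one way to let the user say which.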

Analytics and Feb 2023 Hack Month

These are attractive things for community contributions - so it would be good to have a rough framework for contributions (in the next week or so)
We have quite a few examples (e.g. from Janantha's notebook) that we could readily adapt.
For time reasons we can start by using the simplest schema option - having the user supply a mapping for key fields (although it would not be a lot of work to create some common mappings for a limited set of common data types)
Maybe we could start by checking them into the current MP repo and then move to a parallel namespace package.
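
The namespace-package idea from the discussion points can be sketched as follows. Two temporary folders play the role of separately installed distributions that both contribute modules to one package tree containing no __init__.py files. The namespace name mp_demo_ns and the module names are hypothetical. The demo uses a top-level namespace; a namespace folder nested inside the regular msticpy package would, as far as I understand, need the portions to be installed into the same directory (or msticpy's __path__ to be extended), which is not shown here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

NAMESPACE = "mp_demo_ns"  # hypothetical namespace name used only for this demo


def make_portion(root: Path, module_name: str, body: str) -> None:
    """Create <root>/mp_demo_ns/analysis/<module_name>.py with no __init__.py files."""
    pkg_dir = root / NAMESPACE / "analysis"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / f"{module_name}.py").write_text(body, encoding="utf-8")


with tempfile.TemporaryDirectory() as core_root, tempfile.TemporaryDirectory() as user_root:
    make_portion(
        Path(core_root), "process", "def describe():\n    return 'core process analytics'\n"
    )
    make_portion(
        Path(user_root), "custom", "def describe():\n    return 'user analytics'\n"
    )

    # Each root stands in for a separately installed distribution
    sys.path[:0] = [core_root, user_root]
    importlib.invalidate_caches()
    try:
        # Both modules resolve under the same package path although
        # they live in different folders
        process = importlib.import_module(f"{NAMESPACE}.analysis.process")
        custom = importlib.import_module(f"{NAMESPACE}.analysis.custom")
        core_text, user_text = process.describe(), custom.describe()
    finally:
        sys.path.remove(core_root)
        sys.path.remove(user_root)

print(core_text)
print(user_text)
```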

ianhelle · 2023-01-26T20:54:35Z

ianhelle
Jan 26, 2023
Maintainer Author

Implementation notes

functions should take a pandas DF as data parameter
typically they should return results as a DF
they should take a schema_map parameter (dict or simple class) that maps analytic key fields to source field names
If the analytic is specific to a known entity type (Process, host, IP, etc.) that should be declared as public metadata
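
A minimal sketch of an analytic that follows the notes above: it takes a DataFrame, returns a DataFrame, and accepts a schema_map that maps analytic key fields to source field names. The function name, the field names and the module-level ENTITY_TYPE metadata are assumptions made for illustration only.

```python
from typing import Dict, Optional

import pandas as pd

# Hypothetical public metadata declaring the entity type for this analytic
ENTITY_TYPE = "Process"
REQUIRED_FIELDS = ("process_name", "command_line")


def long_command_lines(
    data: pd.DataFrame,
    schema_map: Optional[Dict[str, str]] = None,
    min_length: int = 20,
) -> pd.DataFrame:
    """Return rows whose command line is at least `min_length` characters long."""
    schema_map = schema_map or {}
    # Resolve each analytic field to its source column (default: same name)
    cols = {field: schema_map.get(field, field) for field in REQUIRED_FIELDS}
    mask = data[cols["command_line"]].str.len() >= min_length
    return data.loc[mask, [cols["process_name"], cols["command_line"]]]


source = pd.DataFrame(
    {
        "NewProcessName": ["cmd.exe", "powershell.exe"],
        "CommandLine": ["cmd /c dir", "powershell -enc SQBFAFgAIAAoAE4AZQB3AC0A"],
    }
)
result = long_command_lines(
    source,
    schema_map={"process_name": "NewProcessName", "command_line": "CommandLine"},
)
print(result)
```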

Maybe we could adopt a format similar to notebooklets - a class that has a required run method
It would be nice if we could avoid having to instantiate classes prior to use though - maybe we have an analytics class that imports all of the analysis classes, instantiates them and exposes the run function renamed to reflect the analytic name (name could be a mandatory attribute of the class).
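
One way the "class with a required run method, exposed under the analytic name" idea might look. The Analytic base class, the two example analytics and the Analytics container are all hypothetical, and plain lists of dicts stand in for DataFrames to keep the sketch dependency-free.

```python
from typing import Any, Dict, List, Type


class Analytic:
    """Hypothetical base class: subclasses set `name` and implement `run`."""

    name: str = ""

    def run(self, data: Any, **kwargs) -> Any:
        raise NotImplementedError


class CountRows(Analytic):
    name = "count_rows"

    def run(self, data: List[Dict[str, Any]], **kwargs) -> int:
        return len(data)


class DistinctValues(Analytic):
    name = "distinct_values"

    def run(self, data: List[Dict[str, Any]], column: str = "", **kwargs) -> List[Any]:
        return sorted({row[column] for row in data})


class Analytics:
    """Instantiate each analytic class and expose its `run` under the analytic name."""

    def __init__(self, analytic_classes: List[Type[Analytic]]):
        for cls in analytic_classes:
            if not cls.name:
                raise ValueError(f"{cls.__name__} must define a `name` attribute")
            # The bound `run` method becomes an attribute named after the analytic
            setattr(self, cls.name, cls().run)


analytics = Analytics([CountRows, DistinctValues])
rows = [{"proc": "cmd.exe"}, {"proc": "calc.exe"}, {"proc": "cmd.exe"}]
print(analytics.count_rows(rows))
print(analytics.distinct_values(rows, column="proc"))
```

The user never instantiates an analytic class directly; the container does it once and the mandatory name attribute decides how each run method is exposed.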

@FlorianBracq has a PoC for an implementation

Refs: https://msticnb.readthedocs.io/en/latest/creatingnotebooklets.html

0 replies

FlorianBracq · 2023-01-28T21:15:39Z

FlorianBracq
Jan 28, 2023
Maintainer

First version of the PoC: https://github.com/FlorianBracq/msticpy/tree/process_analysis

Sample usage:

import pandas as pd
from msticpy.analysis.analysers.process import assess_processes

dataframe = pd.DataFrame(
    [
        {
            "process_name": "mimikatz.exe",
            "process_command_line": "--version",
        },
        {
            "process_name": "random.exe",
            "process_command_line": "misc::printnightmare --quiet",
        },
        {
            "process_name": "hh.exe",
            "process_command_line": "--normal_option net user --another_normal_option",
        },
        {
            "process_name": "mimikatz.exe",
            "process_command_line": "--normal_option lsadump:: --another_normal_option",
        },
        {
            "process_name": "calc.exe",
            "process_command_line": "--help",
        },
    ]
)

assess_processes(dataframe)

Example with custom mapping:

dataframe = pd.DataFrame(
    [
        {
            "custom_column1": "mimikatz.exe",
            "custom_column2": "--version",
        },
    ]
)
CUSTOM_MAPPING = {
    "process_name_column": "custom_column1",
    "process_command_line_column": "custom_column2",
}
assess_processes(dataframe, **CUSTOM_MAPPING)

Happy to receive feedback. Many of the ideas already shared open up possibilities that I had not thought of, so the end result will surely look nothing like what I'm sharing here!

Cheers,

3 replies

ianhelle Jan 31, 2023
Maintainer Author

That looks cool - it's very much along the lines that I was thinking. Will take a look at your code

ianhelle Jan 31, 2023
Maintainer Author

Ok that's much more ginormous than I thought it was going to be! But in a good way!
And it's mostly pretty orthogonal and complementary to what I've been thinking about - which is more about how to declare and standardize analyzer functions.
I had some ideas about simplifying creating pipelines of pivot functions - would love to combine that with your analyze function. Would be really nice to have a way of persisting pipelines and/or sequential functions as yaml or something and just invoking those by name.

ianhelle Feb 6, 2023
Maintainer Author

A few more thoughts:

Would be nice to have the analytic functions a bit more decoupled from the execution framework.
Perhaps have these loaded dynamically from user-specified folders (and/or following some kind of tagging - "windows", "linux", "cloud", "process", etc.) - so that we don't have to import everything at startup.
We should probably plan to have the analytics in a separate repo (maybe a sub-repo of msticpy and at least installable alongside msticpy via a pip "extra" flag).
We could also support a mechanism to chain multiple functions in a pipeline (there is an example of running several functions in a single call).
- The pipeline could be based on (or re-use) the MP pivot pipeline - if we persist the pipeline as a yaml file (and make it easy to write a yaml pipeline file from scratch).
- We could expose the pipeline as a new analytic function (or perhaps "analytic_agg_function") so it's easy to invoke.
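
A rough sketch of the pipeline idea in the list above: steps are declared as data and each step names a registered function. This does not use the MP pivot pipeline. JSON text stands in for the yaml file so that the sketch needs only the standard library, rows are plain dicts rather than a DataFrame, and every name here (register, run_pipeline, the step functions and the spec layout) is hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]
ANALYTIC_FUNCS: Dict[str, Callable[..., Rows]] = {}


def register(func: Callable[..., Rows]) -> Callable[..., Rows]:
    """Register a function so that pipeline steps can refer to it by name."""
    ANALYTIC_FUNCS[func.__name__] = func
    return func


@register
def filter_equals(data: Rows, column: str, value: Any) -> Rows:
    return [row for row in data if row.get(column) == value]


@register
def select_columns(data: Rows, columns: List[str]) -> Rows:
    return [{col: row.get(col) for col in columns} for row in data]


PIPELINE_TEXT = """
{
  "name": "rundll32_only",
  "steps": [
    {"function": "filter_equals",
     "params": {"column": "process", "value": "rundll32.exe"}},
    {"function": "select_columns",
     "params": {"columns": ["process", "parent"]}}
  ]
}
"""


def run_pipeline(spec: Dict[str, Any], data: Rows) -> Rows:
    """Run each step in order, feeding the output of one step into the next."""
    for step in spec["steps"]:
        func = ANALYTIC_FUNCS[step["function"]]
        data = func(data, **step.get("params", {}))
    return data


spec = json.loads(PIPELINE_TEXT)
events = [
    {"process": "rundll32.exe", "parent": "winword.exe", "pid": 10},
    {"process": "calc.exe", "parent": "explorer.exe", "pid": 11},
]
output = run_pipeline(spec, events)
print(output)
```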

ianhelle · 2023-01-31T21:09:28Z

ianhelle
Jan 31, 2023
Maintainer Author

Thoughts about implementing analyzer functions

(thoughts as code)

from functools import wraps
from typing import List, Optional, Union
import pandas as pd


class MPAnalytic:
    """Decorator for MSTICPy Analytic function."""

    _RTN_HEADER = "\n    Returns"
    _SCH_DOC_STRING = (
        "    schema_map : Optional[Dict[str, str]]\n"
        "        Mapping of input schema to analytic required fields.\n\n"
        "    Returns"
    )

    def __init__(
            self,
            name: Optional[str] = None,
            description: Optional[str] = None,
            entities: Union[str, List[str], None] = None,
            required_fields: Optional[List[str]] = None,
            **kwargs,
        ):
        self.name = name
        self.description = description
        # Normalize entities to a list; no entities gives an empty list
        if entities is None:
            self.entities = []
        else:
            self.entities = entities if isinstance(entities, list) else [entities]
        self.required_fields = required_fields

    def __call__(self, func):
        """Call the decorator to wrap function execution."""
        setattr(func, "mp_analytic", True)
        # Default the analytic name to the function name if none was supplied
        if self.name is None:
            self.name = func.__name__

        setattr(func, "properties", self)
        # Insert the schema_map parameter doc ahead of the Returns section
        if func.__doc__ and self._RTN_HEADER in func.__doc__:
            func.__doc__ = func.__doc__.replace(self._RTN_HEADER, self._SCH_DOC_STRING)

        @wraps(func)
        def run_analytic(*args, **kwargs):
            """Extract data and schema_map args and run function."""

            # Get `data` param as args[0] or `data` kwarg
            if args and isinstance(args[0], pd.DataFrame):
                input_df = args[0]
                args = args[1:]
            elif "data" in kwargs:
                input_df = kwargs["data"]
            else:
                raise ValueError(
                    "No input DataFrame supplied as first argument or `data` keyword."
                )
            # Rename source columns to the field names that the analytic expects
            if "schema_map" in kwargs:
                input_df = input_df.rename(columns=kwargs.pop("schema_map"))
            kwargs["data"] = input_df
            return func(*args, **kwargs)

        return run_analytic

    def __repr__(self):
        fields = "\n    ".join(f"{name}={value}" for name, value in self.__dict__.items())
        return "Analytic(\n    " + fields + "\n)"


@MPAnalytic(
    name="test analytic",
    description="Testing wrapper",
    entities="Host",
    required_fields=["ProcessName", "ParentProcessName"]
)
def test_analytic(data: pd.DataFrame) -> pd.DataFrame:
    """
    Test analytic

    Parameters
    ----------
    data : pd.DataFrame
        Input data for analytic

    Returns
    -------
    pd.DataFrame
        Analytic results

    """
    column = data.columns[1]
    # AnalyticResult is not defined in this post; it is assumed to behave like
    # a DataFrame that also carries a `result_properties` dict
    results = AnalyticResult(data.groupby(column).count().reset_index())
    results.result_properties["severity"] = "high"
    results.result_properties["description"] = "Some badness was found"
    return results


print(test_analytic.properties, "\n\n")
help(test_analytic)

Output from the last two statements. test_analytic.properties holds the instance
of the decorator class created when the analysis function was decorated.

Analytic(
    name=test analytic
    description=Testing wrapper
    entities=['Host']
    required_fields=['ProcessName', 'ParentProcessName']
)


Help on function test_analytic in module __main__:

test_analytic(data: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
    Test analytic
    
    Parameters
    ----------
    data : pd.DataFrame
        Input data for analytic
    schema_map : Optional[Dict[str, str]]
        Mapping of input schema to analytic required fields.
...

Sample - more realistic analytic

@MPAnalytic(
    name="Suspicious rundll32 parent",
    entities="Process",
    description="Detects suspicious rundll32.exe parent processes",
    required_fields=["ProcessName", "NewProcessId", "ProcessId", "ParentProcessName", "TimeGenerated"],
)
def suspicious_rundll32_parent(data: pd.DataFrame) -> pd.DataFrame:
    """
    Detects suspicious rundll32.exe parent processes

    Parameters
    ----------
    data : pd.DataFrame
        Input data for analytic

    Returns
    -------
    pd.DataFrame
        Analytic results

    """
    susp_rundll_parent = data[
        ((data["ProcessName"].str.lower() == "rundll32.exe"))
        & (
            data["ParentProcessName"].str.lower().isin(
                [
                    "winword.exe",
                    "excel.exe",
                    "msaccess.exe",
                    "lsass.exe",
                    "taskeng.exe",
                    "winlogon.exe",
                    "schtasks.exe",
                    "regsvr32.exe",
                    "wmiprvse.exe",
                    "wsmprovhost.exe",
                ]
            )
        )
    ][["ProcessName", "NewProcessId", "ProcessId", "ParentProcessName", "TimeGenerated"]]

    results = AnalyticResult(susp_rundll_parent)
    if susp_rundll_parent.empty:
        results.result_properties["severity"] = "information"
        results.result_properties["description"] = "No suspicious processes found."
    else:
        results.result_properties["severity"] = "high"
        results.result_properties["description"] = (
            f"{len(susp_rundll_parent)} rundll processes found with suspect parents."
        )
    return results
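
The samples above use AnalyticResult without defining it. One possible shape, purely as an assumption, is a pandas DataFrame subclass that carries a result_properties dict. It follows the subclassing pattern from the pandas documentation (_metadata plus _constructor); the real class in any implementation may look quite different.

```python
import pandas as pd


class AnalyticResult(pd.DataFrame):
    """Hypothetical DataFrame subclass carrying analytic result properties."""

    # Attributes listed in _metadata are handled as normal instance attributes
    _metadata = ["result_properties"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.result_properties = {}

    @property
    def _constructor(self):
        # Keep the subclass type when pandas builds new frames from this one
        return AnalyticResult


findings = AnalyticResult(
    pd.DataFrame({"ProcessName": ["rundll32.exe"], "ParentProcessName": ["winword.exe"]})
)
findings.result_properties["severity"] = "high"
findings.result_properties["description"] = "1 suspicious process found."

print(isinstance(findings, pd.DataFrame))
print(findings.result_properties)
```

Because the result is still a DataFrame, callers that only expect tabular output keep working, while a framework could read the severity and description from result_properties.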

2 replies

ianhelle Jan 31, 2023
Maintainer Author

PoC of how we'd read in functions from modules/paths

import importlib.util
import inspect
from pathlib import Path
from typing import Callable

from msticpy.common.data_types import ObjectContainer


class Analytics:
    def __init__(self):
        self._analytic_funcs = {}
        self.entity_analytics = {}

    def load_analytic(self, module_path: Path):
        """Load the analytic functions found in a module file."""
        # Load by file path so that absolute folders outside sys.path work
        # (importlib.import_module needs a dotted name that is importable)
        spec = importlib.util.spec_from_file_location(module_path.stem, module_path)
        if spec is None or spec.loader is None:
            raise ValueError(f"Unable to import module {module_path}")
        try:
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
        except ImportError as imp_error:
            raise ValueError(f"Unable to import module {module_path}") from imp_error
        for func_name, func in inspect.getmembers(module, predicate=inspect.isfunction):
            # Only functions decorated with MPAnalytic carry this marker
            if not hasattr(func, "mp_analytic"):
                continue
            self._analytic_funcs[func_name] = func
            for entity in func.properties.entities or []:
                if not entity:
                    continue
                if entity not in self.entity_analytics:
                    self.entity_analytics[entity] = []
                self.entity_analytics[entity].append(func_name)
            self._add_entity_functions(func)

    def list_analytics(self):
        """List available analytics."""
        return list(self._analytic_funcs.keys())

    def run_analytic(self, func_name: str, **kwargs):
        """Run an analytic."""
        if func_name not in self._analytic_funcs:
            raise ValueError(f"Analytic {func_name} not found")
        return self._analytic_funcs[func_name](**kwargs)

    def _add_entity_functions(self, func: Callable):
        """Add the function to a container attribute for each of its entities."""
        for entity in func.properties.entities or []:
            if not entity:
                continue
            entity_attr = getattr(self, entity, None)
            if entity_attr is None:
                # Create the container on first use, then attach the function
                entity_attr = ObjectContainer()
                setattr(self, entity, entity_attr)
            setattr(entity_attr, func.__name__, func)


analytics = Analytics()

path = "e:\\src\\msticpy\\msticpy\\analytics"
user_paths = ["e:\\src\\notebooks\\experimental"]

all_paths = [path] + user_paths

for search_path in all_paths:
    for module_path in Path(search_path).rglob("*.py"):
        analytics.load_analytic(module_path)

ianhelle Feb 6, 2023
Maintainer Author

One easy improvement to the function wrapper is for it to check the incoming dataframe and throw an error (with some friendly help) if one or more required columns are missing. We could have the author name and describe the columns in the decorator (e.g. "cmdline": "Process command line") to make clear to the user which columns are required and what a mapping would look like.
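
A minimal sketch of that check, written as a standalone decorator rather than as a change to MPAnalytic. The require_columns helper, the example analytic and the wording of the error are all hypothetical.

```python
from functools import wraps
from typing import Callable, Dict

import pandas as pd


def require_columns(required: Dict[str, str]) -> Callable:
    """Check that the required columns exist before running the analytic."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(data: pd.DataFrame, *args, **kwargs):
            missing = [col for col in required if col not in data.columns]
            if missing:
                # Describe each missing column and show what a mapping looks like
                details = "\n".join(f"  {col}: {required[col]}" for col in missing)
                example = {col: "<your column name>" for col in missing}
                raise KeyError(
                    f"{func.__name__} is missing required columns:\n{details}\n"
                    f"Rename them in your data or supply a mapping such as {example}"
                )
            return func(data, *args, **kwargs)

        return wrapper

    return decorator


@require_columns({"cmdline": "Process command line", "process": "Process name"})
def encoded_commands(data: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose command line contains an encoded-command switch."""
    return data[data["cmdline"].str.contains("-enc", case=False, regex=False)]


try:
    encoded_commands(pd.DataFrame({"process": ["powershell.exe"]}))
except KeyError as err:
    error_text = err.args[0]
    print(error_text)
```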
