18 changes: 9 additions & 9 deletions USING_NEO4J.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# Experimenting with the Neo4j graph database Python STIX DataStore

The Neo4j graph database Python STIX DataStore is a proof-of-concept implementation to show how to store STIX content in a graph database.
The Neo4j graph database Python STIX DataStore is a proof-of-concept implementation to show how to store STIX content in a graph database.

## Limitations:

As a proof-of-concept it has minimal functionality.
As a proof-of-concept it has minimal functionality.

## Installing Neo4j

See https://neo4j.com/docs/desktop-manual/current/installation
@@ -18,18 +18,18 @@ The python neo4j library used is py2neo, available in pypi at https://pypi.org/p

## Implementation Details

We would like to thank the folks at JHU/APL for their implementation of [STIX2NEO4J.py](https://github.com/opencybersecurityalliance/oca-iob/tree/main/STIX2NEO4J%20Converter), which this code is based on.
We would like to thank the folks at JHU/APL for their implementation of [STIX2NEO4J.py](https://github.com/opencybersecurityalliance/oca-iob/tree/main/STIX2NEO4J%20Converter), which this code is based on.

Only the DataSink (for storing STIX data) part of the DataStore object has been implemented. The DataSource part is implemented as a stub. However, the graph database can be queried using the neo4j Cypher language within
the neo4j browser.

The main concepts behind any graph are nodes and edges. STIX data is similar, as it contains relationship objects (SROs) and node objects (SDOs, SCOs and SMOs). Additional edges are provided by STIX embedded relationships, which are expressed as properties in STIX node objects. This organization of data in STIX is a natural fit for graph models, such as neo4j.
The main concepts behind any graph are nodes and edges. STIX data is similar, as it contains relationship objects (SROs) and node objects (SDOs, SCOs and SMOs). Additional edges are provided by STIX embedded relationships, which are expressed as properties in STIX node objects. This organization of data in STIX is a natural fit for graph models, such as neo4j.
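
As a rough illustration of how embedded relationships become edges, the sketch below scans one STIX object for properties ending in `_ref` or `_refs` and turns each referenced identifier into a (source, relationship type, target) triple. The helper name `embedded_edges` and the triple layout are hypothetical and are not taken from the DataStore code; the sample object reuses identifiers from the email-message test data in this change. Nested sub-objects are deliberately not handled here.

```python
def embedded_edges(obj):
    """Yield (source_id, relationship_type, target_id) for each embedded relationship.

    Hypothetical helper: a property ending in `_ref` is assumed to hold one
    identifier, and a property ending in `_refs` a list of identifiers.
    """
    for key, value in obj.items():
        if key.endswith("_ref"):
            targets = [value]
        elif key.endswith("_refs"):
            targets = list(value)
        else:
            continue  # ordinary property: it stays on the node itself
        rel_type = key.rsplit("_", 1)[0]  # e.g. "to_refs" becomes "to"
        for target in targets:
            yield (obj["id"], rel_type, target)


email = {
    "type": "email-message",
    "id": "email-message--ef9b4b7f-14c8-5955-8065-020e0316b559",
    "subject": "Check out this picture of a cat!",
    "from_ref": "email-addr--89f52ea8-d6ef-51e9-8fce-6a29236436ed",
    "to_refs": ["email-addr--d1b3bf0c-f02a-51a1-8102-11aba7959868"],
}

edges = list(embedded_edges(email))
for source, rel_type, target in edges:
    print(f"({source})-[{rel_type}]->({target})")
```

Properties such as `subject` remain node properties, while `from_ref` and `to_refs` each contribute one edge in this example.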

The order in which STIX objects are added to the graph database is arbitrary. Therefore, when an SRO or embedded relationship is added via the DataStore, the nodes that it connects may not be present in the database, so the relationship is not added to the database, but remembered by the DataStore code as an unconnected relationship. Whenever a new node is
added to the database, the unconnected relationships must be reviewed to determine if both nodes of a relationship can now be represented using an edge in the graph database.
The order in which STIX objects are added to the graph database is arbitrary. Therefore, when an SRO or embedded relationship is added via the DataStore, the nodes that it connects may not be present in the database, so the relationship is not added to the database, but remembered by the DataStore code as an unconnected relationship. Whenever a new node is
added to the database, the unconnected relationships must be reviewed to determine if both nodes of a relationship can now be represented using an edge in the graph database.

Note that unless both the source and target nodes are eventually added,
the relationship will not be added either.
Note that unless both the source and target nodes are eventually added,
the relationship will not be added either.
How to address this issue in the implementation has not been determined.
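
The bookkeeping described above can be sketched with a small in-memory stand-in for the graph database. This is only an illustration of the idea under stated assumptions: the class `DeferredEdgeGraph`, its method names, and the shortened placeholder identifiers are hypothetical and do not mirror the actual Neo4jSink implementation.

```python
class DeferredEdgeGraph:
    """In-memory stand-in for the graph database (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = set()
        self.edges = []
        self.unconnected = []  # relationships waiting for one or both endpoints

    def add_relationship(self, source_ref, rel_type, target_ref):
        rel = (source_ref, rel_type, target_ref)
        if source_ref in self.nodes and target_ref in self.nodes:
            self.edges.append(rel)
        else:
            # At least one endpoint is missing: remember the relationship for later.
            self.unconnected.append(rel)

    def add_node(self, node_id):
        self.nodes.add(node_id)
        # Review every remembered relationship now that a new node exists.
        still_waiting = []
        for rel in self.unconnected:
            source_ref, _, target_ref = rel
            if source_ref in self.nodes and target_ref in self.nodes:
                self.edges.append(rel)
            else:
                still_waiting.append(rel)
        self.unconnected = still_waiting


graph = DeferredEdgeGraph()
# Placeholder identifiers, not valid STIX ids.
graph.add_relationship("indicator--a", "indicates", "malware--b")
graph.add_node("indicator--a")  # target still missing, the edge stays deferred
graph.add_node("malware--b")    # both endpoints present, the edge is created
graph.add_relationship("malware--b", "targets", "identity--c")  # target never added
print(graph.edges)
print(graph.unconnected)
```

In this sketch the second relationship stays in `unconnected` indefinitely because `identity--c` is never added, which is the unresolved case noted above.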

## Demonstrating a neo4j database for STIX
@@ -3,11 +3,11 @@
# Code developed by JHU/APL - First Draft December 2021

# DISCLAIMER
# The script developed by JHU/APL for the demonstration are not “turn key” and are
# The script developed by JHU/APL for the demonstration are not “turn key” and are
# not safe for deployment without being tailored to production infrastructure. These
# files are not being delivered as software and are not appropriate for direct use on any
# production networks. JHU/APL assumes no liability for the direct use of these files and
# they are provided strictly as a reference implementation.
# they are provided strictly as a reference implementation.
#
# NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED “AS IS.” JHU/APL MAKES NO
# REPRESENTATION OR WARRANTY WITH RESPECT TO THE PERFORMANCE OF THE MATERIALS, INCLUDING
@@ -20,11 +20,12 @@
# CONSEQUENTIAL, SPECIAL OR OTHER DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE,
# THE MATERIAL, INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.

from getpass import getpass
## Import python modules for this script
import json
from typing import List

from py2neo import Graph, Node
from getpass import getpass
from tqdm import tqdm

#Import variables
@@ -44,10 +45,12 @@ def __init__(self):
self.nodes_with_object_ref = list()
self.nodes = list()
self.bundlename = BundleName
self.infer_relation = {"parent_ref": "parent_of",
self.infer_relation = {
"parent_ref": "parent_of",
"created_by_ref": "created_by",
"src_ref": "source_of",
"dst_ref": "destination_of"}
"dst_ref": "destination_of",
}
self.__load_json(JSONFILE)

def __load_json(self, fd):
@@ -85,16 +88,18 @@ def make_nodes(self):
node_contents[key] = apobj[key]
# Make the Bundle ID a property
# use dictionary expansion as keyword for optional node properties
node = Node(apobj["type"],
name=node_name,
bundlesource=self.bundlename,
**node_contents)
node = Node(
apobj["type"],
name=node_name,
bundlesource=self.bundlename,
**node_contents,
)
# if node needs new created_by relation, create the node and then the relationship
self.sgraph.create(node)
# save off these nodes for additional relationship creating
if 'object_refs' in keys:
self.nodes_with_object_ref.append(apobj)

# create relationships that exist outside of relationship objects
# such as Created_by and Parent_Of
def __make_inferred_relations(self):
@@ -112,7 +117,7 @@ def __make_inferred_relations(self):
else:
ref_list = apobj[k]
for ref in ref_list:
# The "b to a" relationship is reversed in this cypher query to ensure the correct relationship direction in the graph
# The "b to a" relationship is reversed in this cypher query to ensure the correct relationship direction in the graph
cypher_string = f'MATCH (a),(b) WHERE a.bundlesource="{self.bundlename}" AND b.bundlesource="{self.bundlename}" AND a.ap_id="{str(ref)}" AND b.ap_id="{str(apobj["id"])}" CREATE (b)-[r:{rel_type}]->(a) RETURN a,b'
try:
self.sgraph.run(cypher_string)
13 changes: 7 additions & 6 deletions stix2/datastore/neo4j/demo.py
@@ -1,16 +1,17 @@

import sys
import json
import sys

from identity_contact_information import \
identity_contact_information # noqa F401
# needed so the relational db code knows to create tables for this
from incident import event, impact, incident, task # noqa F401
from observed_string import observed_string # noqa F401

import stix2
from stix2.datastore.neo4j.neo4j import Neo4jStore
import stix2.properties

# needed so the relational db code knows to create tables for this
from incident import incident, event, task, impact
from identity_contact_information import identity_contact_information
from observed_string import observed_string


def main():
with open(sys.argv[1], "r") as f:
86 changes: 49 additions & 37 deletions stix2/datastore/neo4j/neo4j.py
@@ -1,23 +1,24 @@
import json
import re

from py2neo import Graph, Node, Relationship
import re

import stix2
from stix2.base import _STIXBase
from stix2.datastore import (
DataSink, DataSource, DataStoreMixin,
)
from stix2.datastore import DataSink, DataSource, DataStoreMixin
from stix2.parsing import parse


def convert_camel_case_to_snake_case(name):
return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


def remove_sro_from_list(sro, sro_list):
for rel in sro_list:
if (rel["source_ref"] == sro["source_ref"] and
rel["target_ref"] == sro["target_ref"] and
rel["relationship_type"] == sro["relationship_type"]):
if (
rel["source_ref"] == sro["source_ref"] and
rel["target_ref"] == sro["target_ref"] and
rel["relationship_type"] == sro["relationship_type"]
):
sro_list.remove(rel)
break
return sro_list
@@ -29,6 +30,7 @@ def hash_dict_as_string(hash_dict):
hashes.append(f'{hash_type}:{hash}')
return ",".join(hashes)


def _add(sink, stix_data, allow_custom=True, version="2.1"):
"""Add STIX objects to MemoryStore/Sink.

@@ -73,23 +75,25 @@ class Neo4jStore(DataStoreMixin):

default_neo4j_connection = "bolt://neo4j:password@localhost:7687"

def __init__(self, host=default_host, username=default_username, password=default_password, allow_custom=True, version=None,
clear_database=True):
def __init__(
self, host=default_host, username=default_username, password=default_password, allow_custom=True, version=None,
clear_database=True,
):
self.sgraph = Graph(host=host, auth=(username, password))
super().__init__(
source = Neo4jSource(
source=Neo4jSource(
sgraph=self.sgraph,
allow_custom=allow_custom,

),
sink = Neo4jSink(
sink=Neo4jSink(
sgraph=self.sgraph,
allow_custom=allow_custom,
version=version,
clear_database=clear_database,


)
),
)


@@ -119,7 +123,7 @@ def __init__(self, sgraph, allow_custom=True, version=None, clear_database=False
self.relationships_to_recheck = list()
self.sub_object_relationships = list()
self.counter = 1
self.allow_custom=allow_custom
self.allow_custom = allow_custom
if clear_database:
self.sgraph.delete_all()

@@ -175,10 +179,12 @@ def _insert_sdo_sco_smo(self, obj, type_name):
self.sub_object_relationships.append((key, obj[key]))
# Make the Bundle ID a property
# use dictionary expansion as keyword for optional node properties
node = Node(type_name,
name=node_name,
# bundlesource=self.bundlename,
**node_contents)
node = Node(
type_name,
name=node_name,
# bundlesource=self.bundlename,
**node_contents,
)
# if node needs new created_by relation, create the node and then the relationship
self.sgraph.create(node)
# check to see if the addition of this node makes it possible to create a relationship
@@ -206,10 +212,12 @@ def _insert_sub_object(self, sub_prop, sub_obj, parent_node):
node_contents[key] = value
else:
self.sub_object_relationships.append((key, value))
node = Node(sub_prop,
name=sub_prop + "_" + self.next_id(),
# bundlesource=self.bundlename,
**node_contents)
node = Node(
sub_prop,
name=sub_prop + "_" + self.next_id(),
# bundlesource=self.bundlename,
**node_contents,
)
self.sgraph.create(node)
relationship = Relationship(parent_node, sub_prop, node)
self.sgraph.create(relationship)
@@ -230,10 +238,12 @@ def _insert_external_references(self, refs, parent_node):
node_contents[key] = value
else:
self.sub_object_relationships.append((key, value))
node = Node("external_reference",
name="external_reference" + "_" + self.next_id(),
# bundlesource=self.bundlename,
**node_contents)
node = Node(
"external_reference",
name="external_reference" + "_" + self.next_id(),
# bundlesource=self.bundlename,
**node_contents,
)
relationship = Relationship(parent_node, "external_reference", node)
self.sgraph.create(relationship)

@@ -254,15 +264,17 @@ def _insert_extensions(self, extensions, parent_node):
node_contents[key] = hash_dict_as_string(value)
else:
node_contents[key] = value
node = Node(type_name,
name=type_name + "_" + self.next_id(),
# bundlesource=self.bundlename,
**node_contents)
node = Node(
type_name,
name=type_name + "_" + self.next_id(),
# bundlesource=self.bundlename,
**node_contents,
)
relationship = Relationship(parent_node, type_name, node)
self.sgraph.create(relationship)
self._insert_embedded_relationships(ext, parent_node["id"])

def _is_node_available(self, id,):
def _is_node_available(self, id):
cypher_string = f'OPTIONAL MATCH (a) WHERE a.id="{str(id)}" UNWIND [a] AS list_rows RETURN list_rows'
cursor = self.sgraph.run(cypher_string).data()
return cursor[0]["list_rows"]
@@ -290,7 +302,7 @@ def _insert_embedded_relationships(self, obj, id, recheck=False):
k_tokens = k.split("_")
# find refs, but ignore external_references since they aren't objects
if "ref" in k_tokens[len(k_tokens) - 1] and k_tokens[len(k_tokens) - 1] != "references":
rel_type = "_".join(k_tokens[: -1])
rel_type = "_".join(k_tokens[: -1]) # noqa F841
ref_list = []
# refs are lists, push singular ref into list to make it iterable for loop
if not type(obj[k]).__name__ == "list":
@@ -307,9 +319,9 @@ def _insert_embedded_relationships(self, obj, id, recheck=False):
remove_sro_from_list(obj, self.relationships_to_recheck)
else:
if not recheck:
embedded_relationship = {"source_ref": id,
"target_ref": ref,
"relationship_type": k}
embedded_relationship = {
"source_ref": id,
"target_ref": ref,
"relationship_type": k,
}
self.relationships_to_recheck.append(embedded_relationship)


1 change: 0 additions & 1 deletion stix2/datastore/neo4j/neo4j_testing.py
@@ -1,7 +1,6 @@
import datetime as dt
import os # noqa: F401


import pytz

import stix2
2 changes: 1 addition & 1 deletion stix2/datastore/relational_db/input_creation.py
@@ -102,7 +102,7 @@ def generate_insert_information(self, dictionary_name, stix_object, **kwargs):
table_child = data_sink.tables_dictionary[
canonicalize_table_name(table_name + "_" + dictionary_name + "_" + "values", schema_name)
]
child_table_inserts = generate_insert_for_dictionary_list(table_child, next_id, value, data_sink, contained_type)
child_table_inserts.extend(generate_insert_for_dictionary_list(table_child, next_id, value, data_sink, contained_type))
value = next_id
stix_type = IntegerProperty()
else:
47 changes: 45 additions & 2 deletions stix2/datastore/relational_db/relational_db_testing.py
@@ -288,11 +288,52 @@ def test_dictionary():
)


multipart_email_msg_dict = {
"type": "email-message",
"spec_version": "2.1",
"id": "email-message--ef9b4b7f-14c8-5955-8065-020e0316b559",
"is_multipart": True,
"received_lines": [
"from mail.example.com ([198.51.100.3]) by smtp.gmail.com with ESMTPSA id \
q23sm23309939wme.17.2016.07.19.07.20.32 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 \
bits=128/128); Tue, 19 Jul 2016 07:20:40 -0700 (PDT)",
],
"content_type": "multipart/mixed",
"date": "2016-06-19T14:20:40.000Z",
"from_ref": "email-addr--89f52ea8-d6ef-51e9-8fce-6a29236436ed",
"to_refs": ["email-addr--d1b3bf0c-f02a-51a1-8102-11aba7959868"],
"cc_refs": ["email-addr--e4ee5301-b52d-59cd-a8fa-8036738c7194"],
"subject": "Check out this picture of a cat!",
"additional_header_fields": {
"Content-Disposition": ["inline"],
"X-Mailer": ["Mutt/1.5.23"],
"X-Originating-IP": ["198.51.100.3"],
},
"body_multipart": [
{
"content_type": "text/plain; charset=utf-8",
"content_disposition": "inline",
"body": "Cats are funny!",
},
{
"content_type": "image/png",
"content_disposition": "attachment; filename=\"tabby.png\"",
"body_raw_ref": "artifact--4cce66f8-6eaa-53cb-85d5-3a85fca3a6c5",
},
{
"content_type": "application/zip",
"content_disposition": "attachment; filename=\"tabby_pics.zip\"",
"body_raw_ref": "file--6ce09d9c-0ad3-5ebf-900c-e3cb288955b5",
},
],
}


def main():
store = RelationalDBStore(
MariaDBBackend("mariadb+pymysql://admin:[email protected]:3306/rdb", force_recreate=True),
# MariaDBBackend("mariadb+pymysql://admin:[email protected]:3306/rdb", force_recreate=True),
# PostgresBackend("postgresql://localhost/stix-data-sink", force_recreate=True),
# SQLiteBackend("sqlite:///stix-data-sink.db", force_recreate=True),
SQLiteBackend("sqlite:///stix-data-sink.db", force_recreate=True),

True,
None,
@@ -340,6 +381,8 @@ def main():
malware = malware_with_all_required_properties()
store.add(malware)

store.add(stix2.parse(multipart_email_msg_dict))

# read_obj = store.get(directory_stix_object.id)
# print(read_obj)
else: