
Plugin Development Guide


For Developers Only

The goal of Scrummage is to be a framework in which users can develop plugins and contribute in a communal fashion. The list of plugins that could be added to such a framework is effectively endless, so Scrummage narrows it down by only picking one plugin of each kind. For example, while there are several exploit databases, Scrummage uses only one (the Vulners Search plugin) to perform its exploit-related OSINT searches.

If users of Scrummage are dissatisfied with the included plugins, they should be able to fairly easily create a plugin for themselves as well as for others in the community. Developers are free to develop plugins in their own forked repository and request to have them merged into the master branch, subject to a revision process, before they are approved and added to the list of plugins. The revision process ensures any newly developed plugin follows SSSC - Security, Simplification, Standardisation, and Centralisation. Most of these are achieved by leveraging the surrounding Scrummage framework within the plugin.

This wiki page documents the available functions and classes, as well as their default parameters, in the General.py and Common.py libraries, along with a breakdown of a standard plugin. We realise not all plugins fit into a standard set of requirements, hence we are expanding capability over time, but only as necessary.

The Libraries

Both General.py and Common.py are collections of classes and functions for broad use. The reason there are two files is that, before Common.py was created, the libraries in the plugins/common directory increasingly needed functions and classes that also lived in General.py. For example, the function used to set the date needs to be accessible to all plugins and libraries, as well as the core Scrummage.py file, and it is not a good idea for two libraries to be co-dependent. Libraries can depend on other libraries, but if the dependency goes both ways there is the potential for infinite import loops. Therefore, Common.py holds the functions that are used by both the General.py and Connectors.py libraries; as General.py has dependencies on both Connectors.py and Common.py, these shared functions are kept separate so both libraries can safely access them. The Connectors.py library only has a dependency on Common.py, meaning there is a finite end to library imports. This wiki page doesn't cover the functions and classes in the Connectors.py file because none of them are called within any plugins directly; outputs are fed into the General.py library, which, in a controlled manner, outputs the data via the Connectors.py library. This is discussed in more detail below.


General.py

[CLASS] Screenshot()
This class is only called by the General.py Output class and the main Scrummage.py file, so it will not be covered in this document as it doesn't impact plugin development.

Get_Limit(kwargs)
This function receives the kwargs argument fed into the plugin, which is mainly used to parse the limit for a created task. It checks whether a limit has been provided and is in the correct format. If either of these conditions is not met, it reverts to the default limit of 10 and returns that value. Otherwise, it returns the limit in the filtered format required by the plugin.

Logging(Directory, Plugin_Name)
Unfortunately, logging has to be done in the plugin file itself, otherwise, the log file would reflect actions in the General.py library. This function receives a given directory, and the plugin's name. It uses this to construct the name of a log file in a location specific to the plugin. This file name is returned and used as the location to log events from the plugin.

[CLASS] Cache(Directory, Plugin_Name)
Cache files are used to mitigate the risk of plugins overwriting output files and attempting to add items to the database that already exist. Similar to the Logging() function described above, this class has an init function that constructs a file in the same directory, based on the required parameters; however, it is a text file for caching, not a log file. After this, the Get_Cache() function can be called to retrieve cached data, and the Write_Cache(Current_Cached_Data, Data_to_Cache) function can be called with the required inputs to update the cached data, or create it if no data currently exists.
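A minimal sketch of how a plugin might use this class as described above, assuming Directory and Plugin_Name are already set; the variable names are illustrative:

Cache_Object = General.Cache(Directory, Plugin_Name)    # Creates/locates the plugin's cache file.
Cached_Data = Cache_Object.Get_Cache()                  # Previously cached items, if any.
Data_to_Cache = []                                       # New result links are appended here during the run.
Cache_Object.Write_Cache(Cached_Data, Data_to_Cache)    # Updates the cache, or creates it if no data exists.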

Convert_to_List(String) This function simply converts a string with Comma-Separated Values (CSV) to a list format. The primary use of this is to split a task's query into a list of queries, if the query has multiple items in it. For example, a query for the Twitter Search plugin containing "JoeBiden, BarackObama".

Main_File_Create(Directory, Plugin_Name, Output, Query, Main_File_Extension) This function is responsible for creating the main file for a plugin; the main file usually represents the first data retrieved from the third-party site that the plugin leverages. For example, in Twitter Search, this file is a JSON file that is returned as a result of searching Twitter for the given query. The main file doesn't always exist in plugins, but it does in most.

Create_Query_Results_Output_File(Directory, Query, Plugin_Name, Output_Data, Query_Result_Name, The_File_Extension) This function is responsible for creating files for each result. For example, if we follow the Twitter example, let's say we search for "JoeBiden" with a provided limit of 5. The main file will be the returned JSON data with the last 5 tweets from the account @JoeBiden. The plugin then iterates through the results and makes an HTTP request (using the Request_Handler() function from the Common.py library) for each tweet's link. The returned HTML data is then stored in a query file. As part of this process, HTML filtering is leveraged for the best results, which is explained in more depth on the wiki page here.

Data_Type_Discovery(Data_to_Search) This function is quite niche and is currently used by only one plugin, but essentially it is for any plugin that works by scraping data; that is, obtaining data and iterating through it to understand what is there. The Data_Type_Discovery() function returns a list of discovered content (see the sketch after this list), which can include:

  • Hashes (MD5, SHA1, and SHA256)
  • Credentials
  • Email Addresses
  • URLs

This function ultimately helps you better understand the data.
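A minimal sketch, assuming Scraped_Data holds the raw text the plugin has scraped; the variable names are illustrative:

Dump_Types = General.Data_Type_Discovery(Scraped_Data)
# Dump_Types can later be passed to the Connections class's Output() function via its Dump_Types keyword argument.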

Make_Directory(Plugin_Name) This function is imperative to all plugins, as it creates the directory all plugin-specific data is stored in. For any new plugin it will create the following directory structure in the <SCRUMMAGE_DIR>/lib/static/protected/output directory:

  • {Plugin_Name}/{Year}/{Month}/{Day}

For example, running Twitter Search on 01/01/2021 will first create the directory "twitter/2021/01/01" (if it doesn't already exist) and return it.

Get_Title(URL, Requests=False) This function is helpful when you have a link representing each result returned in a plugin. Let's say you have the 5 latest tweets from the Twitter account @JoeBiden, and when creating each result, we want the title from each link. While some APIs will return this in the original data, most won't, so that's where this function comes into play. It sends an HTTP request to the desired link and returns its title using the BeautifulSoup web scraping library. The Requests option, when set to True, will leverage the Request_Handler() function from the Common.py library; however, sometimes it is preferable to use the urllib library rather than the requests library leveraged by Request_Handler(). There is no correct answer, as results vary on a case-by-case basis.
Note: If you have the choice, you should always use the option with the least load; if you are able to get the title via the initial API request, that is preferred over this function.
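A minimal sketch, assuming Link holds the URL of an individual result:

Title = General.Get_Title(Link, Requests=True)  # Uses Common.py's Request_Handler(); omit Requests (or set it to False) to use urllib instead.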

JSONDict_to_HTML(JSON_Data, JSON_Data_Output, Title) Note: JSON_Data is the data used to make the conversion; JSON_Data_Output is the data that is being output to a file, which is placed into a raw data text area in the created HTML file. In rare cases, your plugin will only be able to retrieve JSON data. This might be because you're calling an API that has no website for the same data. This option is provided to convert input JSON data into a more visually pleasing HTML report. For this to work, you need to provide a JSON payload that starts with a list, then a dictionary, followed by attributes, similar to the following:

[
  {
    "key1", "value1",
    "key2", "value2"
  }
]

This still doesn't really answer the question of when to use this, so I will refer to current examples. When not to: plugins like Twitter Search first create a JSON file as the main file and an HTML file for each result (query file). As there are already HTML files being produced for the results, there isn't much need for this; while it wouldn't be a problem to use it, it would just be unnecessary. When to: plugins like IPStack Search query data for an IP address and receive JSON data, but this JSON data is the full result for the task and no further action is required. There is also no simple way to query the web for this data in an HTML format, so we are stuck with just the JSON data. We would then use this function to create an HTML version of the data for improved reporting.
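A minimal sketch of the "when to" case, assuming JSON_Response is the parsed list-of-dictionaries result, JSON_Output_Response is its string form for the raw data text area, and that the function returns the generated HTML as a string; the title is a placeholder:

HTML_Output = General.JSONDict_to_HTML(JSON_Response, JSON_Output_Response, f"IP Address Information for {Query}")
Output_file = General.Create_Query_Results_Output_File(Directory, Query, Plugin_Name, HTML_Output, Query, The_File_Extension)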

CSV_to_HTML(CSV_Data, Title) Same concept as the above function but for CSV data. The raw data is not included in the created HTML report, so it does not need to be provided. The only plugin that currently uses this is Domain Fuzzer.

CSV_to_JSON(Query, CSV_Data) Again, currently only used by the Domain Fuzzer, but this should be used when your only true data is in a CSV format, as JSON is more versatile.
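A minimal sketch covering both functions, assuming CSV_Data holds the plugin's results as a comma-separated string and that each function returns the converted data; the title is a placeholder:

HTML_Output = General.CSV_to_HTML(CSV_Data, f"Domain Spoof Results for {Query}")
JSON_Output = General.CSV_to_JSON(Query, CSV_Data)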

[CLASS] Connections(Input, Plugin_Name, Domain, Result_Type, Task_ID, Concat_Plugin_Name) This class is responsible for outputting the final data to the configured formats, such as the main DB, CSV and DOCX reports, and other configured systems like Elasticsearch, JIRA, RTIR, etc. The initialisation of this class creates a set of variables that represent the data as it is output. This includes the Input (or Query) provided by the task, the plugin name, the domain of the third-party site, the type of result, the task ID (provided by the task), and the concatenated plugin name (Twitter would just be "twitter", but something like NZ Business Search would have a secondary, concatenated plugin name of "nzbusiness"). The type of result has to fit into a pre-defined list, which can be found towards the top of the main Scrummage.py file. They are listed below for convenience:

Finding_Types = ["Darkweb Link", "Company Details", "Blockchain - Address", "Blockchain - Transaction",
                         "BSB Details", "Certificate", "Search Result", "Credentials", "Domain Information",
                         "Social Media - Media", "Social Media - Page", "Social Media - Person", "Social Media - Group",
                         "Social Media - Place", "Application", "Account", "Account Source", "Publication", "Phishing",
                         "Forum", "News Report", "Torrent", "Vehicle Details", "Domain Spoof", "Exploit",
                         "Economic Details", "Virus", "Virus Report", "Web Application Architecture", "IP Address Information"]

If you require this list to be extended, a separate request would need to be made to the Scrummage team; altering this yourself can cause issues for the Scrummage Dashboard.
Once initialised, the Output(self, Complete_File_List, Link, DB_Title, Directory_Plugin_Name, **kwargs) function can be called.

  • Complete_File_List: A list of the location of all output files. So the value will mostly look like [Main_File, Output_File], with as many output file names as you like. (The actual file data is not stored in the database).
  • Link: The link for the individual result
  • DB_Title: Don't be thrown off by the DB part of the name, this is just the Title of your result.
  • Directory_Plugin_Name: Just the plugin name, or the Concat_Plugin_Name if there are both.
  • **kwargs: This option allows for any other arguments to be passed. The only one that is accepted is Dump_Types, used if your plugin leverages the Data_Type_Discovery() function listed above (see the sketch below).
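A minimal sketch of the Output() call for a plugin that also uses Data_Type_Discovery(); the domain and result type are placeholders, and Dump_Types is assumed to hold the list returned by that function:

Output_Connections = General.Connections(Query, Plugin_Name, "example.com", "Search Result", Task_ID, Plugin_Name.lower())
Output_Connections.Output([Main_File, Output_file], Link, Title, Plugin_Name.lower(), Dump_Types=Dump_Types)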

Common.py

Set_Configuration_File() This function returns the absolute path of the config.json file, which is used to access API secrets as well as other configuration information.

Date(Additional_Last_Days=0, Date_Only=False, Elastic=False, Full_Timestamp=False) By default this function returns the current date and time in the format (YYYY-MM-DD H:M:S), which is used mostly for logging.

  • Additional_Last_Days: Used to return a list of dates, starting from the current date and working back the number of days specified in this parameter. For example, if it is set to 5, the function returns the dates of the last 5 days. This is mainly used by the Dashboard to get records from the last 5 days to show successful and unsuccessful logins, so it is not very relevant to plugin development.
  • Date_Only: As the name suggests only returns the date and not the time.
  • Elastic: Returns the timestamp in the format for the Elasticsearch output option.
  • Full_Timestamp: Returns the raw, unformatted, current timestamp.
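A minimal sketch, assuming the library is imported as Common; the values in the comments are illustrative:

Timestamp = Common.Date()                        # e.g. "2021-04-02 10:30:00" (YYYY-MM-DD H:M:S).
Date_Today = Common.Date(Date_Only=True)         # Date without the time component.
Elastic_Timestamp = Common.Date(Elastic=True)    # Timestamp formatted for the Elasticsearch output option.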

[CLASS] JSON_Handler() This class removes the need for plugins and libraries to each import and manage the json module. Additionally, this helps with standardisation, as the class has defaults that reflect Scrummage standards.

  • [Inner Function] init(raw_data): The initialisation function sets the input value as the object's core value.
  • [Inner Function] Is_JSON(): Returns true if the core value is valid JSON.
  • [Inner Function] To_JSON_Load(): Loads JSON to a Python Dict using the .load method.
  • [Inner Function] To_JSON_Loads(): Loads JSON to a Python Dict using the .loads method.
  • [Inner Function] Dump_JSON(Indentation=2, Sort=True): Uses the .dumps method for outputting data in a JSON format. By default, it beautifies the JSON with an indentation of two and sorts keys in alphabetical and numerical order. (An Indentation of 0 will result in no indentation at all.)
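A minimal sketch, assuming Raw_Response holds a JSON string returned by an API and the library is imported as Common:

JSON_Object = Common.JSON_Handler(Raw_Response)

if JSON_Object.Is_JSON():
    JSON_Response = JSON_Object.To_JSON_Loads()       # Parse the JSON string into a Python dict.
    JSON_Output_Response = JSON_Object.Dump_JSON()    # Beautified JSON string (indentation of 2, sorted keys).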

Request_Handler(URL, Method="GET", User_Agent=True, Application_JSON_CT=False, Accept_XML=False, Accept_Language_EN_US=False, Filter=False, Risky_Plugin=False, Full_Response=False, Host="", Data={}, **kwargs) This function removes the need for plugins and libraries to each import and manage the requests module. Additionally, this helps with standardisation, as the function has defaults that reflect Scrummage standards.

  • URL: This is a string with the URL to send the request to.
  • Method: Default Method is GET, but also supports POST. (Other methods can be added as required, with verification of the Scrummage team)
  • User_Agent: Default is True, which means Scrummage sets the User-Agent header to the latest Firefox user agent; this helps make the requests appear normal.
  • Application_JSON_CT: When True, sets a Content-Type header with a value of "application/json"
  • Accept_XML: When True, sets an Accept header with a value of "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
  • Accept_Language_EN_US: When True, sets an Accept-Language header with a value of "en-US,en;q=0.5"
  • Filter: When True, and used in conjunction with a valid value provided to the Host parameter, this calls the response filter function mentioned below.
  • Host: Only set this when using the Filter parameter.
  • Risky_Plugin: When True, this indicates that data returned in the response may contain malicious JavaScript code. It should only be used in conjunction with the Filter parameter set to True.
  • Full_Response: When True, returns the full response object; by default this function only returns the response data.
  • Data: Optional field, to provide data to the HTTP request.
  • **kwargs: Only supports two options:
    • One is called Optional_Headers, which allows the user to set custom headers. If the headers conflict with defaults, the custom headers will override the defaults.
    • The other is called Scrape_Regex_URL, which is used to scrape URLs from the response data and return them.
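A minimal sketch, assuming URL, Domain, and Payload are illustrative values defined by the plugin:

Response = Common.Request_Handler(URL)  # Returns the response data by default.
Filtered_Response = Common.Request_Handler(URL, Filter=True, Host=f"https://www.{Domain}", Risky_Plugin=True)  # Filtered response with relative links converted.
API_Response = Common.Request_Handler(URL, Method="POST", Application_JSON_CT=True, Data=Payload, Full_Response=True)  # Full response object for a POST request.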

Response_Filter(Response, Host, Risky_Plugin=False) This function goes through the response data and converts any relative links to absolute links using the Host parameter's value. If Risky_Plugin is set to True, depending on the security settings you have configured in config.json for web scraping (refer to the guide here), this may prevent the function from doing so, in case the data is potentially malicious.

Load_Web_Scrape_Risk_Configuration() This function loads web scraping configuration settings used by the Response_Filter() function above.

Regex_Handler(Query, Type="", Custom_Regex="", Findall=False, Get_URL_Components=False) This function performs regular expressions against a given Query. Type can be used to select a pre-defined regex pattern; otherwise, Custom_Regex can be used to supply your own. Findall, when set to True, returns a list of matches, versus the default search, which finds only the first match. Get_URL_Components can only be used when Type is set to "URL". This breaks any discovered URLs into three components (Prefix, Body, and Extension), which can be used to extract domains from URLs, and much more.
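A minimal sketch; the "Email" type name and the custom pattern are illustrative only, so check the library for the exact pre-defined type names:

Email_Match = Common.Regex_Handler(Query, Type="Email")                     # First match against a pre-defined pattern.
All_URLs = Common.Regex_Handler(Response, Type="URL", Findall=True)         # List of all URL matches.
Custom_Match = Common.Regex_Handler(Response, Custom_Regex=r"ID-\d{6}")     # Supply your own pattern.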


Breakdown of a Standard Plugin

Almost all plugins follow a standard procedure which performs the following:

  1. Set a few global variables including the name of the Plugin and the name of the file extension you want to export the results to. E.g. .html:
Plugin_Name = "Rand_Plugin"
The_File_Extension = ".html"
  2. Implement a limit, which allows users to specify the number of results; this will be continued later on. This function is centralised in the General library. This step can be skipped if and only if you are sure that there will only be one result each time the plugin is run for a given query. If this is the case, you will also need to add the plugin name to the "Plugins_without_Limit" list towards the top of the main.py file. Code example:
Limit = General.Get_Limit(kwargs)
  3. Recursively creates a directory structure to store results. Code example:
Directory = General.Make_Directory(Plugin_Name.lower())
  4. Creates a log file and sets the logging level. Code example:
logger = logging.getLogger()
logger.setLevel(logging.INFO)
Log_File = General.Logging(Directory, Plugin_Name.lower())
handler = logging.FileHandler(os.path.join(Directory, Log_File), "w")
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
  5. Attempts to import cache from previous runs. Code example:
Cached_Data = General.Get_Cache(Directory, Plugin_Name)
  6. Converts the imported Query string, set by the web application, into a list. If the string contains commas, it will be split into multiple list items, delimited by either a comma followed by a space ", " or just a comma ",". Code example:
Query_List = General.Convert_to_List(Query)
  7. At this point, you are ready to begin the fun. Depending on whether your plugin uses an API, you may be required to add a Load_Configuration() function to import the details from the config.json file. Please refer to other plugins for examples. If this is the case, there will be additional steps required after developing the plugin. Please configure this function exactly the same as other plugins; it will be reviewed and corrected if submitted incorrectly. From here, use the details, if required, to perform the necessary search against the desired target, and from the result obtain a unique URL for the result (even if it means you have to craft it from something else), as well as a unique identifier such as a title. If the request is made via POST, it is acceptable to create a bogus URL to get around the unique link constraint; however, at the very least the bogus URL should contain the domain, for example: https://www.domain.com?UID.
    Please refer to other plugins for any confusion regarding the above.
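The following is a hypothetical sketch of such a Load_Configuration() function; the "rand_plugin" and "api_key" keys are placeholders, so mirror an existing plugin's version exactly rather than copying this:

def Load_Configuration():

    Configuration_File = Common.Set_Configuration_File()  # Absolute path to config.json.

    with open(Configuration_File) as JSON_File:
        Configuration_Data = Common.JSON_Handler(JSON_File).To_JSON_Load()

    return Configuration_Data.get("rand_plugin", {}).get("api_key")  # Placeholder keys for illustration only.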
  8. If a limit has been implemented, a Current_Step variable will need to be implemented to help count how many requests have been made. A for loop should be used to iterate through the results, and the loop should verify that Current_Step is less than the Limit, as shown in the sketch below. If only one result is generated, the for loop and limit parts can be omitted.
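A minimal sketch of the loop structure described above; Results and the processing inside the loop are illustrative:

Current_Step = 0

for Result in Results:

    if Current_Step < int(Limit):
        # ... request the result's link, create the output file, and call the Output() function ...
        Current_Step += 1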
  9. Each result link should be requested and the response stored in an output file using the requests library. Code example:
Response = requests.get(URL).text
  10. The response should be output to a local file to create a local copy of the link. This function will return the output file name, which will be used later on. Code example:
Output_file = General.Create_Query_Results_Output_File(Directory, Query, Plugin_Name, Response, ID, The_File_Extension)
  11. If Output_file is set, then the General.Connections class should be initialised and called as per below:
Output_Connections = General.Connections(Query, Plugin_Name, "domain.com", "Exploit Kind", Task_ID, Plugin_Name.lower())
Output_Connections.Output([Main_File, Output_file], URL, Title, Plugin_Name.lower())

Please use one of the approved result types from the Finding_Types list, which can be viewed in main.py.

  12. Finally, if the Limit is implemented, increase Current_Step by 1, and also append the link to the Data_to_Cache list regardless of the limit.

Data_to_Cache.append(URL)
Current_Step += 1
