Commit 4e13b8e

peb-peb and BreadGenie authored
Helper Script - extraction and regex finding process (#1182)
* added vendor-product pair finding process
* updated for black check
* updated changes requested by @anthonyharrison
* now opening database with db_open() in CVEDB instead of sqlite3
* done extracting and regex finding process
* updated for isort and black
* updated changes suggested by @anthonyharrison
* resolved case when product_name='tomcat10'
* added tests for parse_filename()
* now parse_filename() also checks for directory names and throws error for Unknown Archive Type
* added tests for directory names
* fix for black and isort
* done extracting and regex finding process
* updated for isort and black
* updated changes suggested by @anthonyharrison
* Add logrotate Checker (#1184)
* added search_version_pattern() to identify version strings from all matched strings
* now it only adds those files to filename_patterns which contain version strings in them
* added new version_string regex
* feat: Added a nice output for Helper-Script
* refactor: updated regex-pattern. Updated regex-pattern as suggested by @terriko
* refactor: changed the line comment for HelperScript
* docs: added docs for Helper-Script
* docs: made the docs more formal
* fix: output of CONTAIN_PATTERNS. Now "CONTAIN_PATTERNS" only contains strings with length > 40 char and the result is now sorted
* feat: added product name as an argument. Now you can also provide the product name as the last argument. This is to handle cases when there is no return in output due to getting the wrong product name from parse_filename()
* handled corner case for product_name
* feat: added more comments on FILENAME_PATTERNS
* refactor: now no longer needed to provide vendor_name. Also added comments, a lot of comments explaining the codebase
* fix: black and isort

Co-authored-by: Bread Genie <[email protected]>
1 parent e836058 commit 4e13b8e

2 files changed

+335
-49
lines changed

cve_bin_tool/checkers/README.md

Lines changed: 59 additions & 7 deletions
@@ -277,15 +277,67 @@ that include this product. For our example all listings except
277277
`libexpat, expat` merely include the target product (`expat` for the
278278
example SQL query).
279279

280-
### For PyPI packages
280+
## Helper-Script
281+
Helper-Script is a tool that takes *packages* (e.g. busybox_1.30.1-4ubuntu9_amd64.deb) as input and returns:
281282

282-
Reading through the above sections before this section is highly recommended.
283+
> 1. `CONTAINS_PATTERNS` - list of commonly found strings in the binary of the product
284+
> 2. `FILENAME_PATTERNS` - list of different filenames for the product
285+
> 3. `VERSION_PATTERNS` - list of version patterns found in the binary of the product
286+
> 4. `VENDOR_PRODUCT` - list of vendor-product pairs for the product as they appear in NVD
283287
284-
1. `CONTAINS_PATTERN` can be any unique string in the whole package and can be found through the same process as in **Choosing contains patterns to detect the library**.
285-
2. `FILENAME_PATTERNS` should be `[r"METADATA", r"PKG-INFO"]`.
286-
3. `VERSION_PATTERNS` should be `[r"--generated pattern for cve-bin-tool Name: <package-name> Version: ([0-9]+\.[0-9]+\.[0-9]+)"]`. The strings in `METADATA` and `PKG-INFO` are converted similar to above string for version pattern detection.
287-
Note: The version pattern can change depending on the version convention of the package
288-
5. `VENDOR_PRODUCT` can be found through the same process as in **Finding Vendor Product pairs section**.
288+
To use this tool, follow the steps below:
289+
290+
***STEP-1*** Download the binary packages for which you wish to create the *Checker*. A good place to download packages is [pkgs.org](https://pkgs.org/search/?q=tomcat).
291+
292+
***STEP-2*** Now that you have the required package, run the tool as a regular Python script, passing the package filename as an argument.
293+
> *NOTE: Do not do `python -m cve-bin-tool/helper_script` as that would import it as a module.*
294+
295+
Let us see the tool in action, using the existing busybox checker as an example:
296+
297+
First, we download some packages for BusyBox. The directory looks something like this:
298+
299+
```
300+
.
301+
├── busybox-1.33.1-1.fc35.x86_64.rpm
302+
└── busybox_1.30.1-4ubuntu9_amd64.deb
303+
```
304+
305+
Now, we run the script. Invoking it on Windows or Linux produces output like this:
306+
307+
```
308+
windows > <path to cve-bin-tool>/cve_bin_tool/helper_script.py busybox-1.33.1-1.fc35.x86_64.rpm
309+
linux $ python3 <path to cve-bin-tool>/cve_bin_tool/helper_script.py busybox-1.33.1-1.fc35.x86_64.rpm
310+
────────────────────────────────────────────────────────── BusyboxChecker ───────────────────────────────────────────────────────────
311+
CONTAIN_PATTERNS = [
312+
BusyBox is copyrighted by many authors between 1998-2015.,
313+
BusyBox is a multi-call binary that combines many common Unix,
314+
link to busybox for each function they wish to use and BusyBox,
315+
BusyBox v1.33.1 (2021-05-06 17:29:07 UTC),
316+
crond (busybox 1.33.1) started, log level %d,
317+
]
318+
FILENAME_PATTERNS = [
319+
busybox <--- this is a really common filename pattern
320+
]
321+
VERSION_PATTERNS = [
322+
BusyBox is copyrighted by many authors between 1998-2015.,
323+
BusyBox v1.33.1 (2021-05-06 17:29:07 UTC),
324+
crond (busybox 1.33.1) started, log level %d,
325+
SERVER_SOFTWARE=busybox httpd/1.33.1,
326+
syslogd started: BusyBox v1.33.1,
327+
tar (busybox) 1.33.1,
328+
fsck (busybox 1.33.1),
329+
]
330+
VENDOR_PRODUCT = [('busybox', 'busybox'), ('rob_landley', 'busybox')]
331+
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
332+
```
333+
334+
Try this against a few more packages across different distros and see which strings are common among the results. Then follow the above steps to create the checker.
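
One way to compare runs is a plain set intersection. The sketch below is only an illustration: the first list is copied from the rpm output above, the second list is a made-up stand-in for a run against the `.deb` package (its real output is not shown here), and `common_strings` is a hypothetical helper, not part of cve-bin-tool.

```python
# Candidate strings copied from the Helper-Script output for the rpm package.
rpm_candidates = [
    "BusyBox is copyrighted by many authors between 1998-2015.",
    "BusyBox is a multi-call binary that combines many common Unix",
    "link to busybox for each function they wish to use and BusyBox",
    "BusyBox v1.33.1 (2021-05-06 17:29:07 UTC)",
    "crond (busybox 1.33.1) started, log level %d",
]

# ASSUMPTION: illustrative stand-in for a second run (the .deb package).
# These values are invented for the example, not real tool output.
deb_candidates = [
    "BusyBox is copyrighted by many authors between 1998-2015.",
    "BusyBox is a multi-call binary that combines many common Unix",
    "link to busybox for each function they wish to use and BusyBox",
    "crond (busybox 1.30.1) started, log level %d",
]


def common_strings(*runs):
    """Return the strings present in every run, sorted for stable output."""
    shared = set(runs[0])
    for run in runs[1:]:
        shared &= set(run)
    return sorted(shared)


common = common_strings(rpm_candidates, deb_candidates)
for text in common:
    print(text)
```

Strings that embed a version number drop out of the intersection, which is usually what you want here: version-independent strings are the ones that keep matching across releases.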
335+
336+
> _***NOTE:*** If you look at our existing checkers, you'll see that some strings are commented out in `CONTAINS_PATTERNS`. These are kept as fallback candidates in case the strings currently in use stop working in future versions. If you find more than 2-3 strings, it's recommended to keep the extras commented out for future reference._
337+
338+
Currently, if you receive multiple vendor-product pairs, you have to select the appropriate pair manually from those returned. In this case, it is `[('busybox', 'busybox')]`.
339+
340+
The `VERSION_PATTERNS` returned by Helper-Script are a list of possible candidates for version strings. Form the required regular expression by selecting the appropriate candidate. A good place to start is Python's built-in [`re`](https://docs.python.org/3/library/re.html) module; alternatively, you could use [pythex.org](https://pythex.org/), which lets you check whether a given regex works the way you intend. In this case, the obtained regex pattern is `"BusyBox v([0-9]+\.[0-9]+\.[0-9]+)"`.
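
As a quick sanity check, the sketch below tries that pattern against the `VERSION_PATTERNS` candidates from the output above, using only the standard `re` module. The `extract_versions` helper is a hypothetical name used for illustration; it is not part of cve-bin-tool.

```python
import re

# Regex chosen in the text above.
VERSION_PATTERN = r"BusyBox v([0-9]+\.[0-9]+\.[0-9]+)"

# Candidate strings copied from the Helper-Script output for the rpm package.
candidates = [
    "BusyBox is copyrighted by many authors between 1998-2015.",
    "BusyBox v1.33.1 (2021-05-06 17:29:07 UTC)",
    "crond (busybox 1.33.1) started, log level %d",
    "SERVER_SOFTWARE=busybox httpd/1.33.1",
    "syslogd started: BusyBox v1.33.1",
    "tar (busybox) 1.33.1",
    "fsck (busybox 1.33.1)",
]


def extract_versions(pattern, strings):
    """Return (string, version) pairs for every string the pattern matches."""
    compiled = re.compile(pattern)
    matches = []
    for text in strings:
        found = compiled.search(text)
        if found:
            # group(1) is the captured version number.
            matches.append((text, found.group(1)))
    return matches


matches = extract_versions(VERSION_PATTERN, candidates)
for text, version in matches:
    print(f"{version}  <-  {text}")
```

The pattern does not need to match every candidate. It only needs to match a string that reliably appears in the binary and to capture nothing but the version number.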
289341

290342
## Adding tests
291343
There are two types of tests you want to add to prove that your checker works as expected:
