Releases: vulnerability-lookup/VulnTrain

Release 2.2.0

19 Feb 09:05
v2.2.0
450edc7

Training

  • New CLI options for the severity classification trainer (classify_severity.py):
    • --no-codecarbon: Disable CodeCarbon emissions tracking.
    • --no-push: Disable pushing the model and tokenizer to the Hugging Face Hub.
    • --no-cache: Disable the model cache during training.
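These options are plain on/off switches. As a minimal sketch, assuming argparse with store_true flags (the actual classify_severity.py may declare them differently), each negative flag can be turned into a positive setting for the training code. The names build_parser, resolve_settings, track_emissions, push_to_hub and use_cache are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser declaring the three opt-out flags described above."""
    parser = argparse.ArgumentParser(description="Severity classifier trainer (sketch)")
    # Each flag defaults to False; passing it turns the matching feature off.
    parser.add_argument("--no-codecarbon", action="store_true",
                        help="Disable CodeCarbon emissions tracking.")
    parser.add_argument("--no-push", action="store_true",
                        help="Do not push the model and tokenizer to the Hugging Face Hub.")
    parser.add_argument("--no-cache", action="store_true",
                        help="Disable the model cache during training.")
    return parser


def resolve_settings(args: argparse.Namespace) -> dict:
    """Turn the negative flags into positive settings (illustrative names only)."""
    # argparse stores "--no-push" as the attribute "no_push", and so on.
    return {
        "track_emissions": not args.no_codecarbon,
        "push_to_hub": not args.no_push,
        "use_cache": not args.no_cache,
    }


if __name__ == "__main__":
    settings = resolve_settings(build_parser().parse_args(["--no-push"]))
    print(settings)  # {'track_emissions': True, 'push_to_hub': False, 'use_cache': True}
```

Opt-out flags keep the default run (tracking, pushing, caching all enabled) unchanged while letting offline or experimental runs skip each step individually.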

Release 2.1.0

18 Nov 07:26
v2.1.0
d4cbd7c

What's New

Datasets

  • CWE/Patch dataset improvements: More fields are now considered when searching for vulnerability patches, and asynchronous requests to GitHub are less aggressive.
  • CWE Guesser dataset:
    • Now uses the new vulnerability endpoint of Vulnerability-Lookup.
    • References in security advisories without the patch tag are also considered.
    • Repo ID is now a configurable parameter in the dataset generation script.
  • URL handling improvements:
    • normalize_patch_url function improved for better patch URL processing.
    • URLs with fragments are now properly handled.
  • Concurrency: Reduced the default number of concurrent requests to 12 to avoid overloading external services.

Dependencies

  • Updated Python dependencies, including PyTorch bump from 2.7.1 to 2.8.0.
  • General dependency updates across the project.

Miscellaneous

  • Minor code improvements and style updates (reformatted with black).

Release 2.0.0

05 Sep 12:35
v2.0.0
1809d72

News

  • Dataset generation: Introduced a new script to build datasets of structured vulnerabilities enriched with CWE identifiers and corresponding patches.
    Each entry now includes the Git commit message and the full diff (Base64-encoded).
    #10 by @3LS3-1F
  • Model generation: Added a new trainer for predicting CWE classifications from vulnerability descriptions and associated patches (commit messages).
    #10 by @3LS3-1F
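Storing the diff as Base64 lets arbitrary bytes travel inside a JSON record. A minimal sketch of building and decoding one entry follows; the field names (id, cwes, commit_message, patch), the helper names and the example values are assumptions, not the dataset's real schema.

```python
import base64
import json


def make_entry(vuln_id: str, cwe_ids: list, commit_message: str, diff: bytes) -> dict:
    """Build one hypothetical dataset record (field names are illustrative)."""
    return {
        "id": vuln_id,
        "cwes": cwe_ids,
        "commit_message": commit_message,
        # Base64 turns the raw diff bytes into ASCII text that JSON can hold as-is.
        "patch": base64.b64encode(diff).decode("ascii"),
    }


def decode_patch(entry: dict) -> bytes:
    """Recover the original diff bytes from a record."""
    return base64.b64decode(entry["patch"])


if __name__ == "__main__":
    diff = b"--- a/app.py\n+++ b/app.py\n@@ -1 +1 @@\n-render(raw)\n+render(escape(raw))\n"
    entry = make_entry("EXAMPLE-0001", ["CWE-79"], "Escape user input", diff)
    print(json.dumps(entry, indent=2))
    assert decode_patch(entry) == diff  # the round trip is lossless
```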

Related resources shared via Hugging Face: https://huggingface.co/collections/CIRCL/vlai-for-cwe-guessing-68bab22e3d71b513146d13b3

Changes

  • Improved documentation and reorganized modules for better clarity and maintainability.
  • Updated dependencies to their latest stable versions.

Release 1.5.0

25 Jul 14:58
v1.5.0
51adde5

News

  • Dataset generation: Git fixes are now associated with the Common Weakness Enumerations (CWEs)
    found in security advisories. (#4)
  • Documentation is now available. (8a345ca)

Changes

  • Model generation: Added a boolean parameter to map_cvss_to_severity to switch between using the first non-null CVSS score and the mean of all available CVSS scores. (ff6616e)
  • Dataset generation: Removed unnecessary keys in extract_cnvd. (b7d694)
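The switch described above amounts to choosing an aggregation before mapping a score to a label. A minimal sketch, assuming the CVSS v3.x qualitative rating scale (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0) and lowercase labels; map_scores_to_severity and its use_mean parameter are hypothetical and do not reproduce the real map_cvss_to_severity signature.

```python
from statistics import mean
from typing import Optional, Sequence


def cvss_to_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative rating (lowercase labels)."""
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def map_scores_to_severity(scores: Sequence[Optional[float]],
                           use_mean: bool = False) -> Optional[str]:
    """Hypothetical stand-in: first non-null score by default, mean if use_mean=True."""
    available = [s for s in scores if s is not None]
    if not available:
        return None  # no CVSS score available at all
    score = mean(available) if use_mean else available[0]
    return cvss_to_severity(score)


if __name__ == "__main__":
    scores = [None, 9.8, 5.0]
    print(map_scores_to_severity(scores))                 # critical
    print(map_scores_to_severity(scores, use_mean=True))  # high
```

With these scores, the first non-null value (9.8) yields "critical" while the mean (7.4) yields "high", which is exactly the kind of difference such a parameter controls.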

Release 1.4.0

01 Jul 08:42
v1.4.0
d03079a

This version adds support for creating new AI-ready datasets based on the China National Vulnerability Database (CNVD). It also introduces a new training module that classifies vulnerabilities using text classification models tailored to CNVD data. By default, hfl/chinese-macbert-base is used, but hfl/chinese-bert-wwm-ext or google-bert/bert-base-chinese can be used instead.
By @3LS3-1F

Release 1.3.1

28 Apr 07:28
v1.3.1
b27bba3

Updated dependencies and fixed issues caused by changes in transformers.

Release 1.3.0

28 Apr 05:12
v1.3.0
f1c14a3

Changes

  • Updated dependencies.

Release 1.2.0

11 Mar 07:31
v1.2.0
d405b7d

Changes

  • Dataset generation: CVSS data is now extracted from GitHub and PySec security advisories.
  • Dataset generation: CVSS data, CPEs, titles, and descriptions (summaries) are now extracted from CSAF documents.
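Assuming the CSAF 2.0 JSON layout (document.title, product_tree branches whose product carries product_identification_helper.cpe, and vulnerabilities with notes and scores[].cvss_v3.baseScore), the extraction can be sketched as below. extract_csaf_fields, iter_cpes and the output keys are hypothetical; the sketch only walks product_tree.branches and keeps the last note of each category, so a real extractor would need to handle more cases.

```python
from typing import Any, Dict, Iterator, List


def iter_cpes(branches: List[Dict[str, Any]]) -> Iterator[str]:
    """Walk product_tree branches recursively and yield any CPE strings found."""
    for branch in branches:
        helper = branch.get("product", {}).get("product_identification_helper", {})
        if "cpe" in helper:
            yield helper["cpe"]
        yield from iter_cpes(branch.get("branches", []))


def extract_csaf_fields(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect title, summary, CVSS v3 base scores and CPEs per vulnerability (sketch)."""
    cpes = list(iter_cpes(doc.get("product_tree", {}).get("branches", [])))
    records = []
    for vuln in doc.get("vulnerabilities", []):
        # Index notes by category; a later note of the same category wins.
        notes = {n.get("category"): n.get("text") for n in vuln.get("notes", [])}
        records.append({
            # Fall back to the document title when the vulnerability has none.
            "title": vuln.get("title", doc.get("document", {}).get("title")),
            "description": notes.get("summary") or notes.get("description"),
            "cvss": [s["cvss_v3"]["baseScore"] for s in vuln.get("scores", [])
                     if "cvss_v3" in s],
            "cpes": cpes,
        })
    return records
```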

Release 1.1.0

27 Feb 07:44
v1.1.0
c94d3d0

News

  • Trainers: Support for roberta-base in the text classifier, with improved
    TrainingArguments settings.
  • Validators: Added a validator for severity classification.

Release 1.0.0

25 Feb 07:40
v1.0.0
3f11a97

News

  • Introduced a new trainer to automatically classify vulnerabilities based on their descriptions,
    even when CVSS scores are unavailable.
  • Added CVSS parsing to the dataset generation script.
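The release note does not say how vectors are parsed. As an illustration only, a CVSS v3.x vector string such as CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H can be split into its version and metric/value pairs. parse_cvss_vector is a hypothetical helper: it rejects unprefixed (CVSS v2 style) vectors, does not validate metric names, and does not compute a score.

```python
from typing import Dict, Tuple


def parse_cvss_vector(vector: str) -> Tuple[str, Dict[str, str]]:
    """Split a 'CVSS:x.y/...' vector string into its version and metric/value pairs."""
    prefix, *parts = vector.split("/")
    if not prefix.startswith("CVSS:"):
        raise ValueError(f"not a prefixed CVSS vector: {vector!r}")
    version = prefix.split(":", 1)[1]
    metrics: Dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition(":")
        # Reject segments without a colon and metrics that appear twice.
        if not sep or key in metrics:
            raise ValueError(f"malformed or duplicate metric: {part!r}")
        metrics[key] = value
    return version, metrics


if __name__ == "__main__":
    version, metrics = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
    print(version, metrics["AV"], metrics["C"])  # 3.1 N H
```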

Changes

  • Refactored the project structure for better organization.
  • Improved CPE parsing.
  • Enhanced the dataset generation script.
  • Optimized the trainer for text generation on vulnerability descriptions.
  • Improved command-line argument parsing.
  • Improved the process of pushing the tokenizer and trainer to Hugging Face.
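CPE 2.3 formatted strings have 13 colon-separated components (cpe, 2.3, then part, vendor, product, version, update, edition, language, sw_edition, target_sw, target_hw, other), and a colon inside a value is escaped with a backslash, so a naive split(":") can break. A minimal sketch of a parser that respects those escapes follows; parse_cpe23 and split_unescaped are hypothetical and not the project's CPE parser, and the sketch leaves escape sequences in the values rather than unescaping them.

```python
from typing import Dict, List

CPE23_FIELDS = ["part", "vendor", "product", "version", "update", "edition",
                "language", "sw_edition", "target_sw", "target_hw", "other"]


def split_unescaped(s: str, sep: str = ":") -> List[str]:
    """Split on `sep` except where it is escaped with a backslash."""
    parts, current, escaped = [], [], False
    for ch in s:
        if escaped:
            current.append(ch)  # escaped character is kept literally
            escaped = False
        elif ch == "\\":
            current.append(ch)  # keep the backslash so the value round-trips
            escaped = True
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_cpe23(cpe: str) -> Dict[str, str]:
    """Map a CPE 2.3 formatted string onto its eleven named attributes."""
    components = split_unescaped(cpe)
    if components[:2] != ["cpe", "2.3"] or len(components) != 13:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, components[2:]))


if __name__ == "__main__":
    print(parse_cpe23("cpe:2.3:a:acme:widget:1.0:*:*:*:*:*:*:*")["product"])  # widget
```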