Releases · vulnerability-lookup/VulnTrain

19 Feb 09:05

cedricbonhomme

v2.2.0

450edc7

Release 2.2.0 Latest

Training

New CLI options for severity classification trainer (classify_severity.py):
- --no-codecarbon: Disable CodeCarbon emissions tracking.
- --no-push: Disable pushing the model and tokenizer to Hugging Face Hub.
- --no-cache: Disable cache for the model during training.
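
As a rough illustration of how opt-out flags like these are commonly wired up, here is a minimal argparse sketch. The parser, the `build_parser` helper, and the derived variable names are assumptions for illustration; only the three flag names and their descriptions come from the release note, and the actual code in classify_severity.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three opt-out flags listed above."""
    parser = argparse.ArgumentParser(
        description="Severity classification trainer (illustrative sketch)"
    )
    parser.add_argument(
        "--no-codecarbon",
        action="store_true",
        help="Disable CodeCarbon emissions tracking.",
    )
    parser.add_argument(
        "--no-push",
        action="store_true",
        help="Disable pushing the model and tokenizer to Hugging Face Hub.",
    )
    parser.add_argument(
        "--no-cache",
        action="store_true",
        help="Disable cache for the model during training.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--no-push", "--no-cache"])

# Each feature stays enabled unless its --no-* flag was passed.
track_emissions = not args.no_codecarbon
push_to_hub = not args.no_push
use_cache = not args.no_cache
```

With `store_true`, every flag defaults to `False`, so existing invocations keep their behavior and each feature is switched off only on request.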

18 Nov 07:26

cedricbonhomme

v2.1.0

d4cbd7c

Release 2.1.0

What's New

Datasets

CWE/Patch dataset improvements: More fields are now considered when searching for vulnerability patches. Asynchronous requests to GitHub are now less aggressive.
CWE Guesser dataset:
- Now uses the new vulnerability endpoint of Vulnerability-Lookup.
- References in security advisories without the patch tag are also considered.
- Repo ID is now a configurable parameter in the dataset generation script.
URL handling improvements:
- normalize_patch_url function improved for better patch URL processing.
- URLs with fragments are now properly handled.
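
To illustrate what handling a URL fragment can look like, here is a small sketch using the standard library. The `strip_fragment` helper and the example URL are hypothetical; this is not the project's normalize_patch_url implementation, which may apply further rules.

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Return the URL without its '#fragment' part (illustrative helper)."""
    cleaned, _fragment = urldefrag(url)
    return cleaned


# Hypothetical patch URL whose fragment points at a position inside the diff view.
patch_url = "https://github.com/example/repo/commit/abc123#diff-0"
normalized = strip_fragment(patch_url)
print(normalized)  # https://github.com/example/repo/commit/abc123
```
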
Concurrency: Reduced the number of default concurrent requests to 12 to avoid overloading external services.
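
One common way to cap concurrent requests is an `asyncio.Semaphore`. The sketch below assumes that approach; the function names are hypothetical, the request is simulated with a short sleep, and only the default of 12 comes from the release note.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 12  # default mentioned in the release note

in_flight = 0        # requests currently running
peak_in_flight = 0   # highest number observed at once


async def fetch(url: str, semaphore: asyncio.Semaphore) -> str:
    """Simulate one request while holding a slot of the semaphore."""
    global in_flight, peak_in_flight
    async with semaphore:
        in_flight += 1
        peak_in_flight = max(peak_in_flight, in_flight)
        await asyncio.sleep(0.01)  # stand-in for a real HTTP call
        in_flight -= 1
        return url


async def fetch_all(urls: list[str], limit: int = MAX_CONCURRENT_REQUESTS) -> list[str]:
    """Run all requests, never more than `limit` at the same time."""
    semaphore = asyncio.Semaphore(limit)
    return await asyncio.gather(*(fetch(u, semaphore) for u in urls))


urls = [f"https://example.org/item/{i}" for i in range(30)]
results = asyncio.run(fetch_all(urls))
```

Here all 30 tasks are created up front, but only those holding a semaphore slot run their request, which keeps the load on the remote service bounded.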

Dependencies

Updated Python dependencies, including a PyTorch bump from 2.7.1 to 2.8.0.
General dependency updates across the project.

Miscellaneous

Minor code improvements and style updates (reformatted with black).

05 Sep 12:35

cedricbonhomme

v2.0.0

1809d72

Release 2.0.0

News

Dataset generation: Introduced a new script to build datasets of structured vulnerabilities enriched with CWE identifiers and corresponding patches.
Each entry now includes the Git commit message and the full diff (Base64-encoded).
#10 by @3LS3-1F
Model generation: Added a new trainer for predicting CWE classifications from vulnerability descriptions and associated patches (commit messages).
#10 by @3LS3-1F
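
Because the full diff is stored Base64-encoded, a consumer of the dataset has to decode it before reading it. The sketch below shows one way to do that; the entry and its field names (`commit_message`, `patch_text_b64`) are made up for illustration and are not taken from the dataset's actual schema.

```python
import base64

# Hypothetical entry; the field names are illustrative, not the real schema.
raw_diff = b"--- a/parser.c\n+++ b/parser.c\n@@ -1 +1 @@\n-old\n+new\n"
entry = {
    "commit_message": "Fix bounds check in parser",
    "patch_text_b64": base64.b64encode(raw_diff).decode("ascii"),
}


def decode_diff(encoded: str) -> str:
    """Decode a Base64-encoded diff into text, tolerating invalid UTF-8 bytes."""
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


diff_text = decode_diff(entry["patch_text_b64"])
print(diff_text.splitlines()[0])  # --- a/parser.c
```

Decoding with `errors="replace"` is a defensive choice here, since diffs can contain bytes that are not valid UTF-8.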

Related resources shared via Hugging Face: https://huggingface.co/collections/CIRCL/vlai-for-cwe-guessing-68bab22e3d71b513146d13b3

Changes

Improved documentation and reorganized modules for better clarity and maintainability.
Updated dependencies to their latest stable versions.

Contributors

3LS3-1F

25 Jul 14:58

cedricbonhomme

v1.5.0

51adde5

Release 1.5.0

News

Dataset generation: Git fixes are now associated with the Common Weakness Enumerations (CWEs) found in security advisories. (#4)
Documentation is now available. (8a345ca)

Changes

Model generation: Added a boolean parameter to map_cvss_to_severity to switch between using the first non-null CVSS score and the mean of all available CVSS scores. (ff6616e)
Dataset generation: Removed unneeded keys in extract_cnvd. (b7d694)
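
The difference between the two scoring modes can be sketched as follows. The function `severity_from_scores` and its `use_mean` parameter are hypothetical names chosen for illustration, not the signature of map_cvss_to_severity; the label boundaries follow the CVSS v3.x qualitative severity rating scale, which the project may or may not use as-is.

```python
from statistics import mean
from typing import Optional, Sequence


def score_to_label(score: float) -> str:
    """Map a CVSS base score to the CVSS v3.x qualitative rating."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def severity_from_scores(
    scores: Sequence[Optional[float]], use_mean: bool = False
) -> Optional[str]:
    """Pick the first non-null score, or the mean of all non-null scores."""
    available = [s for s in scores if s is not None]
    if not available:
        return None  # no CVSS score at all
    chosen = mean(available) if use_mean else available[0]
    return score_to_label(chosen)


scores = [None, 9.8, 5.0]
print(severity_from_scores(scores))                 # Critical (first non-null: 9.8)
print(severity_from_scores(scores, use_mean=True))  # High (mean: 7.4)
```

The example shows why the switch matters: the same advisory can land in different severity classes depending on which mode is used.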

01 Jul 08:42

cedricbonhomme

v1.4.0

d03079a

Release 1.4.0

This version adds support for creating new AI-ready datasets based on the China National Vulnerability Database (CNVD). It also introduces a new training module that classifies vulnerabilities using text classification models tailored for CNVD data. The default model is hfl/chinese-macbert-base; hfl/chinese-bert-wwm-ext and google-bert/bert-base-chinese can be used instead.
By @3LS3-1F

Contributors

3LS3-1F

28 Apr 07:28

cedricbonhomme

v1.3.1

b27bba3

Release 1.3.1

Updated dependencies and fixed issues due to changes in transformers.

28 Apr 05:12

cedricbonhomme

v1.3.0

f1c14a3

Release 1.3.0

Changes

Updated dependencies.

11 Mar 07:31

cedricbonhomme

v1.2.0

d405b7d

Release 1.2.0

Changes

Dataset generation: CVSS data is now extracted from GitHub and PySec security advisories.
Dataset generation: CVSS, CPE, title, and description (summary) are now extracted from CSAF documents.

27 Feb 07:44

cedricbonhomme

v1.1.0

c94d3d0

Release 1.1.0

News

Trainers: Added support for roberta-base in the text classifier, with improved settings for TrainingArguments.
Validators: Added a validator for severity classification.

25 Feb 07:40

cedricbonhomme

v1.0.0

3f11a97

Release 1.0.0

News

Introduced a new trainer to automatically classify vulnerabilities based on their descriptions, even when CVSS scores are unavailable.
Added CVSS parsing to the dataset generation script.

Changes

Refactored the project structure for better organization.
Improved CPE parsing.
Enhanced the dataset generation script.
Optimized the trainer for text generation on vulnerability descriptions.
Improved command-line argument parsing.
Improved the process of pushing the tokenizer and trainer to Hugging Face.
