Implement SARIF 2.1.0 conversion for AquaSec scan results #9

Copilot · 2026-01-28T13:18:01Z

Overview

Adds SARIF conversion pipeline to enable GitHub Security tab integration. AquaSec findings are now converted to SARIF 2.1.0 format with proper severity mapping and field truncation per spec.

Release Notes

Added SARIF 2.1.0 conversion for AquaSec scan results
Exposed sarif-file output for GitHub code scanning integration

Implementation

src/model/convertor.py

New Convertor class handles AquaSec→SARIF transformation
Deduplicates rules by ID, maps severity levels to SARIF spec
Truncates fields per SARIF limits (rule_id: 512, descriptions: 1024-4096, message: 4096)
Extracts location data (file path, line ranges) when available
Includes remediation info in result messages

main.py

Converts fetched findings to SARIF after scan fetch
Writes aquasec_scan.sarif to working directory
Exposes file path via sarif_file output

action.yml

Added sarif-file output

src/utils/constants.py

Added SARIF schema URL, version, output filename, and truncation limits

Example SARIF output:

{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "AquaSec",
        "rules": [{"id": "CVE-2021-12345", ...}]
      }
    },
    "results": [{
      "ruleId": "CVE-2021-12345",
      "level": "error",
      "message": {"text": "SQL injection - Use parameterized queries"},
      "locations": [{"physicalLocation": {"artifactLocation": {"uri": "src/db.py"}}}]
    }]
  }]
}

Original prompt

Overview

Implement the logic for converting raw AquaSec JSON scan output into SARIF 2.1.0 format, as an independent step in the GitHub Action workflow. The SARIF file must be compatible with GitHub code scanning and accurately map AquaSec findings and severity to SARIF structure.

Related Issues and PRs

Closes Create step 3: Convert JSON output to SARIF #8

Related to issues Create step 1: Bearer Token Auth Logic #4, Create step 2: Fetching security findings #6 and PRs Implement AquaSec Authentication GitHub Action #5, Implement AquaSec scan results fetching with pagination support #7

Build on branch copilot/fetch-aquasec-scan-results

Requirements

Core Implementation

Create src/model/convertor.py with a Convertor class that:

Accepts AquaSec findings JSON (as output by ScanFetcher from step 2)

Converts all findings to valid SARIF 2.1.0 format

Accurately maps finding severity to SARIF levels:

critical → error

high → error

medium → warning

low → note

negligible/unknown → note

Maps metadata (category, severity, rule, reference) correctly

Handles line location mapping

Includes remediation information in the message

Truncates text fields to match SARIF spec requirements:

Rule ID: max 512 characters

Short description: max 1024 characters

Full description: max 4096 characters

Message text: max 4096 characters

Uses constants for SARIF schema reference and version

Update src/utils/constants.py to add SARIF-related constants:

SARIF_SCHEMA_URL = "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json"

SARIF_VERSION = "2.1.0"

SARIF_OUTPUT_FILE = "aquasec_scan.sarif"

Truncation limits for SARIF fields

Update main.py to:

Import and use the Convertor class

Call conversion after fetching findings

Write SARIF output file

Expose SARIF file path as action output

Update action.yml to:

Add new output sarif-file with description and value

Code Patterns to Follow

Based on existing codebase patterns:
# File header (Apache 2.0 License)
#
# Copyright 2026 ABSA Group Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# ...
Use logging.getLogger(__name__) for module-level logger

Follow existing class structure patterns from AquaSecAuthenticator and ScanFetcher

Use type hints for all parameters and return values

Include docstrings for all classes and methods

Use constants from src/utils/constants.py

Handle errors with proper logging (no secrets in output)

SARIF 2.1.0 Structure

The output SARIF file should have this structure:
{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "AquaSec",
          "version": "1.0.0",
          "informationUri": "https://www.aquasec.com/",
          "rules": [...]
        }
      },
      "results": [...]
    }
  ]
}
Unit Tests

Create tests/model/test_convertor.py with tests covering:

Conversion of findings with all severity levels

Field truncation logic

Empty findings handling

Missing fields handling

Edge cases (null values, empty strings)

Valid SARIF output structure

Follow the existing test patterns in the repository.

Quality Gates

All code must pass pylint with the project's .pylintrc configuration

All code must pass black formatting

All unit tests must pass

Code coverage should be maintained

Important Notes

Do NOT update README.md

Do NOT update DEVELOPER.md

Follow all patterns from the existing codebase

Ensure SARIF output is fully compatible with GitHub code scanning

Use set_action_output from src/utils/utils.py for action outputs

This pull request was created from Copilot chat.

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Summary by CodeRabbit

New Features
- SARIF-formatted security scan findings now available as the primary output format
Tests
- Activated Pytest unit tests in continuous integration workflow
- Added comprehensive test coverage for SARIF conversion and error handling
Documentation
- Updated developer documentation with current tool usage examples
Chores
- Enhanced code quality guidelines for static analysis
- Added SARIF specification constants and configuration

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Co-authored-by: tmikula-dev <[email protected]>

coderabbitai · 2026-01-29T16:21:49Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

🔍 Trigger a full review

Walkthrough

This PR implements Step 3—SARIF conversion—adding a SarifConvertor class that transforms AquaSec JSON findings into SARIF 2.1.0 format. Changes include new conversion logic, utility functions, main workflow integration, comprehensive test coverage, and documentation updates.

Changes

Cohort / File(s)	Summary
Configuration & CI/CD `.github/copilot-instructions.md`, `.github/workflows/check_python.yml`, `action.yml`	Added pylint and testing guidelines; enabled Pytest workflow; replaced action output `scan-findings` with `aquasec_sarif_file` and updated its description and value reference.
SARIF Conversion Core `src/model/sarif_convertor.py`, `src/utils/constants.py`, `src/utils/utils.py`	Implemented SarifConvertor class with severity mapping, rule/finding construction, and SARIF document assembly; added SARIF version, schema, and field truncation constants; introduced get_sarif_output_filename() utility for timestamped output filenames.
Main Integration `main.py`	Integrated SARIF conversion workflow: imports SarifConvertor and get_sarif_output_filename, converts findings to SARIF format, writes output to file, and sets aquasec_sarif_file action output; added IOError handling for file writing.
Test Suite `tests/model/test_sarif_convertor.py`, `tests/test_main.py`, `tests/utils/test_utils.py`	Added comprehensive SarifConvertor tests covering structure validation, severity mapping, text truncation, location data, rule consolidation, and edge cases; added IOError handling test for main.py; added test for get_sarif_output_filename utility.
Documentation `DEVELOPER.md`	Updated command examples across tools (pylint, black, mypy, pytest) to reference sarif_convertor.py and test_sarif_convertor.py instead of convertor.py.

Sequence Diagram

sequenceDiagram
    participant Client as Client/Workflow
    participant Main as main.py
    participant Fetcher as ScanFetcher
    participant Convertor as SarifConvertor
    participant FileIO as File I/O
    participant Output as Action Output

    Client->>Main: Run with auth/config
    Main->>Fetcher: fetch_findings()
    Fetcher-->>Main: findings_json (dict)
    Main->>Convertor: SarifConvertor(findings_json)
    Main->>Convertor: convert_to_sarif()
    Convertor->>Convertor: Build rules (map severity, truncate text)
    Convertor->>Convertor: Build findings (link rules, add locations)
    Convertor-->>Main: sarif_dict
    Main->>FileIO: open & write SARIF JSON
    FileIO-->>Main: file written
    Main->>Output: set_action_output(aquasec_sarif_file, path)
    Output-->>Client: SARIF path exposed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A converter hops, from JSON to SARIF so bright,
With severity mapped and findings in sight,
Each rule gets truncated, each location precise,
The scan results now fit for GitHub's advice!
—Rabbit's quest for security standards complete 🐇✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	⚠️ Warning	The PR includes out-of-scope changes: documentation updates to DEVELOPER.md (file references updated) and .github/copilot-instructions.md (new guidelines added), which were explicitly prohibited in issue `#8` ("Do NOT update README.md", "do not change README.md"). Also included are changes to CI configuration (.github/workflows/check_python.yml) not mentioned in requirements.	Remove changes to DEVELOPER.md and .github/copilot-instructions.md as these conflict with explicit requirements in issue `#8` stating not to update documentation files. Justify or clarify the inclusion of CI workflow changes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title clearly and concisely summarizes the main change: implementing SARIF 2.1.0 conversion for AquaSec scan results, which is the primary objective of this PR.
Description check	✅ Passed	The PR description includes Overview, Release Notes, and Related sections matching the template. It provides implementation details and example SARIF output, though some implementation details differ from issue requirements (e.g., class name is SarifConvertor instead of Convertor).
Linked Issues check	✅ Passed	The PR implements SARIF conversion logic with proper severity mapping, field truncation, location extraction, and comprehensive tests. Main objective from issue `#8` is met, though the class name (SarifConvertor vs Convertor) and some implementation details vary from the issue specification.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tmikula-dev · 2026-01-29T16:22:31Z

@coderabbitai review

coderabbitai · 2026-01-29T16:22:43Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@DEVELOPER.md`:
- Line 111: Update the example documentation lines that reference the SARIF
converter to use the correct path including the model subdirectory: replace
occurrences of "src/sarif_convertor.py" with "src/model/sarif_convertor.py"
(references relate to the SARIF converter file sarif_convertor.py mentioned in
DEVELOPER.md), and ensure the same fix is applied to all instances flagged (the
examples at the three noted locations).
- Around line 210-215: Update the example pytest commands in DEVELOPER.md to
reference the correct test file paths under the model subdirectory; replace
occurrences of "tests/test_sarif_convertor.py" with
"tests/model/test_sarif_convertor.py" (and the node id example accordingly) so
the shown commands (e.g., the full-file run and the node-id run "pytest
tests/model/test_sarif_convertor.py::test_convert_to_sarif_returns_valid_structure
-q") point to the actual test locations.

🧹 Nitpick comments (6)

.github/workflows/check_python.yml (1)
142-145: Minor inconsistency: missing step name for Python setup.

Other jobs in this workflow use name: Set up Python for the setup-python step. Consider adding a name for consistency.
Suggested fix
-      - uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548
+      - name: Set up Python
+        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548
         with:
           python-version: '3.14'
           cache: 'pip'
main.py (1)
53-53: Redundant exception object in logging.exception calls.

logging.exception automatically includes exception information in the log output, making str(e) redundant.
Suggested fix
     except (ValueError, RequestException) as e:
-        logger.exception("Authentication failed: %s", str(e))
+        logger.exception("Authentication failed.")
         sys.exit(1)
     except (ValueError, RequestException) as e:
-        logger.exception("Fetching scan results failed: %s", str(e))
+        logger.exception("Fetching scan results failed.")
         sys.exit(1)
     except IOError as e:
-        logger.exception("Failed to convert and write SARIF file: %s", str(e))
+        logger.exception("Failed to convert and write SARIF file.")
         sys.exit(1)
Also applies to: 60-60, 72-72
tests/model/test_sarif_convertor.py (1)
264-288: Good coverage of helper methods. Consider adding edge case tests for unknown severity values.

The tests for _map_severity_to_level, _map_severity_to_score, and _get_severity_tag cover the defined severity levels (1-4), but don't verify the default/fallback behavior for out-of-range or None values.
🧪 Optional: Add tests for edge case severity values
def test_map_severity_to_level_handles_unknown():
    assert "note" == SarifConvertor._map_severity_to_level(0)
    assert "note" == SarifConvertor._map_severity_to_level(-1)
    assert "error" == SarifConvertor._map_severity_to_level(5)  # >= 3


def test_map_severity_to_score_handles_unknown():
    assert "2.0" == SarifConvertor._map_severity_to_score(0)
    assert "2.0" == SarifConvertor._map_severity_to_score(99)


def test_get_severity_tag_handles_unknown():
    assert "UNKNOWN" == SarifConvertor._get_severity_tag(0)
    assert "UNKNOWN" == SarifConvertor._get_severity_tag(99)
src/model/sarif_convertor.py (3)
139-152: Empty strings in help text will produce blank lines.

When cwe or remediation are empty, the conditional expressions produce empty strings that remain in help_text_parts, resulting in blank lines in the final help text.
♻️ Optional: Filter empty strings before joining
         help_text_parts = [
             f"Misconfiguration {rule_id}",
             f"Type: {category}",
             f"Severity: {severity_tag}",
             f"Check: {title}",
             f"Message: {message}",
-            f"CWE: {cwe}" if cwe else "",
-            f"Remediation: {remediation}" if remediation else "",
         ]
+        if cwe:
+            help_text_parts.append(f"CWE: {cwe}")
+        if remediation:
+            help_text_parts.append(f"Remediation: {remediation}")

         if references:
             help_text_parts.append(f"Link: [{rule_id}]({references[0]})")

-        help_text = "\n".join(help_text_parts)
+        help_text = "\n".join(part for part in help_text_parts if part)
292-292: Minor typo: "SARF" should be "SARIF".
📝 Fix typo
-        # Build SARF structure
+        # Build SARIF structure
117-118: Inconsistent str() casting for rule_id truncation.

In _build_rule, the rule_id is truncated without casting to str first (line 118), whereas in convert_to_sarif (lines 275, 288), it's explicitly cast with str(rule_id). For consistency and defensive coding, consider applying the same pattern here.
♻️ Add str() cast for consistency
         rule_id = finding.get("avd_id", SARIF_PLACEHOLDER)
-        rule_id = self._truncate_text(rule_id, RULE_ID_MAX_LENGTH)
+        rule_id = self._truncate_text(str(rule_id), RULE_ID_MAX_LENGTH)
Apply the same change in _build_sarif_finding at line 185:
         rule_id = finding_json.get("avd_id", SARIF_PLACEHOLDER)
-        rule_id = self._truncate_text(rule_id, RULE_ID_MAX_LENGTH)
+        rule_id = self._truncate_text(str(rule_id), RULE_ID_MAX_LENGTH)

DEVELOPER.md

tmikula-dev

This PR was refactored and is approved by myself for other human review.

Initial plan

172fccd

Copilot AI assigned Copilot and tmikula-dev Jan 28, 2026

Copilot started work on behalf of tmikula-dev January 28, 2026 13:18 View session

Implement SARIF conversion logic with comprehensive tests

73bb402

Co-authored-by: tmikula-dev <[email protected]>

Copilot AI changed the title ~~[WIP] Implement conversion of AquaSec JSON to SARIF 2.1.0 format~~ Implement SARIF 2.1.0 conversion for AquaSec scan results Jan 28, 2026

Copilot AI requested a review from tmikula-dev January 28, 2026 13:25

Copilot finished work on behalf of tmikula-dev January 28, 2026 13:25

tmikula-dev added 2 commits January 28, 2026 15:49

GitHub copilot code fixes.

65b16b0

Updated convert logic

bf51d80

coderabbitai bot reviewed Jan 29, 2026

View reviewed changes

DEVELOPER.md Outdated Show resolved Hide resolved

DEVELOPER.md Outdated Show resolved Hide resolved

CodeRabbit comment fixes.

a0ff5eb

tmikula-dev linked an issue Jan 30, 2026 that may be closed by this pull request

Create step 3: Convert JSON output to SARIF #8

Open

tmikula-dev added the enhancement New feature or request label Jan 30, 2026

tmikula-dev approved these changes Jan 30, 2026

View reviewed changes

tmikula-dev marked this pull request as ready for review January 30, 2026 07:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement SARIF 2.1.0 conversion for AquaSec scan results #9

Implement SARIF 2.1.0 conversion for AquaSec scan results #9

Uh oh!

Copilot AI commented Jan 28, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 29, 2026 •

edited

Loading

Review skipped

Uh oh!

tmikula-dev commented Jan 29, 2026

Uh oh!

coderabbitai bot commented Jan 29, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

tmikula-dev left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Implement SARIF 2.1.0 conversion for AquaSec scan results #9

Are you sure you want to change the base?

Implement SARIF 2.1.0 conversion for AquaSec scan results #9

Uh oh!

Conversation

Copilot AI commented Jan 28, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Release Notes

Related

Implementation

Overview

Related Issues and PRs

Requirements

Core Implementation

Code Patterns to Follow

SARIF 2.1.0 Structure

Unit Tests

Quality Gates

Important Notes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

tmikula-dev commented Jan 29, 2026

Uh oh!

coderabbitai bot commented Jan 29, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tmikula-dev left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Jan 28, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 29, 2026 •

edited

Loading