Skip to content

Conversation

Copy link

Copilot AI commented Jan 28, 2026

Overview

Adds SARIF conversion pipeline to enable GitHub Security tab integration. AquaSec findings are now converted to SARIF 2.1.0 format with proper severity mapping and field truncation per spec.

Release Notes

  • Added SARIF 2.1.0 conversion for AquaSec scan results
  • Exposed sarif-file output for GitHub code scanning integration

Related

Closes #8

Implementation

src/model/convertor.py

  • New Convertor class handles AquaSec→SARIF transformation
  • Deduplicates rules by ID, maps severity levels to SARIF spec
  • Truncates fields per SARIF limits (rule_id: 512, descriptions: 1024-4096, message: 4096)
  • Extracts location data (file path, line ranges) when available
  • Includes remediation info in result messages

main.py

  • Converts fetched findings to SARIF after scan fetch
  • Writes aquasec_scan.sarif to working directory
  • Exposes file path via sarif_file output

action.yml

  • Added sarif-file output

src/utils/constants.py

  • Added SARIF schema URL, version, output filename, and truncation limits

Example SARIF output:

{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "AquaSec",
        "rules": [{"id": "CVE-2021-12345", ...}]
      }
    },
    "results": [{
      "ruleId": "CVE-2021-12345",
      "level": "error",
      "message": {"text": "SQL injection - Use parameterized queries"},
      "locations": [{"physicalLocation": {"artifactLocation": {"uri": "src/db.py"}}}]
    }]
  }]
}
Original prompt

Overview

Implement the logic for converting raw AquaSec JSON scan output into SARIF 2.1.0 format, as an independent step in the GitHub Action workflow. The SARIF file must be compatible with GitHub code scanning and accurately map AquaSec findings and severity to SARIF structure.

Related Issues and PRs

Requirements

Core Implementation

  1. Create src/model/convertor.py with a Convertor class that:

    • Accepts AquaSec findings JSON (as output by ScanFetcher from step 2)
    • Converts all findings to valid SARIF 2.1.0 format
    • Accurately maps finding severity to SARIF levels:
      • criticalerror
      • higherror
      • mediumwarning
      • lownote
      • negligible/unknownnote
    • Maps metadata (category, severity, rule, reference) correctly
    • Handles line location mapping
    • Includes remediation information in the message
    • Truncates text fields to match SARIF spec requirements:
      • Rule ID: max 512 characters
      • Short description: max 1024 characters
      • Full description: max 4096 characters
      • Message text: max 4096 characters
    • Uses constants for SARIF schema reference and version
  2. Update src/utils/constants.py to add SARIF-related constants:

    • SARIF_SCHEMA_URL = "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json"
    • SARIF_VERSION = "2.1.0"
    • SARIF_OUTPUT_FILE = "aquasec_scan.sarif"
    • Truncation limits for SARIF fields
  3. Update main.py to:

    • Import and use the Convertor class
    • Call conversion after fetching findings
    • Write SARIF output file
    • Expose SARIF file path as action output
  4. Update action.yml to:

    • Add new output sarif-file with description and value

Code Patterns to Follow

Based on existing codebase patterns:

# File header (Apache 2.0 License)
#
# Copyright 2026 ABSA Group Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# ...
  • Use logging.getLogger(__name__) for module-level logger
  • Follow existing class structure patterns from AquaSecAuthenticator and ScanFetcher
  • Use type hints for all parameters and return values
  • Include docstrings for all classes and methods
  • Use constants from src/utils/constants.py
  • Handle errors with proper logging (no secrets in output)

SARIF 2.1.0 Structure

The output SARIF file should have this structure:

{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "AquaSec",
          "version": "1.0.0",
          "informationUri": "https://www.aquasec.com/",
          "rules": [...]
        }
      },
      "results": [...]
    }
  ]
}

Unit Tests

Create tests/model/test_convertor.py with tests covering:

  • Conversion of findings with all severity levels
  • Field truncation logic
  • Empty findings handling
  • Missing fields handling
  • Edge cases (null values, empty strings)
  • Valid SARIF output structure

Follow the existing test patterns in the repository.

Quality Gates

  • All code must pass pylint with the project's .pylintrc configuration
  • All code must pass black formatting
  • All unit tests must pass
  • Code coverage should be maintained

Important Notes

  • Do NOT update README.md
  • Do NOT update DEVELOPER.md
  • Follow all patterns from the existing codebase
  • Ensure SARIF output is fully compatible with GitHub code scanning
  • Use set_action_output from src/utils/utils.py for action outputs

This pull request was created from Copilot chat.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Summary by CodeRabbit

  • New Features

    • SARIF-formatted security scan findings now available as the primary output format
  • Tests

    • Activated Pytest unit tests in continuous integration workflow
    • Added comprehensive test coverage for SARIF conversion and error handling
  • Documentation

    • Updated developer documentation with current tool usage examples
  • Chores

    • Enhanced code quality guidelines for static analysis
    • Added SARIF specification constants and configuration

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI changed the title [WIP] Implement conversion of AquaSec JSON to SARIF 2.1.0 format Implement SARIF 2.1.0 conversion for AquaSec scan results Jan 28, 2026
Copilot AI requested a review from tmikula-dev January 28, 2026 13:25
@coderabbitai
Copy link

coderabbitai bot commented Jan 29, 2026

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

  • 🔍 Trigger a full review

Walkthrough

This PR implements Step 3—SARIF conversion—adding a SarifConvertor class that transforms AquaSec JSON findings into SARIF 2.1.0 format. Changes include new conversion logic, utility functions, main workflow integration, comprehensive test coverage, and documentation updates.

Changes

Cohort / File(s) Summary
Configuration & CI/CD
.github/copilot-instructions.md, .github/workflows/check_python.yml, action.yml
Added pylint and testing guidelines; enabled Pytest workflow; replaced action output scan-findings with aquasec_sarif_file and updated its description and value reference.
SARIF Conversion Core
src/model/sarif_convertor.py, src/utils/constants.py, src/utils/utils.py
Implemented SarifConvertor class with severity mapping, rule/finding construction, and SARIF document assembly; added SARIF version, schema, and field truncation constants; introduced get_sarif_output_filename() utility for timestamped output filenames.
Main Integration
main.py
Integrated SARIF conversion workflow: imports SarifConvertor and get_sarif_output_filename, converts findings to SARIF format, writes output to file, and sets aquasec_sarif_file action output; added IOError handling for file writing.
Test Suite
tests/model/test_sarif_convertor.py, tests/test_main.py, tests/utils/test_utils.py
Added comprehensive SarifConvertor tests covering structure validation, severity mapping, text truncation, location data, rule consolidation, and edge cases; added IOError handling test for main.py; added test for get_sarif_output_filename utility.
Documentation
DEVELOPER.md
Updated command examples across tools (pylint, black, mypy, pytest) to reference sarif_convertor.py and test_sarif_convertor.py instead of convertor.py.

Sequence Diagram

sequenceDiagram
    participant Client as Client/Workflow
    participant Main as main.py
    participant Fetcher as ScanFetcher
    participant Convertor as SarifConvertor
    participant FileIO as File I/O
    participant Output as Action Output

    Client->>Main: Run with auth/config
    Main->>Fetcher: fetch_findings()
    Fetcher-->>Main: findings_json (dict)
    Main->>Convertor: SarifConvertor(findings_json)
    Main->>Convertor: convert_to_sarif()
    Convertor->>Convertor: Build rules (map severity, truncate text)
    Convertor->>Convertor: Build findings (link rules, add locations)
    Convertor-->>Main: sarif_dict
    Main->>FileIO: open & write SARIF JSON
    FileIO-->>Main: file written
    Main->>Output: set_action_output(aquasec_sarif_file, path)
    Output-->>Client: SARIF path exposed
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A converter hops, from JSON to SARIF so bright,
With severity mapped and findings in sight,
Each rule gets truncated, each location precise,
The scan results now fit for GitHub's advice!
—Rabbit's quest for security standards complete 🐇✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes out-of-scope changes: documentation updates to DEVELOPER.md (file references updated) and .github/copilot-instructions.md (new guidelines added), which were explicitly prohibited in issue #8 ("Do NOT update README.md", "do not change README.md"). Also included are changes to CI configuration (.github/workflows/check_python.yml) not mentioned in requirements. Remove changes to DEVELOPER.md and .github/copilot-instructions.md as these conflict with explicit requirements in issue #8 stating not to update documentation files. Justify or clarify the inclusion of CI workflow changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 30.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly and concisely summarizes the main change: implementing SARIF 2.1.0 conversion for AquaSec scan results, which is the primary objective of this PR.
Description check ✅ Passed The PR description includes Overview, Release Notes, and Related sections matching the template. It provides implementation details and example SARIF output, though some implementation details differ from issue requirements (e.g., class name is SarifConvertor instead of Convertor).
Linked Issues check ✅ Passed The PR implements SARIF conversion logic with proper severity mapping, field truncation, location extraction, and comprehensive tests. Main objective from issue #8 is met, though the class name (SarifConvertor vs Convertor) and some implementation details vary from the issue specification.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@tmikula-dev
Copy link
Collaborator

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jan 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@DEVELOPER.md`:
- Line 111: Update the example documentation lines that reference the SARIF
converter to use the correct path including the model subdirectory: replace
occurrences of "src/sarif_convertor.py" with "src/model/sarif_convertor.py"
(references relate to the SARIF converter file sarif_convertor.py mentioned in
DEVELOPER.md), and ensure the same fix is applied to all instances flagged (the
examples at the three noted locations).
- Around line 210-215: Update the example pytest commands in DEVELOPER.md to
reference the correct test file paths under the model subdirectory; replace
occurrences of "tests/test_sarif_convertor.py" with
"tests/model/test_sarif_convertor.py" (and the node id example accordingly) so
the shown commands (e.g., the full-file run and the node-id run "pytest
tests/model/test_sarif_convertor.py::test_convert_to_sarif_returns_valid_structure
-q") point to the actual test locations.
🧹 Nitpick comments (6)
.github/workflows/check_python.yml (1)

142-145: Minor inconsistency: missing step name for Python setup.

Other jobs in this workflow use name: Set up Python for the setup-python step. Consider adding a name for consistency.

Suggested fix
-      - uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548
+      - name: Set up Python
+        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548
         with:
           python-version: '3.14'
           cache: 'pip'
main.py (1)

53-53: Redundant exception object in logging.exception calls.

logging.exception automatically includes exception information in the log output, making str(e) redundant.

Suggested fix
     except (ValueError, RequestException) as e:
-        logger.exception("Authentication failed: %s", str(e))
+        logger.exception("Authentication failed.")
         sys.exit(1)
     except (ValueError, RequestException) as e:
-        logger.exception("Fetching scan results failed: %s", str(e))
+        logger.exception("Fetching scan results failed.")
         sys.exit(1)
     except IOError as e:
-        logger.exception("Failed to convert and write SARIF file: %s", str(e))
+        logger.exception("Failed to convert and write SARIF file.")
         sys.exit(1)

Also applies to: 60-60, 72-72

tests/model/test_sarif_convertor.py (1)

264-288: Good coverage of helper methods. Consider adding edge case tests for unknown severity values.

The tests for _map_severity_to_level, _map_severity_to_score, and _get_severity_tag cover the defined severity levels (1-4), but don't verify the default/fallback behavior for out-of-range or None values.

🧪 Optional: Add tests for edge case severity values
def test_map_severity_to_level_handles_unknown():
    assert "note" == SarifConvertor._map_severity_to_level(0)
    assert "note" == SarifConvertor._map_severity_to_level(-1)
    assert "error" == SarifConvertor._map_severity_to_level(5)  # >= 3


def test_map_severity_to_score_handles_unknown():
    assert "2.0" == SarifConvertor._map_severity_to_score(0)
    assert "2.0" == SarifConvertor._map_severity_to_score(99)


def test_get_severity_tag_handles_unknown():
    assert "UNKNOWN" == SarifConvertor._get_severity_tag(0)
    assert "UNKNOWN" == SarifConvertor._get_severity_tag(99)
src/model/sarif_convertor.py (3)

139-152: Empty strings in help text will produce blank lines.

When cwe or remediation are empty, the conditional expressions produce empty strings that remain in help_text_parts, resulting in blank lines in the final help text.

♻️ Optional: Filter empty strings before joining
         help_text_parts = [
             f"Misconfiguration {rule_id}",
             f"Type: {category}",
             f"Severity: {severity_tag}",
             f"Check: {title}",
             f"Message: {message}",
-            f"CWE: {cwe}" if cwe else "",
-            f"Remediation: {remediation}" if remediation else "",
         ]
+        if cwe:
+            help_text_parts.append(f"CWE: {cwe}")
+        if remediation:
+            help_text_parts.append(f"Remediation: {remediation}")

         if references:
             help_text_parts.append(f"Link: [{rule_id}]({references[0]})")

-        help_text = "\n".join(help_text_parts)
+        help_text = "\n".join(part for part in help_text_parts if part)

292-292: Minor typo: "SARF" should be "SARIF".

📝 Fix typo
-        # Build SARF structure
+        # Build SARIF structure

117-118: Inconsistent str() casting for rule_id truncation.

In _build_rule, the rule_id is truncated without casting to str first (line 118), whereas in convert_to_sarif (lines 275, 288), it's explicitly cast with str(rule_id). For consistency and defensive coding, consider applying the same pattern here.

♻️ Add str() cast for consistency
         rule_id = finding.get("avd_id", SARIF_PLACEHOLDER)
-        rule_id = self._truncate_text(rule_id, RULE_ID_MAX_LENGTH)
+        rule_id = self._truncate_text(str(rule_id), RULE_ID_MAX_LENGTH)

Apply the same change in _build_sarif_finding at line 185:

         rule_id = finding_json.get("avd_id", SARIF_PLACEHOLDER)
-        rule_id = self._truncate_text(rule_id, RULE_ID_MAX_LENGTH)
+        rule_id = self._truncate_text(str(rule_id), RULE_ID_MAX_LENGTH)

@tmikula-dev tmikula-dev linked an issue Jan 30, 2026 that may be closed by this pull request
@tmikula-dev tmikula-dev added the enhancement New feature or request label Jan 30, 2026
Copy link
Collaborator

@tmikula-dev tmikula-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was refactored and is approved by myself for other human review.

@tmikula-dev tmikula-dev marked this pull request as ready for review January 30, 2026 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create step 3: Convert JSON output to SARIF

2 participants